Server Crash every 5 minutes

Servertools has nothing to do with it. Turning off EAC solves it.

These are the hosts it's trying to connect to. Nobody would think all of them would go down. But hey, welcome to Amazon Web Services. Put your stuff in the cloud, they said. It's safe, they said.
Oh yeah, it wouldn't be the first time I've run into all sorts of weirdness when clocks roll over, DST changes, etc. And for all we know, this could be TFP failing to upgrade some integration library that their servers are expecting an API change for :shrug:.

I will confirm that disabling EAC seems to stop this problem from happening.
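
For anyone who wants to see the stalled connection attempts for themselves, here's a quick sketch. It just watches for outbound sockets from the server stuck in the SYN-SENT state (i.e. nobody is answering the handshake). Run it as root or as the server user so ss can show process names; the binary name is taken from the command line further down the thread and may appear truncated in ss output:

# Refresh every 2 seconds; list outbound TCP connections still waiting
# for a handshake reply, owned by the 7DTD dedicated server process.
$ sudo watch -n 2 'ss -tnp state syn-sent | grep 7DaysToDie'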

 
I can confirm the same issue here:

eac_server.so [x64] :: OnLoad()
mono_fdhandle_insert: duplicate File fd 0
Receiving unhandled NULL exception
#0 0x007f7e99f17a28 in abort
#1 0x007f7e9787812c in mono_dl_fallback_unregister
#2 0x007f7e978884d8 in monoeg_g_logv
#3 0x007f7e9788856b in monoeg_g_log
#4 0x007f7e9786b34c in mono_reflection_get_custom_attrs_data
#5 0x007f7e977b8b6d in mono_unity_jit_cleanup
#6 0x007f7e977e3c5e in mono_install_unhandled_exception_hook
#7 0x00000041328b85 in (wrapper managed-to-native) System.IO.MonoIO:Open (char*,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,System.IO.FileOptions,System.IO.MonoIOError&)

Distro Details
=================================
Distro: CentOS Linux 7 (Core)
Arch: x86_64
Kernel: 3.10.0-1062.9.1.el7.x86_64
Uptime: 0d, 11h, 19m
tmux: tmux 1.8
glibc: 2.17

Server Resource
=================================
CPU
Model: Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS)
Cores: 4
Frequency: 2999.998MHz
Avg Load: 0.37, 0.39, 0.41

Memory
Mem:       total   used    free    cached   available
Physical:  7.7GB   2.5GB   5.0GB   5.1GB    5.0GB
Swap:      0B      0B      0B

Storage
Filesystem: /dev/sda1
Total: 30G
Used: 27G
Available: 3.2G

7 Days To Die Server Details
=================================
Maxplayers: 16
Game mode: GameModeSurvival
Game world: Navezgane
Master server: true
Status: ONLINE

Command-line Parameters
=================================
./7DaysToDieServer.x86_64 -logfile /home/sdtdhost/log/server/output_log__2020-01-05__04-10-42.txt -quit -batchmode -nographics -dedicated -configfile=/home/sdtdhost/serverfiles/sdtdserver.xml

 
My server started doing this 2 days ago.

And I can see that it seems to happen when some people connect to the server.

Ubuntu 18.04.3 LTS, latest updates applied and a reboot done 2 days ago.

 
These were my values (on an out-of-the-box RedHat/CentOS 7 server) before any changes:

$ cat /proc/sys/net/ipv4/tcp_syn_retries
6
$ cat /proc/sys/net/ipv4/tcp_synack_retries
5

Which I changed to:

$ echo 3 > /proc/sys/net/ipv4/tcp_syn_retries
$ echo 3 > /proc/sys/net/ipv4/tcp_synack_retries

How or where can I change this, or check what it's currently set to?

 
I had moved the server's Linux VM to another disk and lowered the memory from 16GB to 8GB, and ran into the same problem. After thinking it was due to file corruption and swapping everything around, the fix was as simple as giving it back its initial memory.
I think this null pointer problem is simply due to a lack of memory to allocate something... maybe? Bottom line: increase the memory of the server and it might (fingers crossed) solve the problem.
No idea why that solved the problem for you, but that's neither the issue nor the solution here (I have about 100GB of free memory). There is a message saying "mono_fdhandle_insert: duplicate File fd 0".

In other words, it's trying to open a stream that's already open (whether it's a memory stream, file, connection, whatever).

We might be observing two different issues here (depending on whether we run Linux or Windows?). But since I reduced the SYN retries on my servers, I've not seen this issue anymore (with EAC enabled).

I'd assume the developers would be grateful if we could pinpoint this issue to a soft timeout/cleanup they have to look into.
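
If it helps with pinpointing, the crash signature can be pulled out of the server's output logs and correlated across restarts. A small sketch, assuming the log location from the command line quoted earlier in the thread; adjust the path to your own install:

# Count how often the crash signature appears, per log file / restart:
$ grep -c "mono_fdhandle_insert: duplicate File fd" /home/sdtdhost/log/server/output_log_*.txt

# Show a bit of context around each crash for comparison:
$ grep -B 2 -A 10 "Receiving unhandled NULL exception" /home/sdtdhost/log/server/output_log_*.txt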

These were my values (on an out-of-the-box RedHat/CentOS 7 server) before any changes:

$ cat /proc/sys/net/ipv4/tcp_syn_retries
6
$ cat /proc/sys/net/ipv4/tcp_synack_retries
5

Which I changed to:

$ echo 3 > /proc/sys/net/ipv4/tcp_syn_retries
$ echo 3 > /proc/sys/net/ipv4/tcp_synack_retries

Don't worry, those settings will not be persistent (they'll be restored at a server reboot) unless you define them in sysctl.

This basically changes the total SYN timeout from about 180 seconds to about 40 seconds (the kernel doubles the wait after each retry, so dropping from 6 retries to 3 shortens the total dramatically), which seems sufficient to circumvent the bug.
Solved for me as well (Fedora Linux, which is also a RedHat-type distro).
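
For anyone who wants the lower retry counts to stick: as noted above, writes to /proc are lost at reboot. A minimal sketch for making them persistent via sysctl.d (works on CentOS 7 and recent Ubuntu releases; the file name is arbitrary):

# Check the current values (same as reading the /proc files):
$ sysctl net.ipv4.tcp_syn_retries net.ipv4.tcp_synack_retries

# Persist the lower values, then apply all sysctl configuration now:
$ printf 'net.ipv4.tcp_syn_retries = 3\nnet.ipv4.tcp_synack_retries = 3\n' | sudo tee /etc/sysctl.d/90-7dtd-syn.conf
$ sudo sysctl --system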

 
How or where can I change this, or check what it's currently set to?
Even if adjusting the SYN/ACK retries in the network stack fixes the problem, it's not an acceptable workaround IMO. That setting applies system-wide, and if you run other things on that server it will affect them too. The issue here, I believe, is the 7DTD code not handling an exception when it can't reach the EAC servers while EAC is enabled. This may be intentional, as someone could circumvent the integrity of the server if it can't check the global ban list, but I think a more "graceful" way of stopping the server would be a better approach. E.g. a server-wide message informing users that EAC is unavailable, and if it can't be reached within, say, a minute or two, the server shuts down gracefully.

Also, it seems like EAC is experiencing issues on their end, as I've seen other games have this problem (and gracefully inform the users/admins). Essentially I believe this is just TFP forgetting to implement some safety code on their side to prevent the server from blatantly exiting when it can't open a socket to the central hydra host (an ELB in AWS).
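
To illustrate the kind of graceful handling suggested above, here is a rough watchdog sketch an admin could run alongside the server: if an EAC endpoint stops answering for a couple of minutes, it warns players and stops the server via the telnet console instead of letting the process crash. The EAC host and port are placeholders (the real endpoints aren't documented here), the telnet port/password must match your serverconfig.xml, and the "say"/"shutdown" console commands are assumed from the dedicated server's admin console:

#!/bin/bash
# Hypothetical watchdog: warn players and stop the server cleanly when the
# (placeholder) EAC endpoint has been unreachable for roughly two minutes.
EAC_HOST="eac-endpoint.example.com"   # placeholder; replace with a host seen in your logs
EAC_PORT=443                          # placeholder port
TELNET_HOST=127.0.0.1
TELNET_PORT=8081                      # TelnetPort from serverconfig.xml
TELNET_PASS="changeme"                # TelnetPassword from serverconfig.xml
FAILS=0

send_cmd() {
    # Feed one console command through the 7DTD telnet interface.
    { sleep 1; echo "$TELNET_PASS"; sleep 1; echo "$1"; sleep 1; } \
        | timeout 10 nc "$TELNET_HOST" "$TELNET_PORT" > /dev/null
}

while true; do
    # Try a plain TCP connect to the EAC endpoint, 5 second limit.
    if timeout 5 bash -c "exec 3<>/dev/tcp/$EAC_HOST/$EAC_PORT" 2>/dev/null; then
        FAILS=0
    else
        FAILS=$((FAILS + 1))
    fi
    if [ "$FAILS" -ge 4 ]; then       # 4 failed checks, 30 s apart ≈ 2 minutes
        send_cmd 'say "EAC is unreachable - shutting the server down gracefully."'
        sleep 10
        send_cmd "shutdown"
        exit 0
    fi
    sleep 30
done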

 
Sadly the only current solution for Linux dedicated servers that I've found that works is disabling EAC. But why should we have to disable something that is supposed to help protect us?

 
Sadly the only current solution for Linux dedicated servers that I've found that works is disabling EAC. But why should we have to disable something that is supposed to help protect us?
piLON's suggestion seems to have taken care of it on the server I was testing this error on. EAC is still enabled on it.

 
Yeah. Personally I felt that disabling EAC was the worst work-around I could think of, but I merely used it to prove my hypothesis (comment #4 in this thread).

The problem is obviously in the code or libraries of 7daystodie and will hopefully be fixed, but reducing the SYN retries (as I suggested) will (in 99.9% of the cases) not affect anything else nowadays.

But sure, to be able to apply those changes, you need to have root access to the linux server/shell, or ask the owner of the server to apply them.

Btw, my server is The World's End, hence I felt quite eager to find the best work-around to the problem.

 
Okay, you win. I'm just someone who's been in the Linux open source community for 23 years and just wanted to help my fellow players. :-/

 
Okay, you win. I'm just someone who's been in the Linux open source community for 23 years and just wanted to help my fellow players. :-/
That wasn't directed towards you. I was just saying that I switched and now the problem is gone.

 
The SYN workaround is fine if your server is dedicated to 7 Days or you have root, yeah. I run my own server on an Ubuntu Bionic 18.04.3 release. EAC is important for running a public server, so disabling it is a bad idea if it can be avoided. My comment was more that I am really surprised TFP haven't released some sort of hotfix for this sooner, since I am sure many are affected by it if it is indeed related to EasyAntiCheat availability problems.

 
piLON, how do I reduce SYN retries on Ubuntu 16? I tried following the path you suggested, but the file shows up empty when I open it with nano or vi.
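
In case it helps: those entries under /proc are virtual files, so editors like nano or vi typically show them as empty and can't save them in place. Reading them with cat and writing them with echo (or sysctl -w) works the same way on Ubuntu 16.04 as in the commands quoted above, e.g.:

# Read the current value:
$ cat /proc/sys/net/ipv4/tcp_syn_retries

# Write a new value (root required); sysctl -w does the same thing:
$ echo 3 | sudo tee /proc/sys/net/ipv4/tcp_syn_retries
$ sudo sysctl -w net.ipv4.tcp_syn_retries=3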

 
Same problem on my Debian 10 Linux container (LXC).

Have not tried the SYN-hack, but I had to turn EAC off.

Are the developers even aware of this issue at all? Are they reading this?

 
I've been running the SYN hack for several days. The test server falls over less, but it still falls over. A side effect was that, during heavy use, users were experiencing noticeable latency. I have returned to normal operations.

To the point, though: my servers aren't falling over every 5 minutes. We only encountered this about once or twice a day, with a 12-hour restart cycle.

 