LFS Forum - Ban this guy please [hacker using no username]

#201 - cargame.nl

Sat 25 Aug 2012, 20:49

Quote from Scawen :
Have you (or anyone else) had that problem before? I am finding it hard to see how the test patch changes could have introduced that as a new bug. And where I look for the answer depends a bit on if it's a new or old bug.

Yes, but that was ages ago, sometime in 2011.

And certainly not on the S1 server, which normally runs more smoothly because of lighter demand. Both servers getting into this state has never happened before. I would also add that the LFS.exe dedis run for months without interruption; I only restart them on rare occasions (like these).

But I will do another run with B3 tomorrow then just to rule out the possibility of coincidence.

#202 - MadCatX

Sun 26 Aug 2012, 02:23

Quote from Scawen :
That problem with the timer going backwards seems to cause more problems than just InSim timing out. It could affect anything time related across the whole program, and apparently does :

http://bugs.winehq.org/show_bug.cgi?id=7710 (see comment 4)

So a quick fix isn't the thing to do, I need to consider carefully how to make LFS deal cleanly and safely with large forward and backward leaps in time reported by the operating system.

That is on my list but at the moment I'm concerned by how the B3 host could have got into that bugged state that doesn't allow anyone to join.

I believe I managed to fix this in WINE. It's not pretty but it should make the GetTickCount() and timeGetTime() (as far as I've gathered WINE doesn't differentiate between these functions) behave. If anybody is willing to test it and has positive results with it, I'll consider cleaning the patches up and trying to get them upstream.

BTW, during testing I encountered some rather peculiar sound behavior. When the system clock was set backwards, all sound in LFS stopped, and it didn't come back until the system clock caught up with the time at which I had set it back. Is this a glitch in LFS, or are there some more WINE fixes to be done? Other than that everything was OK: no InSim timeouts or freezes.

To apply the patches, grab the latest WINE source, cd into its directory and run

patch -p1 < (name of the patch file)

for both files.

Attached files

#203 - Scawen

Sun 26 Aug 2012, 06:48

Quote from cargame.nl :
But I will do another run with B3 tomorrow then just to rule out the possibility of coincidence.

Thanks

#204 - chucknorris

Sun 26 Aug 2012, 07:18

Quote from MadCatX :
I believe I managed to fix this in WINE. It's not pretty but it should make the GetTickCount() and timeGetTime() (as far as I

snip

Could you explain with a few words what you have actually changed?

Edit: [TC] CityDriving One is now on B3. I went through a couple of guides which may or may not solve the virtual clock issue. Results will follow...

#205 - MadCatX

Sun 26 Aug 2012, 09:11

Quote from chucknorris :
Could you explain with a few words what you have actually changed?

Sure. As far as I understand it, GetTickCount() returns the number of milliseconds since Windows booted. This is done by counting TSC (or another time source) ticks. According to MS documentation, it's safe to do (timeNow - timeBefore), which suggests that the time returned by this function can never go backwards.
All WINE implementations of GetTickCount() rely on NtQuerySystemTime(), so WINE simply stores the time when the wineserver started and returns (now - server_started_time). NtQuerySystemTime() in WINE uses gettimeofday() to get the current time. The time returned by gettimeofday() is subject to adjustments by the user, so when you (or an NTP client, for instance) set the clock backwards, GetTickCount() returns a lower value than on its previous call. This is obviously incorrect, and IMHO LFS shouldn't compensate for that.
I modified GetTickCount() implementation to use clock_gettime(CLOCK_MONOTONIC) which is guaranteed to never go backwards.
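To illustrate the idea, here is a minimal C sketch of a GetTickCount()-style counter built on clock_gettime(CLOCK_MONOTONIC), assuming a POSIX system such as Linux. It is not the attached WINE patch: the function name monotonic_tick_count is hypothetical, and unlike WINE it does not subtract the wineserver start time, so its zero point is whatever CLOCK_MONOTONIC uses (an unspecified starting point, typically around boot on Linux).

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not WINE's actual code): milliseconds read from the
 * monotonic clock, truncated to 32 bits the way GetTickCount() is. */
static uint32_t monotonic_tick_count(void)
{
    struct timespec ts;

    /* CLOCK_MONOTONIC cannot be set, so changing the wall clock
     * (date, settimeofday, an NTP step) does not make it jump. */
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;

    uint64_t ms = (uint64_t)ts.tv_sec * 1000u
                + (uint64_t)ts.tv_nsec / 1000000u;

    /* Keep only the low 32 bits, so the value wraps like GetTickCount(). */
    return (uint32_t)ms;
}
```

Because the counter is derived from CLOCK_MONOTONIC rather than gettimeofday(), setting the system clock backwards leaves successive readings in order, which is the property the patch relies on.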

- I know that LFS probably uses timeGetTime(), but WINE doesn't distinguish between the two.
- GetTickCount() returns its value as a 32-bit unsigned integer, so it eventually wraps around every ~49.7 days. The LFS dedi probably has to deal with this somehow.
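A small C sketch of why (timeNow - timeBefore) stays safe across the ~49.7-day wrap, assuming both readings are 32-bit unsigned values: unsigned subtraction in C is modulo 2^32, so the difference is correct as long as the real interval is shorter than one wrap period. The flip side is that a clock set backwards (the gettimeofday() case described above) gives an enormous difference instead of a negative one. Both helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: elapsed ms between two 32-bit tick readings.
 * The subtraction is modulo 2^32, so one counter wrap is handled. */
static uint32_t ticks_elapsed(uint32_t before, uint32_t now)
{
    return (uint32_t)(now - before);
}

/* One full wrap of a 32-bit millisecond counter, in days:
 * 2^32 ms / 86 400 000 ms per day, about 49.71 days. */
static double wrap_period_days(void)
{
    return 4294967296.0 / 86400000.0;
}
```

For example, ticks_elapsed(0xFFFFFFF0u, 0x10u) is 32 ms even though the counter wrapped in between, while ticks_elapsed(1000u, 999u), a clock stepping back by 1 ms, comes out as 4294967295, i.e. what looks like a ~49.7-day leap forward.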

-

(Dim.Ka) DELETED by Flame CZE : irrelevant

Sun 26 Aug 2012, 09:18

#206 - Baldi

Sun 26 Aug 2012, 10:13

Quote from Scawen :
...so we can get back to the fun stuff and a faster development rate in the S3 period.

http://www.youtube.com/watch?v=2cfOi5BdxW8

#207 - Hoshimodo

Sun 26 Aug 2012, 11:07

Quote from Scawen :
...so we can get back to the fun stuff and a faster development rate in the S3 period.

Quote from Baldi :
http://www.youtube.com/watch?v=2cfOi5BdxW8

I lol'd so hard!

It's not that difficult to outpace a zero-development rate :-)

#208 - cargame.nl

Sun 26 Aug 2012, 11:16

Quote from Scawen :
Thanks

The S1 server is back on 06B3 again. I'll let it run for a few hours while I do some girl stuff (shopping). If nothing strange happens, I'll switch S2 to 06B3 as well. But to be honest, I expect a problem... Anyway, we'll see...

#209 - Omar1

Sun 26 Aug 2012, 11:48

Quote from Dim.Ka :
http://www.youtube.com/watch?v=mSZbJFQ_414 Michelin tires

Damn it, why did you post that? I was about to go on LFS and now I'm watching it because it's actually very interesting.

#210 - chucknorris

Sun 26 Aug 2012, 15:18

Results after 6.5 hrs:

305 user connections
1 case of JOOS - RACE PLY COL SET CAR WHL ENG RES (which is quite normal)

Nothing else suspicious.

#211 - MadCatX

Sun 26 Aug 2012, 15:36

Was it with the WINE patches applied or did you find another workaround for the dodgy system clock?

#212 - chucknorris

Sun 26 Aug 2012, 16:02

Quote from MadCatX :
Was it with the WINE patches applied or did you find another workaround for the dodgy system clock?

I'm currently trying some config changes to the clocksource and using ntpd. Seems to be working so far. I'm trying to avoid building my own Wine, as I have no dev packages installed. Plus there are some other downsides, e.g. no auto-update.

#213 - cargame.nl

Sun 26 Aug 2012, 17:29

Yeah, no problems here either so far... Okay... Hmm...

#214 - chucknorris

Mon 27 Aug 2012, 06:46

After 22 hours:

752 connects

1x JOOS - RACE PLY COL SET CAR WHL ENG RES
1x JOOS - PLY COL SET CAR WHL ENG
1x JOOS - SET CAR WHL ENG
3x JOOS : Resync RES (1)

Nothing unusual.

#215 - cargame.nl

Mon 27 Aug 2012, 10:45

Yeah, I transferred almost everything back to 06B3 now.

I get a lot more "Resync RES (1)" messages, though.

Whatever that means. But this was also the case with all the other executables, so nothing unusual.

#216 - Scawen

Mon 27 Aug 2012, 11:44

Resync RES is a correction for a sync error that would be very common.

If people are crossing the finish line while a new player is connecting, the new player may miss a finishing result, so the guest is out of sync with the host. But instead of kicking the new player with a JOOS, the fault is corrected.

---

About the BLANK : OVERFLOW - xxx: I can reproduce it easily with a debug version, and it can sometimes (but not every time) result in new players getting JOOS - COUNT every time they connect. Yesterday with cargame S1, the first time OVERFLOW was seen, the host was not put into a bugged state. That was one second before the OVERFLOW appeared on cargame S2, putting it into the bugged state. The second OVERFLOW on cargame S1 also put it into the bugged state.

The way I'm reproducing it is by simulating the clock going forwards in time, so the host overfills its packet buffer. It should cause OVERFLOW if the clock goes forwards by around 41 seconds (or more). There's one simple "fix" I should do that at least makes the host do an emergency restart if that does happen, so it doesn't sit there out of action. I think I can also make it much less likely to happen by limiting forward leaps in time to 10 seconds or so. I do need to allow for hangs of a few seconds caused by the operating system failing to give any CPU time to the LFS host occasionally.

But the root cause of this problem is out of LFS's control. The two possibilities I can think of are :

1) It's on WINE and the system clock has jumped forward around 41 seconds or more so timeGetTime erroneously reports a leap in time.
2) LFS has not been given any CPU time for around 41 seconds or more.

Both of these situations appear the same to LFS, which cannot feel the passage of time other than through timeGetTime's return value.

#217 - [Audi TT]

Mon 27 Aug 2012, 11:57

Scawen, do you have an idea?
http://www.lfsforum.net/showthread.php?t=80093

Thanks.

#218 - cargame.nl

Mon 27 Aug 2012, 12:01

While it's true I'm running LFS dedis with Wine, there is, however...

Quote :
[root@xxxx ~]# service ntpd status
ntpd is stopped

So there is no automatic time correction on this particular server (due to the problems others reported here earlier, and in my eyes it's not that important for this server; I set the time manually twice a year => DST).

Let's put it down to a rare event that coincided with this update patch testing. Hopefully thinking about it still resulted in something useful.

#219 - chucknorris

Mon 27 Aug 2012, 13:53

Quote from cargame.nl :
No automatic time correction on this particular server. (Due to the problems others report here earlier and in my eyes its not that important for this server. I set time manually two times a year => DST)

Well, at least my problems with timekeeping seem to have gone away completely. No more jumps forwards or backwards. I managed to detach my system clock from the Xen host and now maintain the time with ntpd. It runs absolutely perfectly.
"watch ntpdate -q xxx" shows a constant deviation of 1-2 ms. No jumps, no Insim timeouts.

-

(denis-takumi) DELETED by denis-takumi : i must read first =)

Mon 27 Aug 2012, 15:59

#220 - cargame.nl

Tue 28 Aug 2012, 11:18

Here we go again....

This time on a mostly inactive server (GTR).

Quote :
Aug 27 08:20:15 Got master packet
Aug 27 08:22:09 InSim timeout : InSim Relay
Aug 27 08:22:15 Time out : Lost connection to master server
Aug 27 08:23:05 Auto reconnection to master
---SNIP--
Aug 27 12:30:08 ~[ »dB«BuCZeK : JOOS - COUNT MPSYS CONNS

Actually this happened yesterday, but it was reported today for the first time.

Note that I'm now running four 06B3s on the same server and only this instance failed. Weird.

Attached files

#221 - cargame.nl

Tue 28 Aug 2012, 11:24

S2;

Quote :
Aug 27 08:19:33 Got master packet
Aug 27 08:19:58 Good split 2 - -TRRt- William : 1:17.62 (XFG)
Aug 27 08:20:43 Good lap - -TRRt- William : 2:02.05 (XFG)
Aug 27 08:20:43 /msg AIRW - new XFG PB by wfr0021: 2:02.05 (-0:12.07)
Aug 27 08:20:43 AIRW - new XFG PB by wfr0021: 2:02.05 (-0:12.07)
Aug 27 08:20:43 Improvement points - -TRRt- William : 1
Aug 27 08:20:47 Fastest lap : 2:02.05 by -TRRt- William^L (XFG)
Aug 27 08:20:48 Best lap - -TRRt- William : 2:02.05 (XFG)
Aug 27 08:20:53 Got master packet
Aug 27 08:21:25 Top split 1 - -TRRt- William : 0:41.80 (XFG)
Aug 27 08:22:00 Great split 2 - -TRRt- William : 1:17.22 (XFG)
Aug 27 08:22:12 InSim timeout : InSim Relay
Aug 27 08:22:44 Great lap - -TRRt- William : 2:01.29 (XFG)
Aug 27 08:22:45 Good lap points - -TRRt- William : 2
Aug 27 08:22:45 /msg AIRW - new XFG PB by wfr0021: 2:01.29 (-0:00.76)
Aug 27 08:22:45 AIRW - new XFG PB by wfr0021: 2:01.29 (-0:00.76)
Aug 27 08:22:45 Improvement points - -TRRt- William : 1
Aug 27 08:22:48 Fastest lap : 2:01.29 by -TRRt- William^L (XFG)
Aug 27 08:22:49 Best lap - -TRRt- William : 2:01.29 (XFG)
Aug 27 08:22:53 Time out : Lost connection to master server
Aug 27 08:23:04 Fastest lap : 1:58.12 by fred619^L (RB4)
Aug 27 08:23:26 Top split 1 - -TRRt- William : 0:41.93 (XFG)
Aug 27 08:23:57 Auto reconnection to master
Aug 27 08:24:04 Great split 2 - -TRRt- William : 1:17.02 (XFG)
Aug 27 08:24:05 cargame.nl S2 : ^L!si 99
Aug 27 08:24:05 Configuration item accepted...
Aug 27 08:24:05 Configuration item accepted...
Aug 27 08:24:05 Configuration item accepted...
Aug 27 08:24:05 Configuration item accepted...
Aug 27 08:24:05 Configuration item accepted...
Aug 27 08:24:05 LFS server settings have been reset to version 99...
Aug 27 08:24:13 InSim - TCP : InSim Relay

S1;

Quote :
Aug 27 08:20:09 Got master packet
Aug 27 08:22:09 Time out : Lost connection to master server
Aug 27 08:22:15 InSim timeout : InSim Relay
Aug 27 08:23:13 Auto reconnection to master
Aug 27 08:24:16 FATAL NET ERROR : TIMEDOUT
Aug 27 08:24:16 InSim - TCP : InSim Relay
Aug 27 08:24:16 cargame.nl S1 : ^L!si 99
Aug 27 08:24:16 BLANK : OVERFLOW - cargame.nl S1
Aug 27 08:24:16 marc^L timed out

S0;

Quote :
Aug 27 08:21:02 Got master packet
Aug 27 08:22:09 InSim timeout : InSim Relay
Aug 27 08:23:02 Time out : Lost connection to master server

#222 - chucknorris

Tue 28 Aug 2012, 12:30

Hm, your logfile shows

Quote :

Aug 25 13:42:46 Auto reconnection to master
Aug 25 13:43:49 FATAL NET ERROR : CONNRESET
Aug 25 13:43:49 FATAL NET ERROR : CONNRESET
Aug 25 13:43:49 BLANK : OVERFLOW - cargame.nl GTR
..
Aug 25 13:44:12 › ^JÏ^sM.Nielsen : JOOS - COUNT

as the first incident, not the 27th as you posted.

It looks like nobody was able to join after that error occurred.

#223 - cargame.nl

Tue 28 Aug 2012, 13:46

OK, thanks, I didn't realize that server was affected at all on the 25th. Never mind then.

Still not that good to see these errors, but hmm, as long as the show keeps running...

#224 - Scawen

Tue 28 Aug 2012, 17:53

Here's another dedicated host update, 0.6B4

The main things about this one are: time steps are limited to between 0 and 6000 milliseconds, so the host should deal much better with timers that are adjusted (e.g. on Wine). Because of the timer step limitation, I think that buffer overflows on the host must now be extremely rare. If somehow there is one, the host will now auto-restart, so it will not remain in the bugged state that sometimes resulted from a buffer overflow.
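A minimal C sketch of what limiting time steps to between 0 and 6000 milliseconds could look like, assuming a 32-bit millisecond timer such as timeGetTime(). This is not LFS source code: the function name clamped_step is hypothetical, and treating a difference in the upper half of the 32-bit range as "the timer moved backwards" is an assumption of the sketch.

```c
#include <stdint.h>

/* Upper limit on a single time step, as stated for 0.6B4. */
#define MAX_STEP_MS 6000u

/* Hypothetical sketch (not LFS code): turn two consecutive readings of a
 * 32-bit millisecond timer into the step the host is allowed to advance. */
static uint32_t clamped_step(uint32_t prev, uint32_t now)
{
    /* Modulo-2^32 difference; correct across a normal counter wrap. */
    uint32_t diff = (uint32_t)(now - prev);

    /* Assumption: a value in the upper half of the range means the timer
     * was moved backwards, so do not advance time at all. */
    if (diff >= 0x80000000u)
        return 0;

    /* Forward leap or long hang: cap the step. */
    if (diff > MAX_STEP_MS)
        return MAX_STEP_MS;

    return diff;
}
```

Under this rule, the roughly 41-second forward leap that produced OVERFLOW in testing would reach the host as a single 6000 ms step, and a backward adjustment would produce a zero-length step instead of a huge one.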

DEDICATED SERVER - ONLY FOR HOSTING!

www.lfs.net/file_lfs.php?name=LFS_S2_DEDI_6B4.zip

Changes from 0.6B to B3 :

Improved checks on packets received from joining guests
Improved checks on validity of user names while connecting
User names now confirmed with master server after connection
Cleaned up and removed some unused code

Changes in B4 :

Two more hacking checks on packets sent by connecting guests
Removed message "Got master packet" from network debug output
No buffer overflows from hangs or operating system time changes
Auto restart after buffer overflow on host to avoid bugged state

You only need the exe (from the zip) if you are updating an existing installation.

#225 - sicotange

Tue 28 Aug 2012, 18:15

I have nothing worrying to report in relation to my Windows server (WS • Metropolis) using 0.6B3.

Am I right to assume that:
• The probability of encountering the error is much higher on a full server.
• The issue concerns both Windows & Linux servers.