
I now disabled PF on my home freebsd install and did 3000 connections to the redhat server (E -> A)

No problems at all!

I then did the connection test towards my freebsd server in amsterdam (with PF on) and there i got problems again.

OK, so this must be PF related. Now to find out why the states are causing this problem... and how to fix it. I thought I had tried all the sysctl and PF states variables already, but maybe I have overlooked something...
Quote from Victor :[net.inet.tcp.rexmit_min] that value only takes multiples of 10, so 10, 20, 30, etc.

Not on our freebsd; check my sysctl net values. Anyway, it is not the solution.

Quote from Victor :
I now disabled PF on my home freebsd install and did 3000 connections to the redhat server (E -> A)

No problems at all!

I then did the connection test towards my freebsd server in amsterdam (with PF on) and there i got problems again.

For me, the weird thing is:
E (freebsd, pf disabled) -> A (redhat) -- no problems
A (redhat) -> C (freebsd amsterdam, pf enabled, from your previous post with tests) -- no problems
E (freebsd, pf disabled) -> C (freebsd amsterdam, pf enabled) -- problems again!, Why??

Quote :E (freebsd, pf disabled) -> C (freebsd amsterdam, pf enabled) -- problems again!, Why??

yeah good question - but if I create a stateless filter rule for my connection test, on the amsterdam box, then all is 'fine' again.

I'm starting to get a bit dizzy :faint:
ok, i finally figured out that the debugging option of PF might be helpful too. I've put it into misc mode.

I've turned PF back on here.
When looking at /var/log/messages when i do the test, I get loads of these :

May 17 00:41:03 lnx kernel: pf: BAD state: TCP 192.168.1.100:61777 192.168.1.100:61777 213.40.196.93:80 [lo=3200806316 high=3200812108 win=33304 modulator=0 wscale=1] [lo=936377993 high=936444600 win=5792 modulator=0 wscale=0] 9:9 S seq=3223601976 ack=936377993 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:03 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:03 lnx kernel: pf: BAD state: TCP 192.168.1.100:53020 192.168.1.100:53020 213.40.196.93:80 [lo=3440918211 high=3440924003 win=33304 modulator=0 wscale=1] [lo=944883930 high=944950537 win=5792 modulator=0 wscale=0] 9:9 S seq=3453415866 ack=944883930 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:03 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:04 lnx kernel: pf: BAD state: TCP 192.168.1.100:61112 192.168.1.100:61112 213.40.196.93:80 [lo=2880554281 high=2880560073 win=33304 modulator=0 wscale=1] [lo=950102446 high=950169053 win=5792 modulator=0 wscale=0] 9:9 S seq=2895159713 ack=950102446 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:04 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:05 lnx kernel: pf: BAD state: TCP 192.168.1.100:62364 192.168.1.100:62364 213.40.196.93:80 [lo=2592795149 high=2592800941 win=33304 modulator=0 wscale=1] [lo=935531122 high=935597729 win=5792 modulator=0 wscale=0] 9:9 S seq=2603319085 ack=935531122 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:05 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:06 lnx kernel: pf: BAD state: TCP 192.168.1.100:61777 192.168.1.100:61777 213.40.196.93:80 [lo=3200806316 high=3200812108 win=33304 modulator=0 wscale=1] [lo=936377993 high=936444600 win=5792 modulator=0 wscale=0] 9:9 S seq=3223601976 ack=936377993 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:06 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:06 lnx kernel: pf: BAD state: TCP 192.168.1.100:53020 192.168.1.100:53020 213.40.196.93:80 [lo=3440918211 high=3440924003 win=33304 modulator=0 wscale=1] [lo=944883930 high=944950537 win=5792 modulator=0 wscale=0] 9:9 S seq=3453415866 ack=944883930 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:06 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:07 lnx kernel: pf: BAD state: TCP 192.168.1.100:61112 192.168.1.100:61112 213.40.196.93:80 [lo=2880554281 high=2880560073 win=33304 modulator=0 wscale=1] [lo=950102446 high=950169053 win=5792 modulator=0 wscale=0] 9:9 S seq=2895159713 ack=950102446 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:07 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:08 lnx kernel: pf: BAD state: TCP 192.168.1.100:57459 192.168.1.100:57459 213.40.196.93:80 [lo=1065900272 high=1065906064 win=33304 modulator=0 wscale=1] [lo=957857707 high=957924314 win=5792 modulator=0 wscale=0] 9:9 S seq=1076524357 ack=957857707 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:08 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:09 lnx kernel: pf: BAD state: TCP 192.168.1.100:61777 192.168.1.100:61777 213.40.196.93:80 [lo=3200806316 high=3200812108 win=33304 modulator=0 wscale=1] [lo=936377993 high=936444600 win=5792 modulator=0 wscale=0] 9:9 S seq=3223601976 ack=936377993 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:09 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:09 lnx kernel: pf: BAD state: TCP 192.168.1.100:53020 192.168.1.100:53020 213.40.196.93:80 [lo=3440918211 high=3440924003 win=33304 modulator=0 wscale=1] [lo=944883930 high=944950537 win=5792 modulator=0 wscale=0] 9:9 S seq=3453415866 ack=944883930 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:09 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:10 lnx kernel: pf: BAD state: TCP 192.168.1.100:61112 192.168.1.100:61112 213.40.196.93:80 [lo=2880554281 high=2880560073 win=33304 modulator=0 wscale=1] [lo=950102446 high=950169053 win=5792 modulator=0 wscale=0] 9:9 S seq=2895159713 ack=950102446 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:10 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:11 lnx kernel: pf: BAD state: TCP 192.168.1.100:57459 192.168.1.100:57459 213.40.196.93:80 [lo=1065900272 high=1065906064 win=33304 modulator=0 wscale=1] [lo=957857707 high=957924314 win=5792 modulator=0 wscale=0] 9:9 S seq=1076524357 ack=957857707 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:11 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:13 lnx kernel: pf: BAD state: TCP 192.168.1.100:53020 192.168.1.100:53020 213.40.196.93:80 [lo=3440918211 high=3440924003 win=33304 modulator=0 wscale=1] [lo=944883930 high=944950537 win=5792 modulator=0 wscale=0] 9:9 S seq=3453415866 ack=944883930 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:13 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:13 lnx kernel: pf: BAD state: TCP 192.168.1.100:61112 192.168.1.100:61112 213.40.196.93:80 [lo=2880554281 high=2880560073 win=33304 modulator=0 wscale=1] [lo=950102446 high=950169053 win=5792 modulator=0 wscale=0] 9:9 S seq=2895159713 ack=950102446 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:13 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:14 lnx kernel: pf: BAD state: TCP 192.168.1.100:57459 192.168.1.100:57459 213.40.196.93:80 [lo=1065900272 high=1065906064 win=33304 modulator=0 wscale=1] [lo=957857707 high=957924314 win=5792 modulator=0 wscale=0] 9:9 S seq=1076524357 ack=957857707 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:14 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:16 lnx kernel: pf: BAD state: TCP 192.168.1.100:53020 192.168.1.100:53020 213.40.196.93:80 [lo=3440918211 high=3440924003 win=33304 modulator=0 wscale=1] [lo=944883930 high=944950537 win=5792 modulator=0 wscale=0] 9:9 S seq=3453415866 ack=944883930 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:16 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:18 lnx kernel: pf: BAD state: TCP 192.168.1.100:57459 192.168.1.100:57459 213.40.196.93:80 [lo=1065900272 high=1065906064 win=33304 modulator=0 wscale=1] [lo=957857707 high=957924314 win=5792 modulator=0 wscale=0] 9:9 S seq=1076524357 ack=957857707 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:18 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:19 lnx kernel: pf: BAD state: TCP 192.168.1.100:61112 192.168.1.100:61112 213.40.196.93:80 [lo=2880554281 high=2880560073 win=33304 modulator=0 wscale=1] [lo=950102446 high=950169053 win=5792 modulator=0 wscale=0] 9:9 S seq=2895159713 ack=950102446 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:19 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:21 lnx kernel: pf: BAD state: TCP 192.168.1.100:57459 192.168.1.100:57459 213.40.196.93:80 [lo=1065900272 high=1065906064 win=33304 modulator=0 wscale=1] [lo=957857707 high=957924314 win=5792 modulator=0 wscale=0] 9:9 S seq=1076524357 ack=957857707 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:21 lnx kernel: pf: State failure on: 1 | 5
May 17 00:41:24 lnx kernel: pf: BAD state: TCP 192.168.1.100:57459 192.168.1.100:57459 213.40.196.93:80 [lo=1065900272 high=1065906064 win=33304 modulator=0 wscale=1] [lo=957857707 high=957924314 win=5792 modulator=0 wscale=0] 9:9 S seq=1076524357 ack=957857707 len=0 ackskew=0 pkts=4:2 dir=out,fwd
May 17 00:41:24 lnx kernel: pf: State failure on: 1 | 5

Note the source ports - a lot of duplicates - pretty odd. I wonder if this means that the connect() function reuses old sockets too quickly.

Anyway, I don't think the problem is due to having too few ports available.
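To put a number on those duplicates, here is a minimal Python sketch that counts how often each source port shows up in the BAD state lines. The regular expression is only inferred from the log format pasted above, and `count_bad_state_ports` is a made-up helper name, so treat it as a starting point rather than a proper pf log parser.

```python
import re
from collections import Counter

# Pattern inferred from the pasted log lines: the first "ip:port" pair
# after "pf: BAD state: TCP" is taken to be the local source address.
BAD_STATE = re.compile(r"pf: BAD state: TCP (\d+(?:\.\d+){3}):(\d+)\s")


def count_bad_state_ports(lines):
    """Return a Counter mapping source port -> number of BAD state lines."""
    ports = Counter()
    for line in lines:
        match = BAD_STATE.search(line)
        if match:
            ports[int(match.group(2))] += 1
    return ports


def print_report(lines):
    """Print one line per source port, most frequent first."""
    for port, hits in count_bad_state_ports(lines).most_common():
        print(f"port {port}: {hits} BAD state entries")
```

Calling `print_report(open("/var/log/messages"))` on the box that logs the errors would list the ports that keep coming back.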

My sysctl range vars :
net.inet.ip.portrange.lowfirst: 1023
net.inet.ip.portrange.lowlast: 600
net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.last: 65535
net.inet.ip.portrange.hifirst: 49152
net.inet.ip.portrange.hilast: 65535
net.inet.ip.portrange.reservedhigh: 1023
net.inet.ip.portrange.reservedlow: 0
net.inet.ip.portrange.randomized: 1
net.inet.ip.portrange.randomcps: 10
net.inet.ip.portrange.randomtime: 45

These are the same as yours - sooooo ... humm
i should note that i now see that these errors don't appear continuously. At the start of the test there are none, until the first 'operation not permitted' comes along. Then a couple of identical BAD STATE entries follow, with a small delay between them. Then towards the end of the test, they seem to appear more often.

I've tried setting the first and hifirst range variables in sysctl to 16000 - this seems to reduce the number of errors - need to test more.
It doesn't remove all errors though, so we're not there yet.
But it's now very obvious something dodgy is going on with the states.

EDIT: i think i have to log all packets, then track all the source ports used and see if any of them also appear as bad states. Just to see if the connect() function indeed wants to reuse ports it used not long ago.
Shouldn't these source ports be chosen sequentially? every time +1, unless that newly selected port is already in use? It looks like they're randomly selected almost.
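A rough way to check this from the client side, without logging every packet: record the local port the OS hands out for each connect() and see which ones repeat. This is only a sketch in Python, not the test program used in this thread. `connect_and_record` and `reused_ports` are hypothetical names, and it opens the connections strictly one after another.

```python
import socket
from collections import Counter


def connect_and_record(host, port, attempts, timeout=2.0):
    """Open `attempts` TCP connections in sequence.

    Returns (ports, errors): the local source port chosen for every
    successful connect(), and (attempt index, message) for every failure.
    """
    ports, errors = [], []
    for i in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            # getsockname() returns (local address, local port) for AF_INET.
            ports.append(sock.getsockname()[1])
        except OSError as exc:
            errors.append((i, str(exc)))
        finally:
            sock.close()
    return ports, errors


def reused_ports(ports):
    """Return {port: count} for every source port used more than once."""
    return {p: n for p, n in Counter(ports).items() if n > 1}
```

Running `connect_and_record` against the test server and then comparing `reused_ports(ports)` with the ports in the BAD state lines would show whether the failures line up with early port reuse.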
Quote from Victor :It looks like they're randomly selected almost.

Quote from Victor :net.inet.ip.portrange.randomized: 1

:banghead:

guess what happens when i put randomised to 0?

IT ****** WORKS AHAHAHA WOOHOO! :headbang: :ices_rofl

/me goes off to test on the other boxes
ps, this randomising also explains why it still gave errors on the amsterdam freebsd box when i turned off PF at one end -> because source ports were reused too early, the other end had trouble putting the connection in the state table, because it was already there.
Quote from Victor ::banghead:

guess what happens when i put randomised to 0?

IT ****** WORKS AHAHAHA WOOHOO! :headbang: :ices_rofl

/me goes off to test on the other boxes

Great !!!!!!
Well, it explains why only freebsd->freebsd is problematic. But back to the beginning of this thread, were all the problematic mysql_connect() functions called from one freebsd box? Notice that our freebsd machines also have net.inet.ip.portrange.randomized: 1.
I think linux-based OSes, using iptables & conntrack, have a quicker timeout for ended connections, cleaning up earlier, so the duplicate port wouldn't be a problem. That's my guess anyway.

But yes, I agree, there is something dodgy about the random ports - it seems not to remember the ports it recently used and as such there is a chance that it reuses them too early.

But I don't really care atm - random ports or not, that's really not very important in our case.

Quote :were all the problematic mysql_connect() functions called from one freebsd box?

from one freebsd box to another yes.

I've done the db connection test again, with a simultaneous regular socket connection test -> 0 errors now.

Also did tests at higher connection rates:

100 connections / s = all OK
1000 connections / s = all OK
10000 connections / s = all OK (well, it didn't go faster than 5000 per sec in reality)
Ok, great then.
According to this, pf keeps states of closed connections for 90 seconds.
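That 90 second figure gives a back-of-envelope feel for why random ports hurt. The sketch below is a deliberately simplified model, not a description of how the FreeBSD port allocator or pf really behave. It assumes every connection goes to the same destination, every closed state lingers for the full time, ports are picked uniformly at random, and it ignores the randomcps and randomtime settings. The function names are made up for illustration.

```python
def ephemeral_port_count(first, last):
    """Number of ports in the inclusive range [first, last]."""
    return last - first + 1


def collision_probability(rate_per_s, linger_s, port_count):
    """Rough chance that a randomly picked port still has a lingering state.

    At most rate_per_s * linger_s ports can be occupied by old states,
    capped at the size of the range.
    """
    lingering = min(rate_per_s * linger_s, port_count)
    return lingering / port_count


if __name__ == "__main__":
    default_range = ephemeral_port_count(49152, 65535)
    wider_range = ephemeral_port_count(16000, 65535)
    for rate in (10, 100, 1000):
        print(
            rate,
            round(collision_probability(rate, 90, default_range), 3),
            round(collision_probability(rate, 90, wider_range), 3),
        )
```

Under these assumptions, widening the range by lowering first to 16000 lowers the estimate but doesn't bring it to zero, which fits what was seen earlier in the thread.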

/OT, from different thread:
Quote from Victor :(this is why i love freebsd)

Nobody's perfect but Linux and me.
