After much to-ing and fro-ing we are coming to the end of our collective tethers.

Since last Sunday morning at around 8am we have been having an odd problem with our primary nameserver, ns1.ihug.net.nz (203.29.160.4).

It goes something like this: the nameserver is doing its thing happily when, for no apparent reason, it just stops responding to named requests for a short time, and then a while later starts going again.

For obvious reasons this is a major problem, and we can find nothing that is causing it.

The symptoms:

o Server stops responding to named queries.

o At one stage it appeared that, when constantly pinging it, the TTL would change from 64 to 255 for the exact same duration that the nameserver was affected. I am unable to confirm that this is still the case.

o The outages appear to last exactly 2 or 3 minutes, to the second. (Maybe other durations?)

o At the same time this is happening we are getting "Possible syn flood" messages from various IP addresses (they don't seem to be related).

It is my understanding that at about the same time (Sunday) CLEAR, Waikato and possibly Telecom/Xtra had similar problems (I am not sure where this information comes from or how accurate it is), but no longer do.

What we have done:

o We swapped our nameservers around. This seems to solve the problem; however, it affects our practical ability to control zones etc...

o New software installations. The box affected had recently been rebuilt before the incident, and we did not suspect it was a software issue, but rebuilt anyhow. It has affected both Slackware and Debian installations with BIND 8.1.2 and BIND 8.2.

Conclusions:

o It only occurs when the machine is on 203.29.160.4 and is acting as a master. It does not affect slaves on that IP and does not affect masters on another IP.

o Hardware is not at fault - We have used more than one physical machine. The effects did not change.

o It appears that the nameserver itself stops during that time. Incoming traffic still reaches the box, but none goes out. Also, for the duration there seems to be no nameserver logging.

Short of completely changing our nameserver infrastructure we are at a loss as to what we can do.

Probable solutions:

o None - Goddammit!

Any ideas?

Dylan Reeve              DDI: +64 9 359-2746
Assistant DNS Admin      Fax: +64 9 358-5134
ihug business            Freecall: 0800 847-638
http://www.ihug.co.nz/   Email: dylan(a)ihug.co.nz

---------
To unsubscribe from nznog, send email to majordomo(a)list.waikato.ac.nz where the body of your message reads: unsubscribe nznog
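The "exactly 2 or 3 minutes" observation above could be checked mechanically rather than by eye. What follows is only a minimal sketch, assuming the nameserver is probed on a fixed timer and each probe is recorded as a (timestamp in seconds, answered) pair. The function names and the sample data are hypothetical and made up for illustration; they are not measurements from ns1.

```python
from typing import List, Tuple


def outage_windows(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, answered) probe samples into (start, end) outages.

    An outage starts at the first unanswered probe and ends at the next
    answered one. An outage still open when the data ends is dropped,
    because its real length is not known yet.
    """
    windows = []
    start = None
    for timestamp, answered in samples:
        if not answered and start is None:
            start = timestamp
        elif answered and start is not None:
            windows.append((start, timestamp))
            start = None
    return windows


def durations(windows: List[Tuple[float, float]]) -> List[float]:
    """Length of each outage window, in the same units as the timestamps."""
    return [end - start for start, end in windows]


# Hypothetical probe log: seconds since the start of monitoring.
example = [(0, True), (10, False), (20, False), (130, True),
           (140, True), (200, False), (380, True), (390, False)]
print(durations(outage_windows(example)))  # prints [120, 180]
```

If the real durations cluster tightly on multiples of 60 seconds, that would point at a timer somewhere (in named, the kernel, or a remote host) rather than at random load. The resolution of the result is limited by the probe interval.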
On Thu, Jun 03, 1999 at 03:36:15PM +1200, dylan(a)ihug.co.nz wrote:
It is my understanding that at about the same time (Sunday) CLEAR, Waikato and possibly Telecom/Xtra had similar problems (I am not sure where this information comes from or how accurate it is), but no longer do.
I don't know where this information came from either, but I can't confirm it for CLEAR. No DNS anomalies that I am aware of.
What we have done:
o We swapped our nameservers around, this seems to solve the problem however, it affects our practical ability to control zones etc...
o New software installations. The box affected had recently been rebuilt before the incident, and we did not suspect it was a software issue, but rebuilt anyhow. It has affected both Slackware and Debian installations with BIND 8.1.2 and BIND 8.2.
Did you truss the stuck named's and see where they were sticking?
Conclusions:
o It only occurs when the machine is on 203.29.160.4 and is acting as a master. It does not affect slaves on that IP and does not affect masters on another IP.
Could it be that your slave configuration restricts zone transfers to none, and that bind is clever enough to not bother listening unless there is at least one local zone which is transferable? If this is the case, does this look like a SYN flood to tcp/53? Maybe not intentional -- do you have slaves elsewhere which can route to your master, but which your master can't route back to?
o Hardware is not at fault - We have used more than one physical machine. The effects did not change.
o It appears that the nameserver itself stops during that time. Incoming traffic still reaches the box, but none goes out. Also, for the duration there seems to be no nameserver logging.
Maybe it's that hokey operating system you're using :)

Joe "FreeBSD" Abley
On Thu, Jun 03, 1999 at 10:04:55PM +1200, Joe Abley wrote:
Did you truss the stuck named's and see where they were sticking?
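Without wishing to sound picky, what you want to do is 'strace -p ' and send me the results (the whole list probably won't want them).
If bind isn't scheduling for some reason, and you've got a valid psdatabase, then 'ps auxl | grep bind' and tell me where it's stuck.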
Could it be that your slave configuration restricts zone transfers to none, and that bind is clever enough to not bother listening unless there is at least one local zone which is transferable?
If this is the case, does this look like a SYN flood to tcp/53? Maybe not intentional -- do you have slaves elsewhere which can route to your master, but which your master can't route back to?
More likely the box was just loaded... the SYN flood detection code is a little sensitive for some people. You could try enabling SYN cookies... (depending on kernel version, make sure it's compiled in and then do something like: "echo 1 >/proc/sys/net/ipv4/tcp_syncookies" to enable these).
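Before and after the echo above, it may be worth confirming what the kernel actually reports. The following is a rough sketch only: the helper name is hypothetical, and it assumes a Linux kernel that exposes the setting at the /proc path quoted above (a kernel built without SYN cookie support would not have the file at all).

```python
from pathlib import Path

# Path taken from the echo command quoted above.
SYNCOOKIES_PATH = "/proc/sys/net/ipv4/tcp_syncookies"


def syncookies_state(path: str = SYNCOOKIES_PATH) -> str:
    """Report 'enabled', 'disabled' or 'unavailable' for SYN cookies.

    'unavailable' covers a missing or unreadable file, or contents that
    are not an integer; any non-zero value is treated as enabled.
    """
    try:
        value = int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return "unavailable"
    return "enabled" if value != 0 else "disabled"


print(syncookies_state())
```

A result of "unavailable" on the affected box would suggest the option was not compiled in, in which case the echo would fail as well.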
o Hardware is not at fault - We have used more than one physical machine. The effects did not change.
o It appears that the nameserver itself stops during that time. Incoming traffic still reaches the box, but none goes out. Also, for the duration there seems to be no nameserver logging.
Maybe it's that hokey operating system you're using :)
Maybe... what version of Linux are you running? It's not getting hit by the funnies that Alan Cox posted a fix for to bugtraq yesterday (um, the day before, I think), is it?

-cw
On Fri, 4 Jun 1999, Chris Wedgwood wrote:
Did you truss the stuck named's and see where they were sticking?
Without wishing to sound picky, what you want to do is 'strace -p ' and send me the results (the whole list probably won't want them).
If bind isn't scheduling for some reason, and you've got a valid psdatabase, then 'ps auxl | grep bind' and tell me where it's stuck.
I didn't do that myself, but one of the programming staff has, I am not sure what his conclusions were, I am CCing this to him...
Could it be that your slave configuration restricts zone transfers to none, and that bind is clever enough to not bother listening unless there is at least one local zone which is transferable?
If this is the case, does this look like a SYN flood to tcp/53? Maybe not intentional -- do you have slaves elsewhere which can route to your master, but which your master can't route back to?
More likely the box was just loaded... the SYN flood detection code is a little sensitive for some people. You could try enabling SYN cookies... (depending on kernel version, make sure it's compiled in and then do something like: "echo 1 >/proc/sys/net/ipv4/tcp_syncookies" to enable these).
The load on the box never goes above about 0.3 even when named is pooping itself.
o It appears that the nameserver itself stops during that time. Incoming traffic still reaches the box, but none goes out. Also, for the duration there seems to be no nameserver logging.
Maybe it's that hokey operating system you're using :)
Could be, but I somehow doubt it. I am sure, however, that there is at least one person on staff who would love to get (Free|Open|Net)BSD on the servers...
Maybe... what version of Linux are you running? It's not getting hit by the funnies that Alan Cox posted a fix for to bugtraq yesterday (um, the day before, I think), is it?
Happened with 2.0.36 and 2.2.9 (patched). Doesn't seem to have anything to do with the kernel. The variables we have isolated seem to be the IP address and the fact that the server is master.

Anyhow, I will forward the responses to people with more technical smarts than me and see what they say...

Thanks for your response (you too, Joe).

Dylan Reeve              DDI: +64 9 359-2746
Assistant DNS Admin      Fax: +64 9 358-5134
ihug business            Freecall: 0800 847-638
http://www.ihug.co.nz/   Email: dylan(a)ihug.co.nz
On Fri, Jun 04, 1999 at 09:45:40AM +1200, dylan(a)ihug.co.nz wrote:
I didn't do that myself, but one of the programming staff has, I am not sure what his conclusions were, I am CCing this to him...
The other thing you might be able to do, when the problems are occurring, is: tcpdump -s 2000 -w some.file '(port 53) or icmp' and then send me some.file. You're apparently running 8.1.2-T3B, which fixes the stack smashing attack which I normally see fairly often... plenty of other people are running this version.
The load on the box never goes above about 0.3 even when named is pooping itself.
SYN cookies won't help the load, but they will help determine if you are being SYN flooded. Actually, you could also do this: tcpdump -s 2000 -w some.file 'tcp[13] & 3 != 0' and send me some.file -- this will show SYN and FIN packets. If there are zillions of SYN packets and very few FIN packets, it's probably a SYN flood (you'll also see lots of RST packets out-bound).
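For anyone puzzling over the 'tcp[13] & 3 != 0' expression: byte 13 of the TCP header holds the flag bits, with FIN as 0x01 and SYN as 0x02, so masking with 3 selects packets carrying either flag. The sketch below mirrors that test and the "zillions of SYNs, very few FINs" rule of thumb. The function names are hypothetical, and the 10:1 ratio threshold is an arbitrary assumption for illustration, not a figure from this thread.

```python
from typing import Iterable, Tuple

# Flag bits in byte 13 of the TCP header.
FIN, SYN, RST = 0x01, 0x02, 0x04


def matches_filter(flags: int) -> bool:
    """Same test as the tcpdump expression 'tcp[13] & 3 != 0'."""
    return (flags & (FIN | SYN)) != 0


def syn_fin_counts(flag_bytes: Iterable[int]) -> Tuple[int, int]:
    """Count packets with SYN set and packets with FIN set."""
    syn = fin = 0
    for flags in flag_bytes:
        if flags & SYN:
            syn += 1
        if flags & FIN:
            fin += 1
    return syn, fin


def looks_like_syn_flood(flag_bytes: Iterable[int], ratio: float = 10.0) -> bool:
    """Crude heuristic: many SYNs per FIN. The ratio is an assumed threshold."""
    syn, fin = syn_fin_counts(flag_bytes)
    return syn > 0 and syn >= ratio * max(fin, 1)


# Hypothetical capture: 50 bare SYNs and 2 FIN+ACK packets.
print(looks_like_syn_flood([0x02] * 50 + [0x11] * 2))
```

Note that a bare RST (0x04) does not match this mask, so counting outbound RSTs would need a wider capture, such as the '(port 53) or icmp' one.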
Could be, but I somehow doubt it. I am sure, however, that there is at least one person on staff who would love to get (Free|Open|Net)BSD on the servers...
If it's a stack smashing attack, it might help. Mostly because the stack offsets are OS dependent and since only 9 people in the whole world (including Joe) run FreeBSD, probably nobody ever worked out an attack for it :)
Happened with 2.0.36 and 2.2.9 (patched). Doesn't seem to have anything to do with the kernel. The variables we have isolated seem to be the IP address and the fact that the server is master.
Anyhow, I will forward the responses to people with more technical smarts than me and see what they say...
If you have the time and inclination, you might want to check out Dents, whose design is different and hopefully isn't vulnerable to poisoning attacks, amongst other things...

-cw
Chris Wedgwood wrote:
If it's a stack smashing attack, it might help. Mostly because the stack offsets are OS dependent and since only 9 people in the whole world (including Joe) run FreeBSD, probably nobody ever worked out an attack for it :)
We run FreeBSD. So who are the other 7?

Clearly if the number running the OS is a guide to its robustness, the relationship is inversely proportional. (:-) MS systems have a huge installed base, Linux has a large installed base, FreeBSD is less popular.

--
Mailto:asjl(a)netlink.net.nz
Post: Netlink, PO Box 5358, Lambton Quay, Wellington, New Zealand
--
Clearly if the number running the OS is a guide to its robustness, the relationship is inversely proportional. (:-)
Definitely, MacOS servers are pillars of performance and reliability.

-cw (thinking we are perhaps wandering a tad off-topic, perhaps we should take this up at your-os-sucks-more-than-my-os(a)f00f.org)
On Fri, Jun 04, 1999 at 10:03:19AM +1200, Chris Wedgwood wrote:
If it's a stack smashing attack, it might help. Mostly because the stack offsets are OS dependent and since only 9 people in the whole world (including Joe) run FreeBSD, probably nobody ever worked out an attack for it :)
Yup. That's me, Yahoo, Hotmail, Walnut Creek, BEST Internet, the IMDB, and three people from IHUG :)

Joe
On Fri, 4 Jun 1999, Joe Abley wrote:
On Fri, Jun 04, 1999 at 10:03:19AM +1200, Chris Wedgwood wrote:
If it's a stack smashing attack, it might help. Mostly because the stack offsets are OS dependent and since only 9 people in the whole world (including Joe) run FreeBSD, probably nobody ever worked out an attack for it :)
Yup. That's me, Yahoo, Hotmail, Walnut Creek, BEST Internet, the IMDB, and three people from IHUG :)
What about FreeBSD.org? I bet it's really WindowsNT, there is no more denying it, NT is the ultimate operating system for both desktop and server applications.

*straight face*

Gargh.. Can't.... Do... It....

*falls over laughing*

Dylan Reeve              DDI: +64 9 359-2746
Assistant DNS Admin      Fax: +64 9 358-5134
ihug business            Freecall: 0800 847-638
http://www.ihug.co.nz/   Email: dylan(a)ihug.co.nz
On Fri, Jun 04, 1999 at 10:32:08AM +1200, dylan(a)ihug.co.nz wrote:
On Fri, 4 Jun 1999, Joe Abley wrote:
Yup. That's me, Yahoo, Hotmail, Walnut Creek, BEST Internet, the IMDB, and three people from IHUG :)
What about FreeBSD.org? I bet it's really WindowsNT, there is no more denying it, NT is the ultimate operating system for both desktop and server applications.
FreeBSD.org is hosted at Walnut Creek. Keep trying, linux-weenie :)

Joe
participants (5)
- Andy Linton
- Chris Wedgwood
- dylan@ihug.co.nz
- Joe Abley
- Joe Abley