FW: Help! Cistron major failure in production! (fwd)
Anyone had any experience with cistron radius? I'm suffering some sort of
major magic weirdy problem, if anyone has any ideas.
---
Matt Camp
---------- Forwarded message ----------
Date: Wed, 25 Jul 2001 17:24:51 +1200 (NZST)
From: Matt Camp
I have had problems like this in the past. The first place I normally look is at things that have changed; in your case, the users file and configs. What I would try is building a basic users file from scratch, one that you know will definitely work, and see what happens from there. I once found a profile that had a syntax error and managed to stuff the rest of them up. The radius server failed to tell me there was a syntax error and continued on its way, denying every user it could. I basically put in a DEFAULT profile and worked backwards until I found the problem. This was using Merit Radius, mind you, in fact the free one :-) Give it a try, it might shed some light on your problem.

Simon Allard (Senior Tool Monkey)
IHUG
Ph (09) 358-5067
Email: simon.allard(a)staff.ihug.co.nz
"There is no spoon"
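Simon's "put in a DEFAULT profile and work backwards" approach amounts to a bisection over the entries in the users file. A minimal Python sketch of that idea, under stated assumptions: an entry begins on a non-indented line and indented lines continue it (the usual users-file layout); `works` is a hypothetical callback you would supply (for example, install the candidate file, restart radiusd and try radtest); and once a broken entry is included, every longer prefix stays broken. With N entries this needs about log2(N) test runs instead of N.

```python
from typing import Callable, List, Optional


def split_entries(text: str) -> List[str]:
    """Split users-file text into entries.

    Assumption: an entry begins on a non-indented, non-comment line, and any
    following indented, blank or comment lines belong to that entry.
    """
    entries: List[List[str]] = []
    for line in text.splitlines():
        starts_entry = bool(line) and not line[0].isspace() and not line.startswith("#")
        if starts_entry or not entries:
            entries.append([line])
        else:
            entries[-1].append(line)
    return ["\n".join(lines) for lines in entries]


def first_bad_entry(entries: List[str],
                    works: Callable[[List[str]], bool]) -> Optional[int]:
    """Binary-search for the entry whose inclusion first makes `works` fail.

    Assumes works([]) is True and that failure is monotonic: if a prefix
    fails, every longer prefix fails too. Returns None if everything passes.
    """
    if works(entries):
        return None
    lo, hi = 0, len(entries)  # invariant: entries[:lo] works, entries[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(entries[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1  # entries[hi - 1] is the first entry that breaks things
```

For a quick offline trial, `works` can be as crude as `lambda prefix: all(e.count('"') % 2 == 0 for e in prefix)`, which catches an entry with an unterminated quoted string; a real check would exercise the server itself.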
Anyone had any experience with cistron radius? I'm suffering some sort of major magic weirdy problem, if anyone has any ideas.
--- Matt Camp
---------- Forwarded message ----------
Date: Wed, 25 Jul 2001 17:24:51 +1200 (NZST)
From: Matt Camp
To: cistron-radius(a)lists.cistron.nl
Subject: Help! Cistron major failure in production!

I run a network of 4 Cistron 1.6.3 servers on FreeBSD machines.
This afternoon, all 4 of them stopped responding. It's as if the radiusd process isn't running. Neither our Lucent NAS gear, radtest, nor the various people who have proxies pointed at our servers get any response at all.
The server logs, and the console when running in -xxx mode, show that the server starts and then just waits for connections.
Every 5 minutes, I propagate a new raddb/users file out from a central host. This system has worked fine for around 18 months now.
On _ONE_ of the servers, if I restart it (totally: kill and restart radiusd), the log shows "Wed Jul 25 17:21:38 2001: Info: Starting - reading configuration files ...", then there is nothing for approximately 4 minutes, after which it appears to process all the requests that were sent to it during that time. (Of course, most of them actually fail, since the NAS has long since dropped the call.)
Then it runs fine until it detects a new users file and reloads the config files, at which point it's back into total non-responsiveness until I manually restart it.
This only works on one server. The other 3 don't come back at all, even after a restart.
And they're all on the same config (which I rsync to them after making a change).
radtest tests from localhost, which normally work fine, act exactly as if the radiusd process isn't even running.
I've tried everything I can think of, including the Microsoft approach of power-cycling all machines involved. Load averages are normal, disk space is fine. Heaps of memory available.
File descriptor loads seem OK: there are only around 100 or so connections to each box in total (for the other services they run).
Any ideas? This is urgent... I've got thousands of customers who can't log in right now.
--- Matt Camp
--------- To unsubscribe from nznog, send email to majordomo(a)list.waikato.ac.nz where the body of your message reads: unsubscribe nznog
This was my initial thought too, so I cleaned it right back to a single-profile users file, and it made no difference.

A truss of the radiusd process gives the following once it finishes loading and is ready to accept clients:

    syscall open("/etc/hosts",0,0666) returns 6 (0x6)
    syscall fstat(6,0xbfbfd70c) returns 0 (0x0)
    syscall read(0x6,0x921b000,0x2000) returns 1021 (0x3fd)
    syscall read(0x6,0x921b000,0x2000) returns 0 (0x0)
    syscall close(6) returns 0 (0x0)
    syscall socket(0x2,0x2,0x0) returns 6 (0x6)
    syscall connect(0x6,0x280f3110,0x10) returns 0 (0x0)
    syscall sendto(0x6,0xbfbfcbcc,0x2b,0x0,0x0,0x0) returns 43 (0x2b)
    syscall gettimeofday(0xbfbfc984,0x0) returns 0 (0x0)

Then nothing until you point a client at it, at which point you get:

    syscall poll(0xbfbfc974,0x1,0x1388) returns 0 (0x0)
    syscall close(6) returns 0 (0x0)
    syscall socket(0x2,0x2,0x0) returns 6 (0x6)
    syscall sendto(0x6,0xbfbfcbcc,0x2b,0x0,0x280f3120,0x10) returns 43 (0x2b)
    syscall gettimeofday(0xbfbfc984,0x0) returns 0 (0x0)

Yet the client acts as if there was never any connection.

On Wed, 25 Jul 2001, Simon Allard wrote:
I have had similar problems in the past like this. First place I normally look is at things that have changed. In your case the users file, configs
---
Matt Camp
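Read with hindsight from later in the thread, the truss output above is consistent with a DNS reverse lookup timing out: /etc/hosts is read, a UDP socket is opened (socket(0x2,0x2,...) is AF_INET, SOCK_DGRAM on FreeBSD), 43 (0x2b) bytes are sent, poll() waits 0x1388 ms, i.e. 5 seconds, and returns 0 (a timeout), and the same 43 bytes are then sent to what is plausibly the next nameserver. A minimal Python sketch, assuming a plain RFC 1035 query with a single question and no EDNS, shows that a PTR query for 192.168.0.33 is exactly 43 bytes. This only shows the sizes match; the trace does not show which name was actually queried.

```python
import ipaddress
import struct


def build_ptr_query(addr: str, query_id: int = 0) -> bytes:
    """Build a minimal RFC 1035 PTR query for an IPv4 address (no EDNS)."""
    name = ipaddress.ip_address(addr).reverse_pointer  # e.g. '33.0.168.192.in-addr.arpa'
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length byte, terminated by a zero byte
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    question = qname + struct.pack("!HH", 12, 1)  # QTYPE=PTR (12), QCLASS=IN (1)
    return header + question


if __name__ == "__main__":
    print(len(build_ptr_query("192.168.0.33")))  # 43, matching the sendto() length
    print(0x1388)  # 5000: the poll() timeout in milliseconds
```

The arithmetic is 12 header bytes + 27 bytes of QNAME + 4 bytes of QTYPE/QCLASS. A resolver that appended a search domain to the name, or added an EDNS record, would send a longer packet.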
Found the problem!

After much playing with ktrace and kdump, I found what it was barfing on: the 192.168.0.33 and 192.168.0.34 addresses for the Telecom DSL radius proxies... which weren't in /etc/hosts. Adding them fixed it.

Now, the big question is, Why has this worked fine for 6 months, and only decided to crap out today?

Thanks to all who posted suggestions.

On Wed, 25 Jul 2001, Simon Allard wrote:
This afternoon, all 4 of them stopped responding. It's as if the radiusd process isn't running. Neither our Lucent NAS gear, radtest, nor the various people who have proxies pointed at our servers get any response at all.
---
Matt Camp
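Matt's fix was to list the proxy addresses in /etc/hosts so the lookup is answered locally; the truss output shows /etc/hosts being read before any DNS traffic, so on this box the hosts file is consulted first. A minimal Python sketch, assuming hosts(5) format (address first, then names, `#` starting a comment), that reports which of a given set of addresses have no /etc/hosts entry. The address list is something you would supply yourself, e.g. gathered from raddb/clients; parsing that file is left out because its exact format isn't shown in this thread.

```python
import ipaddress
from typing import Iterable, List, Set


def hosts_addresses(hosts_text: str) -> Set[str]:
    """Return the normalised address field of every entry in hosts(5) text."""
    found: Set[str] = set()
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        try:
            found.add(str(ipaddress.ip_address(line.split()[0])))
        except ValueError:
            pass  # skip malformed lines rather than failing the whole check
    return found


def missing_from_hosts(addresses: Iterable[str], hosts_text: str) -> List[str]:
    """List the addresses with no hosts entry, i.e. those that would go to DNS."""
    known = hosts_addresses(hosts_text)
    return [a for a in addresses if str(ipaddress.ip_address(a)) not in known]
```

For example, `missing_from_hosts(["192.168.0.33", "192.168.0.34"], open("/etc/hosts").read())` would have listed both proxy addresses before the fix. Normalising through `ipaddress` means differently written forms of the same address still match.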
Now, the big question is, Why has this worked fine for 6 months, and only decided to crap out today?
Options ...
[1] Management heard about your employment post this morning
[2] Jabley was playing with remote access from a 747-400
[3] Your gear had a bad hair day

-PWM
"Hell, there are no rules here - we're trying to accomplish something!" Thomas A. Edison
Matt Camp
After much playing with ktrace and kdump, found what it was barfing on.
The 192.168.0.33 and 192.168.0.34 addresses for the telecom DSL radius proxies... which weren't in /etc/hosts.
Adding them fixed it.
Now, the big question is, Why has this worked fine for 6 months, and only decided to crap out today?
I'd guess that it was being rather dumb about trying to look up reverse map entries for these in the DNS. 168.192.in-addr.arpa is delegated to blackhole.ep.net and blackhole.isi.edu, of which the ep.net name servers don't seem to be responding at the moment.

Idea: create empty 168.192.in-addr.arpa zones on your DNS forwarders to intercept such lookups and give immediate errors, rather than waiting on the blackhole... servers to fail to answer.

-- don
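Don's idea extends to all of the RFC 1918 space. in-addr.arpa names follow octet boundaries, so 172.16.0.0/12 needs sixteen zones (16.172 through 31.172) while 10.0.0.0/8 and 192.168.0.0/16 need one each. A minimal Python sketch that derives the zone names and emits a BIND-style zone statement for each; the file name `empty.db` is hypothetical, and such a file would still need at least SOA and NS records for BIND to load it.

```python
import ipaddress
from typing import List

RFC1918 = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")


def reverse_zones(prefix: str) -> List[str]:
    """Return the octet-aligned in-addr.arpa zone names covering an IPv4 prefix."""
    net = ipaddress.IPv4Network(prefix)
    octets = -(-net.prefixlen // 8)  # round the prefix length up to whole octets
    new_len = octets * 8
    subnets = [net] if new_len == net.prefixlen else net.subnets(new_prefix=new_len)
    zones = []
    for sub in subnets:
        leading = str(sub.network_address).split(".")[:octets]
        zones.append(".".join(reversed(leading)) + ".in-addr.arpa")
    return zones


def bind_stanzas(prefixes=RFC1918, zone_file: str = "empty.db") -> List[str]:
    """Emit one 'type master' zone statement per reverse zone (zone_file is hypothetical)."""
    return [f'zone "{z}" {{ type master; file "{zone_file}"; }};'
            for p in prefixes for z in reverse_zones(p)]


if __name__ == "__main__":
    for line in bind_stanzas():
        print(line)
```

Answering these zones locally turns a multi-second timeout into an immediate negative answer, which is the point of Don's suggestion.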
At 12:07 AM 26/07/01 +1200, Don Stokes wrote:
Matt Camp wrote:
After much playing with ktrace and kdump, found what it was barfing on.
The 192.168.0.33 and 192.168.0.34 addresses for the telecom DSL radius proxies... which weren't in /etc/hosts.
Adding them fixed it.
Now, the big question is, Why has this worked fine for 6 months, and only decided to crap out today?
I'd guess that it was being rather dumb about trying to look up reverse map entries for these in the DNS. 168.192.in-addr.arpa is delegated to blackhole.ep.net and blackhole.isi.edu, of which the ep.net name servers don't seem to be responding at the moment.
Strange, I don't see the same problem here, although I'm running 1.6.4, or it could be a difference in our DNS server setup. It's normal for radiusd to reverse look up the NAS IP address, since it's possible to specify NASes by hostname in raddb/clients, and if the DNS lookups are timing out very slowly it would definitely cause problems... The /etc/hosts entry is probably a good idea... (/me goes and does it now to be sure :)

And to Matt: if you're doing automated updates to the users file, etc., and propagating them to redundant servers without some good verification that there are no mistakes in those files, I'd suggest you update to 1.6.4 so you can use the -C option to automatically check the new config files before applying them, so your servers don't all go splat at once because of some silly little error in one of those files :) (It happens...)

Regards,
Simon
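Simon's suggestion of checking new files before they reach any server can be wrapped around the existing five-minute propagation job. A minimal Python sketch: the check command is taken as `radiusd -C` from Simon's description of 1.6.4 and is *assumed* to exit non-zero when it finds an error (worth confirming against your build), and the rsync targets are hypothetical host names and paths. In practice the check must be pointed at the staged copy of raddb rather than the live one; how to do that depends on your build, so it is left out here.

```python
import subprocess
from typing import Sequence

# Hypothetical values: adjust the check command and targets for your setup.
CHECK_CMD = ["radiusd", "-C"]  # assumes a non-zero exit status on a bad config
PUSH_CMDS = [
    ["rsync", "-a", "raddb/", "radius1:/etc/raddb/"],
    ["rsync", "-a", "raddb/", "radius2:/etc/raddb/"],
]


def check_then_push(check_cmd: Sequence[str],
                    push_cmds: Sequence[Sequence[str]]) -> bool:
    """Run the config check; only if it exits 0, run each push command in order."""
    result = subprocess.run(list(check_cmd), capture_output=True, text=True)
    if result.returncode != 0:
        print("config check failed; not propagating:")
        print(result.stdout + result.stderr)
        return False
    for cmd in push_cmds:
        # check=True raises on the first failed push instead of carrying on
        subprocess.run(list(cmd), check=True)
    return True
```

Calling `check_then_push(CHECK_CMD, PUSH_CMDS)` from the job means a broken file stops at the central host instead of taking out all four servers at once.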
On Thu, Jul 26, 2001 at 12:07:32AM +1200, Don Stokes wrote:
Matt Camp wrote:
Now, the big question is, Why has this worked fine for 6 months, and only decided to crap out today?
I'd guess that it was being rather dumb about trying to look up reverse map entries for these in the DNS. 168.192.in-addr.arpa is delegated to blackhole.ep.net and blackhole.isi.edu, of which the ep.net name servers don't seem to be responding at the moment.
Bill Manning posted something to nanog about that; the ep.net nameservers were being DoSed, so Bill had his providers null-route them for a while until the attack stopped (*shrug*). I bet he never realised that by doing so he would break dial-up access for users on the other side of the planet :)

Joe
On Wed, Jul 25, 2001 at 06:20:27PM +1200, Matt Camp wrote:
The 192.168.0.33 and 192.168.0.34 addresses for the telecom DSL radius proxies... which weren't in /etc/hosts. Adding them fixed it.

The proper fix is for someone at Telecom to get a clue and not use such stupid address choices. This also applies to all the morons out there who number interfaces with RFC 1918 address space. (If you're doing this, and don't see why it's a bad idea, then you shouldn't be doing this. Pottery is nice and relaxing.)

--cw
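Chris's point suggests a simple audit: flag any configured interface or peer address that falls inside RFC 1918 space, since, as Don notes for 168.192.in-addr.arpa, reverse lookups for such addresses go out to public servers that cannot usefully answer them unless they are answered locally. A minimal Python sketch; the input list is whatever you collect from your own configs.

```python
import ipaddress
from typing import Iterable, List

# The three RFC 1918 blocks, listed explicitly.
RFC1918_NETS = [ipaddress.IPv4Network(n)
                for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def rfc1918_addresses(addresses: Iterable[str]) -> List[str]:
    """Return the IPv4 addresses that fall inside RFC 1918 private space."""
    flagged = []
    for a in addresses:
        ip = ipaddress.IPv4Address(a)
        if any(ip in net for net in RFC1918_NETS):
            flagged.append(a)
    return flagged
```

The networks are spelled out rather than using `is_private`, because `is_private` also covers loopback, link-local and other reserved ranges, which would blur what this check is meant to catch.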
participants (7)
- Chris Wedgwood
- Don Stokes
- Joe Abley
- Matt Camp
- Peter Mott
- Simon Allard
- Simon Byrnand