We've had ~15 peers regularly flapping their BGP sessions to the APE route servers this afternoon. After a fair amount of digging through logs etc. we think we've mostly pieced together what happened.

Sometime this morning these peers started flapping their sessions. We're not sure what the trigger event was, but once they started flapping the route servers became very busy. In this busy state the route servers struggled to keep up with BGP keepalives. As a result, sessions whose BGP timers had been lowered from the default (3 x 60 seconds, i.e. a 60 second keepalive with a 180 second hold time) started to notice missing keepalives and bounced. Once these sessions started bouncing, the route servers became even busier and the problem became self-sustaining.

The route servers have now settled down, but it's unclear whether that's due to our fiddling with them or a peer ceasing some unfriendly behaviour. We're working on building some new route servers with more horsepower.

In the meantime, some interesting facts (taken from rs2):

- 84 peers defaulted to 60 second keepalives (this includes inactive sessions).
- 22 peers forced the timer down to 30 seconds.
- 2 peers forced the timer down to 20 seconds.
- 2 peers forced the timer down to 15 seconds.
- 2 peers forced the timer down to 10 seconds.
- 2 peers forced the timer down to 8 seconds.
- 2 peers forced the timer down to 5 seconds.
- 1 peer forced the timer down to 2 seconds.

There was a *very* strong correlation between low keepalive timers and how badly a peer was affected. Suffice to say, the peer with a 2 second keepalive spent more time bringing its session back up than it did with a working session.

This seems like a good time to question the wisdom of using a low keepalive on the APE (or WIX etc). Given that your BGP session is with the route servers, not with other APE participants, the absence of a route server doesn't necessarily mean you can't reach other peers.
On the other hand, if your APE connection fails outright, reacting promptly is beneficial. Does anyone have a strong opinion on this?

As part of trying to reduce the load on the route servers, we've removed the majority of the inactive peers. If this affects you and you'd like your session reinstated, please drop an email to peering(a)citylink.co.nz.

At this stage we're only aware of one significant recent change: Pipe Networks have joined the APE and started announcing routes in the last 24 hours. They're now our second largest contributor, with 595 routes at last count :) One working theory is that the additional routes pushed another peer's router over its limit (memory, hardware resources, etc.) and that router started crashing/resetting.

Dylan
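To make the timer correlation concrete, here is a rough sketch of how long a route server can go completely silent before each group of peers' hold timers expire. It assumes the hold time is three times the keepalive, as in the 3 x 60 second default, and treats the busy period as a total stall, which is a simplification of what actually happened. The peer counts are the rs2 figures above; `TimerGroup` and `sessions_dropped` are made-up names for illustration. (BGP uses the smaller of the two hold times offered in the OPEN messages, which is how one peer can force a short timer onto its route server session.)

```python
# Rough sketch: which sessions drop if the route server sends nothing for a while.
# Assumption: hold time = 3 x keepalive (the "3 x 60" default mentioned above).
# Peer counts are the rs2 figures from the post.
from dataclasses import dataclass


@dataclass
class TimerGroup:
    peers: int        # number of peers using this keepalive
    keepalive_s: int  # keepalive interval in seconds

    @property
    def hold_s(self) -> int:
        # Assumed convention: the hold time is three keepalive intervals.
        return 3 * self.keepalive_s


RS2_GROUPS = [
    TimerGroup(84, 60), TimerGroup(22, 30), TimerGroup(2, 20),
    TimerGroup(2, 15), TimerGroup(2, 10), TimerGroup(2, 8),
    TimerGroup(2, 5), TimerGroup(1, 2),
]


def sessions_dropped(stall_s: float, groups=RS2_GROUPS) -> int:
    """Count peers whose hold timer expires during a stall of stall_s seconds."""
    # A session drops once no KEEPALIVE (or UPDATE) has arrived for the hold time.
    return sum(g.peers for g in groups if stall_s >= g.hold_s)


if __name__ == "__main__":
    for g in RS2_GROUPS:
        print(f"{g.peers:3d} peers: keepalive {g.keepalive_s:2d}s, hold {g.hold_s:3d}s")
    for stall in (6, 30, 90, 180):
        print(f"{stall:3d}s stall -> {sessions_dropped(stall)} sessions drop")
```

Under these assumptions a 6 second stall is already enough to drop the 2 second peer, while peers on the default 60 second keepalive ride out anything shorter than 3 minutes.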
Hi Folks
On Fri, Feb 10, 2012 at 5:05 PM, Dylan Hall wrote:
We've been experiencing ~15 peers flapping their BGP sessions on a regular basis facing the APE route servers this afternoon.
<snip>

So, who had their sessions configured for max prefix of 2500?

:)

Cheers,
Blair
* cough *
Mmmmmmmaybe. WHAT OF IT?!
JSR

On 10/02/2012, at 7:54 PM, Blair Harrison wrote:
<snip>
_______________________________________________ NZNOG mailing list NZNOG(a)list.waikato.ac.nz http://list.waikato.ac.nz/mailman/listinfo/nznog
Y'all might like to check your sessions are still up; it looks like PIPE started announcing some more prefixes and probably hit some more limits :)
Cheers,
Blair
On Sun, Feb 12, 2012 at 12:49 AM, John Russell wrote:
<snip>
A general question: would it be sensible for APE/WIX/etc to recommend, or even require, a minimum max-prefix limit, and for anyone who expects to go over that limit to notify people?

On 16/02/2012, at 10:46 AM, Blair Harrison wrote:
<snip>
-- Jay Daley Chief Executive .nz Registry Services (New Zealand Domain Name Registry Limited) desk: +64 4 931 6977 mobile: +64 21 678840
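For anyone unsure what a max-prefix limit actually does, here is a generic sketch of the behaviour most BGP implementations offer: a warning once the received count crosses a percentage of the limit, and the session being torn down once the limit is exceeded. Only the 2500 figure comes from this thread; the 75% warning threshold, the `check_max_prefix` name and the example counts are hypothetical, and real routers differ in details such as whether and when they bring the session back up.

```python
# Generic sketch of BGP max-prefix behaviour (not any vendor's exact semantics).
# The 75% warning threshold and the counts used below are hypothetical.
from enum import Enum


class Action(Enum):
    OK = "ok"
    WARN = "warn"          # log/alert, but keep the session up
    TEARDOWN = "teardown"  # limit exceeded: close the session


def check_max_prefix(received: int, limit: int, warn_pct: int = 75) -> Action:
    """Classify a received-prefix count against a configured max-prefix limit."""
    if received > limit:
        return Action.TEARDOWN
    # Integer arithmetic avoids float rounding at the threshold boundary.
    if received * 100 >= limit * warn_pct:
        return Action.WARN
    return Action.OK


if __name__ == "__main__":
    for count in (1800, 2000, 2000 + 595):
        print(count, check_max_prefix(count, 2500).value)
```

A limit only helps if it leaves headroom: a hypothetical peer already receiving 2000 routes sits in the warning band, and another 595 routes (Pipe's count from earlier in the thread) would push it past 2500 and tear the session down.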
Looks like RS2 is bouncing, see attached. Work taking place?

barry

On Thu, 16 Feb 2012 10:53:44 +1300, Jay Daley wrote:
<snip>
After a further patch of instability this morning, we've made a number of changes on the APE:

- We've contacted people who had very low timer settings and had them changed to less aggressive settings.
- We've upgraded the hardware of one of the route servers this afternoon.
- On the other route server we deployed the filtering changes that were discussed at NZNOG and in the postings Andy recently made to this list.

These changes have resulted in much greater stability, and the filtering change has significantly lowered the load on the route server still running on the old hardware.

We anticipate rolling out a further hardware upgrade in the short term, and we will deploy the changed filter lists tomorrow, albeit with a relaxed set of rules that will allow /25-/29 prefixes. Before we modify the filters on 1 April to disallow them, we'll contact customers we see still advertising those prefixes.

If you missed Andy's posts, the pertinent parts were:

"If you advertise prefixes from /25 to /29 to the route servers they won't be accepted"

and

"When do we plan to do this? Soon. Like this month - so if you've got issues with this then please contact us directly at peering at citylink.co.nz"

Thanks,
The CityLink Team
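The prefix-length part of the new filters can be sketched roughly as follows. This only models the /25-/29 rule described above, for IPv4; the actual route server filters, and anything else they check, are not shown. Treating /30 and longer as rejected in both modes is an assumption, as are the `accept_prefix` name and the example prefixes (from the 203.0.113.0/24 documentation range).

```python
# Sketch of the prefix-length rule only; the real route server filters are not modelled.
# Assumptions: IPv4 only, and anything longer than /29 is rejected in both modes.
import ipaddress

STRICT_MAX_LEN = 24   # from 1 April: /25-/29 rejected
RELAXED_MAX_LEN = 29  # interim relaxed rules: /25-/29 still allowed


def accept_prefix(prefix: str, relaxed: bool = False) -> bool:
    """Return True if an announced IPv4 prefix passes the length check."""
    # IPv4Network raises ValueError if host bits are set (e.g. "203.0.113.1/24").
    net = ipaddress.IPv4Network(prefix)
    max_len = RELAXED_MAX_LEN if relaxed else STRICT_MAX_LEN
    return net.prefixlen <= max_len


if __name__ == "__main__":
    for p in ("203.0.113.0/24", "203.0.113.128/25", "203.0.113.8/29"):
        print(p, "relaxed:", accept_prefix(p, relaxed=True), "strict:", accept_prefix(p))
```

Until 1 April the relaxed rules apply, so a /25 such as 203.0.113.128/25 is still accepted; after the change the same announcement is dropped, which is why anyone relying on /25-/29 announcements needs a covering /24 or shorter.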
participants (5)
- Barry Murphy
- Blair Harrison
- Dylan Hall
- Jay Daley
- John Russell