Umm anyone at woosh know why at random times this week, packets are now no longer going through APE?

 2  lns2.sky.woosh.net.nz (202.74.206.8)  243.850 ms  244.428 ms  244.435 ms
 3  * * *
 4  lnk-ww.atm-12-3-110.u12.brh.telstraclear.net (203.98.23.158)  244.511 ms  245.012 ms  245.278 ms
 5  atm-12-3-110.u12.brh.telstraclear.net (203.98.23.157)  319.527 ms  319.764 ms  319.990 ms
 6  ggis-gige-v906.telstraclear.net (203.98.18.67)  245.413 ms  242.277 ms  242.259 ms
 7  g0-1-0-4.akcr8.global-gateway.net.nz (210.55.202.49)  242.251 ms  433.205 ms  433.255 ms
 8  g0-1-0.tkcr3.global-gateway.net.nz (210.55.202.50)  433.652 ms  434.040 ms  434.509 ms
 9  203.96.117.142 (203.96.117.142)  372.288 ms  372.789 ms  372.506 ms
10  60.234.9.8 (60.234.9.8)  435.538 ms  435.605 ms  380.208 ms

Is woosh de-peering, or is this just a redundant link? I must admit, there is over 90% packet loss before it switches over to this route.
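For checking a trace like this one mechanically, a minimal Python sketch might look like the following. It assumes classic traceroute output with one hop per line; the names `parse_hops` and `transit_domains` and the domain-suffix check are illustrative only. It can only report which carriers' hostnames appear in the path, not whether any hop is actually on APE.

```python
import re

# The hops from the trace in this message, one per line.
TRACE = """\
 2 lns2.sky.woosh.net.nz (202.74.206.8) 243.850 ms 244.428 ms 244.435 ms
 3 * * *
 4 lnk-ww.atm-12-3-110.u12.brh.telstraclear.net (203.98.23.158) 244.511 ms 245.012 ms 245.278 ms
 5 atm-12-3-110.u12.brh.telstraclear.net (203.98.23.157) 319.527 ms 319.764 ms 319.990 ms
 6 ggis-gige-v906.telstraclear.net (203.98.18.67) 245.413 ms 242.277 ms 242.259 ms
 7 g0-1-0-4.akcr8.global-gateway.net.nz (210.55.202.49) 242.251 ms 433.205 ms 433.255 ms
 8 g0-1-0.tkcr3.global-gateway.net.nz (210.55.202.50) 433.652 ms 434.040 ms 434.509 ms
 9 203.96.117.142 (203.96.117.142) 372.288 ms 372.789 ms 372.506 ms
10 60.234.9.8 (60.234.9.8) 435.538 ms 435.605 ms 380.208 ms
"""

# hop number, hostname, IPv4 address in parentheses, then the timing fields
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)\s+(.*)$")
RTT_RE = re.compile(r"(\d+(?:\.\d+)?) ms")


def parse_hops(text):
    """Return one dict per hop that answered; '* * *' hops are skipped."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue
        rtts = [float(x) for x in RTT_RE.findall(m.group(4))]
        hops.append({"hop": int(m.group(1)), "host": m.group(2),
                     "ip": m.group(3), "rtts": rtts})
    return hops


def transit_domains(hops, suffixes=("telstraclear.net", "global-gateway.net.nz")):
    """Which of the given domain suffixes (an assumed list) appear in the path."""
    return sorted({s for h in hops for s in suffixes if h["host"].endswith(s)})


if __name__ == "__main__":
    hops = parse_hops(TRACE)
    print(transit_domains(hops))
```

If the returned list is non-empty, the path went via those carriers rather than staying on a direct exchange path, which is the symptom being asked about.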
Just noticed a connection from Orcon to iconz failed as well...

Did not get a traceroute on that one sorry.. But fibre to orcon, and I assume fibre / APE connection to iconz.

Outage happened on both woosh and orcon at the same time :)

On Wed, 2008-06-11 at 22:19 +1200, Chris Hodgetts wrote:
Umm anyone at woosh know why at random times this week, packets are now no longer going through APE?
_______________________________________________ NZNOG mailing list NZNOG(a)list.waikato.ac.nz http://list.waikato.ac.nz/mailman/listinfo/nznog
Anyone from citylink (ape) alive at the moment? :)
On Wed, 11 Jun 2008 22:22:57 +1200, Chris Hodgetts wrote:
Just noticed a connection from Orcon to iconz failed as well...
If they weren't, they will be now.
I'm seeing much the same, high loss across the APE.
2008/6/11 Barry Murphy
Anyone from citylink (ape) alive at the moment? :)
From what we've seen so far, it looks like APE exploded - packets coming out of everywhere gunging up the tubes.
All fixed now, I imagine we'll hear from Citylink shortly.
Erin Salmon
From: Neil Fenemor [mailto:neil(a)underground.geek.nz]
Sent: Wednesday, 11 June 2008 10:28 p.m.
To: Barry Murphy
Cc: NZNOG
Subject: Re: [nznog] Issues with Woosh (Or perhaps APE).
If they weren't, they will be now.
I'm seeing much the same, high loss across the APE.
Hi all

On Wed, Jun 11, 2008 at 10:50:08PM +1200, Erin Salmon said:
From what we've seen so far, it looks like APE exploded - packets coming out of everywhere gunging up the tubes.
All fixed now, I imagine we'll hear from Citylink shortly.
Or not so shortly - sorry for the delay getting back to everybody.

Looks like somebody on APE managed to loop the APE fabric back on to us. We saw lots of this in the logs:

    Jun 11 22:31:14: %SW_MATM-4-MACFLAP_NOTIF: Host 00xx.xxxx.xxxx in vlan y is flapping between port Po2 and port Po1

where lots is, well, lots:

    # sudo zgrep flapping cisco.log.4.gz | wc -l
    15713

Of those, the switch showing the most flapping activity logged 5924 messages, of which 5908 mention g0/40. We had a word in the ear of the ISP attached to g0/40; they did admit that they were making changes earlier that evening, and they do have a couple of connections to APE.

This isn't actually a particularly uncommon occurrence - many of the ISPs that have multiple connections to APE have managed to achieve this over the years, so I don't particularly want to hang these particular guys out to dry, since they're by no means the only folks to make this mistake. What I do observe, though, is that it gets more annoying every time it happens, and I'd like it to stop.

Essentially, we have two ways of making this stop. We can rely on spanning-tree, and hope it spots the loop and blocks somewhere, or we can impose MAC filters so that even if there is a loop, only the ISP's approved devices can appear on that port.

Relying on spanning tree clearly doesn't work - it gets filtered, there are incompatibilities between different implementations, and lots of people don't understand it.

MAC filtering works extremely effectively, but has been a pain to administer - discovering the MAC filters in place is a constant surprise to exchange users, and it requires that exchange users have to interact with Citylink when they'd rather not do so (usually at 3am, when they're stressed, and we're sleepy).

So up until now MAC filtering has been applied haphazardly, because it caused such an increase in workload for both exchange participants and Citylink.
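The log counting described earlier in this message (total flapping messages, then how many of them mention a given port) can be sketched in Python roughly as below. This is only an illustration, not the tooling Citylink used: the function names are made up, the regular expression is modelled on the single log line quoted above, and the sample lines other than that one are hypothetical (including the `Gi0/40` port spelling).

```python
import gzip
import re
from collections import Counter

# Pattern modelled on the one MACFLAP line quoted in this message.
FLAP_RE = re.compile(
    r"%SW_MATM-4-MACFLAP_NOTIF: Host (\S+) in vlan (\S+) "
    r"is flapping between port (\S+) and port (\S+)"
)

# First line is the redacted example from the message; the other two are
# hypothetical lines added purely so the sketch has something to count.
SAMPLE = [
    "Jun 11 22:31:14: %SW_MATM-4-MACFLAP_NOTIF: Host 00xx.xxxx.xxxx in vlan y "
    "is flapping between port Po2 and port Po1",
    "Jun 11 22:31:15: %SW_MATM-4-MACFLAP_NOTIF: Host 00xx.xxxx.xxxx in vlan y "
    "is flapping between port Gi0/40 and port Po1",
    "Jun 11 22:31:16: some unrelated log line",
]


def count_flaps(lines):
    """Return (number of flap messages, Counter of messages mentioning each port)."""
    total = 0
    ports = Counter()
    for line in lines:
        m = FLAP_RE.search(line)
        if not m:
            continue
        total += 1
        # A set, so a message counts once per port even if both fields match.
        ports.update({m.group(3), m.group(4)})
    return total, ports


def count_flaps_in_gzip(path):
    """Same count over a gzip-compressed log file, similar in spirit to zgrep | wc -l."""
    with gzip.open(path, "rt", errors="replace") as fh:
        return count_flaps(fh)


if __name__ == "__main__":
    total, ports = count_flaps(SAMPLE)
    print(total, ports.most_common())
```

Sorting the per-port counter is what points at a single port the way "5908 of 5924 mention g0/40" does; a port that appears in nearly every flap message on a switch is the likely source of the loop.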
Happily, there is now a third way - since completing our 10GE upgrades a few weeks back, we now have mainly Cisco 3750s and 2960s in the core of the exchange, which means that we're now running sufficiently recent versions of IOS that we can support secure static MAC aging. Essentially, that means we can lock each exchange port to a fixed number of MACs (normally one), and if that MAC is idle for longer than a few minutes, or if the physical link on that port drops, then that MAC gets timed out and a new one can take its place.

No muss, no fuss - when you want to attach a new router, you unplug the old one and plug in the new one. If you can organise dropping the physical link you'll be working straight away; otherwise, it'll be five minutes before your new machine starts working.

We've been testing aging MAC limits in Wellington, and have found them largely problem free(*), so we've started installing them on ports on APE - initially on ports newly going into service, but with last week's performance, we're going to place them on every port within the next week.

If you have any issues or concerns with this, please get in touch.

Cheers
Simon

(*) The issue that I can see is that if an ISP has a device attached that is chatty (e.g. a layer 3 switch), then there's a chance that it will win the race to be the approved MAC for that port. There isn't much we can do about that, other than setting the MAC limit higher - the best approach would be to shut up the chatty switch, either by removing it, or configuring it quiet.
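To make the aging behaviour described above concrete, here is a small Python model of a single exchange port. It is a toy simulation under stated assumptions, not switch configuration and not how IOS implements it: the class name `AgingPort`, the 300-second idle timeout (taken from the "five minutes" above) and the rule that dropped frames do not refresh anything are all assumptions made for illustration.

```python
class AgingPort:
    """Toy model of a port limited to a few MACs that age out when idle."""

    def __init__(self, max_macs=1, idle_timeout=300.0):
        self.max_macs = max_macs          # "normally one" per the message
        self.idle_timeout = idle_timeout  # seconds; assumed 5 minutes
        self._last_seen = {}              # MAC -> time of last accepted frame

    def _expire(self, now):
        # Forget any MAC that has been idle for longer than the timeout.
        for mac, seen in list(self._last_seen.items()):
            if now - seen > self.idle_timeout:
                del self._last_seen[mac]

    def link_down(self):
        # Dropping the physical link clears the learned MACs immediately.
        self._last_seen.clear()

    def frame(self, mac, now):
        """Return True if a frame from `mac` at time `now` is accepted."""
        self._expire(now)
        if mac in self._last_seen or len(self._last_seen) < self.max_macs:
            self._last_seen[mac] = now
            return True
        return False  # over the limit: frame dropped, nothing refreshed


if __name__ == "__main__":
    port = AgingPort()
    print(port.frame("old-router", now=0))    # first MAC seen is accepted
    print(port.frame("new-router", now=10))   # second MAC is refused
    print(port.frame("new-router", now=400))  # old MAC has aged out by now
```

The same model shows the footnoted race: after `link_down()`, whichever device sends a frame first takes the slot, so a chatty layer 3 switch can beat the router unless `max_macs` is raised.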
Simon Blake wrote:
MAC filtering works extremely effectively, but has been a pain to administer - discovering the MAC filters in place is a constant surprise to exchange users, and it requires that exchange users have to interact with Citylink when they'd rather not do so (usually at 3am, when they're stressed, and we're sleepy).
So up until now MAC filtering has been applied haphazardly, because it caused such an increase in workload for both exchange participants and Citylink.
I actually don't see this as a problem. APE and WIX could be considered as critical infrastructure within the New Zealand Internet. As such I think that a certain level of administration overhead can be assumed to be necessary.

If what you are saying is that the continued smooth running of APE and WIX is at risk by relying on spanning tree, and that MAC filters would mitigate this risk in the short term, then the increased administration overhead is possibly justified.

I don't think having to work around MAC filters is a significant barrier to entry for anyone who is serious about network service continuity.

Dean
On 16 Jun 2008, at 22:42, Dean Pemberton wrote:
I don't think having to work around MAC filters is a significant barrier to entry for anyone who is serious about network service continuity.
... and as I mentioned in private mail, it's nothing that can't be easily automated so that port filters can be inspected/modified directly by exchange-point members without having to wait for business-hours warm-bodies to deal with tickets. Other exchange points do this.

Joe
On 16 Jun 2008, at 22:53, Joe Abley wrote:
... and as I mentioned in private mail, it's nothing that can't be easily automated so that port filters can be inspected/modified directly by exchange-point members without having to wait for business-hours warm-bodies to deal with tickets. Other exchange points do this.
... but as I should have mentioned before I hit send, Simon's "secure static MAC aging" words also sounded perfectly plausible. My words in the message above quite possibly look like I'm mistakenly trying to give instructions to Citylink on how to run their switch, which is certainly not the case :-)

Joe
Just confirmed from a colo box in Orcon (with BGP) that all routes to APE are being lost every now and then; traffic then passes over a national route (Telstra or Telecom).

On Xnet I'm seeing the same issue, and Chris is seeing it with woosh and iconz.

Seems APE is broken at the moment :/
B
On Wed, 11 Jun 2008 22:22:57 +1200, Chris Hodgetts wrote:
Just noticed a connection from Orcon to iconz failed as well...
I'm getting the same with world exchange (xnet) at the moment. 80% packet
loss to ape routes, however it seems to be switching to global-gateway
after a while then all is fine.
Not sure if a tech is working on the fault or if this is just happening on
its own.
B
On Wed, 11 Jun 2008 22:19:54 +1200, Chris Hodgetts wrote:
Umm anyone at woosh know why at random times this week, packets are now no longer going through APE?
participants (7)
- Barry Murphy
- Chris Hodgetts
- Dean Pemberton
- Erin Salmon
- Joe Abley
- Neil Fenemor
- Simon Blake