Outage update from Telecom
This is what they're telling us at the moment:

----------------------------------------------------------------------

Telecom technicians have identified the two locations where the North Island Telecom network has today been damaged and are working to repair them. It is anticipated that a patch will be in place by late this afternoon, enabling the restoration of services.

A large number of mobile, Internet and other data services have been affected. In addition, many voice calls on fixed lines are congested.

The outage has been affecting customers since late this morning. Telecom is working to repair the outage as soon as possible. Telecom regrets the inconvenience to customers and will provide more information throughout the day.

The outage, which occurred at 10.48am today, was caused by physical damage to the network on both the east and west coast sections of the network. Technicians are currently working on repairing both the eastern fibre break, which is in the Rimutaka area, and the western break in south Taranaki.

The outage is affecting calls to Telecom’s own call centres, causing overloading. Telecom is asking customers to be patient. We will provide more information as soon as possible.

For more information, please contact:

Sarah Berry
Media Relations Executive
04 498 9481
027 470 7900

Phil Love
Senior Media Relations Executive
04 498 9155
027 277 8496

Melanie Marshall
04 498 9272
027 452 6231

John Goulter
Public Affairs & Government Relations Manager
04 498 9369
We have some of our links coming back up at the moment.
So hopefully TNZ has fixed some of the problems.
Drew Collins
Group Communications Manager
Group Technology Services
APN Holdings NZ Ltd
DDI: +64 9 373 9573
Mobile +64 21 823268
Fax: +64 9 373 6411
Ph: +64 9 379 5050
eMail: drew.collins(a)apn.co.nz
Website: www.apn.co.nz
Juha Saarinen
fingers crossed
Regards
Adam Fenech
www.dreamnet.co.nz
0800 484 393
We're back too.
Anyone on Telecom PON have their links up yet?
James Butler
NewJobz
yep James William Butler wrote:
Anyone on Telecom PON have their links up yet?
POP, PON, NGN, TLA of the week... we're on it, and we're back.

BTW we were told that the bearer lines to the South Island were down, but it turns out our Welly office never lost connectivity with our Christchurch office... (though it did experience increased packet loss)
maybe you went out over microwave? what sort of latency did you experience?
WGN <-> CHC latency stayed at its normal ~9ms average round trip, jitter was possibly slightly higher than normal (peak latency around 16ms).
-----Original Message----- From: Dan Clark [mailto:dan(a)synaptic.net.nz] Sent: Monday, June 20, 2005 4:21 PM To: Craig Humphrey Cc: 'James William Butler'; NZNOG(a)list.waikato.ac.nz Subject: Re: [nznog] Outage update from Telecom
maybe you went out over microwave? what sort of latency did you experience?
So is this a good time to ask about the design of a network that concentrates so much of its core service in Auckland? Is TelstraClear's network design any better? How many regional providers were unable to talk to others in the same region because everything (?) is hauled back to Auckland?

Does having Domestic Internet Service from Telecom and TelstraClear concentrated between these two "Tier 1" providers seem like a good idea? Does a network of regional Exchanges have any value? If TelstraClear had been peering at the Exchanges, would their connectivity to Telecom's Tier 2 customers have been disrupted as much?

As Telecom customers you should demand answers about their failure, of course, but you should also be thinking about how you can have things in place to route round failures as well.
Andy, all good points (as per your norm)
From a large corp side, without our redundancy via TelstraClear today we would have lost 3 of our major newspapers. So full vendor redundancy works, it just costs more.
Drew Collins
Group Communications Manager
Group Technology Services
APN Holdings NZ Ltd
DDI: +64 9 373 9573
Mobile +64 21 823268
Fax: +64 9 373 6411
Ph: +64 9 379 5050
eMail: drew.collins(a)apn.co.nz
Website: www.apn.co.nz
Andy Linton
At 5:01 PM +1200 20/6/05, Andy Linton wrote:
So is this a good time to ask about the design of a network that concentrates so much of its core service in Auckland? Is TelstraClear's network design any better?
I'd like to think so. :) For a start, it takes more than two fibre cuts in the North Island to disrupt us (that's new since you fled Andy :) Currently the core network is distributed and I prefer to follow that mantra whenever possible, though sometimes resources are naturally 'clumpy' (is that a word?) -- Michael Newbery IP Architect TelstraClear Limited Tel: +64-4-920 3102 Mobile: +64-29-920 3102 Fax: +64-4-920 3361
Michael Newbery wrote:
I'd like to think so. :) For a start, it takes more than two fibre cuts in the North Island to disrupt us (that's new since you fled Andy :)
Good! How about three? (:-) What happens in the South Island? Crossing the Cook Strait? .....

The point I'm trying to make is that for most of us we've no idea how redundancy works in the Telecom and TelstraClear networks. Given that the "trust us, we've got five 9s reliability in the core" message from Telecom sounds a little hollow right now, it seems reasonable to ask the same question of TelstraClear and other major providers.

And the real point, as you so rightly said, is that distributed systems are the way to go. I understand about 'clumpiness' but I've never yet seen a telco who doesn't tend to excessive clumpiness and that's one of the questions I'm asking.

This isn't a poke at Telecom or TelstraClear - they are what they are. What I am asking is for those on this list to ask themselves whether they should rely on a single provider for all their services (and that's definitely "clumpiness" that the telcos encourage) and then complain when the inevitable happens. And I can't think of a better time to ask these questions!
John Goulter on Checkpoint explaining it all
http://y3m.net/files/audio/telecom.fault.interview-2005-06-20.mp3

Mark Weldon talking about the outage to NZX (second time in the last week! IIRC last one was due to a HDD failure in a firewall)
http://y3m.net/files/audio/telecom.fault.weldon-2005-06-20.mp3

I am now waiting for the questions in parliament about the outage on stuff like the following (all this is from a post made in one of the online forums so may not be right. I doubt anyone within Telecom would tell us the real story):

a. 111 Calls
b. Apparently some home detention services
c. Things like the following make the DSL outage seem minor
   Air New Zealand - flights impacted
   All South Island Airports grounded
   Airways - planes grounded
d. Banks, Stock Exchange, Credit unions affected

(List is much longer but these are a few examples)

All it took was two problems: a rat and a post digger. We don't need any terrorists 8)

As Andy asked, can we really protect from such things? Do we have the redundancy? Hypothetically, if someone in Telecom went to TelstraClear and asked if they could take over 111 calls, could they?

Could NZ afford more redundancy? Could NZ not afford to have greater redundancy?

Apart from Telecom, I doubt we have anything better. There is no phone network alternative. The only nationwide phone network that was going today was Vodafone. However that doesn't do 111, DSL or IPNet.

Is this the cost we pay for having such a big monopoly dominate telecommunications in NZ? Let's not forget that TelstraClear and many other ISPs in NZ resell Telecom (either wholesale or UBS). TCL has patches of their own network (eg Chch and Wgtn are two examples). However the last mile home for the majority of NZers is still Telecom.

lin
On Mon, 20 Jun 2005, Lin Nah wrote:
a. 111 Calls

ok I am going to have to apologise for the above. It was something I heard earlier on but I think later it came out that 111 calls were not affected (unless perhaps ppl calling it from their mobile?). There were however 75 police stations affected.
Basically info was posted to two public forums, and the link circulated 8) regards lin
I wonder what ever happened to the Post Office/Telecom Analogue/Fibre Link that ran along State Highway One... Both the Rimutaka and Taranaki Faults are miles away! LOL... It sucks to be a VoIP customer!
Lin Nah wrote:
Apart from Telecom, I doubt we have anything better. There is no phone network alternative. Only nationwide phone network that was going today was vodafone. However that doesn't do 111, DSL or IPNet.
BCL are another option and they'll give me an SLA.

--
Bill Walker, MCSE, MCP+I
Partner
Netspeed (Wanaka) Ltd
-------------------------------------------------------------
Phone: +64 21 222 4440
Fax: +64 3 443 4440
Email: bill.walker(a)netspeed.net.nz
Web: www.netspeed.net.nz
ICQ: 4746863
MSN: msn(a)wjw.co.uk
Skype: wjw.co.uk
At 5:56 PM +1200 20/6/05, Andy Linton wrote:
The point I'm trying to make is that for most of us we've no idea how redundancy works in the Telecom and TelstraClear networks. Given that the "trust us, we've got five 9s reliability in the core" message from Telecom sounds a little hollow right now, it seems reasonable to ask the same question of TelstraClear and other major providers.
And the real point as you so rightly said is that distributed systems are the way to go. I understand about 'clumpiness' but I've never yet seen a telco who doesn't tend to excessive clumpiness and that's one of the questions I'm asking.
This isn't a poke at Telecom or TelstraClear - they are what they are. What I am asking is for those on this list to ask themselves should they rely on a single provider for all their services (and that's definitely "clumpiness" that the telcos encourage) and then complain when the inevitable happens.
If you ask me for high availability, I'll ask *you* what I ask those customers and prospective customers that I get wheeled in front of:

* How much does downtime cost you?

We'll tell you how much we'll charge you for two, three, four, five, six, seven etc. 'nines'. At which point you decide if it's worth it to you---or not.

I don't want to get into an availability tutorial here, but consider

Availability    Downtime
99%             3.65 days/year
99.9%           8.76 hours/year
99.99%          52.56 minutes/year
99.999%         5.26 minutes/year
99.9999%        31.54 seconds/year

I.e., to go from 99.99% to 99.999%, just how much is 0:47:18 worth to you?

The surest way to increase availability is to duplicate infrastructure. But, since a chain is only as strong as its weakest link, unless and until you duplicate EVERY link in the chain, you don't increase availability. Thus, single points of failure are the Great Enemy, and we avoid them whenever possible. (We have elected to triplicate, or better, infrastructure in some cases, but that costs lots of money. Sometimes we can justify it though.)

If high availability is critical to you, we'll help you work out how to do that. And point out such pitfalls as places where our fibre runs down one side of a bridge, and Telecom's runs down the other (so just using an alternate provider may not help you as much as you'd thought, when the river rises...), and help you work around that (if possible).

I fear I could go on for some time...so I think I'll stop now.

--
Michael Newbery
IP Architect
TelstraClear Limited
Tel: +64-4-920 3102
Mobile: +64-29-920 3102
Fax: +64-4-920 3361
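For anyone who wants to reproduce the downtime figures in the availability table above, here is a minimal sketch. It assumes a 365-day year (which is what the quoted figures imply), and the function names are made up for illustration; they are not from any tool mentioned in this thread.

```python
# Illustrative sketch only: converts an availability percentage into
# expected downtime per year. Assumes a 365-day year, as the table does.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent):
    """Expected downtime per year, in seconds, for a given availability."""
    return (100.0 - availability_percent) / 100.0 * SECONDS_PER_YEAR


def format_duration(seconds):
    """Render a duration in the largest unit that keeps the number >= 1."""
    if seconds >= 86400:
        return f"{seconds / 86400:.2f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.2f} minutes"
    return f"{seconds:.2f} seconds"


if __name__ == "__main__":
    for availability in (99.0, 99.9, 99.99, 99.999, 99.9999):
        downtime = downtime_seconds_per_year(availability)
        print(f"{availability}%  {format_duration(downtime)}/year")

    # Downtime bought back by moving from four nines to five nines.
    saved = downtime_seconds_per_year(99.99) - downtime_seconds_per_year(99.999)
    minutes, seconds = divmod(round(saved), 60)
    print(f"99.99% -> 99.999% saves about {minutes} min {seconds} s per year")
```

The gap between four and five nines comes out at roughly 47 minutes a year, which is the number to weigh against whatever the extra 'nine' costs.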
On Tue, 2005-06-21 at 10:32 +1200, Michael Newbery wrote:
The surest way to increase availability is to duplicate infrastructure. But, since a chain is only as strong as its weakest link, unless and until you duplicate EVERY link in the chain, you don't increase availability. Thus, single points of failure, are the Great Enemy, and we avoid them whenever possible. (We have elected to triplicate, or better, infrastructure in some cases, but that costs lots of money. Sometimes we can justify it though.)
Some good points Michael, but I suspect your analysis doesn't include a visit from Mr FatFingers and friend at the Operations console. The only way downstream can protect itself from Mr FatFingers at Upstream is through institutional diversity, and no amount of network architecture within a single management entity can avoid that. And really that's a decision based on the premise that people do stupid stuff in different organisations at different times. And I know we all make mistakes.
If high availability is critical to you, we'll help you work out how to do that. And point out such pitfalls as places where our fibre runs down one side of a bridge, and Telecom's runs down the other, (so just using an alternate provider may not help you as much as you'd thought, when the river rises...), and help you work around that (if possible).
Agreed. There are going to be some immovable objects in the single point of failure shuffle. You can't eliminate them. All you can do is shift it to a lowest risk area.
I fear I could go on for some time...so I think I'll stop now.
Seems on topic to me.
On 2005-06-20, at 18:49, jamie.baddeley(a)vpc.co.nz wrote:
Some good points Michael, but I suspect your analysis doesn't include a visit from Mr FatFingers and friend at the Operations console.
This is a wildly good point. Vijay Gill did an analysis of AOL's Nx10G backbone network which he presented at a NANOG a while back, and it transpired that the overwhelming majority of customer-affecting outages were due to operator error, rather than equipment or circuit failure.
The only way downstream can protect itself from Mr FatFingers at Upstream is through institutional diversity, and no amount of network architecture within a single management entity can avoid that. And really that's a decision based on the premise that people do stupid stuff in different organisations at different times. And I know we all make mistakes.
What Jamie said. Joe
At 7:06 PM -0400 20/6/05, Joe Abley wrote:
This is a wildly good point. Vijay Gill did an analysis of AOL's Nx10G backbone network which he presented at a NANOG a while back, and it transpired that the overwhelming majority of customer-affecting outages were due to operator error, rather than equipment or circuit failure.
We had a presentation from a high availability specialist from (unnamed vendor) who showed very similar results. I was surprised at the actual numbers. The N.A. 'operator error' percentages were VERY much higher than ours, at least for what I'd call the core. I'm not trying to blow my own trumpet here. While I'd be flattered to think that we employ the best people, we are not somehow magically *that* superior to the N.A. telcos. Upon analysis, it looks like what we do differently is that we've rigorously eliminated complexity from parts of our network, (precisely to address this issue). -- Michael Newbery IP Architect TelstraClear Limited Tel: +64-4-920 3102 Mobile: +64-29-920 3102 Fax: +64-4-920 3361
On 2005-06-20, at 19:29, Michael Newbery wrote:
I'm not trying to blow my own trumpet here. While I'd be flattered to think that we employ the best people, we are not somehow magically *that* superior to the N.A. telcos.
Perhaps the North American telcos are just better at analysis :-) Joe
On Mon, 20 Jun 2005, Joe Abley wrote:
On 2005-06-20, at 18:49, jamie.baddeley(a)vpc.co.nz wrote:
Some good points Michael, but I suspect your analysis doesn't include a visit from Mr FatFingers and friend at the Operations console.
This is a wildly good point. Vijay Gill did an analysis of AOL's Nx10G backbone network which he presented at a NANOG a while back, and it transpired that the overwhelming majority of customer-affecting outages were due to operator error, rather than equipment or circuit failure.
I seem to remember somebody (I completely forget who) touting one advantage of their new network being that the core configuration was so simple and straightforward that it would (almost) never need to be changed. Thus nobody would ever log in and potentially break it [1].

Possibly the fact that networking *does* involve actual physical equipment plugging into physical ports makes it harder to automate and drive everything from a database like other areas.

[1] - On the other hand, when you do have a problem it takes longer to fix, since people aren't used to working with the system. See the letter in this week's Computerworld.

--
Simon J. Lyall. | Very Busy | Mail: simon(a)darkmere.gen.nz
"To stay awake all night adds a day to your life" - Stilgar | eMT.
At 10:49 AM +1200 21/6/05, jamie.baddeley(a)vpc.co.nz wrote:
Some good points Michael, but I suspect your analysis doesn't include a visit from Mr FatFingers and friend at the Operations console.
Actually, it does. I don't say that I have all the answers to layer 8 problems though :)
The only way downstream can protect itself from Mr FatFingers at Upstream is through institutional diversity, and no amount of network architecture within a single management entity can avoid that. And really that's a decision based on the premise that people do stupid stuff in different organisations at different times. And I know we all make mistakes.
Yes, but when you do a risk analysis, you need to:

1. Do the risk analysis (obvious? Well, most people don't do it).
2. Identify the risks
3. *Quantify* the risks

If you have insulated yourself from Clotus Ineptus at Upstream, through institutional diversity, have you also insulated yourself from the bumbling building manager who is about to throw the wrong breaker in the basement---or your own PhatFingerz sysadmin?

I'm not necessarily arguing against institutional diversity---it may well be appropriate--just that institutional diversity, in itself, is not a silver bullet.

--
Michael Newbery
IP Architect
TelstraClear Limited
Tel: +64-4-920 3102
Mobile: +64-29-920 3102
Fax: +64-4-920 3361
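The "quantify the risks" step above can be sketched with the usual series/parallel availability arithmetic. This is a rough illustration only: it assumes failures are independent (optimistic, given the bridge example earlier in the thread), and every availability figure and name below is hypothetical, not a measurement from any provider.

```python
# Illustrative sketch only. Assumes independent failures; all figures
# below are hypothetical placeholders, not real provider numbers.

def series(*availabilities):
    """Every component must be up: multiply the availabilities."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """At least one component must be up: 1 minus product of failure odds."""
    all_fail = 1.0
    for a in availabilities:
        all_fail *= (1.0 - a)
    return 1.0 - all_fail


# Hypothetical inputs for one site.
PROVIDER_A = 0.999       # upstream provider A
PROVIDER_B = 0.999       # upstream provider B (institutional diversity)
BUILDING_POWER = 0.9995  # the breaker in the basement
LOCAL_ADMIN = 0.999      # your own sysadmin's fat fingers

single_homed = series(PROVIDER_A, BUILDING_POWER, LOCAL_ADMIN)
dual_homed = series(parallel(PROVIDER_A, PROVIDER_B), BUILDING_POWER, LOCAL_ADMIN)

if __name__ == "__main__":
    print(f"single provider: {single_homed:.6f}")
    print(f"two providers:   {dual_homed:.6f}")
    print(f"ceiling from shared links: {min(BUILDING_POWER, LOCAL_ADMIN):.6f}")
```

With these made-up inputs a second provider does help, but the result stays below the weakest shared component, which is the "not a silver bullet" point in numbers.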
participants (15)
- Andy Linton
- Bill Walker
- Craig Humphrey
- Dan Clark
- Dream Net Internet
- Drew.Collins@apn.co.nz
- James William Butler
- jamie.baddeley@vpc.co.nz
- Jeremy Brooking
- Joe Abley
- Juha Saarinen
- Lin Nah
- Michael Newbery
- Russell Sharpe
- Simon Lyall