Re: [nznog] Outage update from Telecom

20 Jun 2005

      On Tue, 2005-06-21 at 10:32 +1200, Michael Newbery wrote:
...
At 5:56 PM +1200 20/6/05, Andy Linton wrote:
...
The point I'm trying to make is that for most of us we've no idea how
redundancy works in the Telecom and TelstraClear networks. Given that the
"trust us, we've got five 9s reliability in the core" message from Telecom
sounds a little hollow right now, it seems reasonable to ask the same question
of TelstraClear and other major providers.
And the real point as you so rightly said is that distributed systems are the
way to go. I understand about 'clumpiness' but I've never yet seen a telco who
doesn't tend to excessive clumpiness and that's one of the questions 
I'm asking.
...
* How much does downtime cost you?
We'll tell you how much we'll charge you for two, three, four, five, 
six, seven etc. 'nines'. At which point you decide if it's worth 
it to you---or not.
I don't want to get into an availability tutorial here, but consider
Availability	Downtime
99%		3.65	days/year
99.9%		8.76	hours/year
99.99%		52.56	minutes/year
99.999%		5.26	minutes/year
99.9999%	31.54	seconds/year
I.e., to go from 99.99% to 99.999%, just how much is 0:47:18 a year worth to you?
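
The figures in the table follow from (1 - availability) multiplied by the length of a year. A minimal sketch of that arithmetic, assuming a 365-day year (the assumption that reproduces the table); `downtime_seconds` is an illustrative name, not something from the thread.

```python
# Downtime implied by an availability figure.
# Assumption: a 365-day year, which matches the table above.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds(availability_percent):
    """Seconds per year a service may be down at the given availability."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


for pct in (99, 99.9, 99.99, 99.999, 99.9999):
    secs = downtime_seconds(pct)
    print(f"{pct}%  {secs / 60:.2f} minutes/year")

# Extra downtime allowed at four nines compared with five nines, in minutes.
gap_minutes = (downtime_seconds(99.99) - downtime_seconds(99.999)) / 60
print(f"four nines vs five nines: {gap_minutes:.2f} minutes/year")
```

Each extra nine divides the permitted downtime by ten, so the gap being bought shrinks while the cost of the duplication needed to get there generally does not.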
The surest way to increase availability is to duplicate 
infrastructure. But, since a chain is only as strong as its weakest 
link, unless and until you duplicate EVERY link in the chain, you 
don't increase availability.
Thus, single points of failure are the Great Enemy, and we avoid 
them whenever possible. (We have elected to triplicate, or better, 
infrastructure in some cases, but that costs lots of money. Sometimes 
we can justify it though.)
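
The weakest-link argument can be put in numbers with the textbook series/parallel availability model. This is only a sketch: it assumes every link fails independently, which is precisely what shared bridges and shared operators undermine, and the 99.9% per-link figure and three-link chain are made up for illustration.

```python
from math import prod


def series(availabilities):
    """All links must work, so availabilities multiply."""
    return prod(availabilities)


def parallel(availabilities):
    """Any one redundant link suffices, so unavailabilities multiply.

    Assumption: the redundant links fail independently of each other.
    """
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical chain of three links, each 99.9% available.
chain = [0.999, 0.999, 0.999]
plain = series(chain)

# Duplicate only the first link; the two single links still cap the result.
partial = series([parallel([0.999, 0.999]), 0.999, 0.999])

# Duplicate every link in the chain.
full = series([parallel([a, a]) for a in chain])

for label, value in (("no duplication", plain),
                     ("first link only", partial),
                     ("every link", full)):
    print(f"{label:16s} {value:.6f}")
```

Under these assumptions, duplicating one link out of three barely moves the end-to-end figure, because the remaining single links dominate; only duplicating every link gets the chain past the availability of a single link.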
Some good points Michael, but I suspect your analysis of failure
doesn't include a visit from Mr FatFingers and friend at the
Operations console.

The only way downstream can protect itself from Mr FatFingers at
Upstream is through institutional diversity, and no amount of network
architecture within a single management entity can avoid that. And
really that's a decision based on the premise that people do stupid
stuff in different organisations at different times. And I know we all
make mistakes.
...
If high availability is critical to you, we'll help you work out how 
to do that. And point out such pitfalls as places where our fibre 
runs down one side of a bridge, and Telecom's runs down the other, 
(so just using an alternate provider may not help you as much as 
you'd thought, when the river rises...), and help you work around 
that (if possible).
Agreed. There are going to be some immovable objects in the single point
of failure shuffle. You can't eliminate them. All you can do is shift them
to the lowest-risk area.
...
I fear I could go on for some time...so I think I'll stop now.
Seems on topic to me.

jamie.baddeley＠vpc.co.nz