On Tue, 2005-06-21 at 10:32 +1200, Michael Newbery wrote:
At 5:56 PM +1200 20/6/05, Andy Linton wrote:
The point I'm trying to make is that most of us have no idea how redundancy works in the Telecom and TelstraClear networks. Given that the "trust us, we've got five 9s reliability in the core" message from Telecom sounds a little hollow right now, it seems reasonable to ask the same question of TelstraClear and other major providers.
And the real point as you so rightly said is that distributed systems are the way to go. I understand about 'clumpiness' but I've never yet seen a telco who doesn't tend to excessive clumpiness and that's one of the questions I'm asking.
* How much does downtime cost you?
We'll tell you how much we'll charge you for two, three, four, five, six, seven etc 'nines'. At which point you decide if it's worth it to you---or not.
I don't want to get into an availability tutorial here, but consider

Availability   Downtime
99%            3.65 days/year
99.9%          8.76 hours/year
99.99%         52.56 minutes/year
99.999%        5.26 minutes/year
99.9999%       31.54 seconds/year
I.e., to go from 99.99% to 99.999%, just how much is roughly 47 minutes a year worth to you?
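
For anyone who wants to check the arithmetic, here is a minimal sketch of how the downtime figures fall out of the availability percentages. It assumes a 365-day year; the function name `downtime_seconds_per_year` is made up for illustration.

```python
# Assumption: a non-leap year of 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_seconds_per_year(availability_percent):
    """Expected downtime per year, in seconds, for a given availability."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99, 99.999, 99.9999):
        minutes = downtime_seconds_per_year(pct) / 60
        print(f"{pct}%: {minutes:.2f} minutes/year")

    # What the fifth nine buys you over the fourth, in minutes per year.
    gap = downtime_seconds_per_year(99.99) - downtime_seconds_per_year(99.999)
    print(f"four nines -> five nines: {gap / 60:.1f} minutes/year")
```

The gap between four and five nines comes out at a bit over 47 minutes a year, which is the number to weigh against the price difference.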
The surest way to increase availability is to duplicate infrastructure. But, since a chain is only as strong as its weakest link, unless and until you duplicate EVERY link in the chain, you don't increase availability by much. Thus single points of failure are the Great Enemy, and we avoid them whenever possible. (We have elected to triplicate, or better, infrastructure in some cases, but that costs lots of money. Sometimes we can justify it though.)
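
A rough way to see the weakest-link effect is the textbook series/parallel availability calculation. This is only a sketch: it assumes failures are independent, which real networks often violate (shared ducts, shared operators), and the three-link chain and the 99.9% figure per link are invented for illustration.

```python
from math import prod


def series(*availabilities):
    """Every link must be up: availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities):
    """Any one copy being up is enough (assumes independent failures)."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    link = 0.999  # hypothetical availability of a single link

    plain = series(link, link, link)
    # Duplicate only the first link; the other two stay single.
    partial = series(parallel(link, link), link, link)
    # Duplicate every link in the chain.
    full = series(*(parallel(link, link) for _ in range(3)))

    print(f"no duplication:      {plain:.6f}")
    print(f"one link duplicated: {partial:.6f}")
    print(f"all links duplicated: {full:.6f}")
```

Duplicating one link barely moves the end-to-end figure, because the remaining single links still dominate; only when every link is duplicated does the chain as a whole gain extra nines.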
Some good points Michael, but I suspect your analysis of failure doesn't include a visit from Mr FatFingers and friend at the Operations console. The only way downstream can protect itself from Mr FatFingers at Upstream is through institutional diversity, and no amount of network architecture within a single management entity can substitute for that. And really that's a decision based on the premise that people do stupid stuff in different organisations at different times. And I know we all make mistakes.
If high availability is critical to you, we'll help you work out how to do that. And point out such pitfalls as places where our fibre runs down one side of a bridge and Telecom's runs down the other (so just using an alternate provider may not help you as much as you'd thought, when the river rises...), and help you work around that (if possible).
Agreed. There are going to be some immovable objects in the single point of failure shuffle. You can't eliminate them. All you can do is shift them to the lowest-risk area.
I fear I could go on for some time...so I think I'll stop now.
Seems on topic to me.