*scratches his head* Core switch....lost power....don't people usually UPS things like that?
If it is on UPS, then most likely someone bumped the cable and suddenly things stop working...an hour after the first fault?

Someone's NOC team need to find the software to monitor when their equipment goes down *cough*nagios-for-example*cough*
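
For what it's worth, the basic idea is not hard. Below is a rough Python sketch of an up/down poller, nowhere near what Nagios actually does and not its API. The hostnames, the port, and the function names `tcp_probe` and `poll` are all made up for illustration. It only reports a device when its state changes, so a second fault should not get lost behind an earlier one.

```python
import socket
from typing import Callable, Dict, Iterable, List, Tuple

Target = Tuple[str, int]  # (hostname, TCP port); hypothetical devices


def tcp_probe(target: Target, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection(target, timeout=timeout):
            return True
    except OSError:
        return False


def poll(
    targets: Iterable[Target],
    last_state: Dict[Target, bool],
    probe: Callable[[Target], bool] = tcp_probe,
) -> List[Tuple[Target, bool]]:
    """Probe each target once and return the ones whose state changed.

    last_state is updated in place. A target never seen before is
    assumed to have been up, so its first failure is reported.
    """
    changes = []
    for target in targets:
        up = probe(target)
        if last_state.get(target, True) != up:
            changes.append((target, up))
        last_state[target] = up
    return changes


if __name__ == "__main__":
    # Hypothetical device list; replace with real management addresses.
    devices = [("core-sw1.example.net", 22), ("edge1.example.net", 22)]
    state: Dict[Target, bool] = {}
    for device, is_up in poll(devices, state):
        print(device, "UP" if is_up else "DOWN")
```

Run it from cron every minute or so, keep `state` somewhere persistent, and page a human on every DOWN line. A real monitoring system adds retries, dependencies and escalation on top of this.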

Philip Seccombe




-----Original Message-----
From: Chris Hodgetts [mailto:chris@archnetnz.com]
Sent: Wed 7/2/2008 9:38 AM
To: Bill Walker
Cc: NZNOG
Subject: Re: [nznog] Vector, did you try turning it off and then on   again

(I hope I am not breaching any confidentiality here, but nothing on the
paperwork that flashed past my desk said it was confidential.)

--


At 12:33 PM on Sunday 29th June 2008 a Cisco core Ethernet switch, located
at our Hobson Street node, lost power and rebooted. Upon rebooting the
switch lost its configuration and raised an alarm.


This alarm was mistakenly interpreted as being a repeat of an earlier
SDH alarm (initially received at 11:27 AM) and as a result no action was
taken by our NOC until approximately 5:30 PM.


Customer calls received by our customer call centre were also
incorrectly assumed to be related to the above SDH alarms (based on
advice from our NOC) and customers were incorrectly advised accordingly.


At approximately 5:30 PM our NOC became aware of the true nature of the
outage. At 5:46 PM an engineer was dispatched to reload the
configuration on the Cisco switch. This was completed and all services
were restored by 9:26 PM.

--
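
To put numbers on that timeline, here is a quick Python sketch that works out the gaps from the times quoted above. The helper names `at` and `hm` are my own, and the 5:30 PM figure is only "approximately" per the report, so treat the first gap as approximate too.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %I:%M %p"


def at(clock: str) -> datetime:
    """Parse a 12-hour clock time on the day of the outage (29 June 2008)."""
    return datetime.strptime(f"2008-06-29 {clock}", FMT)


def hm(delta: timedelta) -> str:
    """Format a timedelta as hours and minutes."""
    minutes = int(delta.total_seconds()) // 60
    return f"{minutes // 60}h {minutes % 60:02d}m"


# Times as stated in the report quoted above.
switch_reboot = at("12:33 PM")
noc_aware = at("5:30 PM")      # approximate, per the report
dispatched = at("5:46 PM")
restored = at("9:26 PM")

unnoticed = hm(noc_aware - switch_reboot)   # fault to NOC awareness
repair = hm(restored - dispatched)          # dispatch to restoration
total = hm(restored - switch_reboot)        # whole outage

print("unnoticed:", unnoticed)
print("repair:   ", repair)
print("total:    ", total)
```

In other words, more of the outage was spent not noticing the fault than fixing it.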

I do not like how telcos think they know better than their customers...

(Thanks go to the on-call tech at Citylink I spoke to on Sunday. By his
account many others called him as well, and he ACTUALLY checked whether
the APE was alive, because customers and non-customers alike were
querying the APE's status, as it was connectivity to the APE that was
impaired.)

Vector was arrogant. One would assume that when customers start to call
the call centre and say, hey, there is stuff broken, then even though an
alarm had been raised previously, SOMEONE would have gone... OMG, we
suddenly have a large increase in calls here with the same issue.....
perhaps something is wrong..... but no... Telco knows best.
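
Even a crude check would do. A minimal sketch in Python, assuming the call centre can count fault calls per time window: the function name `is_call_spike`, the factor of 3 and the floor of 5 calls are all numbers I made up, not anything Vector uses.

```python
from statistics import mean
from typing import Sequence


def is_call_spike(
    baseline_counts: Sequence[int],
    current_count: int,
    factor: float = 3.0,
    min_calls: int = 5,
) -> bool:
    """Return True when the current window's call count is well above normal.

    baseline_counts holds call counts from comparable earlier windows.
    The threshold is factor times the baseline mean, but never below
    min_calls, so a quiet Sunday does not alert on two or three calls.
    """
    if not baseline_counts:
        return current_count >= min_calls
    threshold = max(min_calls, factor * mean(baseline_counts))
    return current_count >= threshold


if __name__ == "__main__":
    # Hypothetical figures: a normal window sees about two fault calls.
    print(is_call_spike([2, 3, 1, 2], 4))
    print(is_call_spike([2, 3, 1, 2], 12))
```

When this returns True while an older alarm is still open, that is the cue to check whether the calls really match the existing fault.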



On Wed, 2008-07-02 at 07:13 +1200, Bill Walker wrote:
> That's the best reply so far.
>
> We have just proposed triple redundancy to a customer: fibre, wireless and satellite. The ongoing costs aren't too bad; it's the setup costs that are a killer. 10 Mbps satellite gear isn't cheap. If you've got the money, it's the way to go.
>
> ....
> Bill Walker
> Sent from my phone.
>
> -----Original Message-----
> From: "John Russell" <jsr@jsr.com>
> To: "NZNOG" <nznog@list.waikato.ac.nz>
> Sent: 1/07/2008 11:41 p.m.
> Subject: Re: [nznog] Vector, did you try turning it off and then on   again
>
>
> On [Various Times], [Many People Wrote] wrote:
>
>  > [Words]
>
> A 9 hour outage seems long, but it's hardly absurd. Though we haven't
> seen a cause report from Vector yet, if it was spade fade, 9 hours is
> pretty good.
>
> I wasn't on call when it broke, but my colleague who was tells me he
> was told "Hardware Failure" was the problem. If something like a  6509
> plane died, I can see how you might hit 9 hrs:
>
> The thing faults, alarms go off, customers call, the on-duty tech at
> Vector has to do some basic diagnosis, wave his hands in the air and
> run around like a monkey for a bit, call the on-call engineer.
> Then that engineer has to wake up and/or finish his beer and get home
> from the pub, then do some more diagnostics, see the dead module and
> then do that thing where you put your hands flat on your forehead and
> drag them down your face while going "Gahhhhh".  Then he gets to go
> pull a spare from stores, or call Cisco for a part and then THEY get
> to kick off THEIR internal process to get the thing to you. Then
> there's truck time to the site, waiting for an elevator to L48 of the
> sky tower that isn't already full of tourists in orange jumpsuits
> ready to jump off and/or carts full of crab canapes, swapping out the
> dead unit, sanity checking the restored services and making config
> changes if required, etc. And all that is just if everything does go
> to plan, and you don't find out that your spare hardware is in the
> lab, and the lab is locked and the guy who has the key has gone
> fishing, or that Cisco have already given their only spare WS-X6516 to
> someone else, and so on and so on.
>
> So, anyway, thing is, 9hrs, sure.
>
> However it really shouldn't matter that much. The ISP network I am
> currently fussing over, for example, has Vector connectivity to the
> Sky Tower, which carries APE and some other stuff. This all broke when
> Vector went down. Domestic traffic, however, simply switched over to
> other peering links, as it should because it is, you know, The
> Internet.  Some Vector-only stuff broke, of course, but core services
> just failed over and carried on.
>
> If your network connection is absolutely critical for your business,
> and it's wholly dependent on one vendor, you should perhaps rethink
> your approach. Talk to your ISP, explain that you need redundancy in
> your connection. Chuck in something like a DSL connection beside that
> Vector link. Ask your ISP about sourcing you a router that can connect
> to both, and setting up BGP or MPLS-based failover to your secondary
> link in the event that your Vector link fails. Now your connection is
> vendor independent (if not ISP-independent - if they fail, you fail.
> Try to pick an ISP that Does Not Fail Much). If you can't get DSL, ask
> your ISP about Wireless or IPStar or something similar. If you can
> fail from optical to satellite, that's pretty good diversity. And it's
> not complicated to do. You may not get the same performance, of
> course, but you'll have _a_ connection, which is better than _no_
> connection, especially if it's only for a short time.
>
> This setup won't, of course, solve the problem of a local power cut
> killing your Vector link. I'm not even certain why we're discussing
> that. Talk to Vector and your electrician to get a cable run from your
> generator and/or UPS-backed distribution board to wherever your
> building Vector switch is. Plug the switch into it. And you're done.
> It's a one-off cost, and likely not a large one. Even without a UPS,
> the worst that can happen is that the switch powers down when the cut
> takes place, then boots back up when your generator starts. A few
> minutes, tops. If you _don't have_ a generator, and your building
> power is out, I guess you'll be sitting in the dark, looking at your
> blank screen, and won't care if your internet connection is down. This
> is a perfect opportunity to go to the pub, and have a drink with the
> Vector engineer. Assuming he's awake.
>
> JSR
> --
> John S Russell
> Big Geek. Doing Geek Stuff.
>
> _______________________________________________
> NZNOG mailing list
> NZNOG@list.waikato.ac.nz
> http://list.waikato.ac.nz/mailman/listinfo/nznog
>
