This is the formal reply from TCL to an email I sent them two days ago.
Last Sunday morning between 0100-0600 during the regular planned maintenance window we upgraded 90% of the IP Networks' access switches to a new version of code.
The main reason for this upgrade was to introduce software into the core switches that would support 'hitless upgrades' upon the next release of processor cards. This feature added resilience to the IP Network in that we could upgrade software on the core switches and customers would not notice any outage.
During the course of Sunday day/night there were a number of customers who appeared to have unrelated issues and engineers worked through the evening to resolve these.
On Monday morning, when most customers started work, other network faults were reported. Along with some unrelated issues, it also appeared that there were issues with regard to the software upgrade on Sunday morning. About 12% of IP customers were affected by software issues related to the upgrade.
After further diagnosis, it was decided to roll back to the previous, known good, software version of code on the core access switches in Auckland, Wellington and Christchurch. This was achieved by around 11am Monday morning and appeared to help restore service to affected customers.
From that point on, it was decided to work through all known affected customers with issues and roll back the software on the affected switches that they are connected to. This work carried on through the day on a case by case basis.
This process will continue, as required, and engineers will work with any further affected customer.
While the above is occurring, engineers are working with suppliers to resolve the issues and then a plan can be devised to complete the upgrade.
We regret the impact this has had on customers. Significant steps were taken prior to the software change to avoid this happening. It is important to note that this version of GA (or General Availability) code had gone through extensive testing in the TelstraClear IP Lab and had been running successfully online in parts of the Network for several weeks.
Russell Prince
TelstraClear Business Consultant
Drew Collins
Group Communications Manager
Group IT Services NZ
APN Holdings NZ Ltd
My DDI: +64 9 373 9573
My Mobile +64 21 823268
My Fax: +64 9 373 6411
Ph: +64 9 379 5050
My eMail: drew_collins(a)apn.co.nz
Website: www.apn.co.nz
***************************************************************************
This eMail may contain privileged and confidential information intended only for the use of the intended recipient. If you are not the intended recipient of this message, any use, dissemination, distribution or reproduction of this message is prohibited. Any views expressed in this message are those of the individual sender and may not necessarily reflect the views of APN News & Media NZ Ltd. For more information on APN News & Media NZ Ltd please visit our web site at http://www.apn.co.nz
***************************************************************************
wow it would have been nice to have been told this during the outage so at least we could have known we were 'in line' for the switch we are attached to to be rolled back/whatever.
oh well. - appears to be problem free today - roll on tomorrow...
On Tue, 2004-07-06 at 16:08, Drew_Collins(a)apn.co.nz wrote:
This is the formal reply from TCL to an email I sent them two days ago.
_______________________________________________ NZNOG mailing list NZNOG(a)list.waikato.ac.nz http://list.waikato.ac.nz/mailman/listinfo/nznog
--
Gavin Legge
We are still having issues today with the TCL cache server. Is anyone
other than us having issues?
Drew Collins
Group Communications Manager
Group IT Services NZ
APN Holdings NZ Ltd
My DDI: +64 9 373 9573
My Mobile +64 21 823268
My Fax: +64 9 373 6411
Ph: +64 9 379 5050
My eMail: drew_collins(a)apn.co.nz
Website: www.apn.co.nz
Gavin Legge wrote:
wow it would have been nice to have been told this during the outage so at least we could have known we were 'in line' for the switch we are attached to to be rolled back/whatever.
oh well. - appears to be problem free today - roll on tomorrow...
I'm sure there are quite a few TC technicians on this list, why do none of them announce it on here?
This is, IMHO, one of the main features that could be utilised by the mailing list. There are a huge number of people on this list who would be directly affected, and if not, they would definitely have a trail of contacts who could make people unaware of this list aware.
A little effort goes a long way. (and it saves me ringing up Gavin wondering why our bitchy counter-strike players are getting good pings :P)
My two cents.
- Drew
Jeremy Brooking wrote:
Drew Broadley wrote:
I'm sure there are quite a few TC technicians on this list, why do none of them announce it on here ?
Because they value their jobs?
Company policy I dare say. Same in, most places.
Value their jobs? It's company policy to NOT advertise major outages? I'm more referring to current outages, if anything. LET PEOPLE KNOW.
This will save so much wasted time of middle men not knowing what is going on, while in the meantime the end users, who don't know any closer lines of communication than the place's 0800 helpdesk monkeys, are wondering what the heck is going on.
Policy or not, it's just plain stupid not to let someone know. (This does not include a summary AFTER the outages, which seems to have been put out)
- Drew
Here's your sign..
It's company policy, not to tell as many people as possible about major
outages.. It looks bad on said telco if they cannot maintain uptime..
Helpdesk member X, would get fired for going "oh man at work, our network is
so shit, yeah yet another outage today" on a public mailing list..
Craig Spiers wrote:
Here's your sign..
It's company policy, not to tell as many people as possible about major outages.. It looks bad on said telco if they cannot maintain uptime..
Helpdesk member X, would get fired for going "oh man at work, our network is so shit, yeah yet another outage today" on a public mailing list..
Sorry, technicians was a bad word. I should have stated Head Technicians and/or Management.
My main point was, this is a good medium for announcement. I have just come from places where our ISPs (even if they lease bandwidth off major telcos) have generally had good communication, even to say "Something's wrong with national traffic, we are looking into it". What I am trying to say is that it's all about peace of mind and whether or not to put time into investigating my end.
*pitches up his new sign*
- Drew
On Tue, 6 Jul 2004, Drew Broadley wrote:
My main point was, this is a good medium for announcement. I have just come from places where our ISP's (even if they lease bandwidth off major telcos) have generally had good communication, even to say "Somethings wrong with national traffic, we are looking into it". What I am trying to say is, that it's all about peace of mind and whether or not to put time into investigating my end.
In which case you should discuss this with your account manager and ask him things like who exactly at the company is allowed to post to nznog, especially about problems the company is seeing.
Right now the companies just see the downside of bad publicity from admitting to problems publicly (especially when a techie rather than a PR guy is doing the admitting).
If you show (especially with your wallet) that there is an upside, in that companies that are open about their problems and work with the community get more business, then the account managers will push it to happen.
--
Simon J. Lyall. | Very Busy | Mail: simon(a)darkmere.gen.nz
"To stay awake all night adds a day to your life" - Stilgar | eMT.
Simon Lyall wrote:
If you show (especially with your wallet) that there is an upside in that companies that are open about their problems and work with the community get more business then the account managers will push it to happen.
This is the lesson that Intel learned quite a few years ago -- that keeping quiet and pretending it didn't happen doesn't make bad things go away.
--
Juha
Drew Broadley wrote:
Value their jobs ? It's company policy to NOT advertise major outtages ? I'm more referring to current outtages, if anything. LET PEOPLE KNOW.
I'm not disagreeing with you... You just asked a question, I gave you a plausible answer.
Yes, a lot of TC techs are probably on the list. Do they have permission to speak on behalf of TC? I dare say not.
Add in the fact that the messengers are normally the ones shot...
And the fact NZNOG is not the appropriate medium to use...
You can't really blame them for not commenting here, can you.
Jeremy Brooking
And the fact NZNOG is not the appropriate medium to use...
Personally I would have thought that nznog is an appropriate forum to use to announce major outages - not that any of ours are important enough to mention here. Am I wrong about that?
--
James Riden / j.riden(a)massey.ac.nz / Systems Security Engineer
Information Technology Services, Massey University, NZ.
GPG public key available at: http://www.massey.ac.nz/~jriden/
On Tue, 6 Jul 2004, James Riden wrote:
Personally I would have thought that nznog is an appropriate forum to use to announce major outages - not that any of ours are important enough to mention here. Am I wrong about that?
Major outages that affect non-customers are probably okay; on the other hand, outages that affect just customers should probably be announced via normal channels. Obviously there is some crossover since (for example) if 10% of Telstra customers were offline then people on other ISPs would have had problems talking to them and thus would have been affected.
In general, if people have problems with not being properly informed about outages then they should discuss them with their account manager. Obviously reliability of a service and communication during problems should be part of your buying decision.
--
Simon J. Lyall | Very Busy | Web: http://www.darkmere.gen.nz/
"To stay awake all night adds a day to your life" - Stilgar | eMT.
James Riden wrote:
Personally I would have thought that nznog is an appropriate forum to use to announce major outages - not that any of ours are important enough to mention here. Am I wrong about that?
IMHO, yeah.
A public forum, with no knowledge of subscribed members, to announce issues to your customers?
Even tattooing the message on homing pigeons and sending them off seems more appropriate to me.
If it gets to the point you need NZNOG to receive network outage notifications from your provider... It's time to ask for a new account manager.
Jeremy Brooking
James Riden wrote:
Personally I would have thought that nznog is an appropriate forum to use to announce major outages - not that any of ours are important enough to mention here. Am I wrong about that?
IMHO, yeah.
A public forum, with no knowledge of subscribed members, to announce issues to your customers?
I'm not writing from the point of view of a Telstra customer. When the bigger ISPs have issues, we get tickets raised about the problem and then have to go digging to see whose problem it is.
I agree that problems which only affect a few non-customers have no need to be announced here and should go through the usual channels.
cheers,
Jamie
I agree that most TC techs would not comment (and I don't blame them), but you
would hope / wish TCL management would use all forms of communication (i.e.
this forum) to help reduce calls to the TCL help desk.
Drew Collins
Group Communications Manager
Group IT Services NZ
APN Holdings NZ Ltd
My DDI: +64 9 373 9573
My Mobile +64 21 823268
My Fax: +64 9 373 6411
Ph: +64 9 379 5050
My eMail: drew_collins(a)apn.co.nz
Website: www.apn.co.nz
Jeremy Brooking wrote:
Drew Broadley wrote:
I'm sure there are quite a few TC technicians on this list, why do none of them announce it on here ?
Because they value their jobs?
Company policy I dare say. Same in, most places.
I dare say that all of the people at TCL value their jobs; it's more likely they were too busy rolling back upgrades, talking to angry customers, or completely oblivious to what was going on. Also, if you don't know what the exact problem is, what are you to say?
In regards to fear of losing jobs over commenting about a problem publicly before a "management announcement" has been made, if TCL outages like that continue I'm sure some people at TCL will start to speak out more publicly without such fear.
From what I've gathered, most of the company was unaware of the severity of the problem or that it affected that many customers; just ask the business customer services team, as I did today. They were pretty NFI about the whole thing, apart from their IVR not working correctly for most of yesterday.
At the end of it all I still can't believe TCL did not believe the faults on Sunday were related to the upgrade done on Sunday morning.
I also think the mathematician that worked out that "around 12% of TCL's customers were affected" has a broken calculator. I'm pretty certain the last time I checked all of TCL's clear.net and paradise.net traffic goes over that core network. All customers with internet faults would have been affected.
<sarcasm>Good work TelstraClear!</sarcasm>
--peter.
While I agree there was a bit of a botch up with TCL and firmware rollouts, I hear that the firmware was vendor approved and was tested in a lab, so there is no way they could have known that this would happen.
<sarcasm> It's good to see people like you, peter, who would be able to do a better job of running a network of that size.. </sarcasm>
<sarcasm>Good work TelstraClear!</sarcasm>
--peter.
Yeah but they would be scared to post, I've been stood down from work because of my posting
On Tue, July 6, 2004 4:37 pm, Drew Broadley said:
Gavin Legge wrote:
wow it would have been nice to have been told this during the outage so at least we could have known we were 'in line' for the switch we are attached to to be rolled back/whatever.
oh well. - appears to be problem free today - roll on tomorrow...
I'm sure there are quite a few TC technicians on this list, why do none of them announce it on here?
This is, IMHO, one of the main features that could be utilised by the mailing list. There are a huge number of people on this list who would be directly affected, and if not, they would definitely have a trail of contacts who could make people unaware of this list aware.
A little effort goes a long way. (and it saves me ringing up Gavin wondering why our bitchy counter-strike players are getting good pings :P)
My two cents.
- Drew
In Australia Telstra put all outage notices thru their legal department before they appear on their service status pages...
http://www.bigpond.com/help/servicestatus/
Guess this was a result of spending too much time in the courts (which does seem to be a pastime for Telstra, who employ the largest corporate legal team in Australia AIUI)
For the more technically minded this one is also useful...
http://tcruskit.telstra.net/cgi-bin/dispout.pl
Reading Russell's email reminded me of the good old days on Telstra's ADSL network back in 2001. I'm not surprised they act like this given that they've paid out tens of millions in compensation and rebates to users in the past (including those not on SLA agreements).
Anyone know what equipment Telstra are using for 'access switches' here? Can't be worse than the Nortel equipment they used in Australia.
Cheers
Don
On Tue, 2004-07-06 at 16:08, Drew_Collins(a)apn.co.nz wrote:
This is the formal reply from TCL to an email I sent them two days ago.
participants (11)
- Craig Spiers
- Don Gould
- Drew Broadley
- Drew_Collins@apn.co.nz
- Gavin Legge
- James Riden
- Jeremy Brooking
- Juha Saarinen
- Peter Garmaz
- Simon Lyall
- Tristram Cheer