BA Fiasco


Postby cromwell » 28 May 2017, 21:29

We are told that BA's computer centre suffered a power failure, and that's why their entire operation has fallen to bits, with passengers stranded for hours on end.

Question - what's happened to their backup, to their disaster recovery centre? It's always possible that your computer suite will have a total power failure, get hit by a meteorite or go under floodwater. That's why you back up data and hold it at a separate location in a fire safe, that's why you have a backup computer suite, a disaster recovery centre.

BA recently outsourced a lot of IT to India, to save money.

So, where is the DRC? Did they outsource that to India? Did they actually check that one exists in India? Or did they bin it to save money?

It's not a ridiculous question. When a guy called A.rchie N.orman took over at ASDA, he (allegedly) cut the disaster recovery centre to save money. As a short term, wing-it-we-might-get-away-with-it sort of thing, perhaps. Long term if you do it, it will come back to bite you.
"Facts do not cease to exist because they are ignored" - Aldous Huxley
cromwell
 
Posts: 9157
Joined: 26 Nov 2012, 12:46
Location: Wakefield, West Yorkshire.

Re: BA Fiasco

Postby Workingman » 28 May 2017, 22:19

cromwell wrote:It's not a ridiculous question.

It certainly isn't. In the old pen and paper days if LAX, LBA, MCR or LHR went down the whole global network was not affected... instantly.

We have managed to digitally intertwine events all over the place and make them all 'local' when in reality a delay at LAX to LGW has no impact on a flight arriving in SCEL (Santiago).
Workingman
 
Posts: 21750
Joined: 26 Nov 2012, 15:20

Re: BA Fiasco

Postby TheOstrich » 29 May 2017, 14:16

The human cost is terrible: thousands and thousands of folk have had their travel plans and their holidays completely disrupted.

BA will lose both financially and reputationally over this, and deservedly so.

Workingman wrote:In the old pen and paper days if LAX, LBA, MCR or LHR went down the whole global network was not affected... instantly.


With this fiasco following on so soon after the NHS ransomware incident, we need to pause this mad march to total digitalisation, and make sure we are not building a huge global infrastructure on sand ......
One day, the Government's digital platforms will go down - where will we all be then? It's one of the reasons I'm personally keeping my personal affairs "off-line" wherever I can.
TheOstrich
 
Posts: 7582
Joined: 29 Nov 2012, 20:18
Location: North Dorset

Re: BA Fiasco

Postby AliasAggers » 29 May 2017, 18:52

TheOstrich wrote:With this fiasco following on so soon after the NHS ransomware incident, we need to pause this mad march to total digitalisation, and make sure we are not building a huge global infrastructure on sand ......
One day, the Government's digital platforms will go down - where will we all be then? It's one of the reasons I'm personally keeping my personal affairs "off-line" wherever I can.


That, Ostrich, is one of the most apt and sensible comments I have read for a long time, and I fully agree with you.
There are no strangers here; Only friends you haven't yet met.
AliasAggers
 
Posts: 1568
Joined: 17 Sep 2016, 12:22
Location: West Midlands

Re: BA Fiasco

Postby Suff » 30 May 2017, 09:34

I noticed, just before the financial crisis, that if you wanted to get into certain banking jobs as a contractor you had to have prior experience. These areas were the ones which collapsed.

I have noticed, recently, that contract jobs in the Airline industry now insist that you have prior airline experience. They simply won't touch you if you do not.

Given the Airline failures around the globe, that speaks a certain message to me.

As for outsourcing, you can do it well or you can do it badly. RBS did two stupid things in one go. First, they deployed a software update that the developers insisted was not fully tested. Second, they did it only one week after handing over the management of the data and the data recovery to Infosys. Of course we know the end result of that: the software failed, and when the Indians screwed up the rollback, RBS wound up in an unknown position and was unable to either take payments in or make payments out. For more than a week! All while they coded a fix for the problem.

When I worked at Linde Gas they had outsourced to T-Systems. One Monday I logged on to find my mail server down. I found that a data storage tech had logged into one storage system on the SAN in that data centre and deleted all the drive volumes from one of the clustered mail servers. Before the backup ran. The same engineer, within 30 minutes, logged into the second storage system, in the second data centre 10km away, and deleted all of the drive volumes for the second mail server in the cluster. Also before the backup ran. Then, when they went to recover it, they restored with the wrong option. Mail began to arrive on the server but did not get to people's mailboxes on their PCs and laptops, so they had to take the whole system offline, keep the restored mailboxes with the new mail and restore the original mailboxes properly. Then the Linde guys (not T-Systems) had to copy across all the new mail that arrived between the first restart and the second, correct one.

Back in 2003 they used only the one power feed in, not a feed from both supplies. The generators kicked in to maintain the system but nobody phoned the support company to come and ensure that the fuel tanks were full and that they remained full. All systems crashed, hard, overnight.

These are just the instances of idiocy that I've seen or had reported to me directly. It does not surprise me that BA has had this problem. It's almost certain that they let the people with the knowledge leave the company during the outsourcing and then, when things went badly wrong and the system failed, found out all the things they didn't have documented and didn't think to ask about during the handover.

The outcome of the BA failure is that everything will be properly documented. This time. And for the next 5-10 years that documentation will stand them in good stead until it happens again and they have not updated that documentation and all the new stuff is at risk again. Companies who don't do due diligence with their documentation tend to do it in spurts as problems arise, then leave it to moulder over the years.

I have seen quite a large chunk of jobs turn up in the security and identity space over the last weeks. I'm sure we'll see some airline jobs coming up fairly soon too.

Good disaster recovery is not about backing up data or making provisions to "survive" a disaster. Well that's part of DR but it is not what Good DR is about. Good DR is all about the DR testing, the backup recovery testing, the procedures, the processes, the staff knowledge and the constant little disruptions required to validate the DR scenarios and the way the system responds to it.
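To make the "backup recovery testing" point concrete, here is a minimal sketch of a restore drill in Python. It is only an illustration, not anything BA or anyone else is known to run: the function names (`sha256_of`, `snapshot`, `restore_drill`) are made up for this example, and it assumes the backup is a plain zip archive of one directory. The idea is simply that you restore the backup somewhere harmless and prove, file by file, that what came back matches what you think you backed up.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_drill(live_dir: Path, backup_archive: Path) -> list[str]:
    """Restore the archive into a scratch directory and return the
    live files that are missing from, or differ in, the restored copy.
    An empty list means the restore matched the live data."""
    expected = snapshot(live_dir)
    with tempfile.TemporaryDirectory() as scratch:
        # The restore goes to a throwaway location, never over live data.
        shutil.unpack_archive(str(backup_archive), scratch)
        restored = snapshot(Path(scratch))
    return sorted(
        name for name, digest in expected.items()
        if restored.get(name) != digest
    )
```

Run against a backup made with `shutil.make_archive(..., "zip", root_dir=...)`, an empty result says the restore worked; anything in the list is data you would have lost. A real drill would also cover the procedures and the people, which no script can check.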

BA is a global 24x7 business. It has to be online 24x7x366. As such DR testing is always under pressure. Now they know why their DR specialists were always on their case.
There are 10 types of people in the world:
Those who understand Binary and those who do not.
Suff
 
Posts: 10785
Joined: 26 Nov 2012, 08:35

Re: BA Fiasco

Postby cromwell » 30 May 2017, 09:49

Suff wrote:Good disaster recovery is not about backing up data or making provisions to "survive" a disaster. Well that's part of DR but it is not what Good DR is about. Good DR is all about the DR testing, the backup recovery testing, the procedures, the processes, the staff knowledge.


Good points.

Anyway!

Now we know what went wrong. According to the Telegraph the problem was caused by a power surge at the Heathrow data centre which took BA systems down.

So why didn't they go to their disaster recovery option? Go to their separate site? Because it wasn't a separate site - the disaster recovery suite was also sited at Heathrow!
:lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol: :lol:

I don't know how you deal with this depth of idiocy. How do people who take decisions like that ever get their jobs in the first place?
"Facts do not cease to exist because they are ignored" - Aldous Huxley

Re: BA Fiasco

Postby Suff » 30 May 2017, 13:31

Ok so let's ask the other salient question. All data centres are massively protected against surges.

How the hell did they get a surge INSIDE the data centre???
There are 10 types of people in the world:
Those who understand Binary and those who do not.

Re: BA Fiasco

Postby cromwell » 30 May 2017, 18:57

That's true. I'd forgotten about the old UPS.

So maybe when they tried to go to backup systems the startup failed? They aren't coming clean on this are they?
"Facts do not cease to exist because they are ignored" - Aldous Huxley

Re: BA Fiasco

Postby AliasAggers » 30 May 2017, 20:21

All this reliance on technology is going to be the death of the human race - Mark my words.
There are no strangers here; Only friends you haven't yet met.

Re: BA Fiasco

Postby Suff » 31 May 2017, 09:27

AliasAggers wrote:All this reliance on technology is going to be the death of the human race - Mark my words.


More than you think, Aggers. Up till the early '90s you could still get pretty much every book ever created in print and it could be found somewhere. Now things are coming out only in digital form. If we lose the ability to read digital media we are, essentially, going to take a real drop in our ability to drag ourselves out of whatever hole we find ourselves in.

Even worse, unlike physical print, things are being lost because the machines they ran on have ceased to exist and the programs they were displayed in also are vanishing. There is a project going on to virtualise and retain all of this knowledge as obsolescence is a big problem.
There are 10 types of people in the world:
Those who understand Binary and those who do not.
