Recovering from acts of malice, incompetence and God in banking IT systems

In the wake of huge amounts of downtime across many of the country’s key financial institutions, the Bank of England and the Financial Conduct Authority have had enough.

The two regulators have jointly set a 5 October deadline for banks to report their exposure to outage risks and the measures they have in place to respond. It has already been suggested that two days should be the ‘maximum outage time’, though the pair will make a final decision based on the outcome of the reports.

The figures already released have been damning. In mid-August, Britain’s five biggest banks (Barclays, Lloyds Banking Group, HSBC, Santander, and the Royal Bank of Scotland) disclosed that they had experienced 64 payment outages in the second quarter of 2018. The majority of these outages affected online banking (Lloyds reported 19 of these incidents, Barclays 18, RBS 16, HSBC seven and Santander four), but phone and mobile banking had also been affected. The banks didn’t say what caused the outages, but some pointed out that the incidents only affected internal systems or a limited number of customers.

A large-scale outage that occurred earlier this year at TSB reflects the massive disruptive effect that outages can have. In April, 1.9 million customers were locked out of their accounts for up to a month, leaving the bank’s reputation crippled and the public furious. Similarly, on Friday 1 June, 5.2 million transactions using Visa failed as a result of an IT collapse.

Further afield in the US, at the time of writing SunTrust – a bank with 1,400 branches and 2,160 ATMs across 11 south-eastern states and Washington, D.C. – has seen its online and mobile banking services go down. This, as in the case of TSB, was due to a software update that went awry.

Crucially, in a world where the finger is often pointed at malware whenever downtime is experienced, in all three situations cybercriminals were not to blame. Instead the downtime was due to IT failings. TSB’s meltdown came as a result of a botched IT upgrade, while the panic that led to abandoned purchases around Europe was caused by the failure of a single switch in one of Visa’s data centres. These incidents, along with the 64 outages referred to above, illustrate that there is more for IT departments at banks to be wary of than just cybercrime.

The common line of thinking these days is that it’s not if you’ll be affected by an outage, it’s when. Any organisation that believes it won’t be taken down is being naive. Businesses should be prepared for these kinds of malfunctions to hit them – but they also need to do everything in their power to make an outage as unlikely as possible. When it comes to storage, data requirements are only ever going to grow, and achieving high performance at an affordable cost while reducing risk is any operation’s key objective. To do this you need to understand your data storage needs and use the most appropriate place to store and back up your data, whether for compliance and regulatory reasons or to run your business more efficiently.

One of the things that is consistently undervalued, poorly looked after and in need of transparent treatment is the backup and recovery environment. A startling fact is that the majority of organisations without a fully managed, externally supported system see almost 25 percent of their nightly backups fail. That’s a massive number, and in most cases the business will have no idea what’s in that lost or unavailable data. If, for example, a finance database’s nightly backup doesn’t complete successfully, you could be forced to go back 48 hours or more, depending on when the last good backup ran.
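To make that exposure concrete, the gap between “now” and the last good backup can be sketched in a few lines. This is a minimal illustration under assumed conditions, not a real backup product: the `BackupJob` record, the dataset names and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class BackupJob:
    """One nightly backup attempt (hypothetical record format)."""
    dataset: str
    finished_at: datetime
    succeeded: bool

def last_good_backup(jobs, dataset):
    """Return the time of the most recent successful backup for a dataset."""
    good = [j.finished_at for j in jobs if j.dataset == dataset and j.succeeded]
    return max(good, default=None)

def exposure_window(jobs, dataset, now):
    """How far back a restore would have to go if recovery started right now."""
    last = last_good_backup(jobs, dataset)
    return None if last is None else now - last
```

With two consecutive failed nights, a finance database backed up at 2am would leave an exposure window of well over 48 hours by the following morning – exactly the scenario described above.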

On top of hardware or software malfunctions, other environmental factors can cause downtime for banks. This is commonly referred to in the legal profession as an act of God – an instance of uncontrollable natural forces in operation. For example, in areas that experience severe weather, network outages can become routine. In places like the southern states of the USA, where the summer months are dominated by hurricanes and tropical storms, large disruptions are a normal part of life and everything – from houses to banks – has to be built with this in mind. In these situations the banks must ensure their branches can perform critical functions even if the primary network connection is lost.

It is surprising just how many organisations don’t do any form of disaster recovery testing on their data. Although they might have implemented much of the right technology, many have never tested the solution to uncover its faults. Testing is essential to managing the effectiveness of the recovery environment and ensuring that the data is available whenever and however it is needed. Without testing in a controlled, simulated environment, it is impossible for IT and security teams to fully understand their system’s integrity. It’s exactly the same reason we’re told to test fire alarms regularly: you don’t want to discover your fire alarm doesn’t work when you most need it, just as you don’t want to find out your disaster recovery system is ineffective in the middle of an outage.
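The shape of such a drill – restore into a sandbox, then verify the data is actually usable – can be sketched as follows. This is an illustrative stand-in, assuming a simple file-copy “restore” in place of a real backup tool; the file names and checksum approach are examples, not a prescription.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def checksum(path: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def restore(backup: Path, target: Path) -> None:
    """Stand-in for a real restore tool: here, simply copy the backup into place."""
    shutil.copy2(backup, target)

def recovery_drill(backup: Path, expected_checksum: str) -> bool:
    """Restore into a scratch area and verify the restored data is intact."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored.db"
        restore(backup, restored)
        return checksum(restored) == expected_checksum
```

Run on a schedule, a drill like this surfaces a silently corrupted or incomplete backup long before anyone needs it in anger.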

While the UK is seldom subject to the sort of severe weather that causes blackouts and network outages, there is always the risk of freak accidents. A burst pipe in a shared building or road workers drilling through electrical or network cabling, for example, could see a bank offline for an indeterminate period through no fault of its own. Whether it’s the external forces of nature or the knock-on effects of routine maintenance elsewhere, banks need to consider the effect the environment can have on their operations.

This is not to say that malware and ransomware are not a factor at all – far from it. Over the past 12 months the financial services and insurance sector was attacked by ransomware more than any other industry, with the number of cyber-attacks against financial services companies in particular rising by more than 80 percent. This vulnerability is due in part to the breadth of customer information these organisations store, making them prime targets.

If such an organisation were hit by ransomware, all online systems for banking and insurance transactions would need to be taken offline, rendering it unable to operate. As a result, there is a 50 percent chance of employees in this industry suffering productivity loss, a 30 percent chance that the financial and insurance services will shut down temporarily, and a 20 percent chance of revenue loss and an adverse effect on customer perception.

All of these factors mean that when faced with the choice of paying a ‘ransom for data’, most financial and insurance professionals feel forced to pay the attackers – especially as the large amounts of data they keep are stored in a variety of disparate systems, making recovery of that data difficult.

When a bank goes offline – whether due to environmental factors or malicious actors – operators need a way to get the system back up, and fast. This focus on speed of recovery is exactly why organisations should adopt a zero day approach to architecture. Customers are willing to accept that any operation will have downtime, but a prolonged outage will drive them away. Zero day architecture allows organisations to minimise downtime and recover from backups without having to worry about lost data.

Essentially, what a zero day recovery architecture offers is a service that lets you quickly bring working code or data back into operation without paying a ransom and without worrying about whether that workload is still compromised. An evolution of the 3-2-1 backup rule (three copies of your data, stored on two different media, with one backup kept offsite), zero day recovery enables an IT department to partner with the cyber team and create a set of policies defining the architecture for data backups stored offsite, normally in the cloud. Such a policy could, for example, require that a particular workload be brought back into the system within 20 minutes while another workload can wait a couple of days.
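A policy set like that is, at heart, just data: each workload gets a recovery time objective and a rule about offsite copies, and restores are sequenced by urgency. The sketch below illustrates the idea only – the workload names, targets and `RecoveryPolicy` structure are assumptions for the example, not any vendor’s actual schema.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class RecoveryPolicy:
    """One workload's recovery rules (names and targets are illustrative)."""
    workload: str
    rto: timedelta       # recovery time objective: how quickly it must come back
    offsite_copies: int  # 3-2-1 rule: at least one copy held offsite

POLICIES = [
    RecoveryPolicy("payments-gateway", timedelta(minutes=20), offsite_copies=1),
    RecoveryPolicy("customer-portal", timedelta(hours=4), offsite_copies=1),
    RecoveryPolicy("reporting-warehouse", timedelta(days=2), offsite_copies=1),
]

def recovery_order(policies):
    """Restore the most time-critical workloads first."""
    return sorted(policies, key=lambda p: p.rto)
```

Sequencing restores this way is what lets a 20-minute payments workload come back immediately while a reporting warehouse safely waits its couple of days.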

With the proposed maximum outage time potentially resulting in fines for those financial organisations that are sloppy in recovery, banks now more than ever need to invest in a solution that will minimise the amount of time and money that will be lost and give them the ability to control and prioritise workloads. Ultimately, when downtime is out of the control of bank operators, they depend on a system getting up and online as quickly as possible. Whether it’s hackers demanding a ransom or a hurricane causing flooding, the wise bank will look to an architecture and approach that it knows inside and out – and one it knows it can utilise at speed.

Tectrade protect, recover, manage, store and secure petabytes of data for organisations around the world. Our finance team are experts in ensuring continuity of operations in the event of any failure.

Please speak to one of our Finance team if you would like some support to meet your recovery time objectives.


Written by Alex Fagioli, September 2018
Published by Global Banking and Finance Review : https://www.globalbankingandfinance.com/recovering-from-acts-of-malice-incompetence-and-god-in-banking-it-systems/
