As Benjamin Franklin famously said: “An ounce of prevention is worth a pound of cure,” and that’s especially true when it comes to disaster recovery.
Most companies with a decent-sized IT department will have an incident response plan, but that in itself is nowhere near enough. Such plans have to be constantly updated and tested regularly – and not just by the internal team. Operators also need to have remediation software and backups ready to roll.
“I’ll admit myself, as an IT pro early in my career, testing was just a pit of failure,” Rick Vanover, VP of product strategy at disaster recovery specialist Veeam, told The Register. “The thought was, we get all huddled up on a Saturday night, the boss would buy pizza, and more than half of the things would fail. We’ll just try to do better next year, we thought, but that’s not going to fly anymore.”
Over the last few decades, things have changed a lot, he explained. But for too many firms, practice still amounts to a tabletop exercise discussing "what if" scenarios, seldom actually trying them out on their own networks or on virtual versions of them.
Software can be a big help in this process, he told us. In 2011, Netflix developed Chaos Monkey, a tool that would randomly shut down virtual machines and cause network problems. The streaming business has since evolved this into a suite of tools dubbed the Simian Army, including Chaos Gorilla, which simulates the failure of an entire AWS availability zone, and Chaos Kong, which simulates bringing down a whole region (which isn't unknown).
But there are plenty of other tools out there, such as manufacturers’ own, for example AWS’s Fault Injection Service and Azure’s Chaos Studio, open source code such as Litmus for testing Kubernetes resilience, and commercial packages like Gremlin.
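The core of a chaos experiment like the ones these tools run is simple: pick a victim at random and kill it, then watch whether the rest of the system copes. A minimal sketch in Python, where `terminate` stands in for whatever kill mechanism your platform actually exposes (a cloud SDK call, a hypervisor API) — the names here are illustrative, not any particular tool's API:

```python
import random

def chaos_round(instances, terminate, blast_radius=1, seed=None):
    """Pick up to `blast_radius` instances at random and terminate
    them, returning the IDs that were hit. `instances` is a list of
    instance IDs; `terminate` is a callback that performs the kill."""
    rng = random.Random(seed)
    victims = rng.sample(instances, min(blast_radius, len(instances)))
    for instance_id in victims:
        terminate(instance_id)  # e.g. a wrapped cloud SDK call
    return victims

# Dry run against a fake fleet: record the kills instead of issuing them
killed = []
victims = chaos_round(["web-1", "web-2", "db-1"], killed.append, seed=42)
```

The value isn't in the killing itself but in running it continuously against production-like environments, so failures surface on a Tuesday afternoon rather than during a real outage.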
Admins also need scripts ready to automate remedial processes as much as possible, and should keep them up to date and ready to deploy. With the speed of attacks, doing everything manually just isn’t an option, Vanover suggested.
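One common pattern for such remediation scripts is an ordered playbook of idempotent steps that stops at the first failure, so a human can take over from a known state. A hedged sketch — the quarantine steps below (firewall block, account disable, disk snapshot) are hypothetical placeholders for whatever your environment actually requires:

```python
def run_playbook(steps, context):
    """Execute ordered remediation steps. Each step is a (name, fn)
    pair returning True on success. Stops at the first failure and
    returns (completed step names, name of the failing step or None)."""
    completed = []
    for name, step in steps:
        if not step(context):
            return completed, name
        completed.append(name)
    return completed, None

# Hypothetical playbook for isolating a compromised host; the lambdas
# record the action taken instead of touching real infrastructure.
actions = []
steps = [
    ("block_at_firewall", lambda ctx: actions.append(("block", ctx["host"])) or True),
    ("disable_accounts",  lambda ctx: actions.append(("disable", ctx["host"])) or True),
    ("snapshot_disk",     lambda ctx: actions.append(("snapshot", ctx["host"])) or True),
]
done, failed = run_playbook(steps, {"host": "srv-042"})
```

Keeping the steps small and replayable also makes the playbook itself testable in a drill, which is the point Vanover is making.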
Jacob Dorval, senior director at Secureworks Adversary Group and a former US Air Force networks specialist, agreed, saying that too many companies skimp on training as it’s seen as taking time from the main goal of having a smoothly functioning network, but staff need to practice everything, even the physical act of disconnecting and reconnecting hardware.
“I can relate to it so much because of my military experience and what the team does,” he told us. “You practice, practice, practice, and you train, train, train. Because when that one time happens, you’re like, ‘Right? I got this. It’s no problem. We know exactly what we’re going to do’.”
Know your network
It’s also vital to know your territory, he added, and that means constantly updated network maps. That information is essential not only for detecting problems as they occur, but for fixing them too.
Networks seldom stick to the original design layout, with new hardware and software being added and older stuff getting removed. Checking the precise network topology is key to seeing what’s happening, where, and how to isolate the issue and deal with it.
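In practice, that checking often boils down to comparing a fresh scan against the last known-good map and flagging the drift. A simple sketch, assuming each map is a dict of host address to the set of open ports (how you populate it — nmap output, asset database, CMDB export — is up to you):

```python
def inventory_drift(baseline, current):
    """Compare a saved network map against a fresh scan.
    Both arguments map host -> set of open ports. Returns hosts
    added, hosts removed, and per-host port changes."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = {
        host: {
            "opened": sorted(current[host] - baseline[host]),
            "closed": sorted(baseline[host] - current[host]),
        }
        for host in set(baseline) & set(current)
        if baseline[host] != current[host]
    }
    return added, removed, changed

baseline = {"10.0.0.1": {22, 443}, "10.0.0.2": {3389}}
current  = {"10.0.0.1": {22, 443, 8080}, "10.0.0.3": {22}}
added, removed, changed = inventory_drift(baseline, current)
```

A new host or a newly opened port showing up in the drift report is exactly the kind of change that tends to go unnoticed until an incident forces the question.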
“You’ve got to make sure you have visibility. Because this entire game is all about detection and response,” Dorval warned. “How quick can you detect the threat and then respond to the threat and neutralize the threat. And dwell times are getting shorter and shorter, so making sure that you have visibility is absolutely critical.”
Hiring third parties to come in and test your network is also an increasingly popular option. A penetration testing crew will follow attackers' strategy of mapping out networks with minimal interference before making any moves, simulating how most real-world attacks unfold. It's a technique the US National Security Agency (NSA) has used for years, as the former head of its hacking crew, Rob Joyce, explained.
“If you really want to protect your network you have to know your network, including all the devices and technology in it,” he told the Enigma security conference in 2016. “In many cases, we know networks better than the people who designed and run them.”
Dorval agreed, calling it “absolutely critical,” pointing out that a third party can sometimes find things that a network administrator might miss. But there are downsides to this approach.
These kinds of pen tests are a business, after all, and that can lead to problems if the contractor tries too hard to impress the client with a huge list of vulnerabilities and weaknesses. The result is that admins face a daunting task unless the reporting is clear on what to prioritize and what can be left fallow for a while.
“Finding 1,000 problems instantly may not be the most instructional or illuminative,” warned Dave Russell, SVP and head of strategy at Veeam. “Recommendations should be limited to a set of actionable feedback. But remember, there’s no such thing as failing a disaster recovery test. You found out ahead of time that the parachute didn’t open. That is not failure.”
Recovery plans
The final and most important piece of the plan in many cases is backups, and while everyone makes them, they remain an all-too-common point of failure.
There are key factors to take into account with backups. The backup process needs to be continuous, but Russell told us that when his team visits a site, it sometimes finds backups that are either out of date or improperly configured. If your budget allows for it, an off-site second backup system is a wise investment too, in case of physical damage to a site.
Then they have to be tested regularly. The Register has spoken to far too many admins who discovered, when they needed their backups, that they were incomplete, corrupted, or even non-existent because no one had checked that the backup system was working as it should.
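The gold standard is a periodic test restore, but even an automated integrity check catches the worst surprises. A minimal sketch, assuming a manifest of SHA-256 digests recorded at backup time (the store and file names here are simulated for illustration):

```python
import hashlib

def verify_backup(manifest, read_file):
    """Check each file in a backup against the SHA-256 digest recorded
    at backup time. `manifest` maps path -> expected hex digest;
    `read_file(path)` returns the stored bytes, or None if the file
    is missing. Returns lists of missing and corrupted paths."""
    missing, corrupted = [], []
    for path, expected in manifest.items():
        data = read_file(path)
        if data is None:
            missing.append(path)
        elif hashlib.sha256(data).hexdigest() != expected:
            corrupted.append(path)
    return missing, corrupted

# Simulated backup store: one intact file, one tampered, one missing
store = {"db.dump": b"good data", "config.tar": b"tampered"}
manifest = {
    "db.dump":    hashlib.sha256(b"good data").hexdigest(),
    "config.tar": hashlib.sha256(b"original").hexdigest(),
    "keys.bak":   hashlib.sha256(b"gone").hexdigest(),
}
missing, corrupted = verify_backup(manifest, store.get)
```

Run on a schedule and alerting on any non-empty result, a check like this turns "no one looked" into a paper trail.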
Backups also increasingly need to be guarded on the network. The more advanced cybercriminals are making backups a priority target: for infostealers, because that's where all the information is, and for ransomware, because a victim is much more likely to pay up if their backups are locked down too.
Ultimately, there will always be incidents, human-made or otherwise, that can bring a network to its knees. But preparing for the worst enables faster, more effective responses when they happen.
“You’ve been awake now for 29 hours,” Russell explained. “You’re pounding coffee, you’re trying to remember where that encryption key is, and you’re hoping you make good decisions time after time to get everything going. So that sounds very risky.” ®