Boring’s Not So Boring
Your Business’ Disaster Recovery Plan & Mistakes to Avoid
It is a scenario that every IT team fears. You diligently back up critical servers to your on-site appliance or to the cloud, but when an incident happens and you need those backups the most, the recovery fails.
Let’s take a look at why disaster recovery fails and how you can avoid the factors that lead to this failure:
1. Failure to Identify and Understand Recovery Dependencies
Disaster recovery plans often include backup and data retention strategies that do not thoroughly map the dependencies and requirements needed for smooth disaster recovery. Failing to align backup plans with specific restore expectations can have devastating consequences.
Restoring Server Operations Is Only the Beginning
For business operations running on multi-tier or N-tier applications, simply restoring server operation is not enough. In these environments, processing, data management, and presentation functions may be hosted on different machines that all need to communicate with one another. If you back up these components on different schedules, restore them in the wrong boot order, or restore them to a host with a different virtual network, the communication between them may be lost. As a result, disaster recovery will fail, data may be lost, and you may waste many hours troubleshooting the issues.
Configuration issues in the production environment can jeopardize your disaster recovery. For example, when configuring virtual server environments as backup targets, you need to allocate space for snapshots so they can be properly executed and saved.
Tips for Identifying Critical Dependencies and Ensuring Successful Recoveries:
- Brainstorm a variety of downtime scenarios and walk through the specific steps you will need to follow to restore service to end users. Examine each step in the process for potential dependencies or barriers to disaster recovery.
- Document critical dependencies (boot orders, application requirements, etc.) and incorporate them in the recovery steps.
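One way to make a documented boot order checkable is to record the dependencies as data and derive the start-up sequence from them. The following is a minimal sketch in Python, not a feature of any particular backup product; the server names and the `DEPENDENCIES` mapping are hypothetical and stand in for whatever your own recovery plan lists.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical three-tier application. Each key is a server and each value
# lists the servers that must already be running before it is started.
DEPENDENCIES = {
    "db-server": [],
    "app-server": ["db-server"],
    "web-server": ["app-server"],
}


def boot_order(dependencies):
    """Return a start-up order in which every server follows its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle.
        raise ValueError(f"circular dependency in recovery plan: {exc.args[1]}") from exc


if __name__ == "__main__":
    for step, server in enumerate(boot_order(DEPENDENCIES), start=1):
        print(f"{step}. restore and boot {server}")
```

Keeping the mapping in version control alongside the recovery runbook means a circular or missing dependency shows up as an error during planning rather than during an outage.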
2. Understanding and Avoiding Software Compatibility Issues
There is a wide range of software compatibility issues that can render data unrecoverable. Microsoft Volume Shadow Copy Service (VSS) is a common source of them. However, new backup and cloud disaster recovery technologies are integrating advanced self-healing software to resolve compatibility problems. This technology automatically detects VSS compatibility issues, misconfigurations and a wide range of other threats to recoverability. The software mediates VSS conflicts, restarts backups and performs a variety of other steps to remediate backup issues before they threaten recovery, without requiring any intervention by your IT team.
3. Inadequate Testing
IT teams continue to struggle to find the time and resources to perform disaster recovery testing frequently enough to ensure recoveries will happen as planned. However, testing backups is essential for smooth recovery. A solid disaster recovery plan should avoid shortcuts such as testing only annually (or not at all), pre-loading tapes in tape libraries, pre-staging servers and substituting spot-checking for full testing of restores.
Tips for Improved Testing:
- Implement advanced backup and cloud recovery solutions that will automatically conduct complete recovery testing of your backup environment.
- Invest in backup and recovery solutions that will automatically test recoverability of applications and document actual disaster recovery time and recovery point.
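To illustrate what documenting actual recovery time and recovery point can look like, here is a small Python sketch. It assumes you record three timestamps during each recovery test; the `RecoveryTest` class, its field names, and the `meets_objectives` helper are hypothetical, and any target values you pass in are your own objectives rather than recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTest:
    last_backup: datetime       # timestamp of the newest restorable backup
    outage_start: datetime      # when the (simulated) incident began
    service_restored: datetime  # when end users could work again

    @property
    def recovery_point_actual(self) -> timedelta:
        """Window of data lost: everything written after the last good backup."""
        return self.outage_start - self.last_backup

    @property
    def recovery_time_actual(self) -> timedelta:
        """Downtime: from the start of the outage until service was restored."""
        return self.service_restored - self.outage_start


def meets_objectives(test: RecoveryTest, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both measured values are within the stated objectives."""
    return test.recovery_point_actual <= rpo and test.recovery_time_actual <= rto
```

Comparing the measured values against your objectives after every test gives you a record of whether recoveries are actually happening as planned.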
4. Failure to Protect Against Data Corruption and Malware
There are myriad causes of backup data corruption that can cause recoveries to fail, ranging from bit flips caused by solar flares and unexpected power outages to file system issues (with XFS, for example) and hardware failures (RAID controllers, storage controllers, NAS devices, etc.).
Despite the growing frequency of headline-grabbing incidents, failing to detect malware in backup environments continues to be among the most common issues causing disaster recovery failures. Ransomware creators have become increasingly sophisticated – creating programs that lie dormant long enough to be included in data backups, eliminating the ability to defend against attacks with a simple recovery of the latest data.
Tips for Avoiding Data Corruption and Malware Infection in Your Backups:
- Choose a backup and recovery technology that is Linux-based. Most malware infections target Windows-based systems.
- Ensure your backup and recovery technology can detect early warning signs of malware infection.
- Use a backup and recovery solution that automatically spins up and tests recoverability of applications in your backup environment to ensure you can safely recover with uninfected backups in the event of a ransomware (or other malware) attack.
- Ensure your backup solution has integrated cyclic redundancy checking (CRC). CRC is an error-detecting code that flags accidental changes to blocks of data entering your backup system, so corruption is caught before it can undermine a restore.
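The CRC tip above can be illustrated with a short Python sketch that uses the standard library's `zlib.crc32`. This shows the general technique only, not how any specific backup product implements it; the 4096-byte `BLOCK_SIZE` and both function names are assumptions made for the example. Keep in mind that CRC is designed to catch accidental changes and is not a defense against deliberate tampering.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for this example


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Compute a CRC-32 value for each fixed-size block of the data."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def corrupted_blocks(data: bytes, expected: list[int],
                     block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indexes of blocks whose CRC-32 no longer matches."""
    actual = block_checksums(data, block_size)
    if len(actual) != len(expected):
        raise ValueError("block count changed; data was truncated or extended")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

In this sketch the checksums would be computed when data enters the backup system and stored alongside it, then recomputed and compared before a restore.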
5. Failure to Follow Media Management Best Practices
One of the most common reasons a seemingly perfect backup cannot be recovered is the mishandling of backup or archive media (tapes, removable hard drives, etc.). While tape and removable disk media are relatively low-tech, they are highly manual and require disciplined adherence to best practices. Simple human errors, such as mislabeling tapes or archive drives, can make recovery from these media impossible.
For efficient data backup, you can automate the entire process of managing backup, off-site replication and long-term retention. You could also choose to use disaster recovery as a service (DRaaS).
Today’s leading appliances also come with self-healing hardware and remote monitoring that detect early warning signs of hardware issues and automatically schedule servicing before an actual failure occurs.
Today’s IT infrastructures are increasingly complex combinations of on-premises, SaaS, cloud and virtual environments. For efficient backup and recovery, consider a backup solution that integrates with your endpoint management solution, so you can monitor and manage all of your endpoints, antivirus/anti-malware (AV/AM) deployments and backups from a single console.
Interested in learning more about our Disaster Recovery & Back-up Solutions? Give us a call or shoot us a message here: https://boring.com/contact/
Source: Kaseya Blog