Disaster Recovery

Everybody who has used a computer knows that occasionally things go wrong. Cybersecurity professionals must plan for the worst-case computing scenarios. Together, these practices are known as disaster recovery.

DR

Learning Objectives

You should be able to:

  • Discuss recovery point objectives and recovery time objectives
  • Discuss hot sites, cold sites, and warm sites
  • Identify the strengths and weaknesses of different backup options

Disaster Recovery Objectives and Options

Computer systems fail. Hard drives crash. Data is accidentally deleted. Power surges fry equipment. There are a lot of things that can go wrong.

Backups should be made in such a way that systems can be restored. Critical to the backup process is testing the backups to make sure that data can truly be restored. There is an old adage, "If you have not tested your backup, you do not have a backup."

Organizations must answer the question--how much data are we willing to lose? The less data an organization is willing to lose, the more expensive the backup solution will be. A recovery point objective (RPO) describes how much data an organization is willing to lose. The recovery point objective could be 1 second before a disaster, a minute, a day, a week, etc. If an organization takes weekly backups, that means that they are willing to risk losing a week's worth of data. Some organizations take nightly backups, in which case only a day's data might be lost. For critical systems, no loss of data is acceptable, so highly redundant systems store backups of data right up to the point of failure.

It can take time to rebuild systems after a disaster. The time it takes to restore systems is the recovery time objective (RTO). Complex systems could take weeks or months to restore if starting from scratch. If system administrators cannot meet the RTO, they can consider options for speeding up the process. They might use hot sites, cold sites, and warm sites to hit the RTO.

RPO RTO

  • Hot Site. Hot sites are secondary data centers that have up-to-date data. If the primary system fails, the hot site can quickly take over data processing. Users should notice little to no interruption in service.
  • Cold Site. A cold site is an empty data center. Servers and network equipment must be installed after a disaster takes place. Backups would have to be loaded onto the computer systems. Cold sites are the cheapest options, but they take the longest to set up.
  • Warm Site. A warm site sits between the hot and cold sites. Some network and server equipment might be installed. Network connectivity may already be established. Warm sites lack up-to-date data. At a minimum, a warm site needs backup data to be restored before the site can launch.

Backup Options

Digital tapes are often used for backups. Tapes can store a tremendous amount of data, but they are relatively slow. Hard drives are a common backup alternative. Hard drives are more expensive than tapes but are easier to work with. Critically important if using tape or hard drive backups is to take the media off-site. People who simply back up their data to an external hard drive and then leave the hard drive next to the main computer find themselves without a backup in case of fires or floods.

Companies like Backblaze backup data to the cloud. An advantage of cloud backup is that the backups are automatically made off-site. One problem with online backups is the time it takes to make an initial backup. It can take weeks or months to do an initial backup of a computer system. If you have a lot of data, a slow internet connection, or a combination of both, online backup services might not be good options.

Reflection

  • How much of your personal data on your computers are you willing to lose?
  • What is your personal backup strategy?
  • Have you ever tested the process of restoring data from your backup?

Key Terms

  • Disaster Recovery: A set of policies, procedures, and tools designed to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on restoring IT operations and data access to ensure business continuity.
  • Hot Site: A fully operational offsite data center that is equipped with hardware, software, and network connectivity to take over business operations immediately or within a very short period after a disaster. Hot sites are kept up-to-date with real-time data replication and are ready for immediate use.
  • Cold Site: An offsite facility that provides space and basic infrastructure, such as power and cooling, but does not include active IT equipment or data. In the event of a disaster, the organization must transport and install its own hardware and restore data from backups, resulting in a longer recovery time.
  • Warm Site: An offsite facility that is partially equipped with hardware and network connectivity, but may not have the latest data or fully configured systems. Warm sites require some setup and data restoration before they can take over business operations, offering a balance between cost and recovery time.
  • Recovery Point Objective (RPO): The maximum acceptable amount of data loss measured in time. RPO defines the point in time to which data must be recovered after a disaster to ensure business continuity. It helps determine the frequency of data backups and replication.
  • Recovery Time Objective (RTO): The maximum acceptable amount of time that a system, application, or process can be down after a disaster before normal operations must be restored. RTO defines the target time frame for recovering IT and business operations to minimize the impact of a disruption.