Recent events have brought a renewed focus to preventing catastrophic data loss. Data loss can occur due to a major destructive event, a series of events, or a security breach. This paper discusses some of the risks, planning pitfalls, and ways to quantify, in financial terms, how much should be spent to prevent failures of data integrity. The discussion applies specifically to some of the unique challenges surrounding PACS, focusing on methods for justifying and implementing strategies to prevent security breaches or catastrophic data loss.
A BRIEF OVERVIEW
Business continuity, disaster avoidance, disaster recovery, and data risk management are all terms for the same thing: the planning and implementation of actions that lessen the risk of data loss, corruption, or theft from abnormal and highly improbable destructive events.
For purposes of clarification, expected operational failures are different from disaster events. Operational failures are those that are expected to happen in time. For example, all electronic devices, such as hard drives, CPUs, floppy disks, or optical drives, have a mean time to failure (MTTF) rating, which represents the average time the device should operate before failing, based on the manufacturer's data. Note that this measure is not a guaranteed lifespan but an average. Therefore, these devices can be expected to fail in time.
These types of failures should not be included in disaster or security planning. They are operational failures of disposable devices, and proficient data center operating procedures, such as solid backup schemes, drive redundancy, and life-cycle management, should mitigate them. While hard drives are expected to fail and network routers will lose power, these events are far different from an entire data center being destroyed by a tornado.
Protecting assets must also be differentiated from protecting data. As the health care industry moves further away from paper-based practice, we need to realize that what is important is data and the functionality it provides, not the hardware that stores or processes it.
This is an important distinction, for the cost of guaranteeing the survival of a physical asset is far higher than the cost of ensuring the survival of data. One could lose all computer and storage systems for an enterprise, but if a complete and up-to-date backup of the information resides somewhere else, then it can all be recreated.
THE ODDS GAME
All business continuity and security management projects are nothing more than specific applications of risk management. When properly done, they are all extensions of the age-old odds game. Just as a successful, long-time gambler must know the exact odds of winning a game and its payout, anyone who wants to mitigate security- or disaster-related risks must understand the exact odds, the costs that will be incurred if the negative event takes place, and the responsibility that must be taken on in attempting to prevent it.
The alternative approach is blanket coverage for all risks. The drawback of this approach is that it may miss specific risks or may become cost prohibitive.
In the past, the challenge was to accurately quantify risks. During the past 15 years, insurance companies and government agencies have collected large amounts of relatively accurate data involving disasters and security breaches. This allows us to very accurately calculate the probability of a disaster event happening.
For example, if it is known that, in a radius of 50 miles from a given facility, two square miles have been directly hit by category five tornados in the past 10 years, we can calculate the odds of our building being hit over the next 10 years as follows:
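The calculation itself is not reproduced in this text. What follows is a minimal sketch of one way to carry it out, not necessarily the authors' original formula. It assumes that a strike is equally likely anywhere within the 50-mile circle, that the building's footprint is negligible next to the area hit, and that the past 10 years are representative of the next 10. The function name is illustrative.

```python
import math

def strike_probability(area_hit_sq_miles: float, radius_miles: float) -> float:
    """Fraction of the surrounding circle that was directly hit.

    Assumes (hypothetically) that a hit is equally likely at any point
    inside the circle, so the fraction of area hit approximates the odds
    that any single point, such as our building, is hit.
    """
    circle_area = math.pi * radius_miles ** 2
    return area_hit_sq_miles / circle_area

# Two square miles hit within a 50-mile radius over 10 years.
p = strike_probability(2.0, 50.0)
print(f"probability over 10 years: {p:.6f}")   # about 0.000255
print(f"roughly 1 in {round(1 / p):,}")        # about 1 in 3,927
```

Under these assumptions the circle covers about 7,854 square miles, so the odds of a direct hit on the building over the next 10 years come to roughly 1 in 3,900.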
We can accurately calculate this for most natural disasters using data collected by various government agencies. For incidents such as internal flooding or fire, we can rely on insurance company data regarding our structures and locations.
The most difficult risks to calculate are soft risks. These are events such as employee sabotage or theft, interfaced system failures, or information theft. We may not be able to calculate these risks with the same specificity as the aforementioned, but, with a bit of diligence, it should be possible to calculate them with less than 10% error.
During the process of identifying risks and calculating their odds, one will come upon three categories of events that can be excluded from further planning. The first, already discussed, comprises events that fall under normal, expected operating failures.
The second comprises events that, after analysis, prove to be so highly unlikely that there is no use planning for them (eg, a hurricane in Nebraska). These events can be removed so that they do not cloud the planning process.
The final category comprises events that, regardless of their odds, are so catastrophic that they would eliminate not only the data storage facilities but also the business they support, along with the customer base. Examples are nuclear war or an asteroid strike.
Remember, the odds of the earth getting hit by a large, environment-changing asteroid are better than those of winning the lottery for every dollar played.
REMOVING THE EMOTIONS
One of the greatest pitfalls of disaster or security planning is the emotion and individual reaction that surround any disaster, security, or terrorist event. The goal of terrorism is to create an unrealistic fear in a population. To that end, the September 11, 2001 attacks succeeded fantastically.
While there has been a lot of talk about terrorist attacks in the past few years, in the past 10, only three major attacks have made the news (two separate attempts at the World Trade Center and the Oklahoma City Federal Building). These three attacks have killed fewer than 3,000 people. Yet a great deal of resources and energy has been devoted to either mitigating invisible risks or incorrectly trying to prevent real risks.
For example, all the focus has been on bioterrorism attacks that are known, such as anthrax. Self-proclaimed experts tell people that they should buy plastic and duct tape to protect against such an attack, yet these supplies would be almost completely useless against a real anthrax attack. The method that has been overlooked is a simple can of Lysol spray. We fear a 300-lives-per-year event, but do not seem to bat an eye at the 26,000 people who die per year from the flu.
The reason these examples are raised is to expose the huge cost to both industry and individuals that is incurred in response to hype. A numbers-based approach will remove the speculation and hype surrounding disaster planning and bring the costs in line with the real risks.
September 11 played on this irrational fear. For example, one self-proclaimed expert convinced a few Congressmen that a small plane could be flown into a nuclear power plant and cause a nuclear explosion. We leave it to the reader to examine the details of this absurd scenario, especially in light of the government study that showed that even a Boeing 737 jet flying into a nuclear power plant could not breach the containment vessel.
CALCULATING THE COSTS
How does one go about calculating the costs of a disaster or security breach? Again, we are not very concerned with any impacted computing or storage hardware, but with the data and functionality it provides. As we move toward a paperless environment and one that is becoming highly automated, a complete loss of all electronic information may well result in a company ceasing to do business. We can then honestly argue that the cost of such a loss is the value of the company.
Other models dictate calculating the loss costs based on revenue or operating profit. We believe it is more accurate to calculate what it would cost the organization in real dollars based on manual processes, legal ramifications, re-entry of data, loss of competitive advantage, customer perceptions, decision-making, command and control loss, and other standard business practices. This method requires someone on the planning and analysis team who has a good understanding of the operating costs of the business.
Be aware that many times the costs are grossly miscalculated or underestimated. There have been a number of examples over the past few years of the costs of losing data centers to flooding and other disasters that exemplify this point.
THE COST/RISK MODEL
In order to quantify the costs and justified expenditures involved with security and disaster prevention planning, we propose using a modified cost/risk model based on actuarial and risk barter concepts from old financial market practices.
We have previously talked about calculation of odds and the need to be able to quantify, as accurately as possible, these risks. What follows is a method to use the risks and the corresponding costs to come up with a fact-based risk cost.
After one has, as accurately as possible, calculated the odds of each event occurring utilizing data from, for instance, the hospital insurance company, the National Oceanic and Atmospheric Administration, and in-house security personnel, the next step is to ascertain the costs of damage associated with each event. In other words, if the identified risk were to occur, how much damage would it do? It may be helpful to scale large events into various subgroups. For example, a tornado may be of three different magnitudes and may also pass near the facility, only graze it, or hit it completely. These permutations can be used to create multiple damage estimates.
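The tornado permutations can be enumerated mechanically so that none is overlooked when damage estimates are assigned. The sketch below is illustrative only: the magnitude labels are hypothetical, since the text specifies three magnitudes without naming them.

```python
from itertools import product

# Hypothetical labels: the text specifies three magnitudes and three
# degrees of proximity but does not name the magnitudes.
MAGNITUDES = ("low", "medium", "high")
PROXIMITIES = ("passes near", "grazes", "direct hit")

def tornado_scenarios():
    """Return every (magnitude, proximity) pair as a separate scenario."""
    return list(product(MAGNITUDES, PROXIMITIES))

scenarios = tornado_scenarios()
for magnitude, proximity in scenarios:
    # Each scenario would receive its own odds and damage estimate.
    print(f"{magnitude} magnitude tornado, {proximity}")
print(len(scenarios))  # 9
```

Three magnitudes combined with three degrees of proximity yield nine scenarios, each of which can carry its own odds and damage estimate.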
A hurricane provides a good example. If a category five hurricane hit a data center, it would be a complete loss. Remember, we are looking at the data loss far more than the physical assets. So if all the data is lost, we need to look at the impact to the business. Take the example of a facility with a fully electronic medical record, billing, document imaging, and PACS. Though basic functionality may exist if all data were lost, the most reasonable outcome would be that the business would cease. Therefore, the cost of this event is the value of the company.
It can be argued that one could subtract out the cost of backups or the time it would take to rebuild all the systems and restore them from tape. While this does hold merit, it should not be included in the weighting or cost factors.
Now that you have the costs of the events, calculating risk becomes simple:
RISK COST = The Cost of Damage x The Odds of Damage
For example, in the case of the aforementioned hurricane, if the odds of being hit by the hurricane are 1 in 10,000 (probability = .0001) and the value of the business is $100 million, then we have:
.0001 x $100,000,000 = $10,000
Figure 1. Cost/risk model calculation for a tornado hitting a data center. Odds and cost are hypothetical.
The risk cost is $10,000. This is a realistic, mathematically based amount to spend on the mitigation of this identified risk. Figure 1 illustrates a simple example based on a catastrophic tornado strike on a data center. The odds and costs are not accurate and are presented only for the sake of example.
If we take the odds of a tornado of complete destructive force hitting our data center to be .01, or 1%, over the next 10 years and we know that the loss of the data center will cost the business $1.5 billion, we can multiply these to get a risk cost of $15 million. This would closely parallel the costs of insuring data (which no one will do) over a 10-year period and, it can be argued, is realistic spending to prevent such a disaster.
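The hurricane and tornado figures above can be reproduced with a few lines of code. This is a minimal sketch of the stated formula; the function and variable names are illustrative, and the inputs are the hypothetical values used in the text.

```python
def risk_cost(cost_of_damage: float, odds_of_damage: float) -> float:
    """RISK COST = cost of damage x odds of damage."""
    if not 0.0 <= odds_of_damage <= 1.0:
        raise ValueError("odds_of_damage must be a probability between 0 and 1")
    return cost_of_damage * odds_of_damage

# Hypothetical values taken from the two examples in the text.
hurricane = risk_cost(100_000_000, 0.0001)
tornado = risk_cost(1_500_000_000, 0.01)
print(f"hurricane: ${hurricane:,.0f}")  # $10,000
print(f"tornado:   ${tornado:,.0f}")    # $15,000,000
```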
What one can do, in order to simplify planning while remaining objective and quantitative, is to group the disaster/security events into common cost groupings. For example, one can group all of the events that would result in the closure of business, those that may cost $250 million, $100 million, and down to $50,000 events. The number of groupings can be expanded or reduced as needed.
Figure 2. Disaster/security events are placed in common cost groupings to simplify the planning process.
Figure 2 provides a simplified example of this method. In the simple example illustrated in Figure 2 (again the numeric values are presented only for example purposes and are not accurate), we have grouped our risk events into three categories: $50 million risk events, $10 million risk events, and $1 million risk events. In an actual evaluation, we would use more discrete categories.
After assigning risk events to their respective damage cost groups, we then plug in the calculated risk score, or probability, of each event happening. In our example, the $50 million events are major tornado, data center flood, data center fire, and explosion. For each of these, we know the probability, or risk, of the event occurring. These are reflected in the oval to the right of each risk factor. Since these are unrelated and not compounding events, we can total them to get the risk of a $50 million event happening. In our example we have a 0.1 probability of having a $50 million event. Multiplying 0.1 x $50 million gives us a risk cost of $5 million.
We can now do the same for the other risk categories. We then have three boxes with $5 million, $2 million, and $300,000 as the risk costs for each group. Since these are also independent, we can add them to arrive at a total risk cost of $7.3 million over the span of the risk time frame, which in this case is 10 years. Again, this calculation should represent what it would cost to insure the data, and therefore a reasonable amount to spend to guarantee survival of the data.
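The grouping method can be sketched in code as follows. Only the names of the $50 million events, the 0.1 probability for that group, and the $7.3 million total come from the text. The per-event probabilities, and the events assigned to the $10 million and $1 million groups, are hypothetical values chosen so that the group totals add up to that $7.3 million.

```python
from math import prod

# Hypothetical inputs. Only the $50 million event names and the overall
# totals come from the text; per-event probabilities and the events in
# the other two groups are invented for illustration.
GROUPS = {
    50_000_000: {"major tornado": 0.01, "data center flood": 0.03,
                 "data center fire": 0.04, "explosion": 0.02},
    10_000_000: {"employee sabotage": 0.12, "interfaced system failure": 0.08},
    1_000_000: {"information theft": 0.20, "minor internal flooding": 0.10},
}

def group_risk_cost(damage_cost, event_probabilities):
    """Sum the event probabilities, then multiply by the group's damage cost."""
    return damage_cost * sum(event_probabilities.values())

def total_risk_cost(groups):
    """Add the risk costs of all groups."""
    return sum(group_risk_cost(cost, events) for cost, events in groups.items())

def probability_of_at_least_one(event_probabilities):
    """Exact probability that one or more independent events occur."""
    return 1 - prod(1 - p for p in event_probabilities.values())

for cost, events in GROUPS.items():
    print(f"${cost:,} events: risk cost ${group_risk_cost(cost, events):,.0f}")
print(f"total: ${total_risk_cost(GROUPS):,.0f}")
```

Summing probabilities, as the model does, is a close approximation when the individual probabilities are small. For independent events, the exact chance that at least one occurs is slightly lower: with the hypothetical $50 million inputs above, `probability_of_at_least_one` gives about 0.097 rather than 0.1.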
This model is advantageous in that it is scalable to any organization. It can be made as complex or simple as needed. And finally, it takes out the ambiguity involved with preventing loss.
In the planning process, much attention has been given to preventing malicious data theft, so we will bypass that area and focus on some of the less visible risks.
Inherent in most DICOM images are the header data. These include many patient identifiers, institutional information, and other bits of data that could prove damaging both to a patient and, through litigation, to a facility if they fell into the public domain. For this reason, close scrutiny must be placed on the procedures for the transfer of DICOM images, whether for diagnostic or research purposes.
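One procedure that such scrutiny might lead to is blanking identifying header fields before images leave the facility for research use. The sketch below is illustrative only. A plain dictionary stands in for a parsed DICOM header, the function name is hypothetical, and the short list of attributes is an assumption rather than a complete de-identification profile. A real workflow would use a DICOM toolkit and a fuller attribute list, such as the confidentiality profiles defined in the DICOM standard.

```python
# Hypothetical list of identifying attributes to blank before transfer.
# The names are standard DICOM attribute keywords, but the selection is
# illustrative and not a complete de-identification profile.
IDENTIFYING_KEYWORDS = frozenset({
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "InstitutionName", "ReferringPhysicianName", "AccessionNumber",
})

def strip_identifiers(header: dict, keywords=IDENTIFYING_KEYWORDS) -> dict:
    """Return a copy of the header with identifying values blanked."""
    return {key: ("" if key in keywords else value)
            for key, value in header.items()}

# A made-up header; a plain dict stands in for a parsed DICOM data set.
header = {
    "PatientName": "DOE^JANE",
    "PatientID": "123456",
    "InstitutionName": "Example Hospital",
    "Modality": "CT",
    "Rows": 512,
}
clean = strip_identifiers(header)
print(clean["PatientName"] == "")  # True
print(clean["Modality"])           # CT
```

The original header is left unmodified, so the diagnostic copy keeps its identifiers while the research copy does not.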
As we move to a setup where we attempt to display radiology data on nonradiologist workstations across the enterprise, we must deal with the risks of incorrect access and distribution of sensitive information.
Hospital and clinic security is somewhat open due to the nature of our business. Can you picture it if we asked for two forms of ID and ran a background check before we admitted someone into the emergency department? Our products and services are focused on fixing organic machines. To this end, we create an environment that fosters many people visiting and entering our facilities with very little in the way of security.
Yet we have started to move toward securing vulnerable patient areas. Pediatric monitoring is an example, although these systems are focused more on preventing the unauthorized removal of a patient or the malicious entry of an entity. Still, it is an excellent example of how we can leverage technology to enhance security.
Though the general populace does not yet fully trust electronic media, it is far easier to break into a mailbox and steal someone’s paper mail than it is to crack into the standard email system. Along the same lines, we can, with correct design and configuration, leverage our PACS and document storage systems as enhanced security tools. Both lessen the number of films and documents that exist inside and outside the health care system. It is arguably better to have the films or documents locked away electronically, where only those with a need to know can access them, than to pass the information through multiple film clerks or medical records clerks.
Finally, as with any optical or tape storage system, we must work to secure the areas that house the data archives. It is not a pretty sight when a primary system has the highest security, yet someone walks off with half of the optical storage platters. To this end, optical and tape storage systems must be physically protected from all employees except those involved in supporting the systems.
SPECIFIC PACS CHALLENGES
Digital image storage systems for both medical diagnostic images and documents have their own inherent challenges on top of those mentioned.
The primary challenge revolves around the costs of these systems. Per user, they are usually the most expensive in health care. Redundancy in access is very difficult and costly due to unique configurations (banks of eight monitors, for example, in a reading room).
Frequently, sets of images are stored so that order and indexing are very important. Of all the data stored in health care, medical diagnostic images represent the largest per study. This is due to the required resolution and the tendency not to compress data, even if lossless methods are available. Furthermore, we have challenges where not only do we need to store the source data but also any further diagnostic studies created by enhancement or rendering software. Many times, this data is extremely important for historical and legal reasons.
Vendors have not been forthcoming in moving users toward a redundant platform, or helping to reduce the costs of storage.
Two bright prospects are now on the horizon. The first is large storage area networks (SANs). With most PACS implementations using well over 2 or 3 TB of data, until recently, it was cost prohibitive to store all of the image data on magnetic disks. With the fall in prices of disk drives and the evolving disk chaining technology, it is now possible to use magnetic disk to store all images, both historical and current, and utilize the optical subsystems as remote backup silos. Though this is not optimal and creates only two interdependent data sets, it is far less costly than the present option of creating two entire sets of optical archives.
The other prospect is long-term streaming storage, which has undergone recent advances. These systems copy all data committed to disk, either optical or magnetic, over a fast network connection to super servers in remote locations. There, the backup system stores the critical information on disk, while constantly writing data to magnetic tape. Furthermore, the system manages the tapes and backup sets and can create multiple copies.
Most PACS today run on standard hardware and operating systems. Even most modalities are available for shipping in a very short period of time. This, again, relates back to the preservation of data as opposed to hardware.
One final risk, in contrast to the previous paragraph, revolves around specialized systems, made with older or less than market-dominating hardware. It is wise to assess not only the imaging vendor’s health, but also the health of the companies that provide the components.
Disaster avoidance and security planning in health care have many unique challenges due to our ethics of caring for all, along with our increasing dependence on data-based systems to preserve human lives. It is our hope that by focusing on the facts surrounding the risks, organizations can provide a sound data preservation environment, while keeping costs aligned with the identified risks.
Sean D’Arcy is senior consultant for HealthLink, Houston, TX, a health care information technology consulting company. Julie D’Arcy, PhD, is a technical writer.