The storage of information and access to it have always been critical components of health care. Comparing the historical medical information for a particular patient or group of patients with the most recent information is part of every physician’s daily practice. Historically, standard medical practice has relied on many varied analog data storage techniques, such as radiology film, handwritten paper notes, analog dictated reports, pathology slides, and ECG paper printouts. These analog techniques inhibit the systematic comparison of data and restrict analysis to a few co-located collaborating physicians or researchers. The limitations of these storage techniques have often hindered the delivery of care and the ability of clinicians and researchers to use stored information to improve and evolve health care.

The Internet and advances in storage and computing make broad-based digital electronic medical records practical. Never before has the opportunity been so large to advance health care based on the quantitative analysis of current and historical data. Advances in the field of genomics are an example of acceleration in knowledge growth catalyzed by accessible digital data storage. Genomics research is carried on worldwide in diverse and creative ways exploiting the accessibility of digitally stored data. The analysis of digitally stored data to improve outcomes for patients and evolve the practice of medicine is an imperative in the future of health care.

Why Now?

The Joint Commission on Accreditation of Healthcare Organizations (JCAHO) has established aggressive measurement and comparison goals for health care. Many of these goals can be practically implemented with readily accessible and extensible digital storage architectures. JCAHO guidelines for the electronic medical record (EMR) encompass both the consolidation of individual patient records and the benchmarking of key outcomes measures across health care organizations. The JCAHO information management (IM) initiatives include:

  • IM.3 Uniform data definitions and data capture methods are used whenever possible.
  • IM.4 The necessary expertise and tools are available for the analysis and transformation of data into information.
  • IM.5.1 The format and methods for disseminating data and information are standardized, whenever possible.
  • IM.7 The hospital defines, captures, analyzes, transforms, transmits, and reports patient-specific data and information related to care processes and outcomes.
  • IM.7.2 The medical record contains sufficient information to identify the patient, support the diagnosis, justify the treatment, document the course and results, and promote continuity of care among health care providers.
  • IM.7.8 Every medical record entry is dated, its author identified, and, when necessary, authenticated.
  • IM.7.9 The hospital can quickly assemble and have access to all relevant information from components of a patient’s record, when the patient is admitted or seen for ambulatory or emergency care.
  • IM.7.10 Medical records are reviewed on an ongoing basis for completeness and timeliness of information, and action is taken to improve the quality and timeliness of documentation that impacts patient care.
  • IM.8 The hospital collects and analyzes aggregate data to support patient care and operations.
  • IM.9.1 The hospital’s knowledge-based information resources are available, and authoritative.
  • IM.10 Comparative performance data and information are defined, collected, analyzed, transmitted, reported, and used.

These goals emphasize the requirements for an accessible and extensible storage infrastructure. Storage of textual data has been addressed by many EMRs. Standards, such as HL7, have greatly facilitated this effort, encouraging the standardized exchange of textual information between the many different systems in the health care enterprise. Storage of textual information in large and small facilities is accommodated easily by magnetic disk technologies provided by many commercial vendors. Efforts led by such groups as Integrating the Healthcare Enterprise (IHE) focus on standardization of nomenclature, workflow improvements, and document sharing to facilitate consistent and secure exchange of medical information. Additionally, point-of-care capture systems will continue to evolve to improve medical practice efficiency, consistency, and precision. These are all initiatives set forth in the JCAHO guidelines.

By contrast, the volume of medical imaging data produced exceeds that of textual data by two orders of magnitude. The challenges of storing medical image and textual data have been addressed by SCAR1 (now the Society for Imaging Informatics in Medicine) and by the Waterloo Institute for Health Informatics Research (WIHIR).2 Key storage criteria are:

  • a) scalability;
  • b) accessibility;
  • c) cost containment;
  • d) routine technology replacement; and
  • e) disaster recovery.

Fundamental criteria established for medical image storage do not differ from the requirements for textual storage. But due to the vast amount of image information to be stored long term, implementation strategies may differ significantly. WIHIR provides criteria for an appropriate health care storage request for proposal (RFP). WIHIR also provides insight into likely vendor responses to a poorly prepared RFP. A strong vendor-driven bias is evident in the WIHIR white paper. This bias can be easily seen in the report’s summary of issues for future study. Item 10, in particular, states, “The evolution of Information Lifecycle Management will be towards automated Hierarchical Storage Management (aHSM); the effect of this on the capabilities of storage management systems must be studied.” HSM has been and is today a proven, readily available technology offered by several vendors. The vendors who sell HSM do not happen to be the storage vendors who played a key role in the WIHIR report.

Core Technology

Figure 1. The technology roadmap for magnetic disk and tape, which has enjoyed more than 2 decades of consistent and reliable cost reductions and capacity improvements.

Key medical information storage criteria must be implemented in a practical storage architecture. The end-user service-level criteria must be converted into a technical specification and architecture. The key technology components required to achieve the service-level goals summarized above are:

  • a) balanced data read and write speeds to storage media;
  • b) adoption of computing industry standards for storage;
  • c) an extensible file system; and
  • d) an HSM strategy.

Balanced read and write speeds to digital storage media are required so that stored data can be migrated regularly from old storage media to new technologies as they become available. Data migration is a necessity. Magnetic disk technologies are obsolete in as little as 2 years, with storage densities doubling every 18 months. Optical and other media that write data significantly slower than they read can require more system resources to migrate existing data than are required to service new archive and retrieval requests. This makes data migration impractical. Optical media also may exhibit undesirable error rates.3 The storage technology that offers balanced read and write speeds is magnetic (Figure 1). Magnetic disk and tape have enjoyed more than 2 decades of consistent and reliable cost reductions and capacity improvements.
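The migration argument can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the archive size, and all throughput figures are assumptions, not measurements from any particular system. It estimates how long a full migration would take when the copy is limited by the slower of the source read rate and the destination write rate, and when only part of the drive time is free for migration.

```python
def migration_days(archive_tb, read_mb_s, write_mb_s, share_of_system=1.0):
    """Estimate the days needed to copy an archive to new media.

    The copy can go no faster than the slower of the source read rate
    and the destination write rate. share_of_system is the fraction of
    drive time left for migration after daily archive and retrieval
    traffic has been served. All inputs are hypothetical.
    """
    if not 0 < share_of_system <= 1:
        raise ValueError("share_of_system must be in (0, 1]")
    effective_mb_s = min(read_mb_s, write_mb_s) * share_of_system
    total_mb = archive_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / effective_mb_s
    return seconds / 86_400  # seconds per day


# Hypothetical 100 TB archive with half of the drive time free for migration.
balanced = migration_days(100, read_mb_s=100, write_mb_s=100, share_of_system=0.5)
write_limited = migration_days(100, read_mb_s=100, write_mb_s=10, share_of_system=0.5)
print(f"balanced media:      {balanced:.0f} days")
print(f"write-limited media: {write_limited:.0f} days")
```

Under these assumed figures, a tenfold gap between read and write speed stretches the migration by the same factor, which is the practical reason to prefer media with balanced speeds.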

Figure 2. The growth of medical image data acquisition far exceeds the growth of disk capacity at constant dollars.

The univariate disk-only solution is intuitively appealing to all parties. Small hospitals may be able to budget appropriately for disk-only storage. Their optimal storage solution may be to set short on-site data retention periods and replace their entire disk-only infrastructure every 3 years. All of their storage could reasonably fit in a single 19-inch rack. Hospitals exceeding 200 beds and facilities aggregating storage for radiology, cardiology, pathology, and other departments soon will find the disk-only univariate solution costly and an ever-increasing part of their budget.4

Also, the disk-only solution advocated by the IT storage vendors is the most expensive. The growth of medical image data acquisition far exceeds the growth of disk capacity at constant dollars (Figure 2).5

Large disk archival systems almost always include licensed management software and support services. These additional nonstorage components now exceed 80% of the total dollars spent on storage (Figure 3).5 Technology enhancements in disk capacity therefore affect less than 20% of the annual cost of storage. This makes incremental storage additions nearly as expensive as the first byte. Data migration from existing disk resources to newer disk technology can be an expensive proposition in a managed environment.
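A short calculation shows why cheaper disk alone does little for a managed storage budget. The sketch below assumes, for illustration, that the media share is the 20% upper bound quoted above and that software and support charges stay flat while the media price falls. The function name and the halving scenario are hypothetical.

```python
def relative_total_cost(media_share=0.20, media_price_factor=0.5):
    """Total storage cost relative to today after a media price change.

    media_share        -- fraction of today's cost spent on the media itself
    media_price_factor -- new media price divided by today's media price
    Assumes software and support charges (the remainder) do not change.
    """
    if not 0 <= media_share <= 1:
        raise ValueError("media_share must be between 0 and 1")
    return (1 - media_share) + media_share * media_price_factor


halved = relative_total_cost()
print(f"total cost after the media price halves: {halved:.0%} of today's cost")
```

Under these assumptions, halving the price of the disk itself trims the total by only about a tenth.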

Figure 3. Licensed management software and support services for large disk archival systems now exceed 80% of the total dollars spent on storage.

Ten years ago, the storage area network (SAN) was the future promise for disk data consolidation; today, it is information lifecycle management (ILM) and grid computing. These marketing strategies have forestalled the commoditization of the storage market since the early 1990s. Storage as a commodity is purchased at 5% to 10% of the “managed enterprise” cost. Today, common commodity disk is serial advanced technology attachment (SATA) technology. It is cheap, high capacity, and reliable. Fiber-attached storage may be appropriate for some database-intensive applications where smaller volumes of data can utilize more independently active disk spindles to enhance multi-user response.

Magnetic tape should play a key role in any archival and accessibility strategy. Tape must be carefully selected, but its principal benefit is cost, which historically has been 10% of that for equal-capacity disk, so duplicate data copies can be maintained economically. Enterprise-quality tape is available from vendors, such as IBM, Armonk, NY, and Sun Microsystems, Santa Clara, Calif. These tape systems can exceed the data reliability of RAID disk by an order of magnitude and provide access times of 30 seconds (Figure 4). High-quality, high-duty-cycle tape drives are available to optimize either a backup and duplicate copy architecture or a full transactional architecture. Commercial-quality tape drives offer integral hardware data compression to further optimize capacity utilization. Tape read and write speeds can easily exceed those of individual RAID disk sets, which makes tape suitable for restoring large volumes of data quickly. A blended tape and disk strategy minimizes the total disk storage to be replaced every 3 years and can mitigate the costs of routine complete data migration from obsolete disk every 2 to 3 years.
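The economics of a blended strategy can be sketched with a simple media-cost model. This is a rough illustration, not a budgeting tool: it uses the historical 10% tape-to-disk cost ratio cited above, and the function name, the size of the disk cache, and the number of tape copies are assumptions. It deliberately ignores drives, robotics, software, and support.

```python
def relative_media_cost(disk_fraction, tape_copies=2, tape_cost_ratio=0.10):
    """Media cost of a blended archive relative to a disk-only archive.

    disk_fraction   -- share of the archive also held on disk as a cache
    tape_copies     -- number of full archive copies kept on tape
    tape_cost_ratio -- tape cost per unit capacity relative to disk
    A disk-only archive holding a single copy has a relative cost of 1.0.
    """
    if not 0 <= disk_fraction <= 1:
        raise ValueError("disk_fraction must be between 0 and 1")
    return disk_fraction + tape_copies * tape_cost_ratio


# Hypothetical design: 20% of the archive cached on disk, two full tape copies.
blended = relative_media_cost(disk_fraction=0.20, tape_copies=2)
print(f"blended media cost: {blended:.0%} of a single-copy disk-only archive")
```

In this hypothetical design the blended archive holds two complete copies plus a disk cache and still costs less in media than one copy on disk.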

A Role for Standards

Standards and regulations pertinent to storage of medical information include DICOM, HL7, HIPAA, FDA, JCAHO, CORBA, and many other acronyms. In addition, many proprietary standards and vendor marketing-driven concepts, such as content addressed storage (CAS), portable document format (PDF), global resource information database (GRID), SAN, virtualization, managed storage, and ILM, further clutter the decision process of the medical storage architect. The fundamental unit of storage is the digital bit. The fundamental computer storage technique for bits is the file system. Specifying a particular storage format, such as DICOM, or imposing a data interchange security requirement on storage, will result in greatly decreased flexibility in storage commoditization and reduced performance; it likely will lock in a single vendor as sole source. Security and specific formatting requirements can be met by applications software or standard computer operating system features.

File systems are the fundamental construct for storing digital data. Medical storage may require hundreds of terabytes of information. This data must be readily accessible, easily managed, and extensible. A firm understanding of the limitations of available file systems is much more important than the selection of a storage vendor. The standard UNIX file system (UFS) has significant performance limitations and extensibility challenges when storage exceeds 2 terabytes or the number of files exceeds 10 million. Many acceptable file systems meet the needs of either a UNIX- or Windows-based storage architecture. Quick File System (QFS), Microsoft’s new technology file system (NTFS), zettabyte file system (ZFS), and third extended file system (EXT3) are all suitable alternatives. Cleveland Clinic Health System (CCHS) has investigated the scalability of each of these. Additionally, specific configurations have been tested for assured performance and reliability for more than 26 billion files in a single file system. Different procedures must be adopted to assure long-term data availability. One technique is to keep each file system relatively small and use many file systems. If a single file system fails, it can be restored from economical tape while access is maintained to all other valid file systems. Alternatively, some advanced file systems, such as QFS, store file system metadata on a physical volume separate from the data. This provides greatly enhanced speed while offering the management convenience of a single file system, no matter how large. The separate metadata storage in QFS facilitates concurrent access to a second copy of data during automated restoration of failed primary copy media. This feature also provides great flexibility during data migration cycles as new technology is phased into production.

HSM software provides uniform and seamless access to a mix of varied storage technologies. HSM provides automated policy-based migration of data between media, duplicate copy maintenance, and transparent retrieval of data as file-structured information. Most HSM software also automates data migration across media and subsystems as new data storage technology becomes available. Storage groups may be established within the data center and implemented with a SAN fabric. Alternatively, network attached storage (NAS)-connected remote storage can be transparently included in the architecture. This allows centralized management of a diverse storage architecture that can minimize network bandwidth requirements by locating storage in geographically separated areas where data is created and most often utilized.
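To make policy-based migration concrete, the sketch below shows one pass of a toy policy engine. It is not the interface of any HSM product: the FileRecord class, the plan_actions function, the two-copy requirement, and the 90-day idle threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    path: str
    last_access: datetime
    tape_copies: int = 0
    on_disk: bool = True


def plan_actions(files, now, required_copies=2, release_after=timedelta(days=90)):
    """Return (action, path) pairs for one policy pass.

    "copy_to_tape": the file has fewer tape copies than the policy requires.
    "release_disk": the file is fully copied and idle, so its disk space
                    can be reclaimed; later reads are served from tape.
    """
    actions = []
    for f in files:
        if f.tape_copies < required_copies:
            actions.append(("copy_to_tape", f.path))
        elif f.on_disk and now - f.last_access > release_after:
            actions.append(("release_disk", f.path))
    return actions


# Hypothetical catalog evaluated at a fixed point in time.
now = datetime(2007, 1, 1)
files = [
    FileRecord("new_study.dcm", last_access=now - timedelta(days=1)),
    FileRecord("idle_study.dcm", last_access=now - timedelta(days=400), tape_copies=2),
    FileRecord("busy_study.dcm", last_access=now - timedelta(days=2), tape_copies=2),
]
for action, path in plan_actions(files, now):
    print(action, path)
```

The ordering of the two rules is the design point: disk space is never released until the required number of tape copies exists, so the policy cannot reduce the number of copies below its own minimum.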

An HSM can be configured to provide storage services directly to a PACS or an EMR, transparent to all applications software. Data access protocols, such as NFS, FTP, and CIFS for both UNIX and Windows, can be directly supported with data speeds limited only by the health care enterprise’s network infrastructure. The HSM provides tools to assure performance. Daily retrievals from RAID are balanced against access to data on tape. Prefetch operations driven by HL7 registration or orders interfaces optimize the cached data available directly from disk. The robotic tape system serves ad hoc queries deterministically as long as the number of queries per hour is kept below about half of the system’s rated capacity (Figures 4 and 5). This capacity is typically in the range of 400 retrievals per hour. HSM data duplication and migration occur transparently, coincident with ongoing daily operations.
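The half-of-rated-capacity guideline is consistent with basic queueing behavior. The sketch below uses the textbook single-server M/M/1 model, which is an assumption introduced here for illustration and not the analysis behind the figures; a real tape library has several drives and robot arms and will behave differently in detail. The 400 retrievals per hour is the typical rated capacity quoted above, and the function name is hypothetical.

```python
def mean_queue_wait_seconds(requests_per_hour, rated_per_hour=400):
    """Mean time a request waits before service in an M/M/1 queue.

    Uses the standard result Wq = rho / (mu - lambda), where lambda is
    the arrival rate, mu the service rate, and rho = lambda / mu.
    """
    if not 0 <= requests_per_hour < rated_per_hour:
        raise ValueError("load must be non-negative and below rated capacity")
    utilization = requests_per_hour / rated_per_hour
    wait_hours = utilization / (rated_per_hour - requests_per_hour)
    return wait_hours * 3600


for load in (100, 200, 300, 360, 390):
    wait = mean_queue_wait_seconds(load)
    print(f"{load:>3} requests/hour: mean queue wait {wait:6.1f} s")
```

In this simplified model the mean wait stays close to a single service time up to about half of rated capacity and then climbs steeply as the load approaches the rated figure, which is why response times stop being predictable well before the robot is nominally full.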

Figure 4. Enterprise-quality tape systems can exceed the data reliability of RAID disk greatly and provide access times of 30 seconds.
Figure 5. The traditional view of radiologist requests for comparisons shows the requests for comparisons declining to nearly zero over time.

Large health care enterprises should be cognizant of the changes in data access patterns that follow from making data accessible. Within 90 days of making prior radiological imaging available, the number of comparison examination requests grew dramatically at CCHS. Currently, two prior examinations are retrieved from the storage infrastructure for every new examination at CCHS. Patient history and course of illness are critical to medical practice and patient assessment. Figures 5 and 6 show two views of the same data retrieval over time. Figure 5 shows the traditional view of radiologist requests for comparisons declining nearly to zero over time. This view has suggested that prior data can be taken offline after about a year, greatly simplifying the storage architecture design by reducing capacity. Given the retrieval patterns at CCHS, the conclusion that prior data will not be utilized is clearly faulty. Many medical conditions require routine evaluation of aged data; osteoporosis, cancer, and heart disease are but a few. Exactly which piece of old data will be required is difficult to predict, but old data in its entirety has significant value. Figure 6 shows the same data as Figure 5, but the probability of retrieval has been weighted by the number of data elements in the time period. Figure 6 supports the conclusion that old data clearly has value and, if accessible, will be used. Historical thinking about comparison study retrieval needs changes dramatically when deterministic image retrieval of comparisons is readily available.
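The difference between the two views can be illustrated with a small calculation. The sketch below is one plausible reading of the weighting described for Figure 6, and every number in it is invented for illustration; none of it is CCHS data. It multiplies the per-study probability of retrieval in each age range by the number of studies held in that range to give the expected number of retrievals.

```python
def expected_retrievals(bins):
    """Weight per-study retrieval probability by study count in each age bin.

    bins -- list of (label, studies_in_bin, per_study_probability)
    Returns a list of (label, expected_retrievals) pairs.
    """
    return [(label, count * prob) for label, count, prob in bins]


# Hypothetical archive: the per-study probability falls with age,
# but the older bins hold far more studies.
archive = [
    ("0-1 year", 100_000, 0.20),
    ("1-3 years", 200_000, 0.05),
    ("3+ years", 700_000, 0.02),
]
weighted = expected_retrievals(archive)
total = sum(n for _, n in weighted)
for label, n in weighted:
    print(f"{label:>10}: {n:>8.0f} expected retrievals ({n / total:.0%} of total)")
```

With these invented figures, each individual old study is unlikely to be requested, yet studies older than a year account for more than half of all expected retrievals because there are so many of them.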

Figure 6. The CCHS view of data over time uses the same data as in Figure 5, but weights the probability of retrieval by the number of data points in the time period.
Figure 7. Illustration of the storage architecture for CCHS shared between radiology, cardiology, and other large data space users.

Judicious choices in disk and tape technology managed within an HSM architecture can set new levels of medical data availability and retention reliability that greatly exceed those of the traditional analog systems. Figure 7 shows the storage architecture for CCHS shared between radiology, cardiology, and other large data space users.

To balance physicians’ needs for data access against budget requirements, long-term data retention, protection from technology obsolescence, and scalability, a multivariate solution is required. A seamless mix of magnetic disk and tape technologies managed by HSM software may provide the optimal medical storage and accessibility solution.

Robert A. Cecil, PhD, is a physicist in the departments of radiology and cardiology at The Cleveland Clinic.

References

  1. Cecil RA. PACS archival strategies. Proceedings of SCAR 2006, April 27-30, 2006, Austin, Tex. Leesburg, Va: Society for Imaging Informatics in Medicine.
  2. Storage in the Digital Healthcare Enterprise, A Waterloo Health Informatics Think-Tank Report. Waterloo, Ontario: Waterloo Institute for Health Informatics Research; November 29, 2004.
  3. The Orange Book, and other related color books. Eindhoven, The Netherlands: Philips Intellectual Property and Standards.
  4. McMahon LF Jr, Hayward R, Saint S, Chernew ME, Fendrick AM. Univariate solutions in a multivariate world: can we afford to practice as in the “good old days.” Am J Manag Care. 2005;8:473-6.
  5. Morris RJT, Truskowski BJ. The evolution of storage systems. IBM Systems Journal. 2003;2:207.