(This is the second part of a two-part series. To read part one, visit “Data Lakes: Will Healthcare Be a Trailblazer?”)

By Aine Cryts

Since 2013, UC Irvine Health, which includes the clinical, medical education, and research enterprises of the University of California, has been working with the Hortonworks Data Platform to improve its hospital’s clinical operations and the medical research being done at its medical school. The healthcare organization chose this data lake solution from Santa Clara, Calif.-based Hortonworks in order to accelerate new research projects, reduce readmissions, and track “patient vital stats on a minute-by-minute basis,” according to the vendor.

Before implementing its data lake, UC Irvine Health’s clinical information group had data scattered across multiple Excel spreadsheets, according to Hortonworks. In addition, the healthcare organization had 9 million semi-structured records for 1.2 million patients over 22 years. None of that data – which included radiology and pathology reports and rounding notes – was searchable or retrievable.

With its data lake in place, the clinical information group has access to all 9 million of the semi-structured legacy records, according to the vendor. These records are now searchable and retrievable in the Hadoop Distributed File System (HDFS) that underlies the Hortonworks platform. That has allowed the health system to retire a legacy system, resulting in savings of more than $500,000.

One way UC Irvine is using its data lake solution to improve patient care is to process information captured from scales that congestive heart failure patients use to monitor their weight each day, according to Healthcare Informatics [1].

Healthcare industry could lead the way with data lakes

Is this a new technology and what’s its value for healthcare today? According to David Dimond, chief technology officer for global healthcare business at EMC, industries that are working with data lakes today are those that have “embraced the ‘internet of things’ strategy for their business – where every piece of data has potential value. The data lake approach enables them to shift their thinking from analyzing the data they have to a new world where the focus is on collecting and measuring anything they can get – sometimes by the addition [of] sensors, creating more transaction logs, etc.”

Dimond says that healthcare has the opportunity to “leap ahead” of other industries when it comes to using data lakes. “Many industries with high analytics maturity have not made the shift to a data lake strategy yet. Some organizations are so sophisticated in what they do with data today, and their development of analytics platforms has been so rigid, that it will take time for them to move towards a more flexible and free form data lakes approach.”


Brent Richter, associate director of information services operations at Partners Healthcare, advises any healthcare organization interested in embarking on a data lake strategy to realize that the technology is fluid and that everyone will use the system differently. He points to the need for the data lake to be “plastic” in order to accommodate the different use cases and needs at a particular health system.

It is also important to make the technology itself transparent to the end users of the analytics that the data lake solution provides. Richter cautions organizations to realize that “it’s not about the technology stack, [nor is it about] the data that’s included in the [data] lake. It’s about the framework availability to the innovators that allows them to develop their needed analytics, together with user interfaces,” he says. Because a data lake project involves a lot of technology, he says the project team should be highly integrated and include experts with skills in cloud computing, big data analytics, and high-performance database technologies.



  1. Hagland, M. (2013 December 15). At the University of California Irvine, a Big-Data Revolution. Healthcare Informatics. Accessed August 20, 2015 via http://www.healthcare-informatics.com/article/university-california-irvine-big-data-revolution.