Tuesday, April 9, 2019

Making Dark Medical Data Visible




According to a recent market survey conducted by HIMSS Media, 66 percent of the unstructured data in healthcare enterprises, on average, remains inaccessible and unavailable for patient care decisions. Because so much of this data is effectively invisible and hard to manage, many organizations are overwhelmed by their data rather than enabled and empowered by it.

In addition, InfoWorld reported that most data scientists spend only 20 percent of their time on actual data analysis and 80 percent finding, cleaning, and reorganizing huge amounts of data – an unfortunate and inefficient use of scarce expertise.

In an industry that is becoming even more data-driven and evidence-based, organizations must embrace new technologies that will help them manage data efficiently and uncover the many meaningful insights still hidden in their mountains of data.

Deploying a robust, holistic data strategy that enables healthcare organizations to leverage their growing volumes of data, which are often dispersed across geographic and organizational boundaries, is key to accelerating precision medicine and improving the health of individuals. At the heart of this strategy lies a scalable, agile and sustainable IT foundation that can support the demands of precision medicine and is based on three key principles:

First, the architecture must be based on software-defined infrastructure (SDI) solutions that offer advanced policy-driven data and resource management capabilities. Although this infrastructure is built on hardware, with chips and processors and data residing in storage systems, the ability to operate, orchestrate, protect and manage data intelligently lives in the software, or in the middleware that sits between the hardware and the applications. This IT architecture must also support an open framework and the hundreds, if not thousands, of applications being developed in genomics, imaging, clinical analytics, artificial intelligence and deep learning. Many of these applications are isolated in functional and operational silos, creating the need for a shared compute and storage infrastructure based on advanced software that consolidates static, siloed systems into a dynamic, integrated and intelligent infrastructure, delivering faster analytics and greater resource utilization.
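To make this concrete, here is a minimal sketch of what policy-driven resource management can look like in that software layer. The pool names, policy rules and workload attributes are hypothetical illustrations, not an actual IBM product API.

```python
# Minimal sketch of policy-driven placement in an SDI/middleware layer.
# Pool names, policies and workload attributes are hypothetical.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    needs_gpu: bool
    data_tb: float

# One shared pool of resources replaces per-application silos;
# declarative policies decide where each workload runs.
POLICIES = [
    (lambda w: w.needs_gpu,    "gpu-pool"),
    (lambda w: w.data_tb > 10, "high-memory-pool"),
    (lambda w: True,           "general-pool"),  # default rule
]

def place(workload: Workload) -> str:
    """Return the shared compute pool the first matching policy assigns."""
    for predicate, pool in POLICIES:
        if predicate(workload):
            return pool

for w in (Workload("genomics-align", needs_gpu=False, data_tb=40),
          Workload("dl-training", needs_gpu=True, data_tb=2)):
    print(w.name, "->", place(w))
```

The point of the sketch is that the placement logic lives entirely in software: new pools or policies can be added without touching the applications or the hardware beneath them.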

Second, any solution needs to follow a proven reference architecture that has been fully tested. Our deep experience at IBM with world-class healthcare and life sciences customers has taught us that software capabilities can easily be dictated by the underlying hardware building blocks (CPU vs. GPU, on-premises vs. cloud, x86 vs. Power9) and even more so by the applications they need to serve. Without a consistent framework and roadmap in the form of a reference architecture, things can fall apart very quickly. Building a robust data strategy and underlying IT infrastructure may take more effort initially, but the value and benefits your organization gains will be far more long-lasting and wide-reaching in terms of speed, scale, collaboration, ease of use and cost.

Finally, the architecture needs to be part of a global ecosystem. Collaboration no longer stays within the four walls of a single organization. We see many research initiatives between top cancer centers, genome centers, and large pharma R&D and biotech companies that involve strategic partnerships around the globe. A common reference architecture enables them to collaborate and share data easily.
For example, a research hospital can develop a cancer genomics pipeline and quickly share it with another institution, either by sending the XML-based script or by publishing it in a cloud-based portal, much like an application store. We have also started to see early examples of data sharing using metadata and RESTful APIs. Based on this approach, parallel communities or consortia are being formed for digital medical imaging, translational research and big data analytics, making parallel discovery possible.
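As a rough illustration of that metadata-and-REST approach, the following Python sketch queries a hypothetical consortium portal for shared datasets. The portal URL, query parameters and record fields are assumptions for illustration, not a real consortium API.

```python
# Minimal sketch of metadata-based data sharing over a RESTful API.
# The portal URL, parameters and record fields are hypothetical.

import requests

PORTAL = "https://portal.example.org/api/v1"  # hypothetical consortium portal

# Discover shared datasets by their metadata instead of copying raw files.
resp = requests.get(
    f"{PORTAL}/datasets",
    params={"assay": "whole-genome", "disease": "breast-cancer"},
    timeout=30,
)
resp.raise_for_status()

for dataset in resp.json():
    # Each record carries metadata plus a pointer to where the data lives.
    print(dataset["id"], dataset["institution"], dataset["uri"])
```

Because only metadata crosses the wire by default, institutions can discover each other's datasets first and negotiate bulk transfer or in-place access separately.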

The journey of the reference architecture


IBM’s high-performance data and AI (HPDA) architecture for healthcare and life sciences was designed to boost medical research efforts. It is based on best practices that have been tested with top healthcare solution providers and partners, and most importantly with customers that are at the forefront of precision medicine, such as Sidra Medicine.

For the first generation of the reference architecture, established in 2013, we designed a “Datahub” as an abstraction layer for handling demanding genomics requirements such as high-throughput data landing, information lifecycle management and a global namespace regardless of sharing protocol. These requirements could sometimes be met easily on a single workstation or small cluster, but the capability to handle hundreds of servers and petabytes of data is what made the Datahub so unique and essential. What made it even more valuable was its intrinsic scalability: it could start small (or big) and scale out rapidly with the workloads. As next-generation sequencing technologies rapidly advanced, data and workloads could double every six months. The Datahub meets these requirements through software in concert with storage building blocks (flash, disk and tape library) that support tiering based on performance and cost objectives.
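A toy sketch of the tiering decision the Datahub automates follows: files migrate from flash to disk to tape as they age. The tier names and age thresholds are illustrative assumptions, not the actual Datahub policy format.

```python
# Toy sketch of age-based storage tiering; tier names and
# thresholds are illustrative, not the real Datahub policy format.

import os
import time

TIERS = [
    # (max days since last access, tier)
    (7,    "flash"),  # hot: active pipelines
    (90,   "disk"),   # warm: recent projects
    (None, "tape"),   # cold: archive
]

def target_tier(path: str) -> str:
    """Pick a storage tier from a file's last-access age."""
    age_days = (time.time() - os.stat(path).st_atime) / 86400
    for max_age, tier in TIERS:
        if max_age is None or age_days <= max_age:
            return tier

print(target_tier(__file__))  # a freshly accessed file lands on flash
```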

We also designed an “Orchestrator” as the second abstraction layer for handling application requirements and mapped it to the computing building blocks. It provides functions such as parallel computing and workflow automation, which are fulfilled by software in concert with computing resources such as an HPC cluster or a virtual machine farm.
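A minimal sketch of those two functions in plain Python: independent samples fan out in parallel, and the second stage waits on the first. The step names and the toy pipeline are hypothetical stand-ins for real genomics tools.

```python
# Minimal sketch of the Orchestrator's two roles: parallel computing
# (samples fan out across workers) and workflow automation (stages run
# in dependency order). The steps are hypothetical stand-ins.

from concurrent.futures import ProcessPoolExecutor

def align(sample: str) -> str:
    """Stand-in for a real alignment step (producing a BAM file)."""
    return f"{sample}.bam"

def call_variants(bam: str) -> str:
    """Stand-in for a variant-calling step (producing a VCF file)."""
    return bam.replace(".bam", ".vcf")

samples = ["s1", "s2", "s3", "s4"]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        bams = list(pool.map(align, samples))       # stage 1, in parallel
        vcfs = list(pool.map(call_variants, bams))  # stage 2 waits on stage 1
    print(vcfs)
```

In the real architecture the same pattern runs at cluster scale, with the Orchestrator scheduling stages across shared compute rather than local processes.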

This software-defined blueprint was essential to future-proof the infrastructure and sustain the usability of applications so that the hardware building blocks could be expanded or replaced without impacting the operation of the system, the running of the application, and ultimately the user experience.

The reference architecture has continued to evolve over the years to reflect the enormous changes healthcare and life sciences organizations have faced as disruptive market forces reshape the industry and the way patient care is delivered.

The main investments we’ve made in the last two years have focused on positioning the reference architecture as a truly data-driven, cloud-ready, AI-capable solution that addresses very complex data at scale and the most demanding analytics requirements with cost-effective, high-performance capabilities.

At this point, the second generation of the reference architecture has been developed, with advanced features and solutions based on feedback we’ve received from users in research and clinical fields such as genomics, metabolomics, proteomics, medical imaging, clinical analytics, biostatistics, and even real-world evidence (RWE) and the Internet of Things (IoT).

One of the most exciting things we observe is that this cross-field collaboration has started to bring more and more users together to work and share their experiences.
We are fortunate to witness and document the challenges and needs of leading institutes at the frontiers of precision medicine. Thanks to their HPDA-based solutions, they are seeing significantly faster times to results, along with many other benefits. The result may be a genomics analysis of clinical variants for patients, an AI model that helps diagnose Alzheimer’s disease, or new biomarkers for cancer. In all these cases, traditional desktop computing could no longer keep up with the workloads or the data storage. Previously, these users had to wait days or even weeks for data to be transferred and loaded, then even longer for processing and analysis. After implementing the HPDA reference architecture, those delays are gone.

Learn more about some of the leading precision medicine initiatives around the globe supported by IBM’s high-performance data and AI (HPDA) architecture at my upcoming talk at Bio-IT World.
