Thursday, December 25, 2014

History of IBM Reference Architecture for Genomics

About two years ago, I initiated an IBM project with MD Anderson Cancer Center to develop a research computing infrastructure for cancer genomics.

The reference architecture was born out of necessity from that project: there were so many applications, workflows, and pipelines to handle, and even more choices of infrastructure and informatics technologies, all of which were evolving quickly. We called the project PureGene, with "Pure" coming from the Pure Flex system that served as the hardware platform and "Gene" representing its connection with genes, genetics, and genomics.

As the project developed and matured, the reference architecture took root and gained more traction inside IBM. Starting in 2014, a new name, "PowerGene", was adopted, since "Power" better represents an IBM brand, while Pure Flex was being moved as part of the Lenovo divestiture.

Throughout 2014, PowerGene continued to gain adoption in the biomedical research community, and 21 institutions and companies now use PowerGene as their enterprise architecture for research.

The idea of a reference architecture is for it to be both a point-in-time design for any system or platform and a blueprint for future expansion and growth. Doesn't that sound exactly like what genes or a genome are capable of? With that "organic" connection, I will start using the suffix "Gene" in the names of all reference architectures, with PowerGene being the first one.