Tuesday, February 26, 2013

State of Technology 2013 -- HPC Cloud

For my "State of Technology" series, I will review advances in convergent technologies such as HPC Cloud, Big Data HPC and Big Data Cloud.

I define a true HPC Cloud as a high-performance computing system delivered as a service through a private, public or hybrid cloud. To qualify as an HPC system, it needs tightly coupled networking and a parallel file system connecting scalable computing and storage building blocks. The following are some of the latest advances in HPC Cloud:

1. User Interface
A self-service interface is the most visible side of HPC Cloud, exposing resources to users of a private cloud on campus, inside a company or within a grid environment. VCL (Virtual Computing Lab) has been the leading solution in this space for many years, providing a scheduling-based reservation portal where users request an instance of a service such as a classroom desktop image or an HPC cluster (managed by LSF). An emerging major player is PCM-AE, a full-featured commercial product from Platform Computing, now part of IBM. Platform Cluster Manager Advanced Edition (PCM-AE) has its roots in both cluster management (hence the name) and cloud features such as a feature-rich user interface. Users log in through a portal and, under a role-based model, request or reserve resources such as physical (bare-metal) or virtual clusters. These clusters can range from something as simple as an LSF-managed multi-node system with an embedded parallel file system running MPI jobs to something as complicated as a hybrid cluster with the latest accelerators or coprocessors.

2. Workflow
Cloud-based HPC workflow management tools haven't caught up with the growing demand from scientific users, especially those who build and consume sophisticated, repeatable workflows. An example is the genomic sequencing analysis pipeline that runs from sequence assembly to variant calling. Without an obvious solution, users typically rely on open-source, industry-focused tools such as Galaxy for genomic analysis. The two major players here are Accelrys' Pipeline Pilot and Platform Computing's Process Manager. One common (and important) feature shared by Pipeline Pilot and Process Manager is a GUI-based tool for designing, editing and managing workflows as digital assets. These assets are stored in XML format and can be shared, published and run repeatedly by users with access to the system. Because Process Manager is integrated with PCM-AE, it has the advantage of being part of an overall HPC Cloud framework. What's still needed is the development and cataloging of industry-specific workflows so that new users don't have to start from scratch.
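
To make the idea of a workflow as a shareable XML asset more concrete, here is a minimal Python sketch. It is only an illustration: the XML layout, the function names and the intermediate "alignment" step are my own assumptions, and it does not reproduce the actual file format of Pipeline Pilot or Process Manager. It stores a small pipeline as XML, reads it back, and derives an order in which the steps could run.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the steps it depends on.
# "alignment" is an illustrative intermediate step, not taken from any product.
STEPS = {
    "sequence_assembly": [],
    "alignment": ["sequence_assembly"],
    "variant_calling": ["alignment"],
}


def workflow_to_xml(name, steps):
    """Serialize a workflow to an XML string (made-up schema)."""
    root = ET.Element("workflow", name=name)
    for step, deps in steps.items():
        node = ET.SubElement(root, "step", id=step)
        for dep in deps:
            ET.SubElement(node, "depends", on=dep)
    return ET.tostring(root, encoding="unicode")


def xml_to_workflow(text):
    """Parse the XML produced by workflow_to_xml back into a dict."""
    root = ET.fromstring(text)
    return {
        step.get("id"): [d.get("on") for d in step.findall("depends")]
        for step in root.findall("step")
    }


def run_order(steps):
    """Return the steps in an order that respects their dependencies."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    xml_text = workflow_to_xml("genomics_pipeline", STEPS)
    print(xml_text)
    print(run_order(xml_to_workflow(xml_text)))
```

Because the asset is plain XML, it can be versioned, published and rerun by anyone with access, which is the property that matters for repeatable workflows. A real workflow manager would also record the command, resource request and scheduler settings for each step.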

To be continued ...

3. Workload Management

4. Storage

5. Networking



Last Update:
2013.02.26 - first draft written at the Palmer House in Chicago
