Dr Frank Blog: My Top 10 Best Practices for HPC Architecture

Thursday, April 5, 2012

TGIF Series-1: My Top 10 Best Practices for HPC Architecture, adapted from more than 50 HPC cluster/cloud design, implementation, and expansion projects.

scalable - start small or mid-size, but be able to scale up to petabytes of storage and hundreds of teraflops of compute
usable - the supercomputers I architected have a single interface, or landing spot, where researchers, analysts, and programmers log in and launch jobs. Sometimes it takes a leap of faith to understand that an HPC cluster is not a pool of 1,000 computers for individual use but a single machine with 1,000-node horsepower.
modular - every component, including the network, should be a plug-and-play building block
accessible - all software tools, user data, applications, and scripts should be accessible from anywhere
resilient - fault tolerance should be a key consideration for networking, storage, and the file system
recoverable - backup and archival are a necessity, not a luxury
manageable - cloud-like manageability, such as dynamic provisioning and virtualization
efficient - strive for 100% utilization: no I/O wait, no idle CPU cores
high-performance - obvious
low-cost - more obvious
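
To make the "efficient" item concrete, here is a minimal sketch of how core utilization could be measured over a time window. The function name, the job tuples, and the 1,000-core cluster are hypothetical illustrations, not figures from any real system; a production cluster would pull these numbers from its scheduler's accounting records.

```python
def utilization(jobs, total_cores, window_hours):
    """Return the fraction of available core-hours consumed by jobs.

    jobs is a list of (cores, hours) tuples, one per job that ran
    inside the window. All values here are illustrative assumptions.
    """
    if total_cores <= 0 or window_hours <= 0:
        raise ValueError("total_cores and window_hours must be positive")
    used_core_hours = sum(cores * hours for cores, hours in jobs)
    return used_core_hours / (total_cores * window_hours)


# Hypothetical 1,000-core cluster observed over a 24-hour window.
jobs = [(512, 10.0), (256, 24.0), (128, 6.0)]
u = utilization(jobs, total_cores=1000, window_hours=24.0)
print(f"{u:.1%}")  # prints 50.1%
```

In this made-up case about half of the machine sat idle, which is the gap the "efficient" goal is meant to close.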

Updated on 2012.04.05 at 11:46PM (Thursday)
