Thursday, April 5, 2012

My Top 10 Best Practices for HPC Architecture

TGIF Series-1: My Top 10 Best Practices for HPC Architecture, adapted from more than 50 HPC cluster/cloud design, implementation, and expansion projects.
  1. scalable (start small or mid-sized, but be able to scale up to petabytes of storage and hundreds of teraflops of compute)
  2. usable - the supercomputers I architected have a single interface or landing spot for researchers, analysts, or programmers to log in and launch jobs. Sometimes it takes a leap of faith to understand why an HPC cluster is not a pool of 1000 computers for individual use but a single machine with 1000-node horsepower.
  3. modular - every component should be a plug-and-play building block, including the network
  4. accessible (all software tools, user data, applications, and scripts should be accessible from anywhere)
  5. resilient (fault tolerance should be a key consideration for networking, storage, and file systems)
  6. recoverable (backup and archival are necessities, not luxuries)
  7. manageable (cloud-like manageability, such as dynamic provisioning and virtualization)
  8. efficient (striving for 100% utilization: no I/O wait, no idle CPU cores)
  9. high-performance (obvious)
  10. low-cost (more obvious)
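
To make the "efficient" item a bit more concrete, here is a minimal sketch of how utilization could be measured: core-hours consumed by jobs divided by core-hours available in a time window. The `Job` record, the `utilization` function, and the sample numbers are all hypothetical illustrations, not taken from any particular scheduler or from the projects mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job record: cores held and wall-clock hours run."""
    cores: int
    hours: float


def utilization(jobs, total_cores, window_hours):
    """Return the fraction of available core-hours consumed by the jobs."""
    if total_cores <= 0 or window_hours <= 0:
        raise ValueError("total_cores and window_hours must be positive")
    used_core_hours = sum(job.cores * job.hours for job in jobs)
    return used_core_hours / (total_cores * window_hours)


# Made-up sample: three jobs on a 1000-core cluster over one day.
jobs = [Job(cores=512, hours=24.0), Job(cores=256, hours=12.0), Job(cores=128, hours=6.0)]
print(f"{utilization(jobs, total_cores=1000, window_hours=24.0):.1%}")
```

A number like this only covers core allocation; I/O wait and cores idling inside an allocated job would need separate measurement.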

Updated on 2012.04.05 at 11:46PM (Thursday)
