Tuesday, April 7, 2015

Fast and Furious Engine for Computing

As we design and build PowerGene Pipeline v2 (PG-P2), I have started documenting the features and functions that define a true workflow engine, one that can empower the world of scientific and analytical computing. If our data scientists, researchers, and even technologists want a next-generation race car to take them to the next level of competition, then someone needs to be thinking about how to invent and build a fast and furious engine.

So here is my top 10 list for a software-defined workflow engine (SDWE):

Abstraction - Of workloads from their physical implementation, decoupling a resource from its consumer. Abstraction enables the definition of logical models of applications or workflows (flow definitions) that are instantiated at provisioning time, thus enforcing standardization and enabling reuse.
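To make the idea of a flow definition more concrete, here is a minimal sketch, assuming a flow is just an ordered set of command templates. The names `StepDef`, `FlowDef`, and `instantiate`, and the placeholder commands `qc_tool` and `aligner`, are hypothetical and not PG-P2's actual design. The logical definition says what to run; the physical resource and the concrete inputs are bound only at provisioning time.

```python
from dataclasses import dataclass
from string import Template

@dataclass(frozen=True)
class StepDef:
    """A logical step: what to run, not where to run it (hypothetical type)."""
    name: str
    command: str  # command template with $placeholders
    cpus: int = 1

@dataclass(frozen=True)
class FlowDef:
    """A reusable logical model of a workflow (a 'flow definition')."""
    name: str
    steps: tuple

    def instantiate(self, params: dict, resource: str) -> list:
        """Bind concrete parameters and a physical resource at provisioning time."""
        return [
            {
                "flow": self.name,
                "step": step.name,
                "command": Template(step.command).substitute(params),
                "cpus": step.cpus,
                "resource": resource,
            }
            for step in self.steps
        ]

# One logical definition, instantiated twice against different resources.
align = FlowDef("align", (
    StepDef("qc", "qc_tool $reads"),
    StepDef("map", "aligner $ref $reads", cpus=8),
))
on_cluster = align.instantiate({"reads": "s1.fq", "ref": "hg38.fa"}, resource="cluster-a")
on_cloud = align.instantiate({"reads": "s2.fq", "ref": "hg38.fa"}, resource="cloud-b")
```

Because the definition is frozen and parameter-free, the same object can be shared as a standard template across teams.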

Orchestration - As applied to workflows, going beyond a single server or cluster so that workloads with different architectural requirements can be optimally matched to available resources, which become transparent to users and applications.
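One simple way to picture "optimally matched" is a greedy best-fit scheduler. The sketch below assumes each job declares CPU, memory, and GPU needs and each resource advertises free capacity; `Resource`, `Job`, and `place` are hypothetical names, and real orchestrators use far richer placement logic.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cpus: int      # free CPU cores
    mem_gb: int    # free memory in GB
    gpu: bool = False

@dataclass(frozen=True)
class Job:
    name: str
    cpus: int
    mem_gb: int
    needs_gpu: bool = False

def place(jobs, resources):
    """Greedy best-fit placement; reserves capacity on `resources` as it goes."""
    placement = {}
    # Place the largest jobs first so they are not crowded out by small ones.
    for job in sorted(jobs, key=lambda j: j.cpus, reverse=True):
        fits = [r for r in resources
                if r.cpus >= job.cpus and r.mem_gb >= job.mem_gb
                and (r.gpu or not job.needs_gpu)]
        if not fits:
            placement[job.name] = None  # nothing can host it right now
            continue
        # Avoid spending a GPU node on a CPU-only job, then pick the tightest fit.
        best = min(fits, key=lambda r: (r.gpu and not job.needs_gpu, r.cpus - job.cpus))
        best.cpus -= job.cpus
        best.mem_gb -= job.mem_gb
        placement[job.name] = best.name
    return placement

pool = [Resource("cpu-node", 32, 128), Resource("gpu-node", 16, 64, gpu=True)]
jobs = [Job("align", 16, 32), Job("basecall", 8, 16, needs_gpu=True), Job("qc", 2, 4)]
print(place(jobs, pool))
# {'align': 'cpu-node', 'basecall': 'gpu-node', 'qc': 'cpu-node'}
```

The user only declares requirements; which node runs the job stays transparent to them.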

Automation - Going beyond script-based automation to automate tasks, jobs, and workflows across resource domains, with built-in policy management for enforcement and optimization.
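Built-in policy management could take many forms. Here is a minimal sketch, assuming policies are plain functions applied to each job request before submission: one enforces a rule by rejecting, and another optimizes by adjusting the request. `Policy`, `max_cpus`, `require_project`, and `apply_policies` are hypothetical names chosen for illustration.

```python
from typing import Callable, Optional, Tuple

# A policy takes a job request and returns (possibly adjusted job, rejection reason or None).
Policy = Callable[[dict], Tuple[dict, Optional[str]]]

def max_cpus(limit: int) -> Policy:
    """Optimization-style policy: clamp oversized requests instead of rejecting them."""
    def rule(job: dict):
        if job["cpus"] > limit:
            return dict(job, cpus=limit), None
        return job, None
    return rule

def require_project(job: dict):
    """Enforcement-style policy: reject jobs that are not charged to a project."""
    if not job.get("project"):
        return job, "job has no project code"
    return job, None

def apply_policies(job: dict, policies):
    """Run policies in order; stop at the first rejection."""
    for policy in policies:
        job, reason = policy(job)
        if reason is not None:
            return None, reason
    return job, None

policies = [require_project, max_cpus(16)]
print(apply_policies({"name": "map", "cpus": 64, "project": "p1"}, policies))
# ({'name': 'map', 'cpus': 16, 'project': 'p1'}, None)
print(apply_policies({"name": "qc", "cpus": 2}, policies))
# (None, 'job has no project code')
```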

Standardization - Of workflows, through a common set of naming standards, version control, runtime logging, and provenance tracking.
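Naming standards and provenance tracking meet in the runtime log. The sketch below assumes a hypothetical naming convention (`<flow>-v<version>-<UTC timestamp>`) and fingerprints each step by hashing its command and sorted inputs, so identical reruns can be recognized. The record layout and function names are illustrative only.

```python
import hashlib
import json
from datetime import datetime, timezone

def run_id(flow: str, version: str, started: datetime) -> str:
    """Hypothetical naming standard: <flow>-v<version>-<UTC timestamp>."""
    return f"{flow}-v{version}-{started.astimezone(timezone.utc):%Y%m%dT%H%M%SZ}"

def provenance_record(flow, version, step, command, inputs, started):
    """One runtime log entry tying a step's output to its exact command and inputs."""
    inputs = sorted(inputs)
    payload = json.dumps({"step": step, "command": command, "inputs": inputs}, sort_keys=True)
    return {
        "run_id": run_id(flow, version, started),
        "step": step,
        "command": command,
        "inputs": inputs,
        # Same step + command + inputs => same fingerprint, so reruns can be matched.
        "fingerprint": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }

started = datetime(2015, 4, 7, 12, 0, tzinfo=timezone.utc)
record = provenance_record("align", "2", "map", "aligner hg38.fa s1.fq",
                           ["s1.fq", "hg38.fa"], started)
print(record["run_id"])  # align-v2-20150407T120000Z
```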

Customization - Of workloads into functional building blocks that are then connected into a logical network, so that workflows can be quickly composed or recomposed from proven workloads or subflows.
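A logical network of building blocks is naturally a directed acyclic graph. This sketch assumes steps are connected by (upstream, downstream) edges and uses the standard-library `graphlib` module (Python 3.9+) to derive a valid run order; the step names and the helper functions `compose` and `execution_order` are hypothetical. Recomposing a flow then means editing only the edge list.

```python
from graphlib import TopologicalSorter

def compose(edges):
    """Build a dependency map {step: {upstream steps}} from (upstream, downstream) pairs."""
    graph = {}
    for upstream, downstream in edges:
        graph.setdefault(downstream, set()).add(upstream)
        graph.setdefault(upstream, set())
    return graph

def execution_order(graph):
    """Return one valid run order; raises graphlib.CycleError if the network has a loop."""
    return list(TopologicalSorter(graph).static_order())

# A flow assembled from proven building blocks, with a branch and a convergence.
edges = [("qc", "align"), ("align", "call_variants"), ("align", "coverage"),
         ("call_variants", "report"), ("coverage", "report")]
order = execution_order(compose(edges))
```

Steps with no path between them (here `call_variants` and `coverage`) may appear in either order, which is exactly where an engine can run them in parallel.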

Visualization - Of the runtime environment through a graphical user interface, and of the final output through third-party visualization engines.

Scalability - Leveraging world-class software-defined storage infrastructure for extreme scalability: supporting hundreds of pipelines running in parallel and scaling to hundreds of thousands of concurrent jobs in a shared resource pool.
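The storage side is out of scope for a short example, but the "shared resource pool" idea can be sketched as bounded concurrency: many pipelines submit jobs, and a single semaphore caps how many run at once. This assumes `asyncio` as a stand-in for a real scheduler, and `run_job` and `run_pipelines` are hypothetical names; it demonstrates the pattern, not the scale claimed above.

```python
import asyncio

async def run_job(pool: asyncio.Semaphore, stats: dict, job_id: int) -> int:
    async with pool:                 # wait for a free slot in the shared pool
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0)       # stand-in for real work
        stats["running"] -= 1
        return job_id

async def run_pipelines(n_pipelines: int, jobs_per_pipeline: int, slots: int):
    pool = asyncio.Semaphore(slots)  # one pool shared by every pipeline
    stats = {"running": 0, "peak": 0}
    tasks = [run_job(pool, stats, p * jobs_per_pipeline + j)
             for p in range(n_pipelines) for j in range(jobs_per_pipeline)]
    done = await asyncio.gather(*tasks)
    return done, stats["peak"]

# 100 pipelines x 50 jobs each, never more than 200 jobs running at once.
done, peak = asyncio.run(run_pipelines(100, 50, slots=200))
```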

Manageability - The ability to start, suspend, restart, and completely terminate a workflow manually (by the user), as well as policy-based management of pipeline events (job success, failure, branching, convergence, etc.).
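Manual control and policy-based event handling can share one lifecycle model. The sketch below assumes a small state machine with hypothetical states and events; manual commands and policy hooks both go through `handle`, and the example policy `restart_once` restarts a workflow automatically after its first failure only.

```python
# Allowed transitions for a hypothetical workflow lifecycle.
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "suspend"): "suspended",
    ("suspended", "restart"): "running",
    ("running", "fail"): "failed",
    ("failed", "restart"): "running",
    ("running", "complete"): "succeeded",
}
FINAL = {"succeeded", "terminated"}

class Workflow:
    def __init__(self, name, policies=None):
        self.name = name
        self.state = "created"
        self.history = []  # (old state, event, new state) tuples
        # Policy hooks: event name -> callable(workflow), run after the transition.
        self.policies = policies or {}

    def handle(self, event):
        if event == "terminate" and self.state not in FINAL:
            new_state = "terminated"  # manual kill is allowed from any non-final state
        elif (self.state, event) in TRANSITIONS:
            new_state = TRANSITIONS[(self.state, event)]
        else:
            raise ValueError(f"cannot {event!r} a workflow that is {self.state!r}")
        self.history.append((self.state, event, new_state))
        self.state = new_state
        hook = self.policies.get(event)
        if hook:
            hook(self)
        return self.state

def restart_once(wf):
    """Example policy: automatically restart after the first failure only."""
    if sum(1 for _, ev, _ in wf.history if ev == "fail") == 1:
        wf.handle("restart")

wf = Workflow("align", policies={"fail": restart_once})
wf.handle("start")
wf.handle("fail")       # policy restarts it, so the state is "running" again
wf.handle("fail")       # second failure: policy leaves it "failed"
wf.handle("terminate")  # manual termination
```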

Reusability - The ability to rerun the same pipeline (manually or by policy) or to redeploy it as an embedded element in a higher-level workflow.
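Embedding a pipeline inside a higher-level workflow fits the composite pattern: a flow's elements can be steps or other flows. This is a minimal sketch with hypothetical `Step` and `Flow` types; expanding a flow yields path-qualified step names, so the same sub-flow keeps a distinct identity wherever it is embedded and can still be run on its own.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Step:
    name: str

@dataclass
class Flow:
    name: str
    elements: List[Union["Step", "Flow"]]

    def expand(self, prefix=""):
        """Flatten nested flows into qualified step names, e.g. 'study/align/map'."""
        path = f"{prefix}{self.name}/"
        names = []
        for el in self.elements:
            if isinstance(el, Flow):
                names.extend(el.expand(path))  # an embedded sub-flow
            else:
                names.append(path + el.name)
        return names

align = Flow("align", [Step("qc"), Step("map")])
study = Flow("study", [align, Step("report")])  # align reused as an embedded element
print(align.expand())  # ['align/qc', 'align/map']
print(study.expand())  # ['study/align/qc', 'study/align/map', 'study/report']
```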

Accessibility - Fine-grained, role-based access control that makes the solution available only to those who need it.
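"Fine-grained" usually means roles are granted per resource rather than globally. Here is a minimal sketch, assuming hypothetical roles and actions and grants keyed by (user, pipeline); anything not explicitly granted is denied.

```python
# Hypothetical roles mapped to the actions they permit on a pipeline.
ROLE_PERMISSIONS = {
    "viewer": {"view"},
    "analyst": {"view", "run"},
    "developer": {"view", "run", "edit"},
    "admin": {"view", "run", "edit", "terminate", "grant"},
}

def is_allowed(grants, user, pipeline, action):
    """Default-deny check: the user needs a role on *this* pipeline that permits the action."""
    roles = grants.get((user, pipeline), set())
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in roles)

grants = {
    ("alice", "align"): {"developer"},
    ("bob", "align"): {"viewer"},
}
print(is_allowed(grants, "alice", "align", "edit"))   # True
print(is_allowed(grants, "bob", "align", "run"))      # False
print(is_allowed(grants, "alice", "rnaseq", "view"))  # False: no grant on that pipeline
```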
