Use Case #2: the EC2 cluster
A cluster running Hadoop is to be created on an Amazon EC2 datacentre, or on another cloud datacentre with a compatible API
-the RPMs are pushed up to S3
-a single cluster image is brought up from a predefined template
-on that machine, the relevant RPMs are copied down from S3 and installed
-the machine is saved as a public or private template
-any input data is copied up to S3
-a predefined number of virtual machines are deployed using this image
-one of them becomes the namenode, another the job tracker, the rest datanode/tasktrackers
-the MR job(s) are executed
-the results are returned to S3
-the results may be copied from S3 to a local site
-the cluster is terminated

Dependent components (and external dependencies)
* Hadoop (hadoop-core)
* EC2 (typica, jetset)
* Jetty
* SSH
* RPM
* Anubis
* Logging-services

This project is very much an integration/test component; the use cases drive the components we depend on. There should be no new SF components for this project, merely new .sf files.

Use Case #3: Cirrus deployment
Here the data is already stored somewhere in the datacentre, possibly in a different filesystem. You want to do work against it, and you have a limited budget which you don't want to exceed. We need to work out addresses.
A local cluster of live systems is to have Hadoop installed or upgraded
-a configuration .sf file lists the nodes to be managed, and their roles
-SmartFrog and Hadoop RPMs get SCP'd onto the systems
-the cluster-specific .sf files are copied up
-Anubis is used to monitor the state of the cluster (and to push configurations out)
-the nodes come up in their chosen roles
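
The node-and-role listing in the steps above could be sanity-checked before anything is copied to the machines. What follows is only a Python sketch of that check, not the .sf format itself: the hostnames, the role names and the `group_by_role` helper are all made up for illustration, and the rule of exactly one namenode and one job tracker is an assumption taken from the layout described in the EC2 use case.

```python
from collections import defaultdict

# Hypothetical role names; the real .sf configuration may name roles differently.
MASTER_ROLES = ("namenode", "jobtracker")
WORKER_ROLE = "worker"  # stands for a combined datanode/tasktracker


def group_by_role(node_roles):
    """Group hosts by role and check the layout is usable.

    node_roles maps a hostname to the role that host should take.
    Raises ValueError if a role is unknown, if a master role is not
    assigned to exactly one host, or if there are no workers.
    """
    groups = defaultdict(list)
    for host, role in node_roles.items():
        if role not in MASTER_ROLES and role != WORKER_ROLE:
            raise ValueError(f"unknown role {role!r} for host {host}")
        groups[role].append(host)

    # Assumption: one namenode and one job tracker per cluster.
    for role in MASTER_ROLES:
        count = len(groups[role])
        if count != 1:
            raise ValueError(f"expected exactly one {role}, found {count}")

    if not groups[WORKER_ROLE]:
        raise ValueError("expected at least one worker node")

    return dict(groups)


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    cluster = {
        "node1.example.org": "namenode",
        "node2.example.org": "jobtracker",
        "node3.example.org": "worker",
        "node4.example.org": "worker",
    }
    for role, hosts in group_by_role(cluster).items():
        print(role, hosts)
```

Failing early on a bad listing is cheaper than discovering a missing namenode after the RPMs have been installed on every node.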