
Key: SFOS-1040
Type: New Feature
Status: Open
Priority: Major
Assignee: Steve Loughran
Reporter: Steve Loughran
Votes: 0
Watchers: 0
SmartFrog

Create a hadoop-cluster component that includes everything needed to run hadoop on a real or virtual cluster

Created: 04/Dec/08 12:45 PM (GMT)   Updated: 22/Jun/10 04:08 PM (BST)
Component/s: _service_hadoop
Affects Version/s: 3.17.010
Fix Version/s: None

Time Tracking: Not Specified


Compatibility: new feature


Description
The hadoop component can run Hadoop, but I would like a self-contained extras/ package that uses it to install and run a Hadoop cluster.



Steve Loughran added a comment - 04/Dec/08 12:48 PM (GMT)
Use Case #1: The local cluster

A local cluster of live systems needs Hadoop installed or upgraded.

-A configuration .sf file lists the nodes to be managed and their roles
-The SmartFrog and Hadoop RPMs are copied to the systems over SCP
-The cluster-specific .sf files are copied up
-Anubis is used to monitor the state of the cluster (and to push configurations out)
-The nodes come up in their chosen roles
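
The steps above could be sketched as a small planning function. This is only an illustration, not part of SmartFrog: the file names, the `plan_rollout` function and the step tuples are all hypothetical, and it assumes that nodes are started only after every node has received its files. Anubis monitoring is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical artifact names; the issue does not name the actual files.
RPMS = ["smartfrog.rpm", "hadoop.rpm"]
CLUSTER_SF = "cluster.sf"

Step = Tuple[str, str, str]  # (action, host, argument)


def plan_rollout(roles: Dict[str, str]) -> List[Step]:
    """Return ordered steps for installing or upgrading each listed node.

    `roles` maps a host name to its role, mirroring the configuration
    .sf file that lists the managed nodes and their roles.
    """
    steps: List[Step] = []
    # Copy the RPMs and the cluster-specific .sf file to every node.
    for host in sorted(roles):
        for rpm in RPMS:
            steps.append(("scp", host, rpm))
        steps.append(("scp", host, CLUSTER_SF))
    # Assumption: nodes come up only once all nodes have their files.
    for host in sorted(roles):
        steps.append(("start", host, roles[host]))
    return steps


if __name__ == "__main__":
    for step in plan_rollout({"node1": "namenode", "node2": "datanode"}):
        print(step)
```

Keeping the plan as data rather than running commands directly makes it easy to inspect or test before anything touches a live system.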

Steve Loughran added a comment - 04/Dec/08 12:52 PM (GMT)
Use Case #2: The EC2 cluster

A cluster running Hadoop is to be created in an Amazon EC2 datacentre, or in another cloud datacentre with a compatible API.

-The RPMs are pushed up to S3
-A single cluster image is brought up from a predefined template
-On that machine, the relevant RPMs are copied down from S3 and installed
-The machine is saved as a public or private template
-Any input data is copied up to S3
-A predefined number of virtual machines are deployed using this image
-One of them becomes the namenode, another the job tracker, and the rest datanodes/tasktrackers
-The MR job(s) are executed
-The results are returned to S3
-The results may be copied from S3 to a local site
-The cluster is terminated
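
The role split in the list above (one namenode, one job tracker, the rest datanode/tasktrackers) can be illustrated with a minimal sketch. The `assign_roles` function and the role strings are hypothetical, and the choice of the first two hosts for the master roles is an assumption; the issue does not say how the masters are picked.

```python
from typing import Dict, List


def assign_roles(hosts: List[str]) -> Dict[str, str]:
    """Map each deployed VM to a Hadoop role.

    Assumption: the first host becomes the namenode and the second the
    job tracker; every remaining host is a combined datanode/tasktracker.
    """
    if len(hosts) < 3:
        # Assumption: at least one worker is needed besides the two masters.
        raise ValueError("need at least 3 hosts: namenode, jobtracker, worker")
    roles = {hosts[0]: "namenode", hosts[1]: "jobtracker"}
    for host in hosts[2:]:
        roles[host] = "datanode+tasktracker"
    return roles


if __name__ == "__main__":
    print(assign_roles(["vm-a", "vm-b", "vm-c", "vm-d"]))
```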

Steve Loughran added a comment - 04/Dec/08 12:54 PM (GMT)
Dependent components (and external dependencies)
* Hadoop (hadoop-core)
* EC2 (typica, JetS3t)
* Jetty
* SSH
* RPM
* Anubis
* Logging-services

This project is very much an integration/test component; the use cases drive the components we depend on. There should be no new SF components for this project, merely new .sf files.

Steve Loughran added a comment - 08/Dec/08 11:37 AM (GMT)
Use Case #3: Cirrus deployment

Here the data is already stored somewhere in the datacentre, possibly in a different filesystem. You want to do work against it, and you have a limited budget that you don't want to exceed.

Steve Loughran added a comment - 17/Jun/09 06:05 PM (BST)
We need to work out addresses.