It's a lot simpler to not play games with multiple inheritance, and instead have the cluster child be either a LAZY reference to a deployed configuration *or* a CD to actually deploy as a child.
This requires:
1. all HadoopComponents to become workflow compounds;
2. the cluster component to get deployed early if it is a CD, and terminated during termination;
3. the ManagedConfiguration to get its config from the cluster() data, and not the local prim.

(#3) is going to break existing code/tests.

There are some usability constraints to consider here. Imagine a client component, such as one that copies files in and out, or submits jobs. These want to take a cluster definition, but then override any value in there with anything set locally.
e.g.

    CopyHadoopFile extends DfsCopyFileIn {
      src "/tmp/data.gzip";
      dest "/project/analysis.gzip";
      cluster LAZY livecluster;
      dfs.replication.factor 1;
    }

where the replication factor is throttled back. Without that local override, it would be something like

    Cluster2 extends livecluster {
      dfs.replication.factor 1;
    }

But even that is limited, as the cluster state comes at deploy time, whereas we may want to pick up some other facts from a live, running cluster.

Proposal. DFS client applications will support a cluster reference that provides the basis for their values, but any of these properties can be overridden locally. When the component is started, it copies in all current information from the (deployed) cluster reference, adding it to the local node, except for attributes that are already deployed there. The config then remains bound to the node for the rest of its life, with changes to the Configuration instance propagating back.

With the changes to ManagedConfiguration, we can move to this.
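
As a rough illustration of the copy-in rule in the proposal (cluster values fill in only what the local node has not already set), here is a minimal Java sketch. It does not use any SmartFrog or Hadoop API: `ClusterOverlay` and `merge` are hypothetical names, and plain maps stand in for the cluster reference and the local component's attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of SmartFrog or Hadoop. */
final class ClusterOverlay {

    /**
     * Returns the effective settings for a client component: every cluster
     * entry is copied in unless the local component already defines that key.
     * Neither input map is modified.
     */
    static Map<String, String> merge(Map<String, String> cluster,
                                     Map<String, String> local) {
        // Start from the local values so they always win.
        Map<String, String> effective = new LinkedHashMap<>(local);
        for (Map.Entry<String, String> entry : cluster.entrySet()) {
            // Only fill in keys that are not already set locally.
            effective.putIfAbsent(entry.getKey(), entry.getValue());
        }
        return effective;
    }
}
```

With a cluster map holding `dfs.replication.factor` = 3 plus other settings, and a local map holding only `dfs.replication.factor` = 1, the merged result keeps the local value of 1 and inherits everything else from the cluster. The sketch is a one-off copy; it does not model the later propagation of changes back to the node.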
There's some fun here with directories; those components that resolve directories to work with will currently pick them up locally, and not go via an (optional) cluster configuration.
Arguably, that's good: different nodes should have different directories. But it will be inconsistent.

Also: we need to add logic to pick up a list of required attributes from the different clusters, and use sfResolve to pull them in from any parent.
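
The idea of pulling a list of required attributes in from any parent could look roughly like the following Java sketch. It is only a stand-in for what sfResolve would do: `AttributeNode`, `resolve` and `resolveAll` are hypothetical names, no SmartFrog classes are used, and the lookup is a simple walk up the parent chain.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a deployed component; not a SmartFrog class. */
final class AttributeNode {
    private final AttributeNode parent;   // null at the root
    private final Map<String, String> attributes = new LinkedHashMap<>();

    AttributeNode(AttributeNode parent) {
        this.parent = parent;
    }

    /** Sets a local attribute and returns this node for chaining. */
    AttributeNode set(String name, String value) {
        attributes.put(name, value);
        return this;
    }

    /** Looks the name up locally, then in each ancestor; null if absent. */
    String resolve(String name) {
        for (AttributeNode node = this; node != null; node = node.parent) {
            String value = node.attributes.get(name);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Resolves each required name; names found nowhere are left out. */
    Map<String, String> resolveAll(List<String> required) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String name : required) {
            String value = resolve(name);
            if (value != null) {
                found.put(name, value);
            }
        }
        return found;
    }
}
```

Because the nearest definition wins, a directory set on the client node shadows one set on the cluster, which matches the point that different nodes should be able to have different directories.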
This is done. It was hard work, so marked as Major. There are now cluster-driven components as well as the inline ones, and everything is working.
There's a need to catch race-condition problems by ensuring that the HadoopConfiguration is live before the copying is done.
* All the attributes could be copied over, or they could be left as is for lazy-evaluation, especially for relative values. That would be safer.
* The HadoopConfiguration component does an early binding.
* We could do checks in ManagedConfiguration, using a list (which could be in the override class) to say what is going on.
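
One way to picture the guard against the race is the Java sketch below: refuse to copy until the cluster configuration reports itself live, then check a list of expected attributes against a single snapshot. Everything here is an assumption made for illustration: `CopyGuard`, `ClusterSource`, `isLive` and `snapshot` are hypothetical names, not SmartFrog or Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical guard for illustration; not part of SmartFrog or Hadoop. */
final class CopyGuard {

    /** Hypothetical view of a deployed cluster configuration. */
    interface ClusterSource {
        boolean isLive();
        Map<String, String> snapshot();
    }

    /**
     * Copies the cluster values only if the source is live and every
     * attribute in the required list is present in the snapshot.
     */
    static Map<String, String> copyWhenLive(ClusterSource cluster,
                                            List<String> required) {
        if (!cluster.isLive()) {
            throw new IllegalStateException("cluster configuration is not live yet");
        }
        // Take one snapshot so the check and the copy see the same values.
        Map<String, String> values = cluster.snapshot();
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!values.containsKey(name)) {
                missing.add(name);
            }
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException("missing required attributes: " + missing);
        }
        return new LinkedHashMap<>(values);
    }
}
```

Failing fast with the names of the missing attributes is one choice; leaving values for lazy evaluation, as suggested in the first bullet, would instead defer the check to first use.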