org.apache.pig.impl.plan
Class OperatorPlan<E extends Operator>

java.lang.Object
  extended by org.apache.pig.impl.plan.OperatorPlan<E>
All Implemented Interfaces:
Serializable, Cloneable, Iterable<E>
Direct Known Subclasses:
DotMRPrinter.InnerPlan, LogicalPlan, MROperPlan, PhysicalPlan, RulePlan

public abstract class OperatorPlan<E extends Operator>
extends Object
implements Iterable<E>, Serializable, Cloneable

A generic graphing class for use by LogicalPlan, PhysicalPlan, etc. One important aspect of this package is that it guarantees that once a graph is constructed, manipulations on that graph will maintain the ordering of inputs and outputs for a given node. That is, if a node has two inputs, 0 and 1, it is guaranteed that everytime it asks for its inputs, it will receive them in the same order. This allows operators that need to distinguish their inputs (such as binary operators that need to know left from right) to work without needing to store their inputs themselves. This is an extra burden on the graph package and not in line with the way graphs are generally understood mathematically. But it greatly reducing the need for graph manipulators (such as the validators and optimizers) to understand the internals of various nodes.

See Also:
Serialized Form

Nested Class Summary
static class OperatorPlan.IndexHelper<E>
           
 
Field Summary
protected static org.apache.commons.logging.Log log
           
protected  MultiMap<E,E> mFromEdges
           
protected  Map<OperatorKey,E> mKeys
           
protected  Map<E,OperatorKey> mOps
           
protected  MultiMap<E,E> mSoftFromEdges
           
protected  MultiMap<E,E> mSoftToEdges
           
protected  MultiMap<E,E> mToEdges
           
 
Constructor Summary
OperatorPlan()
           
 
Method Summary
 void add(E op)
          Insert an operator into the plan.
 void addAsLeaf(E leaf)
          Utility method heavily used in the MRCompiler Adds the leaf operator to the plan and connects all existing leaves to the new leaf
 void connect(E from, E to)
          Create an edge between two nodes.
 void createSoftLink(E from, E to)
          Create an soft edge between two nodes.
 boolean disconnect(E from, E to)
          Remove an edge from between two nodes.
 void doInsertBetween(E after, E newNode, E before, boolean rewire)
           
 void dump(PrintStream ps)
           
 void explain(OutputStream out, PrintStream ps)
           
 Map<OperatorKey,E> getKeys()
          Get the map of operator key and associated operators
 List<E> getLeaves()
          Get a list of all nodes in the graph that are leaves.
 E getOperator(OperatorKey opKey)
          Given an OperatorKey, find the associated operator.
 OperatorKey getOperatorKey(E op)
          Given an operator, find its OperatorKey.
 List<E> getPredecessors(E op)
          Find all of the nodes that have edges to the indicated node from themselves.
 List<E> getRoots()
          Get a list of all nodes in the graph that are roots.
 List<E> getSoftLinkPredecessors(E op)
          Find all of the nodes that have soft edges to the indicated node from themselves.
 List<E> getSoftLinkSuccessors(E op)
          Find all of the nodes that have soft edges from the indicated node to themselves.
 List<E> getSuccessors(E op)
          Find all of the nodes that have edges from the indicated node to themselves.
 void insertBetween(E after, E newNode, E before)
          Given two connected nodes add another node between them.
 boolean isSingleLeafPlan()
           
 Iterator<E> iterator()
           
 OperatorPlan<E> merge(OperatorPlan<E> inpPlan)
          Merges the operators in the incoming operPlan with this plan's operators.
 OperatorPlan<E> mergeSharedPlan(OperatorPlan<E> inpPlan)
          Merges the operators in the incoming plan with this plan's operators.
 boolean pathExists(E from, E to)
          A method to check if there is a path from a given node to another node
 void pushAfter(E first, E second, int outputNum)
          Push one operator after another.
 void pushBefore(E first, E second, int inputNum)
          Push one operator in front of another.
 void remove(E op)
          Remove an operator from the plan.
 void removeAndReconnect(E node)
          Remove a node in a way that connects the node's predecessor (if any) with the node's successor (if any).
 void removeAndReconnectMultiSucc(E node)
          Remove a node in a way that connects the node's predecessor (if any) with the node's successors (if any).
 void removeSoftLink(E from, E to)
          Remove an soft edge
 void replace(E oldNode, E newNode)
          Replace an existing node in the graph with a new node.
 int size()
           
 void swap(E first, E second)
          Swap two operators in a plan.
 void trimAbove(E op)
          Trim everything above a given operator.
 void trimBelow(E op)
          Trim everything below a given operator.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mOps

protected Map<E extends Operator,OperatorKey> mOps

mKeys

protected Map<OperatorKey,E extends Operator> mKeys

mFromEdges

protected MultiMap<E extends Operator,E extends Operator> mFromEdges

mToEdges

protected MultiMap<E extends Operator,E extends Operator> mToEdges

mSoftFromEdges

protected MultiMap<E extends Operator,E extends Operator> mSoftFromEdges

mSoftToEdges

protected MultiMap<E extends Operator,E extends Operator> mSoftToEdges

log

protected static final org.apache.commons.logging.Log log
Constructor Detail

OperatorPlan

public OperatorPlan()
Method Detail

getRoots

public List<E> getRoots()
Get a list of all nodes in the graph that are roots. A root is defined to be a node that has no input.


getLeaves

public List<E> getLeaves()
Get a list of all nodes in the graph that are leaves. A leaf is defined to be a node that has no output.


getOperatorKey

public OperatorKey getOperatorKey(E op)
Given an operator, find its OperatorKey.

Parameters:
op - Logical operator.
Returns:
associated OperatorKey

getOperator

public E getOperator(OperatorKey opKey)
Given an OperatorKey, find the associated operator.

Parameters:
opKey - OperatorKey
Returns:
associated operator.

getKeys

public Map<OperatorKey,E> getKeys()
Get the map of operator key and associated operators

Returns:
map of operator key and operators.

add

public void add(E op)
Insert an operator into the plan. This only inserts it as a node in the graph, it does not connect it to any other operators. That should be done as a separate step using connect.

Parameters:
op - Operator to add to the plan.

connect

public void connect(E from,
                    E to)
             throws PlanException
Create an edge between two nodes. The direction of the edge implies data flow.

Parameters:
from - Operator data will flow from.
to - Operator data will flow to.
Throws:
PlanException - if this edge will create multiple inputs for an operator that does not support multiple inputs or create multiple outputs for an operator that does not support multiple outputs.

createSoftLink

public void createSoftLink(E from,
                           E to)
                    throws PlanException
Create an soft edge between two nodes.

Parameters:
from - Operator dependent upon.
to - Operator having the dependency.
Throws:
PlanException - if the nodes is not in plan

removeSoftLink

public void removeSoftLink(E from,
                           E to)
Remove an soft edge

Parameters:
from - Operator dependent upon
to - Operator having the dependency

disconnect

public boolean disconnect(E from,
                          E to)
Remove an edge from between two nodes. Use insertBetween(Operator, Operator, Operator) if disconnect is used in the process of inserting a new node between two nodes by calling disconnect followed by a connect.

Parameters:
from - Operator data would flow from.
to - Operator data would flow to.
Returns:
true if the nodes were connected according to the specified data flow, false otherwise.

remove

public void remove(E op)
Remove an operator from the plan. Any edges that the node has will be removed as well.

Parameters:
op - Operator to remove.

trimBelow

public void trimBelow(E op)
Trim everything below a given operator. The specified operator will NOT be removed.

Parameters:
op - Operator to trim everything after.

trimAbove

public void trimAbove(E op)
Trim everything above a given operator. The specified operator will NOT be removed.

Parameters:
op - Operator to trim everything before.

getPredecessors

public List<E> getPredecessors(E op)
Find all of the nodes that have edges to the indicated node from themselves.

Parameters:
op - Node to look to
Returns:
Collection of nodes.

getSuccessors

public List<E> getSuccessors(E op)
Find all of the nodes that have edges from the indicated node to themselves.

Parameters:
op - Node to look from
Returns:
Collection of nodes.

getSoftLinkPredecessors

public List<E> getSoftLinkPredecessors(E op)
Find all of the nodes that have soft edges to the indicated node from themselves.

Parameters:
op - Node to look to
Returns:
Collection of nodes.

getSoftLinkSuccessors

public List<E> getSoftLinkSuccessors(E op)
Find all of the nodes that have soft edges from the indicated node to themselves.

Parameters:
op - Node to look from
Returns:
Collection of nodes.

pathExists

public boolean pathExists(E from,
                          E to)
A method to check if there is a path from a given node to another node

Parameters:
from - the start node for checking
to - the end node for checking
Returns:
true if path exists, false otherwise

iterator

public Iterator<E> iterator()
Specified by:
iterator in interface Iterable<E extends Operator>

merge

public OperatorPlan<E> merge(OperatorPlan<E> inpPlan)
                                       throws PlanException
Merges the operators in the incoming operPlan with this plan's operators. By merging I mean just making a combined graph with each one as a component It doesn't support merging of shared plans

Parameters:
inpPlan -
Returns:
this pointer
Throws:
PlanException

mergeSharedPlan

public OperatorPlan<E> mergeSharedPlan(OperatorPlan<E> inpPlan)
                                                 throws PlanException
Merges the operators in the incoming plan with this plan's operators. The plans can have shared components.

Parameters:
inpPlan -
Returns:
this pointer
Throws:
PlanException

addAsLeaf

public void addAsLeaf(E leaf)
               throws PlanException
Utility method heavily used in the MRCompiler Adds the leaf operator to the plan and connects all existing leaves to the new leaf

Parameters:
leaf -
Throws:
PlanException

isSingleLeafPlan

public boolean isSingleLeafPlan()

size

public int size()

insertBetween

public void insertBetween(E after,
                          E newNode,
                          E before)
                   throws PlanException
Given two connected nodes add another node between them. 'newNode' will be placed in same position in predecessor list as 'before' (old node).

Parameters:
after - Node to insert this node after
newNode - new node to insert. This node must have already been added to the plan.
before - Node to insert this node before
Throws:
PlanException - if it encounters trouble disconnecting or connecting nodes.

doInsertBetween

public void doInsertBetween(E after,
                            E newNode,
                            E before,
                            boolean rewire)
                     throws PlanException
Throws:
PlanException

replace

public void replace(E oldNode,
                    E newNode)
             throws PlanException
Replace an existing node in the graph with a new node. The new node will be connected to all the nodes the old node was. The old node will be removed. The new node is assumed to have no incoming or outgoing edges

Parameters:
oldNode - Node to be replaced
newNode - Node to add in place of oldNode
Throws:
PlanException

removeAndReconnect

public void removeAndReconnect(E node)
                        throws PlanException
Remove a node in a way that connects the node's predecessor (if any) with the node's successor (if any). This function does not handle the case where the node has multiple predecessors or successors.

Parameters:
node - Node to be removed
Throws:
PlanException - if the node has more than one predecessor or successor.

removeAndReconnectMultiSucc

public void removeAndReconnectMultiSucc(E node)
                                 throws PlanException
Remove a node in a way that connects the node's predecessor (if any) with the node's successors (if any). This function handles the case where the node has *one* predecessor and one or more successors. It replaces the predecessor in same position as node was in each of the successors predecessor list(getPredecessors()), to preserve input ordering for eg, it is used to remove redundant project(*) from plan which will have only one predecessor,but can have multiple success

Parameters:
node - Node to be removed
Throws:
PlanException - if the node has more than one predecessor

dump

public void dump(PrintStream ps)

explain

public void explain(OutputStream out,
                    PrintStream ps)
             throws VisitorException,
                    IOException
Throws:
VisitorException
IOException

swap

public void swap(E first,
                 E second)
          throws PlanException
Swap two operators in a plan. Both of the operators must have single inputs and single outputs.

Parameters:
first - operator
second - operator
Throws:
PlanException - if either operator is not single input and output.

pushBefore

public void pushBefore(E first,
                       E second,
                       int inputNum)
                throws PlanException
Push one operator in front of another. This function is for use when the first operator has multiple inputs. The caller can specify which input of the first operator the second operator should be pushed to.

Parameters:
first - operator, assumed to have multiple inputs.
second - operator, will be pushed in front of first
inputNum - indicates which input of the first operator the second operator will be pushed onto. Numbered from 0.
Throws:
PlanException - if inputNum does not exist for first operator

pushAfter

public void pushAfter(E first,
                      E second,
                      int outputNum)
               throws PlanException
Push one operator after another. This function is for use when the second operator has multiple outputs. The caller can specify which output of the second operator the first operator should be pushed to.

Parameters:
first - operator, assumed to have multiple outputs
second - operator, will be pushed after the first operator
outputNum - indicates which output of the first operator the second operator will be pushed onto. Numbered from 0.
Throws:
PlanException - if outputNum does not exist for first operator


Copyright © ${year} The Apache Software Foundation