org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators
Class POSplit

java.lang.Object
  extended by org.apache.pig.impl.plan.Operator<PhyPlanVisitor>
      extended by org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
          extended by org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
All Implemented Interfaces:
Serializable, Cloneable, Comparable<Operator>, Illustrable

public class POSplit
extends PhysicalOperator

The MapReduce Split operator.

The assumption here is that the logical-to-physical translation will create this dummy operator with just the name of the file in which the input branch will be stored and from which it will be loaded. The translation should also make sure that the appropriate filter operators are configured as outputs of this operator, using the conditions specified in the LOSplit. So LOSplit will be converted into:

    |        |           |
 Filter1  Filter2 ... Filter3
    |        |    ...    |
    |        |    ...    |
    ---- POSplit -... ----

This is different from the existing implementation, where the POSplit writes to side files after filtering and then loads the appropriate file.

The approach followed here is as good as the old approach, if not better, in many cases because inputs can be attached to operators directly (attachInput). One optimization that follows: if multiple loads read the same file, they can be merged into one, and the operators that take input from that load can be stored. When the mapPlan executes, this can be used to read the file only once and attach each resulting tuple as input to all the operators that take input from this load. In some cases, where the conditions are exclusive and some outputs are ignored, this approach can be worse. However, it makes the Split easier to manage and allows the data stored by the split job to be reused whenever necessary.
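The read-once behaviour described above can be illustrated with a small, self-contained sketch. It does not use Pig's classes: SplitSketch, Branch and fanOut are hypothetical names chosen for illustration, and tuples are modelled as plain integers. Each record of the shared input is read a single time and offered to every filter branch, instead of each branch loading the file again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model, not Pig's implementation: one shared input and
// several filter branches that each keep the records they accept.
class SplitSketch {

    // A branch pairs a filter condition with the records that passed it.
    static final class Branch {
        final Predicate<Integer> condition;
        final List<Integer> accepted = new ArrayList<>();

        Branch(Predicate<Integer> condition) {
            this.condition = condition;
        }
    }

    // Reads each input record exactly once and offers it to every branch.
    // Returns the number of records read from the shared input.
    static int fanOut(Iterable<Integer> input, List<Branch> branches) {
        int reads = 0;
        for (Integer record : input) {
            reads++;
            for (Branch branch : branches) {
                if (branch.condition.test(record)) {
                    branch.accepted.add(record);
                }
            }
        }
        return reads;
    }

    public static void main(String[] args) {
        Branch small = new Branch(x -> x < 3);
        Branch even = new Branch(x -> x % 2 == 0);
        int reads = fanOut(Arrays.asList(1, 2, 3, 4), Arrays.asList(small, even));
        System.out.println(reads + " reads, small=" + small.accepted + ", even=" + even.accepted);
    }
}
```

Note that the conditions in this sketch overlap (the record 2 lands in both branches). When conditions are exclusive and some branches are never consumed, every record is still tested against every branch, which is the cost the description above warns about.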

See Also:
Serialized Form

Field Summary
 
Fields inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
alias, dummyBag, dummyBool, dummyDBA, dummyDouble, dummyFloat, dummyInt, dummyLong, dummyMap, dummyString, dummyTuple, illustrator, input, inputAttached, inputs, lineageTracer, outputs, parentPlan, pigLogger, reporter, requestedParallelism, res, resultType
 
Fields inherited from class org.apache.pig.impl.plan.Operator
mKey
 
Constructor Summary
POSplit(OperatorKey k)
          Constructs an operator with the specified key
POSplit(OperatorKey k, int rp)
          Constructs an operator with the specified key and degree of parallelism
POSplit(OperatorKey k, int rp, List<PhysicalOperator> inp)
          Constructs an operator with the specified key, degree of parallelism and inputs
POSplit(OperatorKey k, List<PhysicalOperator> inp)
          Constructs an operator with the specified key and inputs
 
Method Summary
 void addPlan(PhysicalPlan inPlan)
          Appends the specified plan to the end of the nested input plan list
 Result getNext(Tuple t)
           
 List<PhysicalPlan> getPlans()
          Returns the list of nested plans.
 FileSpec getSplitStore()
          Returns the name of the file associated with this operator
 Tuple illustratorMarkup(Object in, Object out, int eqClassIndex)
          Marks up the input tuple so that it can be illustrated
 String name()
           
 void removePlan(PhysicalPlan plan)
          Removes plan from the nested input plan list
 void setSplitStore(FileSpec splitStore)
          Sets the name of the file associated with this operator
 boolean supportsMultipleInputs()
          Indicates whether this operator supports multiple inputs.
 boolean supportsMultipleOutputs()
          Indicates whether this operator supports multiple outputs.
 void visit(PhyPlanVisitor v)
          Visit this node with the provided visitor.
 
Methods inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
attachInput, clone, cloneHelper, detachInput, getAlias, getAliasString, getDummy, getIllustrator, getInputs, getLogger, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getNext, getPigLogger, getRequestedParallelism, getResultType, isAccumStarted, isAccumulative, isBlocking, isInputAttached, processInput, reset, setAccumEnd, setAccumStart, setAccumulative, setAlias, setIllustrator, setInputs, setParentPlan, setPigLogger, setReporter, setRequestedParallelism, setResultType
 
Methods inherited from class org.apache.pig.impl.plan.Operator
compareTo, equals, getOperatorKey, getProjectionMap, hashCode, regenerateProjectionMap, rewire, toString, unsetProjectionMap
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

POSplit

public POSplit(OperatorKey k)
Constructs an operator with the specified key

Parameters:
k - the operator key

POSplit

public POSplit(OperatorKey k,
               int rp)
Constructs an operator with the specified key and degree of parallelism

Parameters:
k - the operator key
rp - the degree of parallelism requested

POSplit

public POSplit(OperatorKey k,
               List<PhysicalOperator> inp)
Constructs an operator with the specified key and inputs

Parameters:
k - the operator key
inp - the inputs that this operator will read data from

POSplit

public POSplit(OperatorKey k,
               int rp,
               List<PhysicalOperator> inp)
Constructs an operator with the specified key, degree of parallelism and inputs

Parameters:
k - the operator key
rp - the degree of parallelism requested
inp - the inputs that this operator will read data from

Method Detail

visit

public void visit(PhyPlanVisitor v)
           throws VisitorException
Description copied from class: Operator
Visit this node with the provided visitor. This should only be called by the visitor class itself, never directly.

Specified by:
visit in class PhysicalOperator
Parameters:
v - Visitor to visit with.
Throws:
VisitorException - if the visitor has a problem.
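The note that visit should only be called by the visitor itself reflects the usual double-dispatch arrangement of the visitor pattern. The following is a minimal sketch of that arrangement under stated assumptions: Visitor, Node, SplitNode, walk and visitSplit are hypothetical stand-ins defined here and are not Pig's PhyPlanVisitor or POSplit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a plan visitor and a split operator.
class VisitSketch {

    static class Visitor {
        final List<String> seen = new ArrayList<>();

        // Entry point for callers; it asks the node to dispatch back.
        void walk(Node node) {
            node.visit(this);
        }

        // Type-specific callback reached through the node's visit method.
        void visitSplit(SplitNode split) {
            seen.add("split:" + split.name);
        }
    }

    interface Node {
        // Intended to be invoked from Visitor.walk, not by client code.
        void visit(Visitor v);
    }

    static final class SplitNode implements Node {
        final String name;

        SplitNode(String name) {
            this.name = name;
        }

        @Override
        public void visit(Visitor v) {
            v.visitSplit(this); // double dispatch to the matching callback
        }
    }

    public static void main(String[] args) {
        Visitor v = new Visitor();
        v.walk(new SplitNode("s1"));
        System.out.println(v.seen);
    }
}
```

Client code in this sketch calls walk on the visitor; the node's visit method exists only so the visitor can reach the callback that matches the node's concrete type.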

name

public String name()
Specified by:
name in class Operator<PhyPlanVisitor>

supportsMultipleInputs

public boolean supportsMultipleInputs()
Description copied from class: Operator
Indicates whether this operator supports multiple inputs.

Specified by:
supportsMultipleInputs in class Operator<PhyPlanVisitor>
Returns:
true if it does, otherwise false.

supportsMultipleOutputs

public boolean supportsMultipleOutputs()
Description copied from class: Operator
Indicates whether this operator supports multiple outputs.

Specified by:
supportsMultipleOutputs in class Operator<PhyPlanVisitor>
Returns:
true if it does, otherwise false.

getSplitStore

public FileSpec getSplitStore()
Returns the name of the file associated with this operator

Returns:
the FileSpec associated with this operator

setSplitStore

public void setSplitStore(FileSpec splitStore)
Sets the name of the file associated with this operator

Parameters:
splitStore - the FileSpec used to store the data

getPlans

public List<PhysicalPlan> getPlans()
Returns the list of nested plans.

Returns:
the list of the nested plans
See Also:
PlanPrinter

addPlan

public void addPlan(PhysicalPlan inPlan)
Appends the specified plan to the end of the nested input plan list

Parameters:
inPlan - plan to be appended to the list

removePlan

public void removePlan(PhysicalPlan plan)
Removes plan from the nested input plan list

Parameters:
plan - plan to be removed
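The list semantics documented for getPlans, addPlan and removePlan (append to the end, remove a given plan) can be pictured with a toy model. This is only a sketch: PlanListSketch is a hypothetical class, plans are represented as plain strings rather than PhysicalPlan objects, and returning the live list from getPlans is a choice made for this example, not a statement about POSplit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a nested plan list; plans are plain strings here.
class PlanListSketch {
    private final List<String> plans = new ArrayList<>();

    // Appends to the end, so insertion order is preserved.
    void addPlan(String plan) {
        plans.add(plan);
    }

    // Removes the first matching plan; does nothing if it is absent.
    void removePlan(String plan) {
        plans.remove(plan);
    }

    // Returns the backing list (an assumption made for this sketch).
    List<String> getPlans() {
        return plans;
    }

    public static void main(String[] args) {
        PlanListSketch split = new PlanListSketch();
        split.addPlan("filter1");
        split.addPlan("filter2");
        split.addPlan("filter3");
        split.removePlan("filter2");
        System.out.println(split.getPlans());
    }
}
```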

getNext

public Result getNext(Tuple t)
               throws ExecException
Overrides:
getNext in class PhysicalOperator
Throws:
ExecException

illustratorMarkup

public Tuple illustratorMarkup(Object in,
                               Object out,
                               int eqClassIndex)
Description copied from interface: Illustrable
Marks up the input tuple so that it can be illustrated

Parameters:
in - input tuple
out - output tuple before wrapped in ExampleTuple
eqClassIndex - index into equivalence classes in illustrator
Returns:
tuple


Copyright © 2007-2012 The Apache Software Foundation