public class POSplit extends PhysicalOperator
The assumption here is that the logical-to-physical translation will create this dummy operator with just the file name under which the input branch will be stored and from which it will be loaded. The translation should also make sure that the appropriate filter operators are configured as outputs of this operator, using the conditions specified in the LOSplit. So an LOSplit will be converted into a POSplit whose outputs are the filters, each applying one condition:

        |          |                 |
     Filter1    Filter2    ...    Filter3
        |          |                 |
        ---------- POSplit ----------

This is different from the existing implementation, where POSplit writes to side files after filtering and then loads the appropriate file.
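A minimal sketch of this translation, assuming each filter condition can be modelled as a java.util.function.Predicate. The names SplitSketch and split are hypothetical and are not part of Pig. The input is produced once, every record is offered to every filter branch, and each branch keeps only the records that satisfy its own condition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of LOSplit -> POSplit + filters; not Pig's API.
final class SplitSketch {
    private SplitSketch() {}

    // Offers every input row to every filter branch. Each branch keeps the
    // rows that satisfy its own condition, so one row may reach several
    // branches when the conditions are not mutually exclusive.
    static <T> List<List<T>> split(List<T> input, List<Predicate<T>> conditions) {
        List<List<T>> outputs = new ArrayList<>();
        for (int i = 0; i < conditions.size(); i++) {
            outputs.add(new ArrayList<>());
        }
        for (T row : input) {                               // input is read once
            for (int i = 0; i < conditions.size(); i++) {
                if (conditions.get(i).test(row)) {          // per-branch filter
                    outputs.get(i).add(row);
                }
            }
        }
        return outputs;
    }
}
```

With input [1, 2, 3, 4] and the conditions x > 2 and x % 2 == 0, the first branch receives [3, 4] and the second [2, 4]; the row 4 reaches both branches because the conditions are not exclusive.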
The approach followed here is as good as the old side-file approach, if not better in many cases, because of the availability of attachInput. One optimization that this enables: if there are multiple loads that read the same file, they can be merged into one, and the operators that take input from that load can be recorded. When the mapPlan executes, it can then read the file only once and attach each resulting tuple as input to all the operators that take input from this load. In some cases, where the conditions are mutually exclusive and some outputs are ignored, this approach can be worse. But it makes the Split easier to manage, and it allows the data stored by the split job to be reused whenever necessary.
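The load-merging optimization can be sketched as follows. MergedLoadSketch, addLoad, and run are hypothetical names; each consuming operator is modelled as a Consumer<String>, and the file reader is passed in as a function so the sketch does not touch the file system. Loads of the same file name are grouped, each distinct file is read once, and every record is handed to all operators registered for that file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of merging loads of the same file; not Pig's API.
final class MergedLoadSketch {
    // File name -> every operator (modelled as a Consumer) that loads that file.
    private final Map<String, List<Consumer<String>>> consumersByFile = new LinkedHashMap<>();

    // Registers one load of fileName; loads of the same file share one entry.
    void addLoad(String fileName, Consumer<String> consumer) {
        consumersByFile.computeIfAbsent(fileName, f -> new ArrayList<>()).add(consumer);
    }

    // Reads each distinct file once and attaches every record to all of its
    // consumers. Returns the number of file reads performed.
    int run(Function<String, List<String>> readFile) {
        int reads = 0;
        for (Map.Entry<String, List<Consumer<String>>> entry : consumersByFile.entrySet()) {
            List<String> rows = readFile.apply(entry.getKey());
            reads++;
            for (String row : rows) {
                for (Consumer<String> consumer : entry.getValue()) {
                    consumer.accept(row);
                }
            }
        }
        return reads;
    }
}
```

Registering two loads of the same file and calling run performs a single read, and both consumers see every record.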
Nested classes/interfaces inherited from class PhysicalOperator: PhysicalOperator.OriginalLocation

Fields inherited from class PhysicalOperator: alias, illustrator, input, inputAttached, inputs, lineageTracer, mBagFactory, mTupleFactory, outputs, parentPlan, pigLogger, reporter, requestedParallelism, res, RESULT_EMPTY, RESULT_EOP, resultType
Constructor | Description
---|---
POSplit(OperatorKey k) | Constructs an operator with the specified key
POSplit(OperatorKey k, int rp) | Constructs an operator with the specified key and degree of parallelism
POSplit(OperatorKey k, int rp, List<PhysicalOperator> inp) | Constructs an operator with the specified key, degree of parallelism and inputs
POSplit(OperatorKey k, List<PhysicalOperator> inp) | Constructs an operator with the specified key and inputs
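The four constructor overloads form a telescoping set. The following is a minimal sketch, assuming the shorter constructors delegate to the most general one and that -1 means "no parallelism requested"; both are assumptions, not taken from Pig's source. OperatorKeySketch and SplitOperatorSketch are hypothetical stand-ins for OperatorKey and POSplit.

```java
import java.util.List;

// Hypothetical stand-in for Pig's OperatorKey.
final class OperatorKeySketch {
    final String scope;
    final long id;

    OperatorKeySketch(String scope, long id) {
        this.scope = scope;
        this.id = id;
    }
}

// Hypothetical stand-in showing the shape of POSplit's constructor overloads.
final class SplitOperatorSketch {
    // Assumption: -1 means "no specific parallelism requested".
    static final int NO_PARALLELISM = -1;

    final OperatorKeySketch key;
    final int requestedParallelism;
    final List<Object> inputs; // null when no inputs are supplied

    SplitOperatorSketch(OperatorKeySketch k) { this(k, NO_PARALLELISM, null); }

    SplitOperatorSketch(OperatorKeySketch k, int rp) { this(k, rp, null); }

    SplitOperatorSketch(OperatorKeySketch k, List<Object> inp) { this(k, NO_PARALLELISM, inp); }

    // The most general overload; the shorter ones delegate here.
    SplitOperatorSketch(OperatorKeySketch k, int rp, List<Object> inp) {
        this.key = k;
        this.requestedParallelism = rp;
        this.inputs = inp;
    }
}
```

Funnelling every overload through one constructor keeps the field initialization in a single place.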
Modifier and Type | Method | Description
---|---|---
void | addPlan(PhysicalPlan inPlan) | Appends the specified plan to the end of the nested input plan list
POSplit | clone() | Makes a copy of this operator.
Result | getNextTuple() |
List<PhysicalPlan> | getPlans() | Returns the list of nested plans.
FileSpec | getSplitStore() | Returns the FileSpec naming the file associated with this operator
Tuple | illustratorMarkup(Object in, Object out, int eqClassIndex) | Marks up the input tuple so that it can be illustrated
String | name() |
void | removePlan(PhysicalPlan plan) | Removes plan from the nested input plan list
void | setSplitStore(FileSpec splitStore) | Sets the FileSpec naming the file associated with this operator
boolean | supportsMultipleInputs() | Indicates whether this operator supports multiple inputs.
boolean | supportsMultipleOutputs() | Indicates whether this operator supports multiple outputs.
void | visit(PhyPlanVisitor v) | Visit this node with the provided visitor.
Methods inherited from class PhysicalOperator: addOriginalLocation, addOriginalLocation, attachInput, cloneHelper, clonePlans, copyAliasFrom, detachInput, getAlias, getAliasString, getIllustrator, getInputs, getLogger, getNext, getNextBigDecimal, getNextBigInteger, getNextBoolean, getNextDataBag, getNextDataByteArray, getNextDateTime, getNextDouble, getNextFloat, getNextInteger, getNextLong, getNextMap, getNextString, getOriginalLocations, getParentPlan, getPigLogger, getReporter, getRequestedParallelism, getResultType, isAccumStarted, isAccumulative, isBlocking, isEndOfAllInput, isInputAttached, processInput, reset, setAccumEnd, setAccumStart, setAccumulative, setIllustrator, setInputs, setParentPlan, setPigLogger, setReporter, setRequestedParallelism, setResultType, staticDataCleanup

Methods inherited from class Operator: compareTo, equals, getOperatorKey, getProjectionMap, hashCode, regenerateProjectionMap, rewire, toString, unsetProjectionMap
Constructor Detail

public POSplit(OperatorKey k)
Parameters:
k - the operator key

public POSplit(OperatorKey k, int rp)
Parameters:
k - the operator key
rp - the degree of parallelism requested

public POSplit(OperatorKey k, List<PhysicalOperator> inp)
Parameters:
k - the operator key
inp - the inputs that this operator will read data from

public POSplit(OperatorKey k, int rp, List<PhysicalOperator> inp)
Parameters:
k - the operator key
rp - the degree of parallelism requested
inp - the inputs that this operator will read data from

Method Detail

public void visit(PhyPlanVisitor v) throws VisitorException
Description copied from class: Operator
Visit this node with the provided visitor.
Specified by:
visit in class PhysicalOperator
Parameters:
v - Visitor to visit with.
Throws:
VisitorException - if the visitor has a problem.

public String name()
Specified by:
name in class Operator<PhyPlanVisitor>
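visit(PhyPlanVisitor) is the entry point of the visitor pattern used to walk physical plans. The following sketch shows the double-dispatch idea with hypothetical stand-in classes (PlanVisitorSketch, PhysicalNode, SplitNode, FilterNode, SplitCounter); it assumes each operator forwards to a visitor method specific to its own type, and it omits the checked VisitorException for brevity.

```java
// Hypothetical stand-ins for PhyPlanVisitor and physical operators; not Pig classes.
abstract class PlanVisitorSketch {
    // Default no-op callbacks; concrete visitors override what they need.
    void visitSplit(SplitNode op) {}
    void visitFilter(FilterNode op) {}
}

abstract class PhysicalNode {
    // Double dispatch: each node calls the visitor method for its own type.
    abstract void visit(PlanVisitorSketch v);
}

final class SplitNode extends PhysicalNode {
    @Override
    void visit(PlanVisitorSketch v) { v.visitSplit(this); }
}

final class FilterNode extends PhysicalNode {
    @Override
    void visit(PlanVisitorSketch v) { v.visitFilter(this); }
}

// Example visitor that counts split operators and ignores everything else.
final class SplitCounter extends PlanVisitorSketch {
    int splits;

    @Override
    void visitSplit(SplitNode op) { splits++; }
}
```

Walking a plan of one split and two filters with SplitCounter leaves splits at 1, because only SplitNode routes to visitSplit.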
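public boolean supportsMultipleInputs()
Description copied from class: Operator
Indicates whether this operator supports multiple inputs.
Specified by:
supportsMultipleInputs in class Operator<PhyPlanVisitor>

public boolean supportsMultipleOutputs()
Description copied from class: Operator
Indicates whether this operator supports multiple outputs.
Specified by:
supportsMultipleOutputs in class Operator<PhyPlanVisitor>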
public FileSpec getSplitStore()

public void setSplitStore(FileSpec splitStore)
Parameters:
splitStore - the FileSpec used to store the data

public List<PhysicalPlan> getPlans()
See Also:
PlanPrinter
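The split store and the nested plans (getPlans, together with addPlan and removePlan) are plain bookkeeping. In this hypothetical sketch a String stands in for both FileSpec and PhysicalPlan; addPlan appends to the end of the list, removePlan removes the first matching entry, and getPlans returns a read-only view. Returning a read-only view is a choice made for the sketch, not documented Pig behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical bookkeeping sketch; String stands in for FileSpec and PhysicalPlan.
final class NestedPlansSketch {
    private final List<String> plans = new ArrayList<>();
    private String splitStore;

    String getSplitStore() { return splitStore; }

    void setSplitStore(String splitStore) { this.splitStore = splitStore; }

    // Appends to the end, so plans keep the order in which they were added.
    void addPlan(String plan) { plans.add(plan); }

    // Removes the first matching plan; the others keep their relative order.
    void removePlan(String plan) { plans.remove(plan); }

    // Read-only view (a choice of this sketch) for callers such as a plan printer.
    List<String> getPlans() { return Collections.unmodifiableList(plans); }
}
```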
public void addPlan(PhysicalPlan inPlan)
Parameters:
inPlan - plan to be appended to the list

public void removePlan(PhysicalPlan plan)
Parameters:
plan - plan to be removed

public Result getNextTuple() throws ExecException
Overrides:
getNextTuple in class PhysicalOperator
Throws:
ExecException
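getNextTuple follows a pull model: a consumer keeps calling it until the operator reports end of processing (compare the inherited RESULT_EOP field). The following is a minimal sketch of that loop, assuming a result carries a status plus a payload. PullStatus, ResultSketch, ListSource, and PullDriver are hypothetical stand-ins, not Pig's Result or POStatus classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for a status-carrying result; not Pig's Result/POStatus.
enum PullStatus { OK, EOP }

final class ResultSketch {
    final PullStatus status;
    final Object result;

    ResultSketch(PullStatus status, Object result) {
        this.status = status;
        this.result = result;
    }
}

// A source operator: returns one tuple per call, then EOP on every later call.
final class ListSource {
    private final Iterator<?> it;

    ListSource(List<?> tuples) { this.it = tuples.iterator(); }

    ResultSketch getNextTuple() {
        return it.hasNext()
                ? new ResultSketch(PullStatus.OK, it.next())
                : new ResultSketch(PullStatus.EOP, null);
    }
}

final class PullDriver {
    // Keeps pulling until end of processing and collects the OK payloads.
    static List<Object> drain(ListSource op) {
        List<Object> out = new ArrayList<>();
        for (ResultSketch r = op.getNextTuple(); r.status != PullStatus.EOP; r = op.getNextTuple()) {
            out.add(r.result);
        }
        return out;
    }
}
```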
public POSplit clone() throws CloneNotSupportedException
Description copied from class: PhysicalOperator
Make a copy of this operator.
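When an operator owns nested plans, a useful clone copies those plans rather than sharing them. The following hypothetical sketch (PlanSketch and CloneableSplitSketch are stand-ins, not Pig classes) shallow-copies the operator via super.clone() and then clones each nested plan, so changes to the copy's plans do not leak into the original. Whether POSplit.clone() copies its plans exactly this way is not stated in this reference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a nested PhysicalPlan.
final class PlanSketch implements Cloneable {
    final List<String> operators = new ArrayList<>();

    @Override
    public PlanSketch clone() {
        PlanSketch copy = new PlanSketch();
        copy.operators.addAll(operators); // Strings are immutable, so sharing them is safe
        return copy;
    }
}

// Hypothetical stand-in for an operator that owns nested plans.
final class CloneableSplitSketch implements Cloneable {
    List<PlanSketch> plans = new ArrayList<>();

    @Override
    public CloneableSplitSketch clone() throws CloneNotSupportedException {
        CloneableSplitSketch copy = (CloneableSplitSketch) super.clone(); // shallow copy
        List<PlanSketch> copiedPlans = new ArrayList<>();
        for (PlanSketch plan : plans) {
            copiedPlans.add(plan.clone()); // deep-copy each nested plan
        }
        copy.plans = copiedPlans;
        return copy;
    }
}
```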
public Tuple illustratorMarkup(Object in, Object out, int eqClassIndex)
Description copied from interface: Illustrable
Marks up the input tuple so that it can be illustrated.
Parameters:
in - input tuple
out - output tuple before it is wrapped in an ExampleTuple
eqClassIndex - index into the equivalence classes in the illustrator

Copyright © 2007-2017 The Apache Software Foundation