- All Known Implementing Classes:
- AlgebraicBigDecimalMathBase, AlgebraicBigIntegerMathBase, AlgebraicByteArrayMathBase, AlgebraicDoubleMathBase, AlgebraicEvalFunc, AlgebraicFloatMathBase, AlgebraicIntMathBase, AlgebraicLongMathBase, AVG, BigDecimalAvg, BigDecimalMax, BigDecimalMin, BigDecimalSum, BigIntegerAvg, BigIntegerMax, BigIntegerMin, BigIntegerSum, BuildBloom, COR, COR, COUNT, COUNT_STAR, COV, COV, DateTimeMax, DateTimeMin, Distinct, DoubleAvg, DoubleMax, DoubleMin, DoubleSum, ExtremalTupleByNthField, FloatAvg, FloatMax, FloatMin, FloatSum, GroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.BigDecimalGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.BigIntegerGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.BooleanGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.ChararrayGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DataBagGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DataByteArrayGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DateTimeGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DoubleGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.FloatGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.IntegerGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.LongGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.MapGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.TupleGroovyAlgebraicEvalFunc, IntAvg, IntMax, IntMin, IntSum, JrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.BagJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.ChararrayJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.DataByteArrayJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.DoubleJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.FloatJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.IntegerJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.LongJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.MapJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.TupleJrubyAlgebraicEvalFunc, LongAvg, LongMax, LongMin, LongSum, MAX, MaxTupleBy1stField, MIN, StringMax, StringMin, SUM, Top, TOP
public interface Algebraic
An interface to declare that an EvalFunc's
calculation can be decomposed into initial, intermediate, and final steps.
More formally, suppose we have to compute a function f over a bag X. In general, we need the entire X
before we can make any progress on f. However, some functions are algebraic, e.g. SUM. In
these cases, you can apply an initial function f_init to subsets of X to get partial results.
You can then combine partial results from different subsets of X using an intermediate function
f_intermed. To get the final answer, the partial results are combined by invoking a final
function f_final. For the function SUM, f_init, f_intermed, and f_final are all SUM.
See the source code of the builtin AVG function, where the three steps are not all the same function, for a fuller example of how Algebraic is implemented.
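AVG is the classic case where f_init, f_intermed, and f_final differ: an average of averages is generally wrong, so the partial result has to carry a (sum, count) pair. The plain-Java sketch below shows that idea under assumed, hypothetical names (`AvgDecomposition`, `Partial`); it does not reproduce Pig's actual AVG classes, although the builtin follows the same (sum, count) approach.

```java
import java.util.List;

// Hypothetical, Pig-free sketch of an algebraic average.
// The partial result is a (sum, count) pair, not an average.
final class AvgDecomposition {

    // Partial result passed from f_init to f_intermed to f_final.
    static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
    }

    // f_init: summarize one subset of X as (sum, count).
    static Partial init(List<Double> subset) {
        double sum = 0;
        for (double v : subset) sum += v;
        return new Partial(sum, subset.size());
    }

    // f_intermed: merge partials into one partial of the same shape.
    static Partial intermed(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // f_final: merge partials, then divide; returns null when there is no input
    // (an assumed convention for this sketch).
    static Double fin(List<Partial> partials) {
        Partial total = intermed(partials);
        return total.count == 0 ? null : total.sum / total.count;
    }

    public static void main(String[] args) {
        Partial a = init(List.of(1.0, 2.0)); // (3.0, 2)
        Partial b = init(List.of(6.0));      // (6.0, 1)
        System.out.println(fin(List.of(a, b))); // 3.0
    }
}
```

Averaging the two subset averages instead (1.5 and 6.0) would give 3.75 rather than 3.0, which is why the intermediate form keeps the sum and the count separately.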
When an eval function implements this interface, Pig will attempt to use MapReduce's combiner.
The initial function is called in the map phase and is passed a single tuple. The
intermediate function is called zero or more times in the combiner phase, and the final
function is called once for each group in the reduce phase. The results must therefore be the same
whether the intermediate function is called zero, one, or more times, because Hadoop makes no guarantee
about how many times the combiner will be called in a job.
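The requirement that the result not depend on how often the combiner runs can be checked with a small simulation. The sketch below is plain Java, not Hadoop or Pig: it applies a SUM-like f_init per input value (map phase), runs a hypothetical combiner pass that merges adjacent pairs of partials a chosen number of times, and then applies f_final (reduce phase). The pairwise grouping is an arbitrary assumption made only to vary the shape of the intermediate results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of map -> combine (0..n times) -> reduce for a SUM-like function.
final class CombinerInvariance {

    // f_init: called once per input value in the simulated map phase.
    static long init(long value) { return value; }

    // f_intermed: merges a group of partials in the simulated combiner phase.
    static long intermed(List<Long> partials) {
        long s = 0;
        for (long p : partials) s += p;
        return s;
    }

    // f_final: merges whatever partials remain in the simulated reduce phase.
    static long fin(List<Long> partials) {
        return intermed(partials);
    }

    // One combiner pass: merge partials two at a time (arbitrary grouping).
    static List<Long> combineOnce(List<Long> partials) {
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < partials.size(); i += 2) {
            int end = Math.min(i + 2, partials.size());
            out.add(intermed(partials.subList(i, end)));
        }
        return out;
    }

    static long run(List<Long> input, int combinerRuns) {
        List<Long> partials = new ArrayList<>();
        for (long v : input) partials.add(init(v));                           // map phase
        for (int r = 0; r < combinerRuns; r++) partials = combineOnce(partials); // combiner, 0..n times
        return fin(partials);                                                  // reduce phase
    }

    public static void main(String[] args) {
        List<Long> input = List.of(4L, 8L, 15L, 16L, 23L, 42L);
        for (int runs = 0; runs <= 3; runs++) {
            System.out.println(runs + " combiner pass(es): " + run(input, runs)); // 108 each time
        }
    }
}
```

A function whose intermediate step were, say, "average the partials" would fail this check, since its answer would change with the number of combiner passes.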
- Get the initial function.
- The name of the function f_init. f_init should be an eval func.
Its exec() method must return a Tuple.
- Get the intermediate function.
- The name of the function f_intermed. f_intermed should be an eval func.
Its exec() method must return a Tuple.
- Get the final function.
- The name of the function f_final. f_final should be an eval func parameterized by
the same type as the eval func implementing this interface.
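The three methods described above return function names rather than function objects. The sketch below shows that shape in a self-contained way: it declares a local stand-in interface, `AlgebraicSketch`, with the same three methods as Pig's `org.apache.pig.Algebraic` (`getInitial`, `getIntermed`, `getFinal`, each returning a `String`), and a hypothetical `MySum` whose nested classes play the three roles. The `EvalFunc` base class and the Tuple-returning `exec()` methods are deliberately omitted, so these nested classes are placeholders, not working Pig UDFs.

```java
// Local stand-in mirroring the three methods of Pig's Algebraic interface.
interface AlgebraicSketch {
    String getInitial();
    String getIntermed();
    String getFinal();
}

// Hypothetical implementation; in real Pig code the outer class and the nested
// classes would extend EvalFunc, which is omitted here to stay dependency-free.
final class MySum implements AlgebraicSketch {
    static final class Initial { }   // would play the role of f_init
    static final class Intermed { }  // would play the role of f_intermed
    static final class Final { }     // would play the role of f_final

    // Each method returns the fully qualified class name of the step's function.
    @Override public String getInitial()  { return Initial.class.getName(); }
    @Override public String getIntermed() { return Intermed.class.getName(); }
    @Override public String getFinal()    { return Final.class.getName(); }

    public static void main(String[] args) throws Exception {
        MySum f = new MySum();
        // A framework that holds only the names can load the classes reflectively.
        for (String name : new String[] {f.getInitial(), f.getIntermed(), f.getFinal()}) {
            System.out.println(name + " -> " + Class.forName(name).getSimpleName());
        }
    }
}
```

Returning names keeps the interface lightweight: the framework can instantiate each step separately in the map, combine, and reduce phases.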
Copyright © 2007-2012 The Apache Software Foundation