- All Known Implementing Classes:
- AlgebraicBigDecimalMathBase, AlgebraicBigIntegerMathBase, AlgebraicByteArrayMathBase, AlgebraicDoubleMathBase, AlgebraicEvalFunc, AlgebraicFloatMathBase, AlgebraicIntMathBase, AlgebraicLongMathBase, AVG, BigDecimalAvg, BigDecimalMax, BigDecimalMin, BigDecimalSum, BigIntegerAvg, BigIntegerMax, BigIntegerMin, BigIntegerSum, BuildBloom, COR, COR, COUNT, COUNT_STAR, COV, COV, DateTimeMax, DateTimeMin, Distinct, DoubleAvg, DoubleMax, DoubleMin, DoubleSum, ExtremalTupleByNthField, FloatAvg, FloatMax, FloatMin, FloatSum, GroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.BigDecimalGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.BigIntegerGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.BooleanGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.ChararrayGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DataBagGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DataByteArrayGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DateTimeGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.DoubleGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.FloatGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.IntegerGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.LongGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.MapGroovyAlgebraicEvalFunc, GroovyAlgebraicEvalFunc.TupleGroovyAlgebraicEvalFunc, IntAvg, IntMax, IntMin, IntSum, JrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.BagJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.ChararrayJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.DataByteArrayJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.DoubleJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.FloatJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.IntegerJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.LongJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.MapJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.TupleJrubyAlgebraicEvalFunc, LongAvg, LongMax, LongMin, LongSum, MAX, MaxTupleBy1stField, MIN, StringMax, StringMin, SUM, Top, TOP
@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Algebraic
An interface to declare that an EvalFunc's
calculation can be decomposed into initial, intermediate, and final steps.
More formally, suppose we have to compute a function f over a bag X. In general, we need to know the entire X
before we can make any progress on f. However, some functions are algebraic, e.g. SUM. In
these cases, you can apply an initial function f_init to subsets of X to get partial results.
You can then combine partial results from different subsets of X using an intermediate function
f_intermed. To get the final answer, the remaining partial results are combined by invoking a final
function f_final. For the function SUM, f_init, f_intermed, and f_final are all SUM.
See the code for builtin AVG to get a better idea of how algebraic works.
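The decomposition described above can be sketched in plain Java for an average. This is only an illustration of the idea, not Pig's AVG implementation: the class name AvgDecompositionSketch, the Partial holder, and the method names init, intermed, and fin are all hypothetical, and no Pig types are used. The point it shows is that the partial result for an average has to carry both a sum and a count, so that partial results from different subsets can be merged.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; this is not Pig's AVG code.
final class AvgDecompositionSketch {

    // Partial result: a running sum plus the number of values it covers.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // f_init: turn a single input value into a partial result.
    static Partial init(double value) {
        return new Partial(value, 1);
    }

    // f_intermed: merge several partial results into one partial result.
    static Partial intermed(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // f_final: merge the remaining partial results and produce the answer.
    // Empty input is not handled in this sketch.
    static double fin(List<Partial> partials) {
        Partial total = intermed(partials);
        return total.sum / total.count;
    }

    public static void main(String[] args) {
        // Two subsets of the bag X = {1, 2, 3, 4}, processed independently.
        Partial left = intermed(Arrays.asList(init(1), init(2)));
        Partial right = intermed(Arrays.asList(init(3), init(4)));
        System.out.println(fin(Arrays.asList(left, right)));
    }
}
```

Averaging the two subset averages directly would only work when the subsets happen to be the same size, which is why the sketch keeps the count alongside the sum until the final step.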
When eval functions implement this interface, Pig will attempt to use MapReduce's combiner.
The initial function will be called in the map phase and be passed a single tuple. The
intermediate function will be called zero or more times in the combiner phase. The final
function will be called once in the reduce phase. It is important that the results be the same
whether the intermediate function is called zero, one, or more times. Hadoop makes no guarantees
about how many times the combiner will be called in a job.
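The requirement that the result not depend on the number of combiner invocations can be checked with a small simulation. The sketch below is plain Java and makes several assumptions for illustration: the class CombinerInvarianceSketch and its methods are hypothetical, the "combiner" is modeled as repeated passes over chunks of two partial results, and no Hadoop or Pig classes are involved. It uses a COUNT-like function and contrasts a correct intermediate step (sum the partial counts) with a deliberately broken one (count the inputs again).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation; real combiner scheduling is decided by Hadoop.
final class CombinerInvarianceSketch {

    // f_init for a COUNT-like function: each input tuple contributes 1.
    static long init(String tuple) {
        return 1L;
    }

    // Correct f_intermed: sum the partial counts.
    static long intermed(List<Long> partials) {
        long total = 0;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    // Deliberately wrong f_intermed: counts its inputs instead of summing them.
    static long brokenIntermed(List<Long> partials) {
        return partials.size();
    }

    // f_final: sum whatever partial counts reach the reducer.
    static long fin(List<Long> partials) {
        return intermed(partials);
    }

    // Runs init on every tuple, then the given number of combiner passes
    // over chunks of two partial results, then the final step.
    static long run(List<String> tuples, int combinerPasses, boolean broken) {
        List<Long> partials = new ArrayList<>();
        for (String t : tuples) {
            partials.add(init(t));
        }
        for (int pass = 0; pass < combinerPasses; pass++) {
            List<Long> next = new ArrayList<>();
            for (int i = 0; i < partials.size(); i += 2) {
                List<Long> chunk = partials.subList(i, Math.min(i + 2, partials.size()));
                next.add(broken ? brokenIntermed(chunk) : intermed(chunk));
            }
            partials = next;
        }
        return fin(partials);
    }

    public static void main(String[] args) {
        List<String> tuples = Arrays.asList("a", "b", "c", "d", "e");
        for (int passes = 0; passes <= 2; passes++) {
            System.out.println(passes + " combiner passes: correct="
                    + run(tuples, passes, false) + ", broken=" + run(tuples, passes, true));
        }
    }
}
```

With the correct intermediate step the count stays at 5 for zero, one, or two passes. The broken variant happens to agree for zero and one pass, because every partial count is still 1, and only drifts on the second pass, which is the kind of bug that can stay hidden until the combiner runs more often than it did in testing.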