org.apache.pig
Interface Algebraic

All Known Implementing Classes:
AVG, COR, COR, COUNT, COUNT_STAR, COV, COV, Distinct, DoubleAvg, DoubleMax, DoubleMin, DoubleSum, ExtremalTupleByNthField, FloatAvg, FloatMax, FloatMin, FloatSum, IntAvg, IntMax, IntMin, IntSum, LongAvg, LongMax, LongMin, LongSum, MAX, MaxTupleBy1stField, MIN, StringMax, StringMin, SUM, Top, TOP

@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Algebraic

An interface to declare that an EvalFunc's calculation can be decomposed into intitial, intermediate, and final steps. More formally, suppose we have to compute an function f over a bag X. In general, we need to know the entire X before we can make any progress on f. However, some functions are algebraic e.g. SUM. In these cases, you can apply some initital function f_init on subsets of X to get partial results. You can then combine partial results from different subsets of X using an intermediate function f_intermed. To get the final answers, several partial results can be combined by invoking a final f_final function. For the function SUM, f_init, f_intermed, and f_final are all SUM. See the code for builtin AVG to get a better idea of how algebraic works. When eval functions implement this interface, Pig will attempt to use MapReduce's combiner. The initial funciton will be called in the map phase and be passed a single tuple. The intermediate function will be called 0 or more times in the combiner phase. And the final function will be called once in the reduce phase. It is important that the results be the same whether the intermediate function is called 0, 1, or more times. Hadoop makes no guarantees about how many times the combiner will be called in a job.


Method Summary
 String getFinal()
          Get the final function.
 String getInitial()
          Get the initial function.
 String getIntermed()
          Get the intermediate function.
 

Method Detail

getInitial

String getInitial()
Get the initial function.

Returns:
A function name of f_init. f_init should be an eval func.

getIntermed

String getIntermed()
Get the intermediate function.

Returns:
A function name of f_intermed. f_intermed should be an eval func.

getFinal

String getFinal()
Get the final function.

Returns:
A function name of f_final. f_final should be an eval func parametrized by the same datum as the eval func implementing this interface.


Copyright © ${year} The Apache Software Foundation