org.apache.pig
Interface Accumulator<T>

All Known Implementing Classes:
AccumulatorEvalFunc, AlgebraicEvalFunc, AVG, COUNT, COUNT_STAR, DoubleAvg, DoubleMax, DoubleMin, DoubleSum, ExtremalTupleByNthField, FloatAvg, FloatMax, FloatMin, FloatSum, IntAvg, IntMax, IntMin, IntSum, JrubyAccumulatorEvalFunc, JrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.BagJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.ChararrayJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.DataByteArrayJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.DoubleJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.FloatJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.IntegerJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.LongJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.MapJrubyAlgebraicEvalFunc, JrubyAlgebraicEvalFunc.TupleJrubyAlgebraicEvalFunc, LongAvg, LongMax, LongMin, LongSum, MAX, MIN, StringMax, StringMin, SUM

@InterfaceAudience.Public
@InterfaceStability.Stable
public interface Accumulator<T>

An interface that allows UDFs that take a bag to accumulate tuples in chunks rather than receive the whole set at once. It is intended for UDFs that do not need to see all of the tuples together but cannot be used with the combiner. Accumulating in chunks lowers memory requirements and avoids spilling large bags, which speeds up the query. Session analysis is one example: it cannot use the combiner because all of its inputs must first be ordered, yet it does not need to see all the tuples at once. UDF implementors might also choose to implement this interface so that accumulation can be used when the other UDFs in the same FOREACH implement it as well.
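The chunked pattern described above can be sketched without Pig itself. This is a minimal example, and several parts of it are hypothetical stand-ins: the class `SessionCounter`, its gap parameter, and the `long[]` of timestamps per chunk. Pig instead passes a `Tuple` that wraps a `DataBag`. The three methods only mirror the names `accumulate`, `getValue`, and `cleanup`. The sketch assumes that the timestamps for a key arrive in ascending order, as session analysis requires. Between chunks it keeps only the last timestamp and a running count.

```java
/**
 * Hypothetical session counter illustrating the Accumulator idea without Pig types.
 * Assumes timestamps for one key arrive in ascending order across all chunks.
 */
class SessionCounter {
    private final long gap;       // a gap larger than this starts a new session (assumed parameter)
    private long lastTimestamp;   // last timestamp seen for the current key
    private boolean seenAny;      // whether any timestamp has arrived for this key
    private long sessions;        // running session count for the current key

    SessionCounter(long gap) {
        this.gap = gap;
    }

    /** Consume one chunk; only a constant amount of state survives between calls. */
    void accumulate(long[] chunk) {
        for (long ts : chunk) {
            if (!seenAny || ts - lastTimestamp > gap) {
                sessions++;  // first event for this key, or gap exceeded: new session
            }
            lastTimestamp = ts;
            seenAny = true;
        }
    }

    /** Result for the current key once every chunk has been passed in. */
    Long getValue() {
        return sessions;
    }

    /** Reset state before the next key. */
    void cleanup() {
        seenAny = false;
        lastTimestamp = 0L;
        sessions = 0L;
    }

    public static void main(String[] args) {
        SessionCounter counter = new SessionCounter(30);
        // Two chunks for one key; the session starting at 100 spans the chunk boundary.
        counter.accumulate(new long[] {0, 10, 100});
        counter.accumulate(new long[] {110, 200});
        System.out.println(counter.getValue()); // 3
        counter.cleanup();
    }
}
```

A session can start in one chunk and continue into the next, so the counter carries `lastTimestamp` across calls. That small piece of state lets it avoid holding the whole bag.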

Since:
Pig 0.6

Method Summary
 void accumulate(Tuple b)
          Pass tuples to the UDF.
 void cleanup()
          Called after getValue() to prepare for processing the next key.
 T getValue()
          Called when all tuples from the current key have been passed to accumulate.
 

Method Detail

accumulate

void accumulate(Tuple b)
                throws IOException
Pass tuples to the UDF.

Parameters:
b - A tuple containing a single field, which is a bag. The bag will contain the set of tuples being passed to the UDF in this iteration.
Throws:
IOException
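The single-field shape of `b` is easy to get wrong. The sketch below models it with plain collections. The hypothetical class `ChunkedIntSum` treats a tuple as `List<Object>` and the bag as `List<List<Object>>`. The `tuple(...)` helper exists only for this illustration. With Pig on the classpath, the first step would typically be `DataBag bag = (DataBag) b.get(0);`, followed by iterating over the bag's tuples. Summing field 0 and skipping nulls are assumptions of this sketch. They are not documented behavior of Pig's built-in SUM.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Sketch of an IntSum-style accumulator over a stand-in input shape.
 * A "tuple" is modeled as List<Object>; the argument to accumulate() has exactly
 * one field, a "bag" modeled as List<List<Object>>. These are NOT Pig's types.
 */
class ChunkedIntSum {
    private long sum;         // running total for the current key
    private boolean seenAny;  // whether any non-null value arrived for this key

    /** Hypothetical helper that builds a stand-in tuple from its fields. */
    static List<Object> tuple(Object... fields) {
        return new ArrayList<>(Arrays.asList(fields));
    }

    @SuppressWarnings("unchecked")
    void accumulate(List<Object> b) throws IOException {
        // Expect exactly one field, and that field must be the bag for this chunk.
        if (b.size() != 1 || !(b.get(0) instanceof List)) {
            throw new IOException("expected a tuple with a single bag field");
        }
        List<List<Object>> bag = (List<List<Object>>) b.get(0);
        for (List<Object> t : bag) {
            Object v = t.get(0);  // assumption: the value to sum is field 0
            if (v != null) {      // skip nulls rather than failing
                sum += ((Number) v).longValue();
                seenAny = true;
            }
        }
    }

    /** Returns null if no non-null value was seen (a choice made in this sketch). */
    Long getValue() {
        return seenAny ? sum : null;
    }

    void cleanup() {
        sum = 0L;
        seenAny = false;
    }

    public static void main(String[] args) throws IOException {
        ChunkedIntSum s = new ChunkedIntSum();
        // Each call receives one tuple whose only field is the bag for that chunk.
        s.accumulate(tuple(Arrays.asList(tuple(1), tuple(2))));
        s.accumulate(tuple(Arrays.asList(tuple(4), tuple((Object) null))));
        System.out.println(s.getValue()); // 7
        s.cleanup();
    }
}
```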

getValue

T getValue()
Called when all tuples from the current key have been passed to accumulate.

Returns:
the value for the UDF for this key.

cleanup

void cleanup()
Called after getValue() to prepare for processing the next key.
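The call order described for `accumulate`, `getValue`, and `cleanup` can be illustrated with a small driver. Everything here is a hypothetical stand-in, not Pig's execution engine:

- `ChunkAccumulator` models the three-method contract over `List<Integer>` chunks.
- `MaxOfInts` tracks a maximum.
- `AccumulatorDriver.run` feeds each key's values in fixed-size chunks, then calls `getValue()` and `cleanup()` before moving on.

Reusing one instance across keys is what makes `cleanup()` necessary.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the three-method contract; not Pig's interface. */
interface ChunkAccumulator<T> {
    void accumulate(List<Integer> chunk);
    T getValue();
    void cleanup();
}

/** Tracks the maximum value seen for the current key. */
class MaxOfInts implements ChunkAccumulator<Integer> {
    private Integer max; // null until a value is seen for the current key

    public void accumulate(List<Integer> chunk) {
        for (int v : chunk) {
            max = (max == null) ? v : Math.max(max, v);
        }
    }

    public Integer getValue() {
        return max;
    }

    public void cleanup() {
        max = null; // without this, the previous key's maximum would leak into the next key
    }
}

class AccumulatorDriver {
    /** Feeds each key's values in chunks of chunkSize, then calls getValue() and cleanup(). */
    static <T> Map<String, T> run(Map<String, List<Integer>> groups, ChunkAccumulator<T> udf, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        Map<String, T> results = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : groups.entrySet()) {
            List<Integer> values = e.getValue();
            for (int i = 0; i < values.size(); i += chunkSize) {
                udf.accumulate(values.subList(i, Math.min(i + chunkSize, values.size())));
            }
            results.put(e.getKey(), udf.getValue());
            udf.cleanup(); // reset before the next key
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        groups.put("a", List.of(3, 9, 4, 1));
        groups.put("b", List.of(2, 5));
        System.out.println(run(groups, new MaxOfInts(), 2)); // {a=9, b=5}
    }
}
```

If the `cleanup()` call were dropped, key `b` would report 9, the maximum carried over from key `a`.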



Copyright © 2007-2012 The Apache Software Foundation