org.apache.pig
Class AlgebraicEvalFunc<T>

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
      extended by org.apache.pig.AccumulatorEvalFunc<T>
          extended by org.apache.pig.AlgebraicEvalFunc<T>
All Implemented Interfaces:
Accumulator<T>, Algebraic
Direct Known Subclasses:
JrubyAlgebraicEvalFunc

public abstract class AlgebraicEvalFunc<T>
extends AccumulatorEvalFunc<T>
implements Algebraic

This class provides a free implementation of the Accumulator interface and the EvalFunc class for an Algebraic function. Instead of writing redundant implementations of Accumulator and EvalFunc, implement the getInitial, getIntermed, and getFinal methods (which implies implementing the static classes they reference), and you get an implementation of each for free.
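To make the three stages concrete, here is a minimal sketch of an algebraic COUNT in plain Java. It does not use any Pig classes: the class name AlgebraicCountSketch, the method names initial, intermed, and finalStage, and the use of long for partial results are all assumptions made for illustration. In a real subclass, each stage would be a static EvalFunc class whose name is returned by getInitial, getIntermed, or getFinal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, Pig-free model of an algebraic COUNT. All names are hypothetical.
class AlgebraicCountSketch {
    // Initial stage: turn one input record into a partial result (a count of 1).
    static long initial(String record) {
        return 1L;
    }

    // Intermediate stage: combine several partial results into one partial result.
    static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final stage: combine the remaining partial results into the answer.
    static long finalStage(List<Long> partials) {
        return intermed(partials);
    }

    // Run the three stages over data that has been split into chunks.
    static long count(List<List<String>> chunks) {
        List<Long> combined = new ArrayList<>();
        for (List<String> chunk : chunks) {
            List<Long> partials = new ArrayList<>();
            for (String record : chunk) {
                partials.add(initial(record));
            }
            combined.add(intermed(partials));
        }
        return finalStage(combined);
    }

    public static void main(String[] args) {
        List<List<String>> chunks = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("d", "e"));
        System.out.println(count(chunks)); // prints 5
    }
}
```

The result does not depend on how the records are split into chunks, which is the property that lets the same three stages back both the EvalFunc and the Accumulator behavior.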

One key thing to note is that if a subclass of AlgebraicEvalFunc wishes to use any constructor arguments, it MUST call super(args).

IMPORTANT: the implementation of the Accumulator interface that this class provides works, but it is simulated on top of the Algebraic classes. For maximum efficiency, implement the Accumulator interface manually. See Accumulator for more information on how to do so.


Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
AlgebraicEvalFunc(String... constructorArgs)
          If a subclass has a constructor, it must call super(args...), or else this class will not instantiate properly.
 
Method Summary
 void accumulate(Tuple input)
          This is the free accumulate implementation based on the static classes referenced by the Algebraic methods.
 void cleanup()
          Per the Accumulator interface, this clears all of the variables used in the implementation.
abstract  String getFinal()
          This must be implemented as per a normal Algebraic interface.
abstract  String getInitial()
          This must be implemented as per a normal Algebraic interface.
abstract  String getIntermed()
          This must be implemented as per a normal Algebraic interface.
 T getValue()
          This function returns the ultimate result.
 
Methods inherited from class org.apache.pig.AccumulatorEvalFunc
exec
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, outputSchema, progress, setPigLogger, setReporter, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AlgebraicEvalFunc

public AlgebraicEvalFunc(String... constructorArgs)
If a subclass has a constructor, it must call super(args...), or else this class will not instantiate properly.
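The requirement is easy to miss because Java does not flag the omission: a subclass constructor without an explicit super(...) call compiles against a String... constructor and silently passes zero arguments. The sketch below shows only that Java mechanic. BaseFuncSketch is a hypothetical stand-in that merely records what it receives; what the real AlgebraicEvalFunc does with the arguments is not described on this page and is not modeled here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a base class with a String... constructor.
abstract class BaseFuncSketch {
    private final List<String> constructorArgs;

    protected BaseFuncSketch(String... constructorArgs) {
        this.constructorArgs = Arrays.asList(constructorArgs);
    }

    List<String> constructorArgs() {
        return constructorArgs;
    }
}

// Correct: forwards its own argument to the base class.
class ForwardingFuncSketch extends BaseFuncSketch {
    private final int n;

    ForwardingFuncSketch(String n) {
        super(n);
        this.n = Integer.parseInt(n);
    }

    int n() {
        return n;
    }
}

// Compiles, but the implicit super() passes no arguments to the base class.
class ForgetfulFuncSketch extends BaseFuncSketch {
    private final int n;

    ForgetfulFuncSketch(String n) {
        this.n = Integer.parseInt(n);
    }

    int n() {
        return n;
    }
}

class ConstructorArgsDemo {
    public static void main(String[] args) {
        System.out.println(new ForwardingFuncSketch("3").constructorArgs()); // prints [3]
        System.out.println(new ForgetfulFuncSketch("3").constructorArgs());  // prints []
    }
}
```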

Method Detail

getFinal

public abstract String getFinal()
This must be implemented as per a normal Algebraic interface. See Algebraic for more information.

Specified by:
getFinal in interface Algebraic
Returns:
A function name of f_final. f_final should be an eval func parametrized by the same datum as the eval func implementing this interface.

getInitial

public abstract String getInitial()
This must be implemented as per a normal Algebraic interface. See Algebraic for more information.

Specified by:
getInitial in interface Algebraic
Returns:
A function name of f_init. f_init should be an eval func. The return type of f_init.exec() has to be Tuple.

getIntermed

public abstract String getIntermed()
This must be implemented as per a normal Algebraic interface. See Algebraic for more information.

Specified by:
getIntermed in interface Algebraic
Returns:
A function name of f_intermed. f_intermed should be an eval func. The return type of f_intermed.exec() has to be Tuple.

accumulate

public void accumulate(Tuple input)
                throws IOException
This is the free accumulate implementation, built on the Initial, Intermediate, and Final static classes supplied through the Algebraic interface. The exec function of the Initial EvalFunc is called on every Tuple of the input, and the output is collected in an intermediate state. Periodically, the Intermediate EvalFunc is called on this intermediate state one or more times. The Final EvalFunc is not called until getValue() is called.

Specified by:
accumulate in interface Accumulator<T>
Specified by:
accumulate in class AccumulatorEvalFunc<T>
Parameters:
input - A tuple containing a single field, which is a bag. The bag will contain the set of tuples being passed to the UDF in this iteration.
Throws:
IOException
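
The flow described for accumulate can be modeled in plain Java as follows. This is a rough sketch and not the actual implementation: it uses no Pig classes, it represents each partial result as a long, and the compaction trigger COMPACT_THRESHOLD is an invented value, since this page only says the Intermediate step runs "periodically".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Pig-free model of an accumulator simulated from three algebraic stages (COUNT).
// The class name, long-based partials, and COMPACT_THRESHOLD are assumptions.
class SimulatedCountAccumulator {
    static final int COMPACT_THRESHOLD = 4; // hypothetical buffer limit

    private final List<Long> intermediateState = new ArrayList<>();
    int compactions = 0; // exposed only so the sketch can be inspected

    private static long initial(Object tuple) {
        return 1L; // Initial stage: each tuple contributes a count of 1
    }

    private static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    private static long finalStage(List<Long> partials) {
        return intermed(partials);
    }

    // Called once per batch of tuples for the current key.
    void accumulate(List<?> batch) {
        for (Object tuple : batch) {
            intermediateState.add(initial(tuple));
            if (intermediateState.size() >= COMPACT_THRESHOLD) {
                // Intermediate stage: shrink the buffer to a single partial result.
                long compacted = intermed(intermediateState);
                intermediateState.clear();
                intermediateState.add(compacted);
                compactions++;
            }
        }
    }

    // The Final stage runs only here, never inside accumulate.
    long getValue() {
        return finalStage(intermediateState);
    }

    // Reset all state.
    void cleanup() {
        intermediateState.clear();
        compactions = 0;
    }

    public static void main(String[] args) {
        SimulatedCountAccumulator acc = new SimulatedCountAccumulator();
        acc.accumulate(Arrays.asList("a", "b", "c"));
        acc.accumulate(Arrays.asList("d", "e", "f", "g"));
        System.out.println(acc.getValue()); // prints 7
    }
}
```

Compacting the buffer keeps memory bounded regardless of how many tuples arrive, at the cost of extra calls through the Intermediate stage.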

cleanup

public void cleanup()
Per the Accumulator interface, this clears all of the variables used in the implementation.

Specified by:
cleanup in interface Accumulator<T>
Specified by:
cleanup in class AccumulatorEvalFunc<T>

getValue

public T getValue()
This function returns the ultimate result. The Final EvalFunc's exec function is called on the accumulated data only when getValue() is called.

Specified by:
getValue in interface Accumulator<T>
Specified by:
getValue in class AccumulatorEvalFunc<T>
Returns:
the value for the UDF for this key.
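
The order in which a caller uses accumulate, getValue, and cleanup for each key can be sketched as below. SimpleAccumulator is a hypothetical stand-in that mirrors only the three method names from this page, with plain lists in place of Tuple and bag inputs; SumAccumulator and the run driver are likewise illustrative and not part of Pig.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AccumulatorLifecycleSketch {
    // Hypothetical stand-in mirroring the three method names on this page.
    interface SimpleAccumulator<T> {
        void accumulate(List<Long> batch);
        T getValue();
        void cleanup();
    }

    // A trivial SUM accumulator used only to exercise the lifecycle.
    static class SumAccumulator implements SimpleAccumulator<Long> {
        private long sum = 0;

        public void accumulate(List<Long> batch) {
            for (long v : batch) {
                sum += v;
            }
        }

        public Long getValue() {
            return sum;
        }

        public void cleanup() {
            sum = 0;
        }
    }

    // Drive one accumulator instance across several keys.
    static Map<String, Long> run(Map<String, List<List<Long>>> batchesByKey) {
        SimpleAccumulator<Long> acc = new SumAccumulator();
        Map<String, Long> results = new LinkedHashMap<>();
        for (Map.Entry<String, List<List<Long>>> entry : batchesByKey.entrySet()) {
            for (List<Long> batch : entry.getValue()) {
                acc.accumulate(batch); // one call per batch for this key
            }
            results.put(entry.getKey(), acc.getValue());
            acc.cleanup(); // without this, the next key would start from a stale sum
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<List<Long>>> input = new LinkedHashMap<>();
        input.put("x", Arrays.asList(Arrays.asList(1L, 2L), Arrays.asList(3L)));
        input.put("y", Arrays.asList(Arrays.asList(10L)));
        System.out.println(run(input)); // prints {x=6, y=10}
    }
}
```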


Copyright © 2007-2012 The Apache Software Foundation