org.apache.pig.builtin
Class SUM

java.lang.Object
  extended by org.apache.pig.EvalFunc<Double>
      extended by org.apache.pig.builtin.SUM
All Implemented Interfaces:
Accumulator<Double>, Algebraic

public class SUM
extends EvalFunc<Double>
implements Algebraic, Accumulator<Double>

Generates the sum of a set of values. This class implements Algebraic, so, when possible, the sum is computed in a distributed fashion.

SUM can operate on any numeric type. It can also operate on bytearrays, which it casts to doubles. It expects a bag of tuples, each containing a single field. If Pig knows from the schema that this function will be passed a bag of integers or longs, it uses a specially adapted version of SUM that sums the data with integer arithmetic. The return type of SUM is double for float, double, or bytearray arguments, and long for int or long arguments.
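The type rules above can be modeled in plain Java. The sketch below is a hypothetical stand-in, not Pig's code: `SumTypeSketch` and its methods are invented names, a bag is modeled as a `java.util.List`, null values are skipped (Pig's SUM ignores nulls), an all-null input is assumed to yield null, and the bytearray cast is approximated by parsing the bytes as UTF-8 text, which is a simplifying assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical model of SUM's type-specific behavior; not Pig's implementation.
final class SumTypeSketch {

    // int or long inputs: integer arithmetic, long result.
    static Long sumLongs(List<? extends Number> bag) {
        long sum = 0L;
        boolean sawNonNull = false;
        for (Number n : bag) {
            if (n == null) continue;              // null values are ignored
            sum += n.longValue();
            sawNonNull = true;
        }
        // Assumption: an input with no non-null values yields null.
        return sawNonNull ? Long.valueOf(sum) : null;
    }

    // float or double inputs: floating-point arithmetic, double result.
    static Double sumDoubles(List<? extends Number> bag) {
        double sum = 0.0;
        boolean sawNonNull = false;
        for (Number n : bag) {
            if (n == null) continue;
            sum += n.doubleValue();
            sawNonNull = true;
        }
        return sawNonNull ? Double.valueOf(sum) : null;
    }

    // bytearray inputs: each value is cast to double before summing.
    // Assumption: the cast is modeled as parsing the bytes as UTF-8 text.
    static Double sumBytearrays(List<byte[]> bag) {
        double sum = 0.0;
        boolean sawNonNull = false;
        for (byte[] b : bag) {
            if (b == null) continue;
            sum += Double.parseDouble(new String(b, StandardCharsets.UTF_8).trim());
            sawNonNull = true;
        }
        return sawNonNull ? Double.valueOf(sum) : null;
    }
}
```

For example, `sumLongs(Arrays.asList(1, 2, 3))` returns `6L` using long arithmetic, while `sumDoubles(Arrays.asList(1.5, 2.25))` returns `3.75`.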

SUM also implements the Accumulator interface. While this will never be the preferred method of usage, it is available in case the combiner cannot be used for a given calculation.


Nested Class Summary
static class SUM.Final
           
static class SUM.Initial
           
static class SUM.Intermediate
           
 
Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
SUM()
           
 
Method Summary
 void accumulate(Tuple b)
          Pass tuples to the UDF.
 void cleanup()
          Called after getValue() to prepare processing for next key.
 Double exec(Tuple input)
          This callback method must be implemented by all subclasses.
 List<FuncSpec> getArgToFuncMapping()
          Allow a UDF to specify type specific implementations of itself.
 String getFinal()
          Get the final function.
 String getInitial()
          Get the initial function.
 String getIntermed()
          Get the intermediate function.
 Double getValue()
          Called when all tuples from current key have been passed to accumulate.
 Schema outputSchema(Schema input)
          Report the schema of the output of this UDF.
protected static Double sum(Tuple input)
           
protected static Double sumDoubles(Tuple input)
           
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getCacheFiles, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, isAsynchronous, progress, setPigLogger, setReporter, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SUM

public SUM()
Method Detail

exec

public Double exec(Tuple input)
            throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses. This is the method that will be invoked on every Tuple of a given dataset. Since the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state that is maintained between invocations of this method.

Specified by:
exec in class EvalFunc<Double>
Parameters:
input - the Tuple to be processed.
Returns:
result, of type Double (the type parameter T of EvalFunc<Double>).
Throws:
IOException
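For SUM, the tuple passed to exec holds a bag in its first field, and each tuple in that bag carries one value. The plain-Java sketch below models that nesting with lists (a tuple as `List<Object>`, a bag as `List<List<Object>>`). The class and method names are hypothetical, only `Number` values are handled, and returning null when there is no bag or no non-null value is an assumption of the sketch.

```java
import java.util.List;

// Hypothetical model of the tuple -> bag -> single-field tuples shape that SUM.exec consumes.
final class ExecShapeSketch {

    // input.get(0) is the bag; every tuple in the bag holds one value at index 0.
    @SuppressWarnings("unchecked")
    static Double exec(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;                          // assumption: no bag means no result
        }
        List<List<Object>> bag = (List<List<Object>>) input.get(0);
        double sum = 0.0;
        boolean sawNonNull = false;
        for (List<Object> tuple : bag) {
            Object value = tuple.get(0);
            if (value == null) continue;          // null values are skipped
            sum += ((Number) value).doubleValue();
            sawNonNull = true;
        }
        return sawNonNull ? Double.valueOf(sum) : null;
    }
}
```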

getInitial

public String getInitial()
Description copied from interface: Algebraic
Get the initial function.

Specified by:
getInitial in interface Algebraic
Returns:
The name of the function f_init. f_init should be an eval func, and the return type of f_init.exec() has to be Tuple.

getIntermed

public String getIntermed()
Description copied from interface: Algebraic
Get the intermediate function.

Specified by:
getIntermed in interface Algebraic
Returns:
The name of the function f_intermed. f_intermed should be an eval func, and the return type of f_intermed.exec() has to be Tuple.

getFinal

public String getFinal()
Description copied from interface: Algebraic
Get the final function.

Specified by:
getFinal in interface Algebraic
Returns:
The name of the function f_final. f_final should be an eval func parameterized by the same datum as the eval func implementing this interface.
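The initial, intermediate, and final roles fit a sum because addition is associative: partial sums computed on separate chunks of the input can be added together later without changing the result. The sketch below models that contract in plain Java. `AlgebraicSumSketch` and its nested classes are hypothetical stand-ins for SUM.Initial, SUM.Intermediate, and SUM.Final, and the getters return class names because the methods above are specified to return function names. Unlike SUM, the sketch lets an all-null chunk contribute 0.0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the three-phase algebraic contract; not Pig's code.
final class AlgebraicSumSketch {

    // Function names, analogous to what getInitial/getIntermed/getFinal return.
    static String getInitial()  { return Initial.class.getName(); }
    static String getIntermed() { return Intermediate.class.getName(); }
    static String getFinal()    { return Final.class.getName(); }

    // Map side: turn one chunk of raw values into a partial sum (nulls skipped).
    static final class Initial {
        static double exec(List<Double> chunk) {
            double partial = 0.0;
            for (Double v : chunk) {
                if (v != null) partial += v;
            }
            return partial;
        }
    }

    // Combiner: add partial sums together; may run zero or more times.
    static final class Intermediate {
        static double exec(List<Double> partials) {
            double total = 0.0;
            for (double p : partials) total += p;
            return total;
        }
    }

    // Reduce side: combine the remaining partials into the final result.
    static final class Final {
        static Double exec(List<Double> partials) {
            return Intermediate.exec(partials);
        }
    }

    // Simulates a distributed run: Initial on each chunk, then Final over the partials.
    static Double run(List<List<Double>> chunks) {
        List<Double> partials = new ArrayList<>();
        for (List<Double> chunk : chunks) partials.add(Initial.exec(chunk));
        return Final.exec(partials);
    }
}
```

Because the partials are simply added, the chunking of the input does not affect the result, which is what lets the work be split across map tasks and the combiner.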

sum

protected static Double sum(Tuple input)
                     throws ExecException
Throws:
ExecException

sumDoubles

protected static Double sumDoubles(Tuple input)
                            throws ExecException
Throws:
ExecException

outputSchema

public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. Pig will make use of this in error checking, optimization, and planning. The schema of input data to this UDF is provided.

Overrides:
outputSchema in class EvalFunc<Double>
Parameters:
input - Schema of the input
Returns:
Schema of the output

getArgToFuncMapping

public List<FuncSpec> getArgToFuncMapping()
                                   throws FrontendException
Description copied from class: EvalFunc
Allow a UDF to specify type-specific implementations of itself. For example, an implementation of arithmetic sum might have int and float implementations, since integer arithmetic performs much better than floating point arithmetic. Pig's typechecker will call this method and, using the returned list plus the schema of the function's input data, decide which implementation of the UDF to use.

Overrides:
getArgToFuncMapping in class EvalFunc<Double>
Returns:
A List containing FuncSpec objects representing the EvalFunc class which can handle the inputs corresponding to the schema in the objects. Each FuncSpec should be constructed with a schema that describes the input for that implementation. For example, the sum function above would return two elements in its list:
  1. new FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.DOUBLE)))
  2. new FuncSpec(IntSum.class.getName(), new Schema(new Schema.FieldSchema(null, DataType.INTEGER)))
This would indicate that the main implementation is used for doubles, and the special implementation IntSum is used for ints.
Throws:
FrontendException
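The selection step can be pictured as a lookup from the declared input type to an implementation class. In the plain-Java model below, `InputType`, `FuncSpecSketch`, `ArgMappingSketch`, and the `example.*` class names are all hypothetical. Pig's real typechecker works on FuncSpec and Schema objects and applies its own matching rules, which this exact-match lookup does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Pig's DataType constants.
enum InputType { DOUBLE, INTEGER, LONG, BYTEARRAY }

// Hypothetical stand-in for FuncSpec: an implementation name plus the input it handles.
final class FuncSpecSketch {
    final String className;
    final InputType inputType;

    FuncSpecSketch(String className, InputType inputType) {
        this.className = className;
        this.inputType = inputType;
    }
}

final class ArgMappingSketch {

    // Analogous to getArgToFuncMapping(): one entry per type-specific implementation.
    static List<FuncSpecSketch> getArgToFuncMapping() {
        List<FuncSpecSketch> specs = new ArrayList<>();
        specs.add(new FuncSpecSketch("example.DoubleSum", InputType.DOUBLE));
        specs.add(new FuncSpecSketch("example.IntSum", InputType.INTEGER));
        specs.add(new FuncSpecSketch("example.LongSum", InputType.LONG));
        specs.add(new FuncSpecSketch("example.Sum", InputType.BYTEARRAY));
        return specs;
    }

    // Simplified typechecker step: pick the implementation whose declared input matches.
    static String choose(InputType actual) {
        for (FuncSpecSketch spec : getArgToFuncMapping()) {
            if (spec.inputType == actual) return spec.className;
        }
        throw new IllegalArgumentException("no implementation for " + actual);
    }
}
```

For an integer input, `choose(InputType.INTEGER)` returns `"example.IntSum"`, mirroring how a specialized integer implementation is preferred when the schema allows it.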

accumulate

public void accumulate(Tuple b)
                throws IOException
Description copied from interface: Accumulator
Pass tuples to the UDF.

Specified by:
accumulate in interface Accumulator<Double>
Parameters:
b - A tuple containing a single field, which is a bag. The bag will contain the set of tuples being passed to the UDF in this iteration.
Throws:
IOException

cleanup

public void cleanup()
Description copied from interface: Accumulator
Called after getValue() to prepare processing for next key.

Specified by:
cleanup in interface Accumulator<Double>

getValue

public Double getValue()
Description copied from interface: Accumulator
Called when all tuples from current key have been passed to accumulate.

Specified by:
getValue in interface Accumulator<Double>
Returns:
the value for the UDF for this key.
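The Accumulator methods above are called in a fixed order for each key: accumulate once per chunk of that key's tuples, getValue once all chunks have been passed in, and cleanup before the next key. The plain-Java sketch below models that order with a hypothetical `AccumulatingSum` class; each chunk is a `List<Double>` standing in for the bag inside the tuple passed to accumulate(Tuple b), and null values are skipped.

```java
import java.util.List;

// Hypothetical model of the Accumulator lifecycle for a running sum; not Pig's SUM.
final class AccumulatingSum {
    private Double intermediateSum = null;   // stays null until a non-null value is seen

    // Called once per chunk of the current key's values.
    void accumulate(List<Double> chunk) {
        for (Double v : chunk) {
            if (v == null) continue;
            intermediateSum = (intermediateSum == null) ? v : intermediateSum + v;
        }
    }

    // Called after every chunk for the current key has been passed to accumulate.
    Double getValue() {
        return intermediateSum;
    }

    // Called after getValue() to reset state for the next key.
    void cleanup() {
        intermediateSum = null;
    }
}
```

Only the running total is kept between calls, so memory use does not grow with the number of tuples for a key; cleanup() discards that total so the next key starts fresh.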


Copyright © 2007-2012 The Apache Software Foundation