Class EvalFunc<T>

java.lang.Object
  extended by org.apache.pig.EvalFunc<T>
Direct Known Subclasses:
ABS, ABS, ARITY, AVG, AVG.Final, AVG.Initial, AVG.Intermediate, BagSize, Base, Base, Bin, BinCond, CONCAT, ConstantSize, copySign, COR, COR, COR.Final, COR.Final, COR.Initial, COR.Initial, COR.Intermed, COR.Intermed, COUNT, COUNT_STAR, COUNT_STAR.Final, COUNT_STAR.Initial, COUNT_STAR.Intermediate, COUNT.Final, COUNT.Initial, COUNT.Intermediate, COV, COV, COV.Final, COV.Final, COV.Initial, COV.Initial, COV.Intermed, COV.Intermed, CustomFormatToISO, DateExtractor, Decode, DIFF, DiffDate, Distinct, Distinct.Final, Distinct.Initial, Distinct.Intermediate, DoubleAbs, DoubleAbs, DoubleAvg, DoubleAvg.Final, DoubleAvg.Initial, DoubleAvg.Intermediate, DoubleCopySign, DoubleGetExponent, DoubleMax, DoubleMax, DoubleMax.Final, DoubleMax.Initial, DoubleMax.Intermediate, DoubleMin, DoubleMin, DoubleMin.Final, DoubleMin.Initial, DoubleMin.Intermediate, DoubleNextAfter, DoubleNextup, DoubleRound, DoubleRound, DoubleSignum, DoubleSum, DoubleSum.Final, DoubleSum.Initial, DoubleSum.Intermediate, DoubleUlp, ExtremalTupleByNthField, ExtremalTupleByNthField.HelperClass, FilterFunc, FindQuantiles, FloatAbs, FloatAbs, FloatAvg, FloatAvg.Final, FloatAvg.Initial, FloatAvg.Intermediate, FloatCopySign, FloatGetExponent, FloatMax, FloatMax, FloatMax.Final, FloatMax.Initial, FloatMax.Intermediate, FloatMin, FloatMin, FloatMin.Final, FloatMin.Initial, FloatMin.Intermediate, FloatNextAfter, FloatNextup, FloatRound, FloatRound, FloatSignum, FloatSum, FloatSum.Final, FloatSum.Initial, FloatSum.Intermediate, FloatUlp, GenericInvoker, getExponent, GetMemNumRows, GFAny, GFCross, GFReplicate, HashFNV, HostExtractor, IdentityColumn, INDEXOF, INDEXOF, IntAbs, IntAbs, IntAvg, IntAvg.Final, IntAvg.Initial, IntAvg.Intermediate, IntMax, IntMax, IntMax.Final, IntMax.Initial, IntMax.Intermediate, IntMin, IntMin, IntMin.Final, IntMin.Initial, IntMin.Intermediate, IntSum, IntSum.Final, IntSum.Initial, IntSum.Intermediate, ISODaysBetween, ISOHoursBetween, ISOMinutesBetween, ISOMonthsBetween, ISOSecondsBetween, 
ISOToDay, ISOToHour, ISOToMinute, ISOToMonth, ISOToSecond, ISOToUnix, ISOToWeek, ISOToYear, ISOYearsBetween, JsFunction, JythonFunction, LAST_INDEX_OF, LASTINDEXOF, LcFirst, LCFIRST, LENGTH, LongAbs, LongAbs, LongAvg, LongAvg.Final, LongAvg.Initial, LongAvg.Intermediate, LongMax, LongMax, LongMax.Final, LongMax.Initial, LongMax.Intermediate, LongMin, LongMin, LongMin.Final, LongMin.Initial, LongMin.Intermediate, LongSum, LongSum.Final, LongSum.Initial, LongSum.Intermediate, LookupInFiles, LOWER, LOWER, MapSize, MAX, MAX, MAX.Final, MAX.Initial, MAX.Intermediate, MaxTupleBy1stField, MaxTupleBy1stField.Final, MaxTupleBy1stField.Initial, MaxTupleBy1stField.Intermediate, MIN, MIN, MIN.Final, MIN.Initial, MIN.Intermediate, nextAfter, NEXTUP, PartitionSkewedKeys, RANDOM, RANDOM, ReadScalars, REGEX_EXTRACT, REGEX_EXTRACT_ALL, RegexExtract, RegexExtractAll, RegexMatch, REPLACE, REPLACE, Reverse, ROUND, ROUND, SCALB, SearchEngineExtractor, SearchQuery, SearchTermExtractor, SIGNUM, SIZE, Split, StringConcat, StringMax, StringMax.Final, StringMax.Initial, StringMax.Intermediate, StringMin, StringMin.Final, StringMin.Initial, StringMin.Intermediate, StringSize, STRSPLIT, SUBSTRING, SUBSTRING, SUM, SUM.Final, SUM.Initial, SUM.Intermediate, ToBag, TOBAG, TOKENIZE, TOMAP, Top, TOP, Top.Final, TOP.Final, Top.Initial, TOP.Initial, Top.Intermed, TOP.Intermed, ToTuple, TOTUPLE, Trim, TRIM, TupleSize, UcFirst, UCFIRST, ULP, UnixToISO, UPPER, UPPER

public abstract class EvalFunc<T>
extends Object

This class is used to implement functions that are applied to fields in a dataset. The function is applied to each Tuple in the set. The programmer should not make assumptions about state maintained between invocations of the exec() method, since the Pig runtime schedules and localizes invocations based on information provided at runtime. Nor should the programmer make assumptions about when, or how many times, the class will be instantiated: it may be instantiated multiple times on both the front end and the back end.
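The warning about state can be illustrated with a small, self-contained model. The sketch below does not use Pig itself: `MiniEvalFunc`, `UpperFunc`, `RowNumberFunc` and `PartitionedRunner` are hypothetical stand-ins, a `List<Object>` plays the role of a Tuple, and the runner merely imitates a runtime that splits the input and creates a fresh instance for each split. In a real UDF the class would extend `org.apache.pig.EvalFunc<String>` and implement `public String exec(Tuple input) throws IOException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical stand-in for org.apache.pig.EvalFunc<T>; a List<Object> plays the role of a Tuple.
abstract class MiniEvalFunc<T> {
    abstract T exec(List<Object> input);
}

// Safe: the result depends only on the current input row.
class UpperFunc extends MiniEvalFunc<String> {
    @Override
    String exec(List<Object> input) {
        Object v = input.isEmpty() ? null : input.get(0);
        return v == null ? null : v.toString().toUpperCase(Locale.ROOT);
    }
}

// Anti-pattern: the result depends on state carried between exec() calls.
class RowNumberFunc extends MiniEvalFunc<Integer> {
    private int next = 0;

    @Override
    Integer exec(List<Object> input) {
        return next++;
    }
}

final class PartitionedRunner {
    // Imitates a runtime that splits the rows into chunks and gives every chunk a fresh instance.
    static <T> List<T> run(Supplier<? extends MiniEvalFunc<T>> factory,
                           List<List<Object>> rows, int partitions) {
        int parts = Math.max(1, partitions);
        int chunk = Math.max(1, (rows.size() + parts - 1) / parts);
        List<T> out = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunk) {
            MiniEvalFunc<T> f = factory.get();
            for (List<Object> row : rows.subList(start, Math.min(rows.size(), start + chunk))) {
                out.add(f.exec(row));
            }
        }
        return out;
    }
}
```

Under this model UpperFunc returns the same values however the rows are split, while RowNumberFunc restarts its counter in every split, which is the kind of hidden state the paragraph above warns against.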

Field Summary
protected  org.apache.commons.logging.Log log
          Logging object.
protected  PigLogger pigLogger
          Logger for aggregating warnings.
protected  PigProgressable reporter
          Reporter to send heartbeats to Hadoop.
protected  Type returnType
          Return type of this instance of EvalFunc.
Constructor Summary
EvalFunc()
Method Summary
abstract  T exec(Tuple input)
          This callback method must be implemented by all subclasses.
 void finish()
          Placeholder for cleanup to be performed at the end.
 List<FuncSpec> getArgToFuncMapping()
          Allow a UDF to specify type-specific implementations of itself.
 List<String> getCacheFiles()
          Allow a UDF to specify a list of files it would like placed in the distributed cache.
 org.apache.commons.logging.Log getLogger()
 PigLogger getPigLogger()
 PigProgressable getReporter()
 Type getReturnType()
          Get the Type that this EvalFunc returns.
protected  String getSchemaName(String name, Schema input)
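 boolean isAsynchronous()
          This function should be overridden to return true for functions that return their values asynchronously.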
 Schema outputSchema(Schema input)
          Report the schema of the output of this UDF.
 void progress()
          Utility method to allow a UDF to report progress.
 void setPigLogger(PigLogger pigLogger)
          Set the PigLogger object.
 void setReporter(PigProgressable reporter)
          Set the reporter.
 void warn(String msg, Enum warningEnum)
          Issue a warning.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


protected PigProgressable reporter
Reporter to send heartbeats to Hadoop. If exec will take more than a few seconds, PigProgressable.progress() should be called occasionally to avoid timeouts. The default Hadoop task timeout is 600 seconds.


protected org.apache.commons.logging.Log log
Logging object. Log calls made on the front end are sent to Pig's log on the client. Log calls made on the back end are sent to stdout and can be seen in the Hadoop logs.


protected PigLogger pigLogger
Logger for aggregating warnings. Any warnings to be sent to the user should be logged through PigLogger.warn(java.lang.Object, java.lang.String, java.lang.Enum).


protected Type returnType
Return type of this instance of EvalFunc.

Constructor Detail


public EvalFunc()
Method Detail


protected String getSchemaName(String name,
                               Schema input)


public Type getReturnType()
Get the Type that this EvalFunc returns.



public final void progress()
Utility method to allow a UDF to report progress. If exec will take more than a few seconds, PigProgressable.progress() should be called occasionally to avoid timeouts. The default Hadoop task timeout is 600 seconds.
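A minimal sketch of the heartbeat pattern, assuming a long loop inside exec(), for example over the contents of a large bag. `ProgressSink` is a hypothetical stand-in for the inherited progress() method, and the interval of 10,000 elements is an arbitrary choice for illustration, not a Pig or Hadoop setting.

```java
import java.util.List;

// Hypothetical stand-in for EvalFunc.progress() / PigProgressable.progress().
interface ProgressSink {
    void progress();
}

final class SlowSum {
    // Arbitrary reporting interval chosen for this sketch.
    static final int REPORT_EVERY = 10_000;

    // Sums a long list, sending a heartbeat every REPORT_EVERY elements
    // so that a long-running call is not mistaken for a hung task.
    static long sumWithHeartbeats(List<Long> values, ProgressSink sink) {
        long total = 0;
        int sinceLastReport = 0;
        for (Long v : values) {
            if (v != null) {
                total += v;
            }
            if (++sinceLastReport == REPORT_EVERY) {
                sink.progress();
                sinceLastReport = 0;
            }
        }
        return total;
    }
}
```

Reporting by element count keeps the check cheap; a time-based check would work equally well if individual elements vary greatly in cost.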


public final void warn(String msg,
                       Enum warningEnum)
Issue a warning. Warning messages are aggregated and reported to the user.

Parameters:
msg - String message of the warning
warningEnum - type of warning
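A plain-Java model of why warning is often preferable to throwing on a bad record: the record yields null, the job continues, and warnings are counted per category rather than reported one by one. `UdfWarning`, `WarningAggregator` and `SafeParse` are hypothetical; in a real UDF you would call the inherited warn(msg, warningEnum), typically with a constant such as PigWarning.UDF_WARNING_1.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical warning categories; a real UDF would typically pass a PigWarning constant.
enum UdfWarning { BAD_INPUT, OVERFLOW }

// Hypothetical model of aggregation: only a count per category is kept, the message is dropped.
final class WarningAggregator {
    private final Map<UdfWarning, Long> counts = new EnumMap<>(UdfWarning.class);

    void warn(String msg, UdfWarning kind) {
        counts.merge(kind, 1L, Long::sum);
    }

    long count(UdfWarning kind) {
        return counts.getOrDefault(kind, 0L);
    }
}

final class SafeParse {
    // Returns the parsed value; a null field yields null silently,
    // a malformed field yields null plus a warning, so one bad record does not fail the job.
    static Integer parse(String s, WarningAggregator w) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            w.warn("cannot parse '" + s + "' as int", UdfWarning.BAD_INPUT);
            return null;
        }
    }
}
```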


public void finish()
Placeholder for cleanup to be performed at the end. User-defined functions can override this method. The default implementation is a no-op.
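Because the class may be instantiated on the front end as well as the back end, one way to handle expensive resources is to acquire them lazily on the first exec() call rather than in the constructor, and to release them in finish(). The sketch below models that lifecycle in plain Java; `LazyLookup` and its in-memory table are hypothetical stand-ins for a UDF and a file or connection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical UDF-like class whose lookup table is expensive to build.
final class LazyLookup {
    private Map<String, String> table; // built on first exec(), not in the constructor
    private int timesOpened = 0;

    String exec(String key) {
        if (table == null) {
            table = openTable();
        }
        return key == null ? null : table.get(key);
    }

    // Counterpart of EvalFunc.finish(): release whatever exec() acquired.
    void finish() {
        table = null;
    }

    boolean isOpen() {
        return table != null;
    }

    int timesOpened() {
        return timesOpened;
    }

    private Map<String, String> openTable() {
        timesOpened++;
        Map<String, String> m = new HashMap<>(); // stands in for a file or connection
        m.put("fr", "France");
        m.put("us", "United States");
        return m;
    }
}
```

Constructing the object is cheap, so an instance created only for planning on the client never pays for the resource.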


public abstract T exec(Tuple input)
                throws IOException
This callback method must be implemented by all subclasses. It is invoked on every Tuple of a given dataset. Since the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state that is maintained between invocations of this method.

Parameters:
input - the Tuple to be processed.
Returns:
result, of type T.
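A sketch of a defensive exec() body, modelled in plain Java with a `List<Object>` standing in for the Tuple; `RepeatFunc` is hypothetical. It returns null for missing input and reports malformed input with an IOException, the checked exception exec() declares. In Pig, Tuple.get() can itself throw ExecException, which is an IOException subclass.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical model of an exec() body; List<Object> stands in for Pig's Tuple.
final class RepeatFunc {
    // Expects (chararray s, int n) and returns s repeated n times.
    String exec(List<Object> input) throws IOException {
        if (input == null || input.isEmpty()) {
            return null; // nothing to process
        }
        if (input.size() != 2) {
            throw new IOException("RepeatFunc expects 2 fields, got " + input.size());
        }
        Object s = input.get(0);
        Object n = input.get(1);
        if (s == null || n == null) {
            return null; // propagate nulls instead of failing the job
        }
        if (!(n instanceof Integer)) {
            throw new IOException("second field must be an int, got "
                    + n.getClass().getSimpleName());
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < (Integer) n; i++) {
            sb.append(s);
        }
        return sb.toString();
    }
}
```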


public Schema outputSchema(Schema input)
Report the schema of the output of this UDF. Pig will make use of this in error checking, optimization, and planning. The schema of the UDF's input data is passed in as the argument.

Parameters:
input - Schema of the input
Returns:
Schema of the output
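The idea behind outputSchema() can be modelled without Pig: inspect the input schema during planning, reject unsupported input before any data is read, and describe the output field. `FieldSchema` here is a hypothetical two-field stand-in for Pig's Schema.FieldSchema, and the `upper_` naming scheme is an assumption for illustration, not the behaviour of getSchemaName().

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, minimal stand-in for Pig's Schema.FieldSchema.
final class FieldSchema {
    final String alias;
    final String type;

    FieldSchema(String alias, String type) {
        this.alias = alias;
        this.type = type;
    }
}

final class UpperSchema {
    // Model of outputSchema(): runs at planning time, before any data is read.
    static List<FieldSchema> outputSchema(List<FieldSchema> input) {
        if (input == null || input.size() != 1 || !"chararray".equals(input.get(0).type)) {
            throw new IllegalArgumentException("expected exactly one chararray field");
        }
        String in = input.get(0).alias;
        // Hypothetical naming scheme for the output column.
        String out = "upper_" + (in == null ? "value" : in);
        return Collections.singletonList(new FieldSchema(out, "chararray"));
    }
}
```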


public boolean isAsynchronous()

This function should be overridden to return true for functions that return their values asynchronously. Currently Pig never attempts to execute a function asynchronously.

Returns:
true if the function can be executed asynchronously.


public PigProgressable getReporter()


public final void setReporter(PigProgressable reporter)
Set the reporter. Called by Pig to provide the UDF with a reference to the reporter.

Parameters:
reporter - Hadoop reporter


public List<FuncSpec> getArgToFuncMapping()
                                   throws FrontendException
Allow a UDF to specify type-specific implementations of itself. For example, an implementation of arithmetic sum might have int and floating-point implementations, since integer arithmetic performs much better than floating-point arithmetic. Pig's typechecker calls this method and, using the returned list together with the schema of the function's input data, decides which implementation of the UDF to use.

Returns:
A List containing FuncSpec objects representing the EvalFunc classes that can handle inputs matching the schema in each object. Each FuncSpec should be constructed with a schema that describes the input for that implementation. For example, the sum function above would return two elements in its list:
  1. new FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.DOUBLE)))
  2. new FuncSpec(IntSum.class.getName(), new Schema(new Schema.FieldSchema(null, DataType.INTEGER)))
This would indicate that the main implementation is used for doubles, and the special implementation IntSum is used for ints.
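A simplified model of how such a mapping might be consumed. `Spec` is a hypothetical stand-in for FuncSpec, the class names `example.Sum` and `example.IntSum` are made up, and `choose` accepts only an exact type match; Pig's actual typechecker is more elaborate (for example, it can also consider implicit casts).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for FuncSpec: an implementation class plus the input types it accepts.
final class Spec {
    final String className;
    final List<String> inputTypes;

    Spec(String className, String... inputTypes) {
        this.className = className;
        this.inputTypes = Arrays.asList(inputTypes);
    }
}

final class SumFamily {
    // Model of getArgToFuncMapping() for a sum UDF with an int-specialised variant.
    static List<Spec> argToFuncMapping() {
        return Arrays.asList(
                new Spec("example.Sum", "double"),   // hypothetical class names
                new Spec("example.IntSum", "int"));
    }

    // Simplified selection: exact type match only, no implicit casts.
    static Optional<String> choose(List<Spec> specs, List<String> actualTypes) {
        for (Spec s : specs) {
            if (s.inputTypes.equals(actualTypes)) {
                return Optional.of(s.className);
            }
        }
        return Optional.empty();
    }
}
```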


public List<String> getCacheFiles()
Allow a UDF to specify a list of files it would like placed in the distributed cache. These files will be put in the cache for every job the UDF is used in. The default implementation returns null.

Returns:
A list of files
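A sketch of what a getCacheFiles() override might return, assuming the Hadoop distributed-cache convention `path#linkname`, under which the file is made available in the task's working directory under linkname. `CacheFilesExample` and the path `/user/example/lookup.txt` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CacheFilesExample {
    // Would be the body of getCacheFiles() in a real UDF; the path is hypothetical.
    static List<String> cacheFiles() {
        List<String> files = new ArrayList<>();
        files.add("/user/example/lookup.txt#lookup");
        return files;
    }

    // Local name that exec() would open, e.g. new FileReader("./lookup");
    // null when the entry carries no explicit #linkname.
    static String localLinkName(String entry) {
        int hash = entry.lastIndexOf('#');
        return hash < 0 ? null : entry.substring(hash + 1);
    }
}
```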


public PigLogger getPigLogger()


public final void setPigLogger(PigLogger pigLogger)
Set the PigLogger object. Called by Pig to provide the UDF with a reference to the PigLogger.

Parameters:
pigLogger - PigLogger object.


public org.apache.commons.logging.Log getLogger()

Copyright © ${year} The Apache Software Foundation