Package org.apache.pig.builtin

This package contains builtin Pig UDFs.


Interface Summary
InvokerFunction  
 

Class Summary
ABS ABS implements a binding to the Java function Math.abs(double) for computing the absolute value of the argument.
ACOS ACOS implements a binding to the Java function Math.acos(double) for computing the arc cosine of the argument.
AddDuration AddDuration returns the result of adding a Duration object to a DateTime object.
AlgebraicBigDecimalMathBase Core logic for applying a SUM function to a bag of BigDecimals.
AlgebraicBigDecimalMathBase.Final  
AlgebraicBigDecimalMathBase.Intermediate  
AlgebraicBigIntegerMathBase Core logic for applying a SUM function to a bag of BigIntegers.
AlgebraicBigIntegerMathBase.Final  
AlgebraicBigIntegerMathBase.Intermediate  
AlgebraicByteArrayMathBase Core logic for applying an accumulative/algebraic math function to a bag of bytearrays whose values are interpreted as doubles.
AlgebraicByteArrayMathBase.Final  
AlgebraicByteArrayMathBase.Initial  
AlgebraicByteArrayMathBase.Intermediate  
AlgebraicDoubleMathBase Core logic for applying an accumulative/algebraic math function to a bag of doubles.
AlgebraicDoubleMathBase.Final  
AlgebraicDoubleMathBase.Intermediate  
AlgebraicFloatMathBase Core logic for applying an accumulative/algebraic math function to a bag of Floats.
AlgebraicFloatMathBase.Final  
AlgebraicFloatMathBase.Intermediate  
AlgebraicIntMathBase Core logic for applying an accumulative/algebraic math function to a bag of Integers.
AlgebraicIntMathBase.Final  
AlgebraicIntMathBase.Intermediate  
AlgebraicLongMathBase Core logic for applying an accumulative/algebraic math function to a bag of Longs.
AlgebraicLongMathBase.Final  
AlgebraicLongMathBase.Intermediate  
ARITY Deprecated. Use SIZE instead.
ASIN ASIN implements a binding to the Java function Math.asin(double) for computing the arc sine of the argument.
Assert  
ATAN ATAN implements a binding to the Java function Math.atan(double) for computing the arc tangent of the argument.
AVG Generates the average of a set of values.
AVG.Final  
AVG.Initial  
AVG.Intermediate  
AvroStorage Pig UDF for reading and writing Avro data.
BagSize This class should never be used directly; use SIZE.
BagToString Flatten a bag into a string.
BagToTuple Flatten a bag into a tuple.
Base Base class for math UDFs.
BigDecimalAbs  
BigDecimalAvg This class should never be used directly; use AVG.
BigDecimalAvg.Final  
BigDecimalAvg.Initial  
BigDecimalAvg.Intermediate  
BigDecimalMax This class should never be used directly; use MAX.
BigDecimalMax.Final  
BigDecimalMax.Intermediate  
BigDecimalMin This class should never be used directly; use MIN.
BigDecimalMin.Final  
BigDecimalMin.Intermediate  
BigDecimalSum This class should never be used directly; use SUM.
BigDecimalSum.Final  
BigDecimalSum.Intermediate  
BigDecimalWrapper Max and min seeds cannot be defined for BigDecimal, because its values can be as large as the computer allows.
BigIntegerAbs  
BigIntegerAvg This class should never be used directly; use AVG.
BigIntegerAvg.Final  
BigIntegerAvg.Initial  
BigIntegerAvg.Intermediate  
BigIntegerMax This class should never be used directly; use MAX.
BigIntegerMax.Final  
BigIntegerMax.Intermediate  
BigIntegerMin This class should never be used directly; use MIN.
BigIntegerMin.Final  
BigIntegerMin.Intermediate  
BigIntegerSum This class should never be used directly; use SUM.
BigIntegerSum.Final  
BigIntegerSum.Intermediate  
BigIntegerWrapper Max and min seeds cannot be defined for BigInteger, because its values can be as large as the computer allows.
BinStorage Load and store data in a binary format.
Bloom Use a Bloom filter built previously by BuildBloom.
BuildBloom Build a Bloom filter for later use in Bloom.
BuildBloom.Final  
BuildBloom.Initial  
BuildBloom.Intermediate  
BuildBloomBase<T> A base class for BuildBloom and its Algebraic implementations.
CBRT CBRT implements a binding to the Java function Math.cbrt(double) for computing the cube root of the argument.
CEIL CEIL implements a binding to the Java function Math.ceil(double).
CONCAT Generates the concatenation of two or more arguments.
ConstantSize This class should never be used directly; use SIZE.
COR Computes the correlation between sets of data.
COR.Final  
COR.Initial  
COR.Intermed  
COS COS implements a binding to the Java function Math.cos(double).
COSH COSH implements a binding to the Java function Math.cosh(double).
COUNT Generates the count of the non-null values in a bag.
COUNT_STAR Generates the count of all tuples in a bag, including those whose values are null (SQL COUNT(*) semantics).
COUNT_STAR.Final  
COUNT_STAR.Initial  
COUNT_STAR.Intermediate  
COUNT.Final  
COUNT.Initial  
COUNT.Intermediate  
COV Computes the covariance between sets of data.
COV.Final  
COV.Initial  
COV.Intermed  
CubeDimensions Produces a DataBag with all combinations of the argument tuple members as in a data cube.
CurrentTime  
DateTimeMax This class should never be used directly; use MAX.
DateTimeMax.Final  
DateTimeMax.Initial  
DateTimeMax.Intermediate  
DateTimeMin This class should never be used directly; use MIN.
DateTimeMin.Final  
DateTimeMin.Initial  
DateTimeMin.Intermediate  
DaysBetween DaysBetween returns the number of days between two DateTime objects
DIFF DIFF takes two bags as arguments and compares them, returning a bag of the tuples that are in one bag but not in the other.
Distinct Find the distinct set of tuples in a bag.
Distinct.Final  
Distinct.Initial  
Distinct.Intermediate  
DoubleAbs  
DoubleAvg This class should never be used directly; use AVG.
DoubleAvg.Final  
DoubleAvg.Initial  
DoubleAvg.Intermediate  
DoubleBase Base class for math UDFs that return a Double value.
DoubleMax This class should never be used directly; use MAX.
DoubleMax.Final  
DoubleMax.Intermediate  
DoubleMin This class should never be used directly; use MIN.
DoubleMin.Final  
DoubleMin.Intermediate  
DoubleRound Given a single data atom, returns the closest long to the argument.
DoubleRoundTo ROUND_TO safely rounds a number to a given precision by using an intermediate BigDecimal.
DoubleSum This class should never be used directly; use SUM.
DoubleSum.Final  
DoubleSum.Intermediate  
ENDSWITH Pig UDF to test input tuple.get(0) against tuple.get(1) to determine if the first argument ends with the string in the second.
EqualsIgnoreCase Compares two Strings ignoring case considerations.
EXP Given a single data atom, returns Euler's number e raised to the power of the input.
FloatAbs  
FloatAvg This class should never be used directly; use AVG.
FloatAvg.Final  
FloatAvg.Initial  
FloatAvg.Intermediate  
FloatMax This class should never be used directly; use MAX.
FloatMax.Final  
FloatMax.Intermediate  
FloatMin This class should never be used directly; use MIN.
FloatMin.Final  
FloatMin.Intermediate  
FloatRound ROUND implements a binding to the Java function Math.round(float).
FloatRoundTo ROUND_TO safely rounds a number to a given precision by using an intermediate BigDecimal.
FloatSum This class should never be used directly; use SUM.
FLOOR FLOOR implements a binding to the Java function Math.floor(double).
FunctionWrapperEvalFunc EvalFunc that wraps an implementation of the Function interface, which is passed as a String in the constructor.
GenericInvoker<T> The generic Invoker class does all the common grunt work of setting up an invoker.
GetDay GetDay extracts the day of a month from a DateTime object.
GetHour GetHour extracts the hour of a day from a DateTime object.
GetMilliSecond GetMilliSecond extracts the millisecond of a second from a DateTime object.
GetMinute GetMinute extracts the minute of an hour from a DateTime object.
GetMonth GetMonth extracts the month of a year from a DateTime object.
GetSecond GetSecond extracts the second of a minute from a DateTime object.
GetWeek GetWeek extracts the week of the week year from a DateTime object.
GetWeekYear GetWeekYear extracts the week year from a DateTime object.
GetYear GetYear extracts the year from a DateTime object.
HoursBetween HoursBetween returns the number of hours between two DateTime objects
INDEXOF INDEXOF implements an eval function to search for a string. Example: A = load 'mydata' as (name); B = foreach A generate INDEXOF(name, ",");
IntAbs ABS implements a binding to the Java function Math.abs(int) for computing the absolute value of the argument.
IntAvg This class should never be used directly; use AVG.
IntAvg.Final  
IntAvg.Initial  
IntAvg.Intermediate  
IntMax This class should never be used directly; use MAX.
IntMax.Final  
IntMax.Intermediate  
IntMin This class should never be used directly; use MIN.
IntMin.Final  
IntMin.Intermediate  
IntSum This class should never be used directly; use SUM.
INVERSEMAP This UDF accepts a Map as input with values of any primitive data type.
InvokeForDouble  
InvokeForFloat  
InvokeForInt  
InvokeForLong  
InvokeForString  
Invoker<T>  
InvokerGenerator  
IsEmpty Determine whether a bag or map is empty.
JsonLoader A loader for data stored using JsonStorage.
JsonMetadata Reads and Writes metadata using JSON in metafiles next to the data.
JsonStorage A JSON Pig store function.
KEYSET This UDF takes a Map and returns a Bag containing the keyset.
LAST_INDEX_OF LAST_INDEX_OF implements an eval function to search for the last occurrence of a string. Returns null on error. Example: A = load 'mydata' as (name); B = foreach A generate LAST_INDEX_OF(name, ",");
LCFIRST Lower-cases the first character of a string.
LOG LOG implements a binding to the Java function Math.log(double).
LOG10 LOG10 implements a binding to the Java function Math.log10(double).
LongAbs  
LongAvg This class should never be used directly; use AVG.
LongAvg.Final  
LongAvg.Initial  
LongAvg.Intermediate  
LongMax This class should never be used directly; use MAX.
LongMax.Final  
LongMax.Intermediate  
LongMin This class should never be used directly; use MIN.
LongMin.Final  
LongMin.Intermediate  
LongSum This class should never be used directly; use SUM.
LongSum.Final  
LongSum.Intermediate  
LOWER LOWER implements an eval function to convert a string to lower case. Example: A = load 'mydata' as (name); B = foreach A generate LOWER(name);
LTRIM Returns a string with only leading whitespace omitted.
MapSize This class should never be used directly; use SIZE.
MAX Generates the maximum of a set of values.
MAX.Final  
MAX.Intermediate  
MilliSecondsBetween MilliSecondsBetween returns the number of milliseconds between two DateTime objects
MIN Generates the minimum of a set of values.
MIN.Final  
MIN.Intermediate  
MinutesBetween MinutesBetween returns the number of minutes between two DateTime objects
MonthsBetween MonthsBetween returns the number of months between two DateTime objects
ParquetLoader Wrapper class which will delegate calls to parquet.pig.ParquetLoader
ParquetStorer Wrapper class which will delegate calls to parquet.pig.ParquetStorer
PigStorage A load function that parses a line of input into fields using a character delimiter.
PigStreaming The default implementation of PigStreamingBase.
PluckTuple This is a UDF which allows the user to specify a string prefix, and then filter for the columns in a relation that begin with that prefix.
RANDOM Return a random double value.
REGEX_EXTRACT Syntax: String RegexExtract(String expression, String regex, int match_index).
Input: expression is the source string; regex is the regular expression; match_index is the index of the group to extract.
Output: the extracted group, or null if the match fails.
Matching strategy: by default (useMatches=false), only the first matching sequence is sought, using Matcher.find() instead of Matcher.matches(). The default can be changed through the constructor argument: DEFINE NON_GREEDY_EXTRACT REGEX_EXTRACT('true');
REGEX_EXTRACT_ALL Syntax: Tuple RegexExtractAll(String expression, String regex).
Input: expression is the source string; regex is the regular expression.
Output: a tuple of the matched strings.
Matching strategy: by default (useMatches=true), the entire input must match, using Matcher.matches() instead of Matcher.find(). The default can be changed through the constructor argument: DEFINE GREEDY_EXTRACT REGEX_EXTRACT_ALL('false');
REPLACE REPLACE implements eval function to replace part of a string.
RollupDimensions Produces a DataBag with a hierarchy of values (from the most detailed level of aggregation to the most general) of the specified dimensions. For example, (a, b, c) will produce the following bag: { (a, b, c), (a, b, null), (a, null, null), (null, null, null) }
ROUND ROUND implements a binding to the Java function Math.round(double).
ROUND_TO ROUND_TO safely rounds a number to a given precision by using an intermediate BigDecimal.
RTRIM Returns a string with only trailing whitespace omitted.
SecondsBetween SecondsBetween returns the number of seconds between two DateTime objects
SIN SIN implements a binding to the Java function Math.sin(double).
SINH SINH implements a binding to the Java function Math.sinh(double).
SIZE Generates the size of the argument passed to it.
SQRT SQRT implements a binding to the Java function Math.sqrt(double).
STARTSWITH Pig UDF to test input tuple.get(0) against tuple.get(1) to determine if the first argument starts with the string in the second.
StringConcat This class should never be used directly; use CONCAT.
StringMax This class should never be used directly; use MAX.
StringMax.Final  
StringMax.Initial  
StringMax.Intermediate  
StringMin This class should never be used directly; use MIN.
StringMin.Final  
StringMin.Initial  
StringMin.Intermediate  
StringSize This class should never be used directly; use SIZE.
STRSPLIT Wrapper around Java's String.split.
Input tuple: the first column is assumed to hold the string to split;
the optional second column holds the delimiter or regex to split on;
if it is not provided, '\s' (any single whitespace character) is assumed;
the optional third column may provide a limit on the number of results.
If the limit is not provided, 0 is assumed, as per Java's split().
SUBSTRING SUBSTRING implements eval function to get a part of a string.
SUBTRACT SUBTRACT takes two bags as arguments and returns a new bag composed of the tuples of the first bag that are not in the second bag.
Null bag arguments are replaced by empty bags.
SubtractDuration SubtractDuration returns the result of a DateTime object minus a Duration object.
SUM Generates the sum of a set of values.
SUM.Final  
SUM.Intermediate  
TAN TAN implements a binding to the Java function Math.tan(double).
TANH TANH implements a binding to the Java function Math.tanh(double).
TextLoader This load function simply creates a tuple for each line of text that has a single chararray field that contains the line of text.
TOBAG This class takes a list of items and puts them into a bag. For example, T = foreach U generate TOBAG($0, $1, $2); is like saying T = foreach U generate {($0), ($1), ($2)}. All arguments that are not of tuple type are inserted into a tuple before being added to the bag.
ToDate ToDate converts an ISO string, a custom-format string, or a Unix timestamp to a DateTime object.
ToDate2ARGS This class should never be used directly; use ToDate.
ToDate3ARGS This class should never be used directly; use ToDate.
ToDateISO This class should never be used directly; use ToDate.
TOKENIZE Given a chararray as an argument, this function splits the chararray and returns a bag with a tuple for each chararray that results from the split.
TOMAP This class makes a map out of the parameters passed to it. For example, T = foreach U generate TOMAP($0, $1, $2, $3); generates the map $0->$1, $2->$3.
ToMilliSeconds ToMilliSeconds converts the DateTime to the number of milliseconds that have passed since January 1, 1970 00:00:00.000 GMT.
TOP Top UDF accepts a bag of tuples and returns top-n tuples depending upon the tuple field value of type long.
TOP.Final  
TOP.Initial  
TOP.Intermed  
ToString ToString converts a DateTime object to an ISO or custom-format string.
TOTUPLE This class makes a tuple out of its parameters. For example, T = foreach U generate TOTUPLE($0, $1, $2); generates a tuple containing $0, $1, and $2.
ToUnixTime ToUnixTime converts a DateTime to Unix time, as a long.
TrevniStorage Pig Store/Load Function for Trevni.
TRIM Returns a string, with leading and trailing whitespace omitted.
TupleSize This class should never be used directly; use SIZE.
UCFIRST Upper-cases the first character of a string.
UPPER UPPER implements an eval function to convert a string to upper case. Example: A = load 'mydata' as (name); B = foreach A generate UPPER(name);
Utf8StorageConverter This abstract class provides standard conversions between utf8 encoded data and pig data types.
VALUELIST This UDF takes a Map and returns a Bag containing the values from map.
VALUESET This UDF takes a Map and returns a Tuple containing the value set.
WeeksBetween WeeksBetween returns the number of weeks between two DateTime objects
YearsBetween YearsBetween returns the number of years between two DateTime objects
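DoubleRoundTo, FloatRoundTo and ROUND_TO are described as rounding "safely" through an intermediate BigDecimal. A minimal Java sketch of that idea follows; it is not Pig's source, and the class RoundToSketch, the method roundTo and its explicit RoundingMode parameter are illustrative assumptions. NaN and infinities are passed through unchanged here because BigDecimal cannot represent them; how Pig itself handles them is not stated above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch only (hypothetical names): rounding a double via BigDecimal.
final class RoundToSketch {
    private RoundToSketch() {}

    /** Rounds value to the given number of digits after the decimal point. */
    static double roundTo(double value, int digits, RoundingMode mode) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            return value; // BigDecimal has no representation for NaN or infinity
        }
        // BigDecimal.valueOf starts from the string produced by Double.toString,
        // so 2.675 is treated as the decimal 2.675, not its binary approximation.
        return BigDecimal.valueOf(value).setScale(digits, mode).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(roundTo(2.675, 2, RoundingMode.HALF_UP));   // 2.68
        System.out.println(roundTo(2.675, 2, RoundingMode.HALF_DOWN)); // 2.67
    }
}
```

Starting from the decimal string rather than the raw binary value is what keeps halfway cases such as 2.675 behaving as a reader would expect for each rounding mode.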
 

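REGEX_EXTRACT and REGEX_EXTRACT_ALL differ mainly in their default matching strategy: Matcher.find() looks for the first matching subsequence, while Matcher.matches() requires the whole input to match. The sketch below contrasts the two with plain java.util.regex. RegexExtractSketch, extract and extractAll are hypothetical names, and returning null on a failed match follows the "if fail, return null" wording rather than Pig's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only (hypothetical names): find() versus matches() group extraction.
final class RegexExtractSketch {
    private RegexExtractSketch() {}

    /** Returns group `index`, or null if there is no match or no such group. */
    static String extract(String input, String regex, int index, boolean useMatches) {
        if (input == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        // find(): first matching subsequence; matches(): the entire input.
        boolean ok = useMatches ? m.matches() : m.find();
        if (!ok || index < 0 || index > m.groupCount()) {
            return null;
        }
        return m.group(index);
    }

    /** Returns every capturing group in order, or null if there is no match. */
    static List<String> extractAll(String input, String regex, boolean useMatches) {
        if (input == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean ok = useMatches ? m.matches() : m.find();
        if (!ok) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        // REGEX_EXTRACT's default strategy corresponds to find():
        System.out.println(extract("id=42;", "(\\d+)", 1, false)); // 42
        // REGEX_EXTRACT_ALL's default strategy corresponds to matches():
        System.out.println(extractAll("key=value", "(\\w+)=(\\w+)", true)); // [key, value]
    }
}
```

With matches(), the same pattern fails on "id=42;" because the surrounding text is not part of the match, which is why the two defaults suit different kinds of patterns.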
Annotation Types Summary
MonitoredUDF Describes how the execution of a UDF should be monitored, and what to do if it times out.
Nondeterministic A non-deterministic UDF is one that can produce different results when invoked on the same input.
OutputSchema An EvalFunc can be annotated with an OutputSchema to tell Pig what the expected output is.
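MonitoredUDF describes how a UDF call should be monitored and what to do if it times out. The sketch below illustrates only that general idea (run a call with a deadline and substitute a default value on timeout or failure) using java.util.concurrent. It is not how Pig implements the annotation, and TimeoutSketch, callWithTimeout and its fallback parameter are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only (hypothetical names): a deadline plus a default value.
final class TimeoutSketch {
    private TimeoutSketch() {}

    /** Runs task, returning fallback if it times out, throws, or is interrupted. */
    static <T> T callWithTimeout(Callable<T> task, long timeout, TimeUnit unit, T fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeout, unit);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the call that is still running
                return fallback;
            } catch (ExecutionException e) {
                return fallback; // the call itself threw an exception
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return fallback;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Integer fast = callWithTimeout(() -> 1 + 1, 1, TimeUnit.SECONDS, -1);
        Integer slow = callWithTimeout(() -> {
            Thread.sleep(5_000); // simulates a call that hangs
            return 7;
        }, 100, TimeUnit.MILLISECONDS, -1);
        System.out.println(fast + " " + slow); // 2 -1
    }
}
```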
 

Package org.apache.pig.builtin Description

This package contains builtin Pig UDFs. This includes EvalFuncs, LoadFuncs and StoreFuncs.
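As an illustration of how one of these EvalFuncs maps onto plain Java, the STRSPLIT argument handling listed in the class summary can be sketched with String.split. StrSplitSketch and strsplit are hypothetical names, null stands in for an omitted optional column, and returning null for a null input is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only (hypothetical names): STRSPLIT-style defaults over String.split.
final class StrSplitSketch {
    private StrSplitSketch() {}

    static List<String> strsplit(String input, String delimiter, Integer limit) {
        if (input == null) {
            return null; // assumption: nothing to split
        }
        String regex = (delimiter == null) ? "\\s" : delimiter; // default: one whitespace char
        int n = (limit == null) ? 0 : limit;                    // default limit, as in split()
        return Arrays.asList(input.split(regex, n));
    }

    public static void main(String[] args) {
        System.out.println(strsplit("a b  c", null, null)); // [a, b, , c]
        System.out.println(strsplit("a,b,c", ",", 2));      // [a, b,c]
        System.out.println(strsplit("a,b,,,", ",", null));  // [a, b]
    }
}
```

Because '\s' matches a single whitespace character, two consecutive spaces yield an empty field, and a limit of 0 drops trailing empty strings, both following Java's split() rules.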

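Many of the aggregate classes listed above come with Initial, Intermediate (or Intermed) and Final inner classes, so that partial results computed on different chunks of data can be combined before the final value is produced. The sketch below shows that shape for COV and COR using mergeable running sums. It is not Pig's implementation; Partial and its methods are hypothetical, and the n - 1 denominator for covariance is an assumption (correlation does not depend on that choice).

```java
// Sketch only (hypothetical names): Initial / Intermediate / Final for COV and COR.
final class CovSketch {
    private CovSketch() {}

    static final class Partial {
        final long n;
        final double sx, sy, sxy, sxx, syy;

        Partial(long n, double sx, double sy, double sxy, double sxx, double syy) {
            this.n = n; this.sx = sx; this.sy = sy;
            this.sxy = sxy; this.sxx = sxx; this.syy = syy;
        }

        /** "Initial": summarize one chunk of paired observations. */
        static Partial of(double[] x, double[] y) {
            if (x.length != y.length) {
                throw new IllegalArgumentException("x and y must have the same length");
            }
            double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
            for (int i = 0; i < x.length; i++) {
                sx += x[i];
                sy += y[i];
                sxy += x[i] * y[i];
                sxx += x[i] * x[i];
                syy += y[i] * y[i];
            }
            return new Partial(x.length, sx, sy, sxy, sxx, syy);
        }

        /** "Intermediate": partial sums from different chunks simply add up. */
        Partial merge(Partial o) {
            return new Partial(n + o.n, sx + o.sx, sy + o.sy, sxy + o.sxy, sxx + o.sxx, syy + o.syy);
        }

        /** "Final": sample covariance (n - 1 denominator, an assumption). */
        double covariance() {
            if (n < 2) {
                return Double.NaN;
            }
            return (sxy - sx * sy / n) / (n - 1);
        }

        /** "Final": Pearson correlation coefficient. */
        double correlation() {
            double num = n * sxy - sx * sy;
            double den = Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
            return num / den;
        }
    }

    public static void main(String[] args) {
        Partial left = Partial.of(new double[] {1, 2}, new double[] {2, 4});
        Partial right = Partial.of(new double[] {3}, new double[] {6});
        Partial all = left.merge(right);
        System.out.println(all.covariance() + " " + all.correlation()); // 2.0 1.0
    }
}
```

Raw power sums are easy to merge but can lose precision through cancellation when values are large relative to their spread; a more robust variant would merge centered statistics instead.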

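CubeDimensions and RollupDimensions both expand one tuple of dimension values into several grouping rows, with null standing for "all values" of a dimension. The sketch below reads the two descriptions as follows: a cube produces every combination of kept and nulled fields (2^n rows), while a rollup keeps a prefix of the dimensions and nulls the rest (n + 1 rows). GroupingSetsSketch is a hypothetical name, lists stand in for Pig tuples and bags, and the row order is not claimed to match Pig's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only (hypothetical names): cube and rollup expansion of one tuple.
final class GroupingSetsSketch {
    private GroupingSetsSketch() {}

    /** Every combination of kept and nulled fields: 2^n rows. */
    static List<List<String>> cube(List<String> dims) {
        int n = dims.size();
        List<List<String>> out = new ArrayList<>();
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> row = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                // Bit i set means dimension i is aggregated away (null).
                row.add((mask & (1 << i)) != 0 ? null : dims.get(i));
            }
            out.add(row);
        }
        return out;
    }

    /** Most detailed to most general: keep a prefix, null the rest (n + 1 rows). */
    static List<List<String>> rollup(List<String> dims) {
        int n = dims.size();
        List<List<String>> out = new ArrayList<>();
        for (int keep = n; keep >= 0; keep--) {
            List<String> row = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                row.add(i < keep ? dims.get(i) : null);
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cube(Arrays.asList("a", "b")));
        // [[a, b], [null, b], [a, null], [null, null]]
        System.out.println(rollup(Arrays.asList("a", "b", "c")));
        // [[a, b, c], [a, b, null], [a, null, null], [null, null, null]]
    }
}
```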

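Bloom and BuildBloom split Bloom filtering into a build phase and a lookup phase. The minimal filter below shows the data structure itself: each key sets k bit positions, derived by double hashing (h1 + i * h2 mod m) from two base hashes, and a lookup answers "definitely absent" or "possibly present". BloomSketch, the use of String.hashCode and a MurmurHash3-style finalizer as the two hashes, and the sizes in the example are illustrative choices, not BuildBloom's configuration.

```java
import java.util.BitSet;

// Sketch only (hypothetical names): a Bloom filter with double hashing.
final class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // MurmurHash3 32-bit finalizer, used here to derive a second hash from the first.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = mix(h1) | 1; // odd step size
        long combined = (long) h1 + (long) i * h2;
        return (int) Math.floorMod(combined, (long) numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means definitely absent; true means possibly present. */
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1 << 16, 3);
        filter.add("apple");
        filter.add("banana");
        System.out.println(filter.mightContain("apple")); // true
        // A key that was never added usually reports false, but may be a false positive.
        System.out.println(filter.mightContain("cherry"));
    }
}
```

A Bloom filter never produces false negatives, which is what makes it usable as a cheap pre-filter: rows it rejects can be dropped safely, and only the rows it accepts need an exact check.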
Copyright © 2007-2012 The Apache Software Foundation