ExtremalTupleByNthField (Pig 0.17.0 API)

java.lang.Object
- org.apache.pig.EvalFunc<Tuple>
- - org.apache.pig.piggybank.evaluation.ExtremalTupleByNthField

All Implemented Interfaces:

Accumulator<Tuple>, Algebraic
```
public class ExtremalTupleByNthField
extends EvalFunc<Tuple>
implements Algebraic, Accumulator<Tuple>
```
This class is similar to MaxTupleBy1stField except that it allows you to specify with field to use (instead of just using 1st field) and to specify ascending or descending. The first parameter in the constructor specifies which field to use and the second parameter to the constructor specifies which extremal to retrieve. Strings prefixed by "min", "least", "desc", "small" and "-", irrespective of capitalization and leading white spacing, specifies the computation of the minimum and all other strings means maximum;
Example1: Invoking the UDF
e.g. Using this udf:
define myMin ExtremalTupleByNthField( '4', 'min' ); T = group G ALL; R = foreach T generate myMin(G); is equivalent to:
T = order G by $3 asc; R = limit G 1; Note above 4 indicates the field with index 3 in the tuple. The 4th field can be any comparable type, so you can use float, int, string, or even tuples. By default constructor, this UDF behaves as MaxTupleBy1stField in that it chooses the max tuple by the comparable in the first field.
Example 2: Default behavior
This class also has a one parameter constructor that specifies the index and takes the max tuple from the bag. define myMax ExtremalTupleByNthField( '3' ); T = group G ALL; R = foreach T generate myMax(G); is equivalent to:
T = order G by $2 desc; R = limit G 1;
Example 3: Choosing a Large Bag or Tuple
Another possible use case is the choosing of larger or smaller bags/tuples. In pig, bags and tuples are comparable and the comparison is based on size. define biggestBag ExtremalTupleByNthField('1', max); R = group TABLE by (key1, key2); G = cogroup L by key1, R by group.key1; V = foreach G generate L, biggestBag(R); This results in each L(eft) bag associated with only the largest bag from the R(ight) table. If all bags in R are of equal size, the comparator continues on to perform element-wise comparison. In case of a complete tie in the comparison, which result is returned is nondeterministic. But because this class is able to compare any comparable we are able to specify a secondary key.
Example 4: Secondary Sort Key
define biggestBag ExtremalTupleByNthField('1', max); G = cogroup L by key1, M by key1, R by key1; V = foreach G generate FLATTEN(L), biggestBag(R.($0, $1, $2, $5)) as best_result_by_0, biggestBag(R.($3, $1, $2, $5)) as best_result_by_3, biggestBag(M.($0, $2)) as best_misc_data; this will generate two sets of results and misc data based on two separate criterion. Since all tuples in the bags have the same size (4, 4, 2 respectively), the tuple comparator continues on and compares the members of tuples until it finds one. best_result_by_0 and best_result_by3 are ordered by 1st and 4th member of the tuples. Within each group, ties are broken by second and third field. Finally, note that the udf implements both Algebraic and Accumulator, so it is relatively efficient because it's a one-pass algorithm.

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static class ExtremalTupleByNthField.HelperClass
Utility classes and methods
- Nested classes/interfaces inherited from class org.apache.pig.EvalFunc
  EvalFunc.SchemaType

Nested Classes
Modifier and Type	Class and Description
`static class`	`ExtremalTupleByNthField.HelperClass` Utility classes and methods

Field Summary
- Fields inherited from class org.apache.pig.EvalFunc
  log, pigLogger, reporter, returnType

Constructor Summary

Constructors
Constructor and Description
`ExtremalTupleByNthField()` Constructors
`ExtremalTupleByNthField(String fieldIndexString)`
`ExtremalTupleByNthField(String fieldIndexString, String order)`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`accumulate(Tuple b)` Pass tuples to the UDF.
`void`	`cleanup()` Called after getValue() to prepare processing for next key.
`Tuple`	`exec(Tuple input)` The EvalFunc interface
`protected static Tuple`	`extreme(int pind, int psign, Tuple input, PigProgressable reporter)`
`String`	`getFinal()` Get the final function.
`String`	`getInitial()` Algebraic interface
`String`	`getIntermed()` Get the intermediate function.
`Type`	`getReturnType()` Get the Type that this EvalFunc returns.
`Tuple`	`getValue()` Called when all tuples from current key have been passed to accumulate.
`Schema`	`outputSchema(Schema input)` Report the schema of the output of this UDF.
`protected static int`	`parseFieldIndex(String inputFieldIndex)`
`protected static int`	`parseOrdering(String order)`

Methods inherited from class org.apache.pig.EvalFunc
allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLoadCaster, getLogger, getPigLogger, getReporter, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ExtremalTupleByNthField
```
public ExtremalTupleByNthField()
                        throws ExecException
```
    Constructors
    
    Throws:
    
    ExecException
  - ExtremalTupleByNthField
```
public ExtremalTupleByNthField(String fieldIndexString)
                        throws ExecException
```
    Throws:
    
    ExecException
  - ExtremalTupleByNthField
```
public ExtremalTupleByNthField(String fieldIndexString,
                               String order)
                        throws ExecException
```
    Throws:
    
    ExecException
- Method Detail
  - exec
```
public Tuple exec(Tuple input)
           throws IOException
```
    The EvalFunc interface
    
    Specified by:
    
    exec in class EvalFunc<Tuple>
    
    Parameters:
    
    input - the Tuple to be processed.
    
    Returns:
    
    result, of type T.
    
    Throws:
    
    IOException
  - getReturnType
```
public Type getReturnType()
```
    Description copied from class: EvalFunc
    
    Get the Type that this EvalFunc returns.
    
    Overrides:
    
    getReturnType in class EvalFunc<Tuple>
    
    Returns:
    
    Type
  - outputSchema
```
public Schema outputSchema(Schema input)
```
    Description copied from class: EvalFunc
    
    Report the schema of the output of this UDF. Pig will make use of this in error checking, optimization, and planning. The schema of input data to this UDF is provided.
    The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).
    
    Overrides:
    
    outputSchema in class EvalFunc<Tuple>
    
    Parameters:
    
    input - Schema of the input
    
    Returns:
    
    Schema of the output
  - getInitial
```
public String getInitial()
```
    Algebraic interface
    
    Specified by:
    
    getInitial in interface Algebraic
    
    Returns:
    
    A function name of f_init. f_init should be an eval func. The return type of f_init.exec() has to be Tuple
  - getIntermed
```
public String getIntermed()
```
    Description copied from interface: Algebraic
    
    Get the intermediate function.
    
    Specified by:
    
    getIntermed in interface Algebraic
    
    Returns:
    
    A function name of f_intermed. f_intermed should be an eval func. The return type of f_intermed.exec() has to be Tuple
  - getFinal
```
public String getFinal()
```
    Description copied from interface: Algebraic
    
    Get the final function.
    
    Specified by:
    
    getFinal in interface Algebraic
    
    Returns:
    
    A function name of f_final. f_final should be an eval func parametrized by the same datum as the eval func implementing this interface.
  - accumulate
```
public void accumulate(Tuple b)
                throws IOException
```
    Description copied from interface: Accumulator
    
    Pass tuples to the UDF.
    
    Specified by:
    
    accumulate in interface Accumulator<Tuple>
    
    Parameters:
    
    b - A tuple containing a single field, which is a bag. The bag will contain the set of tuples being passed to the UDF in this iteration.
    
    Throws:
    
    IOException
  - cleanup
```
public void cleanup()
```
    Description copied from interface: Accumulator
    
    Called after getValue() to prepare processing for next key.
    
    Specified by:
    
    cleanup in interface Accumulator<Tuple>
  - getValue
```
public Tuple getValue()
```
    Description copied from interface: Accumulator
    
    Called when all tuples from current key have been passed to accumulate.
    
    Specified by:
    
    getValue in interface Accumulator<Tuple>
    
    Returns:
    
    the value for the UDF for this key.
  - extreme
```
protected static final Tuple extreme(int pind,
                                     int psign,
                                     Tuple input,
                                     PigProgressable reporter)
                              throws ExecException
```
    Throws:
    
    ExecException
  - parseFieldIndex
```
protected static int parseFieldIndex(String inputFieldIndex)
                              throws ExecException
```
    Throws:
    
    ExecException
  - parseOrdering
```
protected static int parseOrdering(String order)
```

Class ExtremalTupleByNthField

Example1: Invoking the UDF

Example 2: Default behavior

Example 3: Choosing a Large Bag or Tuple

Example 4: Secondary Sort Key

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.pig.EvalFunc

Field Summary

Fields inherited from class org.apache.pig.EvalFunc

Constructor Summary

Method Summary

Methods inherited from class org.apache.pig.EvalFunc

Methods inherited from class java.lang.Object

Constructor Detail

ExtremalTupleByNthField

ExtremalTupleByNthField

ExtremalTupleByNthField

Method Detail

exec

getReturnType

outputSchema

getInitial

getIntermed

getFinal

accumulate

cleanup

getValue

extreme

parseFieldIndex

parseOrdering