org.apache.pig.builtin
Class Distinct

java.lang.Object
  extended by org.apache.pig.EvalFunc<DataBag>
      extended by org.apache.pig.builtin.Distinct
All Implemented Interfaces:
Algebraic

public class Distinct
extends EvalFunc<DataBag>
implements Algebraic

Finds the distinct set of tuples in a bag. This is a blocking operator: all of the input is loaded into the hash set implemented by DistinctDataBag, which also provides the other DataBag interfaces, before any result is returned.
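The blocking, hash-set-based behaviour described above can be illustrated with a minimal plain-Java sketch. It does not use Pig's classes: the class name DistinctSketch, the method distinct, and the use of List<Object> as a stand-in for a tuple are all assumptions made for illustration, and this is not how DistinctDataBag is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Pig.
class DistinctSketch {

    // A "tuple" is modelled as a List<Object>, a "bag" as a list of tuples.
    // The whole bag is consumed before anything is returned, which is
    // what makes the operation blocking.
    static List<List<Object>> distinct(List<List<Object>> bag) {
        // Lists compare element by element, so equal tuples collapse to one entry.
        Set<List<Object>> seen = new LinkedHashSet<>(bag);
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("a", 1),
                Arrays.<Object>asList("b", 2),
                Arrays.<Object>asList("a", 1));
        System.out.println(distinct(bag)); // prints [[a, 1], [b, 2]]
    }
}
```

The sketch uses a LinkedHashSet only so that its printed output is deterministic; nothing in the description above promises any particular ordering of the distinct tuples.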


Nested Class Summary
static class Distinct.Final
           
static class Distinct.Initial
           
static class Distinct.Intermediate
           
 
Nested classes/interfaces inherited from class org.apache.pig.EvalFunc
EvalFunc.SchemaType
 
Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
Distinct()
           
 
Method Summary
 DataBag exec(Tuple input)
          This callback method must be implemented by all subclasses.
protected  DataBag getDistinct(Tuple input)
           
 String getFinal()
          Get the final function.
 String getInitial()
          Get the initial function.
 String getIntermed()
          Get the intermediate function.
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, outputSchema, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Distinct

public Distinct()
Method Detail

exec

public DataBag exec(Tuple input)
             throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses. This is the method that will be invoked on every Tuple of a given dataset. Because the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state that is maintained between invocations of this method.

Specified by:
exec in class EvalFunc<DataBag>
Parameters:
input - the Tuple to be processed.
Returns:
result, of type T (DataBag for this class).
Throws:
IOException

getFinal

public String getFinal()
Description copied from interface: Algebraic
Get the final function.

Specified by:
getFinal in interface Algebraic
Returns:
A function name of f_final. f_final should be an eval func parametrized by the same datum as the eval func implementing this interface.

getInitial

public String getInitial()
Description copied from interface: Algebraic
Get the initial function.

Specified by:
getInitial in interface Algebraic
Returns:
A function name of f_init. f_init should be an eval func. The return type of f_init.exec() must be Tuple.

getIntermed

public String getIntermed()
Description copied from interface: Algebraic
Get the intermediate function.

Specified by:
getIntermed in interface Algebraic
Returns:
A function name of f_intermed. f_intermed should be an eval func. The return type of f_intermed.exec() must be Tuple.
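The way initial, intermediate and final functions can cooperate to compute a distinct set piecewise may be sketched in plain Java. This is an assumption-laden illustration, not Pig's code: the class AlgebraicDistinctSketch and its methods are hypothetical, strings stand in for tuples, and a Set<String> stands in for the Tuple that f_init.exec() and f_intermed.exec() are required to return.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Pig.
class AlgebraicDistinctSketch {

    // Initial stage: turn one chunk of raw input into a partial result.
    static Set<String> initial(List<String> chunk) {
        return new HashSet<>(chunk);
    }

    // Intermediate stage: merge partial results into another partial result.
    // The output has the same shape as the input, so it can be applied repeatedly.
    static Set<String> intermediate(List<Set<String>> partials) {
        Set<String> merged = new HashSet<>();
        for (Set<String> partial : partials) {
            merged.addAll(partial);
        }
        return merged;
    }

    // Final stage: merge the remaining partial results into the answer.
    static Set<String> finalStage(List<Set<String>> partials) {
        return intermediate(partials);
    }

    public static void main(String[] args) {
        Set<String> a = initial(Arrays.asList("x", "y", "x"));
        Set<String> b = initial(Arrays.asList("y", "z"));
        Set<String> merged = intermediate(Arrays.asList(a, b));
        Set<String> result = finalStage(Arrays.asList(merged));
        System.out.println(result.size()); // prints 3
    }
}
```

Because set union gives the same result whether it is applied in one pass or in several, the distinct set can be built from partial results, which is what makes an operation of this kind a candidate for the Algebraic interface.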

getDistinct

protected DataBag getDistinct(Tuple input)
                       throws IOException
Throws:
IOException


Copyright © 2007-2012 The Apache Software Foundation