org.apache.pig.builtin
Class COV

java.lang.Object
  extended by org.apache.pig.EvalFunc<DataBag>
      extended by org.apache.pig.builtin.COV
All Implemented Interfaces:
Algebraic

public class COV
extends EvalFunc<DataBag>
implements Algebraic

Computes the covariance between sets of data. The returned value is a bag containing one tuple for each pair of input data sets; each tuple holds the two schema names and the covariance between those two data sets.

A = load 'input.xml' using PigStorage(':');
B = group A all;
D = foreach B generate group,COV(A.$0,A.$1,A.$2);
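
To make the shape of the result concrete, the following plain-Java sketch produces one (name, name, covariance) entry per pair of columns. It does not use the Pig API: `CovPairsSketch`, `PairCov`, and `allPairs` are hypothetical names, and both the population form of covariance (dividing by n) and the i < j pairing order are assumptions that this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

class CovPairsSketch {
    // Hypothetical holder for one output entry: two column names and their covariance.
    static final class PairCov {
        final String first;
        final String second;
        final double cov;

        PairCov(String first, String second, double cov) {
            this.first = first;
            this.second = second;
            this.cov = cov;
        }
    }

    // Population covariance (divides by n); the divisor is an assumption.
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("columns must be non-empty and of equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc += (x[i] - meanX) * (y[i] - meanY);
        }
        return acc / n;
    }

    // One entry per unordered pair of columns (i < j); the ordering is an assumption.
    static List<PairCov> allPairs(String[] names, double[][] columns) {
        List<PairCov> out = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            for (int j = i + 1; j < columns.length; j++) {
                out.add(new PairCov(names[i], names[j], covariance(columns[i], columns[j])));
            }
        }
        return out;
    }
}
```

With three input columns, as in the script above, this sketch yields three entries.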


Nested Class Summary
static class COV.Final
           
static class COV.Initial
           
static class COV.Intermed
           
 
Nested classes/interfaces inherited from class org.apache.pig.EvalFunc
EvalFunc.SchemaType
 
Field Summary
protected  Vector<String> schemaName
           
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
COV()
           
COV(String... schemaName)
           
 
Method Summary
protected static Tuple combine(DataBag values)
          Combines the results of different data chunks.
protected static Tuple computeAll(DataBag first, DataBag second)
          Computes sum(XY), sum(X), sum(Y) from the given data sets.
 DataBag exec(Tuple input)
          Function to compute covariance between data sets.
 String getFinal()
          Get the final function.
 String getInitial()
          Get the initial function.
 String getIntermed()
          Get the intermediate function.
 Schema outputSchema(Schema input)
          Report the schema of the output of this UDF.
 String toString()
          Returns the constructor arguments as a string.
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

schemaName

protected Vector<String> schemaName
Constructor Detail

COV

public COV()

COV

public COV(String... schemaName)
Method Detail

exec

public DataBag exec(Tuple input)
             throws IOException
Function to compute covariance between data sets.

Specified by:
exec in class EvalFunc<DataBag>
Parameters:
input - input tuple which contains data sets.
Returns:
output DataBag containing the covariance between each pair of data sets.
Throws:
IOException

toString

public String toString()
Returns the constructor arguments as a string. It prepends ( and appends ) to the arguments. If the default constructor was called, it returns an empty string.

Overrides:
toString in class Object
Returns:
argument of constructor
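
The behaviour described for toString can be sketched in plain Java as follows. `ToStringSketch` is a hypothetical stand-in rather than the COV class itself, and the comma separator between names is an assumption, since this page only specifies the surrounding parentheses and the empty-string case.

```java
import java.util.Arrays;
import java.util.List;

class ToStringSketch {
    private final List<String> schemaName;

    // Varargs constructor; calling it with no arguments models the default constructor.
    ToStringSketch(String... schemaName) {
        this.schemaName = Arrays.asList(schemaName);
    }

    @Override
    public String toString() {
        if (schemaName.isEmpty()) {
            return ""; // no constructor arguments: empty string
        }
        // Comma separator is an assumption; only "(" and ")" come from the description.
        return "(" + String.join(",", schemaName) + ")";
    }
}
```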

getInitial

public String getInitial()
Description copied from interface: Algebraic
Get the initial function.

Specified by:
getInitial in interface Algebraic
Returns:
A function name of f_init. f_init should be an eval func. The return type of f_init.exec() has to be Tuple

getIntermed

public String getIntermed()
Description copied from interface: Algebraic
Get the intermediate function.

Specified by:
getIntermed in interface Algebraic
Returns:
A function name of f_intermed. f_intermed should be an eval func. The return type of f_intermed.exec() has to be Tuple

getFinal

public String getFinal()
Description copied from interface: Algebraic
Get the final function.

Specified by:
getFinal in interface Algebraic
Returns:
A function name of f_final. f_final should be an eval func parametrized by the same datum as the eval func implementing this interface.
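
Since each of the three getters returns a function name, one plausible way to implement them is to return the class name of the corresponding nested class. The sketch below illustrates that pattern in plain Java without the Pig API; `AlgebraicNamesSketch` is a hypothetical class, and whether COV implements its getters exactly this way is an assumption.

```java
class AlgebraicNamesSketch {
    // Empty stand-ins for the nested Initial, Intermed, and Final functions.
    static class Initial {}
    static class Intermed {}
    static class Final {}

    // Each getter returns the binary class name of the matching nested class.
    String getInitial() {
        return Initial.class.getName();
    }

    String getIntermed() {
        return Intermed.class.getName();
    }

    String getFinal() {
        return Final.class.getName();
    }
}
```

Class.getName() reports a nested class as Outer$Inner, so the returned strings can later be resolved back to classes by name.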

combine

protected static Tuple combine(DataBag values)
                        throws IOException
Combines the results of different data chunks.

Parameters:
values - DataBag containing partial results computed on different data chunks
Returns:
output Tuple containing combined data
Throws:
IOException
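
Partial results of this kind can be merged by element-wise addition, because sums and counts over disjoint chunks add up to the sums and counts over the whole data set. The sketch below shows the idea in plain Java; `CombineSketch` is a hypothetical name, and the partial layout [sum(XY), sum(X), sum(Y), count] is an assumption rather than the documented tuple layout.

```java
import java.util.List;

class CombineSketch {
    // Assumed layout of each partial: [sum(XY), sum(X), sum(Y), count].
    static final int FIELDS = 4;

    // Adds the partial results field by field and returns the combined totals.
    static double[] combine(List<double[]> partials) {
        double[] total = new double[FIELDS];
        for (double[] p : partials) {
            if (p.length != FIELDS) {
                throw new IllegalArgumentException("each partial must have " + FIELDS + " fields");
            }
            for (int k = 0; k < FIELDS; k++) {
                total[k] += p[k];
            }
        }
        return total;
    }
}
```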

computeAll

protected static Tuple computeAll(DataBag first,
                                  DataBag second)
                           throws IOException
Computes sum(XY), sum(X), sum(Y) from the given data sets.

Parameters:
first - DataBag containing first data set
second - DataBag containing second data set
Returns:
tuple containing sum(XY), sum(X), sum(Y)
Throws:
IOException
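
The three sums are enough to recover a covariance once the number of observations n is known, using cov(X, Y) = sum(XY)/n - (sum(X)/n)(sum(Y)/n). A minimal plain-Java sketch follows; `ComputeAllSketch` and `covarianceFromSums` are hypothetical names, arrays stand in for DataBags, and the population form (dividing by n) is an assumption.

```java
class ComputeAllSketch {
    // Returns [sum(XY), sum(X), sum(Y)] for two data sets of equal length.
    static double[] computeAll(double[] first, double[] second) {
        if (first.length != second.length) {
            throw new IllegalArgumentException("data sets must have the same length");
        }
        double sumXY = 0.0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < first.length; i++) {
            sumXY += first[i] * second[i];
            sumX += first[i];
            sumY += second[i];
        }
        return new double[] {sumXY, sumX, sumY};
    }

    // Population covariance from the sums: E[XY] - E[X] * E[Y] (assumed form).
    static double covarianceFromSums(double[] sums, long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return sums[0] / n - (sums[1] / n) * (sums[2] / n);
    }
}
```

Keeping only running sums is what allows the computation to be split across chunks and merged afterwards.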

outputSchema

public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. Pig will make use of this in error checking, optimization, and planning. The schema of input data to this UDF is provided.

The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).

Overrides:
outputSchema in class EvalFunc<DataBag>
Parameters:
input - Schema of the input
Returns:
Schema of the output


Copyright © 2007-2012 The Apache Software Foundation