org.apache.pig.builtin
Class TOBAG

java.lang.Object
  extended by org.apache.pig.EvalFunc<DataBag>
      extended by org.apache.pig.builtin.TOBAG

public class TOBAG
extends EvalFunc<DataBag>

This class takes a list of items and puts them into a bag.

    T = foreach U generate TOBAG($0, $1, $2);

It's like saying this:

    T = foreach U generate {($0), ($1), ($2)}

All arguments that are not of tuple type are inserted into a tuple before being added to the bag. This is because a bag is always a bag of tuples.

Output schema:

The output schema for this UDF depends on the schema of its arguments. If all the arguments have the same type and the same inner schema (for bag/tuple columns), then the UDF output schema is a bag of tuples having a column of the type and inner schema (if any) of the arguments. If the arguments are of type tuple/bag, then their inner schemas must match, though schema field aliases may differ. If these conditions are not met, the output schema is a bag with a null inner schema.

example 1

    grunt> describe a;
    a: {a0: int,a1: int}
    grunt> b = foreach a generate TOBAG(a0,a1);
    grunt> describe b;
    b: {{int}}

example 2

    grunt> describe a;
    a: {a0: (x: int),a1: (x: int)}
    grunt> b = foreach a generate TOBAG(a0,a1);
    grunt> describe b;
    b: {{(x: int)}}

example 3

    grunt> describe a;
    a: {a0: (x: int),a1: (y: int)}
    -- note that the inner schemas have matching types but different field aliases.
    -- the aliases of the first argument (a0) will be used in output schema:
    grunt> b = foreach a generate TOBAG(a0,a1);
    grunt> describe b;
    b: {{(x: int)}}

example 4

    grunt> describe a;
    a: {a0: (x: int),a1: (x: chararray)}
    -- here the inner schemas do not match, so output schema is not well defined:
    grunt> b = foreach a generate TOBAG(a0,a1);
    grunt> describe b;
    b: {{NULL}}
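The wrapping rule described above (tuples are added as they are, anything else is first placed in a tuple) can be illustrated with a minimal sketch. It is not the TOBAG implementation and does not use Pig's Tuple or DataBag classes: as an assumption made only for this illustration, a tuple is modeled as a java.util.List and a bag as a list of such tuples. The class name ToBagSketch and the method toBag are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the TOBAG wrapping rule; it does not use Pig's
// Tuple or DataBag classes. A "tuple" is modeled as a List<Object> and a
// "bag" as a List of such tuples.
class ToBagSketch {

    // Builds a bag from the arguments: tuples are added unchanged,
    // anything else is first wrapped in a single-field tuple.
    static List<List<Object>> toBag(Object... args) {
        List<List<Object>> bag = new ArrayList<>();
        for (Object arg : args) {
            if (arg instanceof List) {
                // Already a tuple in this model: add it as it is.
                @SuppressWarnings("unchecked")
                List<Object> tuple = (List<Object>) arg;
                bag.add(tuple);
            } else {
                // Not a tuple: wrap the value in a one-field tuple.
                bag.add(Collections.singletonList(arg));
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        // Three plain values become three one-field tuples.
        System.out.println(toBag(1, 2, 3));                    // [[1], [2], [3]]
        // A tuple argument is kept; the plain value is wrapped.
        System.out.println(toBag(Arrays.asList(1, "a"), 7));   // [[1, a], [7]]
    }
}
```

In this model the result always contains only tuples, which mirrors the statement that a bag is always a bag of tuples.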


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.pig.EvalFunc
EvalFunc.SchemaType
 
Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
TOBAG()
           
 
Method Summary
 DataBag exec(Tuple input)
          This callback method must be implemented by all subclasses.
 Schema outputSchema(Schema inputSch)
          Report the schema of the output of this UDF.
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TOBAG

public TOBAG()
Method Detail

exec

public DataBag exec(Tuple input)
             throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses. This is the method that will be invoked on every Tuple of a given dataset. Since the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state that is maintained between invocations of this method.

Specified by:
exec in class EvalFunc<DataBag>
Parameters:
input - the Tuple to be processed.
Returns:
result, of type T (DataBag for this class).
Throws:
IOException

outputSchema

public Schema outputSchema(Schema inputSch)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. Pig will make use of this in error checking, optimization, and planning. The schema of input data to this UDF is provided.

The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).

Overrides:
outputSchema in class EvalFunc<DataBag>
Parameters:
inputSch - Schema of the input
Returns:
Schema of the output
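
The output schema rule given in the class description (matching types and inner schemas, aliases ignored, aliases of the first argument kept) can be sketched as follows. This is an illustration under stated assumptions, not the code of outputSchema: the Field class, the sameShape method, and the innerSchema method are hypothetical and are not Pig's Schema or FieldSchema API. Returning null for an empty argument list is also an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the schema rule described for TOBAG; these are not
// Pig's Schema or FieldSchema classes.
class ToBagSchemaSketch {

    // A field has an alias, a type name, and (for tuples/bags) inner fields.
    static final class Field {
        final String alias;
        final String type;
        final List<Field> inner;

        Field(String alias, String type, Field... inner) {
            this.alias = alias;
            this.type = type;
            this.inner = Arrays.asList(inner);
        }

        // Compares types and inner schemas recursively, ignoring aliases.
        boolean sameShape(Field other) {
            if (!type.equals(other.type) || inner.size() != other.inner.size()) {
                return false;
            }
            for (int i = 0; i < inner.size(); i++) {
                if (!inner.get(i).sameShape(other.inner.get(i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Returns the field describing the bag's column, or null when the
    // arguments disagree (modeling the "null inner schema" case).
    // Assumption of this sketch: no arguments also yields null.
    static Field innerSchema(Field... args) {
        if (args.length == 0) {
            return null;
        }
        for (Field arg : args) {
            if (!args[0].sameShape(arg)) {
                return null;
            }
        }
        // The first argument supplies the aliases, as in example 3.
        return args[0];
    }

    public static void main(String[] args) {
        Field a0 = new Field("a0", "tuple", new Field("x", "int"));
        Field a1 = new Field("a1", "tuple", new Field("y", "int"));
        Field a2 = new Field("a2", "tuple", new Field("x", "chararray"));

        // Same types, different aliases: the first argument's alias is kept.
        System.out.println(innerSchema(a0, a1).inner.get(0).alias); // x
        // Inner types differ: no inner schema.
        System.out.println(innerSchema(a0, a2));                    // null
    }
}
```

The two calls in main correspond to example 3 and example 4 of the class description.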


Copyright © 2007-2012 The Apache Software Foundation