public class CubeDimensions extends EvalFunc<DataBag>
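Given the input tuple (a, b, c), exec produces a bag holding every combination of the tuple members, as in a data cube:
{ (a, b, c), (null, null, null), (a, b, null), (a, null, c), (a, null, null), (null, b, c), (null, null, c), (null, b, null) }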
The "all" marker is null by default, but can be set to an arbitrary string by invoking a constructor (via a DEFINE). The constructor takes a single argument, the string you want to represent "all".
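The combinations above can be enumerated with a bit mask over the tuple positions. The following is a minimal sketch in plain Java, not the UDF's actual implementation: the class `CubeSketch` and the method `cube` are hypothetical names, lists of strings stand in for Pig tuples, and the order of the results is not meant to match what `exec` returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: enumerates the cube combinations of one input row. */
class CubeSketch {

    /**
     * Returns all 2^n variants of row in which any subset of positions
     * is replaced by allMarker (pass null to mimic the default marker).
     */
    static List<List<String>> cube(List<String> row, String allMarker) {
        int n = row.size();
        List<List<String>> out = new ArrayList<>(1 << n);
        // Bit i of mask decides whether position i is replaced by the marker.
        for (int mask = 0; mask < (1 << n); mask++) {
            List<String> combo = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                combo.add((mask & (1 << i)) != 0 ? allMarker : row.get(i));
            }
            out.add(combo);
        }
        return out;
    }

    public static void main(String[] args) {
        // Default marker (null), as with the no-argument constructor.
        for (List<String> combo : cube(Arrays.asList("a", "b", "c"), null)) {
            System.out.println(combo);
        }
        // Custom marker, as with the single-argument constructor.
        System.out.println(cube(Arrays.asList("a", "b", "c"), "ALL").size());
    }
}
```

A tuple with n members yields 2^n combinations, so three dimensions give eight output tuples per input record.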
Usage goes something like this:
events = load '/logs/events' using EventLoader() as (lang, event, app_id, measure);
cubed = foreach events generate
    FLATTEN(piggybank.CubeDimensions(lang, event, app_id))
        as (lang, event, app_id),
    measure;
cube = foreach (group cubed
        by (lang, event, app_id) parallel $P)
    generate
        flatten(group) as (lang, event, app_id),
        COUNT_STAR(cubed),
        SUM(cubed.measure);
store cube into 'event_cube';
Note: doing this with non-algebraic aggregations on large data can result in very slow reducers, since one of the groups (the one in which every dimension is the "all" marker) receives every record in your relation.
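The skew described in the note can be seen by counting how many cubed records fall into each group. The sketch below is plain Java under the same assumptions as any stand-alone illustration: `CubeSkewSketch` and `groupSizes` are hypothetical names, lists of strings stand in for Pig tuples, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: shows how cubed records distribute over group keys. */
class CubeSkewSketch {

    /** Cubes every row (null as the "all" marker) and counts records per key. */
    static Map<List<String>, Integer> groupSizes(List<List<String>> rows) {
        Map<List<String>, Integer> sizes = new HashMap<>();
        for (List<String> row : rows) {
            int n = row.size();
            for (int mask = 0; mask < (1 << n); mask++) {
                List<String> key = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    key.add((mask & (1 << i)) != 0 ? null : row.get(i));
                }
                sizes.merge(key, 1, Integer::sum);
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: (lang, event, app_id).
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("en", "click", "1"),
            Arrays.asList("en", "view", "2"),
            Arrays.asList("fr", "click", "1"));
        Map<List<String>, Integer> sizes = groupSizes(rows);
        // The all-null key receives one record per input row.
        System.out.println(sizes.get(Arrays.asList((String) null, null, null))); // prints 3
    }
}
```

Every input row contributes exactly one record to the all-marker key, so that group always grows with the full size of the relation, while more specific keys receive only the rows that match them.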
Nested classes/interfaces inherited from class EvalFunc: EvalFunc.SchemaType
Fields inherited from class EvalFunc: log, pigLogger, reporter, returnType
Constructor and Description |
---|
CubeDimensions() |
CubeDimensions(String allMarker) |
Modifier and Type | Method and Description |
---|---|
boolean | allowCompileTimeCalculation() Whether the UDF should be evaluated at compile time if all inputs are constant. |
static void | convertNullToUnknown(Tuple tuple) |
DataBag | exec(Tuple tuple) This callback method must be implemented by all subclasses. |
Schema | outputSchema(Schema input) Report the schema of the output of this UDF. |
Methods inherited from class EvalFunc: finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
public CubeDimensions()
public CubeDimensions(String allMarker)
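public DataBag exec(Tuple tuple) throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses.
Specified by: exec in class EvalFunc<DataBag>
Parameters: tuple - the Tuple to be processed.
Throws: IOException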
public static void convertNullToUnknown(Tuple tuple) throws ExecException
Throws: ExecException
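public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).
Overrides: outputSchema in class EvalFunc<DataBag>
Parameters: input - Schema of the input
public boolean allowCompileTimeCalculation()
Description copied from class: EvalFunc
Whether the UDF should be evaluated at compile time if all inputs are constant.
Overrides: allowCompileTimeCalculation in class EvalFunc<DataBag>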
Copyright © 2007-2012 The Apache Software Foundation