org.apache.pig.piggybank.evaluation.util.apachelogparser
Class DateExtractor

java.lang.Object
  extended by org.apache.pig.EvalFunc<String>
      extended by org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor

public class DateExtractor
extends EvalFunc<String>

DateExtractor has four constructors, each supporting a different combination of settings. The incomingDateFormat ("dd/MMM/yyyy:HH:mm:ss Z" by default) is used to parse the date string passed in from the log. The outgoingDateFormat ("yyyy-MM-dd" by default) is used to format the returned string. Choose the constructor that matches the formats you need to override. Any pattern supported by java.text.SimpleDateFormat can be used. For example, to keep the default incoming format and extract just the year, use the single-string constructor DateExtractor("yyyy").

From Pig Latin, a non-default format requires defining an alias and then invoking the UDF through that alias, for example:

  define MyDateExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor("yyyy-MM");
  A = FOREACH row GENERATE MyDateExtractor(dayTime);

If a string cannot be parsed, null is returned and an error message is printed to stderr. By default, DateExtractor uses the GMT time zone; use the three-parameter constructor to override it.
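The parse-then-format behavior described above can be sketched with java.text.SimpleDateFormat alone. This is a minimal illustration under stated assumptions, not the piggybank source: the class DateReformatSketch and its method reformat are hypothetical names, and using Locale.US for month-name parsing is an assumption.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch of parsing with an incoming pattern and re-formatting
// with an outgoing pattern; not the actual DateExtractor implementation.
class DateReformatSketch {
    // Defaults as documented for DateExtractor.
    static final String DEFAULT_INCOMING = "dd/MMM/yyyy:HH:mm:ss Z";
    static final String DEFAULT_OUTGOING = "yyyy-MM-dd";

    // Parses logDate with incomingPattern, formats it with outgoingPattern in
    // the given time zone, and returns null if parsing fails.
    static String reformat(String logDate, String incomingPattern,
                           String outgoingPattern, String timeZoneId) {
        // Locale.US is an assumption so that English month names like "Oct" parse.
        SimpleDateFormat in = new SimpleDateFormat(incomingPattern, Locale.US);
        SimpleDateFormat out = new SimpleDateFormat(outgoingPattern, Locale.US);
        out.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        try {
            Date parsed = in.parse(logDate);
            return out.format(parsed);
        } catch (ParseException e) {
            // Mirrors the documented behavior: report to stderr, return null.
            System.err.println("Could not parse date: " + logDate);
            return null;
        }
    }
}
```

Calling reformat(line, DEFAULT_INCOMING, "yyyy", "GMT") corresponds to the DateExtractor("yyyy") example: only the year survives, because the outgoing pattern contains nothing else.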


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.pig.EvalFunc
EvalFunc.SchemaType
 
Field Summary
 
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
 
Constructor Summary
DateExtractor()
          Creates the formats from the default incomingDateFormat and the default outgoingDateFormat.
DateExtractor(String outgoingDateString)
          Creates the formats from the passed outgoingDateString and the default incomingDateFormat.
DateExtractor(String incomingDateString, String outgoingDateString)
          Creates the formats from the passed incomingDateString and outgoingDateString.
DateExtractor(String incomingDateString, String outgoingDateString, String timeZoneID)
          Creates the formats from the passed incomingDateString and outgoingDateString, expressing dates in the time zone given by timeZoneID.
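One common way to offer these four overloads is to have each shorter constructor delegate to the fully specified one, filling in the documented defaults. The class below, ExtractorConfigSketch, is hypothetical and only illustrates that pattern; applying the time zone to the outgoing formatter is an assumption based on the timeZoneID description ("time zone in which dates should be expressed").

```java
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration of overloads delegating to one full constructor;
// not the piggybank source.
class ExtractorConfigSketch {
    static final String DEFAULT_INCOMING = "dd/MMM/yyyy:HH:mm:ss Z";
    static final String DEFAULT_OUTGOING = "yyyy-MM-dd";
    static final String DEFAULT_TIME_ZONE = "GMT";

    final SimpleDateFormat incoming;
    final SimpleDateFormat outgoing;

    // Both formats default.
    ExtractorConfigSketch() {
        this(DEFAULT_INCOMING, DEFAULT_OUTGOING);
    }

    // Only the outgoing format is overridden.
    ExtractorConfigSketch(String outgoingDateString) {
        this(DEFAULT_INCOMING, outgoingDateString);
    }

    // Both formats overridden; time zone stays at the GMT default.
    ExtractorConfigSketch(String incomingDateString, String outgoingDateString) {
        this(incomingDateString, outgoingDateString, DEFAULT_TIME_ZONE);
    }

    // Everything specified explicitly.
    ExtractorConfigSketch(String incomingDateString, String outgoingDateString,
                          String timeZoneID) {
        incoming = new SimpleDateFormat(incomingDateString, Locale.US);
        outgoing = new SimpleDateFormat(outgoingDateString, Locale.US);
        // Assumption: the zone controls how the returned string is expressed.
        outgoing.setTimeZone(TimeZone.getTimeZone(timeZoneID));
    }
}
```

Keeping all construction logic in the last constructor means the defaults live in one place, so the overloads cannot drift apart.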
 
Method Summary
 String exec(Tuple input)
          This callback method must be implemented by all subclasses.
 List<FuncSpec> getArgToFuncMapping()
          Allow a UDF to specify type specific implementations of itself.
 
Methods inherited from class org.apache.pig.EvalFunc
finish, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, outputSchema, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DateExtractor

public DateExtractor()
Creates the formats from the default incomingDateFormat and the default outgoingDateFormat.

DateExtractor

public DateExtractor(String outgoingDateString)
Creates the formats from the passed outgoingDateString and the default incomingDateFormat.

Parameters:
outgoingDateString - outgoingDateFormat is based on outgoingDateString

DateExtractor

public DateExtractor(String incomingDateString,
                     String outgoingDateString)
Creates the formats from the passed incomingDateString and outgoingDateString.

Parameters:
incomingDateString - incomingDateFormat is based on incomingDateString
outgoingDateString - outgoingDateFormat is based on outgoingDateString

DateExtractor

public DateExtractor(String incomingDateString,
                     String outgoingDateString,
                     String timeZoneID)
Creates the formats from the passed incomingDateString and outgoingDateString, expressing dates in the time zone given by timeZoneID.

Parameters:
incomingDateString - incomingDateFormat is based on incomingDateString
outgoingDateString - outgoingDateFormat is based on outgoingDateString
timeZoneID - ID of the time zone in which dates should be expressed.
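
The time zone matters because the same instant can fall on different calendar days. The sketch below, using a hypothetical TimeZoneSketch class built only on SimpleDateFormat and TimeZone, formats one log timestamp as a day in two zones; it assumes the zone is applied to the output formatter, which may differ from how DateExtractor wires it internally.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo: one parsed instant, formatted as a day in a chosen zone.
class TimeZoneSketch {
    // Returns the yyyy-MM-dd day of logDate in timeZoneId, or null if unparseable.
    static String dayIn(String logDate, String timeZoneId) {
        // The incoming pattern carries its own offset (Z), so the parsed
        // instant does not depend on the JVM's default zone.
        SimpleDateFormat in = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss Z", Locale.US);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        try {
            Date instant = in.parse(logDate);
            return out.format(instant);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

For "10/Oct/2000:22:00:00 -0700", the GMT day is 2000-10-11 (05:00 UTC), while in America/Los_Angeles it is still 2000-10-10.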

Method Detail

exec

public String exec(Tuple input)
            throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses. It is invoked on every Tuple of a given dataset. Because the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state that is maintained between invocations of this method.

Specified by:
exec in class EvalFunc<String>
Parameters:
input - the Tuple to be processed.
Returns:
the formatted date string, or null if the input cannot be parsed.
Throws:
IOException
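An exec-style method that honors the "no state between invocations" rule can build its formatters per call. The sketch below is hypothetical: ExecSketch is an illustrative name, a java.util.List stands in for Pig's Tuple so it runs without Pig on the classpath, and returning null for a missing or empty input is an assumed policy, not documented DateExtractor behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for exec(Tuple); List<?> plays the role of the Tuple.
class ExecSketch {
    private final String incomingPattern;
    private final String outgoingPattern;
    private final String timeZoneId;

    ExecSketch(String incomingPattern, String outgoingPattern, String timeZoneId) {
        this.incomingPattern = incomingPattern;
        this.outgoingPattern = outgoingPattern;
        this.timeZoneId = timeZoneId;
    }

    String exec(List<?> input) {
        // Assumed policy: no input or a null first field yields null.
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        // Fresh formatters per call: nothing carries over between invocations,
        // and non-thread-safe SimpleDateFormat instances are never shared.
        SimpleDateFormat in = new SimpleDateFormat(incomingPattern, Locale.US);
        SimpleDateFormat out = new SimpleDateFormat(outgoingPattern, Locale.US);
        out.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        String raw = input.get(0).toString();
        try {
            return out.format(in.parse(raw));
        } catch (ParseException e) {
            System.err.println("Could not parse date: " + raw);
            return null;
        }
    }
}
```

Creating formatters per call costs a little allocation but keeps the method free of shared mutable state.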

getArgToFuncMapping

public List<FuncSpec> getArgToFuncMapping()
                                   throws FrontendException
Description copied from class: EvalFunc
Allow a UDF to specify type-specific implementations of itself. For example, an implementation of arithmetic sum might have int and float implementations, since integer arithmetic performs much better than floating-point arithmetic. Pig's typechecker calls this method and, using the returned list plus the schema of the function's input data, decides which implementation of the UDF to use.

Overrides:
getArgToFuncMapping in class EvalFunc<String>
Returns:
A List of FuncSpec objects, each naming an EvalFunc class that can handle inputs matching the schema in that FuncSpec. Each FuncSpec should be constructed with a schema that describes the input for that implementation. For example, the sum function above would return two elements in its list:
  1. FuncSpec(this.getClass().getName(), new Schema(new Schema.FieldSchema(null, DataType.DOUBLE)))
  2. FuncSpec(IntSum.class.getName(), new Schema(new Schema.FieldSchema(null, DataType.INTEGER)))
This would indicate that the main implementation is used for doubles, and the special implementation IntSum is used for ints.
Throws:
FrontendException
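The selection step described above (a list of implementations keyed by input type, consulted against the actual input schema) can be modeled without Pig. This toy DispatchSketch is hypothetical and does not use Pig's FuncSpec, Schema, or DataType classes; the names "Sum" and "IntSum" echo the sum example, and the sketch performs only an exact-match lookup, so it does not model how Pig resolves non-exact matches.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of type-based dispatch; not Pig's typechecker or FuncSpec API.
class DispatchSketch {
    enum ArgType { DOUBLE, INTEGER }

    // Analogue of getArgToFuncMapping(): input type -> implementation name.
    static Map<ArgType, String> argToFuncMapping() {
        Map<ArgType, String> mapping = new LinkedHashMap<>();
        mapping.put(ArgType.DOUBLE, "Sum");     // main implementation for doubles
        mapping.put(ArgType.INTEGER, "IntSum"); // specialized implementation for ints
        return mapping;
    }

    // Analogue of the typechecker's choice: exact match on the input type.
    static String choose(ArgType inputType) {
        String impl = argToFuncMapping().get(inputType);
        if (impl == null) {
            throw new IllegalArgumentException("No implementation for " + inputType);
        }
        return impl;
    }
}
```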


Copyright © 2007-2012 The Apache Software Foundation