org.apache.pig.builtin
Class TextLoader

java.lang.Object
  extended by org.apache.pig.LoadFunc
      extended by org.apache.pig.builtin.TextLoader
All Implemented Interfaces:
LoadCaster

public class TextLoader
extends LoadFunc
implements LoadCaster

This load function creates one tuple for each line of text. Each tuple has a single chararray field that contains the line.
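
As a rough illustration of the behavior described above, the following sketch uses only the Java standard library, not Pig itself. A `List<Object>` stands in for a Pig `Tuple`, and the names `LineTupleSketch` and `toTuples` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: a List<Object> plays the role of a Pig Tuple.
final class LineTupleSketch {

    // Builds one single-field "tuple" per line of the input text.
    static List<List<Object>> toTuples(String text) {
        List<List<Object>> tuples = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                List<Object> tuple = new ArrayList<>(1);
                tuple.add(line); // the only field: the whole line, unsplit
                tuples.add(tuple);
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
        return tuples;
    }
}
```

Note that the line is not split on any delimiter: a line such as `a,b` stays a single field.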


Field Summary
protected  org.apache.hadoop.mapreduce.RecordReader in
           
 
Constructor Summary
TextLoader()
           
 
Method Summary
 DataBag bytesToBag(byte[] b, ResourceSchema.ResourceFieldSchema schema)
          TextLoader does not support conversion to Bag
 BigDecimal bytesToBigDecimal(byte[] b)
          Cast data from bytearray to BigDecimal value.
 BigInteger bytesToBigInteger(byte[] b)
          Cast data from bytearray to BigInteger value.
 Boolean bytesToBoolean(byte[] b)
          TextLoader does not support conversion to Boolean
 String bytesToCharArray(byte[] b)
          Cast data from bytes to chararray value.
 org.joda.time.DateTime bytesToDateTime(byte[] b)
          TextLoader does not support conversion to DateTime
 Double bytesToDouble(byte[] b)
          TextLoader does not support conversion to Double
 Float bytesToFloat(byte[] b)
          TextLoader does not support conversion to Float
 Integer bytesToInteger(byte[] b)
          TextLoader does not support conversion to Integer
 Long bytesToLong(byte[] b)
          TextLoader does not support conversion to Long
 Map<String,Object> bytesToMap(byte[] b, ResourceSchema.ResourceFieldSchema schema)
          Cast data from bytearray to map value.
 Tuple bytesToTuple(byte[] b, ResourceSchema.ResourceFieldSchema schema)
          TextLoader does not support conversion to Tuple
 org.apache.hadoop.mapreduce.InputFormat getInputFormat()
          This will be called during planning on the front end.
 LoadCaster getLoadCaster()
          This will be called on the front end during planning and not on the back end during execution.
 Tuple getNext()
          Retrieves the next tuple to be processed.
 void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader, PigSplit split)
          Initializes LoadFunc for reading data.
 void setLocation(String location, org.apache.hadoop.mapreduce.Job job)
          Communicate to the loader the location of the object(s) being loaded.
 byte[] toBytes(Boolean b)
           
 byte[] toBytes(DataBag bag)
           
 byte[] toBytes(org.joda.time.DateTime dt)
           
 byte[] toBytes(Double d)
           
 byte[] toBytes(Float f)
           
 byte[] toBytes(Integer i)
           
 byte[] toBytes(Long l)
           
 byte[] toBytes(Map<String,Object> m)
           
 byte[] toBytes(String s)
           
 byte[] toBytes(Tuple t)
           
 
Methods inherited from class org.apache.pig.LoadFunc
getAbsolutePath, getPathStrings, join, relativeToAbsolutePath, setUDFContextSignature, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

in

protected org.apache.hadoop.mapreduce.RecordReader in
Constructor Detail

TextLoader

public TextLoader()
Method Detail

getNext

public Tuple getNext()
              throws IOException
Description copied from class: LoadFunc
Retrieves the next tuple to be processed. Implementations should NOT reuse tuple objects (or inner member objects) they return across calls and should return a different tuple object in each call.

Specified by:
getNext in class LoadFunc
Returns:
the next tuple to be processed or null if there are no more tuples to be processed.
Throws:
IOException - if there is an exception while retrieving the next tuple
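
The contract above (a fresh tuple object on every call, and null once the input is exhausted) can be sketched with plain Java types. This is an illustration under assumptions, not TextLoader's actual implementation: `NextTupleSketch` is a hypothetical class, and a `List<Object>` stands in for a `Tuple`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical reader illustrating the getNext contract with plain Java types.
final class NextTupleSketch {
    private final Iterator<String> lines;

    NextTupleSketch(List<String> input) {
        this.lines = input.iterator();
    }

    // Returns a newly allocated single-field "tuple", or null when input is exhausted.
    List<Object> getNext() {
        if (!lines.hasNext()) {
            return null; // signals that there are no more tuples
        }
        List<Object> tuple = new ArrayList<>(1); // fresh object on every call
        tuple.add(lines.next());
        return tuple;
    }
}
```

Allocating a new object per call means a caller can safely hold on to an earlier tuple while requesting the next one.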

bytesToBoolean

public Boolean bytesToBoolean(byte[] b)
                       throws IOException
TextLoader does not support conversion to Boolean

Specified by:
bytesToBoolean in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Boolean value.
Throws:
IOException - if the value cannot be cast.
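
A caller can expect an unsupported conversion such as this one to fail rather than return a value. The sketch below assumes the failure is reported through the declared `IOException`; the class name `UnsupportedCastSketch` and the message text are hypothetical, and the code does not use Pig itself.

```java
import java.io.IOException;

// Hypothetical caster that rejects a conversion it does not support.
final class UnsupportedCastSketch {

    // Always fails: this sketch treats Boolean as an unsupported target type.
    static Boolean bytesToBoolean(byte[] b) throws IOException {
        throw new IOException("conversion to Boolean is not supported");
    }
}
```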

bytesToInteger

public Integer bytesToInteger(byte[] b)
                       throws IOException
TextLoader does not support conversion to Integer

Specified by:
bytesToInteger in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Integer value.
Throws:
IOException - if the value cannot be cast.

bytesToLong

public Long bytesToLong(byte[] b)
                 throws IOException
TextLoader does not support conversion to Long

Specified by:
bytesToLong in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Long value.
Throws:
IOException - if the value cannot be cast.

bytesToFloat

public Float bytesToFloat(byte[] b)
                   throws IOException
TextLoader does not support conversion to Float

Specified by:
bytesToFloat in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Float value.
Throws:
IOException - if the value cannot be cast.

bytesToDouble

public Double bytesToDouble(byte[] b)
                     throws IOException
TextLoader does not support conversion to Double

Specified by:
bytesToDouble in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Double value.
Throws:
IOException - if the value cannot be cast.

bytesToDateTime

public org.joda.time.DateTime bytesToDateTime(byte[] b)
                                       throws IOException
TextLoader does not support conversion to DateTime

Specified by:
bytesToDateTime in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
DateTime value.
Throws:
IOException - if the value cannot be cast.

bytesToCharArray

public String bytesToCharArray(byte[] b)
                        throws IOException
Cast data from bytes to chararray value.

Specified by:
bytesToCharArray in interface LoadCaster
Parameters:
b - byte array to be cast.
Returns:
String value.
Throws:
IOException - if the value cannot be cast.
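
A minimal sketch of a bytes-to-chararray cast, using only the Java standard library. The UTF-8 charset is an assumption (this page does not state which encoding TextLoader uses), and `CharArrayCastSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing a byte-array to String conversion.
final class CharArrayCastSketch {

    // Decodes the raw bytes as UTF-8 text (the charset is an assumption).
    static String bytesToCharArray(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```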

bytesToMap

public Map<String,Object> bytesToMap(byte[] b,
                                     ResourceSchema.ResourceFieldSchema schema)
                              throws IOException
Description copied from interface: LoadCaster
Cast data from bytearray to map value.

Specified by:
bytesToMap in interface LoadCaster
Parameters:
b - bytearray to be cast.
schema - field schema for the output map
Returns:
Map value.
Throws:
IOException - if the value cannot be cast.

bytesToTuple

public Tuple bytesToTuple(byte[] b,
                          ResourceSchema.ResourceFieldSchema schema)
                   throws IOException
TextLoader does not support conversion to Tuple

Specified by:
bytesToTuple in interface LoadCaster
Parameters:
b - bytearray to be cast.
schema - field schema for the output tuple
Returns:
Tuple value.
Throws:
IOException - if the value cannot be cast.

bytesToBag

public DataBag bytesToBag(byte[] b,
                          ResourceSchema.ResourceFieldSchema schema)
                   throws IOException
TextLoader does not support conversion to Bag

Specified by:
bytesToBag in interface LoadCaster
Parameters:
b - bytearray to be cast.
schema - field schema for the output bag
Returns:
Bag value.
Throws:
IOException - if the value cannot be cast.

toBytes

public byte[] toBytes(DataBag bag)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(String s)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Double d)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Float f)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Boolean b)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Integer i)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Long l)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(org.joda.time.DateTime dt)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Map<String,Object> m)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Tuple t)
               throws IOException
Throws:
IOException

bytesToBigInteger

public BigInteger bytesToBigInteger(byte[] b)
                             throws IOException
Description copied from interface: LoadCaster
Cast data from bytearray to BigInteger value.

Specified by:
bytesToBigInteger in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
BigInteger value.
Throws:
IOException - if the value cannot be cast.

bytesToBigDecimal

public BigDecimal bytesToBigDecimal(byte[] b)
                             throws IOException
Description copied from interface: LoadCaster
Cast data from bytearray to BigDecimal value.

Specified by:
bytesToBigDecimal in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
BigDecimal value.
Throws:
IOException - if the value cannot be cast.

getInputFormat

public org.apache.hadoop.mapreduce.InputFormat getInputFormat()
Description copied from class: LoadFunc
This will be called during planning on the front end. It returns an instance of InputFormat (rather than the class name) because the load function may need to instantiate the InputFormat in order to control how it is constructed.

Specified by:
getInputFormat in class LoadFunc
Returns:
the InputFormat associated with this loader.

getLoadCaster

public LoadCaster getLoadCaster()
Description copied from class: LoadFunc
This will be called on the front end during planning and not on the back end during execution.

Overrides:
getLoadCaster in class LoadFunc
Returns:
the LoadCaster associated with this loader. Returning null indicates that casts from byte array are not supported for this loader.

prepareToRead

public void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader,
                          PigSplit split)
Description copied from class: LoadFunc
Initializes LoadFunc for reading data. This will be called during execution before any calls to getNext. The RecordReader needs to be passed here because it has been instantiated for a particular InputSplit.

Specified by:
prepareToRead in class LoadFunc
Parameters:
reader - RecordReader to be used by this instance of the LoadFunc
split - The input PigSplit to process
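
The call order described above (prepareToRead first, then getNext) can be sketched as follows. This is a stand-alone illustration, not Pig code: an `Iterator<String>` stands in for the `RecordReader`, the class name `PrepareToReadSketch` is hypothetical, and the `IllegalStateException` guard is an assumption added for clarity.

```java
import java.util.Iterator;

// Hypothetical loader showing that the reader must be supplied before reading.
final class PrepareToReadSketch {
    // Mirrors the protected field `in`; an Iterator<String> stands in for a RecordReader.
    private Iterator<String> in;

    // Stores the reader that was created for one particular split.
    void prepareToRead(Iterator<String> reader) {
        this.in = reader;
    }

    // Returns the next record, or null when the reader is exhausted.
    String getNext() {
        if (in == null) {
            throw new IllegalStateException("prepareToRead must be called before getNext");
        }
        return in.hasNext() ? in.next() : null;
    }
}
```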

setLocation

public void setLocation(String location,
                        org.apache.hadoop.mapreduce.Job job)
                 throws IOException
Description copied from class: LoadFunc
Communicate to the loader the location of the object(s) being loaded. The location string passed to the LoadFunc here is the return value of LoadFunc.relativeToAbsolutePath(String, Path). Implementations should use this method to communicate the location (and any other information) to their underlying InputFormat through the Job object. This method is called multiple times on both the front end and the back end, so implementations should ensure that repeated calls cause no inconsistent side effects.

Specified by:
setLocation in class LoadFunc
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, Path)
job - the Job object, used to store or retrieve earlier stored information from the UDFContext
Throws:
IOException - if the location is not valid.
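
Because setLocation may be invoked several times, one way to avoid inconsistent side effects is to make it idempotent: overwrite state rather than accumulate it. The sketch below is an illustration under assumptions, not Pig or Hadoop code. A `Map<String, String>` stands in for the job configuration, and `SetLocationSketch` and the key `sketch.input.location` are hypothetical.

```java
import java.util.Map;

// Hypothetical illustration of an idempotent setLocation.
final class SetLocationSketch {
    // Hypothetical configuration key; not a real Hadoop or Pig property name.
    static final String INPUT_KEY = "sketch.input.location";

    // Overwrites the key instead of appending, so repeated calls leave the same state.
    static void setLocation(String location, Map<String, String> jobConf) {
        jobConf.put(INPUT_KEY, location);
    }
}
```

Appending the location to a list on each call, by contrast, would register the same input more than once.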


Copyright © 2007-2012 The Apache Software Foundation