org.apache.pig.builtin
Class TextLoader

java.lang.Object
  extended by org.apache.pig.LoadFunc
      extended by org.apache.pig.builtin.TextLoader
All Implemented Interfaces:
LoadCaster

public class TextLoader
extends LoadFunc
implements LoadCaster

This load function creates one tuple per line of text. Each tuple has a single chararray field that contains the whole line.
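The line-to-tuple mapping described above can be sketched in plain Java. This is an illustration only, not TextLoader's implementation: the class name LinePerTupleSketch is hypothetical, and a List<Object> stands in for Pig's Tuple so that the example runs without Pig or Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in: a "tuple" is modelled as a List<Object>, not Pig's Tuple.
class LinePerTupleSketch {

    // Wrap every input line in its own single-field tuple.
    static List<List<Object>> load(String text) {
        List<List<Object>> tuples = new ArrayList<>();
        new BufferedReader(new StringReader(text)).lines()
            // One field per tuple: the whole line, with no splitting on delimiters.
            .forEach(line -> tuples.add(Collections.<Object>singletonList(line)));
        return tuples;
    }

    public static void main(String[] args) {
        for (List<Object> tuple : load("alpha,1\nbeta,2\n")) {
            System.out.println(tuple.size() + " field: " + tuple.get(0));
        }
    }
}
```

Note that the comma inside each line is left untouched: the sketch, like the description above, treats the line as one opaque chararray value.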


Field Summary
protected  org.apache.hadoop.mapreduce.RecordReader in
           
 
Constructor Summary
TextLoader()
           
 
Method Summary
 DataBag bytesToBag(byte[] b, ResourceSchema.ResourceFieldSchema schema)
          TextLoader does not support conversion to Bag
 String bytesToCharArray(byte[] b)
          Cast data from bytes to chararray value.
 Double bytesToDouble(byte[] b)
          TextLoader does not support conversion to Double
 Float bytesToFloat(byte[] b)
          TextLoader does not support conversion to Float
 Integer bytesToInteger(byte[] b)
          TextLoader does not support conversion to Integer
 Long bytesToLong(byte[] b)
          TextLoader does not support conversion to Long
 Map<String,Object> bytesToMap(byte[] b)
          TextLoader does not support conversion to Map
 Tuple bytesToTuple(byte[] b, ResourceSchema.ResourceFieldSchema schema)
          TextLoader does not support conversion to Tuple
 org.apache.hadoop.mapreduce.InputFormat getInputFormat()
          This will be called during planning on the front end.
 LoadCaster getLoadCaster()
          This will be called on the front end during planning and not on the back end during execution.
 Tuple getNext()
          Retrieves the next tuple to be processed.
 void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader, PigSplit split)
          Initializes LoadFunc for reading data.
 void setLocation(String location, org.apache.hadoop.mapreduce.Job job)
          Communicate to the loader the location of the object(s) being loaded.
 byte[] toBytes(DataBag bag)
           
 byte[] toBytes(Double d)
           
 byte[] toBytes(Float f)
           
 byte[] toBytes(Integer i)
           
 byte[] toBytes(Long l)
           
 byte[] toBytes(Map<String,Object> m)
           
 byte[] toBytes(String s)
           
 byte[] toBytes(Tuple t)
           
 
Methods inherited from class org.apache.pig.LoadFunc
getAbsolutePath, getPathStrings, join, relativeToAbsolutePath, setUDFContextSignature
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

in

protected org.apache.hadoop.mapreduce.RecordReader in
Constructor Detail

TextLoader

public TextLoader()
Method Detail

getNext

public Tuple getNext()
              throws IOException
Description copied from class: LoadFunc
Retrieves the next tuple to be processed. Implementations should NOT reuse tuple objects (or inner member objects) they return across calls and should return a different tuple object in each call.

Specified by:
getNext in class LoadFunc
Returns:
the next tuple to be processed or null if there are no more tuples to be processed.
Throws:
IOException - if there is an exception while retrieving the next tuple
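
The getNext contract (a different tuple object on every call, null at end of input) can be sketched as follows. The class FreshTupleSketch is hypothetical and uses a List<Object> in place of Pig's Tuple and an in-memory Iterator in place of a RecordReader; it is not the TextLoader source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a loader; List<Object> plays the role of a tuple.
class FreshTupleSketch {
    private final Iterator<String> lines;

    FreshTupleSketch(List<String> source) {
        this.lines = source.iterator();
    }

    // Returns a newly allocated tuple per call, or null once the input is exhausted.
    List<Object> getNext() {
        if (!lines.hasNext()) {
            return null;
        }
        List<Object> tuple = new ArrayList<>(1); // fresh object, never reused across calls
        tuple.add(lines.next());
        return tuple;
    }

    public static void main(String[] args) {
        FreshTupleSketch loader = new FreshTupleSketch(Arrays.asList("first", "second"));
        List<Object> a = loader.getNext();
        List<Object> b = loader.getNext();
        System.out.println(a != b);           // the two tuples are distinct objects
        System.out.println(a.get(0));         // the earlier tuple is unaffected by later calls
        System.out.println(loader.getNext()); // end of input is signalled with null
    }
}
```

Allocating per call matters because a caller may hold on to an earlier tuple; reusing one object would silently overwrite it.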

bytesToInteger

public Integer bytesToInteger(byte[] b)
                       throws IOException
TextLoader does not support conversion to Integer

Specified by:
bytesToInteger in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Integer value.
Throws:
IOException - if the value cannot be cast.
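
A caller should be prepared for the unsupported conversions to fail. The sketch below assumes that such a conversion surfaces as an IOException, as the Throws clause suggests; the class UnsupportedCastSketch and its message text are hypothetical and do not reproduce the TextLoader source.

```java
import java.io.IOException;

// Hypothetical caster that, like the unsupported conversions listed here, rejects the request.
class UnsupportedCastSketch {

    // Assumption: an unsupported conversion is reported by throwing IOException.
    static Integer bytesToInteger(byte[] b) throws IOException {
        throw new IOException("conversion to Integer is not supported by this sketch");
    }

    public static void main(String[] args) {
        try {
            bytesToInteger(new byte[] {'4', '2'});
        } catch (IOException e) {
            // The caller decides how to recover, for example by keeping the value as a chararray.
            System.out.println("cast rejected: " + e.getMessage());
        }
    }
}
```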

bytesToLong

public Long bytesToLong(byte[] b)
                 throws IOException
TextLoader does not support conversion to Long

Specified by:
bytesToLong in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Long value.
Throws:
IOException - if the value cannot be cast.

bytesToFloat

public Float bytesToFloat(byte[] b)
                   throws IOException
TextLoader does not support conversion to Float

Specified by:
bytesToFloat in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Float value.
Throws:
IOException - if the value cannot be cast.

bytesToDouble

public Double bytesToDouble(byte[] b)
                     throws IOException
TextLoader does not support conversion to Double

Specified by:
bytesToDouble in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Double value.
Throws:
IOException - if the value cannot be cast.

bytesToCharArray

public String bytesToCharArray(byte[] b)
                        throws IOException
Cast data from bytes to chararray value.

Specified by:
bytesToCharArray in interface LoadCaster
Parameters:
b - byte array to be cast.
Returns:
String value.
Throws:
IOException - if the value cannot be cast.
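
A bytes-to-chararray cast amounts to decoding the byte array as text. The sketch below assumes UTF-8 input, which this page does not state; the class BytesToCharArraySketch is hypothetical, and the null handling is a choice made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing one way to decode a byte array into a chararray (String).
class BytesToCharArraySketch {

    // Assumption: the bytes are UTF-8 encoded; a null input is passed through as null.
    static String bytesToCharArray(byte[] b) {
        return b == null ? null : new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "h\u00e9llo world".getBytes(StandardCharsets.UTF_8);
        // Decoding with the same charset used for encoding restores the original text.
        System.out.println(bytesToCharArray(raw));
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which can differ between the front end and the cluster nodes.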

bytesToMap

public Map<String,Object> bytesToMap(byte[] b)
                              throws IOException
TextLoader does not support conversion to Map

Specified by:
bytesToMap in interface LoadCaster
Parameters:
b - bytearray to be cast.
Returns:
Map value.
Throws:
IOException - if the value cannot be cast.

bytesToTuple

public Tuple bytesToTuple(byte[] b,
                          ResourceSchema.ResourceFieldSchema schema)
                   throws IOException
TextLoader does not support conversion to Tuple

Specified by:
bytesToTuple in interface LoadCaster
Parameters:
b - bytearray to be cast.
schema - field schema for the output tuple
Returns:
Tuple value.
Throws:
IOException - if the value cannot be cast.

bytesToBag

public DataBag bytesToBag(byte[] b,
                          ResourceSchema.ResourceFieldSchema schema)
                   throws IOException
TextLoader does not support conversion to Bag

Specified by:
bytesToBag in interface LoadCaster
Parameters:
b - bytearray to be cast.
schema - field schema for the output bag
Returns:
Bag value.
Throws:
IOException - if the value cannot be cast.

toBytes

public byte[] toBytes(DataBag bag)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(String s)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Double d)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Float f)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Integer i)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Long l)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Map<String,Object> m)
               throws IOException
Throws:
IOException

toBytes

public byte[] toBytes(Tuple t)
               throws IOException
Throws:
IOException

getInputFormat

public org.apache.hadoop.mapreduce.InputFormat getInputFormat()
Description copied from class: LoadFunc
This will be called during planning on the front end. It returns an instance of InputFormat (rather than the class name) because the load function may need to instantiate the InputFormat in order to control how it is constructed.

Specified by:
getInputFormat in class LoadFunc
Returns:
the InputFormat associated with this loader.

getLoadCaster

public LoadCaster getLoadCaster()
Description copied from class: LoadFunc
This will be called on the front end during planning and not on the back end during execution.

Overrides:
getLoadCaster in class LoadFunc
Returns:
the LoadCaster associated with this loader. Returning null indicates that casts from byte array are not supported for this loader.

prepareToRead

public void prepareToRead(org.apache.hadoop.mapreduce.RecordReader reader,
                          PigSplit split)
Description copied from class: LoadFunc
Initializes LoadFunc for reading data. This will be called during execution before any calls to getNext. The RecordReader needs to be passed here because it has been instantiated for a particular InputSplit.

Specified by:
prepareToRead in class LoadFunc
Parameters:
reader - RecordReader to be used by this instance of the LoadFunc
split - The input PigSplit to process

setLocation

public void setLocation(String location,
                        org.apache.hadoop.mapreduce.Job job)
                 throws IOException
Description copied from class: LoadFunc
Communicate to the loader the location of the object(s) being loaded. The location string passed to the LoadFunc here is the return value of LoadFunc.relativeToAbsolutePath(String, Path). Implementations should use this method to communicate the location (and any other information) to their underlying InputFormat through the Job object. This method is called multiple times in the backend, so implementations should ensure that repeated calls cause no inconsistent side effects.

Specified by:
setLocation in class LoadFunc
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, Path)
job - the Job object, used to store or retrieve earlier stored information from the UDFContext
Throws:
IOException - if the location is not valid.
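
Because setLocation may be called more than once, a safe implementation overwrites its settings rather than accumulating them. The sketch below illustrates that idea with a plain Map standing in for the Job configuration; the class IdempotentSetLocationSketch and the key "sketch.input.dir" are hypothetical and are not Hadoop or Pig names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a setLocation that is safe to call repeatedly.
class IdempotentSetLocationSketch {

    // Hypothetical configuration key, used only in this sketch.
    static final String INPUT_DIR_KEY = "sketch.input.dir";

    // Overwrites the stored location, so repeated calls leave the same state behind.
    static void setLocation(String location, Map<String, String> jobConf) {
        jobConf.put(INPUT_DIR_KEY, location);
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>();
        setLocation("/data/logs", jobConf);
        setLocation("/data/logs", jobConf); // a second call does not duplicate the entry
        System.out.println(jobConf);
    }
}
```

An implementation that appended the location to a list on every call would, by contrast, read the same input several times.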


Copyright © The Apache Software Foundation