org.apache.pig
Interface IndexableLoadFunc

All Known Implementing Classes:
DefaultIndexableLoader, IndexedStorage

@InterfaceAudience.Public
@InterfaceStability.Evolving
public interface IndexableLoadFunc

This class is intended for use by LoadFunc implementations which have an internal index for sorted data and can use the index to support merge join in Pig. Interaction with the index is abstracted away by the methods in this interface which the Pig runtime will call in a particular sequence to get the records it needs to perform the merge based join. The sequence of calls made from the Pig runtime are:

  1. LoadFunc.setUDFContextSignature(String)
  2. initialize(Configuration)
  3. LoadFunc.setLocation(String, org.apache.hadoop.mapreduce.Job)
  4. seekNear(Tuple)
  5. LoadFunc.getNext() called multiple times to retrieve data and perform the join
  6. close()

Since:
Pig 0.6

Method Summary
 void close()
          A method called by the Pig runtime to give an opportunity for implementations to perform cleanup actions like closing the underlying input stream.
 void initialize(org.apache.hadoop.conf.Configuration conf)
          This method is called by Pig run time to allow the IndexableLoadFunc to perform any initialization actions
 void seekNear(Tuple keys)
          This method is called by the Pig runtime to indicate to the LoadFunc to position its underlying input stream near the keys supplied as the argument.
 

Method Detail

initialize

void initialize(org.apache.hadoop.conf.Configuration conf)
                throws IOException
This method is called by Pig run time to allow the IndexableLoadFunc to perform any initialization actions

Parameters:
conf - The job configuration object
Throws:
IOException

seekNear

void seekNear(Tuple keys)
              throws IOException
This method is called by the Pig runtime to indicate to the LoadFunc to position its underlying input stream near the keys supplied as the argument. Specifically: 1) if the keys are present in the input stream, the loadfunc implementation should position its read position to a record where the key(s) is/are the biggest key(s) less than the key(s) supplied in the argument OR to the record with the first occurrence of the keys(s) supplied. 2) if the key(s) are absent in the input stream, the implementation should position its read position to a record where the key(s) is/are the biggest key(s) less than the key(s) supplied OR to the first record where the key(s) is/are the smallest key(s) greater than the keys(s) supplied. The description above holds for descending order data in a similar manner with "biggest" and "less than" replaced with "smallest" and "greater than" and vice versa.

Parameters:
keys - Tuple with join keys (which are a prefix of the sort keys of the input data). For example if the data is sorted on columns in position 2,4,5 any of the following Tuples are valid as an argument value: (fieldAt(2)) (fieldAt(2), fieldAt(4)) (fieldAt(2), fieldAt(4), fieldAt(5)) The following are some invalid cases: (fieldAt(4)) (fieldAt(2), fieldAt(5)) (fieldAt(4), fieldAt(5))
Throws:
IOException - When the loadFunc is unable to position to the required point in its input stream

close

void close()
           throws IOException
A method called by the Pig runtime to give an opportunity for implementations to perform cleanup actions like closing the underlying input stream. This is necessary since while performing a join the Pig run time may determine than no further join is possible with remaining records and may indicate to the IndexableLoader to cleanup by calling this method.

Throws:
IOException - if the loadfunc is unable to perform its close actions.


Copyright © 2007-2012 The Apache Software Foundation