Used by MergeJoin. Takes an index on sorted data
consisting of sorted tuples of the form
(key1, key2, ..., position, splitIndex) as input. For the key given in seekNear(Tuple),
finds the splitIndex that can contain the key and initializes ReadToEndLoader
to read from that splitIndex onwards, in the sequence of splits in the index.
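The lookup described above can be sketched as a binary search over the sorted index. This is only an illustration, not the loader's actual code: IndexEntry, SplitIndexLookup and findStartSplit are hypothetical names, a single int stands in for the key tuple, and each entry is assumed to hold the first key of its split. The sketch picks the last entry whose key is strictly smaller than the sought key, since equal keys may already start at the tail of the preceding split.

```java
import java.util.List;

/** Hypothetical index entry; names and the single int key are illustrative. */
final class IndexEntry {
    final int key;          // assumed: first key stored in the split
    final long position;    // position recorded in the index tuple
    final int splitIndex;   // which split the entry points at

    IndexEntry(int key, long position, int splitIndex) {
        this.key = key;
        this.position = position;
        this.splitIndex = splitIndex;
    }
}

final class SplitIndexLookup {
    /**
     * Returns the splitIndex to start reading from for the given key.
     * Falls back to the first entry when no entry has a smaller key.
     */
    static int findStartSplit(List<IndexEntry> sortedIndex, int key) {
        if (sortedIndex.isEmpty()) {
            throw new IllegalArgumentException("index must not be empty");
        }
        int lo = 0;
        int hi = sortedIndex.size(); // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedIndex.get(mid).key < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // lo now equals the number of entries whose key is < the sought key.
        int chosen = Math.max(lo - 1, 0);
        return sortedIndex.get(chosen).splitIndex;
    }
}
```

Reading then continues from the returned split through the remaining splits in index order, which is the role ReadToEndLoader plays in the description.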
This method is called by the Pig runtime to tell
the LoadFunc to position its underlying input stream
near the keys supplied as the argument. Specifically:
1) if the key(s) are present in the input stream, the LoadFunc
implementation should set its read position either to
a record where the key(s) is/are the biggest key(s) less than
the key(s) supplied in the argument OR to the record with the
first occurrence of the key(s) supplied.
2) if the key(s) are absent in the input stream, the implementation
should set its read position either to a record where the key(s)
is/are the biggest key(s) less than the key(s) supplied OR to the
first record where the key(s) is/are the smallest key(s) greater
than the key(s) supplied.
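One positioning that satisfies both cases for ascending data is a lower-bound search: it lands on the first occurrence of the key when the key is present, and on the first record with a greater key when it is absent. The sketch below is an assumption-laden illustration (SeekNearSketch and its methods are hypothetical, and an int array stands in for the stream of record keys); seekDescending mirrors the same idea for descending data by flipping the comparison.

```java
final class SeekNearSketch {
    /**
     * Ascending data: index of the first record whose key is >= the sought key.
     * Returns keys.length when every key is smaller (nothing left to join).
     */
    static int seekAscending(int[] keys, int key) {
        int lo = 0;
        int hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Descending data: index of the first record whose key is <= the sought key.
     * Returns keys.length when every key is greater.
     */
    static int seekDescending(int[] keys, int key) {
        int lo = 0;
        int hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] > key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Stopping one record earlier, at the biggest key less than the one supplied, is equally valid under the contract; the caller simply skips forward over the smaller keys.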
The description above holds for descending order data in
a similar manner with "biggest" and "less than" replaced with
"smallest" and "greater than" and vice versa.
keys - Tuple with join keys (which are a prefix of the sort
keys of the input data). For example, if the data is sorted on
the columns in positions 2, 4, 5, any of the following Tuples are
valid as an argument value:
(fieldAt(2))
(fieldAt(2), fieldAt(4))
(fieldAt(2), fieldAt(4), fieldAt(5))
The following are some invalid cases:
(fieldAt(4))
(fieldAt(2), fieldAt(5))
(fieldAt(4), fieldAt(5))
IOException - When the LoadFunc is unable to position
to the required point in its input stream.
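The prefix requirement on the keys argument can be checked mechanically. The following is a minimal sketch, assuming sort columns and key columns are given as arrays of column positions; JoinKeyPrefixCheck and isValidJoinKey are hypothetical names, not part of the Pig API.

```java
final class JoinKeyPrefixCheck {
    /**
     * True when keyColumns is a non-empty leading prefix of sortColumns,
     * for example {2} or {2, 4} against sort columns {2, 4, 5}.
     */
    static boolean isValidJoinKey(int[] sortColumns, int[] keyColumns) {
        if (keyColumns.length == 0 || keyColumns.length > sortColumns.length) {
            return false;
        }
        for (int i = 0; i < keyColumns.length; i++) {
            if (keyColumns[i] != sortColumns[i]) {
                return false; // a column is skipped or out of order
            }
        }
        return true;
    }
}
```

A key that skips the leading sort column, or leaves a hole in the middle, cannot be located with a single seek because records matching it are not contiguous in the sorted data.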
A method called by the Pig runtime to give implementations
an opportunity to perform cleanup actions such as closing
the underlying input stream. This is necessary because, while
performing a join, the Pig runtime may determine that no further
join is possible with the remaining records and may tell the
IndexableLoader to clean up by calling this method.
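To illustrate why an early cleanup call is needed, here is a toy merge join that stops as soon as the left input is exhausted and closes the right side while it still has unread records. Everything in it is hypothetical (EarlyCloseSketch, RightSide, join); int arrays stand in for sorted inputs, and left keys are assumed to be unique to keep the loop short.

```java
final class EarlyCloseSketch {
    /** Hypothetical stand-in for the indexed (right) side of a merge join. */
    static final class RightSide {
        private final int[] keys;
        private int pos = 0;
        boolean closed = false;

        RightSide(int[] sortedKeys) {
            this.keys = sortedKeys;
        }

        boolean hasNext() {
            return !closed && pos < keys.length;
        }

        int peek() {
            return keys[pos];
        }

        void advance() {
            pos++;
        }

        int remaining() {
            return keys.length - pos;
        }

        /** Cleanup hook: a real loader would close its input stream here. */
        void close() {
            closed = true;
        }
    }

    /** Counts matching pairs; assumes both inputs ascend and left keys are unique. */
    static int join(int[] left, RightSide right) {
        int matches = 0;
        int i = 0;
        while (i < left.length && right.hasNext()) {
            int l = left[i];
            int r = right.peek();
            if (l < r) {
                i++;              // left key too small, move left forward
            } else if (l > r) {
                right.advance();  // right key too small, move right forward
            } else {
                matches++;
                right.advance();  // consume the matching right record
            }
        }
        // No further join is possible, so release the right side
        // even though it may still hold unread records.
        right.close();
        return matches;
    }
}
```

With left keys {1, 2} and right keys {2, 2, 5, 8, 9}, the loop ends after the two matches on key 2, and close() is invoked while three right-side records remain unread.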
This will be called during planning on the front end. An
instance of InputFormat (rather than the class name) is used because the
load function may need to instantiate the InputFormat itself in order
to control how it is constructed.
Initializes LoadFunc for reading data. This will be called during execution
before any calls to getNext. The RecordReader needs to be passed here because
it has been instantiated for a particular InputSplit.
Communicate to the loader the location of the object(s) being loaded.
The location string passed to the LoadFunc here is the return value of
LoadFunc.relativeToAbsolutePath(String, Path). Implementations
should use this method to communicate the location (and any other information)
to their underlying InputFormat through the Job object.
This method will be called multiple times, on both the front end and the
back end. Implementations should ensure that repeated calls do not cause
inconsistent side effects.
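One way to keep repeated calls harmless is to make the method overwrite state rather than accumulate it. The sketch below only illustrates that idea: a plain Map stands in for the job configuration, and IdempotentLocationSketch, INPUT_KEY and the method signature are hypothetical, not the Pig or Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

final class IdempotentLocationSketch {
    /** Hypothetical configuration key used only for this illustration. */
    static final String INPUT_KEY = "sketch.input.location";

    private String location;

    /**
     * Records the location and publishes it to the (stand-in) job configuration.
     * Both writes overwrite previous values, so calling this any number of
     * times with the same argument leaves the same final state.
     */
    void setLocation(String location, Map<String, String> jobConf) {
        this.location = location;
        jobConf.put(INPUT_KEY, location); // overwrite, never append
    }

    String getLocation() {
        return location;
    }
}
```

Appending the location to an existing value on every call would be the kind of side effect to avoid, since the configured input would then grow with each invocation.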