org.apache.pig
Interface OrderedLoadFunc

All Known Implementing Classes:
AllLoader, AvroStorage, BinStorage, CSVExcelStorage, CSVLoader, FileInputLoadFunc, HBaseStorage, HiveColumnarLoader, HiveColumnarStorage, IndexedStorage, InterStorage, PigStorage, PigStorageSchema, SequenceFileInterStorage, SequenceFileLoader, TFileStorage

@InterfaceAudience.Public
@InterfaceStability.Evolving
public interface OrderedLoadFunc

Implementing this interface indicates to Pig that a given loader can be used for MergeJoin. It does not mean that the data itself is ordered, but rather that the splits returned by the underlying InputFormat can be ordered to match the order of the data they load. For example, file splits have a natural order (that of the file they come from), while splits of an RDBMS table do not (since tables have no inherent order). The position of each split, represented by a WritableComparable object, is stored in the index created by the MergeJoin sampling MapReduce job, which yields an ordered sequence of splits. Splits must be read in this order during a merge join to ensure that data is read according to its sort order.
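
The ordering idea described above can be illustrated with a minimal, dependency-free sketch. It is not Pig's implementation: SplitPosition is a hypothetical stand-in for the WritableComparable a file-based loader might return, and it assumes that a split's position can be captured by its file path plus its starting byte offset, with file names sorting in the same order as the data. The class name SplitOrderSketch, the method inReadOrder, and the sample paths are likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitOrderSketch {

    /**
     * Hypothetical stand-in for the WritableComparable returned by
     * getSplitComparable. Assumption: position = (file path, start offset).
     */
    public static final class SplitPosition implements Comparable<SplitPosition> {
        final String path;  // file the split belongs to
        final long start;   // byte offset of the split within that file

        public SplitPosition(String path, long start) {
            this.path = path;
            this.start = start;
        }

        @Override
        public int compareTo(SplitPosition other) {
            // Order by file first, then by offset within the same file.
            int byPath = path.compareTo(other.path);
            return byPath != 0 ? byPath : Long.compare(start, other.start);
        }

        @Override
        public String toString() {
            return path + "@" + start;
        }
    }

    /** Returns a sorted copy of the positions; the input list is not modified. */
    public static List<SplitPosition> inReadOrder(List<SplitPosition> positions) {
        List<SplitPosition> sorted = new ArrayList<>(positions);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // Splits arrive in arbitrary order, as they might from an InputFormat.
        List<SplitPosition> splits = new ArrayList<>();
        splits.add(new SplitPosition("part-00001", 0L));
        splits.add(new SplitPosition("part-00000", 134217728L));
        splits.add(new SplitPosition("part-00000", 0L));

        // Prints [part-00000@0, part-00000@134217728, part-00001@0]
        System.out.println(inReadOrder(splits));
    }
}
```

A real loader would return a Hadoop WritableComparable rather than a plain Comparable, so that the position can be serialized into the index built by the sampling job. An RDBMS-backed loader has no comparable quantity to put in such a key, which is why it cannot meaningfully implement this interface.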

Since:
Pig 0.7

Method Summary
 org.apache.hadoop.io.WritableComparable<?> getSplitComparable(org.apache.hadoop.mapreduce.InputSplit split)
          The returned WritableComparable object is used to compare the positions of different splits in an ordered stream.
 

Method Detail

getSplitComparable

org.apache.hadoop.io.WritableComparable<?> getSplitComparable(org.apache.hadoop.mapreduce.InputSplit split)
                                                              throws IOException
The returned WritableComparable object is used to compare the positions of different splits in an ordered stream.

Parameters:
split - An InputSplit from the InputFormat underlying this loader.
Returns:
A WritableComparable representing the position of the split in the input.
Throws:
IOException


Copyright © 2007-2012 The Apache Software Foundation