public class IndexedStorage extends PigStorage implements IndexableLoadFunc
IndexedStorage is a form of PigStorage that supports a
 per-record seek.  IndexedStorage creates a separate (hidden) index file for
 every data file that is written.  The format of the index file is:

 | Header | | Index Body | | Footer |

 The Header contains the list of record indices (field numbers) that represent index keys.
 The Index Body contains a Tuple for each record in the data.
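 The fields of the Tuple are:
   the index key(s) Tuple
   the number of records that share this index key
   the offset into the data file of the first matching record
 The Footer contains, in sequence:
   the smallest key(s) Tuple in the index
   the largest key(s) Tuple in the index
   the offset in bytes to the start of the footer

 IndexedStorage implements IndexableLoadFunc and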
 can be used as the 'right table' in a Pig 'merge' or 'merge-sparse' join.
 IndexedStorage does not require the data to be globally partitioned and sorted
 by index keys, but each partition (with its separate index) must be locally sorted.
 Also note that IndexedStorage is a loader to demonstrate the "merge-sparse" join.

| Modifier and Type | Class and Description |
|---|---|
| static class  | IndexedStorage.IndexedStorageInputFormat: Internal InputFormat class |
| static class  | IndexedStorage.IndexedStorageOutputFormat: Internal OutputFormat class |
| static class  | IndexedStorage.IndexManager: IndexManager manages the index file (both writing and reading). It keeps track of the last index read during reading. |
Nested classes/interfaces inherited from interface LoadPushDown: LoadPushDown.OperatorSet, LoadPushDown.RequiredField, LoadPushDown.RequiredFieldList, LoadPushDown.RequiredFieldResponse

| Modifier and Type | Field and Description |
|---|---|
| protected int | currentReaderIndexStart: Index into the list of readers to the current reader. |
| protected byte | fieldDelimiter: Delimiter to use between fields |
| protected int[] | offsetsToIndexKeys: Offsets to index keys in tuple |
| protected java.util.Comparator<IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader> | readerComparator: Comparator used to compare key tuples. |
| protected IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader[] | readers: List of record readers. |
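
The readerComparator field is described as comparing key tuples, and the class description requires each partition to be locally sorted by its index keys. The following is a minimal sketch of what a field-by-field key comparison and a local-sortedness check could look like. It is not taken from the Pig source: the class `KeyTupleOrder`, its members, and the use of `List<Integer>` as a stand-in for a key Tuple are all hypothetical, and the ordering Pig actually applies to Tuples is not specified on this page.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; not part of Pig or piggybank. */
final class KeyTupleOrder {

    /**
     * Compares two keys field by field. When one key is a prefix of the
     * other, the shorter key sorts first (an assumption of this sketch).
     */
    static final Comparator<List<Integer>> LEXICOGRAPHIC = (a, b) -> {
        int shared = Math.min(a.size(), b.size());
        for (int i = 0; i < shared; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    };

    /** Returns true when the keys are in non-decreasing order; duplicates are allowed. */
    static boolean isLocallySorted(List<List<Integer>> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (LEXICOGRAPHIC.compare(keys.get(i - 1), keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> sorted = Arrays.asList(
                Arrays.asList(1, 1), Arrays.asList(1, 2), Arrays.asList(2, 0));
        List<List<Integer>> unsorted = Arrays.asList(
                Arrays.asList(2, 0), Arrays.asList(1, 2));
        System.out.println(isLocallySorted(sorted));   // true
        System.out.println(isLocallySorted(unsorted)); // false
    }
}
```
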
Fields inherited from class PigStorage: caster, in, mLog, mRequiredColumns, schema, signature, writer

| Constructor and Description |
|---|
| IndexedStorage(java.lang.String delimiter, java.lang.String offsetsToIndexKeys): Constructs a Pig Storer that uses the specified regex as a field delimiter. |
| Modifier and Type | Method and Description | 
|---|---|
| void | close(): A method called by the Pig runtime to give an opportunity for implementations to perform cleanup actions like closing the underlying input stream. |
| org.apache.hadoop.mapreduce.InputFormat | getInputFormat(): This will be called during planning on the front end. |
| Tuple | getNext(): Retrieves the next tuple to be processed. |
| org.apache.hadoop.mapreduce.OutputFormat | getOutputFormat(): Return the OutputFormat associated with StoreFuncInterface. |
| void | initialize(org.apache.hadoop.conf.Configuration conf): IndexableLoadFunc interface implementation |
| void | seekNear(Tuple keys): This method is called by the Pig runtime to indicate to the LoadFunc to position its underlying input stream near the keys supplied as the argument. |
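
The seekNear contract requires the supplied join keys to be a prefix of the sort keys of the input data. As a rough illustration of that rule, the sketch below checks whether a list of key column positions is a leading prefix of the sort column positions. The class `SeekKeyPrefix` and its method are hypothetical helpers, not part of Pig, and rejecting an empty key list is an assumption of this sketch rather than documented behavior.

```java
/** Hypothetical illustration only; not part of Pig or piggybank. */
final class SeekKeyPrefix {

    /**
     * Returns true when keyColumns is a non-empty leading prefix of sortColumns.
     * Treating an empty key list as invalid is an assumption of this sketch.
     */
    static boolean isValidPrefix(int[] sortColumns, int[] keyColumns) {
        if (keyColumns.length == 0 || keyColumns.length > sortColumns.length) {
            return false;
        }
        for (int i = 0; i < keyColumns.length; i++) {
            if (keyColumns[i] != sortColumns[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] sortColumns = {2, 4, 5}; // data sorted on columns 2, 4, 5

        System.out.println(isValidPrefix(sortColumns, new int[] {2}));       // true
        System.out.println(isValidPrefix(sortColumns, new int[] {2, 4}));    // true
        System.out.println(isValidPrefix(sortColumns, new int[] {2, 4, 5})); // true

        System.out.println(isValidPrefix(sortColumns, new int[] {4}));       // false
        System.out.println(isValidPrefix(sortColumns, new int[] {2, 5}));    // false
        System.out.println(isValidPrefix(sortColumns, new int[] {4, 5}));    // false
    }
}
```

The valid and invalid cases in `main` mirror the examples given in the seekNear parameter description.
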
Methods inherited from class PigStorage: checkSchema, cleanupOnFailure, cleanupOnSuccess, cleanupOutput, equals, equals, getFeatures, getPartitionKeys, getSchema, getStatistics, hashCode, prepareToRead, prepareToWrite, pushProjection, putNext, readField, relToAbsPathForStoreLocation, setLocation, setPartitionFilter, setStoreFuncUDFContextSignature, setStoreLocation, setUDFContextSignature, shouldOverwrite, storeSchema, storeStatistics
Methods inherited from class FileInputLoadFunc: getSplitComparable
Methods inherited from class LoadFunc: getAbsolutePath, getCacheFiles, getLoadCaster, getPathStrings, getShipFiles, join, relativeToAbsolutePath, warn

protected IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader[] readers
protected int currentReaderIndexStart
protected byte fieldDelimiter
protected final int[] offsetsToIndexKeys
protected java.util.Comparator<IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader> readerComparator
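
The offsetsToIndexKeys field holds offsets into the tuple for the index keys, and the constructor receives them as a comma-separated string. A minimal sketch of how such a string could be turned into offsets and used to pick the key fields out of a record follows. The class `IndexKeyOffsets`, its methods, and the use of a plain `List` in place of a Pig Tuple are hypothetical; whether the real constructor tolerates whitespace around the commas is not stated here, so the trimming below is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not part of Pig or piggybank. */
final class IndexKeyOffsets {

    /** Parses a comma-separated list such as "0,2" into an int array. */
    static int[] parseOffsets(String commaSeparated) {
        String[] parts = commaSeparated.split(",");
        int[] offsets = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Trimming is an assumption of this sketch.
            offsets[i] = Integer.parseInt(parts[i].trim());
        }
        return offsets;
    }

    /** Builds the index key by copying the fields at the given offsets, in order. */
    static List<Object> extractKey(List<?> record, int[] offsets) {
        List<Object> key = new ArrayList<>(offsets.length);
        for (int offset : offsets) {
            key.add(record.get(offset));
        }
        return key;
    }

    public static void main(String[] args) {
        int[] offsets = parseOffsets("0, 2");
        List<Object> record = Arrays.<Object>asList("alice", 31, "nyc");

        System.out.println(Arrays.toString(offsets));    // [0, 2]
        System.out.println(extractKey(record, offsets)); // [alice, nyc]
    }
}
```
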
public IndexedStorage(java.lang.String delimiter,
              java.lang.String offsetsToIndexKeys)
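Parameters:
delimiter - field delimiter to use
offsetsToIndexKeys - comma-separated list of offsets into the Tuple for the index keys

public org.apache.hadoop.mapreduce.OutputFormat getOutputFormat()
Description copied from interface: StoreFuncInterface
Return the OutputFormat associated with StoreFuncInterface.
Specified by: getOutputFormat in interface StoreFuncInterface
Overrides: getOutputFormat in class PigStorage
Returns: OutputFormat associated with StoreFuncInterface

public org.apache.hadoop.mapreduce.InputFormat getInputFormat()
Description copied from class: LoadFunc
This will be called during planning on the front end.
Overrides: getInputFormat in class PigStorage

public Tuple getNext() throws java.io.IOException
Description copied from class: LoadFunc
Retrieves the next tuple to be processed.
Overrides: getNext in class PigStorage
Throws: java.io.IOException - if there is an exception while retrieving the next
 tuple

public void initialize(org.apache.hadoop.conf.Configuration conf)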
                throws java.io.IOException
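IndexableLoadFunc interface implementation
Specified by: initialize in interface IndexableLoadFunc
Parameters: conf - The job configuration object
Throws: java.io.IOException

public void seekNear(Tuple keys) throws java.io.IOException
Description copied from interface: IndexableLoadFunc
This method is called by the Pig runtime to indicate to the LoadFunc to position its underlying input stream near the keys supplied as the argument.
Specified by: seekNear in interface IndexableLoadFunc
Parameters: keys - Tuple with join keys (which are a prefix of the sort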
 keys of the input data). For example if the data is sorted on
 columns in position 2,4,5 any of the following Tuples are
 valid as an argument value:
 (fieldAt(2))
 (fieldAt(2), fieldAt(4))
 (fieldAt(2), fieldAt(4), fieldAt(5))
 
 The following are some invalid cases:
 (fieldAt(4))
 (fieldAt(2), fieldAt(5))
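 (fieldAt(4), fieldAt(5))
Throws: java.io.IOException - When the loadFunc is unable to position
 to the required point in its input stream

public void close()
           throws java.io.IOException
Description copied from interface: IndexableLoadFunc
A method called by the Pig runtime to give an opportunity for implementations to perform cleanup actions like closing the underlying input stream.
Specified by: close in interface IndexableLoadFunc
Throws: java.io.IOException - if the loadfunc is unable to perform
 its close actions.

Copyright © 2007-2012 The Apache Software Foundation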