Package org.apache.pig.piggybank.storage

Class Summary
AllLoader The AllLoader provides the ability to point Pig at a folder that contains files in multiple formats e.g.
AllLoader.AllLoaderInputFormat InputFormat that encapsulates the correct input format based on the file type.
AllLoader.AllReader Contains the logic for selecting the correct Loader.
CSVExcelStorage CSV loading and storing with support for multi-line fields, and escaping of delimiters and double quotes within fields; uses CSV conventions of Excel 2007.
CSVLoader A load function based on PigStorage that implements part of the CSV "standard". This loader properly supports double-quoted fields that contain commas and other double-quotes escaped with backslashes.
DBStorage  
HadoopJobHistoryLoader  
HadoopJobHistoryLoader.HadoopJobHistoryInputFormat  
HadoopJobHistoryLoader.HadoopJobHistoryReader  
HadoopJobHistoryLoader.JobHistoryPathFilter  
HadoopJobHistoryLoader.MRJobInfo  
HiveColumnarLoader Loader for Hive RC Columnar files.
Supports the following types:
  Hive Type         Pig Type (from DataType)
  string            CHARARRAY
  int               INTEGER
  bigint or long    LONG
  float             float
  double            DOUBLE
  boolean           BOOLEAN
  byte              BYTE
  array             TUPLE
  map               MAP

Partitions
The input paths are scanned by the loader for [partition name]=[value] patterns in the subdirectories.
If detected, these partitions are appended to the table schema.
For example, if you have the directory structure:

IndexedStorage IndexedStorage is a form of PigStorage that supports a per record seek.
IndexedStorage.IndexedStorageInputFormat Internal InputFormat class
IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader Internal RecordReader class
IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader.IndexedStorageLineReader  
IndexedStorage.IndexedStorageInputFormat.IndexedStorageRecordReader.IndexedStorageRecordReaderComparator Class to compare record readers using underlying indexes
IndexedStorage.IndexedStorageOutputFormat Internal OutputFormat class
IndexedStorage.IndexedStorageOutputFormat.IndexedStorageRecordWriter Internal class to do the actual record writing and index generation
IndexedStorage.IndexManager IndexManager manages the index file (both writing and reading). It keeps track of the last index read during reading.
JsonMetadata Deprecated.
MultiStorage This UDF splits the output data into multiple directories and files dynamically, based on a user-specified key field in the output tuple.
MultiStorage.MultiStorageOutputFormat  
MultiStorage.MultiStorageOutputFormat.MyLineRecordWriter  
MyRegExLoader  
PigStorageSchema Deprecated. Use PigStorage with a -schema option instead.
RegExLoader RegExLoader is an abstract class used to parse logs based on a regular expression.
SequenceFileLoader A Loader for Hadoop-Standard SequenceFiles.
XMLLoader A load function for XML files. It implements the LoadFunc interface, which is used to parse records from a dataset.
XMLLoader.XMLFileInputFormat  
XMLLoader.XMLFileRecordReader  
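
The HiveColumnarLoader entry above describes scanning input paths for [partition name]=[value] patterns in subdirectories. As a rough illustration of that idea only, the following stand-alone sketch collects such pairs from a path string. It is not the loader's actual code: the class name, the method name, and the sample path are all hypothetical, and it assumes a partition directory is any path segment with a non-empty name and value around a single '='.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPathSketch {

    // Collects name=value pairs from the segments of a path, in the order
    // they appear. Hypothetical helper, not part of piggybank.
    public static Map<String, String> partitionsOf(String path) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Require a non-empty name before '=' and a non-empty value after it.
            if (eq > 0 && eq < segment.length() - 1) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Hypothetical path, chosen only for illustration.
        System.out.println(partitionsOf("/logs/year=2011/month=06/part-00000"));
    }
}
```

In a scheme like this, each detected name would become an extra column appended to the table schema, with the value taken from the directory the record was read from.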
 

Enum Summary
CSVExcelStorage.Linebreaks  
CSVExcelStorage.Multiline  
HadoopJobHistoryLoader.JobKeys Job Keys
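
The CSVExcelStorage class listed above is described as escaping delimiters and double quotes within fields following Excel conventions. The sketch below shows one common form of that convention on the writing side: a field is wrapped in double quotes when it contains the delimiter, a double quote, or a line break, and embedded double quotes are doubled. This is an assumption about the general convention, not CSVExcelStorage's implementation; the class and method names are hypothetical.

```java
public class CsvQuoteSketch {

    // Quotes a single field if needed, doubling any embedded double quotes.
    // Hypothetical helper, not part of piggybank.
    public static String quoteField(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins fields into one record line using the given delimiter.
    public static String joinRecord(String[] fields, char delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(delimiter);
            }
            sb.append(quoteField(fields[i], delimiter));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinRecord(new String[] {"plain", "a,b", "say \"hi\""}, ','));
    }
}
```

Note that CSVLoader, by contrast, is described as expecting double quotes escaped with backslashes, so the two classes should not be assumed to read each other's output unchanged.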
 



Copyright © 2007-2012 The Apache Software Foundation