Modifier and Type | Class and Description |
---|---|
class | nl.basjes.pig.input.apachehttpdlog.Loader |
Modifier and Type | Class and Description |
---|---|
class | FileInputLoadFunc: Provides an implementation of the OrderedLoadFunc interface that can optionally be reused, as a superclass, by LoadFuncs that use FileInputFormat. |
class | LoadFuncMetadataWrapper: Convenience class to extend when decorating a class that extends LoadFunc and implements LoadMetadata. |
class | LoadFuncWrapper: Convenience class to extend when decorating a LoadFunc. |
Modifier and Type | Method and Description |
---|---|
protected LoadFunc | LoadFuncWrapper.loadFunc() |
Modifier and Type | Method and Description |
---|---|
protected void | LoadFuncWrapper.setLoadFunc(LoadFunc loadFunc): The wrapped LoadFunc object must be set before method calls are made on this object. |
Modifier and Type | Class and Description |
---|---|
class | AbstractAccumuloStorage: A LoadStoreFunc for retrieving data from and storing data to Accumulo. A Key/Value pair is returned as the tuple (key, colfam, colqual, colvis, timestamp, value). |
class | AccumuloStorage: Basic PigStorage implementation that uses Accumulo as the backing store. |
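A sketch of how AccumuloStorage is typically invoked from Pig Latin. The `accumulo://` URI layout and its query parameters (`instance`, `user`, `password`, `zookeepers`) as well as the column-spec argument are assumptions here, not verified against this Pig version; check the class Javadoc for the exact option names.

```pig
-- Hypothetical table, credentials, and column specification.
rows = LOAD 'accumulo://mytable?instance=inst&user=root&password=secret&zookeepers=zk1:2181'
       USING org.apache.pig.backend.hadoop.accumulo.AccumuloStorage('cf:col1,cf:col2');
```

Each Key/Value pair read from the table arrives as a tuple in the (key, colfam, colqual, colvis, timestamp, value) shape described above.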
Modifier and Type | Class and Description |
---|---|
class | MergeJoinIndexer: Generates an on-the-fly index for performing a Merge Join efficiently. |
Constructor and Description |
---|
PigRecordReader(org.apache.hadoop.mapreduce.InputFormat<?,?> inputformat, PigSplit pigSplit, LoadFunc loadFunc, org.apache.hadoop.mapreduce.TaskAttemptContext context, long limit) |
Modifier and Type | Method and Description |
---|---|
LoadFunc | POLoad.getLoadFunc() |
Constructor and Description |
---|
POLoad(OperatorKey k, LoadFunc lf) |
Constructor and Description |
---|
POSimpleTezLoad(OperatorKey k, LoadFunc loader) |
Modifier and Type | Class and Description |
---|---|
class | HBaseStorage: An HBase implementation of LoadFunc and StoreFunc. |
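A minimal Pig Latin sketch of loading with HBaseStorage. The table and column names are hypothetical; the column list and the `-loadKey true` option (which asks the loader to emit the row key as the first field) follow the pattern documented for this class, but verify them against the HBaseStorage Javadoc for your Pig version.

```pig
-- Hypothetical 'users' table with an 'info' column family.
-- With -loadKey true, the row key becomes the first field of each tuple.
users = LOAD 'hbase://users'
        USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
            'info:name info:age', '-loadKey true')
        AS (id:bytearray, name:chararray, age:chararray);
```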
Modifier and Type | Class and Description |
---|---|
class | AvroStorage: Pig UDF for reading and writing Avro data. |
class | BinStorage: Load and store data in a binary format. |
class | JsonLoader: A loader for data stored using JsonStorage. |
class | OrcStorage: A load function and store function for ORC files. |
class | ParquetLoader: Wrapper class that delegates calls to parquet.pig.ParquetLoader. |
class | PigStorage: A load function that parses a line of input into fields using a character delimiter. |
class | TextLoader: Creates a tuple for each line of text, with a single chararray field that contains the line. |
class | TrevniStorage: Pig Store/Load Function for Trevni. |
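As a usage sketch for PigStorage, the most common of these loaders (the file names and schema below are hypothetical):

```pig
-- Load tab-delimited text with an explicit schema; tab is also
-- PigStorage's default delimiter when no argument is given.
users = LOAD 'users.tsv' USING PigStorage('\t') AS (name:chararray, age:int);

-- Store the same relation back out comma-delimited.
STORE users INTO 'users_out' USING PigStorage(',');
```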
Modifier and Type | Class and Description |
---|---|
class | Storage: A convenient mock Storage for unit tests. |
Modifier and Type | Class and Description |
---|---|
class | DefaultIndexableLoader: Used by MergeJoin. |
class | PoissonSampleLoader: See "Skewed Join sampler" at http://wiki.apache.org/pig/PigSampler. |
class | RandomSampleLoader: A loader that samples the data. |
class | SampleLoader: Abstract class that specifies the interface for sample loaders. |
Modifier and Type | Field and Description |
---|---|
protected LoadFunc | SampleLoader.loader |
Modifier and Type | Class and Description |
---|---|
class | InterStorage: LOAD FUNCTION FOR PIG INTERNAL USE ONLY! Used for storing intermediate data between the MR jobs of a Pig query. |
class | ReadToEndLoader: A wrapper Loader that wraps a real LoadFunc underneath and allows a file to be read completely, starting at a given split (indicated by a split index, which is used to look in the List). |
class | SequenceFileInterStorage: Stores tuples (BinSedesTuples, specifically) using sequence files, to leverage the sequence file's compression features. |
class | TFileStorage: LOAD FUNCTION FOR PIG INTERNAL USE ONLY! Used for storing intermediate data between the MR jobs of a Pig query. |
Modifier and Type | Method and Description |
---|---|
DataBag | PigFile.load(LoadFunc lfunc, PigContext pigContext) |
Constructor and Description |
---|
ReadToEndLoader(LoadFunc wrappedLoadFunc, org.apache.hadoop.conf.Configuration conf, String inputLocation, int splitIndex) |
ReadToEndLoader(LoadFunc wrappedLoadFunc, org.apache.hadoop.conf.Configuration conf, String inputLocation, int[] toReadSplitIdxs): Takes an array of indexes (toReadSplitIdxs) of the splits to be read. |
ReadToEndLoader(LoadFunc wrappedLoadFunc, org.apache.hadoop.conf.Configuration conf, String inputLocation, int splitIndex, String signature) |
Modifier and Type | Method and Description |
---|---|
static ResourceSchema | Utils.getSchema(LoadFunc wrappedLoadFunc, String location, boolean checkExistence, org.apache.hadoop.mapreduce.Job job) |
Modifier and Type | Method and Description |
---|---|
LoadFunc | LOLoad.getLoadFunc() |
Constructor and Description |
---|
LOLoad(FileSpec loader, LogicalSchema schema, LogicalPlan plan, org.apache.hadoop.conf.Configuration conf, LoadFunc loadFunc, String signature): Used from the LogicalPlanBuilder. |
Modifier and Type | Class and Description |
---|---|
class | AllLoader: Provides the ability to point Pig at a folder that contains files in multiple formats, e.g. |
class | CSVExcelStorage: CSV loading and storing with support for multi-line fields and escaping of delimiters and double quotes within fields; uses the CSV conventions of Excel 2007. |
class | CSVLoader: A load function based on PigStorage that implements part of the CSV "standard". Properly supports double-quoted fields that contain commas, and other double quotes escaped with backslashes. |
class | FixedWidthLoader: A fixed-width file loader. |
class | HadoopJobHistoryLoader |
class | HiveColumnarLoader: Loader for Hive RC Columnar files. Supports the following type mapping (Hive type to Pig type from DataType): string to CHARARRAY, int to INTEGER, bigint or long to LONG, float to FLOAT, double to DOUBLE, boolean to BOOLEAN, byte to BYTE, array to TUPLE, map to MAP. Partitions: the input paths are scanned by the loader for [partition name]=[value] patterns in the subdirectories; if detected, these partitions are appended to the table schema. |
class | HiveColumnarStorage |
class | IndexedStorage: A form of PigStorage that supports a per-record seek. |
class | MyRegExLoader |
class | PigStorageSchema: Deprecated. Use PigStorage with a -schema option instead. |
class | RegExLoader: An abstract class used to parse logs based on a regular expression. |
class | SequenceFileLoader: A loader for Hadoop-standard SequenceFiles. |
class | XMLLoader: Parses an XML input file given a specified identifier of tags to be loaded. |
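A sketch of loading a CSV file with CSVExcelStorage. The file name and schema are hypothetical, and the option strings ('YES_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER') follow the pattern this class documents but should be confirmed against its Javadoc for your piggybank version.

```pig
-- Comma-delimited input with quoted multi-line fields; skip the header row.
data = LOAD 'input.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(
           ',', 'YES_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
       AS (id:int, comment:chararray);
```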
Modifier and Type | Method and Description |
---|---|
LoadFunc | AllLoader.AllReader.prepareLoadFuncForReading(PigSplit split) |
Modifier and Type | Class and Description |
---|---|
class | CombinedLogLoader: Loads logs based on Apache's combined log format, i.e. a format like LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined. The log filename ends up being access_log from a line like CustomLog logs/combined_log combined. Example: raw = LOAD 'combined_log' USING org.apache.pig.piggybank.storage.apachelog.CombinedLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, status, bytes, referer, userAgent); |
class | CommonLogLoader: Loads logs based on Apache's common log format, i.e. a format like LogFormat "%h %l %u %t \"%r\" %>s %b" common. The log filename ends up being access_log from a line like CustomLog logs/access_log common. Example: raw = LOAD 'access_log' USING org.apache.pig.piggybank.storage.apachelog.CommonLogLoader AS (remoteAddr, remoteLogname, user, time, method, uri, proto, bytes); |
class | LogFormatLoader: A Pig loader that can load Apache HTTPD access logs written in (almost) any Apache HTTPD LogFormat. Basic usage: simply feed the loader your (custom) LogFormat specification and it will tell you which fields can be extracted from it. |
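A sketch of the LogFormatLoader usage pattern described above. The field-specification strings ('IP:connection.client.host', 'HTTP.URI:request.firstline.uri') are assumptions based on the underlying httpdlog parser's naming scheme; running the loader with only the LogFormat argument should report the actual extractable field names.

```pig
-- Hypothetical log file and field specifications.
clicks = LOAD 'access_log'
         USING org.apache.pig.piggybank.storage.apachelog.LogFormatLoader(
             '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"',
             'IP:connection.client.host',
             'HTTP.URI:request.firstline.uri');
```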
Modifier and Type | Method and Description |
---|---|
List<org.apache.hadoop.fs.FileStatus> | PathPartitionHelper.listStatus(org.apache.hadoop.mapreduce.JobContext ctx, Class<? extends LoadFunc> loaderClass, String signature): Called by the FileInputFormat to find the input paths for which splits should be calculated. If applyDateRanges == true, the HiveRCDateSplitter is used to filter the input files; otherwise the default FileInputFormat listStatus method is used. |
void | PathPartitionHelper.setPartitionFilterExpression(String partitionFilterExpression, Class<? extends LoadFunc> loaderClass, String signature): Sets the PARITITION_FILTER_EXPRESSION property in the UDFContext identified by the loaderClass. |
void | PathPartitionHelper.setPartitionKeys(String location, org.apache.hadoop.conf.Configuration conf, Class<? extends LoadFunc> loaderClass, String signature): Reads the partition keys from the location, i.e. the base directory. |
Copyright © 2007-2012 The Apache Software Foundation