Interface | Description |
---|---|
DataBag | A collection of Tuples. |
InterSedes | A class to handle reading and writing of intermediate results of data types. |
Tuple | An ordered list of Data. |
TupleMaker<A extends Tuple> | |
TupleRawComparator | This interface is intended to compare Tuples. |
TypeAwareTuple | |

Class | Description |
---|---|
AbstractTuple | This class provides a convenient base for Tuple implementations. |
AccumulativeBag | |
AmendableTuple | |
AppendableSchemaTuple<T extends AppendableSchemaTuple<T>> | |
BagFactory | Factory for constructing different types of bags. |
BinInterSedes | A class to handle reading and writing of intermediate results of data types. |
BinInterSedes.BinInterSedesTupleRawComparator | |
BinSedesTuple | This tuple has a faster (de)serialization mechanism. |
BinSedesTupleFactory | Default implementation of TupleFactory. |
DataByteArray | An implementation of byte array. |
DataReaderWriter | This class was used to handle reading and writing of intermediate results of data types. |
DataType | A class of static final values used to encode data type and a number of static helper functions for manipulating data objects. |
DefaultAbstractBag | Default implementation of DataBag. |
DefaultAbstractBag.BagDelimiterTuple | |
DefaultAbstractBag.EndBag | |
DefaultAbstractBag.StartBag | |
DefaultBagFactory | Default implementation of BagFactory. |
DefaultDataBag | An unordered collection of Tuples (possibly) with multiples. |
DefaultTuple | A default implementation of Tuple. |
DefaultTuple.DefaultTupleRawComparator | |
DefaultTupleFactory | Deprecated. Use TupleFactory. |
DistinctDataBag | An unordered collection of Tuples with no multiples. |
FileList | This class extends ArrayList |
InternalCachedBag | |
InternalDistinctBag | An unordered collection of Tuples with no multiples. |
InternalMap | This class is an empty extension of Map |
InternalSortedBag | An ordered collection of Tuples (possibly) with multiples. |
InterSedesFactory | Used to get hold of the single instance of InterSedes. |
LimitedSortedDataBag | An ordered collection of Tuples (possibly) with multiples. |
NonSpillableDataBag | An unordered collection of Tuples (possibly) with multiples. |
ReadOnceBag | This bag does not store the tuples in memory, but has access to an iterator typically provided by Hadoop. |
SchemaTuple<T extends SchemaTuple<T>> | A SchemaTuple is a type aware tuple that is much faster and more memory efficient. |
SchemaTuple.SchemaTupleQuickGenerator<A> | |
SchemaTupleBackend | |
SchemaTupleClassGenerator | This class encapsulates the generation of SchemaTuples, as well as some logic around shipping code to the distributed cache. |
SchemaTupleFactory | This is an implementation of TupleFactory that will instantiate SchemaTuples. |
SchemaTupleFrontend | This class is to be used at job creation time. |
SelfSpillBag | Class to hold code common to self spilling bags such as InternalCachedBag |
SelfSpillBag.MemoryLimits | This class helps to compute the number of entries that should be held in memory so that memory consumption is limited. |
SingleTupleBag | A simple performant implementation of the DataBag interface which only holds a single tuple. |
SizeUtil | Utility functions for estimating size of objects of pig types |
SortedDataBag | An ordered collection of Tuples (possibly) with multiples. |
SortedSpillBag | Common functionality for proactively spilling bags that need to keep the data sorted. |
TargetedTuple | A tuple composed with the operators to which it needs to be attached |
TimestampedTuple | |
TupleFactory | A factory to construct tuples. |
UnlimitedNullTuple | |
WritableByteArray | A reusable byte buffer implementation |

Enum | Description |
---|---|
SchemaTupleClassGenerator.GenContext | The GenContext mechanism provides a level of control in where SchemaTupleFactories are used. |

Exception | Description |
---|---|
FieldIsNullException | |

Annotation Type | Description |
---|---|
SchemaTupleClassGenerator.GenContext.GenerateForceLoad | |
SchemaTupleClassGenerator.GenContext.GenerateForeach | |
SchemaTupleClassGenerator.GenContext.GenerateFrJoin | |
SchemaTupleClassGenerator.GenContext.GenerateMergeJoin | |
SchemaTupleClassGenerator.GenContext.GenerateUdf | These annotations are used to mark a given SchemaTuple with the context in which it was intended to be generated. |

This package contains implementations of Pig-specific data types as well as support functions for reading, writing, and using all Pig data types.
Whenever possible, Pig uses Java-provided data types. These include Integer, Long, Float, Double, Boolean, String, and Map. Tuple, Bag, and DataByteArray are implemented in this package.
Java-provided types were chosen for two main reasons. First, this minimizes the burden on UDF developers, who have full access to these types with no need to convert to and from Pig-specific types. Second, maintenance costs are lower because there is no need to implement and maintain Pig-specific data classes. The drawback is that the only common parent of all these types is Object. Pig is therefore often required to treat its data objects as Objects and implement static methods to manipulate them, rather than being able to define a PigDatum class with common functions.
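
The trade-off described above can be illustrated with a small, self-contained sketch. It is not Pig's DataType class: the class name `ObjectTypeSketch`, the method names, and the type labels are all hypothetical, chosen only to show what static helpers over plain Objects might look like when there is no shared datum superclass.

```java
import java.util.Map;

// Hypothetical sketch, NOT Pig's DataType class: static helpers that classify
// and compare values whose only shared supertype is Object.
final class ObjectTypeSketch {

    // Returns an illustrative type label, or "unknown" for unsupported classes.
    static String typeName(Object o) {
        if (o == null) return "null";
        if (o instanceof Integer) return "int";
        if (o instanceof Long) return "long";
        if (o instanceof Float) return "float";
        if (o instanceof Double) return "double";
        if (o instanceof Boolean) return "boolean";
        if (o instanceof String) return "string";
        if (o instanceof Map) return "map";
        return "unknown";
    }

    // Values with different labels are ordered by label. Values with the same
    // label use their natural ordering; non-Comparable values (null, maps)
    // are treated as equal in this sketch.
    @SuppressWarnings("unchecked")
    static int compare(Object a, Object b) {
        String ta = typeName(a);
        String tb = typeName(b);
        if (!ta.equals(tb)) return ta.compareTo(tb);
        if (a instanceof Comparable) return ((Comparable<Object>) a).compareTo(b);
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(typeName(42));           // prints "int"
        System.out.println(compare("a", "b") < 0);  // prints "true"
    }
}
```

Every caller has to go through helpers like these because the values themselves carry no common methods, which is the cost the paragraph above points out.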
Three data types were implemented as Pig-specific classes: DataByteArray, Tuple, and DataBag.
DataByteArray represents an array of bytes, with no interpretation of those bytes provided or assumed. This could have been represented as byte[], but a separate class was constructed to provide common functions needed to manipulate these objects.
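
As a rough illustration of why a wrapper class can be more convenient than a bare byte[], the sketch below adds value-based equality, hashing, and ordering. It is not Pig's DataByteArray: the class `ByteArraySketch` and its methods are hypothetical, and the unsigned lexicographic ordering is an assumption made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch, NOT Pig's DataByteArray: a thin wrapper that gives an
// uninterpreted byte[] value-based equality, hashing, and ordering.
final class ByteArraySketch implements Comparable<ByteArraySketch> {
    private final byte[] data;

    ByteArraySketch(byte[] data) {
        this.data = data.clone(); // defensive copy so callers cannot mutate us
    }

    byte[] get() { return data.clone(); }

    int size() { return data.length; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ByteArraySketch
                && Arrays.equals(data, ((ByteArraySketch) o).data);
    }

    @Override
    public int hashCode() { return Arrays.hashCode(data); }

    // Lexicographic comparison treating each byte as unsigned (an assumption
    // of this sketch); a shorter array that is a prefix sorts first.
    @Override
    public int compareTo(ByteArraySketch other) {
        int n = Math.min(data.length, other.data.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(data[i] & 0xff, other.data[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(data.length, other.data.length);
    }

    public static void main(String[] args) {
        byte[] raw1 = "pig".getBytes(StandardCharsets.UTF_8);
        byte[] raw2 = "pig".getBytes(StandardCharsets.UTF_8);
        System.out.println(raw1.equals(raw2)); // prints "false": arrays use identity
        System.out.println(new ByteArraySketch(raw1).equals(new ByteArraySketch(raw2))); // prints "true"
    }
}
```

With equals, hashCode, and compareTo defined, such a wrapper can be used as a map key or sorted, which a raw byte[] does not support directly.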
Tuple represents an ordered collection of data elements. Every field in a
tuple can contain any Pig data type. Tuple is presented as an interface to
allow differing implementations in cases where users have unique
representations of their data that they wish to preserve in their in-memory
representations. TupleFactory is an abstract class, so that users who have
defined their own tuples can provide a factory that creates those tuples.
Default implementations of Tuple and TupleFactory are provided and used by
default.
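
The interface-plus-abstract-factory arrangement described above can be sketched in plain Java as follows. None of these types are Pig's: `TupleSketch`, `TupleFactorySketch`, `ListTupleSketch`, and `getDefault` are hypothetical names used only to show the shape of the pattern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the pattern only. These are NOT Pig's Tuple or
// TupleFactory; all names here are illustrative.
final class TupleSketchDemo {

    // An ordered collection of fields; each field may hold any Object.
    interface TupleSketch {
        int size();
        Object get(int index);
        void set(int index, Object value);
    }

    // Callers ask the factory for tuples and never name a concrete class.
    static abstract class TupleFactorySketch {
        abstract TupleSketch newTuple(int size);

        // A user with a custom representation would supply their own subclass.
        static TupleFactorySketch getDefault() {
            return new TupleFactorySketch() {
                @Override
                TupleSketch newTuple(int size) { return new ListTupleSketch(size); }
            };
        }
    }

    // Default implementation backed by an ArrayList, pre-filled with nulls.
    static final class ListTupleSketch implements TupleSketch {
        private final List<Object> fields;

        ListTupleSketch(int size) {
            fields = new ArrayList<>(size);
            for (int i = 0; i < size; i++) fields.add(null);
        }

        @Override public int size() { return fields.size(); }
        @Override public Object get(int index) { return fields.get(index); }
        @Override public void set(int index, Object value) { fields.set(index, value); }
    }

    public static void main(String[] args) {
        TupleSketch t = TupleFactorySketch.getDefault().newTuple(2);
        t.set(0, "alice");
        t.set(1, 42L);
        System.out.println(t.get(0) + " " + t.get(1)); // prints "alice 42"
    }
}
```

Because client code depends only on the interface and the abstract factory, swapping in a different tuple representation means providing a different factory subclass rather than changing call sites.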
DataBag represents a collection of Tuples. DataBags can be of default type
(no extra features), sorted (tuples are sorted according to a provided
comparator function), or distinct (no duplicate tuples). As with Tuple,
DataBag is presented as an interface, and BagFactory is an abstract class.
Default implementations of DataBag, BagFactory, and all three types of bags
are provided.
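
The three bag flavors can be approximated with standard Java collections, as in the sketch below. This is not Pig's BagFactory or DataBag: `BagSketch` and its factory methods are hypothetical, tuples are stood in for by plain Lists, and concerns that real bags may handle (such as spilling to disk) are ignored.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical sketch, NOT Pig's BagFactory or DataBag: the three bag
// flavors approximated with standard collections.
final class BagSketch {

    // Default bag: keeps duplicates, no ordering guarantee is promised.
    static <T> Collection<T> newDefaultBag() { return new ArrayList<>(); }

    // Distinct bag: silently drops duplicate elements.
    static <T> Collection<T> newDistinctBag() { return new LinkedHashSet<>(); }

    // Sorted bag: keeps duplicates and iterates in comparator order.
    static <T> Collection<T> newSortedBag(Comparator<? super T> order) {
        return new SortedBag<>(order);
    }

    // Sorts lazily on iteration so that repeated adds stay cheap.
    private static final class SortedBag<T> extends AbstractCollection<T> {
        private final List<T> items = new ArrayList<>();
        private final Comparator<? super T> order;
        private boolean sorted = true;

        SortedBag(Comparator<? super T> order) { this.order = order; }

        @Override public boolean add(T item) {
            sorted = false;
            return items.add(item);
        }

        @Override public Iterator<T> iterator() {
            if (!sorted) {
                items.sort(order);
                sorted = true;
            }
            return items.iterator();
        }

        @Override public int size() { return items.size(); }
    }

    public static void main(String[] args) {
        // Two-field "tuples" represented as immutable lists for this sketch.
        List<Object> t1 = List.of("b", 2);
        List<Object> t2 = List.of("a", 1);

        Collection<List<Object>> distinct = newDistinctBag();
        distinct.add(t1);
        distinct.add(t1);
        System.out.println(distinct.size()); // prints "1"

        Comparator<List<Object>> byFirstField = Comparator.comparing(t -> (String) t.get(0));
        Collection<List<Object>> sorted = newSortedBag(byFirstField);
        sorted.add(t1);
        sorted.add(t2);
        sorted.add(t1);
        System.out.println(sorted); // prints "[[a, 1], [b, 2], [b, 2]]"
    }
}
```

A TreeSet is deliberately avoided for the sorted case because it would discard elements that compare as equal, whereas a sorted bag is described as allowing multiples.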
Copyright © 2007-2017 The Apache Software Foundation