org.apache.pig.data
Class TupleFactory

java.lang.Object
  extended by org.apache.pig.data.TupleFactory
Direct Known Subclasses:
BinSedesTupleFactory

@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class TupleFactory
extends Object

A factory for constructing tuples. This class is abstract so that users can supply their own factory that returns their own implementation of a tuple. If the property pig.data.tuple.factory.name is set to a class name and pig.data.tuple.factory.jar is set to a URL pointing to a jar that contains that class, then getInstance() will create an instance of the named class using the indicated jar. Otherwise, it will create an instance of DefaultTupleFactory.
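The property-driven lookup described above can be sketched with plain reflection. This is a minimal, self-contained illustration and not Pig's actual implementation: SketchFactory, DefaultSketchFactory, CustomSketchFactory, and the property name sketch.factory.name are hypothetical stand-ins, the jar-URL step is omitted, and falling back to the default when loading fails is a choice made for this sketch only.

```java
import java.lang.reflect.Constructor;

public class FactoryLookupSketch {
    /** Hypothetical stand-in for an abstract factory such as TupleFactory. */
    public abstract static class SketchFactory {
        protected SketchFactory() {}
        public abstract String name();
    }

    /** Fallback used when no class name is configured. */
    public static class DefaultSketchFactory extends SketchFactory {
        @Override public String name() { return "default"; }
    }

    /** A user-supplied override, selected by class name. */
    public static class CustomSketchFactory extends SketchFactory {
        @Override public String name() { return "custom"; }
    }

    // Hypothetical property name, used only by this sketch.
    static final String PROP = "sketch.factory.name";

    private static SketchFactory self;

    /** Returns the singleton, creating it on first use. */
    public static synchronized SketchFactory getInstance() {
        if (self == null) {
            self = load(System.getProperty(PROP));
        }
        return self;
    }

    /** Drops the cached singleton so the property is read again (test use only). */
    public static synchronized void resetSelf() {
        self = null;
    }

    private static SketchFactory load(String className) {
        if (className != null) {
            try {
                Class<?> c = Class.forName(className);
                Constructor<?> ctor = c.getDeclaredConstructor();
                return (SketchFactory) ctor.newInstance();
            } catch (ReflectiveOperationException | ClassCastException e) {
                // Sketch-only behavior: fall back to the default factory.
                // A real implementation may prefer to fail loudly here.
            }
        }
        return new DefaultSketchFactory();
    }
}
```

Because the instance is cached, changing the property after the first getInstance() call has no effect until the singleton is reset, which is why a reset hook is convenient in tests.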


Constructor Summary
protected TupleFactory()
           
 
Method Summary
static TupleFactory getInstance()
          Get a reference to the singleton factory.
abstract  Tuple newTuple()
          Create an empty tuple.
abstract  Tuple newTuple(int size)
          Create a tuple with size fields.
abstract  Tuple newTuple(List c)
          Create a tuple from the provided list of objects.
abstract  Tuple newTuple(Object datum)
          Create a tuple with a single element.
abstract  Tuple newTupleNoCopy(List list)
          Create a tuple from a provided list of objects, keeping the provided list.
static void resetSelf()
          Provided for testing purposes only.
abstract  Class<? extends Tuple> tupleClass()
          Return the actual class representing a tuple that the implementing factory will be returning.
 Class<? extends TupleRawComparator> tupleRawComparatorClass()
          Return the actual class implementing the raw comparator for tuples that the factory will be returning.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TupleFactory

protected TupleFactory()
Method Detail

getInstance

public static TupleFactory getInstance()
Get a reference to the singleton factory.

Returns:
The TupleFactory to use to construct tuples.

newTuple

public abstract Tuple newTuple()
Create an empty tuple. Use this as infrequently as possible; prefer newTuple(int) instead.

Returns:
Empty new tuple.

newTuple

public abstract Tuple newTuple(int size)
Create a tuple with size fields. Whenever possible, prefer this over the no-argument newTuple(), because the factory can preallocate the container that holds the fields. After this call, it is legal to call Tuple.set(x, object) for any x where 0 <= x < size.

Parameters:
size - Number of fields in the tuple.
Returns:
Tuple with size fields
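
Why a sized tuple accepts set(x, object) right away can be illustrated with a small stand-in. This is a sketch under stated assumptions, not Pig's Tuple: SketchTuple is a hypothetical list-backed class, and it assumes that preallocated fields start out as null.

```java
import java.util.ArrayList;
import java.util.List;

public class PreallocatedTupleSketch {
    /** Hypothetical list-backed tuple; not Pig's Tuple interface. */
    public static final class SketchTuple {
        private final List<Object> fields;

        private SketchTuple(int size) {
            // Reserve capacity once, then fill every slot with null
            // so that each index below size is immediately addressable.
            fields = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                fields.add(null);
            }
        }

        /** Throws IndexOutOfBoundsException when i is not below size(). */
        public void set(int i, Object value) { fields.set(i, value); }

        public Object get(int i) { return fields.get(i); }

        public int size() { return fields.size(); }
    }

    /** Sized variant: all fields exist up front. */
    public static SketchTuple newTuple(int size) { return new SketchTuple(size); }

    /** Empty variant: no field can be set until the tuple grows. */
    public static SketchTuple newTuple() { return new SketchTuple(0); }
}
```

In this sketch, newTuple(3).set(2, "z") succeeds, while newTuple().set(0, "x") throws because the empty tuple has no slot at index 0.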

newTuple

public abstract Tuple newTuple(List c)
Create a tuple from the provided list of objects. The underlying list will be copied.

Parameters:
c - List of objects to use as the fields of the tuple.
Returns:
A tuple with the list objects as its fields

newTupleNoCopy

public abstract Tuple newTupleNoCopy(List list)
Create a tuple from a provided list of objects, keeping the provided list. The new tuple will take over ownership of the provided list.

Parameters:
list - List of objects that will become the fields of the tuple.
Returns:
A tuple with the list objects as its fields
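
The practical difference between newTuple(List) and newTupleNoCopy(List) is whether later changes to the caller's list show up in the tuple. The following sketch shows that aliasing effect with a hypothetical stand-in, SketchTuple, rather than Pig's own classes; only the copy versus keep behavior is taken from the descriptions above.

```java
import java.util.ArrayList;
import java.util.List;

public class ListOwnershipSketch {
    /** Hypothetical list-backed tuple; not Pig's Tuple interface. */
    public static final class SketchTuple {
        private final List<Object> fields;

        private SketchTuple(List<Object> fields) { this.fields = fields; }

        public Object get(int i) { return fields.get(i); }

        public int size() { return fields.size(); }
    }

    /** Copying variant: the tuple gets its own list. */
    public static SketchTuple newTuple(List<Object> c) {
        return new SketchTuple(new ArrayList<>(c));
    }

    /** No-copy variant: the tuple keeps the caller's list as its storage. */
    public static SketchTuple newTupleNoCopy(List<Object> list) {
        return new SketchTuple(list);
    }

    public static void main(String[] args) {
        List<Object> data = new ArrayList<>();
        data.add("a");
        data.add(1);

        SketchTuple copied = newTuple(data);
        SketchTuple shared = newTupleNoCopy(data);

        // Mutate the caller's list after both tuples exist.
        data.set(0, "changed");

        System.out.println(copied.get(0)); // prints "a"
        System.out.println(shared.get(0)); // prints "changed"
    }
}
```

This is why the no-copy variant is described as taking over ownership: after the call, the caller should treat the list as belonging to the tuple and stop modifying it.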

newTuple

public abstract Tuple newTuple(Object datum)
Create a tuple with a single element. This is useful because bags (currently) hold only tuples, so a single element often has to be wrapped in a tuple before it can be put in a bag.

Parameters:
datum - Datum to put in the tuple.
Returns:
A tuple with one field

tupleClass

public abstract Class<? extends Tuple> tupleClass()
Return the actual class representing a tuple that the implementing factory will be returning. This is needed because Hadoop needs to know the exact class we will be using for input and output.

Returns:
Class that implements tuple.

resetSelf

public static void resetSelf()
Provided for testing purposes only. This function should never be called by anybody but the unit tests.


tupleRawComparatorClass

public Class<? extends TupleRawComparator> tupleRawComparatorClass()
Return the actual class implementing the raw comparator for tuples that the factory will be returning. Override this to allow Hadoop to speed up tuple sorting. The returned class should know the serialization details of the tuple. The default implementation (PigTupleDefaultRawComparator) will serialize the data before comparison.

Returns:
Class that implements tuple raw comparator.
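
What it means for a comparator to know the serialization details can be shown with a toy example. This is a sketch only: the wire format (a tuple with one int field, written as 4 big-endian bytes) and the names serialize and compareRaw are hypothetical, and nothing here reflects Pig's real serialization or the TupleRawComparator interface.

```java
import java.nio.ByteBuffer;

public class RawCompareSketch {
    /** Hypothetical wire format: one int field as 4 big-endian bytes. */
    public static byte[] serialize(int field) {
        return ByteBuffer.allocate(4).putInt(field).array();
    }

    /**
     * Compares two serialized records in place, starting at offsets s1 and s2,
     * without constructing any tuple objects.
     */
    public static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        int v1 = ByteBuffer.wrap(b1, s1, 4).getInt();
        int v2 = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(v1, v2);
    }
}
```

The comparator can skip object construction only because it knows exactly where the field sits in the byte stream, which is the kind of knowledge a factory-specific raw comparator is expected to have.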


Copyright © The Apache Software Foundation