org.apache.pig.data
Class SortedDataBag

java.lang.Object
  extended by org.apache.pig.data.DefaultAbstractBag
      extended by org.apache.pig.data.SortedDataBag
All Implemented Interfaces:
Serializable, Comparable, Iterable<Tuple>, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable, DataBag, Spillable

public class SortedDataBag
extends DefaultAbstractBag

An ordered collection of Tuples, possibly containing duplicates. Data is stored unsorted as it arrives and is sorted only when it is time to dump it to a file or when the first iterator is requested. Experimentation found this to be faster than keeping the data sorted from the start. A user-defined comparator is allowed; a default comparator is used when the user doesn't specify one.
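
The deferred-sort strategy described above can be illustrated with a minimal plain-Java sketch. It is not the Pig implementation: LazySortedBag and its fields are hypothetical names, elements are generic rather than Tuples, and spilling to disk is left out. It only shows the idea of appending unsorted and sorting once, when the first iterator is requested.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for illustration only; not org.apache.pig.data.SortedDataBag.
public class LazySortedBag<T> implements Iterable<T> {
    private final List<T> contents = new ArrayList<>();
    private final Comparator<? super T> comparator;
    private boolean sorted = true; // an empty list is trivially sorted

    public LazySortedBag(Comparator<? super T> comparator) {
        this.comparator = comparator;
    }

    // Appending is cheap: no ordering work is done here.
    public void add(T item) {
        contents.add(item);
        sorted = false;
    }

    // The sort is paid for only when an iterator is requested after new adds.
    @Override
    public Iterator<T> iterator() {
        if (!sorted) {
            contents.sort(comparator);
            sorted = true;
        }
        // Iterate over a snapshot so later adds do not disturb this iterator.
        return new ArrayList<>(contents).iterator();
    }

    public static void main(String[] args) {
        LazySortedBag<Integer> bag =
                new LazySortedBag<>(Comparator.<Integer>naturalOrder());
        bag.add(3);
        bag.add(1);
        bag.add(3); // duplicates are kept
        bag.add(2);
        StringBuilder sb = new StringBuilder();
        for (int v : bag) {
            sb.append(v).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "1 2 3 3"
    }
}
```

Sorting once over the whole list, instead of keeping it ordered on every add, is the trade-off the class description reports as faster in practice.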

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.pig.data.DefaultAbstractBag
DefaultAbstractBag.BagDelimiterTuple, DefaultAbstractBag.EndBag, DefaultAbstractBag.StartBag
 
Field Summary
 
Fields inherited from class org.apache.pig.data.DefaultAbstractBag
endBag, MAX_SPILL_FILES, mContents, mSize, mSpillFiles, startBag
 
Constructor Summary
SortedDataBag(Comparator<Tuple> comp)
           
 
Method Summary
 boolean isDistinct()
          Find out if the bag is distinct.
 boolean isSorted()
          Find out if the bag is sorted.
 Iterator<Tuple> iterator()
          Get an iterator to the bag.
 long spill()
          Instructs an object to spill whatever it can to disk and release references to any data structures it spills.
 
Methods inherited from class org.apache.pig.data.DefaultAbstractBag
add, addAll, addAll, addAll, clear, compareTo, equals, getMemorySize, getSpillFile, hashCode, incSpillCount, incSpillCount, markSpillableIfNecessary, markStale, readFields, reportProgress, sampleContents, size, toString, warn, write
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SortedDataBag

public SortedDataBag(Comparator<Tuple> comp)
Parameters:
comp - Comparator to use to do the sorting. If null, DefaultComparator will be used.
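
The null-comparator fallback can be sketched in plain Java as follows. This is an illustration under assumptions: ComparatorFallback and orDefault are hypothetical names, and natural ordering stands in for Pig's DefaultComparator, whose exact behavior this page does not spell out.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of the Pig API.
public class ComparatorFallback {
    // Returns the caller's comparator, or natural ordering when none is given.
    // Natural ordering is an assumed stand-in for Pig's DefaultComparator.
    public static <T extends Comparable<? super T>> Comparator<T> orDefault(Comparator<T> comp) {
        return comp != null ? comp : Comparator.<T>naturalOrder();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "apple");

        // No comparator supplied: fall back to the default ordering.
        words.sort(ComparatorFallback.<String>orDefault(null));
        System.out.println(words); // prints [apple, fig, pear]

        // Caller-supplied comparator: used as given.
        Comparator<String> byLength = Comparator.comparing(String::length);
        words.sort(orDefault(byLength));
        System.out.println(words); // prints [fig, pear, apple]
    }
}
```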

Method Detail

isSorted

public boolean isSorted()
Description copied from interface: DataBag
Find out if the bag is sorted.

Returns:
true if this is a sorted data bag, false otherwise.

isDistinct

public boolean isDistinct()
Description copied from interface: DataBag
Find out if the bag is distinct.

Returns:
true if the bag is a distinct bag, false otherwise.

iterator

public Iterator<Tuple> iterator()
Description copied from interface: DataBag
Get an iterator to the bag. For default and distinct bags, no particular order is guaranteed. For sorted bags the order is guaranteed to be sorted according to the provided comparator.

Returns:
tuple iterator

spill

public long spill()
Description copied from interface: Spillable
Instructs an object to spill whatever it can to disk and release references to any data structures it spills.

Returns:
number of objects spilled.
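
A minimal sketch of what a sort-then-spill step could look like, assuming string elements written one per line to a temporary file. SpillSketch, its fields, and the file format are hypothetical and chosen for illustration; Pig's actual on-disk format and bookkeeping are not described on this page.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of sorting in-memory contents and spilling them to disk.
public class SpillSketch {
    private final List<String> contents = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();

    public void add(String s) {
        contents.add(s);
    }

    // Sorts what is in memory, writes it to a temp file, releases the
    // in-memory references, and returns the number of objects spilled.
    public long spill() {
        if (contents.isEmpty()) {
            return 0;
        }
        Collections.sort(contents);
        try {
            Path file = Files.createTempFile("bag-spill", ".txt");
            file.toFile().deleteOnExit();
            try (BufferedWriter out = Files.newBufferedWriter(file)) {
                for (String s : contents) {
                    out.write(s);
                    out.newLine();
                }
            }
            spillFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long spilled = contents.size();
        contents.clear(); // drop references so the memory can be reclaimed
        return spilled;
    }

    public int inMemoryCount() {
        return contents.size();
    }

    public List<Path> spillFiles() {
        return spillFiles;
    }

    public static void main(String[] args) throws IOException {
        SpillSketch bag = new SpillSketch();
        bag.add("pear");
        bag.add("fig");
        bag.add("apple");
        System.out.println(bag.spill());         // prints 3
        System.out.println(bag.inMemoryCount()); // prints 0
        System.out.println(Files.readAllLines(bag.spillFiles().get(0))); // prints [apple, fig, pear]
    }
}
```

Because each spill file is written already sorted, a reader could later merge several such files with whatever remains in memory to produce one ordered stream; that merge step is not shown here.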


Copyright © 2007-2012 The Apache Software Foundation