org.apache.pig.data
Class InternalSortedBag

java.lang.Object
  extended by org.apache.pig.data.DefaultAbstractBag
      extended by org.apache.pig.data.SortedSpillBag
          extended by org.apache.pig.data.InternalSortedBag
All Implemented Interfaces:
Serializable, Comparable, Iterable<Tuple>, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable, DataBag, Spillable

public class InternalSortedBag
extends SortedSpillBag

An ordered collection of Tuples, possibly with duplicates. Data is stored unsorted as it arrives and is sorted only when it is time to dump it to a file or when the first iterator is requested. Experimentation found this to be faster than keeping the data sorted from the start. A user-defined comparator is accepted, and a default comparator is used when the user does not specify one. This bag is not registered with SpillableMemoryManager. Instead, it calculates the number of tuples to hold in memory and proactively spills to files.
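
The deferred-sort behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use Pig's classes and is not the InternalSortedBag implementation: the class name LazySortedBagSketch, its fields, and the use of natural ordering as the default comparator are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a lazily sorted bag; NOT Pig's implementation.
class LazySortedBagSketch<T extends Comparable<T>> implements Iterable<T> {
    private final List<T> contents = new ArrayList<>();
    private final Comparator<T> comparator;
    private boolean sorted = true; // an empty bag is trivially sorted

    LazySortedBagSketch() {
        this(null);
    }

    LazySortedBagSketch(Comparator<T> comp) {
        // Assumption: fall back to natural ordering when no comparator is given.
        if (comp != null) {
            this.comparator = comp;
        } else {
            this.comparator = Comparator.naturalOrder();
        }
    }

    // Appends without any ordering work; duplicates are kept.
    void add(T item) {
        contents.add(item);
        sorted = false;
    }

    // Sorts only if something was added since the last sort.
    @Override
    public Iterator<T> iterator() {
        if (!sorted) {
            contents.sort(comparator);
            sorted = true;
        }
        return contents.iterator();
    }
}
```

In this sketch, add is a plain append and the sorting cost is paid once, on the first call to iterator() after new data has arrived. Later calls reuse the sorted list until another element is added.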

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.pig.data.DefaultAbstractBag
DefaultAbstractBag.BagDelimiterTuple, DefaultAbstractBag.EndBag, DefaultAbstractBag.StartBag
 
Field Summary
 
Fields inherited from class org.apache.pig.data.DefaultAbstractBag
endBag, MAX_SPILL_FILES, mContents, mLastContentsSize, mMemSize, mSize, mSpillFiles, startBag
 
Constructor Summary
InternalSortedBag()
           
InternalSortedBag(Comparator<Tuple> comp)
           
InternalSortedBag(int bagCount, Comparator<Tuple> comp)
           
InternalSortedBag(int bagCount, double percent, Comparator<Tuple> comp)
           
 
Method Summary
 void add(Tuple t)
          Add a tuple to the bag.
 void addAll(Collection<Tuple> c)
          Add contents of a container to the bag.
 void addAll(DataBag b)
          Add contents of a bag to the bag.
 boolean isDistinct()
          Find out if the bag is distinct.
 boolean isSorted()
          Find out if the bag is sorted.
 Iterator<Tuple> iterator()
          Get an iterator to the bag.
 long spill()
          Instructs an object to spill whatever it can to disk and release references to any data structures it spills.
 
Methods inherited from class org.apache.pig.data.SortedSpillBag
proactive_spill
 
Methods inherited from class org.apache.pig.data.DefaultAbstractBag
clear, compareTo, equals, getMemorySize, getSpillFile, hashCode, incSpillCount, incSpillCount, markStale, readFields, reportProgress, size, toString, warn, write
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

InternalSortedBag

public InternalSortedBag()

InternalSortedBag

public InternalSortedBag(Comparator<Tuple> comp)

InternalSortedBag

public InternalSortedBag(int bagCount,
                         Comparator<Tuple> comp)

InternalSortedBag

public InternalSortedBag(int bagCount,
                         double percent,
                         Comparator<Tuple> comp)
Method Detail

add

public void add(Tuple t)
Description copied from class: DefaultAbstractBag
Add a tuple to the bag.

Specified by:
add in interface DataBag
Overrides:
add in class DefaultAbstractBag
Parameters:
t - tuple to add.

addAll

public void addAll(DataBag b)
Description copied from class: DefaultAbstractBag
Add contents of a bag to the bag.

Specified by:
addAll in interface DataBag
Overrides:
addAll in class DefaultAbstractBag
Parameters:
b - bag to add contents of.

addAll

public void addAll(Collection<Tuple> c)
Description copied from class: DefaultAbstractBag
Add contents of a container to the bag.

Overrides:
addAll in class DefaultAbstractBag
Parameters:
c - Collection to add contents of.

isSorted

public boolean isSorted()
Description copied from interface: DataBag
Find out if the bag is sorted.

Returns:
true if this is a sorted data bag, false otherwise.

isDistinct

public boolean isDistinct()
Description copied from interface: DataBag
Find out if the bag is distinct.

Returns:
true if the bag is a distinct bag, false otherwise.

iterator

public Iterator<Tuple> iterator()
Description copied from interface: DataBag
Get an iterator to the bag. For default and distinct bags, no particular order is guaranteed. For sorted bags the order is guaranteed to be sorted according to the provided comparator.

Returns:
tuple iterator

spill

public long spill()
Description copied from interface: Spillable
Instructs an object to spill whatever it can to disk and release references to any data structures it spills.

Returns:
number of objects spilled.
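
As a rough illustration of the spill contract (write what is in memory to disk, release the references, and report how many objects were written), here is a self-contained sketch. It is not Pig's code: the class SpillingBagSketch, the maxInMemory cap, the use of String records, and the one-record-per-line file format are all assumptions. How InternalSortedBag actually computes its in-memory limit is not described on this page.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of proactive spilling; NOT Pig's implementation.
class SpillingBagSketch {
    private final List<String> inMemory = new ArrayList<>();
    private final List<Path> spillFiles = new ArrayList<>();
    private final int maxInMemory; // assumed fixed cap on in-memory records

    SpillingBagSketch(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    // Adds a record and spills proactively once the cap is reached.
    void add(String record) {
        inMemory.add(record);
        if (inMemory.size() >= maxInMemory) {
            spill();
        }
    }

    // Sorts the in-memory records, writes them to a temp file,
    // drops the in-memory references, and returns the number written.
    long spill() {
        if (inMemory.isEmpty()) {
            return 0;
        }
        Collections.sort(inMemory);
        try {
            Path file = Files.createTempFile("bag-spill", ".txt");
            file.toFile().deleteOnExit();
            try (BufferedWriter w = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (String r : inMemory) {
                    w.write(r);
                    w.newLine();
                }
            }
            spillFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long spilled = inMemory.size();
        inMemory.clear();
        return spilled;
    }

    int spillFileCount() {
        return spillFiles.size();
    }

    int inMemoryCount() {
        return inMemory.size();
    }
}
```

Each spill in this sketch produces one sorted run on disk. A full implementation would also need to merge those runs with any remaining in-memory data when iterating, which is omitted here.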


Copyright © ${year} The Apache Software Foundation