org.apache.pig.data
Class InternalSortedBag

java.lang.Object
  extended by org.apache.pig.data.DefaultAbstractBag
      extended by org.apache.pig.data.SelfSpillBag
          extended by org.apache.pig.data.SortedSpillBag
              extended by org.apache.pig.data.InternalSortedBag
All Implemented Interfaces:
Serializable, Comparable, Iterable<Tuple>, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable, DataBag, Spillable

public class InternalSortedBag
extends SortedSpillBag

An ordered collection of Tuples, possibly containing duplicates. Data is stored unsorted as it arrives and is sorted only when it is time to dump it to a file or when the first iterator is requested. Experimentation found this to be faster than keeping the data sorted from the start. A user-defined comparator may be supplied; if the user does not specify one, a default comparator is used. This bag is not registered with SpillableMemoryManager. Instead, it calculates the number of tuples to hold in memory and proactively spills into files.
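The "store unsorted, sort lazily" strategy described above can be illustrated with a minimal, hypothetical model. It is not Pig's InternalSortedBag: it does no spilling, the generic T stands in for Tuple, and the names LazySortedBag and withDefaultOrder are invented for this sketch. Natural ordering plays the role of the default comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of "store unsorted, sort on first iterator request".
// Not Pig's implementation: no spilling, and T stands in for Tuple.
final class LazySortedBag<T> implements Iterable<T> {
    private final List<T> contents = new ArrayList<>();
    private final Comparator<? super T> comp;
    private boolean sorted = true; // an empty bag is trivially sorted

    LazySortedBag(Comparator<? super T> comp) {
        if (comp == null) throw new IllegalArgumentException("comp must not be null");
        this.comp = comp;
    }

    // Stand-in for the default comparator: natural ordering of the elements.
    static <E extends Comparable<? super E>> LazySortedBag<E> withDefaultOrder() {
        return new LazySortedBag<>(Comparator.<E>naturalOrder());
    }

    void add(T t) {
        contents.add(t); // cheap append; no ordering work on insert
        sorted = false;  // the new element may break the order
    }

    @Override
    public Iterator<T> iterator() {
        if (!sorted) {
            contents.sort(comp); // sort only when an iterator is requested
            sorted = true;
        }
        return contents.iterator();
    }
}

public class LazySortedBagDemo {
    public static void main(String[] args) {
        LazySortedBag<Integer> bag = LazySortedBag.withDefaultOrder();
        bag.add(3);
        bag.add(1);
        bag.add(2);
        bag.add(1); // duplicates are kept
        for (int x : bag) System.out.print(x + " ");
        System.out.println();
    }
}
```

In this sketch each add is an O(1) append, and the O(n log n) sort is paid at most once per batch of adds, which is the trade-off the class description attributes to its experiments.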

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.pig.data.SelfSpillBag
SelfSpillBag.MemoryLimits
 
Nested classes/interfaces inherited from class org.apache.pig.data.DefaultAbstractBag
DefaultAbstractBag.BagDelimiterTuple, DefaultAbstractBag.EndBag, DefaultAbstractBag.StartBag
 
Field Summary
 
Fields inherited from class org.apache.pig.data.SelfSpillBag
memLimit
 
Fields inherited from class org.apache.pig.data.DefaultAbstractBag
endBag, MAX_SPILL_FILES, mContents, mSize, mSpillFiles, startBag
 
Constructor Summary
InternalSortedBag()
           
InternalSortedBag(Comparator<Tuple> comp)
           
InternalSortedBag(int bagCount, Comparator<Tuple> comp)
           
InternalSortedBag(int bagCount, float percent, Comparator<Tuple> comp)
           
 
Method Summary
 void add(Tuple t)
          Add a tuple to the bag.
 boolean isDistinct()
          Find out if the bag is distinct.
 boolean isSorted()
          Find out if the bag is sorted.
 Iterator<Tuple> iterator()
          Get an iterator to the bag.
 long proactive_spill(Comparator<Tuple> comp)
          Sort the contents of mContents and write them to disk.
 long spill()
          Instructs an object to spill whatever it can to disk and release references to any data structures it spills.
 
Methods inherited from class org.apache.pig.data.DefaultAbstractBag
addAll, addAll, addAll, clear, compareTo, equals, getMemorySize, getSpillFile, hashCode, incSpillCount, incSpillCount, markSpillableIfNecessary, markStale, readFields, reportProgress, sampleContents, size, toString, warn, write
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

InternalSortedBag

public InternalSortedBag()

InternalSortedBag

public InternalSortedBag(Comparator<Tuple> comp)

InternalSortedBag

public InternalSortedBag(int bagCount,
                         Comparator<Tuple> comp)

InternalSortedBag

public InternalSortedBag(int bagCount,
                         float percent,
                         Comparator<Tuple> comp)
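The bagCount and percent parameters are not documented on this page. Going by the class description, they presumably feed the calculation of how many tuples to hold in memory. The sketch below assumes one plausible scheme: a fraction percent of the maximum heap is shared evenly among bagCount bags, and the per-bag byte budget is then divided by an estimated average tuple size. The formula and the helper names perBagBudget and tupleLimit are assumptions for illustration, not Pig's code.

```java
// Hypothetical derivation of an in-memory tuple limit from bagCount and percent.
// The formula is an assumption; this page does not document Pig's calculation.
public final class MemoryBudgetSketch {

    /** Bytes available to one bag: percent of the heap, shared by bagCount bags. */
    static long perBagBudget(long maxHeapBytes, int bagCount, float percent) {
        if (bagCount <= 0) throw new IllegalArgumentException("bagCount must be positive");
        if (percent <= 0f) return 0L; // no memory budget at all
        return (long) ((double) maxHeapBytes * percent / bagCount);
    }

    /** Number of tuples to keep in memory before spilling, given an estimated tuple size. */
    static long tupleLimit(long budgetBytes, long avgTupleBytes) {
        if (avgTupleBytes <= 0) throw new IllegalArgumentException("avgTupleBytes must be positive");
        return budgetBytes / avgTupleBytes;
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        long budget = perBagBudget(heap, 1, 0.2f); // 0.2 is an arbitrary example value
        System.out.println("per-bag budget: " + budget + " bytes");
        System.out.println("tuple limit at 100 B/tuple: " + tupleLimit(budget, 100));
    }
}
```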
Method Detail

add

public void add(Tuple t)
Description copied from class: DefaultAbstractBag
Add a tuple to the bag.

Specified by:
add in interface DataBag
Overrides:
add in class DefaultAbstractBag
Parameters:
t - tuple to add.
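The add description only states that a tuple is added. The class description, however, says the bag limits how many tuples it holds in memory and spills proactively. The sketch below models that interaction under stated assumptions: a fixed tuple limit (maxInMemory, a hypothetical constructor argument unrelated to the inherited memLimit field), and sorted "runs" kept in memory as stand-ins for spill files. SpillOnAddBag is an invented name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical add() that spills proactively once a fixed tuple limit is reached.
// Sorted runs stay in memory here as stand-ins for spill files on disk.
final class SpillOnAddBag<T> {
    private final int maxInMemory;                 // assumed fixed limit on buffered tuples
    private final Comparator<? super T> comp;
    final List<T> contents = new ArrayList<>();    // current unsorted buffer
    final List<List<T>> runs = new ArrayList<>();  // sorted, "spilled" runs

    SpillOnAddBag(int maxInMemory, Comparator<? super T> comp) {
        if (maxInMemory <= 0) throw new IllegalArgumentException("maxInMemory must be positive");
        this.maxInMemory = maxInMemory;
        this.comp = comp;
    }

    void add(T t) {
        if (contents.size() >= maxInMemory) {
            spillContents(); // make room before accepting the new tuple
        }
        contents.add(t);
    }

    // Sort the buffer, move it into a new run, and release the buffer.
    private void spillContents() {
        List<T> run = new ArrayList<>(contents);
        run.sort(comp);
        runs.add(run);
        contents.clear();
    }
}

public class SpillOnAddDemo {
    public static void main(String[] args) {
        SpillOnAddBag<Integer> bag = new SpillOnAddBag<>(2, Comparator.naturalOrder());
        for (int x : new int[] {5, 3, 4, 1, 2}) bag.add(x);
        System.out.println("runs=" + bag.runs + " inMemory=" + bag.contents);
    }
}
```

With a limit of 2 and the input 5, 3, 4, 1, 2, the sketch produces the runs [3, 5] and [1, 4], and leaves [2] unsorted in memory.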

isSorted

public boolean isSorted()
Description copied from interface: DataBag
Find out if the bag is sorted.

Returns:
true if this is a sorted data bag, false otherwise.

isDistinct

public boolean isDistinct()
Description copied from interface: DataBag
Find out if the bag is distinct.

Returns:
true if the bag is a distinct bag, false otherwise.

iterator

public Iterator<Tuple> iterator()
Description copied from interface: DataBag
Get an iterator to the bag. For default and distinct bags, no particular order is guaranteed. For sorted bags the order is guaranteed to be sorted according to the provided comparator.

Returns:
tuple iterator
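For a sorted bag, the iterator must return tuples in comparator order even after some have been spilled. If each spill file holds an individually sorted run, as the proactive_spill description suggests, a standard way to achieve this is a k-way merge driven by a priority queue. The sketch below illustrates that technique, with in-memory lists standing in for spill files. It is hypothetical: this page does not say how Pig's iterator is implemented, and SortedRunMerger is an invented name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical k-way merge over individually sorted runs
// (e.g. spill files plus the sorted in-memory buffer).
final class SortedRunMerger<T> implements Iterator<T> {
    // Heap entry: the current head of one run plus the rest of that run.
    private static final class Head<E> {
        final E value;
        final Iterator<E> rest;
        Head(E value, Iterator<E> rest) { this.value = value; this.rest = rest; }
    }

    private final PriorityQueue<Head<T>> heap;

    SortedRunMerger(List<? extends Iterable<T>> runs, Comparator<? super T> comp) {
        heap = new PriorityQueue<Head<T>>(Math.max(1, runs.size()),
                (a, b) -> comp.compare(a.value, b.value));
        for (Iterable<T> run : runs) {
            Iterator<T> it = run.iterator();
            if (it.hasNext()) heap.add(new Head<>(it.next(), it)); // skip empty runs
        }
    }

    @Override
    public boolean hasNext() { return !heap.isEmpty(); }

    @Override
    public T next() {
        Head<T> smallest = heap.poll();
        if (smallest == null) throw new NoSuchElementException();
        if (smallest.rest.hasNext()) {
            heap.add(new Head<>(smallest.rest.next(), smallest.rest)); // refill from the same run
        }
        return smallest.value;
    }
}

public class SortedRunMergerDemo {
    public static void main(String[] args) {
        List<List<Integer>> runs = Arrays.asList(
                Arrays.asList(3, 5), Arrays.asList(1, 4), Arrays.asList(2));
        SortedRunMerger<Integer> it = new SortedRunMerger<>(runs, Comparator.naturalOrder());
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        System.out.println(out); // [1, 2, 3, 4, 5]
    }
}
```

The heap never holds more than one element per run, so memory use depends on the number of runs rather than on the number of tuples.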

spill

public long spill()
Description copied from interface: Spillable
Instructs an object to spill whatever it can to disk and release references to any data structures it spills.

Returns:
number of objects spilled.

proactive_spill

public long proactive_spill(Comparator<Tuple> comp)
Description copied from class: SortedSpillBag
Sort the contents of mContents and write them to disk.

Overrides:
proactive_spill in class SortedSpillBag
Parameters:
comp - comparator used to sort the contents of mContents
Returns:
number of tuples spilled
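A proactive spill, as described above, sorts the in-memory contents with the given comparator, writes them to a new file, releases the in-memory references, and returns the number of tuples written. The sketch below models those steps under loud assumptions: Strings stand in for Tuples, DataOutputStream.writeUTF stands in for Tuple serialization, and a temporary file stands in for Pig's spill file. The names proactiveSpill and readSpill are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical proactive spill: sort, write to a new temp file, clear memory, return count.
// Strings stand in for Tuples; writeUTF stands in for Tuple serialization.
public class ProactiveSpillSketch {

    static long proactiveSpill(List<String> contents, Comparator<? super String> comp,
                               List<Path> spillFiles) {
        if (contents.isEmpty()) return 0L; // nothing to spill
        contents.sort(comp);
        try {
            Path file = Files.createTempFile("sketch-spill", ".bin");
            file.toFile().deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(file)))) {
                for (String s : contents) out.writeUTF(s);
            }
            spillFiles.add(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long spilled = contents.size();
        contents.clear(); // release the in-memory references
        return spilled;
    }

    // Reads one spill file back, for checking the sketch.
    static List<String> readSpill(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            while (true) {
                try {
                    result.add(in.readUTF());
                } catch (EOFException eof) {
                    return result; // end of file reached
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> contents = new ArrayList<>(Arrays.asList("pear", "apple", "fig"));
        List<Path> spillFiles = new ArrayList<>();
        long n = proactiveSpill(contents, Comparator.naturalOrder(), spillFiles);
        System.out.println(n + " spilled; file holds " + readSpill(spillFiles.get(0)));
    }
}
```

Because each file is written in sorted order, every spill file in this sketch is a sorted run that a merging iterator could consume.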


Copyright © 2007-2012 The Apache Software Foundation