All Implemented Interfaces:
java.io.Serializable, java.lang.Comparable, java.lang.Iterable<Tuple>, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable, DataBag, Spillable
An unordered collection of Tuples with no duplicates. Data is
stored without duplicates as it comes in. When it is time to spill,
the data is sorted and written to disk. It must also be sorted upon
the first read; otherwise, if a spill happened after that, the iterators
would have no way to find their place in the new file. The data is
stored in a HashSet. When it is time to sort, it is placed in an
ArrayList and then sorted. Despite all these machinations, this was
found to be faster than storing it in a TreeSet.
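The storage strategy described above can be illustrated with a minimal sketch. It is not Pig's implementation: the class name DistinctSketch and its methods are hypothetical, plain strings stand in for Tuples, and no spilling to disk is attempted. It only shows duplicates being dropped by a HashSet on insertion, and the contents being copied into an ArrayList and sorted when a sorted view is needed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Pig's DistinctDataBag.
public class DistinctSketch {
    // Elements are kept in a HashSet, so duplicates are dropped on insertion.
    private final Set<String> contents = new HashSet<>();

    // Adding a value that is already present leaves the set unchanged.
    public void add(String value) {
        contents.add(value);
    }

    // Number of distinct elements currently held in memory.
    public int size() {
        return contents.size();
    }

    // Copy the set into an ArrayList and sort it. In this sketch the method
    // stands in for the sort that precedes a spill or a first read.
    public List<String> sortedSnapshot() {
        List<String> sorted = new ArrayList<>(contents);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        DistinctSketch bag = new DistinctSketch();
        for (String s : new String[] {"pear", "apple", "pear", "fig", "apple"}) {
            bag.add(s);
        }
        System.out.println(bag.size());            // prints 3
        System.out.println(bag.sortedSnapshot());  // prints [apple, fig, pear]
    }
}
```

Because the sorted order depends only on the contents, a reader that has already consumed some elements can, in principle, locate the same position in a file written from the same sorted list. That is presumably why the description requires a sort on first read.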
See Also: Serialized Form
Modifier and Type    Method and Description
void                 add(Tuple t)
                     Add a tuple to the bag.
boolean              isDistinct()
                     Find out if the bag is distinct.
boolean              isSorted()
                     Find out if the bag is sorted.
Iterator<Tuple>      iterator()
                     Get an iterator to the bag.
long                 size()
                     Get the number of elements in the bag, both in memory and on disk.
long                 spill()
                     Instructs an object to spill whatever it can to disk and release
                     references to any data structures it spills.
Methods inherited from class org.apache.pig.data.DefaultAbstractBag
addAll, addAll, addAll, clear, compareTo, equals, getMemorySize, getSpillFile, hashCode, incSpillCount, incSpillCount, markSpillableIfNecessary, markStale, readFields, reportProgress, sampleContents, toString, warn, write
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Description copied from interface: DataBag
Get an iterator to the bag. For default and distinct bags,
no particular order is guaranteed. For sorted bags, tuples are
returned in the order defined by the provided comparator.
Returns: tuple iterator
Copyright © 2007-2012 The Apache Software Foundation