public abstract class DefaultAbstractBag extends Object implements DataBag
Modifier and Type | Class and Description |
---|---|
static class | DefaultAbstractBag.BagDelimiterTuple |
static class | DefaultAbstractBag.EndBag |
static class | DefaultAbstractBag.StartBag |
Modifier and Type | Field and Description |
---|---|
static Tuple | endBag |
protected static int | MAX_SPILL_FILES |
protected Collection<Tuple> | mContents |
protected long | mSize |
protected FileList | mSpillFiles |
static Tuple | startBag |
Constructor and Description |
---|
DefaultAbstractBag() |
Modifier and Type | Method and Description |
---|---|
void | add(Tuple t) Add a tuple to the bag. |
void | addAll(Collection<Tuple> c) |
void | addAll(DataBag b) Add the contents of a bag to the bag. |
void | addAll(Iterable<Tuple> iterable) Add the contents of an iterable (a Collection or a DataBag) to the bag. |
void | clear() Clear out the contents of the bag, both on disk and in memory. |
int | compareTo(Object other) This method is potentially very expensive since it may require a sort of the bag; don't call it unless you have to. |
boolean | equals(Object other) |
long | getMemorySize() Return the amount of memory the bag uses. |
protected DataOutputStream | getSpillFile() Get a file to spill contents to. |
int | hashCode() |
protected void | incSpillCount(Enum counter) |
protected void | incSpillCount(Enum counter, long numRecsSpilled) |
protected void | markSpillableIfNecessary() All bag implementations that can get big enough to be spilled should call this method each time they add an element. |
void | markStale(boolean stale) This is used by FuncEvalSpec.FakeDataBag. |
void | readFields(DataInput in) Read a bag from disk. |
protected void | reportProgress() Report progress to HDFS. |
protected void | sampleContents() Sample every SPILL_SAMPLE_FREQUENCYth tuple, up to a maximum of SPILL_SAMPLE_SIZE samples, to get an estimate of the tuple sizes. |
long | size() Get the number of elements in the bag, both in memory and on disk. |
String | toString() Write the bag into a string. |
protected void | warn(String msg, Enum warningEnum, Throwable e) |
void | write(DataOutput out) Write a bag's contents to disk. |
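The `write(DataOutput)` and `readFields(DataInput)` pair follows the Hadoop `Writable` pattern. The sketch below illustrates one plausible layout for such a pair: a `long` element count followed by each element. It is a stand-alone illustration, not Pig's implementation. The class `BagSerializationSketch`, the use of `String` in place of `Tuple`, and the exact byte layout are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a bag; Strings play the role of tuples.
final class BagSerializationSketch {

    // Assumed layout: element count as a long, then each element in order.
    static void write(List<String> bag, DataOutput out) throws IOException {
        out.writeLong(bag.size());
        for (String t : bag) {
            out.writeUTF(t);
        }
    }

    // Reads back exactly what write() produced.
    static List<String> readFields(DataInput in) throws IOException {
        long count = in.readLong();
        List<String> bag = new ArrayList<>();
        for (long i = 0; i < count; i++) {
            bag.add(in.readUTF());
        }
        return bag;
    }

    // Serializes to an in-memory buffer and deserializes again.
    static List<String> roundTrip(List<String> bag) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(bag, new DataOutputStream(bytes));
            return readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("a", "bc")));
    }
}
```

Writing the count first lets the reader know how many elements to expect without needing an end-of-bag marker in the stream.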
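Methods inherited from class Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface DataBag: isDistinct, isSorted, iterator
Methods inherited from interface Iterable: forEach, spliterator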
protected Collection<Tuple> mContents
protected FileList mSpillFiles
protected long mSize
public static final Tuple startBag
public static final Tuple endBag
protected static final int MAX_SPILL_FILES
public long size()
protected void sampleContents()
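The summary describes `sampleContents()` as sampling every SPILL_SAMPLE_FREQUENCYth tuple, up to SPILL_SAMPLE_SIZE samples, to estimate tuple sizes. A minimal sketch of that idea follows. The class `SampledSizeEstimator`, the constant values, and the use of `byte[]` lengths as "tuple sizes" are assumptions for illustration and do not reflect Pig's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch of sampling-based size estimation.
final class SampledSizeEstimator {
    // Assumed values, chosen only for this example.
    static final int SPILL_SAMPLE_FREQUENCY = 10;
    static final int SPILL_SAMPLE_SIZE = 100;

    // Averages the size of every SPILL_SAMPLE_FREQUENCYth element
    // (starting with the first), stops after SPILL_SAMPLE_SIZE samples,
    // and scales the average by the total element count.
    static long estimateTotalSize(List<byte[]> contents) {
        long sampledBytes = 0;
        int sampled = 0;
        for (int i = 0;
             i < contents.size() && sampled < SPILL_SAMPLE_SIZE;
             i += SPILL_SAMPLE_FREQUENCY) {
            sampledBytes += contents.get(i).length;
            sampled++;
        }
        if (sampled == 0) {
            return 0; // empty bag, nothing to estimate
        }
        long averageSize = sampledBytes / sampled;
        return averageSize * contents.size();
    }

    public static void main(String[] args) {
        List<byte[]> contents = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            contents.add(new byte[16]);
        }
        System.out.println(estimateTotalSize(contents));
    }
}
```

Sampling trades accuracy for speed: the estimate costs at most SPILL_SAMPLE_SIZE size lookups regardless of how large the bag grows, but it can be off when element sizes vary widely.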
public void add(Tuple t)
protected void markSpillableIfNecessary()
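The summary asks spillable bag implementations to call `markSpillableIfNecessary()` each time they add an element. The sketch below shows that calling convention with stand-in classes. `SketchBag`, `SketchListBag`, `REGISTER_THRESHOLD`, and the static `REGISTRY` list (standing in for a memory manager) are hypothetical, and the assumption that the method registers a bag once it passes a size threshold is for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class modelling the calling convention.
abstract class SketchBag {
    // Assumed threshold; registration happens once the bag holds this many elements.
    static final int REGISTER_THRESHOLD = 3;
    // Stand-in for a memory manager that tracks spillable bags.
    static final List<SketchBag> REGISTRY = new ArrayList<>();

    protected final List<String> contents = new ArrayList<>();
    private boolean registered = false;

    // Registers this bag at most once, and only after it has grown
    // large enough to be worth tracking.
    protected void markSpillableIfNecessary() {
        if (!registered && contents.size() >= REGISTER_THRESHOLD) {
            REGISTRY.add(this);
            registered = true;
        }
    }

    boolean isRegistered() {
        return registered;
    }

    abstract void add(String t);
}

// A concrete bag follows the convention: add, then mark.
final class SketchListBag extends SketchBag {
    @Override
    void add(String t) {
        contents.add(t);
        markSpillableIfNecessary();
    }

    public static void main(String[] args) {
        SketchListBag bag = new SketchListBag();
        for (String s : new String[] {"a", "b", "c"}) {
            bag.add(s);
        }
        System.out.println(bag.isRegistered());
    }
}
```

Because the check is cheap and idempotent, calling it on every `add` keeps subclasses simple: they never need to track whether registration has already happened.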
public void addAll(DataBag b)
DataBag
public void addAll(Collection<Tuple> c)
public void addAll(Iterable<Tuple> iterable)
Parameters: iterable - a Collection or DataBag to add contents of
public long getMemorySize()
Specified by: getMemorySize in interface Spillable
public void clear()
public int compareTo(Object other)
Specified by: compareTo in interface Comparable
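The summary warns that `compareTo` may require a sort of the bag. The sketch below shows why comparing unordered collections can be that expensive: after a cheap size check, both sides are copied, sorted, and compared element by element. The class `BagCompareSketch` and this particular strategy are assumptions for illustration, not a description of Pig's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of order-insensitive comparison of two bags.
final class BagCompareSketch {

    static <T extends Comparable<T>> int compareBags(List<T> a, List<T> b) {
        // Cheap check first: bags of different sizes never need sorting.
        if (a.size() != b.size()) {
            return a.size() < b.size() ? -1 : 1;
        }
        // Expensive path: sort copies so the inputs stay untouched.
        List<T> sortedA = new ArrayList<>(a);
        List<T> sortedB = new ArrayList<>(b);
        Collections.sort(sortedA);
        Collections.sort(sortedB);
        // Compare element by element in sorted order.
        for (int i = 0; i < sortedA.size(); i++) {
            int c = sortedA.get(i).compareTo(sortedB.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(
                compareBags(Arrays.asList(3, 1, 2), Arrays.asList(2, 3, 1)));
    }
}
```

The size check costs nothing, while the sorted path is O(n log n) per call and copies both bags, which is the kind of cost the warning asks callers to avoid where possible.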
public void write(DataOutput out) throws IOException
Specified by: write in interface org.apache.hadoop.io.Writable
Parameters: out - DataOutput to write data to.
Throws: IOException - (passes it on from underlying calls).
public void readFields(DataInput in) throws IOException
Specified by: readFields in interface org.apache.hadoop.io.Writable
Parameters: in - DataInput to read data from.
Throws: IOException - (passes it on from underlying calls).
public void markStale(boolean stale)
protected DataOutputStream getSpillFile() throws IOException
Throws: IOException
protected void reportProgress()
protected void incSpillCount(Enum counter)
protected void incSpillCount(Enum counter, long numRecsSpilled)
Copyright © 2007-2017 The Apache Software Foundation