org.apache.pig.data
Class DefaultAbstractBag

java.lang.Object
  extended by org.apache.pig.data.DefaultAbstractBag
All Implemented Interfaces:
Serializable, Comparable, Iterable<Tuple>, org.apache.hadoop.io.Writable, org.apache.hadoop.io.WritableComparable, DataBag, Spillable
Direct Known Subclasses:
DefaultDataBag, DistinctDataBag, InternalCachedBag, SortedDataBag, SortedSpillBag

public abstract class DefaultAbstractBag
extends Object
implements DataBag

Default implementation of DataBag. This is an abstract class used as a parent for all three of the types of data bags.

See Also:
Serialized Form

Nested Class Summary
static class DefaultAbstractBag.BagDelimiterTuple
           
static class DefaultAbstractBag.EndBag
           
static class DefaultAbstractBag.StartBag
           
 
Field Summary
static Tuple endBag
           
protected static int MAX_SPILL_FILES
           
protected  Collection<Tuple> mContents
           
protected  int mLastContentsSize
           
protected  long mMemSize
           
protected  long mSize
           
protected  FileList mSpillFiles
           
static Tuple startBag
           
 
Constructor Summary
DefaultAbstractBag()
           
 
Method Summary
 void add(Tuple t)
          Add a tuple to the bag.
 void addAll(Collection<Tuple> c)
          Add contents of a container to the bag.
 void addAll(DataBag b)
          Add contents of a bag to the bag.
 void clear()
          Clear out the contents of the bag, both on disk and in memory.
 int compareTo(Object other)
          This method is potentially very expensive since it may require a sort of the bag; don't call it unless you have to.
 boolean equals(Object other)
           
 long getMemorySize()
          Return the estimated memory usage of the bag.
protected  DataOutputStream getSpillFile()
          Get a file to spill contents to.
 int hashCode()
           
protected  void incSpillCount(Enum counter)
           
protected  void incSpillCount(Enum counter, long numRecsSpilled)
           
 void markStale(boolean stale)
          This is used by FuncEvalSpec.FakeDataBag.
 void readFields(DataInput in)
          Read a bag from disk.
protected  void reportProgress()
          Report progress to HDFS.
 long size()
          Get the number of elements in the bag, both in memory and on disk.
 String toString()
          Write the bag into a string.
protected  void warn(String msg, Enum warningEnum, Exception e)
           
 void write(DataOutput out)
          Write a bag's contents to disk.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface org.apache.pig.data.DataBag
isDistinct, isSorted, iterator
 
Methods inherited from interface org.apache.pig.impl.util.Spillable
spill
 

Field Detail

mContents

protected Collection<Tuple> mContents

mSpillFiles

protected FileList mSpillFiles

mSize

protected long mSize

mLastContentsSize

protected int mLastContentsSize

mMemSize

protected long mMemSize

startBag

public static final Tuple startBag

endBag

public static final Tuple endBag

MAX_SPILL_FILES

protected static final int MAX_SPILL_FILES
See Also:
Constant Field Values
Constructor Detail

DefaultAbstractBag

public DefaultAbstractBag()
Method Detail

size

public long size()
Get the number of elements in the bag, both in memory and on disk.

Specified by:
size in interface DataBag
Returns:
number of elements in the bag
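
The "both in memory and on disk" behaviour can be illustrated with a minimal, self-contained sketch. It does not use Pig's classes: SizeSketch and its members are hypothetical stand-ins, and the idea that a running total (in the role of mSize) is kept separately from the in-memory collection (in the role of mContents) is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a spillable bag; this is not Pig's implementation.
class SizeSketch {
    private final List<String> inMemory = new ArrayList<>(); // plays the role of mContents
    private long total = 0;                                   // plays the role of mSize

    void add(String element) {
        inMemory.add(element);
        total++;
    }

    // Pretend the in-memory elements were written to disk:
    // memory is emptied, but the total element count is unchanged.
    void spill() {
        inMemory.clear();
    }

    // Counts every element added so far, whether still in memory or spilled.
    long size() {
        return total;
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    public static void main(String[] args) {
        SizeSketch bag = new SizeSketch();
        bag.add("a");
        bag.add("b");
        bag.spill();
        bag.add("c");
        // prints "3 total, 1 in memory"
        System.out.println(bag.size() + " total, " + bag.inMemoryCount() + " in memory");
    }
}
```

Keeping a separate counter means size() does not have to re-read spilled data; whether DefaultAbstractBag does exactly this is not stated on this page.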

add

public void add(Tuple t)
Add a tuple to the bag.

Specified by:
add in interface DataBag
Parameters:
t - tuple to add.

addAll

public void addAll(DataBag b)
Add contents of a bag to the bag.

Specified by:
addAll in interface DataBag
Parameters:
b - bag to add contents of.

addAll

public void addAll(Collection<Tuple> c)
Add contents of a container to the bag.

Parameters:
c - Collection to add contents of.

getMemorySize

public long getMemorySize()
Return the estimated memory usage of the bag.

Specified by:
getMemorySize in interface Spillable
Returns:
estimated in-memory size.

clear

public void clear()
Clear out the contents of the bag, both on disk and in memory. Any attempts to read after this is called will produce undefined results.

Specified by:
clear in interface DataBag

compareTo

public int compareTo(Object other)
This method is potentially very expensive since it may require a sort of the bag; don't call it unless you have to.

Specified by:
compareTo in interface Comparable
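
To see why comparing two bags can require a sort, consider the following sketch. It is not the ordering Pig uses: BagCompareSketch is a hypothetical helper, and the rule "smaller bag first, then element-by-element on sorted copies" is an assumption chosen only to show where the cost comes from.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the ordering rule below is an assumption, not Pig's documented contract.
class BagCompareSketch {
    static <T extends Comparable<T>> int compare(List<T> left, List<T> right) {
        // Cheap check first: bags of different size can be ordered without sorting.
        if (left.size() != right.size()) {
            return left.size() < right.size() ? -1 : 1;
        }
        // Unordered contents must be put in a canonical order before an
        // element-by-element comparison; this O(n log n) step is the expensive part.
        List<T> a = new ArrayList<>(left);
        List<T> b = new ArrayList<>(right);
        Collections.sort(a);
        Collections.sort(b);
        for (int i = 0; i < a.size(); i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }
}
```

In this sketch, two bags holding the same elements in a different order compare as equal, but only after both copies have been sorted.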

equals

public boolean equals(Object other)
Overrides:
equals in class Object

write

public void write(DataOutput out)
           throws IOException
Write a bag's contents to disk.

Specified by:
write in interface org.apache.hadoop.io.Writable
Parameters:
out - DataOutput to write data to.
Throws:
IOException - (passes it on from underlying calls).

readFields

public void readFields(DataInput in)
                throws IOException
Read a bag from disk.

Specified by:
readFields in interface org.apache.hadoop.io.Writable
Parameters:
in - DataInput to read data from.
Throws:
IOException - (passes it on from underlying calls).
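
The write/readFields pair follows the Writable pattern of serializing to a DataOutput and restoring from a DataInput. The sketch below shows that round trip using only java.io. RoundTripSketch is hypothetical, and the byte layout (an element count followed by each element as a UTF string) is an assumption for illustration; the actual serialized form of a bag is not described on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical container with Writable-style methods; the format is assumed.
class RoundTripSketch {
    final List<String> items = new ArrayList<>();

    // Assumed layout: element count, then each element.
    void write(DataOutput out) throws IOException {
        out.writeLong(items.size());
        for (String s : items) {
            out.writeUTF(s);
        }
    }

    // Replaces the current contents with whatever the stream holds.
    void readFields(DataInput in) throws IOException {
        items.clear();
        long n = in.readLong();
        for (long i = 0; i < n; i++) {
            items.add(in.readUTF());
        }
    }

    // Serializes the input to a byte buffer and reads it back into a fresh instance.
    static List<String> roundTrip(List<String> input) {
        try {
            RoundTripSketch src = new RoundTripSketch();
            src.items.addAll(input);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));

            RoundTripSketch dst = new RoundTripSketch();
            dst.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return dst.items;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

As in the methods documented above, the sketch's write and readFields simply pass on any IOException from the underlying stream.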

markStale

public void markStale(boolean stale)
This is used by FuncEvalSpec.FakeDataBag.

Specified by:
markStale in interface DataBag
Parameters:
stale - Set stale state.

toString

public String toString()
Write the bag into a string.

Overrides:
toString in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

getSpillFile

protected DataOutputStream getSpillFile()
                                 throws IOException
Get a file to spill contents to. The file is registered in mSpillFiles.

Returns:
stream to write tuples to.
Throws:
IOException
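
The general shape of "create a spill file, register it, and hand back a stream" can be sketched with the JDK alone. SpillFileSketch, the plain List<File> standing in for mSpillFiles, and the "bagspill" file prefix are all hypothetical; where Pig actually places its spill files is not stated on this page.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not Pig's implementation.
class SpillFileSketch {
    final List<File> spillFiles = new ArrayList<>(); // plays the role of mSpillFiles

    // Creates a temporary file, records it so it can be read back or deleted later,
    // and returns a stream for writing to it.
    DataOutputStream getSpillFile() throws IOException {
        File f = File.createTempFile("bagspill", ".tmp"); // assumed naming
        f.deleteOnExit();
        spillFiles.add(f);
        return new DataOutputStream(new BufferedOutputStream(new FileOutputStream(f)));
    }

    // Writes one int to a new spill file and returns the resulting file length in bytes.
    static long demo() {
        SpillFileSketch bag = new SpillFileSketch();
        try (DataOutputStream out = bag.getSpillFile()) {
            out.writeInt(42);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        File f = bag.spillFiles.get(0);
        long length = f.length();
        f.delete(); // clean up the temporary file
        return length;
    }
}
```

Registering the file at creation time means a later cleanup step can find every spill file even if writing to one of them fails partway through.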

reportProgress

protected void reportProgress()
Report progress to HDFS.


warn

protected void warn(String msg,
                    Enum warningEnum,
                    Exception e)

incSpillCount

protected void incSpillCount(Enum counter)

incSpillCount

protected void incSpillCount(Enum counter,
                             long numRecsSpilled)


Copyright © ${year} The Apache Software Foundation