org.apache.hadoop.zebra.io
Class BasicTable.Reader

java.lang.Object
  extended by org.apache.hadoop.zebra.io.BasicTable.Reader
All Implemented Interfaces:
Closeable
Enclosing class:
BasicTable

public static class BasicTable.Reader
extends Object
implements Closeable

BasicTable reader.


Nested Class Summary
static class BasicTable.Reader.RangeSplit
          A range-based split on the zebra table. The content of the split is implementation-dependent.
static class BasicTable.Reader.RowSplit
          A row-based split on the zebra table.
 
Constructor Summary
BasicTable.Reader(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
          Create a BasicTable reader.
BasicTable.Reader(org.apache.hadoop.fs.Path path, String[] deletedCGs, org.apache.hadoop.conf.Configuration conf)
           
 
Method Summary
 void close()
          Close the BasicTable for reading.
 BlockDistribution getBlockDistribution(BasicTable.Reader.RangeSplit split)
          Given a split range, calculate how the file data that fall into the range are distributed among hosts.
 BlockDistribution getBlockDistribution(BasicTable.Reader.RowSplit split)
          Given a row-based split, calculate how the file data that fall into the split are distributed among hosts.
 String getDeletedCGs()
           
static String getDeletedCGs(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
           
 KeyDistribution getKeyDistribution(int n, int nTables, BlockDistribution lastBd)
          Collect some key samples and use them to partition the table.
 DataInputStream getMetaBlock(String name)
          Obtain an input stream for reading a meta block.
 String getName(int i)
           
 String getPath()
          Get the path to the table.
 org.apache.hadoop.fs.PathFilter getPathFilter(org.apache.hadoop.conf.Configuration conf)
          Get the path filter used by the table.
 int getRowSplitCGIndex()
          Get index of the column group that will be used for row-based split.
 TableScanner getScanner(BasicTable.Reader.RangeSplit split, boolean closeReader)
          Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RangeSplit object, which should be obtained from previous calls of rangeSplit(int).
 TableScanner getScanner(boolean closeReader, BasicTable.Reader.RowSplit rowSplit)
          Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RowSplit object.
 TableScanner getScanner(org.apache.hadoop.io.BytesWritable beginKey, org.apache.hadoop.io.BytesWritable endKey, boolean closeReader)
          Get a scanner that reads all rows whose row keys fall in a specific range.
 Schema getSchema()
          Get the schema of the table.
static Schema getSchema(org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf)
          Get the BasicTable schema without loading the full table index.
 SortInfo getSortInfo()
           
 BasicTableStatus getStatus()
          Get the status of the BasicTable.
 boolean isSorted()
          Is the Table sorted?
 List<BasicTable.Reader.RangeSplit> rangeSplit(int n)
          Split the table into at most n parts.
 void rearrangeFileIndices(org.apache.hadoop.fs.FileStatus[] fileStatus)
          Rearrange the files according to the column group index ordering
 List<BasicTable.Reader.RowSplit> rowSplit(long[] starts, long[] lengths, org.apache.hadoop.fs.Path[] paths, int splitCGIndex, int[] batchSizes, int numBatches)
          We already use FileInputFormat to create byte offset-based input splits.
 void setProjection(String projection)
          Set the projection for the reader.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BasicTable.Reader

public BasicTable.Reader(org.apache.hadoop.fs.Path path,
                         org.apache.hadoop.conf.Configuration conf)
                  throws IOException
Create a BasicTable reader.

Parameters:
path - The directory path to the BasicTable.
conf - Optional configuration parameters.
Throws:
IOException
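A minimal sketch of opening and closing a reader (the table path is hypothetical; point it at an existing BasicTable directory):

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.zebra.io.BasicTable;

public class OpenReaderExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Hypothetical location; must be the directory of an existing BasicTable.
    Path tablePath = new Path("/data/employee-table");
    BasicTable.Reader reader = new BasicTable.Reader(tablePath, conf);
    try {
      System.out.println("sorted: " + reader.isSorted());
      System.out.println("schema: " + reader.getSchema());
    } finally {
      reader.close(); // release resources held by the reader
    }
  }
}
```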

BasicTable.Reader

public BasicTable.Reader(org.apache.hadoop.fs.Path path,
                         String[] deletedCGs,
                         org.apache.hadoop.conf.Configuration conf)
                  throws IOException
Throws:
IOException
Method Detail

isSorted

public boolean isSorted()
Is the Table sorted?

Returns:
Whether the table is sorted.

getSortInfo

public SortInfo getSortInfo()
Returns:
the list of sorted columns

getName

public String getName(int i)
Returns:
the name of i-th column group

setProjection

public void setProjection(String projection)
                   throws ParseException,
                          IOException
Set the projection for the reader. This will affect calls to getScanner(RangeSplit, boolean), getScanner(BytesWritable, BytesWritable, boolean), getStatus(), getSchema().

Parameters:
projection - The projection on the BasicTable for subsequent read operations. In this version of the implementation, the projection is a comma-separated list of column names, such as "FirstName, LastName, Sex, Department". To select all columns, pass null as the projection.
Throws:
IOException
ParseException
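A sketch of setting a projection on an open reader (the column names are hypothetical and assume the table contains them):

```java
// Assumes an open BasicTable.Reader named "reader" whose table has
// columns FirstName and LastName (hypothetical names).
reader.setProjection("FirstName, LastName");

// After the projection is set, getSchema() reflects only the projected
// columns, and scanners created from this reader return only those columns.
System.out.println(reader.getSchema());

// Passing null selects all columns again.
reader.setProjection(null);
```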

getStatus

public BasicTableStatus getStatus()
                           throws IOException
Get the status of the BasicTable.

Throws:
IOException

getBlockDistribution

public BlockDistribution getBlockDistribution(BasicTable.Reader.RangeSplit split)
                                       throws IOException
Given a split range, calculate how the file data that fall into the range are distributed among hosts.

Parameters:
split - The range-based split. Can be null to indicate the whole TFile.
Returns:
An object that conveys how the blocks that fall in the split are distributed across hosts.
Throws:
IOException
See Also:
rangeSplit(int)

getBlockDistribution

public BlockDistribution getBlockDistribution(BasicTable.Reader.RowSplit split)
                                       throws IOException
Given a row-based split, calculate how the file data that fall into the split are distributed among hosts.

Parameters:
split - The row-based split. Cannot be null.
Returns:
An object that conveys how the blocks that fall into the split are distributed across hosts.
Throws:
IOException

getKeyDistribution

public KeyDistribution getKeyDistribution(int n,
                                          int nTables,
                                          BlockDistribution lastBd)
                                   throws IOException
Collect some key samples and use them to partition the table. Only applicable to sorted BasicTable. The returned KeyDistribution object also contains information on how data are distributed for each key-partitioned bucket.

Parameters:
n - Targeted size of the sampling.
nTables - Number of tables in the union.
Returns:
KeyDistribution object.
Throws:
IOException

getScanner

public TableScanner getScanner(org.apache.hadoop.io.BytesWritable beginKey,
                               org.apache.hadoop.io.BytesWritable endKey,
                               boolean closeReader)
                        throws IOException
Get a scanner that reads all rows whose row keys fall in a specific range. Only applicable to sorted BasicTable.

Parameters:
beginKey - The begin key of the scan range. If null, start from the first row in the table.
endKey - The end key of the scan range. If null, scan till the last row in the table.
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released once the scanner is closed.
Returns:
A scanner object.
Throws:
IOException
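A sketch of a key-range scan on a sorted table (assumes an open reader named "reader"; the begin and end keys are hypothetical):

```java
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.zebra.io.TableScanner;

// Key-range scans are only applicable to sorted BasicTables.
if (reader.isSorted()) {
  BytesWritable beginKey = new BytesWritable("alice".getBytes()); // hypothetical key
  BytesWritable endKey = new BytesWritable("bob".getBytes());     // hypothetical key
  // closeReader=false: we keep managing the reader's lifetime ourselves.
  TableScanner scanner = reader.getScanner(beginKey, endKey, false);
  try {
    // Consume the rows whose keys fall in the range via the TableScanner API.
  } finally {
    scanner.close();
  }
}
```

Passing null for beginKey starts from the first row; passing null for endKey scans to the last row.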

getScanner

public TableScanner getScanner(BasicTable.Reader.RangeSplit split,
                               boolean closeReader)
                        throws IOException,
                               ParseException
Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RangeSplit object, which should be obtained from previous calls of rangeSplit(int).

Parameters:
split - The split range. If null, get a scanner to read the complete table.
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released once the scanner is closed.
Returns:
A scanner object.
Throws:
IOException
ParseException

getScanner

public TableScanner getScanner(boolean closeReader,
                               BasicTable.Reader.RowSplit rowSplit)
                        throws IOException,
                               ParseException
Get a scanner that reads a consecutive number of rows as defined in the BasicTable.Reader.RowSplit object.

Parameters:
closeReader - Close the underlying Reader object when we close the scanner. Should be set to true if we have only one scanner on top of the reader, so that resources are released once the scanner is closed.
rowSplit - split based on row numbers.
Returns:
A scanner object.
Throws:
IOException
ParseException

getSchema

public Schema getSchema()
Get the schema of the table. The schema may be different from getSchema(Path, Configuration) if a projection has been set on the table.

Returns:
The schema of the BasicTable.

getSchema

public static Schema getSchema(org.apache.hadoop.fs.Path path,
                               org.apache.hadoop.conf.Configuration conf)
                        throws IOException
Get the BasicTable schema without loading the full table index.

Parameters:
path - The path to the BasicTable.
conf - Optional configuration parameters.
Returns:
The logical Schema of the table (all columns).
Throws:
IOException

getPath

public String getPath()
Get the path to the table.

Returns:
The path string to the table.

getPathFilter

public org.apache.hadoop.fs.PathFilter getPathFilter(org.apache.hadoop.conf.Configuration conf)
Get the path filter used by the table.


rangeSplit

public List<BasicTable.Reader.RangeSplit> rangeSplit(int n)
                                              throws IOException
Split the table into at most n parts.

Parameters:
n - Maximum number of parts in the output list.
Returns:
A list of RangeSplit objects, each of which can be used to construct TableScanner later.
Throws:
IOException
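A sketch of splitting a table and scanning each part (assumes an open reader named "reader"; the split count 4 is arbitrary):

```java
import java.util.List;

import org.apache.hadoop.zebra.io.BasicTable;
import org.apache.hadoop.zebra.io.TableScanner;

// Split the table into at most 4 parts; fewer may be returned.
List<BasicTable.Reader.RangeSplit> splits = reader.rangeSplit(4);
for (BasicTable.Reader.RangeSplit split : splits) {
  // closeReader=false: the same reader backs every scanner, so we must
  // not release it when an individual scanner is closed.
  TableScanner scanner = reader.getScanner(split, false);
  try {
    // Consume the rows covered by this split via the TableScanner API.
  } finally {
    scanner.close();
  }
}
```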

rowSplit

public List<BasicTable.Reader.RowSplit> rowSplit(long[] starts,
                                                 long[] lengths,
                                                 org.apache.hadoop.fs.Path[] paths,
                                                 int splitCGIndex,
                                                 int[] batchSizes,
                                                 int numBatches)
                                          throws IOException
We already use FileInputFormat to create byte offset-based input splits. Their information is encoded in starts, lengths and paths. This method wraps that information into RowSplit objects at the basic table level.

Parameters:
starts - array of starting byte of fileSplits.
lengths - array of length of fileSplits.
paths - array of path of fileSplits.
splitCGIndex - index of column group that is used to create fileSplits.
Returns:
A list of RowSplit objects, each of which can be used to construct a TableScanner later.
Throws:
IOException

rearrangeFileIndices

public void rearrangeFileIndices(org.apache.hadoop.fs.FileStatus[] fileStatus)
                          throws IOException
Rearrange the files according to the column group index ordering.

Parameters:
fileStatus - Array of FileStatus objects to be rearranged.
Throws:
IOException

getRowSplitCGIndex

public int getRowSplitCGIndex()
                       throws IOException
Get index of the column group that will be used for row-based split.

Throws:
IOException

close

public void close()
           throws IOException
Close the BasicTable for reading. Resources are released.

Specified by:
close in interface Closeable
Throws:
IOException

getDeletedCGs

public String getDeletedCGs()

getDeletedCGs

public static String getDeletedCGs(org.apache.hadoop.fs.Path path,
                                   org.apache.hadoop.conf.Configuration conf)
                            throws IOException
Throws:
IOException

getMetaBlock

public DataInputStream getMetaBlock(String name)
                             throws MetaBlockDoesNotExist,
                                    IOException
Obtain an input stream for reading a meta block.

Parameters:
name - The name of the meta block.
Returns:
The input stream for reading the meta block.
Throws:
IOException
MetaBlockDoesNotExist
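A sketch of reading a meta block (assumes an open reader named "reader"; the block name "table.stats" and its string encoding are hypothetical and must match what the table writer stored):

```java
import java.io.DataInputStream;

// getMetaBlock throws MetaBlockDoesNotExist if no block with this name
// was written to the table.
DataInputStream in = reader.getMetaBlock("table.stats");
try {
  // Decode the bytes the same way the writer encoded them, e.g.
  // readUTF() pairs with a string written via writeUTF().
  String value = in.readUTF();
  System.out.println(value);
} finally {
  in.close();
}
```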


Copyright © 2007-2012 The Apache Software Foundation