org.apache.hadoop.zebra.mapred
Class TableInputFormat

java.lang.Object
  extended by org.apache.hadoop.zebra.mapred.TableInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>

Deprecated.

@Deprecated
public class TableInputFormat
extends Object
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>

InputFormat class for reading one or more BasicTables. Usage Example:

In the main program, add the following code.

 jobConf.setInputFormat(TableInputFormat.class);
 TableInputFormat.setInputPaths(jobConf, new Path("path/to/table1"), new Path("path/to/table2"));
 TableInputFormat.setProjection(jobConf, "Name, Salary, BonusPct");
 
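The above code does the following things:
- Sets the input format class to TableInputFormat.
- Sets the paths of the BasicTables to be read.
- Sets the projection so that only the "Name", "Salary", and "BonusPct" columns are read.

The user Mapper code should look like the following: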
 static class MyMapClass implements Mapper<BytesWritable, Tuple, K, V> {
   // indices of various fields in the input Tuple.
   int idxName, idxSalary, idxBonusPct;
 
   @Override
   public void configure(JobConf job) {
     Schema projection = TableInputFormat.getProjection(job);
     // determine the field indices.
     idxName = projection.getColumnIndex("Name");
     idxSalary = projection.getColumnIndex("Salary");
     idxBonusPct = projection.getColumnIndex("BonusPct");
   }
 
   @Override
   public void map(BytesWritable key, Tuple value, OutputCollector<K, V> output,
       Reporter reporter) throws IOException {
     try {
       String name = (String) value.get(idxName);
       int salary = (Integer) value.get(idxSalary);
       double bonusPct = (Double) value.get(idxBonusPct);
       // do something with the input data
     } catch (ExecException e) {
       e.printStackTrace();
     }
   }
 
   @Override
   public void close() throws IOException {
     // no-op
   }
 }
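
The mapper above resolves each column position once in configure() and then reads fields by index in map(). The following is a minimal sketch of that pattern using only plain Java collections. The class ProjectionIndexSketch, its columnIndex method, and the Object[] row are hypothetical stand-ins for illustration; they are not Zebra's Schema or Tuple classes, and returning -1 for an unknown column is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the projection lookup; not part of the Zebra API.
final class ProjectionIndexSketch {
    private final List<String> columns = new ArrayList<>();

    // Split a comma-separated projection such as "Name, Salary, BonusPct".
    ProjectionIndexSketch(String projection) {
        for (String name : projection.split(",")) {
            columns.add(name.trim());
        }
    }

    // Position of the column in the projection, or -1 if it is not projected.
    int columnIndex(String name) {
        return columns.indexOf(name);
    }

    public static void main(String[] args) {
        ProjectionIndexSketch projection =
            new ProjectionIndexSketch("Name, Salary, BonusPct");
        // Resolve the indices once, as configure() does in the mapper.
        int idxName = projection.columnIndex("Name");
        int idxSalary = projection.columnIndex("Salary");
        int idxBonusPct = projection.columnIndex("BonusPct");

        // A row laid out in projection order, standing in for an input Tuple.
        Object[] row = {"Alice", 50000, 7.5};
        String name = (String) row[idxName];
        int salary = (Integer) row[idxSalary];
        double bonusPct = (Double) row[idxBonusPct];
        System.out.println(name + " earns " + salary + " with bonus " + bonusPct + "%");
    }
}
```

Looking up the indices once keeps the per-record work in map() down to array-style access and a cast.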
 
A note on PIG Tuple objects: a Tuple is an ordered list of PIG datum objects. The permitted PIG datum types fall into two categories, Scalar types and Composite types.

Supported Scalar types include seven native Java types (Boolean, Byte, Integer, Long, Float, Double, String) and one PIG class, DataByteArray, which represents a type-less byte array.
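
Because a Tuple holds its fields as plain objects, Mapper code typically checks or casts each field to the expected scalar type. The sketch below shows one way to classify a field at runtime. ScalarDatumSketch and describe are hypothetical names, the List<Object> stands in for a Tuple, and byte[] stands in for DataByteArray so that the example needs no PIG classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; it only illustrates runtime type checks on tuple fields.
final class ScalarDatumSketch {
    // Name the scalar category of a field value, or "non-scalar" for anything else.
    static String describe(Object datum) {
        if (datum == null) return "null";
        if (datum instanceof Boolean) return "Boolean";
        if (datum instanceof Byte) return "Byte";
        if (datum instanceof Integer) return "Integer";
        if (datum instanceof Long) return "Long";
        if (datum instanceof Float) return "Float";
        if (datum instanceof Double) return "Double";
        if (datum instanceof String) return "String";
        // byte[] stands in for PIG's DataByteArray in this sketch.
        if (datum instanceof byte[]) return "byte array";
        return "non-scalar";
    }

    public static void main(String[] args) {
        // An ordered list of fields, standing in for a Tuple.
        List<Object> fields = Arrays.asList("Alice", 50000, 7.5, new byte[] {1, 2});
        for (Object field : fields) {
            System.out.println(describe(field));
        }
    }
}
```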

Supported Composite types include Map, DataBag, and Tuple.


Field Summary
static String INPUT_DELETED_CGS
          Deprecated.  
static String INPUT_EXPR
          Deprecated.  
static String INPUT_FE
          Deprecated.  
static String INPUT_PROJ
          Deprecated.  
static String INPUT_SORT
          Deprecated.  
 
Constructor Summary
TableInputFormat()
          Deprecated.  
 
Method Summary
static String getProjection(org.apache.hadoop.mapred.JobConf conf)
          Deprecated. Get the projection from the JobConf
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.mapred.Reporter reporter)
          Deprecated.  
static Schema getSchema(org.apache.hadoop.mapred.JobConf conf)
          Deprecated. Get the schema of a table expression
static SortInfo getSortInfo(org.apache.hadoop.mapred.JobConf conf)
          Deprecated. Get the SortInfo object regarding a Zebra table
 org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf, int numSplits)
          Deprecated.  
static TableRecordReader getTableRecordReader(org.apache.hadoop.mapred.JobConf conf, String projection)
          Deprecated. Get a TableRecordReader on a single split
static void requireSortedTable(org.apache.hadoop.mapred.JobConf conf, ZebraSortInfo sortInfo)
          Deprecated. Requires sorted table or table union
static void setInputPaths(org.apache.hadoop.mapred.JobConf conf, org.apache.hadoop.fs.Path... paths)
          Deprecated. Set the paths to the input table.
static void setMinSplitSize(org.apache.hadoop.mapred.JobConf conf, long minSize)
          Deprecated. Set the minimum split size.
static void setProjection(org.apache.hadoop.mapred.JobConf conf, String projection)
          Deprecated. Use setProjection(JobConf, ZebraProjection) instead.
static void setProjection(org.apache.hadoop.mapred.JobConf conf, ZebraProjection projection)
          Deprecated. Set the input projection in the JobConf object.
 void validateInput(org.apache.hadoop.mapred.JobConf conf)
          Deprecated. 
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

INPUT_EXPR

public static final String INPUT_EXPR
Deprecated. 
See Also:
Constant Field Values

INPUT_PROJ

public static final String INPUT_PROJ
Deprecated. 
See Also:
Constant Field Values

INPUT_SORT

public static final String INPUT_SORT
Deprecated. 
See Also:
Constant Field Values

INPUT_FE

public static final String INPUT_FE
Deprecated. 
See Also:
Constant Field Values

INPUT_DELETED_CGS

public static final String INPUT_DELETED_CGS
Deprecated. 
See Also:
Constant Field Values
Constructor Detail

TableInputFormat

public TableInputFormat()
Deprecated. 
Method Detail

setInputPaths

public static void setInputPaths(org.apache.hadoop.mapred.JobConf conf,
                                 org.apache.hadoop.fs.Path... paths)
Deprecated. 
Set the paths to the input table.

Parameters:
conf - JobConf object.
paths - one or more paths to BasicTables. The InputFormat class will produce splits on the "union" of these BasicTables.

getSchema

public static Schema getSchema(org.apache.hadoop.mapred.JobConf conf)
                        throws IOException
Deprecated. 
Get the schema of a table expression

Parameters:
conf - JobConf object.
Throws:
IOException

setProjection

public static void setProjection(org.apache.hadoop.mapred.JobConf conf,
                                 String projection)
                          throws ParseException
Deprecated. Use setProjection(JobConf, ZebraProjection) instead.

Set the input projection in the JobConf object.

Parameters:
conf - JobConf object.
projection - A comma-separated list of column names. To select all columns, pass null. The syntax of the projection conforms to the Schema string.
Throws:
ParseException
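
The parameter description above says that a null projection selects all columns. The sketch below illustrates that convention only. ProjectionParseSketch, resolve, and the tableColumns list are hypothetical, and the parsing handles bare column names, not the full Schema string syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper showing the "null selects all columns" convention.
final class ProjectionParseSketch {
    // Return the projected column names, or every table column when projection is null.
    static List<String> resolve(String projection, List<String> tableColumns) {
        if (projection == null) {
            return new ArrayList<>(tableColumns);
        }
        List<String> selected = new ArrayList<>();
        for (String name : projection.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                selected.add(trimmed);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> tableColumns = Arrays.asList("Name", "Salary", "BonusPct", "Dept");
        System.out.println(resolve("Name, Salary", tableColumns));
        System.out.println(resolve(null, tableColumns));
    }
}
```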

setProjection

public static void setProjection(org.apache.hadoop.mapred.JobConf conf,
                                 ZebraProjection projection)
                          throws ParseException
Deprecated. 
Set the input projection in the JobConf object.

Parameters:
conf - JobConf object.
projection - A ZebraProjection holding a comma-separated list of column names. To select all columns, pass null. The syntax of the projection conforms to the Schema string.
Throws:
ParseException

getProjection

public static String getProjection(org.apache.hadoop.mapred.JobConf conf)
                            throws IOException,
                                   ParseException
Deprecated. 
Get the projection from the JobConf

Parameters:
conf - The JobConf object
Returns:
The projection schema. If the projection has not been defined, or is not yet known, null is returned. By the time this method is called in Mapper code, the projection must already be known.
Throws:
IOException
ParseException

getSortInfo

public static SortInfo getSortInfo(org.apache.hadoop.mapred.JobConf conf)
                            throws IOException
Deprecated. 
Get the SortInfo object regarding a Zebra table

Parameters:
conf - JobConf object
Returns:
the Zebra table's SortInfo; null if the table is unsorted.
Throws:
IOException

requireSortedTable

public static void requireSortedTable(org.apache.hadoop.mapred.JobConf conf,
                                      ZebraSortInfo sortInfo)
                               throws IOException
Deprecated. 
Requires sorted table or table union

Parameters:
conf - JobConf object.
sortInfo - ZebraSortInfo object containing sorting information.
Throws:
IOException

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.BytesWritable,Tuple> getRecordReader(org.apache.hadoop.mapred.InputSplit split,
                                                                                                       org.apache.hadoop.mapred.JobConf conf,
                                                                                                       org.apache.hadoop.mapred.Reporter reporter)
                                                                                                throws IOException
Deprecated. 
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
InputFormat.getRecordReader(InputSplit, JobConf, Reporter)

getTableRecordReader

public static TableRecordReader getTableRecordReader(org.apache.hadoop.mapred.JobConf conf,
                                                     String projection)
                                              throws IOException,
                                                     ParseException
Deprecated. 
Get a TableRecordReader on a single split

Parameters:
conf - JobConf object.
projection - comma-separated column names in the projection. null selects all columns.
Throws:
IOException
ParseException

setMinSplitSize

public static void setMinSplitSize(org.apache.hadoop.mapred.JobConf conf,
                                   long minSize)
Deprecated. 
Set the minimum split size.

Parameters:
conf - The job conf object.
minSize - Minimum size.
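
This page does not say how TableInputFormat applies the minimum. For orientation only, the sketch below shows the formula that Hadoop's file-based input formats in the mapred API use to combine a minimum size with the goal size and the block size. Whether Zebra follows the same rule is an assumption, and SplitSizeSketch and the sample sizes are hypothetical.

```java
// Hypothetical illustration of a minimum split size; not taken from Zebra's source.
final class SplitSizeSketch {
    // Never smaller than minSize; otherwise the smaller of goalSize and blockSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed 64 MB block
        long goalSize = 16L * 1024 * 1024;  // total input size divided by requested splits
        long minSize = 32L * 1024 * 1024;   // value a job might pass as minSize
        System.out.println(computeSplitSize(goalSize, minSize, blockSize));
    }
}
```

Under this rule, raising the minimum yields fewer, larger splits when the goal size would otherwise be small.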

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf conf,
                                                       int numSplits)
                                                throws IOException
Deprecated. 
Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.BytesWritable,Tuple>
Throws:
IOException
See Also:
InputFormat.getSplits(JobConf, int)

validateInput

@Deprecated
public void validateInput(org.apache.hadoop.mapred.JobConf conf)
                   throws IOException
Deprecated. 

Throws:
IOException


Copyright © 2007-2012 The Apache Software Foundation