Package org.apache.hadoop.zebra

Hadoop Table - tabular data storage for Hadoop MapReduce and PIG.


Package org.apache.hadoop.zebra Description

Hadoop Table provides tabular data storage for the Hadoop MapReduce framework. Close integration of Table with PIG is also planned.

For this release, the basic construct of Hadoop Table is called BasicTable. A BasicTable is a create-once, read-only persistent data storage entity. A BasicTable contains zero or more keyed rows.

The API uses Hadoop BytesWritable objects to represent row keys, and PIG Tuple objects to represent rows.

Each BasicTable maintains a Schema, which, for this release, is simply a collection of column names. Given a schema, we can deduce the integer index of a particular column and use it to extract (get) the desired datum from a PIG Tuple object (which only allows index-based access).
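
The name-to-index lookup described above can be illustrated with a minimal, standalone sketch. It does not use the Zebra or PIG classes: ColumnSchema and getByName are hypothetical names introduced only for this example, and a plain java.util.List stands in for a PIG Tuple, on the assumption that only positional access is available.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaIndexSketch {

    /** Hypothetical stand-in for a schema: an ordered collection of column names. */
    static final class ColumnSchema {
        private final Map<String, Integer> indexByName = new HashMap<>();

        ColumnSchema(String... columnNames) {
            // The position of a name in the schema is the index of its datum in a row.
            for (int i = 0; i < columnNames.length; i++) {
                indexByName.put(columnNames[i], i);
            }
        }

        /** Returns the zero-based position of the column, or -1 if it is unknown. */
        int indexOf(String columnName) {
            return indexByName.getOrDefault(columnName, -1);
        }
    }

    /** Resolves a column name to an index, then reads the row by position. */
    static Object getByName(ColumnSchema schema, List<Object> row, String columnName) {
        int index = schema.indexOf(columnName);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return row.get(index);
    }

    public static void main(String[] args) {
        ColumnSchema schema = new ColumnSchema("name", "age", "city");
        // A List stands in for a tuple that only supports index-based access.
        List<Object> row = Arrays.asList("alice", 30, "Paris");
        System.out.println(getByName(schema, row, "age")); // prints 30
    }
}
```

In a real application the index would typically be resolved once per schema and reused for every row, rather than looked up by name on each access.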

Typically, applications use BasicTableOutputFormat (which implements the Hadoop OutputFormat interface) to create BasicTables through MapReduce, and use TableInputFormat (which implements the Hadoop InputFormat interface) to feed the table data to MapReduce jobs as input.

The API is structured in three packages:

Copyright © ${year} The Apache Software Foundation