org.apache.pig.backend.hadoop.accumulo
Class AccumuloStorage

java.lang.Object
  extended by org.apache.pig.LoadFunc
      extended by org.apache.pig.backend.hadoop.accumulo.AbstractAccumuloStorage
          extended by org.apache.pig.backend.hadoop.accumulo.AccumuloStorage
All Implemented Interfaces:
StoreFuncInterface

public class AccumuloStorage
extends AbstractAccumuloStorage

Basic PigStorage implementation that uses Accumulo as the backing store.

When writing data, the first entry in the Tuple is treated as the row in the Accumulo key, while subsequent entries in the tuple are handled as columns in that row. Maps are expanded, placing the map key in the column family and the map value in the Accumulo value. Scalars are placed directly into the value with an empty column qualifier. If the columns argument on the constructor is omitted, null or the empty String, no column family is provided on the Keys created for Accumulo

When reading data, if aggregateColfams is true, elements in the same row and column family are aggregated into a single Map. This will result in a Tuple of length (unique_column_families + 1) for the given row. If aggregateColfams is false, column family and column qualifier are concatenated (separated by a colon), and placed into a Map. This will result in a Tuple with two entries, where the latter element has a number of elements equal to the number of columns in the given row.


Field Summary
 
Fields inherited from class org.apache.pig.backend.hadoop.accumulo.AbstractAccumuloStorage
ASTERISK, authorizations, caster, columns, columnSeparator, COMMA, commandLine, contextSignature, end, ignoreWhitespace, inst, maxLatency, maxMutationBufferSize, maxWriteThreads, password, schema, start, storageOptions, table, tableName, user, zookeepers
 
Constructor Summary
AccumuloStorage()
          Creates an AccumuloStorage which writes all values in a Tuple with an empty column family and doesn't group column families together on read (creates on Map for all columns)
AccumuloStorage(String columns)
          Create an AccumuloStorage with a CSV of columns-families to use on write and whether columns in a row should be grouped by family on read.
AccumuloStorage(String columnStr, String args)
           
 
Method Summary
protected  void addColumn(org.apache.accumulo.core.data.Mutation mutation, String colfam, String colqual, org.apache.accumulo.core.data.Value columnValue)
          Adds the given column family, column qualifier and value to the given mutation
protected  void configureInputFormat(org.apache.hadoop.mapreduce.Job job)
          Method to allow specific implementations to add more elements to the Job for reading data from Accumulo.
protected  Collection<org.apache.accumulo.core.data.Mutation> getMutations(Tuple tuple)
           
protected  Tuple getTuple(org.apache.accumulo.core.data.Key key, org.apache.accumulo.core.data.Value value)
           
 
Methods inherited from class org.apache.pig.backend.hadoop.accumulo.AbstractAccumuloStorage
checkSchema, cleanupOnFailure, cleanupOnSuccess, clearUnset, configureOutputFormat, extractArgs, getCommandLine, getEntries, getInputFormat, getInputFormatEntries, getLoadCaster, getNext, getOutputFormat, getOutputFormatEntries, getUDFProperties, getWriter, loadDependentJars, makePair, objectToText, objToBytes, objToText, prepareToRead, prepareToWrite, putNext, relativeToAbsolutePath, relToAbsPathForStoreLocation, schemaToType, schemaToType, setLocation, setStoreFuncUDFContextSignature, setStoreLocation, setUDFContextSignature, simpleUnset, tupleToBytes, tupleToText, unsetEntriesFromConfiguration
 
Methods inherited from class org.apache.pig.LoadFunc
getAbsolutePath, getPathStrings, join, warn
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

AccumuloStorage

public AccumuloStorage()
                throws org.apache.commons.cli.ParseException,
                       IOException
Creates an AccumuloStorage which writes all values in a Tuple with an empty column family and doesn't group column families together on read (creates on Map for all columns)

Throws:
org.apache.commons.cli.ParseException
IOException

AccumuloStorage

public AccumuloStorage(String columns)
                throws org.apache.commons.cli.ParseException,
                       IOException
Create an AccumuloStorage with a CSV of columns-families to use on write and whether columns in a row should be grouped by family on read.

Parameters:
columns - A comma-separated list of column families to use when writing data, aligned to the n'th entry in the tuple
aggregateColfams - Should unique column qualifier and value pairs be grouped together by column family when reading data
Throws:
org.apache.commons.cli.ParseException
IOException

AccumuloStorage

public AccumuloStorage(String columnStr,
                       String args)
                throws org.apache.commons.cli.ParseException,
                       IOException
Throws:
org.apache.commons.cli.ParseException
IOException
Method Detail

getTuple

protected Tuple getTuple(org.apache.accumulo.core.data.Key key,
                         org.apache.accumulo.core.data.Value value)
                  throws IOException
Specified by:
getTuple in class AbstractAccumuloStorage
Throws:
IOException

configureInputFormat

protected void configureInputFormat(org.apache.hadoop.mapreduce.Job job)
Description copied from class: AbstractAccumuloStorage
Method to allow specific implementations to add more elements to the Job for reading data from Accumulo.

Overrides:
configureInputFormat in class AbstractAccumuloStorage

getMutations

protected Collection<org.apache.accumulo.core.data.Mutation> getMutations(Tuple tuple)
                                                                   throws ExecException,
                                                                          IOException
Specified by:
getMutations in class AbstractAccumuloStorage
Throws:
ExecException
IOException

addColumn

protected void addColumn(org.apache.accumulo.core.data.Mutation mutation,
                         String colfam,
                         String colqual,
                         org.apache.accumulo.core.data.Value columnValue)
Adds the given column family, column qualifier and value to the given mutation

Parameters:
mutation -
colfam -
colqual -
columnValue -


Copyright © 2007-2012 The Apache Software Foundation