org.apache.pig.piggybank.storage.partition
Class PathPartitionHelper

java.lang.Object
  extended by org.apache.pig.piggybank.storage.partition.PathPartitionHelper

public class PathPartitionHelper
extends Object

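Implements the logic for listing the partition keys and values found in an HDFS path and for filtering input paths by a partition filter expression.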

Restrictions
Function calls are not supported by this partition helper, and it can only handle String values.
This is normally not a problem, because partition values are part of the HDFS folder path and are given a
determinate value that does not need parsing by any external process.


Field Summary
static String PARITITION_FILTER_EXPRESSION
           
static String PARTITION_COLUMNS
           
 
Constructor Summary
PathPartitionHelper()
           
 
Method Summary
 Set<String> getPartitionKeys(String location, org.apache.hadoop.conf.Configuration conf)
          Returns the partition keys for a location.
The work is delegated to the PathPartitioner class.
 Map<String,String> getPathPartitionKeyValues(String location)
          Returns the partition keys and each key's value for a single location.
The location must look like mytable/partition1=a/partition2=b/myfile.
For that location, this method returns a map with [partition1='a', partition2='b'].
The work is delegated to the PathPartitioner class.
 List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext ctx, Class<? extends LoadFunc> loaderClass, String signature)
          Called by the FileInputFormat to find the input paths for which splits should be calculated.
If applyDateRanges == true, the HiveRCDateSplitter is used to filter the input files;
otherwise the default FileInputFormat listStatus method is used.
 void setPartitionFilterExpression(String partitionFilterExpression, Class<? extends LoadFunc> loaderClass, String signature)
          Sets the PARITITION_FILTER_EXPRESSION property in the UDFContext identified by the loaderClass.
 void setPartitionKeys(String location, org.apache.hadoop.conf.Configuration conf, Class<? extends LoadFunc> loaderClass, String signature)
          Reads the partition keys from the location, i.e. the base directory.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PARTITION_COLUMNS

public static final String PARTITION_COLUMNS

PARITITION_FILTER_EXPRESSION

public static final String PARITITION_FILTER_EXPRESSION
Constructor Detail

PathPartitionHelper

public PathPartitionHelper()
Method Detail

getPathPartitionKeyValues

public Map<String,String> getPathPartitionKeyValues(String location)
                                             throws IOException
Returns the partition keys and each key's value for a single location.
The location must look like mytable/partition1=a/partition2=b/myfile.
For that location, this method returns a map with [partition1='a', partition2='b'].
The work is delegated to the PathPartitioner class.

Parameters:
location - String path of a single location, such as mytable/partition1=a/partition2=b/myfile
Returns:
Map of String, String, mapping each partition key to its value
Throws:
IOException
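
The mapping from a path to key/value pairs can be illustrated with a small standalone sketch. It is not the PathPartitioner implementation and does not use the piggybank classes; the class name PartitionPathSketch and the method parse are hypothetical, and the rule "every path segment of the form key=value is a partition" is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of piggybank. */
final class PartitionPathSketch {

    /** Collects the key=value segments of a path, in path order. */
    static Map<String, String> parse(String location) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String segment : location.split("/")) {
            int eq = segment.indexOf('=');
            // Assumption: a segment is a partition only if it has a non-empty key before '='.
            if (eq > 0) {
                result.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {partition1=a, partition2=b}
        System.out.println(parse("mytable/partition1=a/partition2=b/myfile"));
    }
}
```

Note that this simplified rule would also treat a file name containing '=' as a partition; the real class may handle that case differently.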

getPartitionKeys

public Set<String> getPartitionKeys(String location,
                                    org.apache.hadoop.conf.Configuration conf)
                             throws IOException
Returns the partition keys for a location.
The work is delegated to the PathPartitioner class.

Parameters:
location - String must be the base directory for the partitions
conf - Hadoop Configuration
Returns:
Set of String partition keys
Throws:
IOException
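
One way to picture how keys could be read from a base directory is to descend through the key=value directories below it. The sketch below does this on the local file system with java.io.File as a stand-in for HDFS. It is an assumption about the general approach, not the PathPartitioner code; the names PartitionKeySketch and partitionKeys are hypothetical, and the sketch assumes that all sibling directories at one level share the same key.

```java
import java.io.File;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration only; uses the local file system instead of HDFS. */
final class PartitionKeySketch {

    /** Follows the first key=value directory at each level and records its key. */
    static Set<String> partitionKeys(File baseDir) {
        Set<String> keys = new LinkedHashSet<>();
        File current = baseDir;
        while (current != null) {
            File next = null;
            File[] children = current.listFiles(); // null if current is not a directory
            if (children != null) {
                for (File child : children) {
                    String name = child.getName();
                    int eq = name.indexOf('=');
                    if (child.isDirectory() && eq > 0) {
                        keys.add(name.substring(0, eq));
                        next = child;
                        break; // assumption: siblings use the same partition key
                    }
                }
            }
            current = next; // stop when no key=value directory was found
        }
        return keys;
    }
}
```

For a layout such as base/year=2012/month=01, the sketch yields the keys year and month in that order.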

setPartitionFilterExpression

public void setPartitionFilterExpression(String partitionFilterExpression,
                                         Class<? extends LoadFunc> loaderClass,
                                         String signature)
                                  throws IOException
Sets the PARITITION_FILTER_EXPRESSION property in the UDFContext identified by the loaderClass.

Parameters:
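partitionFilterExpression - String value to store in the PARITITION_FILTER_EXPRESSION property
loaderClass - identifies the UDFContext in which the property is set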
Throws:
IOException

setPartitionKeys

public void setPartitionKeys(String location,
                             org.apache.hadoop.conf.Configuration conf,
                             Class<? extends LoadFunc> loaderClass,
                             String signature)
                      throws IOException
Reads the partition keys from the location, i.e. the base directory.

Parameters:
location - String must be the base directory for the partitions
conf - Hadoop Configuration
loaderClass - a subclass of LoadFunc
Throws:
IOException

listStatus

public List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext ctx,
                                                        Class<? extends LoadFunc> loaderClass,
                                                        String signature)
                                                 throws IOException
Called by the FileInputFormat to find the input paths for which splits should be calculated.
If applyDateRanges == true, the HiveRCDateSplitter is used to filter the input files;
otherwise the default FileInputFormat listStatus method is used.

Parameters:
ctx - JobContext
loaderClass - this is chosen to be a subclass of LoadFunc to maintain some consistency.
Throws:
IOException
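
The effect of filtering input paths by partition can be sketched in isolation. The example below keeps only the paths whose partition values equal a set of required String values. It uses plain string equality in place of a real partition filter expression, so it does not show how this class or HiveRCDateSplitter evaluate filters; the names PartitionFilterSketch, matches and filter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; equality on String values stands in for a filter expression. */
final class PartitionFilterSketch {

    /** True if the path has a "key=value" segment for every required entry. */
    static boolean matches(String path, Map<String, String> required) {
        List<String> segments = Arrays.asList(path.split("/"));
        for (Map.Entry<String, String> e : required.entrySet()) {
            if (!segments.contains(e.getKey() + "=" + e.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Returns the paths that satisfy all required partition values, in input order. */
    static List<String> filter(List<String> paths, Map<String, String> required) {
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (matches(path, required)) {
                kept.add(path);
            }
        }
        return kept;
    }
}
```

Because partition values are taken from the path, the comparison is on Strings: a required value of 2012 matches the segment year=2012 but not year=02012.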


Copyright © 2007-2012 The Apache Software Foundation