org.apache.pig.piggybank.storage.partition
Class PathPartitioner

java.lang.Object
  extended by org.apache.pig.piggybank.storage.partition.PathPartitioner

public class PathPartitioner
extends Object

It is sometimes convenient to partition logs by date or by other values, e.g. country or city.
A daydate-partitioned HDFS directory might look something like:

 /logs/repo/mylog/
     daydate=2010-01-01
     daydate=2010-01-02
 
This class accepts a path like /logs/repo/mylog and returns a map of the partition keys.


Constructor Summary
PathPartitioner()
           
 
Method Summary
 Set<String> getPartitionKeys(String location, org.apache.hadoop.conf.Configuration conf)
          Searches for the key=value pairs in the path pointed to by the location parameter.
 Map<String,String> getPathPartitionKeyValues(String location)
          Note: this must be the lowest path in the directory tree. Searches for the key=value pairs in the path pointed to by the location parameter.
 String[] parsePathKeyValue(String path)
          Looks for key=value pairs in the path, for example: /user/hive/warehouse/mylogs/year=2010/month=07
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PathPartitioner

public PathPartitioner()
Method Detail

getPathPartitionKeyValues

public Map<String,String> getPathPartitionKeyValues(String location)
                                             throws IOException
Note: this must be the lowest path in the directory tree. Searches for the key=value pairs in the path pointed to by the location parameter.

Parameters:
location - String root path in HDFS, e.g. /user/hive/warehouse or /logs/repo
Returns:
Map of partition key to value. The order is maintained as per the directory tree, i.e. if /logs/repo/year=2010/month=2010 exists, the first key in the map will be year and the second month.
Throws:
IOException
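
The key=value extraction described above can be illustrated with a small standalone sketch. It is not the piggybank implementation: the class name PathKeyValuesSketch and the method pathPartitionKeyValues are hypothetical, and the sketch assumes that every path component containing a non-empty key and value around "=" counts as a partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the piggybank PathPartitioner.
class PathKeyValuesSketch {

    // Collects the key=value components of a path in the order they appear.
    static Map<String, String> pathPartitionKeyValues(String location) {
        // LinkedHashMap keeps insertion order, so keys follow the directory tree.
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String component : location.split("/")) {
            int eq = component.indexOf('=');
            // Skip components without '=' or with an empty key or value.
            if (eq > 0 && eq < component.length() - 1) {
                partitions.put(component.substring(0, eq), component.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // prints {year=2010, month=07}
        System.out.println(pathPartitionKeyValues("/logs/repo/mylog/year=2010/month=07"));
    }
}
```

An insertion-ordered map is used here so that iteration matches the ordering described under Returns.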

getPartitionKeys

public Set<String> getPartitionKeys(String location,
                                    org.apache.hadoop.conf.Configuration conf)
                             throws IOException
Searches for the key=value pairs in the path pointed to by the location parameter.

Parameters:
location - String root path in HDFS, e.g. /user/hive/warehouse or /logs/repo
conf - Configuration
Returns:
Set of String. The order is maintained as per the directory tree, i.e. if /logs/repo/year=2010/month=2010 exists, the first item in the set will be year and the second month.
Throws:
IOException
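
One way to picture the ordered key discovery is the sketch below. It walks a local directory tree with java.nio.file as a stand-in for HDFS, so it does not use the Hadoop Configuration at all. The class name PartitionKeysSketch and its methods are hypothetical, and the sketch assumes all sibling partition directories at one level share the same key, so it only follows the first one.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical stand-in for illustration; uses the local filesystem, not HDFS.
class PartitionKeysSketch {

    // Returns the first child directory (in sorted order) named like key=value, if any.
    static Optional<Path> firstPartitionDir(Path dir) throws IOException {
        try (Stream<Path> children = Files.list(dir)) {
            return children
                    .filter(Files::isDirectory)
                    .filter(p -> p.getFileName().toString().indexOf('=') > 0)
                    .sorted()
                    .findFirst();
        }
    }

    // Descends one partition level at a time, recording each key in tree order.
    static Set<String> partitionKeys(Path root) throws IOException {
        Set<String> keys = new LinkedHashSet<>(); // insertion order is preserved
        Optional<Path> next = firstPartitionDir(root);
        while (next.isPresent()) {
            String name = next.get().getFileName().toString();
            keys.add(name.substring(0, name.indexOf('=')));
            next = firstPartitionDir(next.get());
        }
        return keys;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mylog");
        Files.createDirectories(root.resolve("year=2010/month=07"));
        Files.createDirectories(root.resolve("year=2011/month=01"));
        // prints [year, month]
        System.out.println(partitionKeys(root));
    }
}
```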

parsePathKeyValue

public String[] parsePathKeyValue(String path)
Looks for key=value pairs in the path, for example: /user/hive/warehouse/mylogs/year=2010/month=07

Parameters:
path - String path to search for a key=value pair
Returns:
String[] where [0] = key and [1] = value
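
The two-element return shape can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the class ParseKeyValueSketch and method parseKeyValue are hypothetical, only the last path component is inspected, and null is returned when no pair is found. The real method's behaviour in those cases is not documented here.

```java
// Hypothetical stand-in for illustration; not the piggybank PathPartitioner.
class ParseKeyValueSketch {

    // Returns {key, value} for the last path component, or null if it is not key=value.
    static String[] parseKeyValue(String path) {
        // Assumption of this sketch: only the last component is parsed.
        String last = path.substring(path.lastIndexOf('/') + 1);
        int eq = last.indexOf('=');
        if (eq <= 0 || eq == last.length() - 1) {
            return null; // no usable key=value pair
        }
        return new String[] { last.substring(0, eq), last.substring(eq + 1) };
    }

    public static void main(String[] args) {
        String[] kv = parseKeyValue("/user/hive/warehouse/mylogs/year=2010/month=07");
        // prints month -> 07
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
```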


Copyright © ${year} The Apache Software Foundation