public class JsonMetadata extends java.lang.Object implements LoadMetadata, StoreMetadata
Constructor and Description |
---|
JsonMetadata() |
JsonMetadata(java.lang.String schemaFileName, java.lang.String headerFileName, java.lang.String statFileName) |
Modifier and Type | Method and Description |
---|---|
protected java.util.Set<ElementDescriptor> | findMetaFile(java.lang.String path, java.lang.String metaname, org.apache.hadoop.conf.Configuration conf): For each object represented by the path (either directly, or via a glob), find the corresponding metadata file. |
java.lang.String[] | getPartitionKeys(java.lang.String location, org.apache.hadoop.mapreduce.Job job): Find what columns are partition keys for this input. |
ResourceSchema | getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job): For JsonMetadata the schema is considered optional; this method suppresses (and logs) errors if they are encountered. |
ResourceSchema | getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job, boolean isSchemaOn): Read the schema from the JSON metadata file; if the isSchemaOn parameter is false, errors are suppressed and logged. |
ResourceStatistics | getStatistics(java.lang.String location, org.apache.hadoop.mapreduce.Job job): For JsonMetadata stats are considered optional; this method suppresses (and logs) errors if they are encountered. |
void | setFieldDel(byte fieldDel) |
void | setPartitionFilter(Expression partitionFilter): Set the filter for partitioning. |
void | setRecordDel(byte recordDel) |
void | storeSchema(ResourceSchema schema, java.lang.String location, org.apache.hadoop.mapreduce.Job job): Store the schema of the data being written. |
void | storeStatistics(ResourceStatistics stats, java.lang.String location, org.apache.hadoop.mapreduce.Job job): Store statistics about the data being written. |
public JsonMetadata()
public JsonMetadata(java.lang.String schemaFileName, java.lang.String headerFileName, java.lang.String statFileName)
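A minimal construction sketch follows, showing the no-argument constructor and the three-argument form that names the schema, header, and statistics side files explicitly. The file names used are illustrative: .pig_schema and .pig_stats appear later on this page under findMetaFile, while .pig_header is an assumption made for the example.

```java
import org.apache.pig.builtin.JsonMetadata;

public class ConstructSketch {
    public static void main(String[] args) {
        // No-argument constructor.
        JsonMetadata defaults = new JsonMetadata();

        // Explicit names for the schema, header, and statistics side files.
        // These values are examples only, not documented defaults.
        JsonMetadata custom = new JsonMetadata(".pig_schema", ".pig_header", ".pig_stats");

        System.out.println("created: " + defaults + ", " + custom);
    }
}
```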
protected java.util.Set<ElementDescriptor> findMetaFile(java.lang.String path, java.lang.String metaname, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
For each object represented by the path (either directly, or via a glob): if the object is a directory and path/metaname exists, use that as the metadata file; else if parentPath/metaname exists, use that as the metadata file.
Resolving conflicts, merging the metadata, etc. is not handled by this method and should be taken care of by downstream code.
Parameters:
path - Path, as passed in to a LoadFunc (may be a Hadoop glob)
metaname - Metadata file designation, such as .pig_schema or .pig_stats
conf - configuration object
Throws:
java.io.IOException
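The directory/parent-directory resolution rule above can be sketched with plain Hadoop file-system calls. The snippet below is only an illustration of that rule, not Pig's implementation: it handles a single path, ignores globs, and uses a hypothetical helper name.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MetaFileLookupSketch {
    // Hypothetical helper mirroring the rule described for findMetaFile:
    // prefer <path>/<metaname> when path is a directory, otherwise fall back
    // to <parent of path>/<metaname>.
    public static Path locateMetaFile(String path, String metaname, Configuration conf)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p) && fs.getFileStatus(p).isDir()) {
            Path inDir = new Path(p, metaname);
            if (fs.exists(inDir)) {
                return inDir;                      // path is a directory and path/metaname exists
            }
        }
        Path inParent = new Path(p.getParent(), metaname);
        return fs.exists(inParent) ? inParent : null;  // else try parentPath/metaname
    }
}
```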
public java.lang.String[] getPartitionKeys(java.lang.String location, org.apache.hadoop.mapreduce.Job job)
Description copied from interface: LoadMetadata
Find what columns are partition keys for this input.
Specified by:
getPartitionKeys in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
public void setPartitionFilter(Expression partitionFilter) throws java.io.IOException
Description copied from interface: LoadMetadata
Set the filter for partitioning. If the implementation returns null in LoadMetadata.getPartitionKeys(String, Job), then this method is not called by Pig runtime. This method is also not called by the Pig runtime if there are no partition filter conditions.
Specified by:
setPartitionFilter in interface LoadMetadata
Parameters:
partitionFilter - that describes filter for partitioning
Throws:
java.io.IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.
public ResourceSchema getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
For JsonMetadata the schema is considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getSchema in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException - if an exception occurs while determining the schema
public ResourceSchema getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job, boolean isSchemaOn) throws java.io.IOException
Read the schema from the JSON metadata file. If the isSchemaOn parameter is false, errors are suppressed and logged.
Parameters:
location -
job -
isSchemaOn -
Throws:
java.io.IOException
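As a quick orientation, the sketch below reads a schema through both overloads: the two-argument form, for which the schema is optional and errors are suppressed and logged, and the three-argument form with isSchemaOn set to true, for which they are not. The input path and the use of new Job(new Configuration()) purely as a configuration carrier are assumptions made for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.ResourceSchema;
import org.apache.pig.builtin.JsonMetadata;

public class ReadSchemaSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());   // used only to carry cluster configuration
        JsonMetadata metadata = new JsonMetadata();

        // Optional lookup: errors while reading the schema file are suppressed and logged.
        ResourceSchema optional = metadata.getSchema("/data/events", job);

        // Strict lookup: with isSchemaOn=true errors are not suppressed;
        // the method declares IOException.
        ResourceSchema required = metadata.getSchema("/data/events", job, true);

        System.out.println("optional schema: " + optional);
        System.out.println("required schema: " + required);
    }
}
```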
public ResourceStatistics getStatistics(java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
For JsonMetadata stats are considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getStatistics in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException - if an exception occurs while retrieving statistics
See Also:
LoadMetadata.getStatistics(String, Job)
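A statistics lookup mirrors the schema lookup above; since stats are optional for JsonMetadata, callers should expect read problems to be suppressed and logged rather than thrown. The path and Job construction are again assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.ResourceStatistics;
import org.apache.pig.builtin.JsonMetadata;

public class ReadStatisticsSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());
        JsonMetadata metadata = new JsonMetadata();

        // Statistics are optional for JsonMetadata; failures while reading the
        // stats file are suppressed and logged rather than propagated.
        ResourceStatistics stats = metadata.getStatistics("/data/events", job);
        System.out.println("statistics: " + stats);
    }
}
```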
public void storeStatistics(ResourceStatistics stats, java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
Description copied from interface: StoreMetadata
Store statistics about the data being written.
Specified by:
storeStatistics in interface StoreMetadata
Parameters:
stats - statistics to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException
public void storeSchema(ResourceSchema schema, java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
Description copied from interface: StoreMetadata
Store the schema of the data being written.
Specified by:
storeSchema in interface StoreMetadata
Parameters:
schema - Schema to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException
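For the store side, the sketch below builds a ResourceSchema from a Pig schema string and records it, together with an empty ResourceStatistics, for an output location. The output path, the schema string, and the use of Utils.getSchemaFromString and new Job(new Configuration()) are choices made for the example, not behavior documented on this page.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceStatistics;
import org.apache.pig.builtin.JsonMetadata;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

public class StoreMetadataSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());
        JsonMetadata metadata = new JsonMetadata();

        // Build a ResourceSchema from a Pig schema string (schema and path are
        // illustrative values).
        Schema pigSchema = Utils.getSchemaFromString("name:chararray, age:int");
        ResourceSchema schema = new ResourceSchema(pigSchema);

        // Record the schema alongside the data written to the output location.
        metadata.storeSchema(schema, "/output/people", job);

        // Statistics are recorded the same way; the fields are left unset in this sketch.
        metadata.storeStatistics(new ResourceStatistics(), "/output/people", job);
    }
}
```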
public void setFieldDel(byte fieldDel)
public void setRecordDel(byte recordDel)
Copyright © 2007-2012 The Apache Software Foundation