public class JsonMetadata extends Object implements LoadMetadata, StoreMetadata
Constructor and Description |
---|
JsonMetadata() |
JsonMetadata(String schemaFileName, String headerFileName, String statFileName) |
Modifier and Type | Method | Description |
---|---|---|
protected Set<ElementDescriptor> | findMetaFile(String path, String metaname, org.apache.hadoop.conf.Configuration conf) | Locate the metadata file for each object matched by path. |
String[] | getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job) | Find what columns are partition keys for this input. |
ResourceSchema | getSchema(String location, org.apache.hadoop.mapreduce.Job job) | For JsonMetadata, the schema is considered optional. This method suppresses (and logs) errors if they are encountered. |
ResourceSchema | getSchema(String location, org.apache.hadoop.mapreduce.Job job, boolean isSchemaOn) | Read the schema from the JSON metadata file. If isSchemaOn is false, errors are suppressed and logged. |
ResourceStatistics | getStatistics(String location, org.apache.hadoop.mapreduce.Job job) | For JsonMetadata, statistics are considered optional. This method suppresses (and logs) errors if they are encountered. |
void | setFieldDel(byte fieldDel) | |
void | setPartitionFilter(Expression partitionFilter) | Set the filter for partitioning. |
void | setRecordDel(byte recordDel) | |
void | storeSchema(ResourceSchema schema, String location, org.apache.hadoop.mapreduce.Job job) | Store the schema of the data being written. |
void | storeStatistics(ResourceStatistics stats, String location, org.apache.hadoop.mapreduce.Job job) | Store statistics about the data being written. |
protected Set<ElementDescriptor> findMetaFile(String path, String metaname, org.apache.hadoop.conf.Configuration conf) throws IOException
For each object represented by the path (either directly, or via a glob): if the object is a directory and path/metaname exists, use that as the metadata file; otherwise, if parentPath/metaname exists, use that as the metadata file.
Resolving conflicts, merging the metadata, etc., is not handled by this method and should be taken care of by downstream code.
Parameters:
path - Path, as passed in to a LoadFunc (may be a Hadoop glob)
metaname - Metadata file designation, such as .pig_schema or .pig_stats
conf - configuration object
Throws:
IOException
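The lookup rule above can be sketched on plain string paths. This is a minimal model, not Pig's implementation: it assumes the glob has already been expanded into a list of matched objects, and the hypothetical predicates `isDirectory` and `exists` stand in for real filesystem queries. It returns path strings rather than `ElementDescriptor` objects.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical model of the findMetaFile lookup rule; the real method works on
// Hadoop paths and returns ElementDescriptor objects, not strings.
final class MetaFileLookup {
    private MetaFileLookup() {}

    /**
     * @param matched     objects the (possibly globbed) path expanded to (assumed pre-expanded)
     * @param metaname    metadata file name, e.g. ".pig_schema"
     * @param isDirectory hypothetical stand-in for a directory check
     * @param exists      hypothetical stand-in for an existence check
     */
    static Set<String> findMetaFiles(List<String> matched, String metaname,
                                     Predicate<String> isDirectory,
                                     Predicate<String> exists) {
        Set<String> found = new LinkedHashSet<>();
        for (String object : matched) {
            // Rule 1: a directory that itself contains the metadata file.
            if (isDirectory.test(object)) {
                String candidate = object + "/" + metaname;
                if (exists.test(candidate)) {
                    found.add(candidate);
                    continue;
                }
            }
            // Rule 2: otherwise look for the metadata file in the parent directory.
            int slash = object.lastIndexOf('/');
            if (slash >= 0) {
                String candidate = object.substring(0, slash) + "/" + metaname;
                if (exists.test(candidate)) {
                    found.add(candidate);
                }
            }
        }
        // Conflicts between several metadata files are left to the caller.
        return found;
    }
}
```

Returning a set means that several part files in one directory, all falling back to the same parent-level metadata file, yield that file only once.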
public String[] getPartitionKeys(String location, org.apache.hadoop.mapreduce.Job job)
Description copied from interface: LoadMetadata
Find what columns are partition keys for this input.
Specified by:
getPartitionKeys in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set or query any runtime job information.

public void setPartitionFilter(Expression partitionFilter) throws IOException
Description copied from interface: LoadMetadata
Set the filter for partitioning. If the implementation returns null from LoadMetadata.getPartitionKeys(String, Job), then this method is not called by the Pig runtime. This method is also not called by the Pig runtime if there are no partition filter conditions.
Specified by:
setPartitionFilter in interface LoadMetadata
Parameters:
partitionFilter - an Expression that describes the filter for partitioning
Throws:
IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.

public ResourceSchema getSchema(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
For JsonMetadata, the schema is considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getSchema in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set or query any runtime job information.
Throws:
IOException - if an exception occurs while determining the schema

public ResourceSchema getSchema(String location, org.apache.hadoop.mapreduce.Job job, boolean isSchemaOn) throws IOException
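Read the schema from the JSON metadata file. If isSchemaOn is false, errors are suppressed and logged.
Parameters:
location - location of the data whose schema is read
job - the Job object
isSchemaOn - if false, errors are suppressed and logged
Throws:
IOException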
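A minimal sketch of the "optional metadata" contract described for both getSchema overloads (and for getStatistics), assuming the two-argument getSchema behaves like the three-argument form with isSchemaOn set to false. `OptionalMetadata` and `IOReader` are hypothetical names, and treating only IOException as suppressible is an assumption about the real class.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper illustrating "suppress and log unless strict";
// it is not part of the Pig API.
final class OptionalMetadata {
    private static final Logger LOG = Logger.getLogger(OptionalMetadata.class.getName());

    /** A read step that may fail with an IOException. */
    @FunctionalInterface
    interface IOReader<T> {
        T read() throws IOException;
    }

    private OptionalMetadata() {}

    /**
     * Runs the reader. When strict is true (like isSchemaOn == true), failures
     * propagate to the caller; when false, they are logged and null is returned.
     */
    static <T> T read(IOReader<T> reader, boolean strict) throws IOException {
        try {
            return reader.read();
        } catch (IOException e) {
            if (strict) {
                throw e;
            }
            LOG.log(Level.WARNING, "Ignoring unreadable metadata", e);
            return null;
        }
    }
}
```

Returning null in the lenient path lets a loader continue without a schema, while strict mode surfaces the failure to the caller.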
public ResourceStatistics getStatistics(String location, org.apache.hadoop.mapreduce.Job job) throws IOException
For JsonMetadata, statistics are considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getStatistics in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set or query any runtime job information.
Throws:
IOException - if an exception occurs while retrieving statistics
See Also:
LoadMetadata.getStatistics(String, Job)
public void storeStatistics(ResourceStatistics stats, String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: StoreMetadata
Store statistics about the data being written.
Specified by:
storeStatistics in interface StoreMetadata
Parameters:
stats - statistics to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set or query any runtime job information.
Throws:
IOException
public void storeSchema(ResourceSchema schema, String location, org.apache.hadoop.mapreduce.Job job) throws IOException
Description copied from interface: StoreMetadata
Store the schema of the data being written.
Specified by:
storeSchema in interface StoreMetadata
Parameters:
schema - Schema to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object; this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set or query any runtime job information.
Throws:
IOException
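storeSchema and storeStatistics both record metadata alongside the output. The sketch below uses java.nio on a local directory and assumes two things not stated on this page: that the file lands at location/fileName (which matches the directory rule of findMetaFile on the load side), and that an existing metadata file is left untouched. `SidecarWriter` is a hypothetical helper; the real methods go through Hadoop storage and serialize ResourceSchema or ResourceStatistics themselves, presumably as JSON given the class name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical local-filesystem sketch of writing a metadata sidecar file;
// the JSON text is passed in already serialized.
final class SidecarWriter {
    private SidecarWriter() {}

    /**
     * Writes json to location/fileName unless that file already exists.
     * Returns true if this call created the file.
     */
    static boolean writeIfAbsent(Path location, String fileName, String json) throws IOException {
        Files.createDirectories(location);
        Path target = location.resolve(fileName);
        try {
            // CREATE_NEW fails atomically if another writer got there first.
            Files.write(target, json.getBytes(StandardCharsets.UTF_8), StandardOpenOption.CREATE_NEW);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // keep the metadata already present (assumed behaviour)
        }
    }
}
```

Using CREATE_NEW instead of a separate existence check avoids a race when several tasks store metadata for the same output location.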
public void setFieldDel(byte fieldDel)
public void setRecordDel(byte recordDel)
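This page does not document setFieldDel and setRecordDel. One plausible use, stated here only as an assumption, is writing a header file of column names when a schema is stored: names separated by the field delimiter, the line ended by the record delimiter. `HeaderLine` is a hypothetical helper showing that byte layout.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: how field and record delimiters could shape a header
// line of column names. The layout is an assumption, not documented behaviour.
final class HeaderLine {
    private HeaderLine() {}

    static byte[] build(String[] fieldNames, byte fieldDel, byte recordDel) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < fieldNames.length; i++) {
            byte[] name = fieldNames[i].getBytes(StandardCharsets.UTF_8);
            out.write(name, 0, name.length);
            // Field delimiter between names, record delimiter after the last one.
            out.write(i < fieldNames.length - 1 ? fieldDel : recordDel);
        }
        return out.toByteArray();
    }
}
```

For example, names {"id", "name"} with a tab field delimiter and a newline record delimiter produce the bytes of "id\tname\n".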
Copyright © 2007-2012 The Apache Software Foundation