public class JsonMetadata extends java.lang.Object implements LoadMetadata, StoreMetadata
Constructor and Description |
---|
JsonMetadata() |
JsonMetadata(java.lang.String schemaFileName, java.lang.String headerFileName, java.lang.String statFileName) |
Modifier and Type | Method and Description |
---|---|
protected java.util.Set<ElementDescriptor> | findMetaFile(java.lang.String path, java.lang.String metaname, org.apache.hadoop.conf.Configuration conf): For each object represented by the path (either directly, or via a glob), find the corresponding metadata file. |
java.lang.String[] | getPartitionKeys(java.lang.String location, org.apache.hadoop.mapreduce.Job job): Find what columns are partition keys for this input. |
ResourceSchema | getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job): For JsonMetadata the schema is considered optional; this method suppresses (and logs) errors if they are encountered. |
ResourceSchema | getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job, boolean isSchemaOn): Read the schema from the JSON metadata file; if the isSchemaOn parameter is false, errors are suppressed and logged. |
ResourceStatistics | getStatistics(java.lang.String location, org.apache.hadoop.mapreduce.Job job): For JsonMetadata stats are considered optional; this method suppresses (and logs) errors if they are encountered. |
void | setFieldDel(byte fieldDel) |
void | setPartitionFilter(Expression partitionFilter): Set the filter for partitioning. |
void | setRecordDel(byte recordDel) |
void | storeSchema(ResourceSchema schema, java.lang.String location, org.apache.hadoop.mapreduce.Job job): Store the schema of the data being written. |
void | storeStatistics(ResourceStatistics stats, java.lang.String location, org.apache.hadoop.mapreduce.Job job): Store statistics about the data being written. |
public JsonMetadata()
public JsonMetadata(java.lang.String schemaFileName, java.lang.String headerFileName, java.lang.String statFileName)
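A minimal construction sketch follows, showing the no-argument constructor and the three-argument form that names the schema, header, and statistics side files explicitly. The file names used are illustrative: .pig_schema and .pig_stats appear later on this page under findMetaFile, while .pig_header is an assumption made for the example.

```java
import org.apache.pig.builtin.JsonMetadata;

public class ConstructSketch {
    public static void main(String[] args) {
        // No-argument constructor.
        JsonMetadata defaults = new JsonMetadata();

        // Explicit names for the schema, header, and statistics side files.
        // These values are examples only, not documented defaults.
        JsonMetadata custom = new JsonMetadata(".pig_schema", ".pig_header", ".pig_stats");

        System.out.println("created: " + defaults + ", " + custom);
    }
}
```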
protected java.util.Set<ElementDescriptor> findMetaFile(java.lang.String path, java.lang.String metaname, org.apache.hadoop.conf.Configuration conf) throws java.io.IOException
For each object represented by the path (either directly, or via a glob): if the object is a directory and path/metaname exists, use that as the metadata file; else if parentPath/metaname exists, use that as the metadata file.
Resolving conflicts, merging the metadata, etc. is not handled by this method and should be taken care of by downstream code.
Parameters:
path - Path, as passed in to a LoadFunc (may be a Hadoop glob)
metaname - Metadata file designation, such as .pig_schema or .pig_stats
conf - configuration object
Throws:
java.io.IOException
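The directory/parent-directory resolution rule above can be sketched with plain Hadoop file-system calls. The snippet below is only an illustration of that rule, not Pig's implementation: it handles a single path, ignores globs, and uses a hypothetical helper name.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MetaFileLookupSketch {
    // Hypothetical helper mirroring the rule described for findMetaFile:
    // prefer <path>/<metaname> when path is a directory, otherwise fall back
    // to <parent of path>/<metaname>.
    public static Path locateMetaFile(String path, String metaname, Configuration conf)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path p = new Path(path);
        if (fs.exists(p) && fs.getFileStatus(p).isDir()) {
            Path inDir = new Path(p, metaname);
            if (fs.exists(inDir)) {
                return inDir;                      // path is a directory and path/metaname exists
            }
        }
        Path inParent = new Path(p.getParent(), metaname);
        return fs.exists(inParent) ? inParent : null;  // else try parentPath/metaname
    }
}
```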
public java.lang.String[] getPartitionKeys(java.lang.String location, org.apache.hadoop.mapreduce.Job job)
Description copied from interface: LoadMetadata
Find what columns are partition keys for this input.
Specified by:
getPartitionKeys in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
public void setPartitionFilter(Expression partitionFilter) throws java.io.IOException
Description copied from interface: LoadMetadata
Set the filter for partitioning. If the implementation returns null in LoadMetadata.getPartitionKeys(String, Job), then this method is not called by Pig runtime. This method is also not called by the Pig runtime if there are no partition filter conditions.
Specified by:
setPartitionFilter in interface LoadMetadata
Parameters:
partitionFilter - that describes filter for partitioning
Throws:
java.io.IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.
public ResourceSchema getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
For JsonMetadata the schema is considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getSchema in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException - if an exception occurs while determining the schema
public ResourceSchema getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job, boolean isSchemaOn) throws java.io.IOException
Read the schema from the JSON metadata file. If the isSchemaOn parameter is false, errors are suppressed and logged.
Parameters:
location -
job -
isSchemaOn -
Throws:
java.io.IOException
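As a quick orientation, the sketch below reads a schema through both overloads: the two-argument form, for which the schema is optional and errors are suppressed and logged, and the three-argument form with isSchemaOn set to true, for which they are not. The input path and the use of new Job(new Configuration()) purely as a configuration carrier are assumptions made for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.ResourceSchema;
import org.apache.pig.builtin.JsonMetadata;

public class ReadSchemaSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());   // used only to carry cluster configuration
        JsonMetadata metadata = new JsonMetadata();

        // Optional lookup: errors while reading the schema file are suppressed and logged.
        ResourceSchema optional = metadata.getSchema("/data/events", job);

        // Strict lookup: with isSchemaOn=true errors are not suppressed;
        // the method declares IOException.
        ResourceSchema required = metadata.getSchema("/data/events", job, true);

        System.out.println("optional schema: " + optional);
        System.out.println("required schema: " + required);
    }
}
```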
public ResourceStatistics getStatistics(java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
For JsonMetadata stats are considered optional. This method suppresses (and logs) errors if they are encountered.
Specified by:
getStatistics in interface LoadMetadata
Parameters:
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException - if an exception occurs while retrieving statistics
See Also:
LoadMetadata.getStatistics(String, Job)
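A statistics lookup mirrors the schema lookup above; since stats are optional for JsonMetadata, callers should expect read problems to be suppressed and logged rather than thrown. The path and Job construction are again assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.ResourceStatistics;
import org.apache.pig.builtin.JsonMetadata;

public class ReadStatisticsSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());
        JsonMetadata metadata = new JsonMetadata();

        // Statistics are optional for JsonMetadata; failures while reading the
        // stats file are suppressed and logged rather than propagated.
        ResourceStatistics stats = metadata.getStatistics("/data/events", job);
        System.out.println("statistics: " + stats);
    }
}
```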
public void storeStatistics(ResourceStatistics stats, java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
Description copied from interface: StoreMetadata
Store statistics about the data being written.
Specified by:
storeStatistics in interface StoreMetadata
Parameters:
stats - statistics to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException
public void storeSchema(ResourceSchema schema, java.lang.String location, org.apache.hadoop.mapreduce.Job job) throws java.io.IOException
Description copied from interface: StoreMetadata
Store the schema of the data being written.
Specified by:
storeSchema in interface StoreMetadata
Parameters:
schema - Schema to be recorded
location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
job - The Job object - this should be used only to obtain cluster properties through JobContext.getConfiguration() and not to set/query any runtime job information.
Throws:
java.io.IOException
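For the store side, the sketch below builds a ResourceSchema from a Pig schema string and records it, together with an empty ResourceStatistics, for an output location. The output path, the schema string, and the use of Utils.getSchemaFromString and new Job(new Configuration()) are choices made for the example, not behavior documented on this page.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.pig.ResourceSchema;
import org.apache.pig.ResourceStatistics;
import org.apache.pig.builtin.JsonMetadata;
import org.apache.pig.impl.logicalLayer.schema.Schema;
import org.apache.pig.impl.util.Utils;

public class StoreMetadataSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration());
        JsonMetadata metadata = new JsonMetadata();

        // Build a ResourceSchema from a Pig schema string (schema and path are
        // illustrative values).
        Schema pigSchema = Utils.getSchemaFromString("name:chararray, age:int");
        ResourceSchema schema = new ResourceSchema(pigSchema);

        // Record the schema alongside the data written to the output location.
        metadata.storeSchema(schema, "/output/people", job);

        // Statistics are recorded the same way; the fields are left unset in this sketch.
        metadata.storeStatistics(new ResourceStatistics(), "/output/people", job);
    }
}
```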
public void setFieldDel(byte fieldDel)
public void setRecordDel(byte recordDel)
Copyright © 2007-2012 The Apache Software Foundation