LoadMetadata (Pig 0.18.0 API)

All Known Implementing Classes:

AvroStorage, BinStorage, InterStorage, JsonLoader, JsonMetadata, LoadFuncMetadataWrapper, OrcStorage, ParquetLoader, PigStorage, ReadToEndLoader, SequenceFileInterStorage, Storage, TFileStorage, TrevniStorage
```
@InterfaceAudience.Public
@InterfaceStability.Evolving
public interface LoadMetadata
```
This interface defines how to retrieve metadata about the data to be loaded. If a loader does not implement this interface, Pig assumes that it cannot provide metadata about the associated data.

Since:

Pig 0.7

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `java.lang.String[]` | `getPartitionKeys(java.lang.String location, org.apache.hadoop.mapreduce.Job job)` Find what columns are partition keys for this input. |
| `ResourceSchema` | `getSchema(java.lang.String location, org.apache.hadoop.mapreduce.Job job)` Get a schema for the data to be loaded. |
| `ResourceStatistics` | `getStatistics(java.lang.String location, org.apache.hadoop.mapreduce.Job job)` Get statistics about the data to be loaded. |
| `void` | `setPartitionFilter(Expression partitionFilter)` Set the filter for partitioning. |

- Method Detail
  - getSchema
```
ResourceSchema getSchema(java.lang.String location,
                         org.apache.hadoop.mapreduce.Job job)
                  throws java.io.IOException
```
    Get a schema for the data to be loaded.
    
    Parameters:
    
    location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
    
    job - The Job object - this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set/query any runtime job information.
    
    Returns:
    
    schema for the data to be loaded. This schema should represent all tuples of the returned data. If the schema is unknown, or if it is not possible to return a schema that represents all returned data, then null should be returned. The schema should not be affected by pushProjection, i.e., getSchema should always return the original schema, even after pushProjection has been called.
    
    Throws:
    
    java.io.IOException - if an exception occurs while determining the schema
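
    The return contract above (the original schema regardless of projection, or null when the schema is unknown) can be sketched without any Pig dependency. The class below is a hypothetical stand-in, not a Pig class: the schema is reduced to a list of field names, and `pushProjection` here is a simplified placeholder for the real projection push-down call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a loader; NOT part of the Pig API.
// A schema is modelled as a plain list of field names for illustration.
class SketchSchemaLoader {
    private final List<String> fullSchema; // null means "schema unknown"
    private List<String> projected;        // fields the loader will actually read

    SketchSchemaLoader(List<String> fullSchema) {
        this.fullSchema = fullSchema;
        this.projected = fullSchema;
    }

    // Simplified placeholder for projection push-down: narrows what is read,
    // but deliberately leaves fullSchema untouched.
    void pushProjection(List<String> requiredFields) {
        this.projected = new ArrayList<>(requiredFields);
    }

    // Mirrors the documented contract: always the original schema,
    // or null when no schema can be given.
    List<String> getSchema() {
        return fullSchema == null ? null : new ArrayList<>(fullSchema);
    }

    // What the loader would read after projection (an assumption of this sketch).
    List<String> fieldsToRead() {
        return projected;
    }
}
```

    Keeping the projected field list separate from the stored schema is one way to make sure a later getSchema call is not affected by an earlier projection.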
  - getStatistics
```
ResourceStatistics getStatistics(java.lang.String location,
                                 org.apache.hadoop.mapreduce.Job job)
                          throws java.io.IOException
```
    Get statistics about the data to be loaded. If no statistics are available, then null should be returned. If the implementing class also extends LoadFunc, then LoadFunc.setLocation(String, org.apache.hadoop.mapreduce.Job) is guaranteed to be called before this method.
    
    Parameters:
    
    location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
    
    job - The Job object - this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set/query any runtime job information.
    
    Returns:
    
    statistics about the data to be loaded. If no statistics are available, then null should be returned.
    
    Throws:
    
    java.io.IOException - if an exception occurs while retrieving statistics
  - getPartitionKeys
```
java.lang.String[] getPartitionKeys(java.lang.String location,
                                    org.apache.hadoop.mapreduce.Job job)
                             throws java.io.IOException
```
    Find what columns are partition keys for this input.
    
    Parameters:
    
    location - Location as returned by LoadFunc.relativeToAbsolutePath(String, org.apache.hadoop.fs.Path)
    
    job - The Job object - this should be used only to obtain cluster properties through JobContextImpl.getConfiguration() and not to set/query any runtime job information.
    
    Returns:
    
    array of field names of the partition keys. Implementations should return null to indicate that there are no partition keys.
    
    Throws:
    
    java.io.IOException - if an exception occurs while retrieving partition keys
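
    As a minimal sketch of the return convention (an array of key names, or null when there are none), the helper below derives partition key names from a location string. It assumes, purely for illustration, that partition directories appear in the location as `key=value` path segments; the interface itself does not prescribe any layout, and `PartitionKeySketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; NOT part of the Pig API.
class PartitionKeySketch {
    // Returns the key names of all "key=value" path segments in order,
    // or null when the location has no such segment.
    static String[] partitionKeysFromPath(String location) {
        List<String> keys = new ArrayList<>();
        for (String segment : location.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) { // require a non-empty key before '='
                keys.add(segment.substring(0, eq));
            }
        }
        // null, not an empty array, signals "no partition keys"
        return keys.isEmpty() ? null : keys.toArray(new String[0]);
    }
}
```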
  - setPartitionFilter
```
void setPartitionFilter(Expression partitionFilter)
                 throws java.io.IOException
```
    Set the filter for partitioning. The filter is assumed to contain references only to fields returned as partition keys by getPartitionKeys(String, Job). If the implementation returns null from getPartitionKeys(String, Job), the Pig runtime does not call this method. The Pig runtime also does not call this method if there are no partition filter conditions.
    
    Parameters:
    
    partitionFilter - expression that describes the filter for partitioning
    
    Throws:
    
    java.io.IOException - if the filter is not compatible with the storage mechanism or contains non-partition fields.
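
    The calling rules described for this method can be modelled in a few lines. Everything below is a hypothetical stand-in rather than Pig code: the filter is reduced to the list of field names it references (the real parameter is an Expression), the Job argument is omitted, and `offerFilter` only imitates the decision the runtime is described as making.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for the partition-related methods.
interface SketchPartitionedLoader {
    String[] getPartitionKeys(String location) throws IOException;
    void setPartitionFilter(List<String> filterFields) throws IOException;
}

// Records the filter it receives so the call sequence can be inspected.
class RecordingLoader implements SketchPartitionedLoader {
    private final String[] keys;  // null means "no partition keys"
    List<String> receivedFilter;  // stays null if setPartitionFilter is never called

    RecordingLoader(String[] keys) {
        this.keys = keys;
    }

    public String[] getPartitionKeys(String location) {
        return keys;
    }

    public void setPartitionFilter(List<String> filterFields) throws IOException {
        Set<String> allowed = new HashSet<>(Arrays.asList(keys));
        for (String field : filterFields) {
            if (!allowed.contains(field)) {
                // Matches the documented failure case: non-partition field in filter
                throw new IOException("Filter references non-partition field: " + field);
            }
        }
        receivedFilter = filterFields;
    }
}

class PartitionFilterContract {
    // Imitates the documented rules; returns true if the filter was passed on.
    static boolean offerFilter(SketchPartitionedLoader loader, String location,
                               List<String> filterFields) throws IOException {
        String[] keys = loader.getPartitionKeys(location);
        if (keys == null) {
            return false; // no partition keys: setPartitionFilter is not called
        }
        if (filterFields == null || filterFields.isEmpty()) {
            return false; // no partition filter conditions: not called either
        }
        loader.setPartitionFilter(filterFields);
        return true;
    }
}
```

    In this sketch a loader constructed with a null key array never sees a filter, while one constructed with keys such as `year` and `month` receives only filters over those fields and rejects anything else with an IOException.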
