org.apache.pig.impl.logicalLayer.schema
Class Schema

java.lang.Object
  extended by org.apache.pig.impl.logicalLayer.schema.Schema
All Implemented Interfaces:
Serializable, Cloneable

public class Schema
extends Object
implements Serializable, Cloneable

The Schema class encapsulates the notion of a schema for a relational operator. A schema is a list of columns that describe the output of a relational operator. Each column in the relation is represented as a FieldSchema, a static class inside the Schema. A column by definition has an alias, a type and a possible schema (if the column is a bag or a tuple). In addition, each column in the schema has a unique auto generated name used for tracking the lineage of the column in a sequence of statements. The lineage of the column is tracked using a map of the predecessors' columns to the operators that generate the predecessor columns. The predecessor columns are the columns required in order to generate the column under consideration. Similarly, a reverse lookup of operators that generate the predecessor column to the predecessor column is maintained.

See Also:
Serialized Form

Nested Class Summary
static class Schema.FieldSchema
           
 
Constructor Summary
Schema()
           
Schema(List<Schema.FieldSchema> fields)
           
Schema(Schema.FieldSchema fieldSchema)
          Create a schema with only one field.
Schema(Schema s)
          Copy Constructor.
 
Method Summary
 void add(Schema.FieldSchema f)
           
 void addAlias(String alias, Schema.FieldSchema fs)
           
static boolean castable(Schema cast, Schema input)
          Recursively compare two schemas to check if the input schema can be cast to the cast schema
 Schema clone()
          Make a deep copy of a schema.
static Schema copyAndLink(Schema s, LogicalOperator op)
          Make a copy of the given schema object and link the original with the copy using canonical name map.
 boolean equals(Object other)
          For two schemas to be equal, they have to be deeply equal.
static boolean equals(Schema schema, Schema other, boolean relaxInner, boolean relaxAlias)
          Recursively compare two schemas for equality
 Schema.FieldSchema findFieldSchema(String canonicalName)
          Look for a FieldSchema instance in the schema hierarchy which has the given canonical name.
static Schema generateNestedSchema(byte topLevelType, byte... innerTypes)
           
 Set<String> getAliases()
           
 Schema.FieldSchema getField(int fieldNum)
          Given a field number, find the associated FieldSchema.
 Schema.FieldSchema getField(String alias)
          Given an alias name, find the associated FieldSchema.
 List<Schema.FieldSchema> getFields()
           
 Schema.FieldSchema getFieldSubNameMatch(String alias)
          Given an alias name, find the associated FieldSchema.
static Schema getPigSchema(ResourceSchema rSchema)
           
 int getPosition(String alias)
          Given an alias, find the associated position of the field schema.
 int getPositionSubName(String alias)
          Given an alias, find the associated position of the field schema.
 int hashCode()
           
 boolean isTwoLevelAccessRequired()
           
 Schema merge(Schema other, boolean otherTakesAliasPrecedence)
          Merge this schema with the other schema
 Schema mergePrefixSchema(Schema other, boolean otherTakesAliasPrecedence)
          Recursively prefix merge two schemas
 Schema mergePrefixSchema(Schema other, boolean otherTakesAliasPrecedence, boolean allowMergeableTypes)
          Recursively prefix merge two schemas
static Schema mergeSchema(Schema schema, Schema other, boolean otherTakesAliasPrecedence)
          Recursively merge two schemas
static Schema mergeSchema(Schema schema, Schema other, boolean otherTakesAliasPrecedence, boolean allowDifferentSizeMerge, boolean allowIncompatibleTypes)
          Recursively merge two schemas
static Schema mergeSchemaByAlias(Schema schema1, Schema schema2)
          Merges two schemas using their column aliases (unlike the mergeSchema(..) functions, which merge using positions). Schemas will not be merged if types are incompatible, as per DataType.mergeType(..). For Tuples and Bags, SubSchemas have to be equal to be considered compatible.
static Schema mergeSchemasByAlias(Collection<Schema> schemas)
          Merges a collection of schemas using their column aliases (unlike the mergeSchema(..) functions, which merge using positions). Schemas will not be merged if types are incompatible, as per DataType.mergeType(..). For Tuples and Bags, SubSchemas have to be equal to be considered compatible.
 void printAliases()
           
 void reconcile(Schema other)
          Reconcile this schema with another schema.
static void setSchemaDefaultType(Schema s, byte t)
          Recursively set NULL type to the specified type in a schema
 void setTwoLevelAccessRequired(boolean twoLevelAccess)
           
 int size()
          Find the number of fields in the schema.
static void stringifySchema(StringBuilder sb, Schema schema, byte type)
           
 String toString()
           
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Schema

public Schema()

Schema

public Schema(List<Schema.FieldSchema> fields)
Parameters:
fields - List of field schemas that describes the fields.

Schema

public Schema(Schema.FieldSchema fieldSchema)
Create a schema with only one field.

Parameters:
fieldSchema - field to put in this schema.

Schema

public Schema(Schema s)
Copy Constructor.

Parameters:
s - source schema
Method Detail

copyAndLink

public static Schema copyAndLink(Schema s,
                                 LogicalOperator op)
Make a copy of the given schema object and link the original with the copy using canonical name map.

Parameters:
s - The original schema
op - The operator to which the original belongs
Returns:
a new copy

getField

public Schema.FieldSchema getField(String alias)
                            throws FrontendException
Given an alias name, find the associated FieldSchema.

Parameters:
alias - Alias to look up.
Returns:
FieldSchema, or null if no such alias is in this tuple.
Throws:
FrontendException

getFieldSubNameMatch

public Schema.FieldSchema getFieldSubNameMatch(String alias)
                                        throws FrontendException
Given an alias name, find the associated FieldSchema. If the exact name is not found, check whether any field matches part of the 'namespaced' alias. For example, if the given alias is nm::a and the schema is (a, b), it returns the FieldSchema of a. If the given alias is nm::a and the schema is (nm2::a, b), it returns null.

Parameters:
alias - Alias to look up.
Returns:
FieldSchema, or null if no such alias is in this tuple.
Throws:
FrontendException
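
The lookup rule described for getFieldSubNameMatch can be illustrated with a small, self-contained sketch. It is not Pig's implementation: fields are modeled as plain alias strings, the class and method names (SubNameMatchSketch, findSubNameMatch) are hypothetical, and handling of ambiguous matches is not modeled (the first match wins here, which is an assumption).

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of sub-name alias lookup; NOT Pig's implementation. */
final class SubNameMatchSketch {
    private static final String SEP = "::";

    /**
     * Returns the matching field alias, or null if nothing matches.
     * Fields are modeled as plain alias strings (an assumption for brevity).
     */
    static String findSubNameMatch(List<String> fieldAliases, String alias) {
        if (alias == null) {
            return null;
        }
        // Step 1: an exact match always wins.
        if (fieldAliases.contains(alias)) {
            return alias;
        }
        // Step 2: for a namespaced alias, accept a field whose alias is
        // the part that follows a "::" separator.
        if (alias.contains(SEP)) {
            for (String field : fieldAliases) {
                if (field != null && alias.endsWith(SEP + field)) {
                    return field; // assumption: first match wins
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Schema (a, b), alias nm::a
        System.out.println(findSubNameMatch(Arrays.asList("a", "b"), "nm::a"));
        // Schema (nm2::a, b), alias nm::a
        System.out.println(findSubNameMatch(Arrays.asList("nm2::a", "b"), "nm::a"));
    }
}
```

In this model the first call finds the field a, while the second finds nothing because nm2::a is neither equal to nm::a nor a suffix of it after a separator.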

getField

public Schema.FieldSchema getField(int fieldNum)
                            throws FrontendException
Given a field number, find the associated FieldSchema.

Parameters:
fieldNum - Field number to look up.
Returns:
FieldSchema for this field.
Throws:
ParseException - if the field number exceeds the number of fields in the tuple.
FrontendException

size

public int size()
Find the number of fields in the schema.

Returns:
number of fields.

reconcile

public void reconcile(Schema other)
               throws FrontendException
Reconcile this schema with another schema. The schema being reconciled with should have the same number of columns. The use case is where a schema already exists but may not have alias and/or type information. If an alias exists in this schema and a new one is given, then the new one will be used. The same applies to types, though this should be used carefully, as types should not be changed lightly.

Parameters:
other - Schema to reconcile with.
Throws:
ParseException - if this cannot be reconciled.
FrontendException
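
The reconcile rules above can be sketched without Pig on the classpath. This is a minimal model under stated assumptions, not the library's code: Column is a hypothetical stand-in for FieldSchema, a null alias or type stands for "not known", and IllegalArgumentException stands in for the exception that the real method throws.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of schema reconciliation; NOT Pig's implementation. */
final class ReconcileSketch {
    /** Hypothetical stand-in for a column; alias and type may be null (unknown). */
    static final class Column {
        String alias;
        String type;

        Column(String alias, String type) {
            this.alias = alias;
            this.type = type;
        }
    }

    /** Updates the columns of {@code mine} in place using {@code other}. */
    static void reconcile(List<Column> mine, List<Column> other) {
        // Both schemas must have the same number of columns.
        if (mine.size() != other.size()) {
            throw new IllegalArgumentException("Column counts differ: "
                    + mine.size() + " vs " + other.size());
        }
        for (int i = 0; i < mine.size(); i++) {
            Column ours = mine.get(i);
            Column theirs = other.get(i);
            // A newly supplied alias replaces the existing one.
            if (theirs.alias != null) {
                ours.alias = theirs.alias;
            }
            // Likewise for types; use with care.
            if (theirs.type != null) {
                ours.type = theirs.type;
            }
        }
    }

    public static void main(String[] args) {
        List<Column> mine = Arrays.asList(new Column(null, "int"), new Column("b", null));
        List<Column> other = Arrays.asList(new Column("a", null), new Column("name", "chararray"));
        reconcile(mine, other);
        for (Column c : mine) {
            System.out.println(c.alias + ":" + c.type);
        }
    }
}
```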

equals

public boolean equals(Object other)
For two schemas to be equal, they have to be deeply equal. Use Schema.equals(Schema schema, Schema other, boolean relaxInner, boolean relaxAlias) if relaxation of aliases is a requirement.

Overrides:
equals in class Object

clone

public Schema clone()
             throws CloneNotSupportedException
Make a deep copy of a schema.

Overrides:
clone in class Object
Throws:
CloneNotSupportedException

hashCode

public int hashCode()
Overrides:
hashCode in class Object

toString

public String toString()
Overrides:
toString in class Object

stringifySchema

public static void stringifySchema(StringBuilder sb,
                                   Schema schema,
                                   byte type)
                            throws FrontendException
Throws:
FrontendException

add

public void add(Schema.FieldSchema f)

getPosition

public int getPosition(String alias)
                throws FrontendException
Given an alias, find the associated position of the field schema.

Parameters:
alias - alias of the FieldSchema.
Returns:
position of the FieldSchema.
Throws:
FrontendException

getPositionSubName

public int getPositionSubName(String alias)
                       throws FrontendException
Given an alias, find the associated position of the field schema. It uses getFieldSubNameMatch to look for subName matches as well.

Parameters:
alias - alias of the FieldSchema.
Returns:
position of the FieldSchema.
Throws:
FrontendException

addAlias

public void addAlias(String alias,
                     Schema.FieldSchema fs)

getAliases

public Set<String> getAliases()

printAliases

public void printAliases()

getFields

public List<Schema.FieldSchema> getFields()

castable

public static boolean castable(Schema cast,
                               Schema input)
Recursively compare two schemas to check if the input schema can be cast to the cast schema

Parameters:
cast - schema of the cast operator
input - schema of the cast input
Returns:
true if the input schema can be cast to the cast schema, false otherwise

equals

public static boolean equals(Schema schema,
                             Schema other,
                             boolean relaxInner,
                             boolean relaxAlias)
Recursively compare two schemas for equality

Parameters:
schema -
other -
relaxInner - if true, inner schemas will not be checked
relaxAlias - if true, aliases will not be checked
Returns:
true if schemas are equal, false otherwise

merge

public Schema merge(Schema other,
                    boolean otherTakesAliasPrecedence)
Merge this schema with the other schema

Parameters:
other - the other schema to be merged with
otherTakesAliasPrecedence - true if aliases from the other schema take precedence
Returns:
the merged schema, null if they are not compatible

mergeSchema

public static Schema mergeSchema(Schema schema,
                                 Schema other,
                                 boolean otherTakesAliasPrecedence)
Recursively merge two schemas

Parameters:
schema - the initial schema
other - the other schema to be merged with
otherTakesAliasPrecedence - true if aliases from the other schema take precedence
Returns:
the merged schema, null if they are not compatible

mergeSchema

public static Schema mergeSchema(Schema schema,
                                 Schema other,
                                 boolean otherTakesAliasPrecedence,
                                 boolean allowDifferentSizeMerge,
                                 boolean allowIncompatibleTypes)
                          throws SchemaMergeException
Recursively merge two schemas

Parameters:
schema - the initial schema
other - the other schema to be merged with
otherTakesAliasPrecedence - true if aliases from the other schema take precedence
allowDifferentSizeMerge - allow merging of schemas of different sizes
allowIncompatibleTypes - if true: 1) types that are not compatible will be treated as ByteArray (untyped); 2) inner schemas that are not compatible will be null in the output.
Returns:
the merged schema; this can be null if one schema is null and allowIncompatibleTypes is true
Throws:
SchemaMergeException - if they cannot be merged

mergeSchemasByAlias

public static Schema mergeSchemasByAlias(Collection<Schema> schemas)
                                  throws SchemaMergeException
Merges a collection of schemas using their column aliases (unlike the mergeSchema(..) functions, which merge using positions). Schemas will not be merged if types are incompatible, as per DataType.mergeType(..). For Tuples and Bags, SubSchemas have to be equal to be considered compatible.

Parameters:
schemas - list of schemas to be merged using their column aliases
Returns:
merged schema
Throws:
SchemaMergeException

mergeSchemaByAlias

public static Schema mergeSchemaByAlias(Schema schema1,
                                        Schema schema2)
                                 throws SchemaMergeException
Merges two schemas using their column aliases (unlike the mergeSchema(..) functions, which merge using positions). Schemas will not be merged if types are incompatible, as per DataType.mergeType(..). For Tuples and Bags, SubSchemas have to be equal to be considered compatible.

Parameters:
schema1 -
schema2 -
Returns:
Merged Schema
Throws:
SchemaMergeException - if schemas cannot be merged
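
To make the difference from position-based merging concrete, the following is a minimal sketch of merging by alias. It is a simplified model, not Pig's code: each schema is an ordered map from alias to a type name, types must match exactly (the real method defers to DataType.mergeType(..), which is not modeled here), and IllegalStateException stands in for SchemaMergeException. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of alias-based schema merging; NOT Pig's implementation. */
final class MergeByAliasSketch {
    /**
     * Merges schemas column by column, keyed on alias rather than position.
     * Assumption: every column has a non-null alias and type name.
     */
    static Map<String, String> mergeByAlias(List<Map<String, String>> schemas) {
        // LinkedHashMap keeps the order in which aliases are first seen.
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> schema : schemas) {
            for (Map.Entry<String, String> column : schema.entrySet()) {
                String existingType = merged.get(column.getKey());
                if (existingType == null) {
                    merged.put(column.getKey(), column.getValue());
                } else if (!existingType.equals(column.getValue())) {
                    // Simplification: any type difference counts as incompatible.
                    throw new IllegalStateException("Incompatible types for alias "
                            + column.getKey() + ": " + existingType
                            + " vs " + column.getValue());
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> first = new LinkedHashMap<>();
        first.put("a", "int");
        first.put("b", "chararray");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("b", "chararray");
        second.put("c", "long");
        System.out.println(mergeByAlias(Arrays.asList(first, second)));
    }
}
```

Because columns are matched by alias, the column b in the second schema lines up with b in the first even though it sits at a different position.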

generateNestedSchema

public static Schema generateNestedSchema(byte topLevelType,
                                          byte... innerTypes)
                                   throws FrontendException
Parameters:
topLevelType - DataType type of the top level element
innerTypes - DataType types of the inner level element
Returns:
nested schema representing type of top level element at first level and inner schema representing types of inner element(s)
Throws:
FrontendException

mergePrefixSchema

public Schema mergePrefixSchema(Schema other,
                                boolean otherTakesAliasPrecedence)
                         throws SchemaMergeException
Recursively prefix merge two schemas

Parameters:
other - the other schema to be merged with
otherTakesAliasPrecedence - true if aliases from the other schema take precedence
Returns:
the prefix merged schema; this can be null if one schema is null and allowIncompatibleTypes is true
Throws:
SchemaMergeException - if they cannot be merged

mergePrefixSchema

public Schema mergePrefixSchema(Schema other,
                                boolean otherTakesAliasPrecedence,
                                boolean allowMergeableTypes)
                         throws SchemaMergeException
Recursively prefix merge two schemas

Parameters:
other - the other schema to be merged with
otherTakesAliasPrecedence - true if aliases from the other schema take precedence
allowMergeableTypes - true if "mergeable" types should be allowed. Two types are mergeable if any of the following conditions is true, checked IN THE ORDER BELOW:
  1) If either one has a type null or unknown and the other has a type OTHER THAN null or unknown, the result type will be the latter non-null/unknown type.
  2) If either type is bytearray, then the result type will be the other (possibly non-BYTEARRAY) type.
  3) If the current type can be cast to the other type, then the result type will be the other type.
Returns:
the prefix merged schema; this can be null if one schema is null and allowIncompatibleTypes is true
Throws:
SchemaMergeException - if they cannot be merged
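
The ordered checks described for allowMergeableTypes can be sketched as follows. This is an illustrative model only: the Type enum, the castable relation (INTEGER to LONG only) and the class name are assumptions made for the example and do not reflect Pig's DataType constants or its real casting rules. Treating identical types as trivially mergeable is also an assumption.

```java
import java.util.Optional;

/** Simplified model of the "mergeable types" checks; NOT Pig's implementation. */
final class MergeableTypeSketch {
    /** Hypothetical type set used only for this illustration. */
    enum Type { NULL, UNKNOWN, BYTEARRAY, INTEGER, LONG, CHARARRAY }

    static boolean isNullOrUnknown(Type t) {
        return t == Type.NULL || t == Type.UNKNOWN;
    }

    /** Hypothetical cast relation: only INTEGER -> LONG is allowed here. */
    static boolean castable(Type from, Type to) {
        return from == Type.INTEGER && to == Type.LONG;
    }

    /** Returns the result type, or empty if the two types are not mergeable. */
    static Optional<Type> mergeType(Type current, Type other) {
        // Assumption: identical types merge to themselves.
        if (current == other) {
            return Optional.of(current);
        }
        // Check 1: one side is null/unknown, the other is a real type.
        if (isNullOrUnknown(current) && !isNullOrUnknown(other)) {
            return Optional.of(other);
        }
        if (isNullOrUnknown(other) && !isNullOrUnknown(current)) {
            return Optional.of(current);
        }
        // Check 2: one side is bytearray, so the other side's type is used.
        if (current == Type.BYTEARRAY) {
            return Optional.of(other);
        }
        if (other == Type.BYTEARRAY) {
            return Optional.of(current);
        }
        // Check 3: the current type can be cast to the other type.
        if (castable(current, other)) {
            return Optional.of(other);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(mergeType(Type.UNKNOWN, Type.INTEGER));
        System.out.println(mergeType(Type.BYTEARRAY, Type.CHARARRAY));
        System.out.println(mergeType(Type.INTEGER, Type.LONG));
        System.out.println(mergeType(Type.CHARARRAY, Type.INTEGER));
    }
}
```

The order matters: an unknown type paired with bytearray resolves to bytearray under check 1 before check 2 is ever consulted.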

setSchemaDefaultType

public static void setSchemaDefaultType(Schema s,
                                        byte t)
Recursively set NULL type to the specified type in a schema

Parameters:
s - the schema whose NULL type has to be set
t - the specified type

isTwoLevelAccessRequired

public boolean isTwoLevelAccessRequired()
Returns:
the twoLevelAccess

setTwoLevelAccessRequired

public void setTwoLevelAccessRequired(boolean twoLevelAccess)
Parameters:
twoLevelAccess - the twoLevelAccess to set

getPigSchema

public static Schema getPigSchema(ResourceSchema rSchema)
                           throws FrontendException
Throws:
FrontendException

findFieldSchema

public Schema.FieldSchema findFieldSchema(String canonicalName)
Look for a FieldSchema instance in the schema hierarchy which has the given canonical name.

Parameters:
canonicalName - canonical name
Returns:
the FieldSchema instance found


Copyright © ${year} The Apache Software Foundation