org.apache.hadoop.zebra.schema
Class Schema

java.lang.Object
  extended by org.apache.hadoop.zebra.schema.Schema
All Implemented Interfaces:
Comparable<Schema>, org.apache.hadoop.io.Writable

public class Schema
extends Object
implements Comparable<Schema>, org.apache.hadoop.io.Writable

Logical schema of tabular data.


Nested Class Summary
static class Schema.ColumnSchema
          Schema of a single column within a Schema
static class Schema.ParsedName
          Helper class to parse a column name string one section at a time and find the required type for the parsed part.
 
Field Summary
static String COLUMN_DELIMITER
           
 
Constructor Summary
Schema()
          Constructor - create an empty (zero-column) schema.
Schema(boolean projection)
          Constructor - create an empty (zero-column) schema or projection.
Schema(Schema.ColumnSchema fs)
           
Schema(String schema)
          Constructor - create a schema from a string representation.
Schema(String[] columns)
          Constructor - create a schema from an array of column names.
Schema(String schema, boolean projection)
           
 
Method Summary
 void add(Schema.ColumnSchema f)
          Add a column.
 int compareTo(Schema other)
           
 boolean equals(Object obj)
           
 Schema.ColumnSchema getColumn(int index)
          Get a particular column's schema
 Schema.ColumnSchema getColumn(String name)
          Get a column by name.
 int getColumnIndex(String name)
          Get the index of the column for the input column name.
 String getColumnName(int index)
           
 String[] getColumns()
          Get the names of the individual columns.
 Schema.ColumnSchema getColumnSchema(Schema.ParsedName pn)
          Find the most fitting subcolumn containing the name: the parsed name is set after the field name plus any following separator ('.' or '#').
 Schema.ColumnSchema getColumnSchema(String name)
          Get a column's schema
 Schema.ColumnSchema getColumnSchemaOnParsedName(Schema.ParsedName pn)
          Get a subcolumn's schema and move the name just parsed into the next subtype
 int getNumColumns()
          Get the number of columns as defined in the schema.
 Schema getProjectionSchema(String[] projcols, HashMap<Schema.ColumnSchema,HashSet<String>> keysmap)
          Get a projection's schema
 String[] getTypedColumns()
          Get the names and types of the individual columns.
static String normalize(String value)
          Normalize the schema string.
static Schema parse(String schema)
          Parse a schema string and create a schema object.
 void readFields(DataInput in)
           
 String toProjectionString()
          Return the untyped schema string for projection.
 String toString()
          Convert the schema to a String.
 void unionSchema(Schema other)
          Union compatible schemas.
 void write(DataOutput out)
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

COLUMN_DELIMITER

public static final String COLUMN_DELIMITER
See Also:
Constant Field Values

Constructor Detail

Schema

public Schema()
Constructor - create an empty (zero-column) schema.


Schema

public Schema(boolean projection)
Constructor - create an empty (zero-column) schema or projection.

Parameters:
projection - Whether the schema is a projection schema.

Schema

public Schema(String schema)
       throws ParseException
Constructor - create a schema from a string representation.

Parameters:
schema - A string representation of the schema. In this version, the schema string is a comma-separated list of column names. Comma (,) and space characters are illegal in column names. To maintain forward compatibility, use only alphanumeric characters in column names.
Throws:
ParseException
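
The naming advice above can be checked before a schema string is handed to the constructor. The following is a minimal sketch under stated assumptions: SchemaStringCheck and nonPortableNames are hypothetical names, not part of Zebra, and the check covers only the plain comma-separated list of names described for this parameter. It is stricter than the parser may be, because it flags every character that is not an ASCII letter or digit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Zebra: pre-checks a schema string
// against the column naming advice given for Schema(String).
final class SchemaStringCheck {

    // Splits the string on commas and returns, in order, every column name
    // that contains anything other than ASCII letters and digits.
    static List<String> nonPortableNames(String schema) {
        List<String> bad = new ArrayList<>();
        for (String name : schema.split(",", -1)) {
            if (!name.matches("[A-Za-z0-9]+")) {
                bad.add(name);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Only alphanumeric names: nothing is flagged.
        System.out.println(nonPortableNames("a,b,c1"));
        // A name with a space and a name with a hyphen are both flagged.
        System.out.println(nonPortableNames("a,b c,d-e"));
    }
}
```

An empty result means every name follows the forward-compatibility advice; it does not guarantee that the constructor accepts the string.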

Schema

public Schema(String schema,
              boolean projection)
       throws ParseException
Throws:
ParseException

Schema

public Schema(Schema.ColumnSchema fs)
       throws ParseException
Throws:
ParseException

Schema

public Schema(String[] columns)
       throws ParseException
Constructor - create a schema from an array of column names.

Parameters:
columns - An array of column names. To maintain forward compatibility, use only alphanumeric characters in column names.
Throws:
ParseException

Method Detail

add

public void add(Schema.ColumnSchema f)
         throws ParseException
Add a column.

Parameters:
f - Column to be added to the schema
Throws:
ParseException

getColumn

public Schema.ColumnSchema getColumn(String name)
Get a column by name.


getColumns

public String[] getColumns()
Get the names of the individual columns.

Returns:
An array of the column names.

getColumn

public Schema.ColumnSchema getColumn(int index)
Get a particular column's schema


getColumnName

public String getColumnName(int index)

getTypedColumns

public String[] getTypedColumns()
Get the names and types of the individual columns.

Returns:
An array of the column names and their types.

getColumnIndex

public int getColumnIndex(String name)
Get the index of the column for the input column name.

Parameters:
name - input column name.
Returns:
The column index if the name is valid; -1 otherwise.
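
The "index or -1" contract means callers should test the result before using it as a position. The sketch below illustrates that contract with a hypothetical stand-in class, ColumnIndexSketch; it is not the Zebra implementation, and the zero-based, declaration-order numbering is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not the Zebra implementation: mirrors the
// documented contract of getColumnIndex (index if valid, -1 otherwise).
final class ColumnIndexSketch {
    private final Map<String, Integer> indexByName = new HashMap<>();

    ColumnIndexSketch(String... columns) {
        for (int i = 0; i < columns.length; i++) {
            // Assumption: indexes are zero-based and follow declaration order.
            indexByName.putIfAbsent(columns[i], i);
        }
    }

    // Returns the column index, or -1 when the name is not a known column.
    int getColumnIndex(String name) {
        Integer index = indexByName.get(name);
        return index == null ? -1 : index;
    }

    public static void main(String[] args) {
        ColumnIndexSketch schema = new ColumnIndexSketch("a", "b", "c");
        int index = schema.getColumnIndex("missing");
        if (index < 0) {
            // Guard against the sentinel before using the value as a position.
            System.out.println("no such column");
        }
    }
}
```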

getNumColumns

public int getNumColumns()
Get the number of columns as defined in the schema.

Returns:
The number of columns as defined in the schema.

parse

public static Schema parse(String schema)
                    throws ParseException
Parse a schema string and create a schema object.

Parameters:
schema - comma-separated schema string.
Returns:
Schema object
Throws:
ParseException

toString

public String toString()
Convert the schema to a String.

Overrides:
toString in class Object
Returns:
the string representation of the schema.

normalize

public static String normalize(String value)
Normalize the schema string.

Parameters:
value - the input string representation of the schema.
Returns:
the normalized string representation.

compareTo

public int compareTo(Schema other)
Specified by:
compareTo in interface Comparable<Schema>
See Also:
Comparable.compareTo(Object)

equals

public boolean equals(Object obj)
Overrides:
equals in class Object
See Also:
Object.equals(Object)

readFields

public void readFields(DataInput in)
                throws IOException
Specified by:
readFields in interface org.apache.hadoop.io.Writable
Throws:
IOException
See Also:
Writable.readFields(DataInput)

write

public void write(DataOutput out)
           throws IOException
Specified by:
write in interface org.apache.hadoop.io.Writable
Throws:
IOException
See Also:
Writable.write(DataOutput)
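
The write and readFields pair follows the usual Writable pattern: write serializes the object to a DataOutput, and readFields repopulates an existing instance from a DataInput. The sketch below shows that round trip using only java.io. WritableRoundTripSketch is a hypothetical stand-in, and its wire format (a single UTF string) is an assumption made for illustration; it is not Zebra's actual serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in with the same write/readFields shape as a Writable.
// The wire format (one UTF string) is an assumption, not Zebra's format.
final class WritableRoundTripSketch {
    private String schemaString = "";

    WritableRoundTripSketch() {}

    WritableRoundTripSketch(String schemaString) {
        this.schemaString = schemaString;
    }

    String getSchemaString() {
        return schemaString;
    }

    void write(DataOutput out) throws IOException {
        out.writeUTF(schemaString);
    }

    void readFields(DataInput in) throws IOException {
        schemaString = in.readUTF();
    }

    // Serializes the original to a byte array, then fills a fresh instance from it.
    static WritableRoundTripSketch roundTrip(WritableRoundTripSketch original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            WritableRoundTripSketch copy = new WritableRoundTripSketch();
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WritableRoundTripSketch copy = roundTrip(new WritableRoundTripSketch("a,b,c"));
        System.out.println(copy.getSchemaString());
    }
}
```

With the real class, the same pattern would apply to a Schema created with the no-argument constructor and then filled by readFields.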

getProjectionSchema

public Schema getProjectionSchema(String[] projcols,
                                  HashMap<Schema.ColumnSchema,HashSet<String>> keysmap)
                           throws ParseException
Get a projection's schema

Throws:
ParseException

getColumnSchema

public Schema.ColumnSchema getColumnSchema(String name)
                                    throws ParseException
Get a column's schema

Parameters:
name - column name
Returns:
Column schema for the named column
Throws:
ParseException

getColumnSchemaOnParsedName

public Schema.ColumnSchema getColumnSchemaOnParsedName(Schema.ParsedName pn)
                                                throws ParseException
Get a subcolumn's schema and move the name just parsed into the next subtype

Parameters:
pn - The name of subcolumn to be parsed. On return it contains the subcolumn at the next level after parsing
Returns:
the discovered Column Schema for the subcolumn
Throws:
ParseException

getColumnSchema

public Schema.ColumnSchema getColumnSchema(Schema.ParsedName pn)
                                    throws ParseException
Find the most fitting subcolumn containing the name. The parsed name is set after the field name plus any following separator ('.' or '#'). This helps discover the most fitting column schema across multiple column group (CG) schemas. For instance, if pn contains the name r.r1.f11 and the current schema has r.r1:record(f11:int, f12), it returns f11's column schema, and pn is set at "f12".

Parameters:
pn - The name of subcolumn to be parsed. On return it contains the subcolumn at the next level after parsing
Returns:
the discovered Column Schema for the subcolumn
Throws:
ParseException

unionSchema

public void unionSchema(Schema other)
                 throws ParseException
Union compatible schemas. An exception is thrown if a name appears in multiple schemas with different types.

Throws:
ParseException
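
The compatibility rule can be pictured as a merge of name-to-type mappings that fails on a type conflict. The following is a minimal sketch under stated assumptions, not the Zebra implementation: UnionSketch is a hypothetical class, columns are modeled as a plain map from name to type string, and the conflict is reported with IllegalArgumentException where unionSchema itself declares ParseException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in, not the Zebra implementation: models a schema as
// an ordered map from column name to type string.
final class UnionSketch {

    // Merges 'other' into 'target'. Throws if a name present in both maps
    // has different types.
    static void union(Map<String, String> target, Map<String, String> other) {
        // Validate first so that a conflict leaves 'target' unchanged
        // (a design choice of this sketch).
        for (Map.Entry<String, String> entry : other.entrySet()) {
            String existing = target.get(entry.getKey());
            if (existing != null && !existing.equals(entry.getValue())) {
                throw new IllegalArgumentException(
                    "Column " + entry.getKey() + " has types "
                        + existing + " and " + entry.getValue());
            }
        }
        for (Map.Entry<String, String> entry : other.entrySet()) {
            target.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, String> left = new LinkedHashMap<>();
        left.put("x", "int");
        Map<String, String> right = new LinkedHashMap<>();
        right.put("x", "int");
        right.put("y", "string");
        union(left, right);
        // 'left' now holds both x and y.
        System.out.println(left.keySet());
    }
}
```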

toProjectionString

public String toProjectionString()
Return the untyped schema string for projection.



Copyright © 2007-2012 The Apache Software Foundation