org.apache.pig.data
Class DataType

java.lang.Object
  extended by org.apache.pig.data.DataType

@InterfaceAudience.Public
@InterfaceStability.Stable
public class DataType
extends Object

A class of static final values used to encode data types, plus a number of static helper functions for manipulating data objects. The data type values could have been an enumeration, but they are byte codes instead to avoid creating objects.
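To illustrate the byte-code design described above, here is a minimal standalone sketch that encodes a few types as `static final byte` constants and maps them to names with a `switch`. It does not use Pig itself: the class name `TypeCodesSketch`, the method `nameOf`, the returned names, and all numeric values are assumptions made for this example. The real values are listed under Constant Field Values.

```java
// Standalone sketch: type tags as primitive bytes rather than enum objects.
// The class name, the constant values, and nameOf are hypothetical.
public class TypeCodesSketch {
    public static final byte UNKNOWN = 0;     // placeholder value
    public static final byte INTEGER = 10;    // placeholder value
    public static final byte CHARARRAY = 55;  // placeholder value
    public static final byte ERROR = -1;      // placeholder value

    // Map a type code to a readable name; unrecognised codes map to "unknown".
    public static String nameOf(byte code) {
        switch (code) {
            case INTEGER:
                return "int";
            case CHARARRAY:
                return "chararray";
            case ERROR:
                return "error";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        // A column of type tags is just a byte array; no per-tag objects are needed.
        byte[] column = {INTEGER, CHARARRAY, INTEGER};
        for (byte code : column) {
            System.out.println(nameOf(code));
        }
    }
}
```

Because the constants are compile-time constants, they can be used directly as `case` labels, which is one practical benefit of this style.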


Field Summary
static byte BAG
           
static byte BIGCHARARRAY
          Internal use only.
static byte BOOLEAN
           
static byte BYTE
           
static byte BYTEARRAY
           
static byte CHARARRAY
           
static byte DOUBLE
           
static byte ERROR
           
static byte FLOAT
           
static byte GENERIC_WRITABLECOMPARABLE
          Internal use only; used to store WritableComparable objects for creating an ordered index in MergeJoin.
static byte INTEGER
           
static byte INTERNALMAP
          Internal use only.
static byte LONG
           
static byte MAP
           
static byte NULL
           
static byte TUPLE
           
static byte UNKNOWN
           
 
Constructor Summary
DataType()
           
 
Method Summary
static boolean castable(byte castType, byte inputType)
          Test if one type can cast to the other.
static int compare(Object o1, Object o2)
          Compare two objects to each other.
static int compare(Object o1, Object o2, byte dt1, byte dt2)
          Same as compare(Object, Object), but does not use reflection to determine the type of passed in objects, relying instead on the caller to provide the appropriate values, as determined by findType(Object).
static Schema.FieldSchema determineFieldSchema(Object o)
          Determine the field schema of an object
static Schema.FieldSchema determineFieldSchema(ResourceSchema.ResourceFieldSchema rcFieldSchema)
          Determine the field schema of a ResourceFieldSchema
static boolean equalByteArrays(byte[] lhs, byte[] rhs)
          Test whether two byte arrays (Java byte arrays not Pig byte arrays) are equal.
static byte findType(Object o)
          Determine the datatype of an object.
static byte findType(Type t)
          Given a Type object determine the data type it represents.
static String findTypeName(byte dt)
          Get the type name from the type byte code
static String findTypeName(Object o)
          Get the type name.
static byte[] genAllTypes()
          Get an array of all type values.
static Map<String,Byte> genNameToTypeMap()
          Get a map of type names to type values.
static Map<Byte,String> genTypeToNameMap()
          Get a map of type values to type names.
static boolean isAtomic(byte dataType)
          Determine whether this data type is atomic.
static boolean isAtomic(Object o)
          Determine whether this object's data type is atomic.
static boolean isComplex(byte dataType)
          Determine whether this data type is complex.
static boolean isComplex(Object o)
          Determine whether the object is complex or atomic.
static boolean isNumberType(byte t)
          Determine if this type is a numeric type.
static boolean isSchemaType(byte dataType)
          Determine whether this data type can have a schema.
static boolean isSchemaType(Object o)
          Determine whether this object can have a schema.
static boolean isUsableType(byte t)
          Determine if this is a type that work can be done on.
static String mapToString(Map<String,Object> m)
          Given a map, turn it into a String.
static byte mergeType(byte type1, byte type2)
          Merge types if possible.
static int numTypes()
          Return the number of types Pig knows about.
static void spillTupleContents(Tuple t, String label)
          Purely for debugging
static DataBag toBag(Object o)
          If this object is a bag, return it as a bag.
static byte[] toBytes(Object o)
           
static byte[] toBytes(Object o, byte type)
           
static Double toDouble(Object o)
          Force a data object to a Double, if possible.
static Double toDouble(Object o, byte type)
          Force a data object to a Double, if possible.
static Float toFloat(Object o)
          Force a data object to a Float, if possible.
static Float toFloat(Object o, byte type)
          Force a data object to a Float, if possible.
static Integer toInteger(Object o)
          Force a data object to an Integer, if possible.
static Integer toInteger(Object o, byte type)
          Force a data object to an Integer, if possible.
static Long toLong(Object o)
          Force a data object to a Long, if possible.
static Long toLong(Object o, byte type)
          Force a data object to a Long, if possible.
static Map<String,Object> toMap(Object o)
          If this object is a map, return it as a map.
static String toString(Object o)
          Force a data object to a String, if possible.
static String toString(Object o, byte type)
          Force a data object to a String, if possible.
static Tuple toTuple(Object o)
          If this object is a tuple, return it as a tuple.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

UNKNOWN

public static final byte UNKNOWN
See Also:
Constant Field Values

NULL

public static final byte NULL
See Also:
Constant Field Values

BOOLEAN

public static final byte BOOLEAN
See Also:
Constant Field Values

BYTE

public static final byte BYTE
See Also:
Constant Field Values

INTEGER

public static final byte INTEGER
See Also:
Constant Field Values

LONG

public static final byte LONG
See Also:
Constant Field Values

FLOAT

public static final byte FLOAT
See Also:
Constant Field Values

DOUBLE

public static final byte DOUBLE
See Also:
Constant Field Values

BYTEARRAY

public static final byte BYTEARRAY
See Also:
Constant Field Values

CHARARRAY

public static final byte CHARARRAY
See Also:
Constant Field Values

BIGCHARARRAY

public static final byte BIGCHARARRAY
Internal use only.

See Also:
Constant Field Values

MAP

public static final byte MAP
See Also:
Constant Field Values

TUPLE

public static final byte TUPLE
See Also:
Constant Field Values

BAG

public static final byte BAG
See Also:
Constant Field Values

GENERIC_WRITABLECOMPARABLE

public static final byte GENERIC_WRITABLECOMPARABLE
Internal use only; used to store WritableComparable objects for creating an ordered index in MergeJoin. Expects an object that implements the Writable interface and has a default constructor.

See Also:
Constant Field Values

INTERNALMAP

public static final byte INTERNALMAP
Internal use only.

See Also:
Constant Field Values

ERROR

public static final byte ERROR
See Also:
Constant Field Values
Constructor Detail

DataType

public DataType()
Method Detail

findType

public static byte findType(Object o)
Determine the datatype of an object.

Parameters:
o - Object to test.
Returns:
byte code of the type, or ERROR if we don't know.

findType

public static byte findType(Type t)
Given a Type object determine the data type it represents. This isn't cheap, as it uses reflection, so use sparingly.

Parameters:
t - Type to examine
Returns:
byte code of the type, or ERROR if we don't know.

numTypes

public static int numTypes()
Return the number of types Pig knows about.

Returns:
number of types

genAllTypes

public static byte[] genAllTypes()
Get an array of all type values.

Returns:
byte array with an entry for each type.

genTypeToNameMap

public static Map<Byte,String> genTypeToNameMap()
Get a map of type values to type names.

Returns:
map

genNameToTypeMap

public static Map<String,Byte> genNameToTypeMap()
Get a map of type names to type values.

Returns:
map

findTypeName

public static String findTypeName(Object o)
Get the type name.

Parameters:
o - Object to test.
Returns:
type name, as a String.

findTypeName

public static String findTypeName(byte dt)
Get the type name from the type byte code

Parameters:
dt - Type byte code
Returns:
type name, as a String.

isComplex

public static boolean isComplex(byte dataType)
Determine whether this data type is complex.

Parameters:
dataType - Data type code to test.
Returns:
true if dataType is bag, tuple, or map.

isComplex

public static boolean isComplex(Object o)
Determine whether the object is complex or atomic.

Parameters:
o - Object to determine type of.
Returns:
true if dataType is bag, tuple, or map.

isAtomic

public static boolean isAtomic(byte dataType)
Determine whether this data type is atomic.

Parameters:
dataType - Data type code to test.
Returns:
true if dataType is bytearray, bigchararray, chararray, integer, long, float, or boolean.

isAtomic

public static boolean isAtomic(Object o)
Determine whether this object's data type is atomic.

Parameters:
o - Object to determine type of.
Returns:
true if dataType is bytearray, chararray, integer, long, float, or boolean.

isSchemaType

public static boolean isSchemaType(Object o)
Determine whether this object can have a schema.

Parameters:
o - Object to determine if it has a schema
Returns:
true if the type can have a valid schema (i.e., bag or tuple)

isSchemaType

public static boolean isSchemaType(byte dataType)
Determine whether this data type can have a schema.

Parameters:
dataType - dataType to determine if it has a schema
Returns:
true if the type can have a valid schema (i.e., bag or tuple)
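
The two classification rules documented above (complex means bag, tuple, or map; schema-capable means bag or tuple) can be sketched as simple predicates over type codes. This is a standalone illustration, not Pig's source: the class name `TypeClassSketch` and the numeric constant values are assumptions.

```java
// Standalone sketch of the documented classification rules.
// The class name and the constant values are hypothetical placeholders.
public class TypeClassSketch {
    public static final byte INTEGER = 10;  // placeholder value
    public static final byte MAP = 100;     // placeholder value
    public static final byte TUPLE = 110;   // placeholder value
    public static final byte BAG = 120;     // placeholder value

    // Complex types, per the description above: bag, tuple, or map.
    public static boolean isComplex(byte dataType) {
        return dataType == BAG || dataType == TUPLE || dataType == MAP;
    }

    // Types that can carry a schema, per the description above: bag or tuple.
    public static boolean isSchemaType(byte dataType) {
        return dataType == BAG || dataType == TUPLE;
    }

    public static void main(String[] args) {
        // A map is complex but, under the documented rule, not a schema type.
        System.out.println(isComplex(MAP) + " " + isSchemaType(MAP));
    }
}
```

Under these rules a map is the one complex type that is not schema-capable, which is worth keeping in mind when branching on both predicates.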

compare

public static int compare(Object o1,
                          Object o2)
Compare two objects to each other. This function is necessary because there's no super class that implements compareTo. This function provides an (arbitrary) ordering of objects of different types as follows: NULL < BOOLEAN < BYTE < INTEGER < LONG < FLOAT < DOUBLE < BYTEARRAY < STRING < MAP < TUPLE < BAG. No other functions should implement this cross-object logic; they should call this function instead.

Parameters:
o1 - First object
o2 - Second object
Returns:
-1 if o1 is less, 0 if they are equal, 1 if o2 is less.
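
The cross-type ordering described above can be sketched by ranking each object's type first and only comparing values when the ranks match. The following standalone example is an illustration under stated assumptions, not Pig's implementation: the class name `CrossTypeCompareSketch` and the helper `rank` are hypothetical, plain Java classes stand in for Pig's data types, and only a subset of the documented types is modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch of a rank-then-value comparison across types.
// Class and helper names are hypothetical; this is not Pig's code.
public class CrossTypeCompareSketch {

    // Rank follows the documented order; byte arrays, maps, tuples,
    // and bags are left out of this sketch.
    static int rank(Object o) {
        if (o == null) return 0;
        if (o instanceof Boolean) return 1;
        if (o instanceof Byte) return 2;
        if (o instanceof Integer) return 3;
        if (o instanceof Long) return 4;
        if (o instanceof Float) return 5;
        if (o instanceof Double) return 6;
        if (o instanceof String) return 8;
        throw new IllegalArgumentException("type not modelled in this sketch");
    }

    @SuppressWarnings("unchecked")
    public static int compare(Object o1, Object o2) {
        int r1 = rank(o1);
        int r2 = rank(o2);
        if (r1 != r2) {
            return r1 < r2 ? -1 : 1;  // different types: order by rank only
        }
        if (o1 == null) {
            return 0;                 // both are null
        }
        // Same type: defer to the type's natural ordering, normalised to -1/0/1.
        return Integer.signum(((Comparable<Object>) o1).compareTo(o2));
    }

    public static void main(String[] args) {
        List<Object> values =
                new ArrayList<>(Arrays.<Object>asList("pig", 3.5, null, 42, true, 7L));
        values.sort(CrossTypeCompareSketch::compare);
        System.out.println(values);
    }
}
```

Note that in this scheme an Integer always sorts before a Long regardless of the numeric values, because the type rank is decided first.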

compare

public static int compare(Object o1,
                          Object o2,
                          byte dt1,
                          byte dt2)
Same as compare(Object, Object), but does not use reflection to determine the type of passed in objects, relying instead on the caller to provide the appropriate values, as determined by findType(Object). Use this version in cases where multiple objects of the same type have to be repeatedly compared.

Parameters:
o1 - first object
o2 - second object
dt1 - type, as byte value, of o1
dt2 - type, as byte value, of o2
Returns:
-1 if o1 is < o2, 0 if they are equal, 1 if o1 > o2

toBytes

public static byte[] toBytes(Object o)
                      throws ExecException
Throws:
ExecException

toBytes

public static byte[] toBytes(Object o,
                             byte type)
                      throws ExecException
Throws:
ExecException

toInteger

public static Integer toInteger(Object o,
                                byte type)
                         throws ExecException
Force a data object to an Integer, if possible. Any numeric type can be forced to an Integer (though precision may be lost), as well as CharArray, ByteArray, or Boolean. Complex types cannot be forced to an Integer. This isn't particularly efficient, so if you already know that the object you have is an Integer you should just cast it.

Parameters:
o - object to cast
type - of the object you are casting
Returns:
The object as an Integer.
Throws:
ExecException - if the type can't be forced to an Integer.

toInteger

public static Integer toInteger(Object o)
                         throws ExecException
Force a data object to an Integer, if possible. Any numeric type can be forced to an Integer (though precision may be lost), as well as CharArray, ByteArray, or Boolean. Complex types cannot be forced to an Integer. This isn't particularly efficient, so if you already know that the object you have is an Integer you should just cast it. Unlike toInteger(Object, byte) this method will first determine the type of o and then do the cast. Use toInteger(Object, byte) if you already know the type.

Parameters:
o - object to cast
Returns:
The object as an Integer.
Throws:
ExecException - if the type can't be forced to an Integer.
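
As a rough picture of what "forcing" a value to an Integer involves, the standalone sketch below dispatches on the runtime type and converts where a conversion is defined. It is not Pig's implementation. The class name `ForceToIntegerSketch` is hypothetical, `IllegalArgumentException` stands in for ExecException, a raw `byte[]` decoded as UTF-8 stands in for Pig's ByteArray, and the handling of null (returned as null) and of Boolean (mapped to 1 or 0) are assumptions of this example.

```java
import java.nio.charset.StandardCharsets;

// Standalone sketch of a type-dispatching conversion to Integer.
// Names and edge-case behaviour are assumptions, not Pig's actual code.
public class ForceToIntegerSketch {

    public static Integer toInteger(Object o) {
        if (o == null) {
            return null;                              // assumption: null stays null
        }
        if (o instanceof Number) {
            return ((Number) o).intValue();           // may lose precision
        }
        if (o instanceof Boolean) {
            return ((Boolean) o) ? 1 : 0;             // assumption: true -> 1, false -> 0
        }
        if (o instanceof String) {
            return parse((String) o);
        }
        if (o instanceof byte[]) {
            return parse(new String((byte[]) o, StandardCharsets.UTF_8));
        }
        // Anything else (for example a collection) cannot be forced.
        throw new IllegalArgumentException(
                "cannot force " + o.getClass().getName() + " to Integer");
    }

    private static Integer parse(String s) {
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("not an integer: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toInteger("42"));
        System.out.println(toInteger(3.9));
    }
}
```

The chain of `instanceof` checks is the cost the documentation warns about: when the caller already knows the value is an Integer, a plain cast avoids it.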

toLong

public static Long toLong(Object o,
                          byte type)
                   throws ExecException
Force a data object to a Long, if possible. Any numeric type can be forced to a Long (though precision may be lost), as well as CharArray, ByteArray, or Boolean. Complex types cannot be forced to a Long. This isn't particularly efficient, so if you already know that the object you have is a Long you should just cast it.

Parameters:
o - object to cast
type - of the object you are casting
Returns:
The object as a Long.
Throws:
ExecException - if the type can't be forced to a Long.

toLong

public static Long toLong(Object o)
                   throws ExecException
Force a data object to a Long, if possible. Any numeric type can be forced to a Long (though precision may be lost), as well as CharArray, ByteArray, or Boolean. Complex types cannot be forced to a Long. This isn't particularly efficient, so if you already know that the object you have is a Long you should just cast it. Unlike toLong(Object, byte) this method will first determine the type of o and then do the cast. Use toLong(Object, byte) if you already know the type.

Parameters:
o - object to cast
Returns:
The object as a Long.
Throws:
ExecException - if the type can't be forced to a Long.

toFloat

public static Float toFloat(Object o,
                            byte type)
                     throws ExecException
Force a data object to a Float, if possible. Any numeric type can be forced to a Float (though precision may be lost), as well as CharArray or ByteArray. Complex types cannot be forced to a Float. This isn't particularly efficient, so if you already know that the object you have is a Float you should just cast it.

Parameters:
o - object to cast
type - of the object you are casting
Returns:
The object as a Float.
Throws:
ExecException - if the type can't be forced to a Float.

toFloat

public static Float toFloat(Object o)
                     throws ExecException
Force a data object to a Float, if possible. Any numeric type can be forced to a Float (though precision may be lost), as well as CharArray, ByteArray, or Boolean. Complex types cannot be forced to a Float. This isn't particularly efficient, so if you already know that the object you have is a Float you should just cast it. Unlike toFloat(Object, byte) this method will first determine the type of o and then do the cast. Use toFloat(Object, byte) if you already know the type.

Parameters:
o - object to cast
Returns:
The object as a Float.
Throws:
ExecException - if the type can't be forced to a Float.

toDouble

public static Double toDouble(Object o,
                              byte type)
                       throws ExecException
Force a data object to a Double, if possible. Any numeric type can be forced to a Double, as well as CharArray or ByteArray. Complex types cannot be forced to a Double. This isn't particularly efficient, so if you already know that the object you have is a Double you should just cast it.

Parameters:
o - object to cast
type - of the object you are casting
Returns:
The object as a Double.
Throws:
ExecException - if the type can't be forced to a Double.

toDouble

public static Double toDouble(Object o)
                       throws ExecException
Force a data object to a Double, if possible. Any numeric type can be forced to a Double, as well as CharArray, ByteArray, or Boolean. Complex types cannot be forced to a Double. This isn't particularly efficient, so if you already know that the object you have is a Double you should just cast it. Unlike toDouble(Object, byte) this method will first determine the type of o and then do the cast. Use toDouble(Object, byte) if you already know the type.

Parameters:
o - object to cast
Returns:
The object as a Double.
Throws:
ExecException - if the type can't be forced to a Double.

toString

public static String toString(Object o,
                              byte type)
                       throws ExecException
Force a data object to a String, if possible. Any simple (atomic) type can be forced to a String including ByteArray. Complex types cannot be forced to a String. This isn't particularly efficient, so if you already know that the object you have is a String you should just cast it.

Parameters:
o - object to cast
type - of the object you are casting
Returns:
The object as a String.
Throws:
ExecException - if the type can't be forced to a String.

toString

public static String toString(Object o)
                       throws ExecException
Force a data object to a String, if possible. Any simple (atomic) type can be forced to a String including ByteArray. Complex types cannot be forced to a String. This isn't particularly efficient, so if you already know that the object you have is a String you should just cast it. Unlike toString(Object, byte) this method will first determine the type of o and then do the cast. Use toString(Object, byte) if you already know the type.

Parameters:
o - object to cast
Returns:
The object as a String.
Throws:
ExecException - if the type can't be forced to a String.

toMap

public static Map<String,Object> toMap(Object o)
                                throws ExecException
If this object is a map, return it as a map. This isn't particularly efficient, so if you already know that the object you have is a Map you should just cast it.

Parameters:
o - object to cast
Returns:
The object as a Map.
Throws:
ExecException - if the type can't be forced to a Map.

toTuple

public static Tuple toTuple(Object o)
                     throws ExecException
If this object is a tuple, return it as a tuple. This isn't particularly efficient, so if you already know that the object you have is a Tuple you should just cast it.

Parameters:
o - object to cast
Returns:
The object as a Tuple.
Throws:
ExecException - if the type can't be forced to a Tuple.

toBag

public static DataBag toBag(Object o)
                     throws ExecException
If this object is a bag, return it as a bag. This isn't particularly efficient, so if you already know that the object you have is a bag you should just cast it.

Parameters:
o - object to cast
Returns:
The object as a DataBag.
Throws:
ExecException - if the type can't be forced to a DataBag.

spillTupleContents

public static void spillTupleContents(Tuple t,
                                      String label)
Purely for debugging


isNumberType

public static boolean isNumberType(byte t)
Determine if this type is a numeric type.

Parameters:
t - type (as byte value) to test
Returns:
true if this is a numeric type, false otherwise

isUsableType

public static boolean isUsableType(byte t)
Determine if this is a type that work can be done on.

Parameters:
t - type (as a byte value) to test
Returns:
false if the type is unknown, null, or error; true otherwise.

castable

public static boolean castable(byte castType,
                               byte inputType)
Test if one type can cast to the other.

Parameters:
castType - data type of the cast type
inputType - data type of the input
Returns:
true or false

mergeType

public static byte mergeType(byte type1,
                             byte type2)
Merge types if possible. Merging types means finding a type that one or both types can be upcast to.

Parameters:
type1 - first type to merge (as a byte value)
type2 - second type to merge (as a byte value)
Returns:
the merged type, or DataType.ERROR if not successful
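
One way to read "finding a type that one or both types can be upcast to" is numeric widening: merging an integer with a double yields double. The standalone sketch below shows only that narrow case and is not Pig's implementation. The class name `MergeTypeSketch`, the helper `isNumber`, the numeric constant values, and the rule that a wider numeric type has a larger code are all assumptions of this example; Pig's actual rules may cover more cases.

```java
// Standalone sketch of merging two type codes by numeric widening.
// Names, constant values, and the exact rules are hypothetical.
public class MergeTypeSketch {
    public static final byte ERROR = -1;      // placeholder value
    public static final byte INTEGER = 10;    // placeholder value
    public static final byte LONG = 15;       // placeholder value
    public static final byte FLOAT = 20;      // placeholder value
    public static final byte DOUBLE = 25;     // placeholder value
    public static final byte CHARARRAY = 55;  // placeholder value

    static boolean isNumber(byte t) {
        return t == INTEGER || t == LONG || t == FLOAT || t == DOUBLE;
    }

    public static byte mergeType(byte type1, byte type2) {
        if (type1 == type2) {
            return type1;                           // nothing to merge
        }
        if (isNumber(type1) && isNumber(type2)) {
            // In this sketch, wider numeric types were given larger codes,
            // so the merged type is simply the larger code.
            return type1 > type2 ? type1 : type2;
        }
        return ERROR;                               // no common type in this sketch
    }

    public static void main(String[] args) {
        System.out.println(mergeType(INTEGER, DOUBLE) == DOUBLE);
        System.out.println(mergeType(INTEGER, CHARARRAY) == ERROR);
    }
}
```

Callers should always check the result against ERROR before using it as a type code.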

mapToString

public static String mapToString(Map<String,Object> m)
Given a map, turn it into a String.

Parameters:
m - map
Returns:
string representation of the map

equalByteArrays

public static boolean equalByteArrays(byte[] lhs,
                                      byte[] rhs)
Test whether two byte arrays (Java byte arrays not Pig byte arrays) are equal. I have no idea why we have this function.

Parameters:
lhs - byte array 1
rhs - byte array 2
Returns:
true if both are null or the two are the same length and have the same bytes.
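
The return contract above (true if both are null, or if the two arrays have the same length and the same bytes) can be written out directly. This is a standalone sketch of that contract, not Pig's source; the class name `ByteArrayEqualitySketch` is hypothetical.

```java
import java.util.Arrays;

// Standalone sketch of the documented equality contract for Java byte arrays.
// The class name is hypothetical; this is not Pig's code.
public class ByteArrayEqualitySketch {

    public static boolean equalByteArrays(byte[] lhs, byte[] rhs) {
        if (lhs == null && rhs == null) {
            return true;                  // both null counts as equal
        }
        if (lhs == null || rhs == null) {
            return false;                 // exactly one is null
        }
        if (lhs.length != rhs.length) {
            return false;
        }
        for (int i = 0; i < lhs.length; i++) {
            if (lhs[i] != rhs[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        // The JDK's Arrays.equals follows the same contract.
        System.out.println(equalByteArrays(a, b) == Arrays.equals(a, b));
    }
}
```

The JDK's `java.util.Arrays.equals(byte[], byte[])` gives the same results for these cases, including treating two null references as equal.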

determineFieldSchema

public static Schema.FieldSchema determineFieldSchema(ResourceSchema.ResourceFieldSchema rcFieldSchema)
                                               throws ExecException,
                                                      FrontendException,
                                                      SchemaMergeException
Determine the field schema of a ResourceFieldSchema

Parameters:
rcFieldSchema - the rcFieldSchema we want translated
Returns:
the field schema corresponding to the object
Throws:
ExecException
FrontendException
SchemaMergeException

determineFieldSchema

public static Schema.FieldSchema determineFieldSchema(Object o)
                                               throws ExecException,
                                                      FrontendException,
                                                      SchemaMergeException
Determine the field schema of an object

Parameters:
o - the object whose field schema is to be determined
Returns:
the field schema corresponding to the object
Throws:
ExecException
FrontendException
SchemaMergeException


Copyright © ${year} The Apache Software Foundation