org.apache.pig
Class PigConfiguration

java.lang.Object
  extended by org.apache.pig.PigConfiguration

public class PigConfiguration
extends Object

Container for static configuration strings, defaults, and similar constants. It is intended only for keys that users can set, not for keys that are used internally by Pig.


Field Summary
static String INSERT_ENABLED
          Turns off the inclusion of settings in the jobs.
static String MAX_SCRIPT_SIZE
          Controls the maximum size of the Pig script stored in the job XML.
static String OPT_FETCH
          Enables or disables fetching.
static String PARTAGG_MINREDUCTION
          Controls the minimum reduction that in-mapper partial aggregation must achieve to stay enabled.
static String PIG_AUTO_LOCAL_ENABLED
          Turns on the auto local mode feature.
static String PIG_AUTO_LOCAL_INPUT_MAXBYTES
          Controls the maximum input size threshold for converting jobs to run in local mode.
static String PIG_BLACKLIST
          Comma-delimited list of commands/operators that are disallowed.
static String PIG_DEFAULT_LOAD_FUNC
          Defines the default load function.
static String PIG_DEFAULT_STORE_FUNC
          Defines the default store function.
static String PIG_DELETE_TEMP_FILE
          Defines whether intermediate files of Hadoop jobs are deleted.
static String PIG_ENABLE_TEMP_FILE_COMPRESSION
          Defines whether intermediate files are compressed.
static String PIG_JOIN_REPLICATED_MAX_BYTES
          Controls the maximum size of data loaded into the distributed cache for a fragment-replicated join.
static String PIG_OUTPUT_COMMITTER_RECOVERY
          Defines whether to support recovery when the application master is restarted.
static String PIG_OUTPUT_LAZY
          Defines whether PigOutputFormat is wrapped with LazyOutputFormat, so that jobs do not write empty part files when no output is generated.
static String PIG_RANDOM_SAMPLER_SAMPLE_SIZE
          Controls the sample size of RandomSampleLoader for ORDER BY.
static String PIG_STREAMING_ENVIRONMENT
          Defines which properties are set in the streaming environment.
static String PIG_TEMP_DIR
          Location where Pig stores temporary files for job setup.
static String PIG_TEMP_FILE_COMPRESSION_CODEC
          Compression codec used for intermediate storage; TFileStorage supports only gzip and lzo.
static String PIG_TEMP_FILE_COMPRESSION_STORAGE
          Sets the storage type used for intermediate files. If pig.tmpfilecompression is enabled, the default storage is TFileStorage.
static String PIG_USER_CACHE_ENABLED
          Turns on the user-level cache.
static String PIG_USER_CACHE_LOCATION
          Location where additional jars are cached for the user. Additional jars are cached under PIG_USER_CACHE_LOCATION/${user.name}/.pigcache and are re-used across the jobs run by the user if the jar has not changed.
static String PIG_WHITELIST
          Comma-delimited list of commands/operators that are allowed.
static String PROP_CACHEDBAG_MEMUSAGE
          Controls the fraction of total memory that is allowed to be used by cached bags.
static String PROP_EXEC_MAP_PARTAGG
          Controls whether partial aggregation is turned on.
static String PROP_NO_COMBINER
           
static String SCHEMA_TUPLE_SHOULD_ALLOW_FORCE
           
static String SCHEMA_TUPLE_SHOULD_USE_IN_FOREACH
           
static String SCHEMA_TUPLE_SHOULD_USE_IN_FRJOIN
           
static String SCHEMA_TUPLE_SHOULD_USE_IN_MERGEJOIN
           
static String SCHEMA_TUPLE_SHOULD_USE_IN_UDF
           
static String SHOULD_USE_SCHEMA_TUPLE
          This key must be set to true by the user for code generation to be used.
static String TIME_UDFS_PROP
          Controls whether execution time of Pig UDFs should be tracked.
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PROP_CACHEDBAG_MEMUSAGE

public static final String PROP_CACHEDBAG_MEMUSAGE
Controls the fraction of total memory that is allowed to be used by cached bags. Default is 0.2.

See Also:
Constant Field Values
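The fraction is applied to available memory, so the byte budget scales with the heap. The following is a minimal sketch, not Pig's code, assuming the budget is taken against `Runtime.maxMemory()` and assuming the constant's literal value is `pig.cachedbag.memusage` (check the Constant Field Values page for your version). `CachedBagBudget` is a hypothetical helper.

```java
import java.util.Properties;

// Hypothetical helper illustrating how a memory fraction maps to a byte budget.
final class CachedBagBudget {
    // Assumed literal value of PigConfiguration.PROP_CACHEDBAG_MEMUSAGE.
    static final String KEY = "pig.cachedbag.memusage";
    // Documented default fraction.
    static final double DEFAULT_FRACTION = 0.2;

    // Reads the configured fraction, falling back to the default when unset.
    static double fraction(Properties props) {
        String raw = props.getProperty(KEY);
        return raw == null ? DEFAULT_FRACTION : Double.parseDouble(raw.trim());
    }

    // Converts a fraction of the given heap size into a byte budget.
    static long budgetBytes(long heapBytes, double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
        }
        return (long) (heapBytes * fraction);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(KEY, "0.3");
        // Assumption: the heap limit is what Runtime.maxMemory() reports.
        long heap = Runtime.getRuntime().maxMemory();
        System.out.println("Cached bag budget: " + budgetBytes(heap, fraction(props)) + " bytes");
    }
}
```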

PROP_EXEC_MAP_PARTAGG

public static final String PROP_EXEC_MAP_PARTAGG
Controls whether partial aggregation is turned on.

See Also:
Constant Field Values

PARTAGG_MINREDUCTION

public static final String PARTAGG_MINREDUCTION
Controls the minimum reduction that in-mapper partial aggregation must achieve to stay enabled. If this reduction is not achieved after a period of observation, in-mapper aggregation is turned off and a message to that effect is logged.

See Also:
Constant Field Values
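The observe-then-decide rule above can be sketched as a small monitor. This is a hypothetical illustration, not Pig's operator: it assumes "reduction" means input records divided by output records, and the observation window (`observationRecords`) is an invented parameter. Pig's real implementation may measure and decide differently.

```java
// Hypothetical monitor sketching the "minimum reduction" rule described above.
final class PartialAggMonitor {
    private final double minReduction;      // value configured for PARTAGG_MINREDUCTION
    private final long observationRecords;  // hypothetical: inputs to observe before deciding
    private long recordsIn;
    private long recordsOut;
    private boolean enabled = true;

    PartialAggMonitor(double minReduction, long observationRecords) {
        this.minReduction = minReduction;
        this.observationRecords = observationRecords;
    }

    // Records that `inputs` records were aggregated into `outputs` records.
    void observe(long inputs, long outputs) {
        if (!enabled) {
            return; // once turned off, aggregation stays off in this sketch
        }
        recordsIn += inputs;
        recordsOut += outputs;
        if (recordsIn >= observationRecords && recordsOut > 0) {
            // Assumption: reduction = records in / records out.
            double reduction = (double) recordsIn / recordsOut;
            if (reduction < minReduction) {
                enabled = false;
                System.err.printf("Disabling in-mapper partial aggregation: reduction %.2f < %.2f%n",
                        reduction, minReduction);
            }
        }
    }

    boolean isEnabled() {
        return enabled;
    }

    public static void main(String[] args) {
        PartialAggMonitor monitor = new PartialAggMonitor(10, 1_000);
        monitor.observe(1_000, 400); // reduction of 2.5x, below the 10x minimum
        System.out.println("still enabled? " + monitor.isEnabled());
    }
}
```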

TIME_UDFS_PROP

public static final String TIME_UDFS_PROP
Controls whether the execution time of Pig UDFs is tracked. This feature uses counters, so use it judiciously.

See Also:
Constant Field Values

SHOULD_USE_SCHEMA_TUPLE

public static final String SHOULD_USE_SCHEMA_TUPLE
This key must be set to true by the user for code generation to be used. In the future, it may be turned on by default (at least in certain cases), but for now it is too experimental.

See Also:
Constant Field Values

SCHEMA_TUPLE_SHOULD_USE_IN_UDF

public static final String SCHEMA_TUPLE_SHOULD_USE_IN_UDF
See Also:
Constant Field Values

SCHEMA_TUPLE_SHOULD_USE_IN_FOREACH

public static final String SCHEMA_TUPLE_SHOULD_USE_IN_FOREACH
See Also:
Constant Field Values

SCHEMA_TUPLE_SHOULD_USE_IN_FRJOIN

public static final String SCHEMA_TUPLE_SHOULD_USE_IN_FRJOIN
See Also:
Constant Field Values

SCHEMA_TUPLE_SHOULD_USE_IN_MERGEJOIN

public static final String SCHEMA_TUPLE_SHOULD_USE_IN_MERGEJOIN
See Also:
Constant Field Values

SCHEMA_TUPLE_SHOULD_ALLOW_FORCE

public static final String SCHEMA_TUPLE_SHOULD_ALLOW_FORCE
See Also:
Constant Field Values

PROP_NO_COMBINER

public static final String PROP_NO_COMBINER
See Also:
Constant Field Values

PIG_STREAMING_ENVIRONMENT

public static final String PIG_STREAMING_ENVIRONMENT
Defines which properties are set in the streaming environment. Set this property to a comma-delimited list of property names, and those properties will be set in the environment.

See Also:
Constant Field Values
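The value is a list of property *names*; each named property's value is what gets exported. A minimal sketch of that expansion, assuming names are exported verbatim as environment variable names, that listed properties with no value are skipped, and that the key's literal value is `pig.streaming.environment`. `StreamingEnv` is hypothetical; the resulting map can be handed to `ProcessBuilder.environment()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: turns the comma-delimited property list into environment entries.
final class StreamingEnv {
    // Assumed literal value of PigConfiguration.PIG_STREAMING_ENVIRONMENT.
    static final String KEY = "pig.streaming.environment";

    static Map<String, String> buildEnv(Properties conf) {
        Map<String, String> env = new LinkedHashMap<>();
        for (String name : conf.getProperty(KEY, "").split(",")) {
            String prop = name.trim();
            if (prop.isEmpty()) {
                continue; // tolerate stray commas and an unset list
            }
            String value = conf.getProperty(prop);
            if (value != null) {
                env.put(prop, value); // assumption: the property name is used verbatim
            }
        }
        return env;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("my.app.mode", "batch");
        conf.setProperty(KEY, "my.app.mode, my.unset.prop");
        ProcessBuilder pb = new ProcessBuilder("printenv");
        pb.environment().putAll(buildEnv(conf)); // a streaming command would see my.app.mode=batch
        System.out.println(buildEnv(conf));
    }
}
```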

PIG_DEFAULT_LOAD_FUNC

public static final String PIG_DEFAULT_LOAD_FUNC
Defines the default load function. If this key is undefined, Pig falls back to PigStorage.

See Also:
Constant Field Values

PIG_DEFAULT_STORE_FUNC

public static final String PIG_DEFAULT_STORE_FUNC
Defines the default store function. If this key is undefined, Pig falls back to PigStorage.

See Also:
Constant Field Values
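The fallback rule for both keys can be sketched as a single lookup. This is hypothetical: the literal key values `pig.default.load.func` and `pig.default.store.func` are assumptions, treating a blank value as undefined is a choice made here, and `com.example.MyStorer` is an invented class name.

```java
import java.util.Properties;

// Hypothetical resolver showing the documented fallback to PigStorage.
final class DefaultFuncResolver {
    // Assumed literal values of PIG_DEFAULT_LOAD_FUNC and PIG_DEFAULT_STORE_FUNC.
    static final String LOAD_KEY = "pig.default.load.func";
    static final String STORE_KEY = "pig.default.store.func";
    static final String FALLBACK = "org.apache.pig.builtin.PigStorage";

    // Returns the configured function, or PigStorage when the key is unset or blank.
    static String resolve(Properties props, String key) {
        String value = props.getProperty(key);
        return (value == null || value.trim().isEmpty()) ? FALLBACK : value.trim();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(STORE_KEY, "com.example.MyStorer"); // hypothetical class name
        System.out.println("load:  " + resolve(props, LOAD_KEY));
        System.out.println("store: " + resolve(props, STORE_KEY));
    }
}
```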

PIG_OUTPUT_COMMITTER_RECOVERY

public static final String PIG_OUTPUT_COMMITTER_RECOVERY
Defines whether to support recovery when the application master is restarted.

See Also:
Constant Field Values

INSERT_ENABLED

public static final String INSERT_ENABLED
Turns off the inclusion of settings in the jobs.

See Also:
Constant Field Values

MAX_SCRIPT_SIZE

public static final String MAX_SCRIPT_SIZE
Controls the maximum size of the Pig script stored in the job XML.

See Also:
Constant Field Values

PIG_ENABLE_TEMP_FILE_COMPRESSION

public static final String PIG_ENABLE_TEMP_FILE_COMPRESSION
Defines whether intermediate files are compressed.

See Also:
Constant Field Values

PIG_TEMP_FILE_COMPRESSION_STORAGE

public static final String PIG_TEMP_FILE_COMPRESSION_STORAGE
Sets the storage type used for intermediate files. If pig.tmpfilecompression is enabled, the default storage is TFileStorage. This can be overridden to use SequenceFileInterStorage by setting this property to "seqfile".

See Also:
Constant Field Values
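The selection rule reads as a small decision function. A sketch under stated assumptions: compression is treated as off unless `pig.tmpfilecompression` is `true`, the storage key's literal value `pig.tmpfilecompression.storage` and the name `tfile` for the default are assumed, `InterStorage` as the uncompressed format is an assumption, and rejecting other values is a choice made here. It returns class names as strings so it runs without Pig on the classpath.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical chooser mirroring the rule described for PIG_TEMP_FILE_COMPRESSION_STORAGE.
final class TempStorageChooser {
    // "pig.tmpfilecompression" is named in the text; the storage key's literal value is assumed.
    static final String COMPRESS_KEY = "pig.tmpfilecompression";
    static final String STORAGE_KEY = "pig.tmpfilecompression.storage";

    static String choose(Properties props) {
        // Assumption: compression is off unless explicitly set to true.
        boolean compress = Boolean.parseBoolean(props.getProperty(COMPRESS_KEY, "false").trim());
        if (!compress) {
            return "InterStorage"; // assumption: Pig's uncompressed intermediate format
        }
        // Assumption: "tfile" names the default; "seqfile" is the documented override.
        String storage = props.getProperty(STORAGE_KEY, "tfile").trim().toLowerCase(Locale.ROOT);
        switch (storage) {
            case "tfile":
                return "TFileStorage";
            case "seqfile":
                return "SequenceFileInterStorage";
            default:
                throw new IllegalArgumentException("Unsupported intermediate storage: " + storage);
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(COMPRESS_KEY, "true");
        props.setProperty(STORAGE_KEY, "seqfile");
        System.out.println(choose(props));
    }
}
```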

PIG_TEMP_FILE_COMPRESSION_CODEC

public static final String PIG_TEMP_FILE_COMPRESSION_CODEC
Compression codec used for intermediate storage. TFileStorage supports only gzip and lzo.

See Also:
Constant Field Values

PIG_DELETE_TEMP_FILE

public static final String PIG_DELETE_TEMP_FILE
Defines whether intermediate files of Hadoop jobs are deleted.

See Also:
Constant Field Values

PIG_JOIN_REPLICATED_MAX_BYTES

public static final String PIG_JOIN_REPLICATED_MAX_BYTES
Controls the maximum size of data loaded into the distributed cache for a fragment-replicated join.

See Also:
Constant Field Values
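In `JOIN ... USING 'replicated'`, every relation after the first is replicated to each task, so it is their combined size that matters. A minimal sketch of the size check, assuming the limit is compared against the summed input sizes and that a total exactly at the limit still fits; how Pig measures the size is not specified here. `ReplicatedJoinCheck` and the 1 GiB figure are illustrative only.

```java
// Hypothetical pre-check for a fragment-replicated join.
final class ReplicatedJoinCheck {
    // Returns true if the replicated inputs together fit within maxBytes.
    static boolean fits(long[] replicatedInputBytes, long maxBytes) {
        long total = 0;
        for (long size : replicatedInputBytes) {
            total = Math.addExact(total, size); // fail loudly on overflow rather than wrap
        }
        return total <= maxBytes;
    }

    public static void main(String[] args) {
        long maxBytes = 1L << 30; // illustrative 1 GiB limit, not Pig's default
        long[] smallRelations = {300L << 20, 200L << 20}; // 300 MiB + 200 MiB
        System.out.println("replicated join allowed? " + fits(smallRelations, maxBytes));
    }
}
```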

PIG_RANDOM_SAMPLER_SAMPLE_SIZE

public static final String PIG_RANDOM_SAMPLER_SAMPLE_SIZE
Controls the sample size of RandomSampleLoader for ORDER BY. The default value is 100 rows per task.

See Also:
Constant Field Values
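Because the size is per task, the total number of sampled rows grows with the number of tasks. A small sketch of reading the value with the documented default of 100 and bounding the total, assuming the literal key value `pig.random.sampler.sample.size`. The total is an upper bound because a task with fewer rows than the sample size contributes fewer. `SampleSizeConfig` is hypothetical.

```java
import java.util.Properties;

// Hypothetical helper for the per-task sample size used by ORDER BY sampling.
final class SampleSizeConfig {
    // Assumed literal value of PigConfiguration.PIG_RANDOM_SAMPLER_SAMPLE_SIZE.
    static final String KEY = "pig.random.sampler.sample.size";
    static final int DEFAULT_ROWS_PER_TASK = 100; // documented default

    // Reads the per-task sample size, falling back to the default when unset.
    static int rowsPerTask(Properties props) {
        String raw = props.getProperty(KEY);
        int rows = raw == null ? DEFAULT_ROWS_PER_TASK : Integer.parseInt(raw.trim());
        if (rows <= 0) {
            throw new IllegalArgumentException("sample size must be positive: " + rows);
        }
        return rows;
    }

    // Upper bound on sampled rows across all tasks.
    static long maxTotalSampledRows(int tasks, int rowsPerTask) {
        return Math.multiplyExact((long) tasks, (long) rowsPerTask);
    }

    public static void main(String[] args) {
        int perTask = rowsPerTask(new Properties());
        System.out.println("rows per task: " + perTask);
        System.out.println("at most " + maxTotalSampledRows(50, perTask) + " rows across 50 tasks");
    }
}
```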

PIG_AUTO_LOCAL_ENABLED

public static final String PIG_AUTO_LOCAL_ENABLED
Turns on the auto local mode feature.

See Also:
Constant Field Values

PIG_AUTO_LOCAL_INPUT_MAXBYTES

public static final String PIG_AUTO_LOCAL_INPUT_MAXBYTES
Controls the maximum input size threshold for converting jobs to run in local mode.

See Also:
Constant Field Values
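Taken together, PIG_AUTO_LOCAL_ENABLED and PIG_AUTO_LOCAL_INPUT_MAXBYTES describe a simple decision. A hypothetical sketch, assuming the threshold is compared against the summed size of a job's inputs and that a total equal to the threshold still qualifies; Pig's actual check may differ. The 100 MiB figure is illustrative, not Pig's default.

```java
// Hypothetical sketch of the auto-local decision: small total input runs locally.
final class AutoLocalDecision {
    static boolean runLocally(boolean autoLocalEnabled, long[] inputBytes, long maxBytes) {
        if (!autoLocalEnabled) {
            return false; // feature off: always use the cluster
        }
        long total = 0;
        for (long size : inputBytes) {
            total = Math.addExact(total, size);
        }
        return total <= maxBytes; // assumption: the threshold itself still qualifies
    }

    public static void main(String[] args) {
        long maxBytes = 100L * 1024 * 1024; // illustrative threshold, not Pig's default
        long[] inputs = {40L * 1024 * 1024, 30L * 1024 * 1024};
        System.out.println("run locally? " + runLocally(true, inputs, maxBytes));
    }
}
```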

OPT_FETCH

public static final String OPT_FETCH
Enables or disables fetching. It is turned on by default.

See Also:
Constant Field Values

PIG_OUTPUT_LAZY

public static final String PIG_OUTPUT_LAZY
Defines whether PigOutputFormat is wrapped with LazyOutputFormat, so that jobs do not write empty part files when no output is generated.

See Also:
Constant Field Values

PIG_TEMP_DIR

public static final String PIG_TEMP_DIR
Location where Pig stores temporary files for job setup.

See Also:
Constant Field Values

PIG_USER_CACHE_ENABLED

public static final String PIG_USER_CACHE_ENABLED
Turns on the user-level cache.

See Also:
Constant Field Values

PIG_USER_CACHE_LOCATION

public static final String PIG_USER_CACHE_LOCATION
Location where additional jars are cached for the user. Additional jars are cached under PIG_USER_CACHE_LOCATION/${user.name}/.pigcache and are re-used across the jobs run by the user if the jar has not changed.

See Also:
Constant Field Values
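The cache layout described above can be expressed as a path join. This sketch uses local `java.nio` paths only to show the layout `<location>/<user.name>/.pigcache`; `UserJarCache` is hypothetical, and how Pig decides that a jar "has not changed" is not specified here, so it is not modeled.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper showing the per-user cache directory layout.
final class UserJarCache {
    // Builds <cacheLocation>/<userName>/.pigcache as described above.
    static Path cacheDir(String cacheLocation, String userName) {
        return Paths.get(cacheLocation, userName, ".pigcache");
    }

    public static void main(String[] args) {
        System.out.println(cacheDir("/tmp", System.getProperty("user.name")));
    }
}
```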

PIG_BLACKLIST

public static final String PIG_BLACKLIST
Comma-delimited list of commands/operators that are disallowed. This is a security feature that administrators can use to block users from running certain commands. For example, an administrator who wants to block all filesystem commands and the setting of configuration values in Pig scripts would use the entry "pig.blacklist=fs,set".

See Also:
Constant Field Values
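A blacklist check amounts to parsing the comma-delimited value and rejecting any listed command. A hypothetical sketch, assuming matching is case-insensitive; that assumption is inferred from the lower-case example entries, since Pig Latin commands are usually written in upper case. `CommandBlacklist` is not a Pig class.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical blacklist check based on a value such as "fs,set".
final class CommandBlacklist {
    // Parses the comma-delimited value into a lower-case set of command names.
    static Set<String> parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // A command is allowed unless it appears in the blacklist (assumed case-insensitive).
    static boolean isAllowed(String command, Set<String> blacklist) {
        return !blacklist.contains(command.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> blacklist = parse("fs,set");
        for (String cmd : new String[] {"LOAD", "fs", "SET"}) {
            System.out.println(cmd + " allowed? " + isAllowed(cmd, blacklist));
        }
    }
}
```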

PIG_WHITELIST

public static final String PIG_WHITELIST
Comma-delimited list of commands/operators that are allowed. This is a security feature that administrators can use to block users from running any command that is not part of the whitelist. For example, an administrator who wants to allow only LOAD, STORE, FILTER, and GROUP in Pig scripts would use the entry "pig.whitelist=load,store,filter,group".

See Also:
Constant Field Values
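A whitelist inverts the blacklist test: only listed commands pass. A hypothetical sketch under two assumptions: matching is case-insensitive (inferred from the lower-case example), and an empty whitelist imposes no restriction. `CommandWhitelist` is not a Pig class.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical whitelist check based on a value such as "load,store,filter,group".
final class CommandWhitelist {
    // Parses the comma-delimited value into a lower-case set of command names.
    static Set<String> parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Only listed commands are allowed; assumption: an empty whitelist allows everything.
    static boolean isAllowed(String command, Set<String> whitelist) {
        return whitelist.isEmpty() || whitelist.contains(command.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> whitelist = parse("load,store,filter,group");
        for (String cmd : new String[] {"LOAD", "GROUP", "DUMP"}) {
            System.out.println(cmd + " allowed? " + isAllowed(cmd, whitelist));
        }
    }
}
```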


Copyright © 2007-2012 The Apache Software Foundation