public class PigConfiguration extends Object
Modifier and Type | Field and Description |
---|---|
static String | CALLER_ID: Deprecated. Use PIG_LOG_TRACE_ID instead. Will be removed in Pig 0.18. |
static String | ENABLE_ATS: Deprecated. Use PIG_ATS_ENABLED instead. Will be removed in Pig 0.18. |
static String | INSERT_ENABLED: Deprecated. Use PIG_SCRIPT_INFO_ENABLED instead. Will be removed in Pig 0.16. |
static String | MAX_SCRIPT_SIZE: Deprecated. Use PIG_SCRIPT_MAX_SIZE instead. Will be removed in Pig 0.16. |
static String | OPT_FETCH: Deprecated. Use PIG_OPT_FETCH instead. Will be removed in Pig 0.16. |
static String | PARTAGG_MINREDUCTION: Deprecated. Use PIG_EXEC_MAP_PARTAGG_MINREDUCTION instead. Will be removed in Pig 0.16. |
static String | PIG_ACCUMULATIVE_BATCHSIZE |
static String | PIG_ARTIFACTS_DOWNLOAD_LOCATION: This key is used to set the download location when registering an artifact using an Ivy coordinate. |
static String | PIG_ATS_ENABLED: Enable ATS for Pig. |
static String | PIG_AUTO_LOCAL_ENABLED: This key turns on the auto local mode feature. |
static String | PIG_AUTO_LOCAL_INPUT_MAXBYTES: Controls the maximum threshold size for converting jobs to run in local mode. |
static String | PIG_BLACKLIST: Comma-delimited entries of commands/operators that must be disallowed. |
static String | PIG_BLOOMJOIN_HASH_FUNCTIONS: The number of hash functions to be used in bloom computation. |
static String | PIG_BLOOMJOIN_HASH_TYPE: The type of hash function to use. |
static String | PIG_BLOOMJOIN_NUM_FILTERS: The number of bloom filters that will be created. |
static String | PIG_BLOOMJOIN_STRATEGY: Bloom join has two different kinds of implementations. |
static String | PIG_BLOOMJOIN_VECTORSIZE_BYTES: The size in bytes of the bit vector to be used for the bloom filter. |
static String | PIG_BZIP_USE_HADOOP_INPUTFORMAT: Use Hadoop's TextInputFormat for reading bzip input instead of Pig's Bzip2TextInputFormat. |
static String | PIG_CACHEDBAG_DISTINCT_TYPE |
static String | PIG_CACHEDBAG_MEMUSAGE: Controls the fraction of total memory that is allowed to be used by cached bags. |
static String | PIG_CACHEDBAG_SORT_TYPE |
static String | PIG_CACHEDBAG_TYPE: Configurations for specifying alternate implementations for cached bags. |
static String | PIG_COMPRESS_INPUT_SPLITS: This key is used to configure compression for the Pig input splits which are not FileSplit. |
static boolean | PIG_COMPRESS_INPUT_SPLITS_DEFAULT |
static String | PIG_DATETIME_DEFAULT_TIMEZONE: The timezone to be used by the Pig datetime datatype. |
static String | PIG_DEFAULT_LOAD_FUNC: This key is used to define the default load func. |
static String | PIG_DEFAULT_STORE_FUNC: This key is used to define the default store func. |
static String | PIG_DELETE_TEMP_FILE: This key is used to define whether to delete intermediate files of Hadoop jobs. |
static String | PIG_ENABLE_TEMP_FILE_COMPRESSION: This key is used to define whether intermediate files are compressed. |
static String | PIG_ERROR_HANDLING_ENABLED: Boolean value used to enable or disable error handling for storers. |
static String | PIG_ERROR_HANDLING_MIN_ERROR_RECORDS: Controls the minimum number of errors. |
static String | PIG_ERROR_HANDLING_THRESHOLD_PERCENT: Sets the threshold for the percentage of errors. |
static String | PIG_EXEC_MAP_PARTAGG: Boolean value to enable or disable partial aggregation in map. |
static String | PIG_EXEC_MAP_PARTAGG_MINREDUCTION: Controls the minimum reduction in-mapper Partial Aggregation should achieve in order to stay on. |
static String | PIG_EXEC_NO_COMBINER: Boolean value to enable or disable use of combiners in MapReduce jobs. |
static String | PIG_EXEC_NO_COMBINER_REDUCER: Enable or disable use of combiners in the reducer shuffle-merge phase in Tez. |
static String | PIG_EXEC_NO_SECONDARY_KEY: This key controls whether the secondary sort key is used for optimization in case of nested distinct or sort. |
static String | PIG_EXEC_REDUCER_ESTIMATOR |
static String | PIG_EXEC_REDUCER_ESTIMATOR_CONSTRUCTOR_ARG_KEY |
static String | PIG_JOIN_REPLICATED_MAX_BYTES: This key is used to control the maximum size loaded into the distributed cache when doing a fragment-replicated join. |
static String | PIG_LOG_TRACE_ID: Log tracing id that can be used by upstream clients for tracking respective logs. |
static String | PIG_MAX_COMBINED_SPLIT_SIZE: Specifies the size, in bytes, of data to be processed by a single map. |
static String | PIG_NO_SPLIT_COMBINATION: Whether to turn off combining of split files. |
static String | PIG_NO_TASK_REPORT: This key is used to turn off the use of task reports in job statistics. |
static String | PIG_OPT_ACCUMULATOR: Boolean value used to enable or disable accumulator optimization. |
static String | PIG_OPT_FETCH: Boolean value used to enable or disable fetching without a mapreduce job for DUMP. |
static String | PIG_OPT_MULTIQUERY: Boolean value used to enable or disable multiquery optimization. |
static String | PIG_OUTPUT_COMMITTER_RECOVERY: This key is used to define whether to support recovery to handle the application master getting restarted. |
static String | PIG_OUTPUT_LAZY: This key is used to define whether PigOutputFormat will be wrapped with LazyOutputFormat so that jobs won't write empty part files if no output is generated. |
static String | PIG_POISSON_SAMPLER_SAMPLE_RATE: For a given mean and a confidence, a sample rate is obtained from a poisson udf. |
static String | PIG_PRINT_EXEC_PLAN |
static String | PIG_RANDOM_SAMPLER_SAMPLE_SIZE: This key is used to control the sample size of RandomSampleLoader for order-by. |
static String | PIG_SCHEMA_TUPLE_ALLOW_FORCE |
static String | PIG_SCHEMA_TUPLE_ENABLED: This key must be set to true by the user for code generation to be used. |
static String | PIG_SCHEMA_TUPLE_USE_IN_FOREACH |
static String | PIG_SCHEMA_TUPLE_USE_IN_FRJOIN |
static String | PIG_SCHEMA_TUPLE_USE_IN_MERGEJOIN |
static String | PIG_SCHEMA_TUPLE_USE_IN_UDF |
static String | PIG_SCRIPT_INFO_ENABLED: This key is used to turn off the inclusion of settings in the jobs. |
static String | PIG_SCRIPT_MAX_SIZE: Controls the size of the Pig script stored in job xml. |
static String | PIG_SKEWEDJOIN_REDUCE_MEM: Memory available (in bytes) in reduce when calculating memory available for skewed join. |
static String | PIG_SKEWEDJOIN_REDUCE_MEMUSAGE: % of memory available for the input data. |
static String | PIG_SORT_READONCE_LOADFUNCS: Pig only reads once from the datasource for LoadFuncs specified here during sort instead of loading once for sampling and loading again for partitioning. |
static String | PIG_SPARK_USE_NETTY_FILESERVER: Use Netty file server for Pig on Spark, true or false. Default value is false. |
static String | PIG_SPILL_COLLECTION_THRESHOLD_FRACTION: Spill will be triggered if the fraction of biggest heap exceeds the collection threshold. |
static String | PIG_SPILL_MEMORY_USAGE_THRESHOLD_FRACTION: Spill will be triggered if the fraction of biggest heap exceeds the usage threshold. |
static String | PIG_SPILL_UNUSED_MEMORY_THRESHOLD_SIZE: Spill will be triggered when unused memory falls below the threshold. |
static String | PIG_SPLIT_COMBINATION: Turns combining of split files on or off. |
static String | PIG_STORE_SCHEMA_DISAMBIGUATE: If set to false, automatic schema disambiguation is disabled. |
static String | PIG_STORE_SCHEMA_DISAMBIGUATE_DEFAULT |
static String | PIG_STREAMING_ENVIRONMENT: This key can be used to define what properties will be set in the streaming environment. |
static String | PIG_STREAMING_UDF_PYTHON_COMMAND: This key can be used to configure the python command for python streaming udf. |
static String | PIG_TEMP_DIR: Location where Pig stores temporary files for job setup. |
static String | PIG_TEMP_FILE_COMPRESSION_CODEC: Compression codec used by intermediate storage. TFileStorage only supports gzip and lzo. |
static String | PIG_TEMP_FILE_COMPRESSION_STORAGE: This key is used to set the storage type used by intermediate file storage. If pig.tmpfilecompression is enabled, the default storage used is TFileStorage. |
static String | PIG_TEZ_AUTO_PARALLELISM: This key is used to configure auto parallelism in tez. |
static String | PIG_TEZ_AUTO_PARALLELISM_DISABLE_DAG_RECOVERY: This key is used to turn off dag recovery if there is auto parallelism. |
static String | PIG_TEZ_CONFIGURE_AM_MEMORY: If set, Pig will override tez.am.launch.cmd-opts and tez.am.resource.memory.mb to optimal |
static String | PIG_TEZ_DAG_STATUS_REPORT_INTERVAL: This key is used to configure the interval of dag status report in seconds. |
static String | PIG_TEZ_GRACE_PARALLELISM: This key is used to configure grace parallelism in tez. |
static String | PIG_TEZ_INPUT_SPLITS_MEM_THRESHOLD: Serialize input splits to disk if the input splits size exceeds a threshold to avoid hitting the default RPC transfer size limit of 64MB. |
static int | PIG_TEZ_INPUT_SPLITS_MEM_THRESHOLD_DEFAULT |
static String | PIG_TEZ_OPT_UNION: This key is used to enable or disable union optimization in tez. |
static String | PIG_TEZ_OPT_UNION_SUPPORTED_STOREFUNCS: These keys are used to enable or disable tez union optimization for specific StoreFuncs. |
static String | PIG_TEZ_OPT_UNION_UNSUPPORTED_STOREFUNCS |
static String | PIG_TEZ_SESSION_REUSE: This key is used to define whether to reuse AM in Tez jobs. |
static String | PIG_UDF_PROFILE: Controls whether execution time of Pig UDFs should be tracked. |
static String | PIG_UDF_PROFILE_FREQUENCY |
static String | PIG_USER_CACHE_ENABLED: This key turns on the user-level cache. |
static String | PIG_USER_CACHE_LOCATION: Location where additional jars are cached for the user. Additional jars will be cached under PIG_USER_CACHE_LOCATION/${user.name}/.pigcache and will be reused across the jobs run by the user if the jar has not changed. |
static String | PIG_USER_CACHE_REPLICATION: Replication factor for files in the Pig jar cache. |
static String | PIG_WHITELIST: Comma-delimited entries of commands/operators that must be allowed. |
static String | PROP_CACHEDBAG_MEMUSAGE: Deprecated. Use PIG_CACHEDBAG_MEMUSAGE instead. Will be removed in Pig 0.16. |
static String | PROP_EXEC_MAP_PARTAGG: Deprecated. Use PIG_EXEC_MAP_PARTAGG instead. Will be removed in Pig 0.16. |
static String | PROP_NO_COMBINER: Deprecated. Use #PROP_NO_COMBINER1 instead. Will be removed in Pig 0.16. |
static String | SCHEMA_TUPLE_SHOULD_ALLOW_FORCE: Deprecated. |
static String | SCHEMA_TUPLE_SHOULD_USE_IN_FOREACH: Deprecated. |
static String | SCHEMA_TUPLE_SHOULD_USE_IN_FRJOIN: Deprecated. |
static String | SCHEMA_TUPLE_SHOULD_USE_IN_MERGEJOIN: Deprecated. |
static String | SCHEMA_TUPLE_SHOULD_USE_IN_UDF: Deprecated. |
static String | SHOULD_USE_SCHEMA_TUPLE: Deprecated. |
public static final String PIG_AUTO_LOCAL_ENABLED
public static final String PIG_AUTO_LOCAL_INPUT_MAXBYTES
public static final String PIG_OPT_FETCH
public static final String PIG_OPT_MULTIQUERY
public static final String PIG_OPT_ACCUMULATOR
public static final String PIG_ACCUMULATIVE_BATCHSIZE
public static final String PIG_TEZ_OPT_UNION
public static final String PIG_TEZ_OPT_UNION_SUPPORTED_STOREFUNCS
If a StoreFunc implements StoreFunc.supportsParallelWriteToStoreLocation()
and returns true or false, then that value is used to turn union optimization
on or off, respectively. These settings can be used for StoreFuncs that have not
implemented the API yet.
public static final String PIG_TEZ_OPT_UNION_UNSUPPORTED_STOREFUNCS
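The precedence described for the union optimization settings can be sketched in plain Java. This is an illustration under stated assumptions, not Pig's code: the class `UnionOptimizationSketch`, the method `canOptimize`, and the `fallback` parameter are hypothetical, and a `null` answer is assumed to mean that the StoreFunc does not state whether it supports parallel writes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Sketch only: names here are hypothetical and not part of Pig.
final class UnionOptimizationSketch {

    /**
     * @param apiAnswer      value returned by the StoreFunc's
     *                       supportsParallelWriteToStoreLocation(); null is
     *                       ASSUMED to mean "not stated"
     * @param storeFuncClass class name of the StoreFunc
     * @param supported      entries of the SUPPORTED_STOREFUNCS setting
     * @param unsupported    entries of the UNSUPPORTED_STOREFUNCS setting
     * @param fallback       hypothetical default when nothing else applies
     */
    static boolean canOptimize(Boolean apiAnswer, String storeFuncClass,
                               Set<String> supported, Set<String> unsupported,
                               boolean fallback) {
        // An explicit true/false from the API is used directly.
        if (apiAnswer != null) {
            return apiAnswer;
        }
        // Otherwise the settings cover StoreFuncs without the API.
        if (unsupported.contains(storeFuncClass)) {
            return false;
        }
        if (supported.contains(storeFuncClass)) {
            return true;
        }
        return fallback;
    }

    public static void main(String[] args) {
        Set<String> supported = new HashSet<>(Arrays.asList("com.example.GoodStore"));
        Set<String> unsupported = new HashSet<>(Arrays.asList("com.example.BadStore"));
        System.out.println(canOptimize(null, "com.example.GoodStore", supported, unsupported, false));
        System.out.println(canOptimize(Boolean.FALSE, "com.example.GoodStore", supported, unsupported, true));
    }
}
```

How Pig treats a StoreFunc that appears in neither list is not stated here, which is why the sketch takes that default as a parameter.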
public static final String PIG_SORT_READONCE_LOADFUNCS
public static final String PIG_EXEC_MAP_PARTAGG
public static final String PIG_EXEC_MAP_PARTAGG_MINREDUCTION
public static final String PIG_EXEC_NO_COMBINER
public static final String PIG_EXEC_NO_COMBINER_REDUCER
public static final String PIG_EXEC_NO_SECONDARY_KEY
public static final String PIG_CACHEDBAG_MEMUSAGE
public static final String PIG_SKEWEDJOIN_REDUCE_MEMUSAGE
public static final String PIG_SKEWEDJOIN_REDUCE_MEM
public static final String PIG_BLOOMJOIN_STRATEGY
Bloom join has two different kinds of implementations.
In the first, bloom filters are computed in each map on the join keys,
partitioned into PIG_BLOOMJOIN_NUM_FILTERS
partitions. Bloom filters from different maps are then combined in the
reducer, producing one bloom filter per partition. This is efficient and
fast if there is a small number of maps (<10) and the number of
distinct keys is not too high. It can be faster with a larger number of
maps and even with bigger bloom vector sizes, but the amount of data
shuffled to the reducer for aggregation becomes huge, making it
inefficient.
In the second, join keys are sent from the maps to
PIG_BLOOMJOIN_NUM_FILTERS reducers. One
bloom filter is then created per partition. This is efficient for larger
datasets with a lot of maps or a very large
PIG_BLOOMJOIN_VECTORSIZE_BYTES. In this case the size of the keys sent
to the reducer is smaller than the size of the bloom filters that would be
sent to the reducer for aggregation, making it efficient.
public static final String PIG_BLOOMJOIN_NUM_FILTERS
public static final String PIG_BLOOMJOIN_VECTORSIZE_BYTES
public static final String PIG_BLOOMJOIN_HASH_TYPE
public static final String PIG_BLOOMJOIN_HASH_FUNCTIONS
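To picture how the number of filters, the vector size in bytes, and the number of hash functions relate, here is a minimal partitioned bloom filter in plain Java. It is a generic sketch, not Pig's implementation: the class `PartitionedBloomSketch` is hypothetical, keys are partitioned by `hashCode()`, and the double-hashing scheme is an assumption chosen for brevity. The `merge` method shows how per-partition filters built separately (for example in different maps) could be combined by OR-ing their bit vectors.

```java
import java.util.BitSet;

// Sketch only: a generic partitioned bloom filter, not Pig's implementation.
final class PartitionedBloomSketch {
    private final BitSet[] filters;   // one bit vector per partition
    private final int bitsPerFilter;  // vector size in bits
    private final int numHashes;      // hash functions applied per key

    PartitionedBloomSketch(int numFilters, int vectorSizeBytes, int numHashes) {
        if (numFilters < 1 || vectorSizeBytes < 1 || numHashes < 1) {
            throw new IllegalArgumentException("all sizes must be positive");
        }
        this.filters = new BitSet[numFilters];
        this.bitsPerFilter = vectorSizeBytes * 8;
        this.numHashes = numHashes;
        for (int i = 0; i < numFilters; i++) {
            filters[i] = new BitSet(bitsPerFilter);
        }
    }

    // Choose the partition (and so the filter) from the key's hash code.
    private int partition(Object key) {
        return Math.floorMod(key.hashCode(), filters.length);
    }

    // i-th bit position via simple double hashing (an assumption of this sketch).
    private int bit(Object key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, bitsPerFilter);
    }

    void add(Object key) {
        BitSet filter = filters[partition(key)];
        for (int i = 0; i < numHashes; i++) {
            filter.set(bit(key, i));
        }
    }

    // False means definitely absent; true means possibly present.
    boolean mightContain(Object key) {
        BitSet filter = filters[partition(key)];
        for (int i = 0; i < numHashes; i++) {
            if (!filter.get(bit(key, i))) {
                return false;
            }
        }
        return true;
    }

    // Combine filters built elsewhere by OR-ing each partition's bit vector.
    void merge(PartitionedBloomSketch other) {
        if (other.filters.length != filters.length
                || other.bitsPerFilter != bitsPerFilter
                || other.numHashes != numHashes) {
            throw new IllegalArgumentException("filters must share the same configuration");
        }
        for (int i = 0; i < filters.length; i++) {
            filters[i].or(other.filters[i]);
        }
    }
}
```

Merging whole bit vectors moves `numFilters * vectorSizeBytes` bytes per source, which is why combining filters from many maps becomes expensive as the vector size grows.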
public static final String PIG_JOIN_REPLICATED_MAX_BYTES
public static final String PIG_CACHEDBAG_TYPE
public static final String PIG_CACHEDBAG_DISTINCT_TYPE
public static final String PIG_CACHEDBAG_SORT_TYPE
public static final String PIG_EXEC_REDUCER_ESTIMATOR
public static final String PIG_EXEC_REDUCER_ESTIMATOR_CONSTRUCTOR_ARG_KEY
public static final String PIG_TEZ_AUTO_PARALLELISM
public static final String PIG_TEZ_GRACE_PARALLELISM
public static final String PIG_TEZ_AUTO_PARALLELISM_DISABLE_DAG_RECOVERY
public static final String PIG_COMPRESS_INPUT_SPLITS
public static final boolean PIG_COMPRESS_INPUT_SPLITS_DEFAULT
public static final String PIG_TEZ_INPUT_SPLITS_MEM_THRESHOLD
public static final int PIG_TEZ_INPUT_SPLITS_MEM_THRESHOLD_DEFAULT
public static final String PIG_UDF_PROFILE
public static final String PIG_UDF_PROFILE_FREQUENCY
public static final String PIG_SCHEMA_TUPLE_ENABLED
public static final String PIG_SCHEMA_TUPLE_USE_IN_UDF
public static final String PIG_SCHEMA_TUPLE_USE_IN_FOREACH
public static final String PIG_SCHEMA_TUPLE_USE_IN_FRJOIN
public static final String PIG_SCHEMA_TUPLE_USE_IN_MERGEJOIN
public static final String PIG_SCHEMA_TUPLE_ALLOW_FORCE
public static final String PIG_STREAMING_ENVIRONMENT
public static final String PIG_STREAMING_UDF_PYTHON_COMMAND
public static final String PIG_SPLIT_COMBINATION
public static final String PIG_NO_SPLIT_COMBINATION
public static final String PIG_MAX_COMBINED_SPLIT_SIZE
public static final String PIG_OUTPUT_LAZY
public static final String PIG_OUTPUT_COMMITTER_RECOVERY
public static final String PIG_TEMP_DIR
public static final String PIG_ENABLE_TEMP_FILE_COMPRESSION
public static final String PIG_TEMP_FILE_COMPRESSION_STORAGE
public static final String PIG_TEMP_FILE_COMPRESSION_CODEC
public static final String PIG_DELETE_TEMP_FILE
public static final String PIG_POISSON_SAMPLER_SAMPLE_RATE
public static final String PIG_RANDOM_SAMPLER_SAMPLE_SIZE
public static final String PIG_DEFAULT_LOAD_FUNC
public static final String PIG_DEFAULT_STORE_FUNC
public static final String PIG_SCRIPT_INFO_ENABLED
public static final String PIG_SCRIPT_MAX_SIZE
public static final String PIG_USER_CACHE_ENABLED
public static final String PIG_USER_CACHE_LOCATION
public static final String PIG_USER_CACHE_REPLICATION
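The per-user cache directory layout, PIG_USER_CACHE_LOCATION/${user.name}/.pigcache, can be illustrated with a few lines of Java. The class `UserCachePathSketch` and its method are hypothetical helpers for this illustration only, and the location and user name are passed in as plain strings rather than read from any Pig or Hadoop configuration.

```java
// Sketch only: shows the directory layout, not how Pig resolves it.
final class UserCachePathSketch {

    /**
     * Builds <cacheLocation>/<userName>/.pigcache.
     *
     * @param cacheLocation value configured for PIG_USER_CACHE_LOCATION
     * @param userName      value of ${user.name}
     */
    static String cacheDir(String cacheLocation, String userName) {
        // Drop one trailing slash so the result has no double separator.
        String base = cacheLocation.endsWith("/")
                ? cacheLocation.substring(0, cacheLocation.length() - 1)
                : cacheLocation;
        return base + "/" + userName + "/.pigcache";
    }

    public static void main(String[] args) {
        System.out.println(cacheDir("/tmp", System.getProperty("user.name")));
    }
}
```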
public static final String PIG_ERROR_HANDLING_ENABLED
public static final String PIG_ERROR_HANDLING_MIN_ERROR_RECORDS
public static final String PIG_ERROR_HANDLING_THRESHOLD_PERCENT
public static final String PIG_BLACKLIST
public static final String PIG_WHITELIST
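PIG_BLACKLIST and PIG_WHITELIST both take comma-delimited entries of commands/operators. The sketch below shows one way such values could be parsed and checked. It makes assumptions that are not confirmed here: entries are compared case-insensitively, and an empty whitelist means no whitelist restriction. The class `CommandListSketch` is hypothetical and not part of Pig.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch only: parsing and checking comma-delimited command lists.
final class CommandListSketch {

    // Split on commas, trim whitespace, drop empty entries, lower-case.
    static Set<String> parse(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptySet();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(s -> s.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
    }

    // ASSUMPTIONS: blacklist wins; an empty whitelist allows everything else.
    static boolean isAllowed(String command, Set<String> blacklist, Set<String> whitelist) {
        String c = command.toLowerCase(Locale.ROOT);
        if (blacklist.contains(c)) {
            return false;
        }
        return whitelist.isEmpty() || whitelist.contains(c);
    }

    public static void main(String[] args) {
        Set<String> blacklist = parse("rm, kill");
        Set<String> whitelist = parse("");
        System.out.println(isAllowed("rm", blacklist, whitelist));
        System.out.println(isAllowed("load", blacklist, whitelist));
    }
}
```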
public static final String PIG_NO_TASK_REPORT
public static final String PIG_DATETIME_DEFAULT_TIMEZONE
public static final String PIG_BZIP_USE_HADOOP_INPUTFORMAT
public static final String PIG_ARTIFACTS_DOWNLOAD_LOCATION
public static final String PIG_TEZ_SESSION_REUSE
public static final String PIG_TEZ_DAG_STATUS_REPORT_INTERVAL
public static final String PIG_SPILL_MEMORY_USAGE_THRESHOLD_FRACTION
If PigConfiguration.PIG_SPILL_UNUSED_MEMORY_THRESHOLD_SIZE
is non-zero, then the usage threshold is calculated as
Max(HeapSize * PIG_SPILL_MEMORY_USAGE_THRESHOLD_FRACTION, HeapSize - PIG_SPILL_UNUSED_MEMORY_THRESHOLD_SIZE).
Default is 0.7.
public static final String PIG_SPILL_COLLECTION_THRESHOLD_FRACTION
If PigConfiguration.PIG_SPILL_UNUSED_MEMORY_THRESHOLD_SIZE
is non-zero, then the collection threshold is calculated as
Max(HeapSize * PIG_SPILL_COLLECTION_THRESHOLD_FRACTION, HeapSize - PIG_SPILL_UNUSED_MEMORY_THRESHOLD_SIZE).
Default is 0.7.
public static final String PIG_SPILL_UNUSED_MEMORY_THRESHOLD_SIZE
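The spill threshold formula quoted above applies in the same way to the usage and collection fractions, and can be worked through in plain Java. The class `SpillThresholdSketch` and its method are hypothetical and exist only to show the arithmetic. The behavior when the unused-memory size is zero (fraction only) is inferred from the "is non-zero" condition.

```java
// Sketch only: mirrors the Max(...) formula; names are not part of Pig.
final class SpillThresholdSketch {

    /**
     * @param heapSize          size of the heap in bytes
     * @param thresholdFraction usage or collection threshold fraction, e.g. 0.7
     * @param unusedMemoryBytes PIG_SPILL_UNUSED_MEMORY_THRESHOLD_SIZE, 0 if unset
     * @return threshold in bytes
     */
    static long threshold(long heapSize, double thresholdFraction, long unusedMemoryBytes) {
        long byFraction = (long) (heapSize * thresholdFraction);
        if (unusedMemoryBytes == 0) {
            // Inferred: with no unused-memory size, only the fraction applies.
            return byFraction;
        }
        // Max(HeapSize * fraction, HeapSize - unused memory size)
        return Math.max(byFraction, heapSize - unusedMemoryBytes);
    }

    public static void main(String[] args) {
        long heap = 4L * 1024 * 1024 * 1024; // 4 GiB heap
        System.out.println(threshold(heap, 0.7, 0));
        System.out.println(threshold(heap, 0.7, 512L * 1024 * 1024));
    }
}
```

On a large heap a fixed fraction leaves a large absolute amount of memory unused. Taking the maximum with HeapSize minus the unused-memory size raises the threshold in that case.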
public static final String PIG_LOG_TRACE_ID
public static final String PIG_SPARK_USE_NETTY_FILESERVER
public static final String CALLER_ID
Deprecated. Use PIG_LOG_TRACE_ID instead. Will be removed in Pig 0.18.
public static final String PIG_ATS_ENABLED
public static final String ENABLE_ATS
Deprecated. Use PIG_ATS_ENABLED instead. Will be removed in Pig 0.18.
public static final String PIG_TEZ_CONFIGURE_AM_MEMORY
public static final String PIG_STORE_SCHEMA_DISAMBIGUATE
public static final String PIG_STORE_SCHEMA_DISAMBIGUATE_DEFAULT
public static final String PIG_PRINT_EXEC_PLAN
@Deprecated public static final String OPT_FETCH
Deprecated. Use PIG_OPT_FETCH instead. Will be removed in Pig 0.16.
@Deprecated public static final String PROP_CACHEDBAG_MEMUSAGE
Deprecated. Use PIG_CACHEDBAG_MEMUSAGE instead. Will be removed in Pig 0.16.
@Deprecated public static final String PROP_EXEC_MAP_PARTAGG
Deprecated. Use PIG_EXEC_MAP_PARTAGG instead. Will be removed in Pig 0.16.
@Deprecated public static final String PARTAGG_MINREDUCTION
Deprecated. Use PIG_EXEC_MAP_PARTAGG_MINREDUCTION instead. Will be removed in Pig 0.16.
@Deprecated public static final String PROP_NO_COMBINER
Deprecated. Use #PROP_NO_COMBINER1 instead. Will be removed in Pig 0.16.
@Deprecated public static final String SHOULD_USE_SCHEMA_TUPLE
@Deprecated public static final String SCHEMA_TUPLE_SHOULD_USE_IN_UDF
@Deprecated public static final String SCHEMA_TUPLE_SHOULD_USE_IN_FOREACH
@Deprecated public static final String SCHEMA_TUPLE_SHOULD_USE_IN_FRJOIN
@Deprecated public static final String SCHEMA_TUPLE_SHOULD_USE_IN_MERGEJOIN
@Deprecated public static final String SCHEMA_TUPLE_SHOULD_ALLOW_FORCE
@Deprecated public static final String INSERT_ENABLED
Deprecated. Use PIG_SCRIPT_INFO_ENABLED instead. Will be removed in Pig 0.16.
@Deprecated public static final String MAX_SCRIPT_SIZE
Deprecated. Use PIG_SCRIPT_MAX_SIZE instead. Will be removed in Pig 0.16.
Copyright © 2007-2017 The Apache Software Foundation