org.apache.pig.backend.hadoop.executionengine.mapReduceLayer
Class JobControlCompiler

java.lang.Object
  extended by org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler

public class JobControlCompiler
extends Object

This is compiler class that takes an MROperPlan and converts it into a JobControl object with the relevant dependency info maintained. The JobControl Object is made up of Jobs each of which has a JobConf. The MapReduceOper corresponds to a Job and the getJobCong method returns the JobConf that is configured as per the MapReduceOper

Comparator Design

A few words on how comparators are chosen. In almost all cases we use raw comparators (the one exception being when the user provides a comparison function for order by). For order by queries the PigTYPERawComparator functions are used, where TYPE is Int, Long, etc. These comparators are null aware and asc/desc aware. The first byte of each of the NullableTYPEWritable classes contains info on whether the value is null. Asc/desc is written as an array into the JobConf with the key pig.sortOrder so that it can be read by each of the comparators as part of their setConf call.

For non-order by queries, PigTYPEWritableComparator classes are used. These are all just type specific instances of WritableComparator.


Nested Class Summary
static class JobControlCompiler.PigBagWritableComparator
           
static class JobControlCompiler.PigBigDecimalWritableComparator
           
static class JobControlCompiler.PigBigIntegerWritableComparator
           
static class JobControlCompiler.PigBooleanWritableComparator
           
static class JobControlCompiler.PigCharArrayWritableComparator
           
static class JobControlCompiler.PigDateTimeWritableComparator
           
static class JobControlCompiler.PigDBAWritableComparator
           
static class JobControlCompiler.PigDoubleWritableComparator
           
static class JobControlCompiler.PigFloatWritableComparator
           
static class JobControlCompiler.PigGroupingBagWritableComparator
           
static class JobControlCompiler.PigGroupingBigDecimalWritableComparator
           
static class JobControlCompiler.PigGroupingBigIntegerWritableComparator
           
static class JobControlCompiler.PigGroupingBooleanWritableComparator
           
static class JobControlCompiler.PigGroupingCharArrayWritableComparator
           
static class JobControlCompiler.PigGroupingDateTimeWritableComparator
           
static class JobControlCompiler.PigGroupingDBAWritableComparator
           
static class JobControlCompiler.PigGroupingDoubleWritableComparator
           
static class JobControlCompiler.PigGroupingFloatWritableComparator
           
static class JobControlCompiler.PigGroupingIntWritableComparator
           
static class JobControlCompiler.PigGroupingLongWritableComparator
           
static class JobControlCompiler.PigGroupingPartitionWritableComparator
           
static class JobControlCompiler.PigGroupingTupleWritableComparator
           
static class JobControlCompiler.PigIntWritableComparator
           
static class JobControlCompiler.PigLongWritableComparator
           
static class JobControlCompiler.PigSecondaryKeyGroupComparator
           
static class JobControlCompiler.PigTupleWritableComparator
           
static class JobControlCompiler.PigWritableComparator
           
 
Field Summary
static String BIG_JOB_LOG_MSG
           
static String END_OF_INP_IN_MAP
           
 HashMap<String,ArrayList<Pair<String,Long>>> globalCounters
           
static String LOG_DIR
           
static String PIG_MAP_COUNTER
           
static String PIG_MAP_RANK_NAME
           
static String PIG_MAP_SEPARATOR
           
static String PIG_MAP_STORES
          We will serialize the POStore(s) present in map and reduce in lists in the Hadoop Conf.
static String PIG_REDUCE_STORES
           
static String SMALL_JOB_LOG_MSG
           
 
Constructor Summary
JobControlCompiler(PigContext pigContext, org.apache.hadoop.conf.Configuration conf)
           
JobControlCompiler(PigContext pigContext, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.conf.Configuration defaultConf)
           
 
Method Summary
 void adjustNumReducers(MROperPlan plan, MapReduceOper mro, org.apache.hadoop.mapreduce.Job nwJob)
          Adjust the number of reducers based on the default_parallel, requested parallel and estimated parallel.
 org.apache.hadoop.mapred.jobcontrol.JobControl compile(MROperPlan plan, String grpName)
          Compiles all jobs that have no dependencies removes them from the plan and returns.
static int estimateNumberOfReducers(org.apache.hadoop.mapreduce.Job job, MapReduceOper mapReducerOper)
          Looks up the estimator from REDUCER_ESTIMATOR_KEY and invokes it to find the number of reducers to use.
 Map<org.apache.hadoop.mapred.jobcontrol.Job,MapReduceOper> getJobMroMap()
          Gets the map of Job and the MR Operator
 List<POStore> getStores(org.apache.hadoop.mapred.jobcontrol.Job job)
          Returns all store locations of a previously compiled job
 void moveResults(List<org.apache.hadoop.mapred.jobcontrol.Job> completedJobs)
          Moves all the results of a collection of MR jobs to the final output directory.
 void reset()
          Resets the state
 int updateMROpPlan(List<org.apache.hadoop.mapred.jobcontrol.Job> completeFailedJobs)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG_DIR

public static final String LOG_DIR
See Also:
Constant Field Values

END_OF_INP_IN_MAP

public static final String END_OF_INP_IN_MAP
See Also:
Constant Field Values

PIG_MAP_COUNTER

public static final String PIG_MAP_COUNTER
See Also:
Constant Field Values

PIG_MAP_RANK_NAME

public static final String PIG_MAP_RANK_NAME
See Also:
Constant Field Values

PIG_MAP_SEPARATOR

public static final String PIG_MAP_SEPARATOR
See Also:
Constant Field Values

globalCounters

public HashMap<String,ArrayList<Pair<String,Long>>> globalCounters

SMALL_JOB_LOG_MSG

public static final String SMALL_JOB_LOG_MSG
See Also:
Constant Field Values

BIG_JOB_LOG_MSG

public static final String BIG_JOB_LOG_MSG
See Also:
Constant Field Values

PIG_MAP_STORES

public static final String PIG_MAP_STORES
We will serialize the POStore(s) present in map and reduce in lists in the Hadoop Conf. In the case of Multi stores, we could deduce these from the map plan and reduce plan but in the case of single store, we remove the POStore from the plan - in either case, we serialize the POStore(s) so that PigOutputFormat and PigOutputCommiter can get the POStore(s) in the same way irrespective of whether it is multi store or single store.

See Also:
Constant Field Values

PIG_REDUCE_STORES

public static final String PIG_REDUCE_STORES
See Also:
Constant Field Values
Constructor Detail

JobControlCompiler

public JobControlCompiler(PigContext pigContext,
                          org.apache.hadoop.conf.Configuration conf)

JobControlCompiler

public JobControlCompiler(PigContext pigContext,
                          org.apache.hadoop.conf.Configuration conf,
                          org.apache.hadoop.conf.Configuration defaultConf)
Method Detail

getStores

public List<POStore> getStores(org.apache.hadoop.mapred.jobcontrol.Job job)
Returns all store locations of a previously compiled job


reset

public void reset()
Resets the state


getJobMroMap

public Map<org.apache.hadoop.mapred.jobcontrol.Job,MapReduceOper> getJobMroMap()
Gets the map of Job and the MR Operator


moveResults

public void moveResults(List<org.apache.hadoop.mapred.jobcontrol.Job> completedJobs)
                 throws IOException
Moves all the results of a collection of MR jobs to the final output directory. Some of the results may have been put into a temp location to work around restrictions with multiple output from a single map reduce job. This method should always be called after the job execution completes.

Throws:
IOException

compile

public org.apache.hadoop.mapred.jobcontrol.JobControl compile(MROperPlan plan,
                                                              String grpName)
                                                       throws JobCreationException
Compiles all jobs that have no dependencies removes them from the plan and returns. Should be called with the same plan until exhausted.

Parameters:
plan - - The MROperPlan to be compiled
grpName - - The name given to the JobControl
Returns:
JobControl object - null if no more jobs in plan
Throws:
JobCreationException

updateMROpPlan

public int updateMROpPlan(List<org.apache.hadoop.mapred.jobcontrol.Job> completeFailedJobs)

adjustNumReducers

public void adjustNumReducers(MROperPlan plan,
                              MapReduceOper mro,
                              org.apache.hadoop.mapreduce.Job nwJob)
                       throws IOException
Adjust the number of reducers based on the default_parallel, requested parallel and estimated parallel. For sampler jobs, we also adjust the next job in advance to get its runtime parallel as the number of partitions used in the sampler.

Parameters:
plan - the MR plan
mro - the MR operator
nwJob - the current job
Throws:
IOException

estimateNumberOfReducers

public static int estimateNumberOfReducers(org.apache.hadoop.mapreduce.Job job,
                                           MapReduceOper mapReducerOper)
                                    throws IOException
Looks up the estimator from REDUCER_ESTIMATOR_KEY and invokes it to find the number of reducers to use. If REDUCER_ESTIMATOR_KEY isn't set, defaults to InputSizeReducerEstimator.

Parameters:
job -
mapReducerOper -
Throws:
IOException


Copyright © 2007-2012 The Apache Software Foundation