Class JobControlCompiler

java.lang.Object
  extended by org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler

public class JobControlCompiler
extends Object

This is the compiler class that takes an MROperPlan and converts it into a JobControl object with the relevant dependency information maintained. The JobControl object is made up of Jobs, each of which has a JobConf. Each MapReduceOper corresponds to a Job, and the getJobConf method returns the JobConf that is configured as per the MapReduceOper.

Comparator Design

A few words on how comparators are chosen. In almost all cases we use raw comparators (the one exception being when the user provides a comparison function for order by). For order by queries the PigTYPERawComparator functions are used, where TYPE is Int, Long, etc. These comparators are null-aware and asc/desc-aware. The first byte of each of the NullableTYPEWritable classes indicates whether the value is null. The asc/desc sort order is written as an array into the JobConf under the key pig.sortOrder so that each comparator can read it as part of its setConf call.

For non-order by queries, the PigTYPEWritableComparator classes are used. These are all just type-specific instances of WritableComparator.
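The null-aware, asc/desc-aware raw comparison described above can be sketched in plain Java. This is an illustrative stand-in, not Pig's actual comparator classes: the 5-byte encoding (one null-indicator byte followed by a big-endian int) merely mimics the NullableTYPEWritable layout described in the text, and the boolean flag plays the role of one entry of the pig.sortOrder array.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative sketch (not Pig's actual classes) of the raw-comparison
 * scheme described above: each serialized value carries a leading null
 * byte, and the asc/desc flag from pig.sortOrder flips the result.
 */
public class NullAwareRawComparator {

    // Hypothetical encoding: byte 0 = 1 if null, then a big-endian int.
    static byte[] encode(Integer value) {
        ByteBuffer buf = ByteBuffer.allocate(5);
        if (value == null) {
            buf.put((byte) 1).putInt(0);
        } else {
            buf.put((byte) 0).putInt(value);
        }
        return buf.array();
    }

    // Raw compare: nulls sort first ascending; descending flips the sign.
    static int compare(byte[] a, byte[] b, boolean ascending) {
        boolean aNull = a[0] != 0, bNull = b[0] != 0;
        int result;
        if (aNull && bNull) {
            result = 0;
        } else if (aNull) {
            result = -1;
        } else if (bNull) {
            result = 1;
        } else {
            result = Integer.compare(ByteBuffer.wrap(a, 1, 4).getInt(),
                                     ByteBuffer.wrap(b, 1, 4).getInt());
        }
        return ascending ? result : -result;
    }
}
```

The point of comparing the serialized bytes directly is that neither value has to be deserialized into an object during the sort, which is why raw comparators are preferred in almost all cases.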

Nested Class Summary
static class JobControlCompiler.PigBagWritableComparator
static class JobControlCompiler.PigBigDecimalWritableComparator
static class JobControlCompiler.PigBigIntegerWritableComparator
static class JobControlCompiler.PigBooleanWritableComparator
static class JobControlCompiler.PigCharArrayWritableComparator
static class JobControlCompiler.PigDateTimeWritableComparator
static class JobControlCompiler.PigDBAWritableComparator
static class JobControlCompiler.PigDoubleWritableComparator
static class JobControlCompiler.PigFloatWritableComparator
static class JobControlCompiler.PigGroupingBagWritableComparator
static class JobControlCompiler.PigGroupingBigDecimalWritableComparator
static class JobControlCompiler.PigGroupingBigIntegerWritableComparator
static class JobControlCompiler.PigGroupingBooleanWritableComparator
static class JobControlCompiler.PigGroupingCharArrayWritableComparator
static class JobControlCompiler.PigGroupingDateTimeWritableComparator
static class JobControlCompiler.PigGroupingDBAWritableComparator
static class JobControlCompiler.PigGroupingDoubleWritableComparator
static class JobControlCompiler.PigGroupingFloatWritableComparator
static class JobControlCompiler.PigGroupingIntWritableComparator
static class JobControlCompiler.PigGroupingLongWritableComparator
static class JobControlCompiler.PigGroupingPartitionWritableComparator
static class JobControlCompiler.PigGroupingTupleWritableComparator
static class JobControlCompiler.PigIntWritableComparator
static class JobControlCompiler.PigLongWritableComparator
static class JobControlCompiler.PigSecondaryKeyGroupComparator
static class JobControlCompiler.PigTupleWritableComparator
static class JobControlCompiler.PigWritableComparator
Field Summary
static String BIG_JOB_LOG_MSG
static String END_OF_INP_IN_MAP
 HashMap<String,ArrayList<Pair<String,Long>>> globalCounters
static String LOG_DIR
static String PIG_MAP_COUNTER
static String PIG_MAP_RANK_NAME
static String PIG_MAP_STORES
          We serialize the POStore(s) present in the map and reduce plans as lists in the Hadoop Conf.
static String SMALL_JOB_LOG_MSG
Constructor Summary
JobControlCompiler(PigContext pigContext, org.apache.hadoop.conf.Configuration conf)
JobControlCompiler(PigContext pigContext, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.conf.Configuration defaultConf)
Method Summary
 void adjustNumReducers(MROperPlan plan, MapReduceOper mro, org.apache.hadoop.mapreduce.Job nwJob)
          Adjusts the number of reducers based on default_parallel, the requested parallelism, and the estimated parallelism.
 org.apache.hadoop.mapred.jobcontrol.JobControl compile(MROperPlan plan, String grpName)
          Compiles all jobs that have no dependencies, removes them from the plan, and returns them.
static int estimateNumberOfReducers(org.apache.hadoop.mapreduce.Job job, MapReduceOper mapReducerOper)
          Looks up the estimator from REDUCER_ESTIMATOR_KEY and invokes it to find the number of reducers to use.
 Map<org.apache.hadoop.mapred.jobcontrol.Job,MapReduceOper> getJobMroMap()
          Gets the map of Job and the MR Operator
 List<POStore> getStores(org.apache.hadoop.mapred.jobcontrol.Job job)
          Returns all store locations of a previously compiled job
 void moveResults(List<org.apache.hadoop.mapred.jobcontrol.Job> completedJobs)
          Moves all the results of a collection of MR jobs to the final output directory.
 void reset()
          Resets the state
 int updateMROpPlan(List<org.apache.hadoop.mapred.jobcontrol.Job> completeFailedJobs)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail


public static final String LOG_DIR
See Also:
Constant Field Values


public static final String END_OF_INP_IN_MAP
See Also:
Constant Field Values


public static final String PIG_MAP_COUNTER
See Also:
Constant Field Values


public static final String PIG_MAP_RANK_NAME
See Also:
Constant Field Values


public static final String PIG_MAP_SEPARATOR
See Also:
Constant Field Values


public HashMap<String,ArrayList<Pair<String,Long>>> globalCounters


public static final String SMALL_JOB_LOG_MSG
See Also:
Constant Field Values


public static final String BIG_JOB_LOG_MSG
See Also:
Constant Field Values


public static final String PIG_MAP_STORES
We serialize the POStore(s) present in the map and reduce plans as lists in the Hadoop Conf. In the multi-store case we could deduce these from the map plan and reduce plan, but in the single-store case we remove the POStore from the plan. In either case, we serialize the POStore(s) so that PigOutputFormat and PigOutputCommitter can retrieve them in the same way, irrespective of whether it is a multi store or a single store.
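The idea behind this field can be sketched as follows. Everything here is a stand-in: a HashMap plays the role of Hadoop's Configuration, strings play the role of POStore objects, and the key name mirrors (but is not) Pig's constant. The sketch only illustrates the round-trip: stores are serialized into the conf once, and read back the same way whether one or many were written.

```java
import java.io.*;
import java.util.*;

/**
 * Sketch of the serialize-stores-into-the-conf idea behind
 * PIG_MAP_STORES: store descriptions are written into the job
 * configuration so the output format can recover them uniformly.
 * Not Pig's actual code; a HashMap stands in for Configuration.
 */
public class StoreSerialization {

    static final String PIG_MAP_STORES = "pig.map.stores"; // illustrative key

    // Serialize a list of store locations into a conf-friendly string.
    static void setStores(Map<String, String> conf, List<String> stores) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new ArrayList<>(stores));
            }
            conf.put(PIG_MAP_STORES,
                     Base64.getEncoder().encodeToString(bytes.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recover the stores the same way whether one or many were written.
    @SuppressWarnings("unchecked")
    static List<String> getStores(Map<String, String> conf) {
        byte[] raw = Base64.getDecoder().decode(conf.get(PIG_MAP_STORES));
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (List<String>) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the reader only ever consults the conf key, it never needs to know whether the plan originally had one store or several, which is exactly the uniformity the documentation above describes.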

See Also:
Constant Field Values


public static final String PIG_REDUCE_STORES
See Also:
Constant Field Values
Constructor Detail


public JobControlCompiler(PigContext pigContext,
                          org.apache.hadoop.conf.Configuration conf)


public JobControlCompiler(PigContext pigContext,
                          org.apache.hadoop.conf.Configuration conf,
                          org.apache.hadoop.conf.Configuration defaultConf)
Method Detail


public List<POStore> getStores(org.apache.hadoop.mapred.jobcontrol.Job job)
Returns all store locations of a previously compiled job


public void reset()
Resets the state


public Map<org.apache.hadoop.mapred.jobcontrol.Job,MapReduceOper> getJobMroMap()
Gets the map of Job and the MR Operator


public void moveResults(List<org.apache.hadoop.mapred.jobcontrol.Job> completedJobs)
                 throws IOException
Moves all the results of a collection of MR jobs to the final output directory. Some of the results may have been put into a temp location to work around restrictions on multiple outputs from a single map-reduce job. This method should always be called after job execution completes.
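The move-from-temp pattern can be sketched with local-filesystem paths standing in for HDFS locations. This is not Pig's implementation, which works through Hadoop's FileSystem API; it only illustrates the contract of relocating part files from a temp directory into the final output directory.

```java
import java.io.IOException;
import java.nio.file.*;

/**
 * Sketch of the "move temp results to the final output directory"
 * pattern that moveResults() implements for MR job outputs. Local
 * paths stand in for HDFS; not Pig's actual code.
 */
public class MoveResults {

    static void moveResults(Path tempDir, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(tempDir)) {
            for (Path part : parts) {
                // Move each part file produced under the temp location.
                Files.move(part, finalDir.resolve(part.getFileName()),
                           StandardCopyOption.REPLACE_EXISTING);
            }
        }
        Files.deleteIfExists(tempDir); // temp dir is now empty
    }

    // Self-contained demo: write one part file, move it, verify.
    static boolean demo() {
        try {
            Path temp = Files.createTempDirectory("pigtmp");
            Files.write(temp.resolve("part-00000"), new byte[] { 'x' });
            Path finalDir = Files.createTempDirectory("pigout").resolve("final");
            moveResults(temp, finalDir);
            return Files.exists(finalDir.resolve("part-00000"))
                    && !Files.exists(temp);
        } catch (IOException e) {
            return false;
        }
    }
}
```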



public org.apache.hadoop.mapred.jobcontrol.JobControl compile(MROperPlan plan,
                                                              String grpName)
                                                       throws JobCreationException
Compiles all jobs that have no dependencies, removes them from the plan, and returns them. Should be called with the same plan until it is exhausted.

Parameters:
plan - The MROperPlan to be compiled
grpName - The name given to the JobControl
Returns:
JobControl object, or null if no more jobs remain in the plan
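The "call with the same plan until exhausted" contract can be modeled with a toy dependency plan. The classes and names here are illustrative stand-ins, not Pig's MROperPlan or JobControl: each pass takes the jobs that currently have no dependencies, removes them from the plan, and collects them as one batch, mirroring one call to compile().

```java
import java.util.*;

/**
 * Toy model of compile()'s contract: each iteration drains the jobs
 * with no remaining dependencies, removes them from the plan, and
 * emits them as one batch; an empty batch corresponds to compile()
 * returning null. Illustrative only, not Pig's API.
 */
public class CompileLoop {

    // plan maps each job name to the set of jobs it still depends on
    static List<List<String>> drain(Map<String, Set<String>> plan) {
        List<List<String>> batches = new ArrayList<>();
        while (true) {
            List<String> ready = new ArrayList<>();
            for (Map.Entry<String, Set<String>> e : plan.entrySet()) {
                if (e.getValue().isEmpty()) ready.add(e.getKey());
            }
            if (ready.isEmpty()) break;     // exhausted: compile() returns null
            for (String job : ready) plan.remove(job);
            for (Set<String> deps : plan.values()) deps.removeAll(ready);
            batches.add(ready);             // one JobControl batch per call
        }
        return batches;
    }
}
```

A caller drives this exactly like the documented loop: keep submitting batches from the same plan until nothing is returned.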


public int updateMROpPlan(List<org.apache.hadoop.mapred.jobcontrol.Job> completeFailedJobs)


public void adjustNumReducers(MROperPlan plan,
                              MapReduceOper mro,
                              org.apache.hadoop.mapreduce.Job nwJob)
                       throws IOException
Adjusts the number of reducers based on default_parallel, the requested parallelism, and the estimated parallelism. For sampler jobs, we also adjust the next job in advance, using its runtime parallelism as the number of partitions in the sampler.

Parameters:
plan - the MR plan
mro - the MR operator
nwJob - the current job
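One plausible reading of that precedence can be sketched as follows. Treat the ordering as an assumption: this sketch assumes an explicitly requested parallelism (e.g. a PARALLEL clause) wins over the estimate, which wins over default_parallel, with a floor of one reducer; the authoritative rules are in Pig's source.

```java
/**
 * Rough sketch of the reducer-count precedence that
 * adjustNumReducers() applies. The ordering here (requested, then
 * estimated, then default_parallel, floor of 1) is an assumption,
 * not a quotation of Pig's exact logic.
 */
public class ReducerChoice {
    // Negative or zero means "not set" for each input.
    static int chooseReducers(int requested, int estimated, int defaultParallel) {
        if (requested > 0) return requested;         // user asked explicitly
        if (estimated > 0) return estimated;         // size-based estimate
        if (defaultParallel > 0) return defaultParallel;
        return 1;                                    // always run at least one reducer
    }
}
```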


public static int estimateNumberOfReducers(org.apache.hadoop.mapreduce.Job job,
                                           MapReduceOper mapReducerOper)
                                    throws IOException
Looks up the estimator from REDUCER_ESTIMATOR_KEY and invokes it to find the number of reducers to use. If REDUCER_ESTIMATOR_KEY isn't set, defaults to InputSizeReducerEstimator.

job -
mapReducerOper -
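The default estimator's heuristic can be sketched as total input bytes divided by a bytes-per-reducer setting, rounded up and capped at a configured maximum. The 1 GB-per-reducer and 999-reducer defaults shown in the test values mirror Pig's pig.exec.reducers.bytes.per.reducer and pig.exec.reducers.max settings, but should be checked against your Pig version; the code itself is an illustration, not InputSizeReducerEstimator.

```java
/**
 * Sketch of the input-size heuristic behind the default estimator:
 * reducers = ceil(totalInputBytes / bytesPerReducer), capped at a
 * configured maximum and floored at 1. Illustrative only.
 */
public class InputSizeEstimate {
    static int estimate(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        // Ceiling division without floating point.
        long needed = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer;
        return (int) Math.max(1, Math.min(maxReducers, needed));
    }
}
```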

Copyright © 2007-2012 The Apache Software Foundation