JobControlCompiler (Pig 0.17.0 API)

java.lang.Object
- org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler

```
public class JobControlCompiler
extends Object
```
This is compiler class that takes an MROperPlan and converts it into a JobControl object with the relevant dependency info maintained. The JobControl Object is made up of Jobs each of which has a JobConf. The MapReduceOper corresponds to a Job and the getJobCong method returns the JobConf that is configured as per the MapReduceOper
Comparator Design

A few words on how comparators are chosen. In almost all cases we use raw comparators (the one exception being when the user provides a comparison function for order by). For order by queries the PigTYPERawComparator functions are used, where TYPE is Int, Long, etc. These comparators are null aware and asc/desc aware. The first byte of each of the NullableTYPEWritable classes contains info on whether the value is null. Asc/desc is written as an array into the JobConf with the key pig.sortOrder so that it can be read by each of the comparators as part of their setConf call.
For non-order by queries, PigTYPEWritableComparator classes are used. These are all just type specific instances of WritableComparator.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`JobControlCompiler.PigBagWritableComparator`
`static class`	`JobControlCompiler.PigBigDecimalWritableComparator`
`static class`	`JobControlCompiler.PigBigIntegerWritableComparator`
`static class`	`JobControlCompiler.PigBooleanWritableComparator`
`static class`	`JobControlCompiler.PigCharArrayWritableComparator`
`static class`	`JobControlCompiler.PigDateTimeWritableComparator`
`static class`	`JobControlCompiler.PigDBAWritableComparator`
`static class`	`JobControlCompiler.PigDoubleWritableComparator`
`static class`	`JobControlCompiler.PigFloatWritableComparator`
`static class`	`JobControlCompiler.PigGroupingBagWritableComparator`
`static class`	`JobControlCompiler.PigGroupingBigDecimalWritableComparator`
`static class`	`JobControlCompiler.PigGroupingBigIntegerWritableComparator`
`static class`	`JobControlCompiler.PigGroupingBooleanWritableComparator`
`static class`	`JobControlCompiler.PigGroupingCharArrayWritableComparator`
`static class`	`JobControlCompiler.PigGroupingDateTimeWritableComparator`
`static class`	`JobControlCompiler.PigGroupingDBAWritableComparator`
`static class`	`JobControlCompiler.PigGroupingDoubleWritableComparator`
`static class`	`JobControlCompiler.PigGroupingFloatWritableComparator`
`static class`	`JobControlCompiler.PigGroupingIntWritableComparator`
`static class`	`JobControlCompiler.PigGroupingLongWritableComparator`
`static class`	`JobControlCompiler.PigGroupingPartitionWritableComparator`
`static class`	`JobControlCompiler.PigGroupingTupleWritableComparator`
`static class`	`JobControlCompiler.PigIntWritableComparator`
`static class`	`JobControlCompiler.PigLongWritableComparator`
`static class`	`JobControlCompiler.PigSecondaryKeyGroupComparator`
`static class`	`JobControlCompiler.PigTupleWritableComparator`
`static class`	`JobControlCompiler.PigWritableComparator`

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`BIG_JOB_LOG_MSG`
`static String`	`END_OF_INP_IN_MAP`
`HashMap<String,ArrayList<Pair<String,Long>>>`	`globalCounters`
`static String`	`LOG_DIR`
`static String`	`PIG_MAP_COUNTER`
`static String`	`PIG_MAP_RANK_NAME`
`static String`	`PIG_MAP_SEPARATOR`
`static String`	`PIG_MAP_STORES` We will serialize the POStore(s) present in map and reduce in lists in the Hadoop Conf.
`static String`	`PIG_REDUCE_STORES`
`static String`	`SMALL_JOB_LOG_MSG`

Constructor Summary

Constructors
Constructor and Description
`JobControlCompiler(PigContext pigContext, org.apache.hadoop.conf.Configuration conf)`
`JobControlCompiler(PigContext pigContext, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.conf.Configuration defaultConf)`

Method Summary

All Methods Static Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`adjustNumReducers(MROperPlan plan, MapReduceOper mro, org.apache.hadoop.mapreduce.Job nwJob)` Adjust the number of reducers based on the default_parallel, requested parallel and estimated parallel.
`org.apache.hadoop.mapred.jobcontrol.JobControl`	`compile(MROperPlan plan, String grpName)` Compiles all jobs that have no dependencies removes them from the plan and returns.
`static void`	`configureCompression(org.apache.hadoop.conf.Configuration conf)`
`static int`	`estimateNumberOfReducers(org.apache.hadoop.mapreduce.Job job, MapReduceOper mapReducerOper)` Looks up the estimator from REDUCER_ESTIMATOR_KEY and invokes it to find the number of reducers to use.
`static org.apache.hadoop.fs.Path`	`getCacheStagingDir(org.apache.hadoop.conf.Configuration conf)`
`static org.apache.hadoop.fs.Path`	`getFromCache(PigContext pigContext, org.apache.hadoop.conf.Configuration conf, URL url)`
`Map<org.apache.hadoop.mapred.jobcontrol.Job,MapReduceOper>`	`getJobMroMap()` Gets the map of Job and the MR Operator
`List<POStore>`	`getStores(org.apache.hadoop.mapred.jobcontrol.Job job)` Returns all store locations of a previously compiled job
`void`	`moveResults(List<org.apache.hadoop.mapred.jobcontrol.Job> completedJobs)` Moves all the results of a collection of MR jobs to the final output directory.
`void`	`reset()` Resets the state
`static void`	`setOutputFormat(org.apache.hadoop.mapreduce.Job job)`
`int`	`updateMROpPlan(List<org.apache.hadoop.mapred.jobcontrol.Job> completeFailedJobs)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail
- LOG_DIR
```
public static final String LOG_DIR
```
  See Also:
  
  Constant Field Values
- END_OF_INP_IN_MAP
```
public static final String END_OF_INP_IN_MAP
```
  See Also:
  
  Constant Field Values
- PIG_MAP_COUNTER
```
public static final String PIG_MAP_COUNTER
```
  See Also:
  
  Constant Field Values
- PIG_MAP_RANK_NAME
```
public static final String PIG_MAP_RANK_NAME
```
  See Also:
  
  Constant Field Values
- PIG_MAP_SEPARATOR
```
public static final String PIG_MAP_SEPARATOR
```
  See Also:
  
  Constant Field Values
- globalCounters
```
public HashMap<String,ArrayList<Pair<String,Long>>> globalCounters
```
- SMALL_JOB_LOG_MSG
```
public static final String SMALL_JOB_LOG_MSG
```
  See Also:
  
  Constant Field Values
- BIG_JOB_LOG_MSG
```
public static final String BIG_JOB_LOG_MSG
```
  See Also:
  
  Constant Field Values
- PIG_MAP_STORES
```
public static final String PIG_MAP_STORES
```
  We will serialize the POStore(s) present in map and reduce in lists in the Hadoop Conf. In the case of Multi stores, we could deduce these from the map plan and reduce plan but in the case of single store, we remove the POStore from the plan - in either case, we serialize the POStore(s) so that PigOutputFormat and PigOutputCommiter can get the POStore(s) in the same way irrespective of whether it is multi store or single store.
  
  See Also:
  
  Constant Field Values
- PIG_REDUCE_STORES
```
public static final String PIG_REDUCE_STORES
```
  See Also:
  
  Constant Field Values

Constructor Detail

JobControlCompiler

public JobControlCompiler(PigContext pigContext,
                          org.apache.hadoop.conf.Configuration conf)

JobControlCompiler

public JobControlCompiler(PigContext pigContext,
                          org.apache.hadoop.conf.Configuration conf,
                          org.apache.hadoop.conf.Configuration defaultConf)

Method Detail

getStores

public List<POStore> getStores(org.apache.hadoop.mapred.jobcontrol.Job job)

Returns all store locations of a previously compiled job

reset
```
public void reset()
```
Resets the state

getJobMroMap

public Map<org.apache.hadoop.mapred.jobcontrol.Job,MapReduceOper> getJobMroMap()

Gets the map of Job and the MR Operator

moveResults
```
public void moveResults(List<org.apache.hadoop.mapred.jobcontrol.Job> completedJobs)
                 throws IOException
```
Moves all the results of a collection of MR jobs to the final output directory. Some of the results may have been put into a temp location to work around restrictions with multiple output from a single map reduce job. This method should always be called after the job execution completes.

Throws:

IOException

compile
```
public org.apache.hadoop.mapred.jobcontrol.JobControl compile(MROperPlan plan,
                                                              String grpName)
                                                       throws JobCreationException
```
Compiles all jobs that have no dependencies removes them from the plan and returns. Should be called with the same plan until exhausted.

Parameters:

plan - - The MROperPlan to be compiled

grpName - - The name given to the JobControl

Returns:

JobControl object - null if no more jobs in plan

Throws:

JobCreationException

updateMROpPlan

public int updateMROpPlan(List<org.apache.hadoop.mapred.jobcontrol.Job> completeFailedJobs)

configureCompression

public static void configureCompression(org.apache.hadoop.conf.Configuration conf)

adjustNumReducers
```
public void adjustNumReducers(MROperPlan plan,
                              MapReduceOper mro,
                              org.apache.hadoop.mapreduce.Job nwJob)
                       throws IOException
```
Adjust the number of reducers based on the default_parallel, requested parallel and estimated parallel. For sampler jobs, we also adjust the next job in advance to get its runtime parallel as the number of partitions used in the sampler.

Parameters:

plan - the MR plan

mro - the MR operator

nwJob - the current job

Throws:

IOException

estimateNumberOfReducers

public static int estimateNumberOfReducers(org.apache.hadoop.mapreduce.Job job,
                                           MapReduceOper mapReducerOper)
                                    throws IOException

Looks up the estimator from REDUCER_ESTIMATOR_KEY and invokes it to find the number of reducers to use. If REDUCER_ESTIMATOR_KEY isn't set, defaults to InputSizeReducerEstimator.

Parameters:: job -; mapReducerOper -
Throws:: IOException

getCacheStagingDir

public static org.apache.hadoop.fs.Path getCacheStagingDir(org.apache.hadoop.conf.Configuration conf)
                                                    throws IOException

Throws:: IOException

getFromCache

public static org.apache.hadoop.fs.Path getFromCache(PigContext pigContext,
                                                     org.apache.hadoop.conf.Configuration conf,
                                                     URL url)
                                              throws IOException

Throws:: IOException

setOutputFormat

public static void setOutputFormat(org.apache.hadoop.mapreduce.Job job)

Class JobControlCompiler

Comparator Design

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Detail

LOG_DIR

END_OF_INP_IN_MAP

PIG_MAP_COUNTER

PIG_MAP_RANK_NAME

PIG_MAP_SEPARATOR

globalCounters

SMALL_JOB_LOG_MSG

BIG_JOB_LOG_MSG

PIG_MAP_STORES

PIG_REDUCE_STORES

Constructor Detail

JobControlCompiler

JobControlCompiler

Method Detail

getStores

reset

getJobMroMap

moveResults

compile

updateMROpPlan

configureCompression

adjustNumReducers

estimateNumberOfReducers

getCacheStagingDir

getFromCache

setOutputFormat