MapRedUtil (Pig 0.17.0 API)

java.lang.Object
- org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil

```
public class MapRedUtil
extends Object
```
A class of static utility methods for use in the Hadoop MapReduce backend.

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`FILE_SYSTEM_NAME`

Constructor Summary

Constructors
Constructor and Description
`MapRedUtil()`

Method Summary

Modifier and Type	Method and Description
`static FileSpec`	`checkLeafIsStore(PhysicalPlan plan, PigContext pigContext)`
`static void`	`copyTmpFileConfigurationValues(org.apache.hadoop.conf.Configuration fromConf, org.apache.hadoop.conf.Configuration toConf)`
`static List<org.apache.hadoop.fs.FileStatus>`	`getAllFileRecursively(List<org.apache.hadoop.fs.FileStatus> files, org.apache.hadoop.conf.Configuration conf)` Get all files recursively from the given list of files
`static List<List<org.apache.hadoop.mapreduce.InputSplit>>`	`getCombinePigSplits(List<org.apache.hadoop.mapreduce.InputSplit> oneInputSplits, long maxCombinedSplitSize, org.apache.hadoop.conf.Configuration conf)`
`static long`	`getPathLength(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus status)`
`static long`	`getPathLength(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus status, long max)` Returns the total number of bytes for this file or, if it is a directory, for all files in the directory.
`String`	`inputSplitToString(org.apache.hadoop.mapreduce.InputSplit[] splits)`
`static <E> Map<E,Pair<Integer,Integer>>`	`loadPartitionFileFromLocalCache(String keyDistFile, Integer[] totalReducers, byte keyType, org.apache.hadoop.conf.Configuration mapConf)` Loads the key distribution sampler file
`static void`	`setupStreamingDirsConfMulti(PigContext pigContext, org.apache.hadoop.conf.Configuration conf)` Sets up output and log dir paths for a multi-store streaming job
`static void`	`setupStreamingDirsConfSingle(POStore st, PigContext pigContext, org.apache.hadoop.conf.Configuration conf)` Sets up output and log dir paths for a single-store streaming job
`static void`	`setupUDFContext(org.apache.hadoop.conf.Configuration job)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail
- FILE_SYSTEM_NAME
```
public static final String FILE_SYSTEM_NAME
```
  See Also: Constant Field Values

Constructor Detail
- MapRedUtil
```
public MapRedUtil()
```

Method Detail

loadPartitionFileFromLocalCache

public static <E> Map<E,Pair<Integer,Integer>> loadPartitionFileFromLocalCache(String keyDistFile,
                                                                               Integer[] totalReducers,
                                                                               byte keyType,
                                                                               org.apache.hadoop.conf.Configuration mapConf)
                                                                        throws IOException

Loads the key distribution sampler file

Parameters: keyDistFile - the name of the distribution file; totalReducers - set to the total number of reducers found in the distribution file; keyType - type of the key to be stored in the returned map. Tuple is currently treated as a special case.
Throws: IOException

copyTmpFileConfigurationValues

public static void copyTmpFileConfigurationValues(org.apache.hadoop.conf.Configuration fromConf,
                                                  org.apache.hadoop.conf.Configuration toConf)

setupUDFContext

public static void setupUDFContext(org.apache.hadoop.conf.Configuration job)
                            throws IOException

Throws: IOException

setupStreamingDirsConfSingle

public static void setupStreamingDirsConfSingle(POStore st,
                                                PigContext pigContext,
                                                org.apache.hadoop.conf.Configuration conf)
                                         throws IOException

Sets up output and log dir paths for a single-store streaming job

Parameters: st - POStore of the current job; pigContext; conf
Throws: IOException

setupStreamingDirsConfMulti

public static void setupStreamingDirsConfMulti(PigContext pigContext,
                                               org.apache.hadoop.conf.Configuration conf)
                                        throws IOException

Sets up output and log dir paths for a multi-store streaming job

Parameters: pigContext; conf
Throws: IOException

checkLeafIsStore

public static FileSpec checkLeafIsStore(PhysicalPlan plan,
                                        PigContext pigContext)
                                 throws ExecException

Throws: ExecException

getAllFileRecursively

public static List<org.apache.hadoop.fs.FileStatus> getAllFileRecursively(List<org.apache.hadoop.fs.FileStatus> files,
                                                                          org.apache.hadoop.conf.Configuration conf)
                                                                   throws IOException

Gets all files recursively from the given list of files.

Parameters: files - a list of FileStatus; conf - the configuration object
Returns: the list of FileStatus objects containing all the files in the given list and, recursively, all the files inside the directories in the given list
Throws: IOException
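
The recursive expansion described above can be sketched without a Hadoop cluster. The following is a minimal illustration that uses `java.nio.file` as a stand-in for Hadoop's `FileSystem` and `FileStatus`. The class `RecursiveListingSketch` and its method `allFilesRecursively` are hypothetical names introduced here, and this is not Pig's implementation.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration only; not part of Pig or Hadoop.
class RecursiveListingSketch {

    // Returns every non-directory entry in `inputs`, descending into directories.
    static List<Path> allFilesRecursively(List<Path> inputs) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path p : inputs) {
            collect(p, result);
        }
        return result;
    }

    // Adds `p` to `out` if it is not a directory; otherwise recurses into its children.
    private static void collect(Path p, List<Path> out) throws IOException {
        if (Files.isDirectory(p)) {
            try (DirectoryStream<Path> children = Files.newDirectoryStream(p)) {
                for (Path child : children) {
                    collect(child, out);
                }
            }
        } else {
            out.add(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small temporary tree: root/top.txt and root/a/b/deep.txt.
        Path root = Files.createTempDirectory("listing-sketch");
        Path nested = Files.createDirectories(root.resolve("a").resolve("b"));
        Files.write(root.resolve("top.txt"), new byte[] {1});
        Files.write(nested.resolve("deep.txt"), new byte[] {2});

        List<Path> files = allFilesRecursively(Collections.singletonList(root));
        System.out.println(files.size() + " files found");
    }
}
```

As in the documented return value, directories themselves do not appear in the result; only the files found inside them do.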

getPathLength

public static long getPathLength(org.apache.hadoop.fs.FileSystem fs,
                                 org.apache.hadoop.fs.FileStatus status)
                          throws IOException

Throws: IOException

getPathLength
```
public static long getPathLength(org.apache.hadoop.fs.FileSystem fs,
                                 org.apache.hadoop.fs.FileStatus status,
                                 long max)
                          throws IOException
```
Returns the total number of bytes for this file or, if it is a directory, for all files in the directory.

Parameters:

fs - FileSystem

status - FileStatus

max - Maximum value of the total length that will trigger an early exit. Often the caller only needs to know whether the total length of the files exceeds some threshold X. In that case, the function can exit as soon as max is reached.

Throws:

IOException
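
The early-exit behaviour of the `max` parameter can be sketched as follows. This is a minimal illustration that uses `java.nio.file` in place of Hadoop's `FileSystem` and `FileStatus`. The class `PathLengthSketch` and the method `pathLength` are hypothetical names, and the exact comparison and traversal order in Pig's implementation may differ.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for illustration only; not part of Pig or Hadoop.
class PathLengthSketch {

    // Sums file sizes under `path`, returning early once the running total reaches `max`.
    // When the early exit fires, the result is a partial total that is at least `max`.
    static long pathLength(Path path, long max) throws IOException {
        if (!Files.isDirectory(path)) {
            return Files.size(path);
        }
        long total = 0;
        try (DirectoryStream<Path> children = Files.newDirectoryStream(path)) {
            for (Path child : children) {
                // Pass the remaining budget down so subdirectories can also stop early.
                total += pathLength(child, max - total);
                if (total >= max) {
                    return total; // threshold reached; no need to visit further entries
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Three files of 10 bytes each in one temporary directory.
        Path root = Files.createTempDirectory("length-sketch");
        for (int i = 0; i < 3; i++) {
            Files.write(root.resolve("part-" + i), new byte[10]);
        }
        System.out.println("full total:   " + pathLength(root, Long.MAX_VALUE));
        System.out.println("capped at 15: " + pathLength(root, 15));
    }
}
```

A caller that only needs to know whether the input exceeds a threshold can compare the returned value against that threshold instead of paying for a full directory walk.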

getCombinePigSplits

public static List<List<org.apache.hadoop.mapreduce.InputSplit>> getCombinePigSplits(List<org.apache.hadoop.mapreduce.InputSplit> oneInputSplits,
                                                                                     long maxCombinedSplitSize,
                                                                                     org.apache.hadoop.conf.Configuration conf)
                                                                              throws IOException,
                                                                                     InterruptedException

Throws: IOException; InterruptedException

inputSplitToString

public String inputSplitToString(org.apache.hadoop.mapreduce.InputSplit[] splits)
                          throws IOException,
                                 InterruptedException

Throws: IOException; InterruptedException
