- Direct Known Subclasses:
- PartitionSkewedKeysTez
public class PartitionSkewedKeys
extends EvalFunc<Map<String,Object>>
Partitions reducers for skewed keys. This UDF is used during the sampling
phase of a skewed join. It determines how many reducers are required to
process a skewed key without causing a spill, and allocates that number of
reducers to the key. The UDF outputs a map that contains 2 keys:
"totalreducers": the value is an integer which indicates the
total number of reducers for this join job
"partition.list": the value is a bag which contains a
list of tuples, with each tuple representing the partitions for a skewed key.
Each tuple has the format <join key>,<min index of reducer>,
<max index of reducer>
For example, suppose a join job is configured with 10 reducers, and the
sampling process finds 2 skewed keys: "swpv" needs 4 reducers and "swps"
needs 2 reducers. The output file would look like the following:
{totalreducers=10, partition.list={(swpv,0,3), (swps,4,5)}}
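The reducer ranges in this example can be reproduced with a small sketch. It is not the Pig implementation: the class SkewedKeyRanges, its Range type, and the assign method are hypothetical names, and the sketch assumes that the number of reducers each key needs is already known and that ranges are handed out consecutively starting at index 0.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical, not Pig APIs.
class SkewedKeyRanges {

    // One entry of "partition.list": <join key>,<min reducer index>,<max reducer index>.
    static final class Range {
        final String key;
        final int min;
        final int max;

        Range(String key, int min, int max) {
            this.key = key;
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "(" + key + "," + min + "," + max + ")";
        }
    }

    // Hands out consecutive, inclusive reducer ranges in the map's iteration order.
    static List<Range> assign(Map<String, Integer> reducersNeeded, int totalReducers) {
        List<Range> ranges = new ArrayList<>();
        int next = 0; // first reducer index that has not been handed out yet
        for (Map.Entry<String, Integer> e : reducersNeeded.entrySet()) {
            int count = e.getValue();
            // Assumption: reject requests that do not fit. The text above does
            // not say what happens when skewed keys need more reducers than exist.
            if (count < 1 || next + count > totalReducers) {
                throw new IllegalArgumentException("cannot place key " + e.getKey());
            }
            ranges.add(new Range(e.getKey(), next, next + count - 1));
            next += count;
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<String, Integer> needed = new LinkedHashMap<>(); // keeps insertion order
        needed.put("swpv", 4);
        needed.put("swps", 2);
        System.out.println(assign(needed, 10)); // prints [(swpv,0,3), (swps,4,5)]
    }
}
```

Reducers 6 through 9 are not assigned to any skewed key in this example.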
The name of the output file is passed to the next MapReduce job, which
performs the actual join. That job uses this information to partition the
skewed keys properly.
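How the join job consumes the partition list is not described here. As a rough sketch of one plausible approach, the code below cycles the records of a skewed key through its reducer range in round-robin order and sends every other key to a hash-based reducer. The class SkewedKeyRouter and its methods are hypothetical names, and both the round-robin policy and the hash fallback are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and policy are assumptions, not Pig APIs.
class SkewedKeyRouter {
    private final Map<String, int[]> ranges = new HashMap<>();    // key -> {min, max}
    private final Map<String, Integer> seen = new HashMap<>();    // key -> records routed so far
    private final int totalReducers;

    SkewedKeyRouter(int totalReducers) {
        this.totalReducers = totalReducers;
    }

    // Registers one tuple from "partition.list".
    void addRange(String key, int min, int max) {
        ranges.put(key, new int[] {min, max});
    }

    // Returns the reducer index for the next record carrying this key.
    int reducerFor(String key) {
        int[] range = ranges.get(key);
        if (range == null) {
            // Non-skewed key: ordinary hash partitioning (assumed fallback).
            return Math.floorMod(key.hashCode(), totalReducers);
        }
        int width = range[1] - range[0] + 1;              // range is inclusive
        int offset = seen.merge(key, 1, Integer::sum) - 1; // 0 for the first record
        return range[0] + Math.floorMod(offset, width);
    }
}
```

With the ranges from the example above, five consecutive "swpv" records would go to reducers 0, 1, 2, 3, and then 0 again, while "swps" records would alternate between reducers 4 and 5.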