BuildBloom (Pig 0.18.0 API)

java.lang.Object
- org.apache.pig.EvalFunc<T>
  - org.apache.pig.builtin.BuildBloomBase<DataByteArray>
    - org.apache.pig.builtin.BuildBloom

All Implemented Interfaces:

Algebraic
```
public class BuildBloom
extends BuildBloomBase<DataByteArray>
implements Algebraic
```
Build a bloom filter for use later in Bloom. This UDF is intended to run in a `group all` job. For example:

    define bb BuildBloom('jenkins', '100', '0.1');
    A = load 'foo' as (x, y);
    B = group A all;
    C = foreach B generate bb(A.x);
    store C into 'mybloom';

The bloom filter can be on multiple keys by passing more than one field (or the entire bag) to BuildBloom. The resulting file can then be used in a Bloom filter as:

    define bloom Bloom('mybloom');
    A = load 'foo' as (x, y);
    B = load 'bar' as (z);
    C = filter B by bloom(z);
    D = join C by z, A by x;

It uses BloomFilter.
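To make the filter-then-join pattern above concrete, here is a minimal sketch of what a Bloom filter does. It is a toy written for illustration, not the BloomFilter class this UDF uses: the class name `TinyBloom`, its methods, and the hashing scheme are all hypothetical. It shows the property that makes `filter B by bloom(z)` safe before a join: a key that was added is never reported as absent, although an absent key may occasionally be reported as present.

```java
import java.util.BitSet;

/** Illustrative only: a toy Bloom filter, not Pig's or Hadoop's implementation. */
public class TinyBloom {
    private final BitSet bits;
    private final int vectorSize;
    private final int numHash;

    public TinyBloom(int vectorSize, int numHash) {
        this.bits = new BitSet(vectorSize);
        this.vectorSize = vectorSize;
        this.numHash = numHash;
    }

    // Derive the i-th bit position from two base hashes (double hashing).
    // This hashing scheme is an assumption made for the sketch.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // odd step so successive probes differ
        return Math.floorMod(h1 + i * h2, vectorSize);
    }

    // Set every bit position derived from the key.
    public void add(String key) {
        for (int i = 0; i < numHash; i++) {
            bits.set(position(key, i));
        }
    }

    // False means "definitely not added"; true means "possibly added".
    public boolean mightContain(String key) {
        for (int i = 0; i < numHash; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloom filter = new TinyBloom(1024, 3);
        for (String x : new String[] {"a", "b", "c"}) {
            filter.add(x);
        }
        // Keys that were added always pass the membership test.
        System.out.println(filter.mightContain("a"));
        System.out.println(filter.mightContain("b"));
    }
}
```

Because added keys always pass, rows of the filtered relation that would have matched in the join are never dropped; false positives only mean a few extra rows reach the join.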

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`BuildBloom.Final`
`static class`	`BuildBloom.Initial`
`static class`	`BuildBloom.Intermediate`
- Nested classes/interfaces inherited from class org.apache.pig.EvalFunc
  EvalFunc.SchemaType


Field Summary
- Fields inherited from class org.apache.pig.builtin.BuildBloomBase
  filter, hType, numHash, vSize
- Fields inherited from class org.apache.pig.EvalFunc
  log, pigLogger, reporter, returnType

Constructor Summary

Constructors
Constructor and Description
`BuildBloom(java.lang.String hashType, java.lang.String numElements, java.lang.String desiredFalsePositive)` Construct a Bloom filter based on expected number of elements and desired accuracy.
`BuildBloom(java.lang.String hashType, java.lang.String mode, java.lang.String vectorSize, java.lang.String nbHash)` Build a bloom filter of fixed size and number of hash functions.

Method Summary

Modifier and Type	Method and Description
`DataByteArray`	`exec(Tuple input)` This callback method must be implemented by all subclasses.
`java.lang.String`	`getFinal()` Get the final function.
`java.lang.String`	`getInitial()` Get the initial function.
`java.lang.String`	`getIntermed()` Get the intermediate function.
`Schema`	`outputSchema(Schema input)` Report the schema of the output of this UDF.

Methods inherited from class org.apache.pig.builtin.BuildBloomBase
bloomIn, bloomOr, bloomOut

Methods inherited from class org.apache.pig.EvalFunc
addCredentials, allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLoadCaster, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - BuildBloom
```
public BuildBloom(java.lang.String hashType,
                  java.lang.String mode,
                  java.lang.String vectorSize,
                  java.lang.String nbHash)
```
    Build a bloom filter of fixed size and number of hash functions.
    
    Parameters:
    
    hashType - type of the hashing function (see Hash).
    
    mode - Ignored, though by convention it should be "fixed" or "fixedsize".
    
    vectorSize - The vector size of this filter.
    
    nbHash - The number of hash functions to consider.
  - BuildBloom
```
public BuildBloom(java.lang.String hashType,
                  java.lang.String numElements,
                  java.lang.String desiredFalsePositive)
```
    Construct a Bloom filter based on expected number of elements and desired accuracy.
    
    Parameters:
    
    hashType - type of the hashing function (see Hash).
    
    numElements - The number of distinct elements expected to be placed in this filter.
    
    desiredFalsePositive - The acceptable rate of false positives. This should be a floating point value between 0 and 1.0, where 1.0 would be 100% (i.e., a totally useless filter).
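The two constructors describe the same filter from opposite directions: one fixes the vector size and hash count, the other derives them from an expected element count and a target false-positive rate. The sketch below uses the textbook Bloom filter formulas to relate these quantities. Whether BuildBloom rounds or computes them in exactly this way is an assumption; the class `BloomSizing` and its method names are hypothetical and not part of the Pig API.

```java
/** Textbook Bloom filter sizing; assumed for illustration, not copied from Pig's source. */
public class BloomSizing {

    // Bits needed: m = -n * ln(p) / (ln 2)^2, rounded up.
    public static int vectorSize(long numElements, double desiredFalsePositive) {
        double ln2 = Math.log(2);
        return (int) Math.ceil(-numElements * Math.log(desiredFalsePositive) / (ln2 * ln2));
    }

    // Hash functions: k = (m / n) * ln 2, rounded to the nearest integer, at least 1.
    public static int numHash(int vectorSize, long numElements) {
        double k = (double) vectorSize / numElements * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    // Expected false-positive rate: p = (1 - e^(-k * n / m))^k.
    public static double falsePositiveRate(int vectorSize, int numHash, long numElements) {
        double fractionSet = 1 - Math.exp(-(double) numHash * numElements / vectorSize);
        return Math.pow(fractionSet, numHash);
    }

    public static void main(String[] args) {
        long n = 100;     // numElements, as in BuildBloom('jenkins', '100', '0.1')
        double p = 0.1;   // desiredFalsePositive
        int m = vectorSize(n, p);
        int k = numHash(m, n);
        System.out.println("vectorSize=" + m + " nbHash=" + k
                + " expectedRate=" + falsePositiveRate(m, k, n));
    }
}
```

With these formulas, 100 elements at a 0.1 false-positive rate work out to a 480-bit vector and 3 hash functions. The same `falsePositiveRate` helper can be used to sanity-check values chosen by hand for the fixed-size constructor.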
- Method Detail
  - exec
```
public DataByteArray exec(Tuple input)
                   throws java.io.IOException
```
    Description copied from class: EvalFunc
    
    This callback method must be implemented by all subclasses. This is the method that will be invoked on every Tuple of a given dataset. Since the dataset may be divided up in a variety of ways, the programmer should not make assumptions about state that is maintained between invocations of this method.
    
    Specified by:
    
    exec in class EvalFunc<DataByteArray>
    
    Parameters:
    
    input - the Tuple to be processed.
    
    Returns:
    
    result, of type T.
    
    Throws:
    
    java.io.IOException
  - getInitial
```
public java.lang.String getInitial()
```
    Description copied from interface: Algebraic
    
    Get the initial function.
    
    Specified by:
    
    getInitial in interface Algebraic
    
    Returns:
    
    A function name of f_init. f_init should be an eval func. The return type of f_init.exec() has to be Tuple.
  - getIntermed
```
public java.lang.String getIntermed()
```
    Description copied from interface: Algebraic
    
    Get the intermediate function.
    
    Specified by:
    
    getIntermed in interface Algebraic
    
    Returns:
    
    A function name of f_intermed. f_intermed should be an eval func. The return type of f_intermed.exec() has to be Tuple.
  - getFinal
```
public java.lang.String getFinal()
```
    Description copied from interface: Algebraic
    
    Get the final function.
    
    Specified by:
    
    getFinal in interface Algebraic
    
    Returns:
    
    A function name of f_final. f_final should be an eval func parametrized by the same datum as the eval func implementing this interface.
  - outputSchema
```
public Schema outputSchema(Schema input)
```
    Description copied from class: EvalFunc
    
    Report the schema of the output of this UDF. Pig will make use of this in error checking, optimization, and planning. The schema of input data to this UDF is provided.
    The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).
    
    Overrides:
    
    outputSchema in class EvalFunc<DataByteArray>
    
    Parameters:
    
    input - Schema of the input
    
    Returns:
    
    Schema of the output
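The three functions named by getInitial, getIntermed, and getFinal let Pig build the filter in stages rather than in a single call to exec. The following is a rough sketch of that idea, assuming partial filters are combined with a bitwise OR (which the inherited method name bloomOr suggests, but which is not confirmed here). The class `AlgebraicBloomSketch`, its method names, and its single-hash scheme are hypothetical, and the real nested classes exchange Tuples rather than BitSets.

```java
import java.util.BitSet;
import java.util.List;

/** Illustrative sketch of Initial/Intermediate/Final staging; not Pig's implementation. */
public class AlgebraicBloomSketch {
    static final int VECTOR_SIZE = 256;

    // Initial stage: turn one input key into a partial filter.
    public static BitSet initial(String key) {
        BitSet partial = new BitSet(VECTOR_SIZE);
        partial.set(Math.floorMod(key.hashCode(), VECTOR_SIZE));
        return partial;
    }

    // Intermediate stage: OR several partial filters into one (for example, in a combiner).
    public static BitSet intermediate(List<BitSet> partials) {
        BitSet merged = new BitSet(VECTOR_SIZE);
        for (BitSet partial : partials) {
            merged.or(partial);
        }
        return merged;
    }

    // Final stage: OR whatever partials remain and serialize the result to bytes.
    public static byte[] finalStage(List<BitSet> partials) {
        return intermediate(partials).toByteArray();
    }
}
```

Because OR is associative and commutative, the final bytes do not depend on how the input was split across intermediate steps, which is the property an Algebraic function needs.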
