public class Bloom extends FilterFunc
You can also pass the Bloom filter from BuildBloom directly to the Bloom UDF
as a scalar instead of storing it to a file and loading it again. This is simpler
if the Bloom filter will not be reused and can be discarded after the
script has run.
define bb BuildBloom('jenkins', '100', '0.1');
A = load 'foo' as (x, y);
B = group A all;
C = foreach B generate bb(A.x) as bloomfilter;
D = load 'bar' as (z);
E = filter D by Bloom(C.bloomfilter, z);
F = join E by z, A by x;

Nested classes/interfaces inherited from class EvalFunc: EvalFunc.SchemaType
Fields inherited from class EvalFunc: log, pigLogger, reporter, returnType
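
The BuildBloom call in the script above takes '100' and '0.1' as arguments. Assuming these are the expected number of keys and the desired false positive rate (this page does not define them), the sketch below applies the standard textbook sizing formulas to show roughly how large such a filter would be. It is not taken from the Pig source and may not match the values BuildBloom computes internally; `BloomSizing` and its methods are hypothetical names used only for illustration.

```java
public class BloomSizing {
    // Textbook optimum for the bit vector size: m = -n * ln(p) / (ln 2)^2.
    public static int optimalBits(long n, double p) {
        return (int) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Textbook optimum for the number of hash functions: k = (m / n) * ln 2, at least 1.
    public static int optimalHashes(long n, int m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 100;    // assumed meaning of '100': expected number of keys
        double p = 0.1;  // assumed meaning of '0.1': desired false positive rate
        int m = optimalBits(n, p);
        int k = optimalHashes(n, m);
        System.out.println("bits=" + m + " hashes=" + k); // prints bits=480 hashes=3
    }
}
```

A filter this small is cheap to pass as a scalar, which fits the use case described above where the filter is discarded after the script has run.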
Constructor and Description |
---|
Bloom() |
Bloom(String filename) The filename containing the serialized Bloom filter. |
Modifier and Type | Method and Description |
---|---|
Boolean | exec(Tuple input) This callback method must be implemented by all subclasses. |
List<String> | getCacheFiles() Allow a UDF to specify a list of hdfs files it would like placed in the distributed cache. |
void | setFilter(DataByteArray dba) For testing only, do not use directly. |
Methods inherited from class FilterFunc: finish
Methods inherited from class EvalFunc: allowCompileTimeCalculation, getArgToFuncMapping, getInputSchema, getLoadCaster, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, outputSchema, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
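public Bloom()

public Bloom(String filename)
Parameters: filename - file containing the serialized Bloom filter

public Boolean exec(Tuple input) throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses.
Specified by: exec in class EvalFunc<Boolean>
Parameters: input - the Tuple to be processed.
Throws: IOException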
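
Because Bloom is a filter function, exec returns a Boolean per input tuple, and the example script joins the filtered relation afterwards. The sketch below illustrates the general Bloom filter property that makes this safe: a key that was added is always reported as possibly present, while a key that was not added may occasionally be reported as present too, so the later join still has to remove false positives. This is a generic illustration, not Pig's implementation; `TinyBloom`, its hashing scheme, and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

public class TinyBloom {
    private final BitSet bits;
    private final int size;
    private final int hashes;

    public TinyBloom(int size, int hashes) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashes = hashes;
    }

    // Derive the i-th bit position from two simple base hashes (double hashing).
    // These are illustrative hashes, not the 'jenkins' hash named in the script.
    private int position(String key, int i) {
        int h1 = 0;
        int h2 = 1;
        for (byte x : key.getBytes(StandardCharsets.UTF_8)) {
            h1 = 31 * h1 + x;
            h2 = 131 * h2 + x;
        }
        return Math.floorMod(h1 + i * h2, size);
    }

    // Set every bit position derived from the key.
    public void add(String key) {
        for (int i = 0; i < hashes; i++) {
            bits.set(position(key, i));
        }
    }

    // False means "definitely not added"; true means "possibly added".
    public boolean mightContain(String key) {
        for (int i = 0; i < hashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TinyBloom filter = new TinyBloom(480, 3);
        for (String key : new String[] {"a", "b", "c"}) {
            filter.add(key);
        }
        System.out.println(filter.mightContain("a"));   // added keys always pass
        System.out.println(filter.mightContain("zzz")); // usually false, but may be a false positive
    }
}
```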
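public List<String> getCacheFiles()
Description copied from class: EvalFunc
Allow a UDF to specify a list of hdfs files it would like placed in the distributed cache.
Overrides: getCacheFiles in class EvalFunc<Boolean>

public void setFilter(DataByteArray dba) throws IOException
For testing only, do not use directly.
Throws: IOException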
Copyright © 2007-2017 The Apache Software Foundation