public class BuildBloom extends BuildBloomBase<DataByteArray> implements Algebraic
BloomFilter
Modifier and Type | Class and Description |
---|---|
static class | BuildBloom.Final |
static class | BuildBloom.Initial |
static class | BuildBloom.Intermediate |
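Nested classes inherited from class EvalFunc: EvalFunc.SchemaType
Fields inherited from class BuildBloomBase: filter, hType, numHash, vSize
Fields inherited from class EvalFunc: log, pigLogger, reporter, returnType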
Constructor and Description |
---|
BuildBloom(String hashType, String numElements, String desiredFalsePositive) Construct a Bloom filter based on expected number of elements and desired accuracy. |
BuildBloom(String hashType, String mode, String vectorSize, String nbHash) Build a Bloom filter of fixed size and number of hash functions. |
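
The three-argument constructor takes an expected element count and a desired false-positive rate rather than an explicit vector size. As a rough illustration of how such inputs are commonly turned into a vector size and a hash count, the sketch below uses the textbook Bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. The class `BloomSizing` and its methods are hypothetical names chosen for this example; they are not part of Pig, and the rounding BuildBloom applies internally may differ.

```java
/**
 * Hypothetical helper (not part of Pig): textbook Bloom filter sizing.
 * Assumption: BuildBloom may round or cap these values differently.
 */
public final class BloomSizing {
    private BloomSizing() {}

    /** Number of bits m = ceil(-n * ln(p) / (ln 2)^2). */
    public static int vectorSize(long numElements, double falsePositiveRate) {
        if (numElements <= 0) {
            throw new IllegalArgumentException("numElements must be positive");
        }
        if (!(falsePositiveRate > 0.0 && falsePositiveRate < 1.0)) {
            throw new IllegalArgumentException("rate must be strictly between 0 and 1");
        }
        double ln2 = Math.log(2);
        double bits = -numElements * Math.log(falsePositiveRate) / (ln2 * ln2);
        return (int) Math.ceil(bits);
    }

    /** Number of hash functions k = round((m / n) * ln 2), never less than 1. */
    public static int numHashes(int vectorSize, long numElements) {
        double k = (double) vectorSize / numElements * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        long n = 1000;     // expected distinct elements
        double p = 0.01;   // acceptable false-positive rate
        int m = vectorSize(n, p);
        int k = numHashes(m, n);
        System.out.println("vectorSize=" + m + " numHashes=" + k);
    }
}
```

The values computed this way correspond in spirit to the vectorSize and nbHash arguments of the four-argument constructor, which accepts them directly.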
Modifier and Type | Method and Description |
---|---|
DataByteArray | exec(Tuple input) This callback method must be implemented by all subclasses. |
String | getFinal() Get the final function. |
String | getInitial() Get the initial function. |
String | getIntermed() Get the intermediate function. |
Schema | outputSchema(Schema input) Report the schema of the output of this UDF. |
Methods inherited from class BuildBloomBase: bloomIn, bloomOr, bloomOut
Methods inherited from class EvalFunc: allowCompileTimeCalculation, finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, getShipFiles, isAsynchronous, needEndOfAllInputProcessing, progress, setEndOfAllInput, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
public BuildBloom(String hashType, String mode, String vectorSize, String nbHash)
Parameters:
hashType - type of the hashing function (see Hash).
mode - Will be ignored, though by convention it should be "fixed" or "fixedsize".
vectorSize - The vector size of this filter.
nbHash - The number of hash functions to consider.

public BuildBloom(String hashType, String numElements, String desiredFalsePositive)
Parameters:
hashType - type of the hashing function (see Hash).
numElements - The number of distinct elements expected to be placed in this filter.
desiredFalsePositive - the acceptable rate of false positives. This should be a floating point value between 0 and 1.0, where 1.0 would be 100% (i.e., a totally useless filter).

public DataByteArray exec(Tuple input) throws IOException
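Description copied from class: EvalFunc
This callback method must be implemented by all subclasses.
Specified by: exec in class EvalFunc<DataByteArray>
Parameters:
input - the Tuple to be processed.
Throws:
IOException
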
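public String getInitial()
Description copied from interface: Algebraic
Get the initial function.
Specified by: getInitial in interface Algebraic

public String getIntermed()
Description copied from interface: Algebraic
Get the intermediate function.
Specified by: getIntermed in interface Algebraic

public String getFinal()
Description copied from interface: Algebraic
Get the final function.
Specified by: getFinal in interface Algebraic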
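
The Algebraic interface lets a function be evaluated in three stages (initial, intermediate, final), which suits Bloom filter construction because partial filters can be combined with a bitwise OR in any order. The sketch below illustrates that idea with `java.util.BitSet`. It is not Pig's implementation: the class `BloomMergeSketch`, its constants, and its hash scheme are hypothetical, and mapping these stages onto BuildBloom.Initial, BuildBloom.Intermediate, BuildBloom.Final and the inherited bloomOr method is an assumption based only on their names.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch (not Pig code): building a Bloom filter in mergeable stages. */
public final class BloomMergeSketch {
    static final int VECTOR_SIZE = 1024; // illustrative fixed vector size
    static final int NUM_HASH = 3;       // illustrative number of hash functions

    private BloomMergeSketch() {}

    /** Toy hash family for the example; Pig selects its hash via the hashType argument. */
    static int bitIndex(String key, int i) {
        int h = key.hashCode() * 31 + i * 0x9E3779B9; // int overflow wraps, which is fine here
        return Math.floorMod(h, VECTOR_SIZE);
    }

    /** Initial stage: turn one input element into a small partial filter. */
    public static BitSet initial(String key) {
        BitSet bits = new BitSet(VECTOR_SIZE);
        for (int i = 0; i < NUM_HASH; i++) {
            bits.set(bitIndex(key, i));
        }
        return bits;
    }

    /** Intermediate and final stages: OR partial filters together. */
    public static BitSet merge(List<BitSet> partials) {
        BitSet out = new BitSet(VECTOR_SIZE);
        for (BitSet p : partials) {
            out.or(p);
        }
        return out;
    }

    /** Membership test: false means definitely absent, true means possibly present. */
    public static boolean mightContain(BitSet filter, String key) {
        for (int i = 0; i < NUM_HASH; i++) {
            if (!filter.get(bitIndex(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet partA = merge(Arrays.asList(initial("apple"), initial("pear")));
        BitSet partB = merge(Arrays.asList(initial("plum")));
        BitSet full = merge(Arrays.asList(partA, partB));
        System.out.println(mightContain(full, "plum"));
    }
}
```

Because OR is associative and commutative, the merged filter is the same regardless of how the inputs were partitioned, which is the property that makes a staged evaluation safe.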

public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).
Overrides: outputSchema in class EvalFunc<DataByteArray>
Parameters:
input - Schema of the input

Copyright © 2007-2012 The Apache Software Foundation