Class DIFF

  extended by org.apache.pig.EvalFunc<DataBag>
      extended by org.apache.pig.builtin.DIFF

public class DIFF
extends EvalFunc<DataBag>

DIFF takes two bags as arguments and compares them. Any tuples that are in one bag but not the other are returned. If the fields are not bags, they are returned when they do not match, and an empty bag is returned when they match.

The implementation assumes that both bags passed to this function will fit entirely into memory simultaneously. If that is not the case, the UDF will still function, but it will be very slow.
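
The bag comparison described above can be sketched in plain Java, without the Pig classes. This is only an illustration of the documented behavior, not the Pig source: the class name DiffSketch and the method diff are hypothetical, a tuple is modeled as a List<Object>, and a bag as a List of such tuples. The sketch also assumes set semantics, so duplicate tuples within one bag are collapsed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the documented behavior; not the Pig source.
class DiffSketch {

    // Returns the tuples found in exactly one of the two bags.
    // A "tuple" is modeled as List<Object>, a "bag" as List<List<Object>>.
    static List<List<Object>> diff(List<List<Object>> bag1, List<List<Object>> bag2) {
        // Both sets live in memory at the same time, which mirrors the
        // documented assumption that both bags fit into memory together.
        // Assumption: duplicates within a bag are collapsed by the sets.
        Set<List<Object>> s1 = new LinkedHashSet<>(bag1);
        Set<List<Object>> s2 = new LinkedHashSet<>(bag2);

        List<List<Object>> out = new ArrayList<>();
        for (List<Object> t : s1) {
            if (!s2.contains(t)) {
                out.add(t); // only in the first bag
            }
        }
        for (List<Object> t : s2) {
            if (!s1.contains(t)) {
                out.add(t); // only in the second bag
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> a = Arrays.asList(
            Arrays.<Object>asList(1, "x"), Arrays.<Object>asList(2, "y"));
        List<List<Object>> b = Arrays.asList(
            Arrays.<Object>asList(2, "y"), Arrays.<Object>asList(3, "z"));
        System.out.println(diff(a, b)); // prints [[1, x], [3, z]]
    }
}
```

The tuple (2, y) appears in both bags and is dropped; the other two tuples each appear in only one bag and are returned.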

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.pig.EvalFunc
Field Summary
Fields inherited from class org.apache.pig.EvalFunc
log, pigLogger, reporter, returnType
Constructor Summary
DIFF()
Method Summary
 DataBag exec(Tuple input)
          Compares a tuple with two fields.
Methods inherited from class org.apache.pig.EvalFunc
finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, outputSchema, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail


public DIFF()
Method Detail


public DataBag exec(Tuple input)
             throws IOException
Compares a tuple with two fields. Emits any differences.

Specified by:
exec in class EvalFunc<DataBag>
Parameters:
input - a tuple with exactly two fields.
Returns:
result, of type T (DataBag for this class).
Throws:
IOException - if there are not exactly two fields in the tuple
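
The contract of exec for non-bag fields can be modeled in a few lines of plain Java. This is a sketch under stated assumptions, not the Pig implementation: ExecContractSketch is a hypothetical class, the input tuple is modeled as a List<Object>, the returned bag as a List<Object>, and returning both values on a mismatch is one reading of the class description. Null-safe comparison is also an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical model of the documented contract; not the Pig source.
class ExecContractSketch {

    // Models exec for two non-bag fields.
    static List<Object> exec(List<Object> input) throws IOException {
        // Documented failure mode: anything other than exactly two fields.
        if (input.size() != 2) {
            throw new IOException(
                "expected two fields but received " + input.size());
        }
        List<Object> out = new ArrayList<>();
        Object first = input.get(0);
        Object second = input.get(1);
        // Assumption: a mismatch returns both values; a match returns nothing.
        if (!Objects.equals(first, second)) {
            out.add(first);
            out.add(second);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(exec(Arrays.<Object>asList("a", "a"))); // prints []
        System.out.println(exec(Arrays.<Object>asList("a", "b"))); // prints [a, b]
    }
}
```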

Copyright © 2007-2012 The Apache Software Foundation