Package org.apache.pig.impl.logicalLayer

The logical operators that represent a Pig script and tools for manipulating those operators.


Package org.apache.pig.impl.logicalLayer Description

The logical operators that represent a Pig script and tools for manipulating those operators. The logical layer contains the logical operators themselves, as well as validators that check the logical plan, an optimizer, and a general visitor utility for working with logical plans.

Design

Logical operators use the operator, plan, visitor, and optimizer framework provided by the org.apache.pig.impl.plan package.

Logical operators consist of both relational and expression operators. Relational operators work on an entire bag. Expression operators work on an element of a tuple (which may itself be a bag). Because of Pig's nested data and execution model, the distinction between relational and expression operators is not always clear, and some operators, such as LOProject, function as both.

In a traditional database system, a query execution plan is constructed from relational operators, such as project, filter, sort, aggregate, and join. Each of these may contain an expression tree, made up of expression operators. For example, consider the SQL query select a from T where a = 5;. The where clause would be represented by a filter operator with an expression tree for a = 5.
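The traditional arrangement described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the classes Expr, Column, Constant, Equal, and Filter are hypothetical stand-ins invented for this example, not classes from Pig or from any database system. The sketch shows a relational filter operator that consumes a whole input and owns an expression tree for a = 5, which it evaluates one row at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class FilterSketch {
    /** Hypothetical expression operator: evaluated against one row at a time. */
    interface Expr {
        Object eval(Map<String, Object> row);
    }

    /** Leaf of the expression tree: reads a named column from the row. */
    static final class Column implements Expr {
        private final String name;
        Column(String name) { this.name = name; }
        public Object eval(Map<String, Object> row) { return row.get(name); }
    }

    /** Leaf of the expression tree: a literal value. */
    static final class Constant implements Expr {
        private final Object value;
        Constant(Object value) { this.value = value; }
        public Object eval(Map<String, Object> row) { return value; }
    }

    /** Inner node: compares two sub-expressions for equality. */
    static final class Equal implements Expr {
        private final Expr left;
        private final Expr right;
        Equal(Expr left, Expr right) { this.left = left; this.right = right; }
        public Object eval(Map<String, Object> row) {
            return Objects.equals(left.eval(row), right.eval(row));
        }
    }

    /** Hypothetical relational operator: consumes the whole input and owns an expression tree. */
    static final class Filter {
        private final Expr predicate;
        Filter(Expr predicate) { this.predicate = predicate; }
        List<Map<String, Object>> apply(List<Map<String, Object>> input) {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : input) {
                if (Boolean.TRUE.equals(predicate.eval(row))) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    /** Builds a row of table T, which has the single column a in this sketch. */
    static Map<String, Object> row(int a) {
        Map<String, Object> r = new HashMap<>();
        r.put("a", a);
        return r;
    }

    /** Evaluates "select a from T where a = 5" over a table T holding the given values of a. */
    static List<Map<String, Object>> selectWhereAEquals5(int... values) {
        List<Map<String, Object>> t = new ArrayList<>();
        for (int v : values) {
            t.add(row(v));
        }
        // The where clause: a filter operator holding the expression tree for a = 5.
        Filter where = new Filter(new Equal(new Column("a"), new Constant(5)));
        return where.apply(t);
    }

    public static void main(String[] args) {
        System.out.println(selectWhereAEquals5(3, 5, 7, 5));
    }
}
```

Note that the filter can only hold expression nodes here; nothing in this design lets a relational operator appear inside another one.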

Pig takes a similar approach, except that the operators contained inside a relational operator may also be relational. For example, consider a foreach statement that has a nested script, such as foreach B { C = distinct $1; generate group, COUNT(C);}. This foreach must contain not just an expression tree but also the distinct relational operator. For this reason, Pig's relational operators do not contain expression trees. Instead, they contain one or more LogicalPlans themselves. This allows Pig to nest the logical plan arbitrarily. In this sense Pig resembles a traditional procedural language, where certain statements (e.g. if, while) can contain any other statement in the language, more than it resembles SQL, where statement execution tends to be more linear.
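The nesting described above can be illustrated with a small, self-contained Java sketch. The Plan and Op classes below are hypothetical stand-ins written for this example; they are not Pig's LogicalPlan or LogicalOperator classes, and the way the nested foreach is split into two inner plans is an assumption made for illustration rather than the exact layout Pig builds. The point is only that an operator can own whole plans, and that those plans can in turn contain relational operators such as distinct.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedPlanSketch {
    /** Hypothetical stand-in for a logical plan: an ordered list of operators. */
    static final class Plan {
        final List<Op> ops = new ArrayList<>();
        Plan add(Op op) { ops.add(op); return this; }
    }

    /** Hypothetical stand-in for a logical operator that may own inner plans. */
    static final class Op {
        final String name;
        final List<Plan> innerPlans = new ArrayList<>();
        Op(String name) { this.name = name; }
        Op with(Plan inner) { innerPlans.add(inner); return this; }
    }

    /** Nesting depth: 1 for a flat plan, plus 1 for each further level of inner plans. */
    static int depth(Plan plan) {
        int deepest = 0;
        for (Op op : plan.ops) {
            for (Plan inner : op.innerPlans) {
                deepest = Math.max(deepest, depth(inner));
            }
        }
        return 1 + deepest;
    }

    /** Counts operators with the given name anywhere in the plan, inner plans included. */
    static int count(Plan plan, String name) {
        int n = 0;
        for (Op op : plan.ops) {
            if (op.name.equals(name)) {
                n++;
            }
            for (Plan inner : op.innerPlans) {
                n += count(inner, name);
            }
        }
        return n;
    }

    /** Assumed layout for: foreach B { C = distinct $1; generate group, COUNT(C); } */
    static Plan example() {
        // Inner plan for the first generated item: group.
        Plan groupPlan = new Plan().add(new Op("project group"));
        // Inner plan for the second generated item: COUNT over distinct $1.
        Plan countPlan = new Plan()
                .add(new Op("project $1"))
                .add(new Op("distinct"))   // a relational operator inside foreach
                .add(new Op("COUNT"));
        Op foreach = new Op("foreach").with(groupPlan).with(countPlan);
        return new Plan().add(foreach);
    }

    public static void main(String[] args) {
        Plan p = example();
        System.out.println("depth = " + depth(p));
        System.out.println("distinct operators = " + count(p, "distinct"));
    }
}
```

Because an inner plan is just another Plan, nothing limits the nesting to one level: an operator inside countPlan could carry inner plans of its own, which is the arbitrary nesting the text refers to.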

Copyright © 2007-2012 The Apache Software Foundation