Heuristic cost estimation

Using Heuristic cost estimation

Process for heuristics optimization

1.The parser of a high-level query generates an initial internal representation;

2.Apply heuristics rules to optimize the internal representation.

3.A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.

The main heuristic is to apply first the operations that reduce the size of intermediate results. E.g., Apply SELECT and PROJECT operations before applying the JOIN or other binary operations.

Query tree: a tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.

An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.

Query graph: a graph data structure that corresponds to a relational calculus expression. It does not indicate an order on which operations to perform first. There is only a single graph corresponding to each query.

Example:
For every project located in ‘Stafford’, retrieve the project number, the controlling department number and the department manager’s last name, address and birthdate.

SQL query:

Q2: SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND
P.PLOCATION=‘STAFFORD’;

Heuristic Optimization of Query Trees:

The same query could correspond to many different relational algebra expressions — and hence many different query trees.

The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.

Example:

Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’ AND PNMUBER=PNO
AND ESSN=SSN AND BDATE > ‘1957-12-31’;

General Transformation Rules for Relational Algebra Operations:

Outline of a Heuristic Algebraic Optimization Algorithm:

1.Using rule 1, break up any select operations with conjunctive conditions into a cascade of select operations.

2.Using rules 2, 4, 6, and 10 concerning the commutativity of select with other operations, move each select operation as far down the query tree as is permitted by the attributes involved in the select condition.

3.Using rule 9 concerning associativity of binary operations, rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive select operations are executed first in the query tree representation.

4.Using Rule 12, combine a cartesian product operation with a subsequent select operation in the tree into a join operation.

5.Using rules 3, 4, 7, and 11 concerning the cascading of project and the commuting of project with other operations, break down and move lists of projection attributes down the tree as far as possible by creating new project operations as needed.

6.Identify subtrees that represent groups of operations that can be executed by a single algorithm.

Summary of Heuristics for Algebraic Optimization:

1.The main heuristic is to apply first the operations that reduce the size of intermediate results.

2.Perform select operations as early as possible to reduce the number of tuples and perform project operations as early as possible to reduce the number of attributes. (This is done by moving select and project operations as far down the tree as possible.)

3.The select and join operations that are most restrictive should be executed before other similar operations. (This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately.)

Query Execution Plans

An execution plan for a relational algebra query consists of a combination of the relational algebra query tree and information about the access methods to be used for each relation as well as the methods to be used in computing the relational operators stored in the tree.

Materialized evaluation: the result of an operation is stored as a temporary relation.

Pipelined evaluation: as the result of an operator is produced, it is forwarded to the next operator in sequence.

8. Using Selectivity and Cost Estimates in Query Optimization

Cost-based query optimization: Estimate and compare the costs of executing a query using different execution strategies and choose the strategy with the lowest cost estimate.

(Compare to heuristic query optimization)

Issues
Cost function
Number of execution strategies to be considered

Cost Components for Query Execution
1.Access cost to secondary storage
2.Storage cost
3.Computation cost
4.Memory usage cost
5.Communication cost

Note: Different database systems may focus on different cost components.

Catalog Information Used in Cost Functions

Information about the size of a file
number of records (tuples) (r),
record size (R),
number of blocks (b)
blocking factor (bfr)

Information about indexes and indexing attributes of a file
Number of levels (x) of each multilevel index
Number of first-level index blocks (bI1)
Number of distinct values (d) of an attribute
Selectivity (sl) of an attribute
Selection cardinality (s) of an attribute. (s = sl * r)

Examples of Cost Functions for SELECT S1. Linear search (brute force) approach

Examples of using the cost functions.

Examples of Cost Functions for JOIN