Ce contenu n'est pas disponible dans la langue sélectionnée.
14.9. Query Planner
14.9.1. Query Planner Copier lienLien copié sur presse-papiers!
- Relational Planner
- Procedure Planner
- XML Planner
- Generate canonical plan
- Optimization
- Plan to process converter - converts plan data structure into a processing form
14.9.2. Relational Planner Copier lienLien copié sur presse-papiers!
- WITH (create common table expressions) - handled by a specialized PROJECT NODE
- FROM (read and join all data from tables) - SOURCE node for each from clause item, Join node (if >1 table)
- WHERE (filter rows) - SELECT node
- GROUP BY (group rows into collapsed rows) - GROUP node
- HAVING (filter grouped rows) - SELECT node
- SELECT (evaluate expressions and return only requested rows) - PROJECT node and DUP_REMOVE node (for SELECT DISTINCT)
- INTO - specialized PROJECT with a SOURCE child
- ORDER BY (sort rows) - SORT node
- LIMIT (limit result set to a certain range of results) - LIMIT node
Project(groups=[anon_grp0], props={PROJECT_COLS=[anon_grp0.agg0 AS expr1]})
Group(groups=[anon_grp0], props={SYMBOL_MAP={anon_grp0.agg0=MAX(pm1.g1.e1)}})
Select(groups=[pm1.g1], props={SELECT_CRITERIA=e2 = 1})
Source(groups=[pm1.g1])
Project(groups=[anon_grp0], props={PROJECT_COLS=[anon_grp0.agg0 AS expr1]})
Group(groups=[anon_grp0], props={SYMBOL_MAP={anon_grp0.agg0=MAX(pm1.g1.e1)}})
Select(groups=[pm1.g1], props={SELECT_CRITERIA=e2 = 1})
Source(groups=[pm1.g1])
- ACCESS - a source access or plan execution.
- DUP_REMOVE - removes duplicate rows
- JOIN - a join (LEFT OUTER, FULL OUTER, INNER, CROSS, SEMI, etc.)
- PROJECT - a projection of tuple values
- SELECT - a filtering of tuples
- SORT - an ordering operation, which may be inserted to process other operations such as joins
- SOURCE - any logical source of tuples including an inline view, a source access, XMLTABLE, etc.
- GROUP - a grouping operation
- SET_OP - a set operation (UNION/INTERSECT/EXCEPT)
- NULL - a source of no tuples
- TUPLE_LIMIT - row offset / limit
- ATOMIC_REQUEST - The final form of a source request
- MODEL_ID - The metadata object for the target model/schema
- PROCEDURE_CRITERIA/PROCEDURE_INPUTS/PROCEDURE_DEFAULTS - Used in planning procedureal relational queries
- IS_MULTI_SOURCE - set to true when the node represents a multi-source access
- SOURCE_NAME - used to track the multi-source source name
- CONFORMED_SOURCES - tracks the set of conformed sources when the conformed extension metadata is used
- SUB_PLAN/SUB_PLANS - used in multi-source planning
- SET_OPERATION/USE_ALL - defines the set operation (UNION/INTERSECT/EXCEPT) and if all rows or distinct rows are used.
- JOIN_CRITERIA - all join predicates
- JOIN_TYPE - type of join (INNER, LEFT OUTER, etc.)
- JOIN_STRATEGY - the algorithm to use (nested loop, merge, etc.)
- LEFT_EXPRESSIONS - the expressions in equi-join predicates that originate from the left side of the join
- RIGHT_EXPRESSIONS - the expressions in equi-join predicates that originate from the right side of the join
- DEPENDENT_VALUE_SOURCE - set if a dependent join is used
- NON_EQUI_JOIN_CRITERIA - non-equi join predicates
- SORT_LEFT - if the left side needs sorted for join processing
- SORT_RIGHT - if the right side needs sorted for join processing
- IS_OPTIONAL - if the join is optional
- IS_LEFT_DISTINCT - if the left side is distinct with respect to the equi join predicates
- IS_RIGHT_DISTINCT - if the right side is distinct with respect to the equi join predicates
- IS_SEMI_DEP - if the dependent join represents a semi-join
- PRESERVE - if the preserve hint is preserving the join order
- PROJECT_COLS - the expressions projected
- INTO_GROUP - the group targeted if this is a select into or insert with a query expression
- HAS_WINDOW_FUNCTIONS - true if window functions are used
- CONSTRAINT - the constraint that must be met if the values are being projected into a group
- SELECT_CRITERIA - the filter
- IS_HAVING - if the filter is applied after grouping
- IS_PHANTOM - true if the node is marked for removal, but temporarily left in the plan.
- IS_TEMPORARY - inferred criteria that may not be used in the final plan
- IS_COPIED - if the criteria has already been processed by rule copy criteria
- IS_PUSHED - if the criteria is pushed as far as possible
- IS_DEPENDENT_SET - if the criteria is the filter of a dependent join
- SORT_ORDER - the order by that defines the sort
- UNRELATED_SORT - if the ordering includes a value that is not being projected
- IS_DUP_REMOVAL - if the sort should also perform duplicate removal over the entire projection
- Source Properties - many source properties also become present on associated access nodes
- SYMBOL_MAP - the mapping from the columns above the source to the projected expressions. Also present on Group nodes
- PARTITION_INFO - the partitioning of the union branches
- VIRTUAL_COMMAND - if the source represents an view or inline view, the query that defined the view
- MAKE_DEP - hint information
- PROCESSOR_PLAN - the processor plan of a non-relational source (typically from the NESTED_COMMAND)
- NESTED_COMMAND - the non-relational command
- TABLE_FUNCTION - the table function (XMLTABLE, OBJECTTABLE, etc.) defining the source
- CORRELATED_REFERENCES - the correlated references for the nodes below the source
- MAKE_NOT_DEP - if make not dep is set
- INLINE_VIEW - If the source node represents an inline view
- NO_UNNEST - if the no_unnest hint is set
- MAKE_IND - if the make ind hint is set
- SOURCE_HINT - the source hint. See Federated Optimizations.
- ACCESS_PATTERNS - access patterns yet to be satisfied
- ACCESS_PATTERN_USED - satisfied access patterns
- REQUIRED_ACCESS_PATTERN_GROUPS - groups needed to satisfy the access patterns. Used in join planning.
- GROUP_COLS - the grouping columns
- ROLLUP - if the grouping includes a rollup
- MAX_TUPLE_LIMIT - expression that evaluates to the max number of tuples generated
- OFFSET_TUPLE_COUNT - Expression that evaluates to the tuple offset of the starting tuple
- IS_IMPLICIT_LIMIT - if the limit is created by the rewriter as part of a subquery
- IS_NON_STRICT - if the unordered limit should not be enforced strictly optimization
- OUTPUT_COLS - the output columns for the node. Is typically set after rule assign output elements.
- EST_SET_SIZE - represents the estimated set size this node would produce for a sibling node as the independent node in a dependent join scenario
- EST_DEP_CARDINALITY - value that represents the estimated cardinality (amount of rows) produced by this node as the dependent node in a dependent join scenario
- EST_DEP_JOIN_COST - value that represents the estimated cost of a dependent join (the join strategy for this could be Nested Loop or Merge)
- EST_JOIN_COST - value that represents the estimated cost of a merge join (the join strategy for this could be Nested Loop or Merge)
- EST_CARDINALITY - represents the estimated cardinality (amount of rows) produced by this node
- EST_COL_STATS - column statistics including number of null values, distinct value count,
- EST_SELECTIVITY - represents the selectivity of a criteria node
- Access Pattern Validation - ensures that all access patterns have been satisfied
- Apply Security - applies row and column level security
- Assign Output Symbol - this rule walks top down through every node and calculates the output columns for each node. Columns that are not needed are dropped at every node, which is known as projection minimization. This is done by keeping track of both the columns needed to feed the parent node and also keeping track of columns that are “created” at a certain node.
- Calculate Cost - adds costing information to the plan
- Choose Dependent - this rule looks at each join node and determines whether the join should be made dependent and in which direction. Cardinality, the number of distinct values, and primary key information are used in several formulas to determine whether a dependent join is likely to be worthwhile. The dependent join differs in performance ideally because a fewer number of values will be returned from the dependent side. Also, we must consider the number of values passed from independent to dependent side. If that set is larger than the max number of values in an IN criteria on the dependent side, then we must break the query into a set of queries and combine their results. Executing each query in the connector has some overhead and that is taken into account. Without costing information a lot of common cases where the only criteria specified is on a non-unique (but strongly limiting) field are missed. A join is eligible to be dependent if:there is at least one equi-join criterion, i.e. tablea.col = tableb.colthe join is not a full outer join and the dependent side of the join is on the inner side of the joinThe join will be made dependent if one of the following conditions, listed in precedence order, holds:There is an unsatisfied access pattern that can be satisfied with the dependent join criteriaThe potential dependent side of the join is marked with an option makedep if costing was enabled, the estimated cost for the dependent join (possibly in each direction in the case of inner joins) is computed and compared to not performing the dependent join. If the costs were all determined (which requires all relevant table cardinality, column ndv, and possibly nnv values to be populated) the lowest is chosen.If key metadata information indicates that the potential dependent side is not “small” and the other side is “not small” or the potential dependent side is the inner side of a left outer join.Dependent join is the key optimization we use to efficiently process multi-source joins.Instead of reading all of source A and all of source B and joining them on A.x = B.x, we read all of A then build a set of A.x that are passed as a criteria when querying B. In cases where A is small and B is large, this can drastically reduce the data retrieved from B, thus greatly speeding the overall query.
- Choose Join Strategy - choose the join strategy based upon the cost and attributes of the join.
- Clean Criteria - removes phantom criteria
- Collapse Source - takes all of the nodes below an access node and creates a SQL query representation
- Copy Criteria - this rule copies criteria over an equality criteria that is present in the criteria of a join. Since the equality defines an equivalence, this is a valid way to create a new criteria that may limit results on the other side of the join (especially in the case of a multi-source join).
- Decompose Join - this rule perfomes a partition-wise join optimization on joins of Federated Optimizations#Partitioned Union. The decision to decompose is based upon detecting that each side of the join is a partitioned union (note that non-ansi joins of more than 2 tables may cause the optimization to not detect the appropriate join). The rule currently only looks for situations where at most 1 partition matches from each side.
- Implement Join Strategy - adds necessary sort and other nodes to process the chosen join strategy
- Merge Criteria - combines select nodes and can convert subqueries to semi-joins
- Merge Virtual - removes view and inline view layers
- Place Access - places access nodes under source nodes. An access node represents the point at which everything below the access node gets pushed to the source or is a plan invocation. Later rules focus on either pushing under the access or pulling the access node up the tree to move more work down to the sources. This rule is also responsible for placing Federated Optimizations Access Patterns.
- Plan Joins - this rule attempts to find an optimal ordering of the joins performed in the plan, while ensuring that Federated Optimizations Access Patterns dependencies are met. This rule has three main steps. First it must determine an ordering of joins that satisfy the access patterns present. Second it will heuristically create joins that can be pushed to the source (if a set of joins are pushed to the source, we will not attempt to create an optimal ordering within that set. More than likely it will be sent to the source in the non-ANSI multi-join syntax and will be optimized by the database). Third it will use costing information to determine the best left-linear ordering of joins performed in the processing engine. This third step will do an exhaustive search for 6 or less join sources and is heuristically driven by join selectivity for 7 or more sources.
- Plan Procedures - plans procedures that appear in procedural relational queries
- Plan Sorts - optimizations around sorting, such as combining sort operations or moving projection
- Plan Unions - reorders union children for more pushdown
- Plan Aggregates - performs aggregate decomposition over a join or union
- Push Limit - pushes the effect of a limit node further into the plan
- Push Non-Join Criteria - this rule will push predicates from the On Clause if it is not necessary for the correctness of the join.
- Push Select Criteria - pushed select nodes as far as possible through unions, joins, and views layers toward the access nodes. In most cases movement down the tree is good as this will filter rows earlier in the plan. We currently do not undo the decisions made by Push Select Criteria. However in situations where criteria cannot be evaluated by the source, this can lead to sub optimal plans.
- Raise Access - this rule attempts to raise the Access nodes as far up the plan as possible. This is mostly done by looking at the source’s capabilities and determining whether the operations can be achieved in the source or not.
- Raise Null - raises null nodes. Raising a null node removes the need to consider any part of the old plan that was below the null node.
- Remove Optional Joins - removes joins that are marked as or determined to be optional
- Substitute Expressions - used only when a function based index is present
- Validate Where All - ensures criteria is used when required by the source
OPTIMIZATION COMPLETE: PROCESSOR PLAN: AccessNode(0) output=[e1] SELECT g_0.e1 FROM pm1.g1 AS g_0
OPTIMIZATION COMPLETE:
PROCESSOR PLAN:
AccessNode(0) output=[e1] SELECT g_0.e1 FROM pm1.g1 AS g_0
The procedure planner is fairly simple. It converts the statements in the procedure into instructions in a program that will be run during processing. This is mostly a 1-to-1 mapping and very little optimization is performed.
- Document selection - determine which tags of the virtual document should be excluded from the output document. This is done based on a combination of the model (which marks parts of the document excluded) and the query (which may specify a subset of columns to include in the SELECT clause).
- Criteria evaluation - breaks apart the user’s criteria, determine which result set the criteria should be applied to, and add that criteria to that result set query.
- Result set ordering - the query’s ORDER BY clause is broken up and the ORDER BY is applied to each result set as necessary
- Result set planning - ultimately, each result set is planned using the relational planner and taking into account all the impacts from the user's query. The planner will also look to automatically create staging tables and dependent joins based upon the mapping class hierarchy.
- Program generation - a set of instructions to produce the desired output document is produced, taking into account the final result set queries and the excluded parts of the document. Generally, this involves walking through the virtual document in document order, executing queries as necessary and emitting elements and attributes.