Skip to content

Optimization Strategies

Strategy Type Status
Split Conjunctive Predicates Heuristic Implemented
Predicate Pushdown Schema-Aware Implemented
Projection Pushdown Schema-Aware Implemented
Constant Folding Heuristic Implemented
Predicate Rewriter Heuristic Implemented
Aggregate Pushdown Schema-Aware Considered
IN (literal) to JOIN Schema-Aware Considered
Use Heap Sort Heuristic Implemented
Use pass-thru DISTINCT Heuristic Implemented
Limit Pushdown Heuristic Implemented
Distinct Pushdown Heuristic Implemented
IN (subquery) to JOIN Schema-Aware Considered
CTE rewrite Heuristic Considered
Subquery flattening Schema-Aware Considered
JOIN ordering Cost-Based Considered
Predicate Ordering Cost-Based Attempted
Predicate Flattening Schema-Aware Attempted
Predicate Compaction Schema-Aware Designed
Correlated Predicates Schema-Aware Implemented
Predicate Elimination Heuristic Implemented
JOIN Elimination Schema-Aware Considered

Split Conjunctive Predicates

status implemented
goal prepare
description Split ANDed (conjunctions) predicates into separate steps in the query plan

SplitConjunctivePredicatesStrategy

This optimization step is preparing for later optimizations, it splits filter conditions at ANDs to try to create smaller, simpler individual conditions (predicates). These predicates are then handled in subsequent strategies, for example Predicate Pushdown and Predicate Ordering.

To split conditions, no information is needed about the schema or about costs, this strategy is executed before the binder.

Predicate Pushdown

status implemented
goal reduce rows
description Move filters earlier in the query plan

PredicatePushdownStrategy

This optimization aims to reduce the number of records as quickly as possible running through the execution engine by executing filters as soon as possible. This can include pushing filters into the actual data read (e.g. for SQL and parquet), straight after reading, or into JOIN conditions.

This optimization needs to know which relation identifiers are from to be able to push toward the correct scan/JOIN, this strategy is run after the binder.

improvements - Push complex predicates (e.g. ORed) to SQL sources - Where filters can be inferred from statistics and JOINs, create and push these

Projection Pushdown

status implemented
goal reduce columns
description Trim unwanted columns at read

ProjectionPushdownStrategy

This optimization aims to increase the number of records per morsel by removing unwanted columns from relations. This is generally done during the actual read, or straight after reading (e.g. for JSONL and CSV).

This optimization needs to know which relation identifiers are from to be able to push toward the correct scan, this strategy is run after the binder.

improvements - Push common functions and transforms to SQL sources

Constant Folding

status implemented
goal reduce calculations
description Pre-evaluate expressions

ConstantFoldingStrategy

This optimization aims to reduce the work done to evaluate expressions by pre-evaluating expressions which don't rely on data from relations. This includes fixed constants (e.g. literals, PI()), variable constants (current_time) and variables.

This optimization requires variables to be resolved so is run after the binder.

improvements - filters which evaluate to false (when ANDed) should prune the scan(s) below it

Predicate Rewriter

status implemented
goal faster implementations
description replace predicates with faster versions

BooleanSimplificationStrategy
PredicateRewriteStrategy

Some predicates can support complex filtering but are used to perform trivial filtering, where a complex predicate is used to perform a trivial check, replace the check with an simpler function call which is faster.

  • demorgans laws
  • negative filter reduction
  • LIKE to STARTS_WITH, ENDS_WITH, SEARCH/INSTR
  • Single element IN to equals
  • No wildcard LIKE to equals

improvements Identify other functions which have faster versions.

Aggregate Pushdown

into SQL sources

IN (literal) to JOIN

Use Heap Sort

status implemented
goal reduce memory usage & sort complexity
description incremental sort

OperatorFusionStrategy

When performing an ORDER BY and a LIMIT, use a heap sort in batches to avoid loading the entire dataset into memory.

This works by acquiring tuples to sort in batches, sorting the batch, keeping the number of tuples to satisfy the limit and then fetching the next batch.

Use pass-thru DISTINCT

status implemented
goal reduce memory usage description incremental distinct

When dealing with large number of records, rather than load them all into memory to to the distinct, use a pass-thru limit approach by maintaining a HashSet of the seen values and checking each morsel against the HashSet.

Limit Pushdown

status implemented
goal reduce data movement description incremental sort

LimitPushdownStrategy

Push limits to the SQL reader to reduce the amount of data trnasferred and processed by the engine.

improvements Push limit to other readers.

Distinct Pushdown

status implemented
goal reduce size of internal table creation

DistinctPushdownStrategy

Push DISTINCT clause into CROSS JOIN UNNEST.

IN (subquery) to JOIN

CTE rewrite

Subquery flattening

JOIN ordering

Predicate Ordering

Predicate Flattening

Predicate Elimination

Where logical expressions contain boolean literals, these can be removed sometimes resulting in the entire expression being removed. E.g. 1 = 1 OR name = 'Alice' can be completely removed avoiding any further handling of the check against the name column.

Correlated Predicates

status implemented
goal reduce records processed in joins

CorrelatedFiltersStrategy

If a column participating in a JOIN has a known value range, push down corresponding filters to both sides of the JOIN, reducing the size of the datasets that need to be joined.

JOIN Elimination

Value range information for fields participating in JOINs could lead to the identification of redundant JOINs. Where the value ranges of the joining columns do not overlap, the JOIN will not produce any results, the entire sub-tree below the JOINcould be pruned.