Changelog
All notable changes to this project will be documented in this file. Where appropriate, the GitHub issue reference is noted along with the change. Breaking changes will be clearly indicated with the icon.
The format is based on Keep a Changelog.
[0.18.0] - 2024-10-15
Added
- [#1993] ClickBench regression suite @joocer
- [#2003] Push equals filters to Firestore @joocer
- [#2008] Push not equals filters to Firestore @joocer
- [#2000] Initial SHOW CREATE VIEW support @joocer
- [#2040] Additional items in $statistics table @joocer
- [#1984] Reduce columns exposed by subqueries if not used by outer queries @joocer
- [#2043] Statistics include when optimizations have been triggered @joocer
- [#1899] Support @> for ARRAY containment @joocer
Fixed
- [#1981] null handling by functions. @joocer
- [#2002] IP containment fails on nulls @joocer
- [#1994] ClickBench test failures @joocer
- [#2017] OFFSET fails when zero records @joocer
- [#2022] Better memory management for CROSS JOIN UNNEST @joocer
- [#2039] Over committing to buffer pool @joocer
- [#2034] Improvements to COUNT(*) for parquet files @joocer
- [#2029] Heap Sort sorts data twice in some situations @joocer
- [#2050] RANDOM_STRING incorrectly invoked @joocer
- [#2051] CASE statements failed on null values @joocer
Changed
- [#1990] Flag LIST_CONTAINS_ANY and LIST_CONTAINS_ALL as Deprecated @joocer
- [#2013] MemoryPool based on Python's multiprocessing buffer for storage @joocer
- [#2021] Error raised when subqueries expose columns with duplicate names. @joocer
- [#2066] Reduce materialization of ARRAY columns. @joocer
[0.17.0] - 2024-09-05
Added
- [#1813] ASCII, CHAR, LPAD and RPAD functions @joocer
- [#998] Initial HTTP function relation @joocer
- [#1838] JSONB_OBJECT_KEYS function @joocer
- [#1871] Basic JOIN fuzzer @joocer
- [#1889] $statistics virtual dataset, initially reporting bufferpool statistics @joocer
- [#1904] Optional scale attribute to CEIL and FLOOR functions @joocer
- [#1926] Add support for Pipe Separated Value (psv) data files @joocer
- [#1931] TPC-H regression suite @joocer
- [#1944] Boolean expression elimination optimizer strategy @joocer
Fixed
- [#1837] Unhelpful error on CROSS JOIN UNNEST. @joocer
- [#1848] Unary (IS) operators fail on zero row tables. @joocer
- [#1849] Cannot use UNION and FOR clauses together. @joocer
- [#1850] Cannot IFNULL an ARRAY column. @joocer
- [#1854] Implicit CROSS JOIN using function datasets fails. @joocer
- [#1857] Cannot use JSON accessors in conditions. @joocer
- [#1861] TRY_CAST doesn't safely fail on empty strings @joocer
- [#1865] COUNT(*) and date filters pushed to Parquet reader @joocer
- [#1875] IFNULL errors in some circumstances @joocer
- [#1867] Reliability of JSON Accessors @joocer
- [#1878] COUNT(*) optimization restored @joocer
- [#1880] != incorrectly pushed into CROSS JOIN UNNEST @joocer
- [#1887] Incorrectly pushed filters to left side of LEFT OUTER JOIN @joocer
- [#1918] JSON Accessors broken by parser update @joocer
- [#1935] BLOB cast double encodes binary data @joocer
- [#1955] ANY/ALL ops fail on null values @joocer
- [#1952] JSON Accessors are not comparisons @joocer
- [#1958] Inconsistent handling of null in INNER JOIN @joocer
- [#1977] Unable to perform arithmetic on result of GENERATE_SERIES @joocer
Changed
- [#1877] Updated sqlparser-rs to version 0.49.0 @joocer
- [#1891] DISTINCT performance improvements @joocer
- [#1906] Improve performance of AVRO files @joocer
- [#1923] Improve performance of DISTINCT @joocer
- [#1939] Improve performance of JOINs @joocer
- [#1960] Remove evaluation context @joocer
- [#1962] Predicate rewriter stand alone strategy @joocer
- [#1963] Constant Folding should traverse expression tree @joocer
[0.16.0] - 2024-07-20
Added
- [#1802] Table-level permissions @joocer
- [#1803] Push date filters to Parquet reader @joocer
- [#1784] Support for Tarchia catalog @joocer
Fixed
[0.15.9] - 2024-06-29
Added
- [#1778] Specialized INNER JOIN operator for INTEGERS and VARCHARS @joocer
- [#1768] Date calculation predicate rewriter (optimizer) @joocer
Fixed
[0.15.8] - 2024-06-29
Added
Fixed
[0.15.5] - 2024-06-18
Changed
[0.15.4] - 2024-06-17
Added
- [#1746] Support legacy Mabel LZMA compressed JSONL files @joocer
- [#1748] S3/MinIO Connector supports async reads @joocer
Changed
- [#1714] [ClickBench] GROUP BY literals @joocer
- [#1696] Prevent predicates being pushed past limits @joocer
- [#1753] Error on SELECT TOP syntax @joocer
Fixed
[0.15.3] - 2024-06-06
Changed
- [#1715] More optimizations AND, OR and XOR aware @joocer
- [#1717] Memcached notified when resource read from local buffers @joocer
[0.15.1] - 2024-05-31
Added
- [#1685] [ClickBench] Include BLOB as column types for predicate pushdowns @joocer
- [#1697] [ClickBench] Optimizer removes redundant operators @joocer
- [#1698] [ClickBench] Replace LIKE conditions with STARTS_WITH and ENDS_WITH functions @joocer
- [#1581] ANY operator supports literal lists @joocer
- [#1690] Support REGEXP_REPLACE function @joocer
[0.15.0] - 2024-05-26
Added
- [#723] Initial MATCH AGAINST support. @joocer
- [#346] Permissions Model. @joocer
- [#1582] Lemmatize full text searches @joocer
- [#1584] Additional Statistics @joocer
- [#1590] Push filters into sub queries @joocer
- [#1588] Push filters into UNNEST @joocer
- [#1613] Parallelize reads of GCS storage @joocer
- [#1652] BLOB and VARCHAR values can be compared @joocer
- [#1666] Heap Sort fused operator (LIMIT and ORDER BY implemented) @joocer
- [#1665] Smart buffer size allocations @joocer
- [#1676] b-prefixed blob literals @joocer
- [#1586] Initial VIEWs functionality @joocer
Changed
- [#1550] Additional Statistics @joocer
- [#731] Buffer Pool Statistics @joocer
- [#1604] Internal 'Node' object performance @joocer
- [#1643] Mabel Planning Improvements @joocer
Fixed
- [#1587] Filtering on CROSS JOIN UNNEST columns pushed too far. @joocer
- [#1592] Prevent RANDOM being evaluated once in optimizer. @joocer
- [#1598] Buffer Pool inefficiencies. @joocer
- [#1620] Failure to set in Memcache shouldn't be fatal. @joocer
- [#1622] Incorrect handling of nulls in JOIN conditions. @joocer
- [#1664] Memory Pool stats not correct @joocer
- [#1674] Errors with large chunks @joocer
[0.14.1] - 2024-04-13
Added
Fixed
[0.14.0] - 2024-04-07
Added
- [#1460] Support PyArrow's IPC formatted files. @joocer
- [#1464] LIKE with no wildcards rewritten as = @joocer
- [#1468] Python 3.12 support @joocer
- [#1459] Primitive support for EXECUTE queries @joocer
- [#1479] Unary (IS TRUE) operators pushed to SQL sources @joocer
- [#540] Initial support for JSON Accessors -> and ->>. @joocer
- [#1499] Error when EXECUTE USING syntax is attempted. @joocer
- [#1491] New test dataset $missions. @joocer
- [#1300] Support named Parameters. @joocer
- [#1516] Day of Week placeholders for temporal queries. @joocer
- [#1563] Add support for Cassandra-based sources @joocer
Changed
- [#1448] Improved error message on GROUP BY errors. @joocer
- [#1447] Levenshtein implementation rewritten. @joocer
- [#1451] Improved IP containment testing performance. @joocer
- [#1410] + [#190] Rewrite INNER JOIN @joocer
- [#1481] Improvements to DATE +/- INTERVAL performance. @joocer
- [#1486] Implement bespoke INTERVAL operators @joocer
- [#1535] Native implementation of LEFT JOIN @joocer
- [#1547] Buffer Pool implemented using a Memory Pool @joocer
Fixed
- [#1462] Unhandled exception on empty datasets. @joocer
- [#1465] Clashing column names not aliased correctly. @joocer
- [#1445] SHOW EXTENDED COLUMNS not working as expected - changes made to data profile format. @joocer
- [#1473] INNER JOIN statistics not correct. @joocer
- [#1474] Unhelpful error message when SELECT * is mixed with column references @joocer
- [#1487] Filters are not applied to scans on specific conditions involving JOINs @joocer
- [#1480] INTERVAL cannot be compared to durations @joocer
- [#1485] Improved handling of COUNT(*) when pushing to parquet and SQL @joocer
- [#1513] LIST_CONTAINS_ANY performance @joocer
[0.13.3] - 2024-02-13
Added
Fixed
- [#1413] Regression test failures on Windows. @joocer
- [#1320] Errors due to binding memoization. @joocer
- [#1433] Bandit tests misconfigured. @joocer
Changed
- [#1421] Improve CROSS JOIN UNNEST performance. @joocer
- [#1410] Whilst working on JOIN improvements, improvements to DISTINCT made. @joocer
- Refactored SQL Fuzzer regression test script. @joocer
[0.13.0] - 2024-02-03
Added
Fixed
- [#1402] Force consistent blob read order. @joocer
- [#1391] Rolling Log doesn't truncate records. @joocer
- [#1395] Incorrect error raised on type errors on JOINs. @joocer
- [#1397] Subscript SPLIT results. @joocer
- [#1379] Unable to run on ARM. @joocer
- [#1183] JOIN on literals fails. @joocer
Changed
- [#1374] GCS access improvements (up to 2.5x faster IO). @joocer
- [#1393] Performance Tuning blob reading. @joocer
- [#1411] Updated sqlparser-rs to version 0.43.1 dependabot
[0.12.2] - 2024-01-10
Added
- [#1355] Shortcut OR evaluations @joocer
- [#1363] Shortcut nested AND evaluations @joocer
- [#21] Support UNION statements @joocer
- [#1354] New Optimization: Constant Expression Evaluations @joocer
Fixed
Changed
[0.12.0] - 2024-01-02
Fixed
- [#1080] Windows regression test failures. @joocer
- Soundex incorrectly evaluated empty strings as '0000'. @joocer
Changed
- [#1083] Simplify the handling of Query Statistics. @joocer
- [#1086] Update Pythonize to v0.19 and PyO3 to v0.20.0 dependabot
- [#1042] Create a generic base connector and use for all access @joocer
- [#1032] Introduce a query binder @joocer
- [#1158] Resync sqloxide @joocer
Added
- [#1117] Cockroach Labs regression tests @joocer
- [#1128] Specialized handlers for IS NOT TRUE and IS NOT FALSE @joocer
- DATES SINCE temporal filter syntax added. @joocer
- [#1145] Debug Logging @joocer
- [#1156] Bitwise operators and Hex literals @joocer
- [#1141] BigQuery regression tests and documentation @joocer
- [#1171] Support NATURAL JOIN syntax @joocer
- [#1171] Support SEMI and ANTI join syntax @joocer
- [#1219] Extended FAKE syntax @joocer
- [#1219] DISTINCT ON syntax added @joocer
- [#1339] Updated sqlparser-rs to version 0.41.0 dependabot
- [#1329] Add Redis as remote read cache option. @joocer
- [#1337] Support RLIKE @joocer
- [#1344] Initial Support for ANY and ALL array containment syntax @joocer
Removed
- Python 3.8 is no longer supported. @joocer
[0.11.0] - 2023-06-16
Fixed
- [#1069] Minor improvements identified during code review of code to generate numeric series. @joocer
- [#1072] Minor improvements identified during code review of code to handle dates and intervals. @joocer
- [#1026] Removed pin to version 0.11 of PyArrow dependabot
- [#1077] Removed pin to version 0.7.1 of DuckDB dependabot
Changed
- [#808] Rewrite of AST to Logical plan. @joocer
- [#1031] .to_df deprecation complete. @joocer
- [#356] Prepositioning changes for extended types. @joocer
- [#1046] Updated sqlparser-rs to version 0.34.0 dependabot
- [#1017] Fuzzy matching for suggestions is punctuation insensitive @joocer
- [#1060] Conditional test execution made more explicit. @joocer
- [#1026] Timeout FireStore connection. @joocer
Added
- [#1034] Schemas added for the internal sample datasets. @joocer
- [#1038] Able to pass SqlAlchemy Engine to the SQL Connectors, allowing for more complex authentication scenarios. @joocer
- [#1065] Support integer division operator DIV. @joocer
[0.10.0] - 2023-05-03
Warnings
- .to_df() will be replaced with .pandas() in version 0.11.
Fixed
- [#929] Improved error messages for malformed temporal clauses. @joocer
- [#735] (correction) Cursor fetchone and fetchmany step over the record set. @joocer
- [#994] LIMIT didn't prevent additional files from being read after limit was met. @joocer
- [#996] Performance issues with LIMIT and serialization steps. @joocer
- [#1008] JOIN on a literal fails when attempting to find good match. @joocer
- [#1006] Errors handling filenames with multiple dots in the name. @joocer
- [#1010] Predicates not pushed for ZSTD compressed files. @joocer
- [#1007] Wildcards not interpreted correctly in some projection pushdowns @joocer
- [#1015] Column comparisons not working as expected in predicate pushdowns @joocer
Changed
- [#925] Updated sqlparser-rs to version 0.31.0 dependabot
- [#931] Cursor fetch no longer accepts the asdict parameter; instead each tuple (an orso Row) has an as_dict method @joocer
- [#906] Cursor extends an orso DataFrame, providing additional functionality @joocer
- [#938] Updated sqlparser-rs to version 0.32.0 dependabot
- [#940] CityHash moved to orso.cityhash @joocer
- [#942] Profiler (and distogram) moved to orso @joocer
- [#965] (MySQL Compatibility) SHOW STORES renamed to SHOW DATABASES @joocer
- [#973] Improved readability of Sort nodes in EXPLAIN queries @joocer
- [#984] Updated sqlparser-rs to version 0.33.0 dependabot
- [#999] Improved error messages when using subscript functions @joocer
- [#1019] CircularLog renamed RollingLog @joocer
Added
- [#952] Implement statement-based permissions model @joocer
- [#951] Initial Support for Prepared Statements (EXECUTE queries) @joocer
- [#958] Log of recent queries @joocer
- [#905] CLI includes REPL mode @joocer
- [#942] CLI has option to output to Markdown format @joocer
- [#967] Initial Information Schema capability @joocer
- [#978] Initial support for USE queries @joocer
- [#989] REPL supports limited dot commands .help and .exit @joocer
- [#991] Added SPLIT function @joocer
- [#969] New functions supporting Power BI integration @joocer
- [#999] STRUCT casting functions @joocer
- [#1003] DuckDB compatibility tests @joocer
- [#1002] Limit function ignored @joocer
[0.9.3] - 2023-03-04
Fixed
- [#916] Profile error on morsel with all nulls in column @joocer
- Correctness of LRU-K algorithm @joocer
- [#917] Comparisons failed on very long and skinny tables @joocer
[0.9.2] - 2023-02-28
Fixed
- [#909] Divide by Zero error handling empty pages @joocer
- [#912] Literal expressions which evaluate to a boolean were ignored @joocer
Changed
- [#901] Generate Series no longer accepts single numbers or IP ranges, provide explicit start or use | to test IP address containment @joocer
- [#848] Collection and SQL Connectors dynamically size reads to fill target morsel size @joocer
[0.9.1] - 2023-02-23
Fixed
[0.9.0] - 2023-02-19
Fixed
- [#797] Name collisions with aliases cause issues in ORDER BY. @joocer
- [#833] Unhelpful error when no statement is provided @joocer
- [#870] Repeated columns in GROUP BY not processed @joocer
- [#873] 2 x CodeQL security issues @joocer
Changed
- [#799] Chunk large blob reads. @joocer
- [#812] Abstract the tree structure that plans are built from. @joocer
- [#808] Split Logical and Physical planning (partial). @joocer
- [#825] Remove HyperLogLog from profiling. @joocer
- [#750] More CLI improvements. @joocer
- [#589] Moved conditional imports out of program initialization @joocer
- [#836] Use PyArrow 11's exposure of underlying date values in profiler @joocer
- [#853] CaskDB replaces RocksDB as default KV store @joocer
- [#855] Caches have been renamed and separated from KV Stores to discourage incorrect use; the Memcache Cache is now imported using from opteryx.managers.cache import MemcachedCache @joocer
- [#857] Removed PyYAML install @joocer
- [#865] Replaced third-party DATE_TRUNC implementation with a first-party implementation @joocer
- [#861] Replaced third-party bitarray library with a first-party implementation @joocer
- [#871] Consistently name internal variables relating to chunks of data to 'morsel' (technically breaking, but no user impact expected) @joocer
- [#880] Minor performance improvements @joocer
Added
- [#801] New helper function opteryx.query(). @joocer
- [#818] Save query plans to disk (partial). @joocer
- [#163] Initial support for SQL databases as a data source. @joocer
- [#844] Materialize results as a Polars dataframe. @joocer
- [#869] Introduce a SQL Fuzzer. @joocer
- [#877] Initial experimental implementation of internal KV database, HadroDB @joocer
[0.8.3] - 2023-01-10
Fixed
Changed
- [#789] Updated sqlparser-rs to version 0.30.0 dependabot
Added
- [#521] Query files directly. @joocer
- [#786] Save dataset as pandas DataFrame. @joocer
- [#787] Run queries against pandas DataFrames. @joocer
[0.8.2] - 2023-01-06
Fixed
- [#757] Multiple bugs in config manager. @joocer
- [#769] ARRAY_AGG couldn't be nested. @joocer
- [#775] Connection function .arrow() materializes before applying limit. @joocer
Changed
- Internal refactoring relating to creation of metadata service. @joocer
- [#761] Updated sqlparser-rs to version 0.29.0 dependabot
Added
[0.8.1] - 2022-12-30
Fixed
[0.8.0] - 2022-12-27
Fixed
- [#703] ORDER BY columns not in SELECT clause. @joocer
- [#712] Aggregates on literals when combined with a GROUP BY clause. @joocer
- [#710] SEARCH mishandles pages with empty values in first row. @joocer
- [#711] DATE_TRUNC is case sensitive. @joocer
Changed
- [#707] First try to estimate unique values using the Distogram for SHOW EXTENDED COLUMNS. @joocer
- [#707] SHOW EXTENDED COLUMNS creates histograms of 20 bins. @joocer
- [#707] Distogram (data profiler) significant performance improvements. @joocer
- [#722] Allow temporal FOR after alias AS clauses. @joocer
- [#743] 'Did you mean' prompt for columns better suggestions when casing is different. @joocer
Added
- [#515] Implement various new functions. @joocer
- [#19] Initial support for CTE expressions. @joocer
- [#204] Initial support for predicate pushdowns. @joocer
- [#721] Improved temporal range error messages. @joocer
[0.7.0] - 2022-12-02
Fixed
- [#653] LIKE and FOR clauses cannot coexist in SHOW queries. @joocer
- [#669] COUNT(*) cannot be mixed with other aggregates. @joocer
- [#518] SELECT * and GROUP BY can't be used together. @joocer
- [#689] IS comparisons cannot be combined with other comparisons when optimization is off. @joocer
Changed
- [#662] Updated sqlparser-rs to version 0.27.0 dependabot
Added
- [#629] Optimizer pre-evaluates constant expressions. @joocer
- [#439] Support SHOW STORES. @joocer
- [#542] Support POSITION. @joocer
- [#22] Support CASE statements. @joocer
- [#665] Partial support of ARRAY_AGG function. @joocer
- [#668] Optimizer exchanges functions with constant results. @joocer
- [#300] Support advanced TRIM syntax. @joocer
- [#570] Optimizer implements De Morgan's Law. @joocer
[0.6.0] - 2022-11-08
Fixed
- [#568] Unable to perform aggregates on literals. @joocer
- [#592] Dates not always handled correctly. @joocer
- [#600] Parameterization when used on query batches fails. @joocer
- [#580] Empty result sets have no column information. @joocer
- [#548] 'did you mean' message restored for dataset WITH hints. @joocer
- [#640] COUNT(*) shortcut only used when in uppercase. @joocer
- [#645] (correction) null values not handled correctly in comparisons. @joocer
- Problem installing on M1 Mac. @joocer
- Support AND, OR, and XOR in SELECT statement. @joocer
- [#646] Temporal clauses in incorrect place were ignored @joocer
Changed
- [#566] Change from using SQLite3 to DuckDB for SQL comparison tests in Wrenchy-Bench. @joocer
- [#584] (clarity) enable_page_management configuration and parameter renamed enable_page_defragmentation with some minor refactoring of approach to defragmentation. @joocer
- (alignment) TIMESTAMP casting no longer supports casting from a number. @joocer
- [#588] Integrate sqloxide into Opteryx to reduce lag with sqlparser-rs updates. @joocer
- [#619] Page defragmentation moved to an Operator and positioned by the Optimizer. @joocer
- (correction) cursor 'fetch*' methods return Python tuples rather than Python lists. @joocer
Added
- [#533] Support LIKE on SHOW FUNCTIONS, see sqlparser-rs/#620. @joocer
- [#570] Query Optimizer rule to reduce steps in expression evaluation by partial elimination of negatives. @joocer
- [#129] Support FOR clauses for all datasets. @joocer
- [#543] Support 'type string' notation for casting values. @joocer
- [#596] Optimizer replaces ORDER BY and LIMIT plan steps with a single 'HeapSort' plan step. @joocer
- [#515] NULLIF function. @joocer
- [#581] New SQL Battery test that tests results, and initial set of tests. @joocer
- [#577] Hierarchical buffer pool and configuration. @joocer
[0.5.0] - 2022-10-02
Fixed
- [#528] .shape() and .count() not working as expected. @joocer
- Numbers expressed in the form +n not parsed correctly. @joocer
Changed
- (alignment) .as_arrow() renamed to .arrow() to align to DuckDB naming. @joocer
- (consistency) SHOW COLUMNS returns the column name in the name column, previously column_name @joocer
- (correction) cursor 'fetch*' methods return tuples rather than dictionaries by default; this corrects a bug in PEP249 compatibility. @joocer
- [#517] (security) Placeholder changed from '%s' to '?'. @joocer
- [#522] Implementation of LRU-K(2) for cache evictions. @joocer
- [#537] Significant refactor of Query Planner. @joocer
Added
- [#397] Time Travel with '$planets' dataset. @joocer
- [#519] Introduce a size limit on .as_arrow(). @joocer
- [#324] Support IN UNNEST(). @joocer
- [#386] Support SET statements. @joocer
- [#531] Support SHOW VARIABLES and SHOW PARAMETERS. @joocer
- [#464] Support LEFT JOIN <relation> USING @joocer
- [#402] INNER JOIN ON supports multiple conditions @joocer
- [#551] Document stores (MongoDb + FireStore) return '_id' column holding string version of document ID. @joocer
- [#532] Runtime parameters are able to be altered using the SET statement. @joocer
- [#524] Query Optimizer - conjunctive predicate splitter. @joocer
[0.4.1] - 2022-09-12
Fixed
- Fixed missing __init__ file. @joocer
[0.4.0] - 2022-09-12
Added
- [#366] Implement 'function not found' suggestions. @joocer
- [#443] Introduce a CLI. @joocer
- [#351] Support SHOW FUNCTIONS. @joocer
- [#442] Various functions. @joocer
- [#483] Support SHOW CREATE TABLE. @joocer
- [#375] Results to an Arrow Table. @joocer
- [#486] Support functions on aggregates and aggregates on functions. @joocer
- Initial support for INTERVALs. @joocer
- [#395] Support reading CSV files. @joocer
- [#498] CLI support writing CSV/JSONL/Parquet. @joocer
Changed
Fixed
- [#448] VERSION() failed and missing from regression suite. @joocer
- [#404] COALESCE fails for NaN values. @joocer
- [#453] PyArrow bug with long lists creating new columns. @joocer
- [#444] Very low cardinality INNER JOINS exceed memory allocation. @joocer
- [#459] Functions lose some detail on non-first page. @joocer
- [#465] Pages aren't matched to schema for simple queries. @joocer
- [#468] Parquet reader shows some fields as "item". @joocer
- [#471] Column aliases not correctly applied when the relation has an alias. @joocer
- [#489] Intermittent behaviour on hash JOIN algorithm. @joocer
[0.3.0] - 2022-08-28
Added
- [#196] Partial implementation of projection pushdown (Parquet Only). @joocer
- [#41] Enable the results of functions to be used as parameters for other functions. @joocer
- [#42] Enable inline operations. @joocer
- [#330] Support SIMILAR TO alias for RegEx match. @joocer
- [#331] Support SAFE_CAST alias for TRY_CAST. @joocer
- [#419] Various simple functions (SIGN, SQRT, TITLE, REVERSE). @joocer
- [#364] Support SOUNDEX function. @joocer
- [#401] Support SHA-based hash algorithm functions. @joocer
Changed
- (alignment) Paths to storage adapters have been updated to reflect 'connector' terminology.
- (sensible defaults) Default behaviour changed from Mabel partitioning to no partitioning.
- (correction) - Aliases defined in the SELECT clause can no longer be used in WHERE and GROUP BY clauses - this is a correction to align to standard SQL behaviour.
- (correction) - Use of 'None' as an alias for null is no longer supported - this is a correction to align to standard SQL behaviour.
- [#326] Prefer pyarrow's 'promote' over manually handling missing fields. @joocer
- [#39] Rewrite Aggregation Node to use Pyarrow group_by(). @joocer
- [#338] Remove Evaluation Node. @joocer
- [#58] Performance of ORDER BY RAND() improved. @joocer
Fixed
- [#334] All lists should be cast to lists of strings. (@joocer)
- [#382] INNER JOIN on UNNEST relation. (@joocer)
- [#320] Can't execute functions on results of GROUP BY. (@joocer)
- [#399] Strings in double quotes aren't parsed. (@joocer)
[0.2.0] - 2022-07-31
Added
- [#232] Support DATEPART and EXTRACT date functions. @joocer
- [#63] Estimate row counts when reading blobs. (@joocer)
- [#231] Implement DATEDIFF function. (@joocer)
- [#301] Optimizations for IS conditions. (@joocer)
- [#229] Support TIME_BUCKET function. (@joocer)
Changed
- [#35] Table scan planning done during query planning. @joocer
- [#173] Data not found raises different errors under different scenarios. (@joocer)
- Implementation of LEFT and RIGHT functions to reduce execution time. (@joocer)
- [#258] Code release approach. (@joocer)
- [#295] Removed redundant projection when SELECT *. (@joocer)
- [#297] Filters on SHOW COLUMNS execute before profiling. (@joocer)
Fixed
- [#252] Planner should gracefully convert byte strings to ascii strings. (@joocer)
- [#184] Schema changes cause unexpected and unhelpful failures. (@joocer)
- [#261] Read fails if buffer cache is unavailable. (@joocer)
- [#277] Cache errors should be transparent. (@joocer)
- [#285] DISTINCT on nulls throws error. (@joocer)
- [#281] SELECT on empty aggregates reports missing columns. (@joocer)
- [#312] Invalid dates in FOR clauses treated as TODAY. (@joocer)
[0.1.0] - 2022-07-02
Added
- [#165] Support S3/MinIO data stores for blobs. (@joocer)
- FAKE dataset constructor (part of #179). (@joocer)
- [#177] Support SHOW FULL COLUMNS to read entire datasets rather than just the first blob. (@joocer)
- [#194] Functions that are abbreviations, should have the full name as an alias. (@joocer)
- [#201] generate_series() supports CIDR expansion. (@joocer)
- [#175] Support WITH (NO_CACHE) hint to disable using cache. (@joocer)
- [#203] When reporting that a column doesn't exist, it should suggest likely correct columns. (@joocer)
- 'Not' Regular Expression match operator, !~ added to supported set of operators. (@joocer)
- [#226] Implement DATE_TRUNC function. (@joocer)
- [#230] Allow addressing fields as numbers. (@joocer)
- [#234] Implement SEARCH function. (@joocer)
- [#237] Implement COALESCE function. (@joocer)
Changed
- Blob-based readers (disk & GCS) moved from 'local' and 'network' paths to a new 'blob' path. (@joocer)
- Query Execution rewritten. (@joocer)
- [#20] Split query planner and query plan into different modules. (@joocer)
- [#164] Split dataset reader into specific types. (@joocer)
- Expression evaluation short-cuts execution when executing evaluations against an array of null. (@joocer)
- [#244] Improve performance of IN test against literal lists. (@joocer)
Fixed
- [#172] LIKE on non string column gives confusing error (@joocer)
- [#179] Aggregate Node creates new metadata for each chunk (@joocer)
- [#183] NOT doesn't display in plan correctly (@joocer)
- [#182] Unable to evaluate valid filters (@joocer)
- [#178] SHOW COLUMNS returns type OTHER when it can probably work out the type (@joocer)
- [#128] JOIN fails, using PyArrow .join() (@joocer)
- [#189] Explicit JOIN algorithm exceeds memory (@joocer)
- [#199] SHOW EXTENDED COLUMNS blows memory allocations on large tables (@joocer)
- [#169] Selection nodes in EXPLAIN have nested parentheses. (@joocer)
- [#220] LIKE clause fails for columns that contain nulls. (@joocer)
- [#222] Column of NULL detects as VARCHAR. (@joocer)
- [#225] UNNEST does not assign a type to the column when all of the values are NULL. (@joocer)
[0.0.2] - 2022-06-03
Added
- [#72] Configuration is now read from opteryx.yaml rather than the environment. (@joocer)
- [#139] Gather statistics on planning reading of segments. (@joocer)
- [#151] Implement SELECT table.*. (@joocer)
- [#137] GENERATE_SERIES function. (@joocer)
Fixed
- [#106] ORDER BY on qualified fields fails (@joocer)
- [#103] ORDER BY after JOIN errors (@joocer)
- [#110] SubQueries AS statement ignored (@joocer)
- [#112] SHOW COLUMNS doesn't work for non sample datasets (@joocer)
- [#113] Sample data has "NaN" as a string, rather than the value NaN (@joocer)
- [#111] CROSS JOIN UNNEST should return a NONE when the list is empty (or NONE) (@joocer)
- [#119] 'NoneType' object is not iterable error on UNNEST (@joocer)
- [#127] Reading from segments appears to only read the first segment (@joocer)
- [#132] Multiprocessing regressed Caching functionality (@joocer)
- [#140] Appears to have read both frames rather than the latest frame (@joocer)
- [#144] Multiple JOINS in one query aren't recognized (@joocer)
[0.0.1] - 2022-05-09
Added
- Additional statistics recording the time taken to scan partitions (@joocer)
- Support for FULL JOIN and RIGHT JOIN (@joocer)
Changed
- Use PyArrow implementation for INNER JOIN and LEFT JOIN (@joocer)
Fixed
- [#99] Grouping by a list gives an unhelpful error message (@joocer)
- [#100] Projection ignores field qualifications (@joocer)
[0.0.0]
- Initial Version