Changelog
All notable changes to this project will be documented in this file; where appropriate, the GitHub issue reference will be noted along with the change. Breaking changes will be clearly indicated with an icon.
The format is based on Keep a Changelog.
[0.26.1] - 2025-11-05
Removed
- Python 3.10 regression tests; builds will continue but are not recommended for use in production.
- GET and SEARCH marked as deprecated, to be removed in or after version 0.27.
Fixed
- [#2786] EXCEPT and ORDER BY cannot coexist (@joocer)
- [#2781] Duplicate Aggregate References Cause KeyError (@tigerquoll, @joocer)
- [#2797] [Iceberg] Empty tables have no statistics to read (@joocer)
- [#2805] Read more rows to guess SQL Schema (@joocer)
- [#2829] Support schemas in table names for Postgres (@joocer)
- [#2849] Return column names as per the SELECT clause (@joocer)
- [#2877] [ClickBench] Various changes to address performance degradation (@joocer)
Changed
- [#2804] [rugo] Replace parquet metadata reader (@joocer)
- [#2806] Fix file descriptor leak and improve buffer handling in connectors and decoders (@Copilot)
- [#2809] [sqlparser-rs] Version bump and planner changes (@joocer)
- [#2814] Consolidate list_ops Cython files into single file with auto-generation (@joocer, @Copilot)
- [#2816] Optimize initialization performance (@joocer, @Copilot)
- [#2825] Improve interactions with external DBMSs (@joocer)
- Numerous housekeeping changes to tidy the structure of the code and repository (@joocer, @Copilot)
Added
- [#2771] Add @>> as ARRAY_CONTAINS_ALL operator (@joocer)
- [#2769] Rewrite visibility filters to reduce complexity (@joocer)
- [#2836] Add REPLACE and INITCAP functions (@joocer)
- [#2832] Experimental support for free-threaded Python (@joocer)
- [#2808] Support wildcards when querying local filesystem (@Copilot)
- [#2807] Support protocol prefixes for paths (gs://, s3://) (@Copilot)
- [#2851] Bespoke JSONL decoder with projection pushdown support (@joocer, @Copilot)
- [#2885] Predicate compaction strategy (@joocer, @Copilot)
- [#2892] Vendor RE2 (@joocer, @Copilot)
[0.25.0] - 2025-09-04
Removed
- INT marked as deprecated, to be removed in or after version 0.27.
- Python 3.10 will be deprecated in version 0.26.
Fixed
- [#2754] Ensure INSTR receives Arrow arrays (@joocer)
- [#2756] Improper encoding of UTF-8 strings (@joocer)
- [#2761] JOIN internals using both INT64 and UINT64 (@joocer)
Changed
- [#2713] [ClickBench] CLI cycles reporting to 3dp (@joocer)
- [#2720] Refactor timestamp parsing (@joocer)
- [#2745] Remove pybase64 dependency (@joocer)
Added
- [#2718] Use Ryu to CAST DOUBLES to VARCHAR and BLOB (@joocer)
- [#2722] Use fast_float to CAST DOUBLES from VARCHAR and BLOB (@joocer)
- [#2726] Custom routines to CAST INTEGER to VARCHAR and BLOB (@joocer)
- [#2729] JSON lines reader doesn't need to parse lines to get line count for COUNT(*) (@joocer)
- [#2730] Custom routines to CAST INTEGER from VARCHAR and BLOB (@joocer)
- [#2240] [Optimizer] Initial cost-based optimization strategy (@joocer)
- [#2736] Support Apache Vortex files (@joocer)
[0.24.1] - 2025-07-29
Fixed
- [#2707] [Security] Fix potential Buffer Overflow (@joocer)
- [#2709] [ClickBench] Benchmark script failing (@joocer)
[0.24.0] - 2025-07-28
Removed
- Deprecated functions removed: LIST_CONTAINS, STR, STRING, FLOAT, TRY_NUMERIC, TRY_STRING, TRY_STRUCT, LEN
Fixed
- [#2649] UNIXTIME function using incorrect method to convert (@joocer)
- [#2651] current_time returns a timestamp (@joocer)
- [#2661] cache_oversize statistic over-reporting (@joocer)
- [#2657] Unable to override default connector (@joocer)
- [#2654] Typo in deprecation warning (@joocer)
Added
- [#2651] Add current_timestamp function (@joocer)
- [#2671] Support qualified wildcards mixed with explicit columns in SELECT (@joocer)
- [#2679] Purge blob from caches on read error (@joocer)
- [#2687] Cache parquet file statistics and use for read pruning (@joocer)
- [#2692] Support predicate pushdowns when reading from S3 (@joocer)
Changed
- [#2663] Introduce latches on the memory pool to enable safe zero copy reads (@joocer)
- [#2659] Code review of GCS Connector (@joocer)
- [#2679] Purge blob from bufferpool and remote cache on read error (@joocer)
- [#2682] Code review following performance profiling (@joocer)
- [#2683] Update vendored pysimdjson (@joocer)
[0.23.0] - 2025-06-29
Removed
Added
Fixed
- [#2592] Cannot GROUP BY empty sets (@joocer)
- [#2611] [CI] Refactor version incrementor as pre-commit (@joocer)
- [#2614] Manually replace values to avoid multidimensional array confusion in numpy (@joocer)
- [#2636] numpy object dtype definition incorrect in CROSS JOIN (@joocer)
Changed
- [#2588] Pass through error messages from functions (@joocer)
- [#2594] [Optimizer] Combine OR chains of ANY equals conditions (@joocer)
- [#2481] [Optimizer] Rewrite STARTS_WITH and ENDS_WITH to LIKE statements (@joocer)
- [#2604] Log the post-optimized plan to the statistics (@joocer)
- [#2607] Unqualified SEMI and ANTI JOINs are interpreted as LEFT (@joocer)
- [#2609] Filter joins (SEMI and ANTI) internals rewritten (@joocer)
- [#2616] Refactor nested loop join to use pyarrow buffers (@joocer)
- [#2619] Refactor hash join variation of INNER JOIN and internal bloom filter to use pyarrow buffers (@joocer)
- [#2627] Arrow native CROSS JOIN UNNEST with pushed filters (@joocer)
- [#2445] LEFT JOIN rewrite (@joocer)
[0.22.0] - 2025-05-20
Removed
- Deprecated function aliases NUMERIC, LIST_CONTAINS_ANY and LIST_CONTAINS_ALL removed; use DOUBLE, ARRAY_CONTAINS_ANY and ARRAY_CONTAINS_ALL instead.
- STRUCT removed as no longer required.
- LIST_CONTAINS, STR, STRING, FLOAT, TRY_NUMERIC, TRY_STRING, TRY_STRUCT and LEN marked as deprecated, to be removed in or after version 0.24.
Fixed
- [#2547] INSTR optimization incorrectly applied in some cases (@joocer)
- [#2554] Explicit column ordering (@joocer)
- [#2545] Decimal scale and precision ignored in CAST (@joocer)
- [#2556] STRUCT ARRAYS not correctly converted to JSON ARRAYs (@joocer)
- [#2563] ANY/ALL ops fail with single row datasets (@joocer)
- [#2565] SHOW COLUMNS doesn't show extended type information (@joocer)
Changed
[0.21.0] - 2025-03-30
Removed
- Deprecated function aliases CEILING, ABSOLUTE, MAXIMUM and MINIMUM removed; use the shorter forms CEIL, ABS, MAX and MIN instead.
- INNER JOIN UNNEST removed from this release.
Added
- [#2401] [Optimizer] Rewrite aggregations to reduce elementwise calculations (@joocer)
- [#2410] NumPy v2 support (@joocer)
- [#2406] [Security] Row visibility filters accept boolean literals as identifiers (@joocer)
- [#2418] Support EXPLAIN ANALYZE FORMAT MERMAID (@joocer)
- [#2463] [Optimizer] Smallest table to left in INNER JOIN (@joocer)
- [#2469] [Optimizer] Additional check when folding constants (@joocer)
- [#2384] Support GROUP BY ALL (@joocer)
- [#2374] Support underscores in numeric literals (@joocer)
- [#2468] Combine multiple LIKEs to RLIKE (@joocer)
- [#2520] Combine multiple =s to IN (@joocer)
- [#1352] Basic CTE expressions (@joocer)
Fixed
- [#2259] High number of calls to Aggregate Operators (@joocer)
- [#2517] IFNULL assumes data types (@joocer)
- [#1821] CONCAT/CONCAT_WS functional correction (@joocer)
- [#2525] Intermittent CROSS JOIN UNNEST error (@joocer)
- [#2521] Intermittent LIKE type error (@joocer)
- [#2524] UNION field order incorrect (@joocer)
- [#2523] Filters pushed into UNION incorrectly (@joocer)
- [#1217] Subqueries don't prevent relation name collisions (@joocer)
- [#2537] Intermittent errors with aggregations (@joocer)
- [#2533] UNION field order incorrect (@joocer)
Changed
- [#2405] Refactor of IntBuffer for performance (@joocer)
- [#2412] Improve INSTR performance (@joocer)
- [#2420] Tidy up C++ code (@joocer)
- [#2439] [Optimizer] Function rewrites (COALESCE to IFNULL) (@joocer)
- [#2456] Avoid use of NumPy in DISTINCT (@joocer)
- [#2453] Avoid use of NumPy in BloomFilter (@joocer)
- [#2457] COUNT(DISTINCT) refactor (@joocer)
- [#2476] Specialized CROSS JOIN UNNEST for string type (@joocer)
- [#2479] Reduce instances of LITERAL expansion (@joocer)
- [#2495] Remove use of CityHash, replace with xxHash (@joocer)
- [#2495] HASH function uses xxHash (previously CityHash) (@joocer)
- [#2490] Replace regular expression engine for REGEXP_REPLACE. (@joocer)
[0.20.0] - 2025-02-13
Added
- [#2213] Specialized buffer for collecting Integers (@joocer)
- [#2185] [Iceberg] Initial support for Iceberg catalogs (@joocer)
- [#2209] Initial support for Excel (xlsx) files (@joocer)
- [#2223] [ClickBench] Avoid creating tables for simple COUNT(*) queries (@joocer)
- [#2228] [Iceberg] Push LIMIT to Iceberg (@joocer)
- [#2215] Create connector capability for column statistics (@joocer)
- [#2234] Capture column and relation statistics (@joocer)
- [#2241] [Optimizer] Initial implementation of correlated filtering (@joocer)
- [#2238] Create row-estimates for multi-file datasets (@joocer)
- [#2253] [CI] Ubuntu ARM included in CI test-suite (@joocer)
- [#2271] Specialized simple aggregators (@joocer)
- [#2266] Prefilter INNER JOIN using a bloom filter (@joocer)
- [#2292] [Fuzzer] Introduce new fuzzer: same query on different connectors (@joocer)
- [#2303] [Iceberg] Column statistics for Iceberg (@joocer)
- [#2297] [Optimizer] Optimization killer questions to avoid execution (@joocer)
- [#2332] Introduce r string prefix to represent raw strings (@joocer)
- [#2216] [Iceberg] Push predicates to Iceberg (@joocer)
- [#2330] Add HUMANIZE function (@joocer)
- [#2293] [Optimizer] Rewrite CASE statements to IFNULL where possible (@joocer)
- [#2179] Python 3.13 builds (@joocer)
- [#2391] Vendor pysimdjson (@joocer)
- [#2357] Prefer nested loop join for small relations (@joocer)
- [#2372] Add support for SELECT * EXCEPT (@joocer)
Fixed
- [#1954] JSON Accessors rewritten to support literal = document->key form (@joocer)
- [#2190] [ClickBench] Resolve failing queries (partial) (@joocer)
- [#2167] [CI] Use freezegun to reduce flaky tests (@joocer)
- [#2247] Non-Existent SQL tables returned incorrect error (@joocer)
- [#2231] Visibility Filters don't accept array literals (@joocer)
- [#2300] Visibility Filters don't restrict when no filter provided (@joocer)
- [#2299] [Fuzzer] Unable to sort by DECIMAL columns which contain NULL values (@joocer)
- [#2302] [Fuzzer] IS TRUE isn't handled consistently by different connectors (@joocer)
- [#2340] [Fuzzer] != with NULL isn't handled correctly by all connectors (@joocer)
- [#2343] [Fuzzer] IS TRUE isn't handled consistently by different connectors (@joocer)
Changed
- [#2197] [Clickbench] Rewritten local file access routines (@joocer)
- [#1453] Compiled code restructure (@joocer)
- [#2205] Prefer Abseil containers (@joocer)
- [#2202] [Clickbench] Allow local reads to use pyarrow multithreading (@joocer)
- [#2205] [CI] Prefer uv as package manager (@joocer)
- [#2220] Remove steps from MATCH() AGAINST() (@joocer)
- [#2233] Bypass OS cache for disk access (@joocer)
- [#2248] [Optimizer] Specialized operator for LIKE '%x%' conditions (@joocer)
- [#2252] Streamline DATE_TRUNC function (@joocer)
- [#2251] [Optimizer] Specialized operator for ILIKE '%x%' conditions (@joocer)
- [#2279] Performance improvements to bloom filter (@joocer)
- [#2312] DISTINCT functions with prehashing don't rehash when adding to HashSet (@joocer)
- [#2346] Compiled function for IN set containment testing (@joocer)
- [#2361] Split list_ops into one function per file (@joocer)
- [#2356] [Parser] Create an Opteryx dialect for sqlparser-rs (@joocer)
- [#2376] [Parser] Support hyphens in identifier names (@joocer)
- [#2327] Updated sqlparser-rs to version 0.54.0 dependabot
[0.19.0] - 2025-01-02
Added
- [#2073] Support for JSON path syntax ($.key) for the @? operator (@joocer)
- [#1701] Initial implementation of push down for LIMIT (@joocer)
- [#2074] Able to chain -> and ->> JSON accessors (@joocer)
- [#2025] Use EXTRACT and SUBSTRING with temporal clauses (@joocer)
- [#2105] Support LIKE ANY and ILIKE ANY (@joocer)
- [#2111] x LIKE '%' rewritten as x IS NOT NULL (@joocer)
- [#2133] Add support for ValKey cache (@joocer)
- [#1866] Rewrite aggregations on constants to literal values (@joocer)
- [#2159] Added IFNOTNULL and PASSTHRU functions (@joocer)
Fixed
- [#2085] LEFT/RIGHT side in a LEFT JOIN occasionally swapped. (@joocer)
- [#2082] Buffer Pool eviction puts reader into an invalid state (@joocer)
- [#2091] Command Line interface fails with unhelpful errors (@joocer)
- [#2113] Improve reliability of NULLIF function (@joocer)
- [#2128] GCS limited to 1000 files (@joocer)
- [#2134] Nested JSON in NDJSON/JSONL incorrectly normalized (@joocer)
- [#2144] NULLIF doesn't warn that VARCHAR and BLOB won't match (@joocer)
- [#2151] Slow aggregations on calculations (@joocer)
- [#2177] Unable to determine type of nested identifiers (@joocer)
- [#2159] Constant-folding wasn't always null-aware (@joocer)
- [#2180] Parentheses in ORDER BY and GROUP BY clauses weren't handled (@joocer)
- [#2181] Complex boolean functions weren't always null-aware (@joocer)
Changed
- [#2129] Rewritten LEFT ANTI JOIN operator (@joocer)
- [#2149] ANTI and SEMI joins moved to their own operator (@joocer)
- [#2161] Minor performance improvements to LEVENSHTEIN function (@joocer)
- [#2161] Empty morsels are only pushed through the plan if all morsels are empty (@joocer)
- [#2163] Prefer 64-bit indexes in Cython (@joocer)
- [#2142] Accept negative scaling factors on CEIL and FLOOR functions (@joocer)
[0.18.0] - 2024-10-15
Added
- [#1993] ClickBench regression suite (@joocer)
- [#2003] Push Equals filters to Firestore (@joocer)
- [#2008] Push Not Equals filters to Firestore (@joocer)
- [#2000] Initial SHOW CREATE VIEW support (@joocer)
- [#2040] Additional items in $statistics table (@joocer)
- [#1984] Reduce columns exposed by subqueries if not used by outer queries (@joocer)
- [#2043] Statistics include when optimizations have been triggered (@joocer)
- [#1899] Support @> for ARRAY containment (@joocer)
Fixed
- [#1981] null handling by functions. (@joocer)
- [#2002] IP containment fails on nulls (@joocer)
- [#1994] ClickBench test failures (@joocer)
- [#2017] OFFSET fails when zero records (@joocer)
- [#2022] Better memory management for CROSS JOIN UNNEST (@joocer)
- [#2039] Over-committing to buffer pool (@joocer)
- [#2034] Improvements to COUNT(*) for parquet files (@joocer)
- [#2029] Heap Sort sorts data twice in some situations (@joocer)
- [#2050] RANDOM_STRING incorrectly invoked (@joocer)
- [#2051] CASE statements failed on null values (@joocer)
Changed
- [#1990] Flag LIST_CONTAINS_ANY and LIST_CONTAINS_ALL as deprecated (@joocer)
- [#2013] MemoryPool based on Python's multiprocessing buffer for storage (@joocer)
- [#2021] Error raised when subqueries expose columns with duplicate names. (@joocer)
- [#2066] Reduce materialization of ARRAY columns. (@joocer)
[0.17.0] - 2024-09-05
Added
- [#1813] ASCII, CHAR, LPAD and RPAD functions (@joocer)
- [#998] Initial HTTP function relation (@joocer)
- [#1838] JSONB_OBJECT_KEYS function (@joocer)
- [#1871] Basic JOIN fuzzer (@joocer)
- [#1889] $statistics virtual dataset, initially reporting bufferpool statistics (@joocer)
- [#1904] Optional scale attribute to CEIL and FLOOR functions (@joocer)
- [#1926] Add support for Pipe Separated Value (psv) data files (@joocer)
- [#1931] TPC-H regression suite (@joocer)
- [#1944] Boolean expression elimination optimizer strategy (@joocer)
Fixed
- [#1837] Unhelpful error on CROSS JOIN UNNEST. (@joocer)
- [#1848] Unary (IS) operators fail on zero row tables. (@joocer)
- [#1849] Cannot use UNION and FOR clauses together. (@joocer)
- [#1850] Cannot IFNULL an ARRAY column. (@joocer)
- [#1854] Implicit CROSS JOIN using function datasets fails. (@joocer)
- [#1857] Cannot use JSON accessors in conditions. (@joocer)
- [#1861] TRY_CAST doesn't safely fail on empty strings (@joocer)
- [#1865] COUNT(*) and date filters pushed to Parquet reader (@joocer)
- [#1875] IFNULL errors in some circumstances (@joocer)
- [#1867] Reliability of JSON Accessors (@joocer)
- [#1878] COUNT(*) optimization restored (@joocer)
- [#1880] != incorrectly pushed into CROSS JOIN UNNEST (@joocer)
- [#1887] Filters incorrectly pushed to left side of LEFT OUTER JOIN (@joocer)
- [#1918] JSON Accessors broken by parser update (@joocer)
- [#1935] BLOB cast double-encodes binary data (@joocer)
- [#1955] ANY/ALL ops fail on null values (@joocer)
- [#1952] JSON Accessors are not comparisons (@joocer)
- [#1958] Inconsistent handling of null in INNER JOIN (@joocer)
- [#1977] Unable to perform arithmetic on result of GENERATE_SERIES (@joocer)
Changed
- [#1877] Updated sqlparser-rs to version 0.49.0 (@joocer)
- [#1891] DISTINCT performance improvements (@joocer)
- [#1906] Improve performance of AVRO files (@joocer)
- [#1923] Improve performance of DISTINCT (@joocer)
- [#1939] Improve performance of JOINs (@joocer)
- [#1960] Remove evaluation context (@joocer)
- [#1962] Predicate rewriter standalone strategy (@joocer)
- [#1963] Constant Folding should traverse expression tree (@joocer)
[0.16.0] - 2024-07-20
Added
- [#1802] Table-level permissions (@joocer)
- [#1803] Push date filters to Parquet reader (@joocer)
- [#1784] Support for Tarchia catalog (@joocer)
Fixed
[0.15.9] - 2024-06-29
Added
- [#1778] Specialized INNER JOIN operator for INTEGERS and VARCHARS (@joocer)
- [#1768] Date calculation predicate rewriter (optimizer) (@joocer)
Fixed
[0.15.8] - 2024-06-29
Added
Fixed
[0.15.5] - 2024-06-18
Changed
[0.15.4] - 2024-06-17
Added
- [#1746] Support legacy Mabel LZMA compressed JSONL files (@joocer)
- [#1748] S3/MinIO Connector supports async reads (@joocer)
Changed
- [#1714] [ClickBench] GROUP BY literals (@joocer)
- [#1696] Prevent predicates being pushed past limits (@joocer)
- [#1753] Error on SELECT TOP syntax (@joocer)
Fixed
[0.15.3] - 2024-06-06
Changed
- [#1715] More optimizations are AND, OR and XOR aware (@joocer)
- [#1717] Memcached notified when resource read from local buffers (@joocer)
[0.15.1] - 2024-05-31
Added
- [#1685] [ClickBench] Include BLOB as a column type for predicate pushdowns (@joocer)
- [#1697] [ClickBench] Optimizer removes redundant operators (@joocer)
- [#1698] [ClickBench] Replace LIKE conditions with STARTS_WITH and ENDS_WITH functions (@joocer)
- [#1581] ANY operator supports literal lists (@joocer)
- [#1690] Support REGEXP_REPLACE function (@joocer)
[0.15.0] - 2024-05-26
Added
- [#723] Initial MATCH AGAINST support. (@joocer)
- [#346] Permissions Model. (@joocer)
- [#1582] Lemmatize full text searches (@joocer)
- [#1584] Additional Statistics (@joocer)
- [#1590] Push filters into subqueries (@joocer)
- [#1588] Push filters into UNNEST (@joocer)
- [#1613] Parallelize reads of GCS storage (@joocer)
- [#1652] BLOB and VARCHAR values can be compared (@joocer)
- [#1666] Heap Sort fused operator (LIMIT and ORDER BY implemented) (@joocer)
- [#1665] Smart buffer size allocations (@joocer)
- [#1676] b-prefixed blob literals (@joocer)
- [#1586] Initial VIEWs functionality (@joocer)
Changed
- [#1550] Additional Statistics (@joocer)
- [#731] Buffer Pool Statistics (@joocer)
- [#1604] Internal 'Node' object performance (@joocer)
- [#1643] Mabel Planning Improvements (@joocer)
Fixed
- [#1587] Filtering on CROSS JOIN UNNEST columns pushed too far. (@joocer)
- [#1592] Prevent RANDOM being evaluated only once in the optimizer. (@joocer)
- [#1598] Buffer Pool inefficiencies. (@joocer)
- [#1620] Failure to set in Memcache shouldn't be fatal. (@joocer)
- [#1622] Incorrect handling of nulls in JOIN conditions. (@joocer)
- [#1664] Memory Pool stats not correct (@joocer)
- [#1674] Errors with large chunks (@joocer)
[0.14.1] - 2024-04-13
Added
Fixed
[0.14.0] - 2024-04-07
Added
- [#1460] Support PyArrow's IPC formatted files. (@joocer)
- [#1464] LIKE with no wildcards rewritten as = (@joocer)
- [#1468] Python 3.12 support (@joocer)
- [#1459] Primitive support for EXECUTE queries (@joocer)
- [#1479] Unary (IS TRUE) operators pushed to SQL sources (@joocer)
- [#540] Initial support for JSON Accessors -> and ->>. (@joocer)
- [#1499] Error when EXECUTE USING syntax is attempted. (@joocer)
- [#1491] New test dataset $missions. (@joocer)
- [#1300] Support named Parameters. (@joocer)
- [#1516] Day of Week placeholders for temporal queries. (@joocer)
- [#1563] Add support for Cassandra-based sources (@joocer)
Changed
- [#1448] Improved error message on GROUP BY errors. (@joocer)
- [#1447] Levenshtein implementation rewritten. (@joocer)
- [#1451] Improved IP containment testing performance. (@joocer)
- [#1410] + [#190] Rewrite INNER JOIN (@joocer)
- [#1481] Improvements to DATE +/- INTERVAL performance. (@joocer)
- [#1486] Implement bespoke INTERVAL operators (@joocer)
- [#1535] Native implementation of LEFT JOIN (@joocer)
- [#1547] Buffer Pool implemented using a Memory Pool (@joocer)
Fixed
- [#1462] Unhandled exception on empty datasets. (@joocer)
- [#1465] Clashing column names not aliased correctly. (@joocer)
- [#1445] SHOW EXTENDED COLUMNS not working as expected; changes made to data profile format. (@joocer)
- [#1473] INNER JOIN statistics not correct. (@joocer)
- [#1474] Unhelpful error message when SELECT * is mixed with column references (@joocer)
- [#1487] Filters are not applied to scans on specific conditions involving JOINs (@joocer)
- [#1480] INTERVAL cannot be compared to durations (@joocer)
- [#1485] Improved handling of COUNT(*) when pushing to parquet and SQL (@joocer)
- [#1513] LIST_CONTAINS_ANY performance (@joocer)
[0.13.3] - 2024-02-13
Added
Fixed
- [#1413] Regression test failures on Windows. (@joocer)
- [#1320] Errors due to binding memoization. (@joocer)
- [#1433] Bandit tests misconfigured. (@joocer)
Changed
- [#1421] Improve CROSS JOIN UNNEST performance. (@joocer)
- [#1410] Whilst working on JOIN improvements, improvements were also made to DISTINCT. (@joocer)
- Refactored SQL Fuzzer regression test script. (@joocer)
[0.13.0] - 2024-02-03
Added
Fixed
- [#1402] Force consistent blob read order. (@joocer)
- [#1391] Rolling Log doesn't truncate records. (@joocer)
- [#1395] Incorrect error raised on type errors on JOINs. (@joocer)
- [#1397] Subscript SPLIT results. (@joocer)
- [#1379] Unable to run on ARM. (@joocer)
- [#1183] JOIN on literals fails. (@joocer)
Changed
- [#1374] GCS access improvements (up to 2.5x faster IO). (@joocer)
- [#1393] Performance Tuning blob reading. (@joocer)
- [#1411] Updated sqlparser-rs to version 0.43.1 dependabot
[0.12.2] - 2024-01-10
Added
- [#1355] Shortcut OR evaluations (@joocer)
- [#1363] Shortcut nested AND evaluations (@joocer)
- [#21] Support UNION statements (@joocer)
- [#1354] New Optimization: Constant Expression Evaluations (@joocer)
Fixed
Changed
[0.12.0] - 2024-01-02
Fixed
- [#1080] Windows regression test failures. (@joocer)
- Soundex incorrectly evaluated empty strings as '0000'. (@joocer)
Changed
- [#1083] Simplify the handling of Query Statistics. (@joocer)
- [#1086] Update Pythonize to v0.19 and PyO3 to v0.20.0 dependabot
- [#1042] Create a generic base connector and use for all access (@joocer)
- [#1032] Introduce a query binder (@joocer)
- [#1158] Resync sqloxide (@joocer)
Added
- [#1117] Cockroach Labs regression tests (@joocer)
- [#1128] Specialized handlers for IS NOT TRUE and IS NOT FALSE (@joocer)
- DATES SINCE temporal filter syntax added. (@joocer)
- [#1145] Debug Logging (@joocer)
- [#1156] Bitwise operators and Hex literals (@joocer)
- [#1141] BigQuery regression tests and documentation (@joocer)
- [#1171] Support NATURAL JOIN syntax (@joocer)
- [#1171] Support SEMI and ANTI join syntax (@joocer)
- [#1219] Extended FAKE syntax (@joocer)
- [#1219] DISTINCT ON syntax added (@joocer)
- [#1339] Updated sqlparser-rs to version 0.41.0 dependabot
- [#1329] Add Redis as remote read cache option. (@joocer)
- [#1337] Support RLIKE (@joocer)
- [#1344] Initial Support for ANY and ALL array containment syntax (@joocer)
Removed
- Python 3.8 is no longer supported. (@joocer)
[0.11.0] - 2023-06-16
Fixed
- [#1069] Minor improvements identified during code review of code to generate numeric series. (@joocer)
- [#1072] Minor improvements identified during code review of code to handle dates and intervals. (@joocer)
- [#1026] Removed pin to version 0.11 of PyArrow dependabot
- [#1077] Removed pin to version 0.7.1 of DuckDB dependabot
Changed
- [#808] Rewrite of AST to Logical plan. (@joocer)
- [#1031] .to_df deprecation complete. (@joocer)
- [#356] Prepositioning changes for extended types. (@joocer)
- [#1046] Updated sqlparser-rs to version 0.34.0 dependabot
- [#1017] Fuzzy matching for suggestions is punctuation insensitive (@joocer)
- [#1060] Conditional test execution made more explicit. (@joocer)
- [#1026] Timeout FireStore connection. (@joocer)
Added
- [#1034] Schemas added for the internal sample datasets. (@joocer)
- [#1038] Able to pass SqlAlchemy Engine to the SQL Connectors, allowing for more complex authentication scenarios. (@joocer)
- [#1065] Support integer division operator DIV. (@joocer)
[0.10.0] - 2023-05-03
Warnings
- .to_df() will be replaced with .pandas() in version 0.11.
Fixed
- [#929] Improved error messages for malformed temporal clauses. (@joocer)
- [#735] (correction) Cursor fetchone and fetchmany step over the record set. (@joocer)
- [#994] LIMIT didn't prevent additional files from being read after limit was met. (@joocer)
- [#996] Performance issues with LIMIT and serialization steps. (@joocer)
- [#1008] JOIN on a literal fails when attempting to find good match. (@joocer)
- [#1006] Errors handling filenames with multiple dots in the name. (@joocer)
- [#1010] Predicates not pushed for ZSTD compressed files. (@joocer)
- [#1007] Wildcards not interpreted correctly in some projection pushdowns (@joocer)
- [#1015] Column comparisons not working as expected in predicate pushdowns (@joocer)
Changed
- [#925] Updated sqlparser-rs to version 0.31.0 dependabot
- [#931] Cursor fetch no longer accepts an asdict parameter; instead each tuple (an orso Row) has an as_dict method (@joocer)
- [#906] Cursor extends an orso DataFrame, providing additional functionality (@joocer)
- [#938] Updated sqlparser-rs to version 0.32.0 dependabot
- [#940] CityHash moved to orso.cityhash (@joocer)
- [#942] Profiler (and distogram) moved to orso (@joocer)
- [#965] (MySQL Compatibility) SHOW STORES renamed to SHOW DATABASES (@joocer)
- [#973] Improved readability of Sort nodes in EXPLAIN queries (@joocer)
- [#984] Updated sqlparser-rs to version 0.33.0 dependabot
- [#999] Improved error messages when using subscript functions (@joocer)
- [#1019] CircularLog renamed RollingLog (@joocer)
Added
- [#952] Implement statement-based permissions model (@joocer)
- [#951] Initial Support for Prepared Statements (EXECUTE queries) (@joocer)
- [#958] Log of recent queries (@joocer)
- [#905] CLI includes REPL mode (@joocer)
- [#942] CLI has option to output to Markdown format (@joocer)
- [#967] Initial Information Schema capability (@joocer)
- [#978] Initial support for USE queries (@joocer)
- [#989] REPL supports limited dot commands .help and .exit (@joocer)
- [#991] Added SPLIT function (@joocer)
- [#969] New functions supporting Power BI integration (@joocer)
- [#999] STRUCT casting functions (@joocer)
- [#1003] DuckDB compatibility tests (@joocer)
- [#1002] Limit function ignored (@joocer)
[0.9.3] - 2023-03-04
Fixed
- [#916] Profile error on morsel with all nulls in column (@joocer)
- Correctness of LRU-K algorithm (@joocer)
- [#917] Comparisons failed on very long and skinny tables (@joocer)
[0.9.2] - 2023-02-28
Fixed
- [#909] Divide by Zero error handling empty pages (@joocer)
- [#912] Literal expressions which evaluate to a boolean were ignored (@joocer)
Changed
- [#901] Generate Series no longer accepts single numbers or IP ranges; provide an explicit start or use | to test IP address containment (@joocer)
- [#848] Collection and SQL Connectors dynamically size reads to fill target morsel size (@joocer)
[0.9.1] - 2023-02-23
Fixed
[0.9.0] - 2023-02-19
Fixed
- [#797] Name collisions with aliases cause issues in ORDER BY. (@joocer)
- [#833] Unhelpful error when no statement is provided (@joocer)
- [#870] Repeated columns in GROUP BY not processed (@joocer)
- [#873] 2 x CodeQL security issues (@joocer)
Changed
- [#799] Chunk large blob reads. (@joocer)
- [#812] Abstract the tree structure that plans are built from. (@joocer)
- [#808] Split Logical and Physical planning (partial). (@joocer)
- [#825] Remove HyperLogLog from profiling. (@joocer)
- [#750] More CLI improvements. (@joocer)
- [#589] Moved conditional imports out of program initialization (@joocer)
- [#836] Use PyArrow 11's exposure of underlying date values in profiler (@joocer)
- [#853] CaskDB replaces RocksDB as default KV store (@joocer)
- [#855] Caches have been renamed and separated from KV Stores to discourage incorrect use; the Memcache Cache is now imported using 'from opteryx.managers.cache import MemcachedCache' (@joocer)
- [#857] Removed PyYAML install (@joocer)
- [#865] Replaced third-party DATE_TRUNC implementation with a first-party implementation (@joocer)
- [#861] Replaced third-party bitarray library with a first-party implementation (@joocer)
- [#871] Consistently name internal variables relating to chunks of data to 'morsel' (technically breaking, but no user impact expected) (@joocer)
- [#880] Minor performance improvements (@joocer)
Added
- [#801] New helper function opteryx.query(). (@joocer)
- [#818] Save query plans to disk (partial). (@joocer)
- [#163] Initial support for SQL databases as a data source. (@joocer)
- [#844] Materialize results as a Polars dataframe. (@joocer)
- [#869] Introduce a SQL Fuzzer. (@joocer)
- [#877] Initial experimental implementation of internal KV database, HadroDB (@joocer)
[0.8.3] - 2023-01-10
Fixed
Changed
- [#789] Updated sqlparser-rs to version 0.30.0 dependabot
Added
- [#521] Query files directly. (@joocer)
- [#786] Save dataset as pandas DataFrame. (@joocer)
- [#787] Run queries against pandas DataFrames. (@joocer)
[0.8.2] - 2023-01-06
Fixed
- [#757] Multiple bugs in config manager. (@joocer)
- [#769] ARRAY_AGG couldn't be nested. (@joocer)
- [#775] Connection function .arrow() materializes before applying limit. (@joocer)
Changed
- Internal refactoring relating to creation of metadata service. (@joocer)
- [#761] Updated sqlparser-rs to version 0.29.0 dependabot
Added
[0.8.1] - 2022-12-30
Fixed
[0.8.0] - 2022-12-27
Fixed
- [#703] ORDER BY columns not in SELECT clause. (@joocer)
- [#712] Aggregates on literals when combined with a GROUP BY clause. (@joocer)
- [#710] SEARCH mishandles pages with empty values in first row. (@joocer)
- [#711] DATE_TRUNC is case sensitive. (@joocer)
Changed
- [#707] First try to estimate unique values using the Distogram for SHOW EXTENDED COLUMNS. (@joocer)
- [#707] SHOW EXTENDED COLUMNS creates histograms of 20 bins. (@joocer)
- [#707] Distogram (data profiler) significant performance improvements. (@joocer)
- [#722] Allow temporal FOR after alias AS clauses. (@joocer)
- [#743] 'Did you mean' prompt for columns gives better suggestions when casing is different. (@joocer)
Added
- [#515] Implement various new functions. (@joocer)
- [#19] Initial support for CTE expressions. (@joocer)
- [#204] Initial support predicate pushdowns. (@joocer)
- [#721] Improved temporal range error messages. (@joocer)
[0.7.0] - 2022-12-02
Fixed
- [#653] LIKE and FOR clauses cannot coexist in SHOW queries. (@joocer)
- [#669] COUNT(*) cannot be mixed with other aggregates. (@joocer)
- [#518] SELECT * and GROUP BY can't be used together. (@joocer)
- [#689] IS comparisons cannot be combined with other comparisons when optimization is off. (@joocer)
Changed
- [#662] Updated sqlparser-rs to version 0.27.0 dependabot
Added
- [#629] Optimizer pre-evaluates constant expressions. (@joocer)
- [#439] Support SHOW STORES. (@joocer)
- [#542] Support POSITION. (@joocer)
- [#22] Support CASE statements. (@joocer)
- [#665] Partial support of ARRAY_AGG function. (@joocer)
- [#668] Optimizer exchanges functions with constant results. (@joocer)
- [#300] Support advanced TRIM syntax. (@joocer)
- [#570] Optimizer implements De Morgan's Law. (@joocer)
[0.6.0] - 2022-11-08
Fixed
- [#568] Unable to perform aggregates on literals. (@joocer)
- [#592] Dates not always handled correctly. (@joocer)
- [#600] Parameterization when used on query batches fails. (@joocer)
- [#580] Empty result sets have no column information. (@joocer)
- [#548] 'Did you mean' message restored for dataset WITH hints. (@joocer)
- [#640] COUNT(*) shortcut only used when in uppercase. (@joocer)
- [#645] (correction) null values not handled correctly in comparisons. (@joocer)
- Problem installing on M1 Mac. (@joocer)
- Support AND, OR, and XOR in SELECT statement. (@joocer)
- [#646] Temporal clauses in incorrect place were ignored. (@joocer)
Changed
- [#566] Change from using SQLite3 to DuckDB for SQL comparison tests in Wrenchy-Bench. (@joocer)
- [#584] (clarity) enable_page_management configuration and parameter renamed to enable_page_defragmentation, with some minor refactoring of the approach to defragmentation. (@joocer)
- (alignment) TIMESTAMP casting no longer supports casting from a number. (@joocer)
- [#588] Integrate sqloxide into Opteryx to reduce lag with sqlparser-rs updates. (@joocer)
- [#619] Page defragmentation moved to an Operator and positioned by the Optimizer. (@joocer)
- (correction) cursor 'fetch*' methods return Python tuples rather than Python lists. (@joocer)
Added
- [#533] Support LIKE on SHOW FUNCTIONS, see sqlparser-rs/#620. (@joocer)
- [#570] Query Optimizer rule to reduce steps in expression evaluation by partial elimination of negatives. (@joocer)
- [#129] Support FOR clauses for all datasets. (@joocer)
- [#543] Support 'type string' notation for casting values. (@joocer)
- [#596] Optimizer replaces ORDER BY and LIMIT plan steps with a single 'HeapSort' plan step. (@joocer)
- [#515] NULLIF function. (@joocer)
- [#581] New SQL Battery test that tests results, and initial set of tests. (@joocer)
- [#577] Hierarchical buffer pool and configuration. (@joocer)
[0.5.0] - 2022-10-02
Fixed
- [#528] .shape() and .count() not working as expected. (@joocer)
- Numbers expressed in the form +n not parsed correctly. (@joocer)
Changed
- (alignment) .as_arrow() renamed to .arrow() to align to DuckDB naming. (@joocer)
- (consistency) SHOW COLUMNS returns the column name in the 'name' column, previously 'column_name'. (@joocer)
- (correction) cursor 'fetch*' methods return tuples rather than dictionaries by default; this corrects a bug in PEP 249 compatibility. (@joocer)
- [#517] (security) Placeholder changed from '%s' to '?'. (@joocer)
- [#522] Implementation of LRU-K(2) for cache evictions. (@joocer)
- [#537] Significant refactor of Query Planner. (@joocer)
Added
- [#397] Time Travel with '$planets' dataset. (@joocer)
- [#519] Introduce a size limit on .as_arrow(). (@joocer)
- [#324] Support IN UNNEST(). (@joocer)
- [#386] Support SET statements. (@joocer)
- [#531] Support SHOW VARIABLES and SHOW PARAMETERS. (@joocer)
- [#464] Support LEFT JOIN <relation> USING. (@joocer)
- [#402] INNER JOIN ON supports multiple conditions. (@joocer)
- [#551] Document stores (MongoDB + Firestore) return '_id' column holding string version of document ID. (@joocer)
- [#532] Runtime parameters can be altered using the SET statement. (@joocer)
- [#524] Query Optimizer - conjunctive predicate splitter. (@joocer)
[0.4.1] - 2022-09-12
Fixed
- Fixed missing __init__ file. (@joocer)
[0.4.0] - 2022-09-12
Added
- [#366] Implement 'function not found' suggestions. (@joocer)
- [#443] Introduce a CLI. (@joocer)
- [#351] Support SHOW FUNCTIONS. (@joocer)
- [#442] Various functions. (@joocer)
- [#483] Support SHOW CREATE TABLE. (@joocer)
- [#375] Results to an Arrow Table. (@joocer)
- [#486] Support functions on aggregates and aggregates on functions. (@joocer)
- Initial support for INTERVALs. (@joocer)
- [#395] Support reading CSV files. (@joocer)
- [#498] CLI support writing CSV/JSONL/Parquet. (@joocer)
Changed
Fixed
- [#448] VERSION() failed and missing from regression suite. (@joocer)
- [#404] COALESCE fails for NaN values. (@joocer)
- [#453] PyArrow bug with long lists creating new columns. (@joocer)
- [#444] Very low cardinality INNER JOINS exceed memory allocation. (@joocer)
- [#459] Functions lose some detail on non-first page. (@joocer)
- [#465] Pages aren't matched to schema for simple queries. (@joocer)
- [#468] Parquet reader shows some fields as "item". (@joocer)
- [#471] Column aliases not correctly applied when the relation has an alias. (@joocer)
- [#489] Intermittent behaviour on hash JOIN algorithm. (@joocer)
[0.3.0] - 2022-08-28
Added
- [#196] Partial implementation of projection pushdown (Parquet Only). (@joocer)
- [#41] Enable the results of functions to be used as parameters for other functions. (@joocer)
- [#42] Enable inline operations. (@joocer)
- [#330] Support SIMILAR TO alias for RegEx match. (@joocer)
- [#331] Support SAFE_CAST alias for TRY_CAST. (@joocer)
- [#419] Various simple functions (SIGN, SQRT, TITLE, REVERSE). (@joocer)
- [#364] Support SOUNDEX function. (@joocer)
- [#401] Support SHA-based hash algorithm functions. (@joocer)
Changed
- (alignment) Paths to storage adapters have been updated to reflect 'connector' terminology.
- (sensible defaults) Default behaviour changed from Mabel partitioning to no partitioning.
- (correction) Aliases defined in the SELECT clause can no longer be used in WHERE and GROUP BY clauses; this is a correction to align to standard SQL behaviour.
- (correction) Use of 'None' as an alias for null is no longer supported; this is a correction to align to standard SQL behaviour.
- [#326] Prefer pyarrow's 'promote' over manually handling missing fields. (@joocer)
- [#39] Rewrite Aggregation Node to use PyArrow group_by(). (@joocer)
- [#338] Remove Evaluation Node. (@joocer)
- [#58] Performance of ORDER BY RAND() improved. (@joocer)
Fixed
- [#334] All lists should be cast to lists of strings. (@joocer)
- [#382] INNER JOIN on UNNEST relation. (@joocer)
- [#320] Can't execute functions on results of GROUP BY. (@joocer)
- [#399] Strings in double quotes aren't parsed. (@joocer)
[0.2.0] - 2022-07-31
Added
- [#232] Support DATEPART and EXTRACT date functions. (@joocer)
- [#63] Estimate row counts when reading blobs. (@joocer)
- [#231] Implement DATEDIFF function. (@joocer)
- [#301] Optimizations for IS conditions. (@joocer)
- [#229] Support TIME_BUCKET function. (@joocer)
Changed
- [#35] Table scan planning done during query planning. (@joocer)
- [#173] Data not found raises different errors under different scenarios. (@joocer)
- Implementation of LEFT and RIGHT functions to reduce execution time. (@joocer)
- [#258] Code release approach. (@joocer)
- [#295] Removed redundant projection for SELECT *. (@joocer)
- [#297] Filters on SHOW COLUMNS execute before profiling. (@joocer)
Fixed
- [#252] Planner should gracefully convert byte strings to ascii strings. (@joocer)
- [#184] Schema changes cause unexpected and unhelpful failures. (@joocer)
- [#261] Read fails if buffer cache is unavailable. (@joocer)
- [#277] Cache errors should be transparent. (@joocer)
- [#285] DISTINCT on nulls throws error. (@joocer)
- [#281] SELECT on empty aggregates reports missing columns. (@joocer)
- [#312] Invalid dates in FOR clauses treated as TODAY. (@joocer)
[0.1.0] - 2022-07-02
Added
- [#165] Support S3/MinIO data stores for blobs. (@joocer)
- FAKE dataset constructor (part of #179). (@joocer)
- [#177] Support SHOW FULL COLUMNS to read entire datasets rather than just the first blob. (@joocer)
- [#194] Functions that are abbreviations should have the full name as an alias. (@joocer)
- [#201] generate_series() supports CIDR expansion. (@joocer)
- [#175] Support WITH (NO_CACHE) hint to disable using cache. (@joocer)
- [#203] When reporting that a column doesn't exist, it should suggest likely correct columns. (@joocer)
- 'Not' Regular Expression match operator, !~, added to supported set of operators. (@joocer)
- [#226] Implement DATE_TRUNC function. (@joocer)
- [#230] Allow addressing fields as numbers. (@joocer)
- [#234] Implement SEARCH function. (@joocer)
- [#237] Implement COALESCE function. (@joocer)
Changed
- Blob-based readers (disk & GCS) moved from 'local' and 'network' paths to a new 'blob' path. (@joocer)
- Query Execution rewritten. (@joocer)
- [#20] Split query planner and query plan into different modules. (@joocer)
- [#164] Split dataset reader into specific types. (@joocer)
- Expression evaluation short-cuts execution when evaluating against an array of nulls. (@joocer)
- [#244] Improve performance of IN test against literal lists. (@joocer)
Fixed
- [#172] LIKE on non-string column gives confusing error (@joocer)
- [#179] Aggregate Node creates new metadata for each chunk (@joocer)
- [#183] NOT doesn't display in plan correctly (@joocer)
- [#182] Unable to evaluate valid filters (@joocer)
- [#178] SHOW COLUMNS returns type OTHER when it can probably work out the type (@joocer)
- [#128] JOIN fails, using PyArrow .join() (@joocer)
- [#189] Explicit JOIN algorithm exceeds memory (@joocer)
- [#199] SHOW EXTENDED COLUMNS blows memory allocations on large tables (@joocer)
- [#169] Selection nodes in EXPLAIN have nested parentheses. (@joocer)
- [#220] LIKE clause fails for columns that contain nulls. (@joocer)
- [#222] Column of NULL detects as VARCHAR. (@joocer)
- [#225] UNNEST does not assign a type to the column when all of the values are NULL. (@joocer)
[0.0.2] - 2022-06-03
Added
- [#72] Configuration is now read from opteryx.yaml rather than the environment. (@joocer)
- [#139] Gather statistics on planning reading of segments. (@joocer)
- [#151] Implement SELECT table.*. (@joocer)
- [#137] GENERATE_SERIES function. (@joocer)
Fixed
- [#106] ORDER BY on qualified fields fails (@joocer)
- [#103] ORDER BY after JOIN errors (@joocer)
- [#110] SubQueries AS statement ignored (@joocer)
- [#112] SHOW COLUMNS doesn't work for non-sample datasets (@joocer)
- [#113] Sample data has "NaN" as a string, rather than the value NaN (@joocer)
- [#111] CROSS JOIN UNNEST should return a NONE when the list is empty (or NONE) (@joocer)
- [#119] 'NoneType' object is not iterable error on UNNEST (@joocer)
- [#127] Reading from segments appears to only read the first segment (@joocer)
- [#132] Multiprocessing regressed Caching functionality (@joocer)
- [#140] Appears to have read both frames rather than the latest frame (@joocer)
- [#144] Multiple JOINS in one query aren't recognized (@joocer)
[0.0.1] - 2022-05-09
Added
- Additional statistics recording the time taken to scan partitions (@joocer)
- Support for FULL JOIN and RIGHT JOIN (@joocer)
Changed
- Use PyArrow implementation for INNER JOIN and LEFT JOIN (@joocer)
Fixed
- [#99] Grouping by a list gives an unhelpful error message (@joocer)
- [#100] Projection ignores field qualifications (@joocer)
[0.0.0]
- Initial Version