Executing SQL Queries on CSV Files with Opteryx
You can use Opteryx to execute SQL queries on CSV files. Simply replace the table name in a query with the location of the CSV file.
This feature can also be used to query all of the file types Opteryx supports, including Parquet, Avro, JSONL, and ORC.
Querying CSV files is supported in both the CLI and Python APIs.
Command Line
To execute an SQL query on a CSV file using the CLI, use the following command:
Note
The terminal may require some special characters to be escaped.
Python
To execute an SQL query on a CSV file using the Python API, use the following code:
The full power of SQL is available, including the ability to join CSV files with other files.
Advanced Examples
Filtering and Aggregation
import opteryx
# Aggregate data from CSV
result = opteryx.query("""
SELECT
pickup_borough,
COUNT(*) as trip_count,
AVG(fare_amount) as avg_fare
FROM 'taxi_trips.csv'
WHERE trip_distance > 5
GROUP BY pickup_borough
ORDER BY trip_count DESC
""")
print(result)
Joining Multiple CSV Files
import opteryx
# Join customer data with orders
result = opteryx.query("""
SELECT
c.customer_name,
o.order_id,
o.order_total
FROM 'customers.csv' AS c
JOIN 'orders.csv' AS o
ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
""")
print(result.head())
Querying Multiple Files
You can query multiple CSV files by running separate queries or by combining them with UNION:
import opteryx
# Combine multiple CSV files with UNION
result = opteryx.query("""
SELECT * FROM 'sales_january.csv'
UNION ALL
SELECT * FROM 'sales_february.csv'
UNION ALL
SELECT * FROM 'sales_march.csv'
""")
Supported File Formats
In addition to CSV, Opteryx supports:
- Parquet (.parquet) - Columnar format, best for large datasets
- JSONL (.jsonl) - JSON Lines format
- Avro (.avro) - Binary format with schema
- ORC (.orc) - Optimized Row Columnar format
Related Guides
- Convert CSV to Parquet - Transform file formats for better performance
- Query AWS S3 - Query CSV files in cloud storage
- Query Google Cloud Storage - Access CSV files in GCS
- Pandas Integration - Convert results to Pandas DataFrames