Converting CSV to Parquet using Opteryx
You can use Opteryx to convert CSV files to Parquet files from the command line.
The same feature converts between any of the file types Opteryx supports. Input file types include CSV, Parquet, Avro, JSONL, and ORC, among others; the full list appears under Supported Conversions. Output file types are Parquet, JSONL, and CSV.
Command Line
To convert a CSV file to a Parquet file, use the following command:
Note
Depending on your shell, some special characters in the SQL statement (for example quotes, $, or !) may need to be escaped or quoted.
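One way to sidestep most escaping is to keep the SQL in a quoted here-document, which the shell passes through without expanding anything. The sketch below assumes a POSIX-style shell such as bash; the QUERY variable and the sample data are illustrative only.

```shell
# Sample input so the command can be run as-is (placeholder data).
printf 'fare_amount\n5\n25\n' > taxi_trips.csv

# The quoted delimiter ('SQL') stops the shell from expanding $, !, or backticks
# inside the statement, so the SQL can be written exactly as intended.
QUERY=$(cat <<'SQL'
SELECT * FROM 'taxi_trips.csv' WHERE fare_amount > 10;
SQL
)

# Pass the statement as a single, double-quoted argument.
python -m opteryx --o 'filtered_trips.parquet' "$QUERY"
```

Keeping the statement in a variable also makes longer queries easier to read than a single quoted command line.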
Advanced Examples
Convert with Filtering
Filter rows during conversion by adding a WHERE clause to the query:
$ python -m opteryx --o 'filtered_trips.parquet' "SELECT * FROM 'taxi_trips.csv' WHERE fare_amount > 10;"
Aggregate and Convert
Create aggregated output files:
$ python -m opteryx --o 'monthly_sales.parquet' "
SELECT
DATE_TRUNC('month', date) as month,
product_category,
SUM(amount) as total_sales
FROM 'sales_data.csv'
GROUP BY month, product_category
"
Convert Multiple File Formats
Convert JSONL to Parquet:
Convert Parquet to CSV:
Supported Conversions
Input Formats:
- Parquet (.parquet)
- CSV (.csv)
- Tab-delimited (.tsv)
- JSONL (.jsonl)
- JSONL Zstandard compressed (.zstd)
- Avro (.avro)
- ORC (.orc)
- Feather/Arrow (.arrow)
- Arrow IPC (.ipc)
- Excel (.xlsx)
- Apache Vortex (.vortex)
Output Formats:
- Parquet (.parquet) - Recommended for analytics
- JSONL (.jsonl) - Line-delimited JSON
- CSV (.csv) - Comma-separated values
Note
- ORC is not supported on Windows environments
- CSV and TSV support is limited and not recommended beyond trivial use cases
- Avro and Vortex are not recommended for performance-sensitive contexts
Performance Tips
- Use Parquet for large datasets - Parquet is columnar and compressed, making it ideal for analytics
- Apply filters during conversion - Reduce output file size by filtering unnecessary data
- Partition large outputs - For very large datasets, consider partitioning by date or category
Related Guides
- Execute SQL on CSV Files - Query CSV files without conversion
- Query AWS S3 - Convert files in cloud storage
- Using Opteryx with Jupyter - Interactive file processing
- Pandas Integration - Convert to DataFrames for analysis