# Integration Guides

Opteryx provides flexible integration options to query data across multiple sources. These guides will help you connect Opteryx to various databases, cloud storage systems, and data tools.
## Getting Started
New to Opteryx? Start here:
- Quickstart Guide - Get Opteryx up and running in minutes
- Python API - Learn the Python API basics
- Using with Jupyter Notebooks - Interactive data analysis
## Data Sources
Query data from various storage systems without moving or copying it.
### Cloud Storage
- AWS S3 - Query Parquet, CSV, and JSONL files directly from Amazon S3
- Google Cloud Storage - Access data files in GCS buckets
- Apache Iceberg - Query Iceberg tables with catalog support
### File Formats
- Execute SQL on CSV Files - Query CSV files directly without loading into a database
- Convert CSV to Parquet - Transform file formats using SQL
## SQL Databases
Connect to relational databases and run federated queries across them.
- PostgreSQL - Connect to Postgres databases using SQLAlchemy
- MySQL - Query MySQL databases and join with other sources
- BigQuery - Access Google BigQuery tables
- DuckDB - Query DuckDB databases for analytics
- SQLite - Work with SQLite databases
## Data Analysis Tools
Integrate with popular Python data analysis libraries.
### DataFrames
- Pandas - Query Pandas DataFrames and output results as Pandas
- Polars - High-performance DataFrame integration with Polars
### Notebooks
- Jupyter Notebooks - Interactive data exploration and visualization
## Common Use Cases
### Federated Queries
Combine data from multiple sources in a single query:
```python
import opteryx
from opteryx.connectors import SqlConnector, AwsS3Connector
from sqlalchemy import create_engine

# Register the S3 bucket as a data store
opteryx.register_store("my-bucket", AwsS3Connector)

# Register PostgreSQL via a SQLAlchemy engine
postgres_engine = create_engine("postgresql+psycopg2://user:pass@host/")
opteryx.register_store("pg", SqlConnector, remove_prefix=True, engine=postgres_engine)

# Query across S3 and Postgres; the hyphenated bucket name must be
# double-quoted so it parses as a single identifier
result = opteryx.query("""
    SELECT
        s3_data.customer_id,
        s3_data.purchase_amount,
        pg_data.customer_name
    FROM "my-bucket".sales AS s3_data
    JOIN pg.customers AS pg_data
        ON s3_data.customer_id = pg_data.id
""")
```
### Data Pipeline Integration
Use Opteryx in data pipelines:
```python
import opteryx
from opteryx.connectors import GcpCloudStorageConnector

# Register the GCS connector
opteryx.register_store("data", GcpCloudStorageConnector)

# Read and aggregate directly from GCS
df = opteryx.query("""
    SELECT
        date,
        product_id,
        SUM(amount) AS total_sales
    FROM data.sales
    WHERE date >= '2024-01-01'
    GROUP BY date, product_id
""").pandas()

# Continue processing with your preferred tool
df.to_csv('processed_sales.csv', index=False)
```
## Need Help?
- Discord Community - Get help from the community
- GitHub Issues - Report bugs or request features
- Documentation - Explore the full documentation
## Contributing
Found an issue or want to contribute a guide? Check out our Contributing Guide.