Using Opteryx with Jupyter Notebooks

This guide demonstrates how to use Opteryx, a high-performance SQL query engine, with Jupyter Notebooks to analyze and visualize data interactively. Opteryx is designed to work seamlessly with large datasets and integrates directly with Python’s most popular data science libraries.

Installation

For this guide, we assume you have a working Jupyter environment. For assistance on setting up Jupyter, see Installing Jupyter.

Prebuilt Binder notebooks are also available for experimenting with Opteryx without any local setup.

To install Opteryx in your Jupyter environment, run the following command in a notebook cell. This ensures that Opteryx is installed in the same Python environment your notebook is using:

# Ensure that Opteryx is installed in the current Jupyter kernel
import sys
!{sys.executable} -m pip install opteryx

Getting Started

Here's how to execute a basic SQL query against an internal sample dataset and display the results directly in Jupyter:

import opteryx

sql_statement = """
SELECT *
  FROM $planets
"""
results = opteryx.query(sql_statement)

# Display results in a Table
results

Integration with Other Libraries

Opteryx supports seamless conversion of query results to formats suitable for further analysis or visualization:

Arrow: Ideal for high-performance data processing. results.arrow()
Pandas: Best for interactive data manipulation and analysis. results.pandas()
Polars: Use when working with larger data sets or requiring faster performance. results.polars()

Visualizing Data with Matplotlib

After querying data with Opteryx, you can easily visualize it.

Here's an example of creating a pie chart to show the distribution of missions by company using an internal dataset of missions to space:

import matplotlib.pyplot as plt
import opteryx

# Query to count missions per company
sql_statement = """
SELECT COUNT(*) as Missions, Company 
  FROM $missions
 GROUP BY Company
 ORDER BY Missions DESC;
"""
results = opteryx.query(sql_statement)

# Prepare data for the pie chart
missions = list(results["Missions"])
companies = list(results["Company"])
if len(missions) > 9:
    missions = missions[:9] + [sum(missions[9:])]
    companies = companies[:9] + ["Other"]

# Create a pie chart with an 'explode' effect for the largest segment
explode = [0.1 if i == 0 else 0 for i in range(len(missions))]
fig, ax = plt.subplots()
ax.pie(missions, labels=companies, autopct='%1.1f%%', explode=explode, startangle=90, counterclock=False)
ax.set_title('Missions Per Company')
plt.show()

Next Steps

This guide has introduced you to the basics of using Opteryx with Jupyter Notebooks.

For a deeper dive into Opteryx, including detailed documentation on advanced features, visit the documentation at Opteryx.dev. The site provides extensive resources and documentation to support your data projects.

Happy querying!