Jupyter Notebooks

New here? Start with the Quickstart so you have a running Icelake before wiring up the notebook.

Data scientists don’t want to leave their notebook. This guide connects a VS Code Jupyter notebook to Icelake via the JupySQL magic, runs SQL straight against your S3-backed lakehouse, returns results as pandas DataFrames, and plots them with matplotlib.

A single notebook cell pattern that takes you from raw data on S3 to a matplotlib chart — no leaving VS Code, no separate BI tool, no export/import dance.

Matplotlib line chart rendered from a JupySQL query result inside a VS Code notebook

Before you start, you’ll need:

  • An Icelake account — sign up at app.icelake.eu or follow the Quickstart to get one.
  • Sample data in Icelake — upload the example-metrics.csv from CSV Upload so there’s a concrete dataset the notebook can query.
  • Python 3.10+ and VS Code with the Python and Jupyter extensions installed.

Open the Icelake admin UI and go to Data In → API Keys → Create key. You’ll get a client ID and a secret starting with ilk_. The secret is shown exactly once — copy it now.

Admin UI showing a newly-created API key with the client ID and ilk_ secret revealed in a one-shot modal

Export them in your shell so the notebook can pick them up without hardcoding anything:

Export Icelake credentials
export ICELAKE_CLIENT_ID="your-client-id"
export ICELAKE_API_KEY="ilk_..."

Launch VS Code from the same shell so the variables carry into the Jupyter kernel.
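To confirm the variables actually reached the kernel, a quick sanity-check cell helps before you connect. This is a sketch using the variable names from this guide; adjust if yours differ:

```python
import os

def check_icelake_env(env=os.environ):
    """Return the list of required Icelake variables that are missing or empty."""
    required = ("ICELAKE_CLIENT_ID", "ICELAKE_API_KEY")
    return [name for name in required if not env.get(name)]

missing = check_icelake_env()
if missing:
    # Export the variables in the shell, then restart the kernel.
    print("Missing:", ", ".join(missing))
else:
    print("Credentials found for client:", os.environ["ICELAKE_CLIENT_ID"])
```

If anything is reported missing, re-export in the shell and relaunch VS Code from it — restarting only the kernel from inside VS Code won’t pick up new shell variables.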

Install the packages you need:

Install Python dependencies
pip install jupysql sqlalchemy 'psycopg[binary]' pandas matplotlib

Why these:

  • JupySQL — the %sql and %%sql notebook magics. Actively maintained successor to ipython-sql.
  • SQLAlchemy + psycopg[binary] — the driver JupySQL uses to talk to Icelake’s pgwire endpoint (PostgreSQL v3 client, pre-compiled wheels, no toolchain required).
  • pandas + matplotlib — result DataFrames and plotting.

Create a new .ipynb file in VS Code (File → New File → Jupyter Notebook), pick your Python kernel, and run:

Cell 1 — load JupySQL and connect
%load_ext sql
import os
ICELAKE_URL = (
    "postgresql+psycopg://"
    f"{os.environ['ICELAKE_CLIENT_ID']}:{os.environ['ICELAKE_API_KEY']}"
    "@sql.icelake.eu:5432/icelake?sslmode=require"
)
%sql $ICELAKE_URL

Tip: keep credentials out of version control by putting them in a .env file next to the notebook and loading with from dotenv import load_dotenv; load_dotenv() before reading os.environ. JupySQL’s connection string docs cover other secret-management patterns.
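A minimal sketch of that .env pattern, assuming python-dotenv is installed (`pip install python-dotenv`); the import is guarded so the cell still runs without the package:

```python
# .env file next to the notebook (add it to .gitignore):
#   ICELAKE_CLIENT_ID=your-client-id
#   ICELAKE_API_KEY=ilk_...

import os

try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env into os.environ; a no-op if the file is absent
except ImportError:
    pass  # fall back to variables exported in the shell

def icelake_url(client_id, api_key):
    """Build the SQLAlchemy URL that JupySQL connects with."""
    return (
        "postgresql+psycopg://"
        f"{client_id}:{api_key}"
        "@sql.icelake.eu:5432/icelake?sslmode=require"
    )

ICELAKE_URL = icelake_url(
    os.environ.get("ICELAKE_CLIENT_ID", ""),
    os.environ.get("ICELAKE_API_KEY", ""),
)
```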

With the connection active, any cell prefixed with %%sql runs against Icelake and renders the result as a table.

Cell 2 — query the uploaded CSV dataset
%%sql
SELECT
    timestamp,
    temperature_celsius,
    humidity_percent,
    energy_kwh
FROM metrics
WHERE sensor_id = 'sensor-001'
ORDER BY timestamp
LIMIT 20;

VS Code Jupyter notebook showing a %%sql cell with its tabular result directly below

Capture the last query result into a pandas DataFrame with JupySQL’s << operator, then plot it with pandas’ DataFrame.plot, which renders through matplotlib:

Cell 3 — assign the result
%%sql result <<
SELECT timestamp, temperature_celsius, humidity_percent
FROM metrics
WHERE sensor_id = 'sensor-001'
ORDER BY timestamp;
Cell 4 — plot with matplotlib
import matplotlib.pyplot as plt
df = result.DataFrame()
df['timestamp'] = df['timestamp'].astype('datetime64[ns]')
ax = df.plot(
    x='timestamp',
    y=['temperature_celsius', 'humidity_percent'],
    figsize=(10, 4),
    title='Sensor 001 — temperature and humidity',
)
ax.set_ylabel('value')
plt.tight_layout()
plt.show()

Once you have a DataFrame, the rest is ordinary Python — filter, resample, merge with other notebooks’ data, export to Parquet, feed into scikit-learn. A couple of patterns:

Pandas follow-ups
# Resample to hourly averages ('1h' — uppercase 'H' is deprecated in recent pandas)
hourly = (
    df.set_index('timestamp')
      .resample('1h')
      .mean()
      .reset_index()
)

# Correlate temperature and energy production (a line magic must stay on one line)
%sql SELECT corr(temperature_celsius, energy_kwh) AS t_vs_energy FROM metrics WHERE sensor_id = 'sensor-001';
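The resample-then-export round trip can be sketched end to end on a tiny stand-in DataFrame (the values and filename below are hypothetical; Parquet export needs pyarrow or fastparquet installed):

```python
import pandas as pd

# Tiny stand-in for a query result (made-up readings).
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=4, freq="30min"),
    "temperature_celsius": [20.0, 21.0, 22.0, 23.0],
})

# Hourly averages, the same shape as the follow-up above.
hourly = (
    df.set_index("timestamp")
      .resample("1h")
      .mean()
      .reset_index()
)
print(hourly)

# Persist for other tools; to_parquet raises ImportError without an engine.
try:
    hourly.to_parquet("sensor-001-hourly.parquet", index=False)
except ImportError:
    print("install pyarrow (or fastparquet) to enable Parquet export")
```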
Symptom → What to check

  • could not connect to server / connection timed out → Verify your network reaches sql.icelake.eu:5432. Corporate firewalls often block outbound 5432 — ask your network team to allowlist it.
  • password authentication failed for user "…" → The username is the client ID and the password is the ilk_… secret — the same credentials Grafana, Prometheus, and every other Icelake endpoint use. Check that echo $ICELAKE_CLIENT_ID isn’t empty.
  • SSL connection required → The managed endpoint enforces TLS — keep ?sslmode=require on the URL.
  • %sql magic not recognised → Re-run %load_ext sql in the current kernel session. If you restarted VS Code, the kernel restarted too.
  • Slow query on a big scan → Narrow the WHERE clause by timestamp and sensor/metric name — Icelake partitions Parquet by date on S3, so bounded ranges read far fewer files.
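Before blaming the network, it can help to confirm the connection URL is well-formed. A stdlib-only sketch (the credentials in the example string are placeholders):

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Pull user, host, and port out of a SQLAlchemy-style URL for a sanity check."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "tls_required": "sslmode=require" in parts.query,
    }

info = describe_url(
    "postgresql+psycopg://client-id:ilk_secret@sql.icelake.eu:5432/icelake?sslmode=require"
)
print(info)
```

A wrong host, a missing port, or tls_required coming back False points at a URL typo rather than a firewall problem.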
  • Quickstart — end-to-end Icelake + Grafana setup in 5 minutes
  • CSV Upload — the example-metrics.csv this tutorial queries
  • Query Interfaces — Grafana, pgwire SQL, REST, and the Loki Query API
  • AI MasterMind — natural-language analytics over the same datasets