Jupyter Notebooks

New here? Start with the Quickstart so you have a running Icelake before wiring up the notebook.

Data scientists don’t want to leave their notebook. This guide connects a VS Code Jupyter notebook to Icelake via the JupySQL magic, runs SQL straight against your S3-backed lakehouse, returns results as pandas DataFrames, and plots them with matplotlib.

A single notebook cell pattern that takes you from raw data on S3 to a matplotlib chart — no leaving VS Code, no separate BI tool, no export/import dance.

Matplotlib line chart rendered from a JupySQL query result inside a VS Code notebook

Before you start, you’ll need:

  • An Icelake account — sign up at app.icelake.eu or follow the Quickstart to get one.
  • Sample data in Icelake — upload the example-metrics.csv from CSV Upload so there’s a concrete dataset the notebook can query.
  • Python 3.10+ and VS Code with the Python and Jupyter extensions installed.

Open the Icelake admin UI and go to Data In → API Keys → Create key. You’ll get a client ID and a secret starting with ilk_. The secret is shown exactly once — copy it now.

Admin UI showing a newly-created API key with the client ID and ilk_ secret revealed in a one-shot modal

Export them in your shell so the notebook can pick them up without hardcoding anything:

Export Icelake credentials
export ICELAKE_CLIENT_ID="your-client-id"
export ICELAKE_API_KEY="ilk_..."

Launch VS Code from the same shell so the variables carry into the Jupyter kernel.
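To confirm the variables actually reached the kernel, a quick sanity-check cell helps before you connect. This is a sketch using the variable names from this guide; adjust if yours differ:

```python
import os

def check_icelake_env(env=os.environ):
    """Return the list of required Icelake variables that are missing or empty."""
    required = ("ICELAKE_CLIENT_ID", "ICELAKE_API_KEY")
    return [name for name in required if not env.get(name)]

missing = check_icelake_env()
if missing:
    # Export the variables in the shell, then restart the kernel.
    print("Missing:", ", ".join(missing))
else:
    print("Credentials found for client:", os.environ["ICELAKE_CLIENT_ID"])
```

If anything is reported missing, re-export in the shell and relaunch VS Code from it — restarting only the kernel from inside VS Code won’t pick up new shell variables.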

Install the packages you need:

Install Python dependencies
pip install jupysql sqlalchemy 'psycopg[binary]' pandas matplotlib

Why these:

  • JupySQL — the %sql and %%sql notebook magics. Actively maintained successor to ipython-sql.
  • SQLAlchemy + psycopg[binary] — the driver JupySQL uses to talk to Icelake’s pgwire endpoint (PostgreSQL v3 client, pre-compiled wheels, no toolchain required).
  • pandas + matplotlib — result DataFrames and plotting.

Create a new .ipynb file in VS Code (File → New File → Jupyter Notebook), pick your Python kernel, and run:

Cell 1 — load JupySQL and connect
%load_ext sql
import os
ICELAKE_URL = (
    "postgresql+psycopg://"
    f"{os.environ['ICELAKE_CLIENT_ID']}:{os.environ['ICELAKE_API_KEY']}"
    "@sql.icelake.eu:5432/icelake?sslmode=require"
)
%sql $ICELAKE_URL

Tip: keep credentials out of version control by putting them in a .env file next to the notebook and loading with from dotenv import load_dotenv; load_dotenv() before reading os.environ. JupySQL’s connection string docs cover other secret-management patterns.
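A minimal sketch of that .env pattern, assuming python-dotenv is installed (`pip install python-dotenv`); the import is guarded so the cell still runs without the package:

```python
# .env file next to the notebook (add it to .gitignore):
#   ICELAKE_CLIENT_ID=your-client-id
#   ICELAKE_API_KEY=ilk_...

import os

try:
    from dotenv import load_dotenv
    load_dotenv()  # reads .env into os.environ; a no-op if the file is absent
except ImportError:
    pass  # fall back to variables exported in the shell

def icelake_url(client_id, api_key):
    """Build the SQLAlchemy URL that JupySQL connects with."""
    return (
        "postgresql+psycopg://"
        f"{client_id}:{api_key}"
        "@sql.icelake.eu:5432/icelake?sslmode=require"
    )

ICELAKE_URL = icelake_url(
    os.environ.get("ICELAKE_CLIENT_ID", ""),
    os.environ.get("ICELAKE_API_KEY", ""),
)
```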

With the connection active, any cell prefixed with %%sql runs against Icelake and renders the result as a table.

Cell 2 — query the uploaded CSV dataset
%%sql
SELECT
    timestamp,
    temperature_celsius,
    humidity_percent,
    energy_kwh
FROM metrics
WHERE sensor_id = 'sensor-001'
ORDER BY timestamp
LIMIT 20;

VS Code Jupyter notebook showing a %%sql cell with its tabular result directly below

Capture the last query result into a pandas DataFrame with JupySQL’s << operator, then plot it with pandas’ DataFrame.plot, which renders through matplotlib:

Cell 3 — assign the result
%%sql result <<
SELECT timestamp, temperature_celsius, humidity_percent
FROM metrics
WHERE sensor_id = 'sensor-001'
ORDER BY timestamp;
Cell 4 — plot with matplotlib
import matplotlib.pyplot as plt
df = result.DataFrame()
df['timestamp'] = df['timestamp'].astype('datetime64[ns]')
ax = df.plot(
    x='timestamp',
    y=['temperature_celsius', 'humidity_percent'],
    figsize=(10, 4),
    title='Sensor 001 — temperature and humidity',
)
ax.set_ylabel('value')
plt.tight_layout()
plt.show()

Once you have a DataFrame, the rest is ordinary Python — filter, resample, merge with other notebooks’ data, export to Parquet, feed into scikit-learn. A couple of patterns:

Pandas follow-ups
# Resample to hourly averages ('1h' — uppercase 'H' is deprecated in recent pandas)
hourly = (
    df.set_index('timestamp')
      .resample('1h')
      .mean()
      .reset_index()
)

# Correlate temperature and energy production (a line magic must stay on one line)
%sql SELECT corr(temperature_celsius, energy_kwh) AS t_vs_energy FROM metrics WHERE sensor_id = 'sensor-001';
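The resample-then-export round trip can be sketched end to end on a tiny stand-in DataFrame (the values and filename below are hypothetical; Parquet export needs pyarrow or fastparquet installed):

```python
import pandas as pd

# Tiny stand-in for a query result (made-up readings).
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=4, freq="30min"),
    "temperature_celsius": [20.0, 21.0, 22.0, 23.0],
})

# Hourly averages, the same shape as the follow-up above.
hourly = (
    df.set_index("timestamp")
      .resample("1h")
      .mean()
      .reset_index()
)
print(hourly)

# Persist for other tools; to_parquet raises ImportError without an engine.
try:
    hourly.to_parquet("sensor-001-hourly.parquet", index=False)
except ImportError:
    print("install pyarrow (or fastparquet) to enable Parquet export")
```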
Symptom → What to check

  • could not connect to server / connection timed out → Verify your network reaches sql.icelake.eu:5432. Corporate firewalls often block outbound 5432 — ask your network team to allowlist it.
  • password authentication failed for user "…" → The username is the client ID and the password is the ilk_… secret — the same credentials Grafana, Prometheus, and every other Icelake endpoint use. Check that echo $ICELAKE_CLIENT_ID isn’t empty.
  • SSL connection required → The managed endpoint enforces TLS — keep ?sslmode=require on the URL.
  • %sql magic not recognised → Re-run %load_ext sql in the current kernel session. If you restarted VS Code, the kernel restarted too.
  • Slow query on a big scan → Narrow the WHERE clause by timestamp and sensor/metric name — Icelake partitions Parquet by date on S3, so bounded ranges read far fewer files.
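Before blaming the network, it can help to confirm the connection URL is well-formed. A stdlib-only sketch (the credentials in the example string are placeholders):

```python
from urllib.parse import urlsplit

def describe_url(url):
    """Pull user, host, and port out of a SQLAlchemy-style URL for a sanity check."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "tls_required": "sslmode=require" in parts.query,
    }

info = describe_url(
    "postgresql+psycopg://client-id:ilk_secret@sql.icelake.eu:5432/icelake?sslmode=require"
)
print(info)
```

A wrong host, a missing port, or tls_required coming back False points at a URL typo rather than a firewall problem.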
  • Quickstart — end-to-end Icelake + Grafana setup in 5 minutes
  • CSV Upload — the example-metrics.csv this tutorial queries
  • Query Interfaces — Grafana, pgwire SQL, REST, and the Loki Query API
  • AI MasterMind — natural-language analytics over the same datasets