feat: overhaul for relaunch #135

Open
gvwilson wants to merge 20 commits into main from gvwilson/overhaul-for-relaunch

Collaborator

@gvwilson gvwilson commented Mar 11, 2026

  • Create requirements.txt with pinned dependencies for building site.
  • Update .gitignore.
  • Enhance Makefile.
  • Rename existing lessons to be lesson/dd_specific.py (two digits, lower case).
    • Notebooks with other names are not included in index page.
  • Add SQL tutorial.
    • Add scripts to regenerate SQLite databases.
  • Add queueing theory tutorial.
  • Rename scripts directory to bin.
  • Replace old build.py with:
    • bin/extract.py gets metadata from */index.md lesson pages.
    • bin/build.py builds root home page and lesson home pages.
    • bin/check_empty_cells.py: look for empty cells in notebooks (enhanced).
    • bin/check_missing_titles.py: look for notebooks without an H1 title.
    • bin/check_notebook_packages.py: check consistency of package versions within lesson.
  • Add make check_packages NOTEBOOKS="*/??_*.py" to check package consistency within lesson.
    • If NOTEBOOKS not specified, all notebooks are checked.
  • Add make check_exec NOTEBOOKS="*/??_*.py" to check notebook execution.
    • If NOTEBOOKS not specified, all notebooks are executed (slow).
  • Fix missing package imports in notebook headers.
  • Pin package versions in notebook headers.
  • Make content of lesson home pages uniform.
  • Update GitHub workflows to launch commands from Makefile.
    • Requires using uv in workflows.
  • Extract and modify CSS.
  • Put SVG icons in includable files in templates/icons/*.svg.
  • Make titles of notebooks more uniform.
  • Build pages/*.md using templates/page.html.
  • Add link checker.
    • Requires local server to be running, and takes 10 minutes or more to execute.
  • Fix multiple bugs in individual lessons.
    • Most introduced by package version pinning.
    • See notes below for outstanding issues.

Note: build marimo_learn package (https://github.com/gvwilson/marimo_learn) with utilities to localize SQLite database files.

To Do

Add disabled=True to prevent execution of deliberately buggy cells in script mode (?).

polars/03_loading_data.py

The code at lines 497–499 calls lz.sink_csv(..., lazy=True). The lazy=True argument was added to return a lazy sink that could be passed to pl.collect_all() for parallel execution, rather than immediately writing the file. However, in polars 1.24.0, the lazy parameter was removed from sink_csv() (and likely from sink_parquet() and sink_ndjson() too). The API for collecting multiple sinks in parallel has changed.

polars/03_loading_data.py, polars/05_reactive_plots.py, and polars/11_missing_data.py

These notebooks use the hf:// protocol to stream a parquet file directly from Hugging Face:

URL = f"hf://datasets/{repo_id}@{branch}/{file_path}"

Polars is URL-encoding the slash in the repo name when it calls the HF API, which then rejects it as an invalid repo name. The fix is to download the file and store it locally, or make it available in some other location.
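
The symptom described above can be illustrated in isolation with the standard library. This sketch does not reproduce Polars's own code; it only shows what percent-encoding the slash does to a repository id, and builds a plain HTTPS URL that a download-and-store workaround might fetch. The repository id, branch, and file path are made-up placeholders, and the `resolve` URL layout is an assumption to verify against the Hugging Face documentation.

```python
from urllib.parse import quote

# Placeholder values, not taken from the notebooks.
repo_id = "some-user/some-dataset"
branch = "main"
file_path = "data/train.parquet"

# Encoding with no "safe" characters turns the slash into %2F,
# which no longer looks like an "owner/name" repository id.
encoded = quote(repo_id, safe="")

# Keeping "/" safe leaves the repository id intact.
preserved = quote(repo_id, safe="/")


def https_url(repo_id: str, branch: str, file_path: str) -> str:
    """Build a direct-download URL (assumed layout) for a dataset file."""
    return (
        "https://huggingface.co/datasets/"
        f"{quote(repo_id, safe='/')}/resolve/"
        f"{quote(branch, safe='')}/{quote(file_path, safe='/')}"
    )


print(encoded)
print(preserved)
print(https_url(repo_id, branch, file_path))
```

A file fetched once from such a URL could then be stored next to the notebook and read from the local path instead of hf://.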

polars/07_querying_with_sql.py

Kagglehub requires Kaggle API credentials not available in the browser. Either remove the data-loading step or substitute a bundled sample dataset.

polars/14_user_defined_functions.py and polars/16_lazy_execution.py

Replace numba with a pure-Python alternative for the WASM version, or gate the numba cells with a WASM check and change prose accordingly:

import sys
if "pyodide" not in sys.modules:
    import numba
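
One way to extend the gate above so that decorated cells still run in the WASM version is a no-op stand-in for the decorator. This is a sketch under assumptions, not the notebooks' code: it assumes the cells only use `njit` as a decorator, and `sum_of_squares` is a hypothetical example function.

```python
import sys

# Same check as above: True when running under Pyodide (the WASM build).
IN_WASM = "pyodide" in sys.modules

try:
    if IN_WASM:
        raise ImportError("skipping numba under Pyodide")
    from numba import njit
except ImportError:

    def njit(*args, **kwargs):
        """No-op stand-in so decorated functions run as plain Python."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]  # used as a bare decorator: @njit
        return lambda func: func  # used with options: @njit(...)


@njit
def sum_of_squares(n):
    # Plain loop: compiled when numba is present, interpreted otherwise.
    total = 0
    for i in range(n):
        total += i * i
    return total
```

With this shape the prose would only need to say that the speed-up is absent in the browser, since the results are the same either way.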

@gvwilson gvwilson force-pushed the gvwilson/overhaul-for-relaunch branch from 20b1785 to c05ae44 Compare March 15, 2026 16:25
@gvwilson gvwilson marked this pull request as ready for review March 15, 2026 16:30
@gvwilson gvwilson force-pushed the gvwilson/overhaul-for-relaunch branch from 9bc73ca to 1291526 Compare March 17, 2026 12:56
@gvwilson gvwilson force-pushed the gvwilson/overhaul-for-relaunch branch from 1291526 to be8d543 Compare March 17, 2026 12:58
@gvwilson gvwilson force-pushed the gvwilson/overhaul-for-relaunch branch from be8d543 to 6c1216c Compare March 18, 2026 14:57
Contributor

@Haleshot Haleshot left a comment

@gvwilson, it's really nice to be reviewing a contribution of yours; left some comments and suggestions below; nits mostly (+ a recurring issue across the queueing series around the (unused) run button).

Haven't reviewed the SQL series yet since I still need to run the marimo_learn setup locally. Will follow up once I do. In the meantime, I came across this intro SQL notebook recently from the notebook.link folks (might be worth checking out).

Also some other libraries / things I wanted to mention: skrub has a nice TableReport that works well as a one-line data overview at the start of a SQL or EDA notebook.
lychee is worth adding as a GitHub Action; lot of URLs here (in this codebase) that'll drift over time (I set it up in another project and it helps track stale links & generates a report (based on the interval you set)). Happy to file a separate issue for this :)

The cross-filtering in the view composition nb is a nice addition!!

Reviewing PRs here reminded me of the work @etrotta and @peter-gy (and others) had done in this repo (some great high quality contribs)!! Peter's still doing great work on the marimo side too (the notebook gallery is shaping up well).

@app.cell(hide_code=True)
def _(mo):
    mo.md(r"""
    _Yes!_

When a cell below a chart lands on a conclusion -- "yes, " or something along those lines...a mo.callout makes that stand out in a (better) way imo. The callout docs show the available kind options ("info", "success", "warn", "danger").

On a similar note -- same goes for the attribution lines at the start of the notebooks... a mo.callout(mo.md("..."), kind="info") is something students will actually pause on.

@Haleshot Haleshot left a comment

Got the SQL notebooks running and went through all of them (nice coverage of concepts for learners to progress / follow through).

Re: the widgets (I know we talked about these (and well general visual appeal too) when we met, so I might be a little biased here but they genuinely work!): each one ties into the concept being taught aptly (with nice variety across topics). Kept me hooked and pretty intuitive to use throughout. I don't think you need more; any more would shift focus away from the actual SQL contents. I could brainstorm on other widget types that might fit, but the current set already does a good job.

One small thing I'd suggest across the series: a short summary or key takeaways section at the end of each notebook. I did something similar in the probability series and it helps students get a recap (glimpse / TL;DR / "Here's what we covered", etc.). Even a few bullet points in a mo.callout(kind="info") would do it.

Also, would it make sense to have a mo.callout somewhere visible in each notebook (or at least the first one) pointing to marimo's SQL support? The SQL guide covers mo.sql(), reactive queries via f-strings, and the rich dataframe output, and students can run marimo tutorial sql locally for an interactive walkthrough. The UI does offer niceties like sorting/filtering/aggregating directly on the dataframe display, but in a learning context you'd want students writing queries rather than clicking through the UI -- still imo, it's worth knowing those tools exist for exploration outside the exercises.

I recommend following the pattern of hiding the first 1-2 cell blocks before the main header tag stating the title of the notebook for all notebooks in this repo.

The concept map progression across the series is a neat detail (esp. with each notebook's diagram highlighting the concept which is being explored).

gvwilson and others added 17 commits April 7, 2026 07:38
Co-authored-by: Srihari Thyagarajan <57552973+Haleshot@users.noreply.github.com>
Labels

enhancement New feature or request

2 participants