Implement a small caching layer for compiled kernels, downloaded model weights, and any expensive preprocessing artifacts, keyed by configuration and model version, to reduce wall‑clock time for repeated runs.
Why
- Many workflows (especially templates and CI) repeat similar runs with only minor config changes.
- Recomputing or re-downloading heavy artifacts each time wastes GPU and wall-clock time.
- A lightweight cache improves iteration speed without changing any single-run semantics.
What to do
- Define a cache directory convention (e.g., ~/.cache/disco-diffusion or a configurable path) and a basic versioning scheme.
- Cacheable items could include:
  - Downloaded model weights / checkpoints.
  - Compiled CUDA kernels or other JIT artifacts, where applicable.
  - Precomputed embeddings or other expensive preprocessing outputs that are safe to reuse.
- Add hooks in the existing run path to:
  - Check for cached assets before recomputing or re-downloading.
  - Populate the cache on first computation.
- Ensure there is a simple way to invalidate or clear the cache when upgrading models or code.
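A minimal sketch of the key derivation and the get-or-compute hook described above. The cache directory, the `DISCO_CACHE_DIR` environment variable, and the pickle-on-disk layout are illustrative assumptions, not settled design; the real implementation would pick its own names and storage format.

```python
import hashlib
import json
import os
import pickle
from pathlib import Path

# Hypothetical default location; the env var name is an assumption for this sketch.
CACHE_DIR = Path(os.environ.get("DISCO_CACHE_DIR",
                                Path.home() / ".cache" / "disco-diffusion"))


def cache_key(config: dict, model_version: str) -> str:
    """Derive a stable key from the run configuration and model version."""
    payload = json.dumps({"config": config, "model_version": model_version},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def get_or_compute(config: dict, model_version: str, compute):
    """Return a cached artifact if present; otherwise compute it and populate the cache."""
    path = CACHE_DIR / f"{cache_key(config, model_version)}.pkl"
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    artifact = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(artifact, f)
    return artifact
```

Because the model version is part of the key, upgrading a checkpoint naturally misses the old entries instead of reusing stale ones.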
Acceptance criteria
- Repeated runs with the same (or compatible) configuration reuse cached artifacts and are measurably faster.
- Cache behavior is transparent to users for single runs; correctness is unchanged.
- There is a documented way to clear or disable the cache if needed (e.g., env var or CLI flag).