Fail fast when collector schedule YAML is missing or invalid in production by leostar0412 · Pull Request #227 · cppalliance/boost-data-collector

leostar0412 · 2026-05-21T18:06:13Z

Summary

- Strict schedule loading: When DEBUG=False or BOOST_COLLECTOR_SCHEDULE_STRICT=True, a missing or invalid config/boost_collector_schedule.yaml raises ScheduleConfigurationError during settings import, so Celery Beat cannot start with an empty CELERY_BEAT_SCHEDULE. Local dev with DEBUG=True and strict off still logs a warning and uses {}.
- Startup diagnostics: boost_collector_runner logs a schedule summary when the app is ready (path, group/task counts, Beat entry keys) and exposes BOOST_COLLECTOR_SCHEDULE_STARTUP_OK for checks.
- CLI and Celery: run_scheduled_collectors --strict validates the YAML before resolving tasks (useful in CI when DEBUG is True). The Celery task accepts strict=True and forwards it as --strict. --stop-on-failure now logs a WARNING for each skipped collector, naming the failed predecessor and the reason.

Motivation

Previously, a missing or broken schedule file could leave Beat running with no entries and no hard failure. Production and staging should fail at startup instead of silently scheduling nothing.

Configuration

| Mode | When |
| --- | --- |
| Strict (fail) | `DEBUG=False` (default in prod), or `BOOST_COLLECTOR_SCHEDULE_STRICT=true` |
| Lenient (warn + empty beat) | `DEBUG=True` and `BOOST_COLLECTOR_SCHEDULE_STRICT` unset/false |
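
The mode table above can be read as a small decision function. The following is a minimal sketch, not the PR's actual `is_schedule_strict()`: the name `resolve_strict_mode`, its parameters, and the set of accepted truthy strings are all assumptions made for illustration.

```python
import os

# Assumption: which env strings count as "true" is not stated in the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_strict_mode(debug, env=None, override=None):
    """Return True when a missing or invalid schedule file should be fatal.

    Hypothetical helper mirroring the Strict/Lenient table.
    """
    if override is not None:
        # An explicit argument (for example from a --strict flag) wins.
        return bool(override)
    env = os.environ if env is None else env
    flag = env.get("BOOST_COLLECTOR_SCHEDULE_STRICT", "").strip().lower()
    # Strict when DEBUG is off, or when the env var opts in explicitly.
    return (not debug) or flag in _TRUTHY
```

Passing `env` as a plain dict keeps the function easy to test without touching the real process environment.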

Documented in .env.example, README.md, docs/Workflow.md, and boost_collector_runner/README.md.

Test plan

- pytest boost_collector_runner/tests/test_schedule_config.py (strict / non-strict get_beat_schedule)
- pytest boost_collector_runner/tests/test_commands.py (--strict on missing YAML)
- pytest boost_collector_runner/tests/test_tasks.py (Celery passes --strict)
- With valid committed YAML: python manage.py shell -c "from django.conf import settings; print(len(settings.CELERY_BEAT_SCHEDULE))" → > 0
- Rename/move schedule YAML locally with DEBUG=False → Django startup raises ScheduleConfigurationError
- python manage.py run_scheduled_collectors --schedule daily --strict with missing YAML → non-zero exit / clear error
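
The "Celery passes --strict" check in the test plan amounts to verifying how the task turns its keyword arguments into command-line flags. A minimal sketch of that forwarding follows; `build_command_args` is a hypothetical helper, and in the real task the resulting arguments would presumably be handed to Django's `call_command`.

```python
def build_command_args(schedule, strict=False, stop_on_failure=False):
    """Build the argument list for the run_scheduled_collectors command.

    Hypothetical helper: only the command name and flag names come from the PR.
    """
    args = ["run_scheduled_collectors", "--schedule", schedule]
    if strict:
        # Forward strict validation only when the caller asked for it.
        args.append("--strict")
    if stop_on_failure:
        args.append("--stop-on-failure")
    return args
```

Keeping the argument construction in a pure function lets a test assert on the list directly instead of mocking the command runner.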

Close #215

Summary by CodeRabbit

New Features
- Added a --strict flag to validate the schedule YAML before resolving/running collectors.
- New BOOST_COLLECTOR_SCHEDULE_STRICT env var to enforce strict schedule validation at startup (prevents Beat startup if missing/invalid).
- Run behavior now logs skipped collectors (with failed predecessor and reason) when --stop-on-failure stops a run.
- Scheduled-run task can be invoked with strict validation; startup now emits a brief loaded-schedule summary and startup-ok flag.
Documentation
- Updated README(s) and docs to explain strict vs non-strict startup behavior, CLI flags and logging.
Tests
- Added/updated tests covering strict-mode validation and stop-on-failure logging.

coderabbitai · 2026-05-21T18:06:20Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b111b6f0-6ea5-4c2f-9c57-a142cc48c6d3

📥 Commits

Reviewing files that changed from the base of the PR and between 3676d0b and d63fa60.

📒 Files selected for processing (12)

.env.example
README.md
boost_collector_runner/README.md
boost_collector_runner/apps.py
boost_collector_runner/management/commands/run_scheduled_collectors.py
boost_collector_runner/schedule_config.py
boost_collector_runner/tasks.py
boost_collector_runner/tests/test_commands.py
boost_collector_runner/tests/test_schedule_config.py
boost_collector_runner/tests/test_tasks.py
config/settings.py
docs/Workflow.md

📝 Walkthrough

Walkthrough

Adds strict-mode validation for the boost collector schedule YAML (env-controlled), startup health-check logging, a CLI --strict flag, enhanced stop-on-failure skip warnings, task API forwarding for --strict, and tests/docs covering these behaviors.

Changes

Schedule Strictness & Operational Safety

| Layer / File(s) | Summary |
| --- | --- |
| Strict Mode Foundation: Settings, Error Class, and Strictness Logic `config/settings.py`, `boost_collector_runner/schedule_config.py` | Introduces `BOOST_COLLECTOR_SCHEDULE_STRICT`, `ScheduleConfigurationError`, `SCHEDULE_STARTUP_OK`, `is_schedule_strict()`, and helpers (`ensure_schedule_yaml_loaded`, `_load_schedule_yaml_data`) and updates `get_beat_schedule(strict=..., yaml_path=...)`. |
| Django Startup Validation & Schedule Summarization `boost_collector_runner/apps.py` | `BoostCollectorRunnerConfig.ready()` runs once, loads/validates schedule, sets `SCHEDULE_STARTUP_OK`, logs beat/group/task counts on success, and writes `BOOST_COLLECTOR_SCHEDULE_STARTUP_OK` into settings. |
| CLI Command: Strict Mode, Validation, and Skip-on-Failure Logging `boost_collector_runner/management/commands/run_scheduled_collectors.py` | Adds `--strict` to validate YAML before task resolution, adds `_task_would_run()` gating, iterates tasks with `enumerate` for index context, and WARNING-logs skipped eligible collectors with failed predecessor and reason when `--stop-on-failure` halts execution. |
| Celery Task Integration: Forward Strict Parameter `boost_collector_runner/tasks.py` | `run_scheduled_collectors_task()` gains `strict=False` parameter and forwards `--strict` to the management command when enabled. |
| Comprehensive Testing: Strict Mode, YAML Validation, and Skip Logging `boost_collector_runner/tests/*` | New/updated tests assert strict-mode errors for missing/invalid YAML, non-strict warns+returns `{}`, `--stop-on-failure` WARNING-logs skipped collectors with predecessor/reason, and task forwards `--strict`. |
| Documentation: Config, Setup, CLI, and Workflow Guidance `.env.example`, `README.md`, `boost_collector_runner/README.md`, `docs/Workflow.md` | Document `BOOST_COLLECTOR_SCHEDULE_STRICT`, `--strict`, enhanced `--stop-on-failure` skip warnings, startup summary logging, and strict vs non-strict Celery Beat startup behavior. |
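
The strict versus non-strict loading behavior summarized in the first row can be sketched as below. This is an illustration under stated assumptions, not the code in `schedule_config.py`: `load_schedule` and its `parse` parameter are hypothetical, and `json.loads` stands in for a YAML parser so the sketch needs only the standard library. The `ScheduleConfigurationError` class here is a local stand-in with the same name as the PR's exception.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


class ScheduleConfigurationError(Exception):
    """Raised when the schedule file is missing or invalid in strict mode."""


def load_schedule(yaml_path, strict, parse=json.loads):
    """Load a schedule mapping; fail hard in strict mode, else warn and return {}.

    Assumption: `parse` raises ValueError on bad input (json.loads does).
    A YAML parser's own error type would need adding to the except clause.
    """
    path = Path(yaml_path)
    try:
        if not path.is_file():
            raise FileNotFoundError(f"schedule file not found: {path}")
        data = parse(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("top-level schedule must be a mapping")
        return data
    except (OSError, ValueError) as exc:
        if strict:
            # Strict mode: surface the problem so startup stops here.
            raise ScheduleConfigurationError(str(exc)) from exc
        # Lenient mode: warn and fall back to an empty schedule.
        logger.warning("Schedule not loaded (%s); using empty schedule", exc)
        return {}
```

Taking the path as an explicit argument, rather than reading it from Django settings, is what allows a function like this to be called while settings are still being imported.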

Sequence Diagram(s)

sequenceDiagram
  participant CLI as run_scheduled_collectors (management command)
  participant Validator as ensure_schedule_yaml_loaded / schedule_config
  participant Resolver as get_beat_schedule
  participant Executor as task loop (enumerate)
  participant Logger as _log_skipped_after_stop_on_failure

  CLI->>Validator: --strict? call ensure_schedule_yaml_loaded()
  Validator-->>CLI: raises ScheduleConfigurationError (if strict & invalid) or returns
  CLI->>Resolver: get_beat_schedule(strict=..., yaml_path=...)
  Resolver-->>CLI: schedule entries
  CLI->>Executor: iterate tasks with index and _task_would_run gating
  Executor->>Executor: task fails (SystemExit/exception)
  Executor->>Logger: _log_skipped_after_stop_on_failure(failed_index, reason)
  Logger-->>Logger: WARNING logs for each skipped eligible collector with predecessor + reason
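
The executor and logger steps in the diagram can be sketched as a plain loop. This is a simplified illustration, not the management command itself: `run_collectors`, the `(name, callable)` pair representation, and the log message wording are assumptions, and the `_task_would_run` gating is omitted.

```python
import logging

logger = logging.getLogger("collector_sketch")


def run_collectors(collectors, stop_on_failure=True):
    """Run (name, callable) pairs in order; return the names that completed."""
    completed = []
    for index, (name, func) in enumerate(collectors):
        try:
            func()
        except (Exception, SystemExit) as exc:
            logger.error("Collector %s failed: %s", name, exc)
            if stop_on_failure:
                # Name every collector that will not run, and say why.
                for skipped_name, _ in collectors[index + 1:]:
                    logger.warning(
                        "Skipping %s: predecessor %s failed (%s)",
                        skipped_name, name, exc,
                    )
                break
            continue
        completed.append(name)
    return completed
```

Using `enumerate` gives the index of the failed collector, which is what makes it possible to slice out and report the remaining ones.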

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

cppalliance/boost-data-collector#96: Introduces the YAML-driven scheduling system that this PR builds strict-mode validation and exception handling on top of.
cppalliance/boost-data-collector#222: Also modifies schedule loading/beat-building and adds tests/CI/health checks to ensure the schedule is non-empty; closely related to the strict startup behavior here.

Suggested reviewers

jonathanMLDev
wpak-ai

"🐰
I hopped through YAML fields at dawn,
strict checks awake where silence once yawned;
startup logs now shout the schedule's song,
skipped collectors warned — nothing hides long.
Hooray for clear skies and data reborn!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 56.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately describes the main change: strict schedule YAML validation in production prevents silent failures when configuration is missing or invalid. |
| Linked Issues check | ✅ Passed | The pull request comprehensively addresses all coding objectives from `#215`: strict schedule loading with ScheduleConfigurationError, per-collector skip logging, startup diagnostics, CLI --strict flag, and comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with `#215` requirements: strict mode validation, startup diagnostics, skip logging, CLI flags, and supporting documentation and tests. |

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.example:
- Around line 110-111: The example and comment are inconsistent: the comment
says to require config/boost_collector_schedule.yaml at startup but the example
sets BOOST_COLLECTOR_SCHEDULE_STRICT=false which disables strict mode; update
the .env.example so the example value matches the intent by changing
BOOST_COLLECTOR_SCHEDULE_STRICT to true (or alternatively adjust the comment to
state the variable is an optional override), and ensure you reference the
BOOST_COLLECTOR_SCHEDULE_STRICT example line so the example value and comment
are aligned.

In `@boost_collector_runner/apps.py`:
- Around line 59-64: The code currently only sets
settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK when the attribute is missing,
which can leave a stale/predefined value; in the ready() path replace the
guarded setattr with an unconditional assignment so the runtime-computed
sc.SCHEDULE_STARTUP_OK is always written to
settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK (i.e., remove the hasattr(...)
check and always call setattr or direct assignment inside the ready() method
where sc.SCHEDULE_STARTUP_OK is evaluated) so the startup health status reflects
the latest validation result.

In `@boost_collector_runner/README.md`:
- Line 47: Update the README entry for the flag `--stop-on-failure` to describe
the full logged behavior: state that when this flag is used the runner stops
after the first failing collector and that each subsequently skipped collector
is logged with both the identity of the failed predecessor and the reason for
skipping (matching the elsewhere-described skipped-collector log format); edit
the table row text for `--stop-on-failure` so it explicitly mentions "failed
predecessor and reason" rather than only "reason".

In `@boost_collector_runner/schedule_config.py`:
- Around line 102-126: Add an optional yaml_path parameter to
_load_schedule_yaml_data (e.g., def _load_schedule_yaml_data(*, strict: bool,
yaml_path: Optional[Path] = None)) so callers can pass a Path and avoid calling
_get_yaml_path(); inside the function use path = yaml_path if provided else
_get_yaml_path(), then proceed with the existing existence check, load_config
call and exception handling unchanged; update any call sites that run during
settings initialization to pass the known yaml_path and keep default behavior
identical when yaml_path is not supplied.
- Around line 460-476: get_beat_schedule currently always reads schedule YAML
via _load_schedule_yaml_data without ability to pass an explicit path; add a new
parameter yaml_path: str | None to get_beat_schedule signature and pass it
through to _load_schedule_yaml_data(..., yaml_path=yaml_path), preserving
existing strict handling via is_schedule_strict(strict); update any internal
calls (e.g., settings.py caller) to provide the yaml_path when needed and keep
default behavior when yaml_path is None.

In `@config/settings.py`:
- Around line 638-643: The call to get_beat_schedule() during settings
initialization triggers a circular dependency because it reads Django's settings
proxy; change the CELERY_BEAT_SCHEDULE assignment to call
get_beat_schedule(strict=BOOST_COLLECTOR_SCHEDULE_STRICT,
yaml_path=os.path.join(BASE_DIR, "path", "to", "schedule.yaml")) using the local
BASE_DIR and BOOST_COLLECTOR_SCHEDULE_STRICT variables instead of relying on
settings inside get_beat_schedule, and update the signature of
boost_collector_runner.schedule_config.get_beat_schedule to accept
and use the strict and yaml_path parameters (and raise
ScheduleConfigurationError on failures) so schedule loading no longer accesses
Django settings during import.
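
The `apps.py` finding above (a guarded `setattr` can leave a stale value in place) is easy to see in isolation. The sketch below uses a `SimpleNamespace` as a stand-in for the Django settings object; both function names are hypothetical and exist only to contrast the two behaviors.

```python
from types import SimpleNamespace


def publish_startup_status_guarded(settings, startup_ok):
    """Guarded variant: writes only when the attribute is missing."""
    if not hasattr(settings, "BOOST_COLLECTOR_SCHEDULE_STARTUP_OK"):
        settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK = startup_ok
    return settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK


def publish_startup_status(settings, startup_ok):
    """Unconditional variant: always reflects the latest validation result."""
    settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK = startup_ok
    return settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK


# Stand-in settings object that already carries a predefined value.
stale = SimpleNamespace(BOOST_COLLECTOR_SCHEDULE_STARTUP_OK=True)
guarded_result = publish_startup_status_guarded(stale, False)
fresh_result = publish_startup_status(stale, False)
```

With a predefined `True` on the settings object, the guarded variant keeps reporting `True` after a failed validation, while the unconditional variant reports `False`.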

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8bfcfb68-97d3-4d2c-8681-30ec7095bb43

📥 Commits

Reviewing files that changed from the base of the PR and between 70cc837 and 2ce8f47.

wpak-ai · 2026-05-21T21:04:11Z

@coderabbitai resume

coderabbitai · 2026-05-21T21:04:17Z

✅ Actions performed

Reviews resumed.

wpak-ai · 2026-05-21T21:04:54Z

@coderabbitai full review

coderabbitai · 2026-05-21T21:05:01Z

✅ Actions performed

Full review triggered.

wpak-ai · 2026-05-21T21:17:41Z

@coderabbitai review

coderabbitai · 2026-05-21T21:17:45Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

feat(boost_collector_runner): implement strict mode for schedule YAML validation

2ce8f47

leostar0412 self-assigned this May 21, 2026

coderabbitai Bot reviewed May 21, 2026

chore: update .env.example and README for clarity; refactor settings for strict schedule validation

e2764bc

leostar0412 marked this pull request as ready for review May 21, 2026 19:14

leostar0412 requested review from jonathanMLDev, snowfox1003 and wpak-ai as code owners May 21, 2026 19:14

leostar0412 removed request for snowfox1003 and wpak-ai May 21, 2026 19:14

jonathanMLDev approved these changes May 21, 2026

Comment thread config/settings.py

leostar0412 added 2 commits May 21, 2026 13:14

Merge remote-tracking branch 'origin/develop' into feature/collector-schedule-strict-startup

5872589

refactor(settings): enhance strict mode handling for YAML schedule loading

d63fa60

leostar0412 requested a review from wpak-ai May 21, 2026 20:23

wpak-ai approved these changes May 21, 2026

wpak-ai merged commit 2ddb9ee into cppalliance:develop May 21, 2026
5 checks passed

coderabbitai Bot mentioned this pull request May 21, 2026

feat: production Docker hardening, /health/ readiness, and JSON logging #228

Merged
