Fail fast when collector schedule YAML is missing or invalid in production #227

Merged
wpak-ai merged 4 commits into cppalliance:develop from leostar0412:feature/collector-schedule-strict-startup
May 21, 2026
@leostar0412 leostar0412 commented May 21, 2026

Summary

  • Strict schedule loading: When DEBUG=False or BOOST_COLLECTOR_SCHEDULE_STRICT=True, missing or invalid config/boost_collector_schedule.yaml raises ScheduleConfigurationError during settings import so Celery Beat cannot start with an empty CELERY_BEAT_SCHEDULE. Local dev with DEBUG=True and strict off still logs a warning and uses {}.
  • Startup diagnostics: boost_collector_runner logs a schedule summary at app ready (path, group/task counts, Beat entry keys) and exposes BOOST_COLLECTOR_SCHEDULE_STARTUP_OK for checks.
  • CLI and Celery: run_scheduled_collectors --strict validates the YAML before resolving tasks (useful in CI when DEBUG is True). Celery task forwards strict=True. --stop-on-failure now logs WARNING for each skipped collector with the failed predecessor and reason.
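
The strict versus lenient loading described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the project's code: `load_schedule` and its `parse` parameter are hypothetical names, only `ScheduleConfigurationError` comes from the PR, and the parser is injected so the sketch runs with the standard library alone (the real loader reads YAML).

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


class ScheduleConfigurationError(Exception):
    """Raised when the schedule file is missing or invalid in strict mode."""


def load_schedule(path: Path, *, strict: bool, parse: Callable[[str], object]) -> dict:
    """Return the parsed schedule mapping, or {} in lenient mode on failure.

    `parse` is a stand-in for the YAML parser (hypothetical parameter). A real
    YAML parser's own error type would also need to be caught below.
    """
    try:
        if not path.is_file():
            raise FileNotFoundError(f"schedule file not found: {path}")
        data = parse(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("schedule file must contain a mapping")
        return data
    except (OSError, ValueError) as exc:
        if strict:
            # Strict: abort so the scheduler never starts with no entries.
            raise ScheduleConfigurationError(str(exc)) from exc
        # Lenient: keep local development working, but say so in the log.
        logger.warning("Schedule not loaded (%s); using an empty schedule", exc)
        return {}
```

Raising from the loader, rather than returning `{}` and checking later, means the failure surfaces at import time, which is the point at which Beat would otherwise start with nothing to run.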

Motivation

Previously, a missing or broken schedule file could leave Beat running with no entries and no hard failure. Production and staging should fail at startup instead of silently scheduling nothing.

Configuration

| Mode | When |
| --- | --- |
| Strict (fail) | DEBUG=False (default in prod), or BOOST_COLLECTOR_SCHEDULE_STRICT=true |
| Lenient (warn + empty beat) | DEBUG=True and BOOST_COLLECTOR_SCHEDULE_STRICT unset/false |

Documented in .env.example, README.md, docs/Workflow.md, and boost_collector_runner/README.md.
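
The mode table above reduces to a single boolean expression. The sketch below assumes a hypothetical helper name (`resolve_strict_mode`) and a set of accepted truthy strings. Neither is taken from the PR, whose own helper is `is_schedule_strict()`.

```python
from typing import Optional

# Assumption: which strings count as "true" is not stated in the PR.
_TRUTHY = {"1", "true", "yes", "on"}


def resolve_strict_mode(debug: bool, env_value: Optional[str]) -> bool:
    """Strict when DEBUG is off, or when the env override is truthy."""
    env_strict = (env_value or "").strip().lower() in _TRUTHY
    return (not debug) or env_strict
```

In a settings module the second argument would typically come from `os.environ.get("BOOST_COLLECTOR_SCHEDULE_STRICT")`. Note that the override can only turn strict mode on: with `DEBUG=False` the result is strict regardless of the variable.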

Test plan

  • pytest boost_collector_runner/tests/test_schedule_config.py (strict / non-strict get_beat_schedule)
  • pytest boost_collector_runner/tests/test_commands.py (--strict on missing YAML)
  • pytest boost_collector_runner/tests/test_tasks.py (Celery passes --strict)
  • With valid committed YAML: python manage.py shell -c "from django.conf import settings; print(len(settings.CELERY_BEAT_SCHEDULE))" → > 0
  • Rename/move schedule YAML locally with DEBUG=False → Django startup raises ScheduleConfigurationError
  • python manage.py run_scheduled_collectors --schedule daily --strict with missing YAML → non-zero exit / clear error

Close #215

Summary by CodeRabbit

  • New Features

    • Added a --strict flag to validate the schedule YAML before resolving/running collectors.
    • New BOOST_COLLECTOR_SCHEDULE_STRICT env var to enforce strict schedule validation at startup (prevents Beat startup if missing/invalid).
    • Run behavior now logs skipped collectors (with failed predecessor and reason) when --stop-on-failure stops a run.
    • Scheduled-run task can be invoked with strict validation; startup now emits a brief loaded-schedule summary and startup-ok flag.
  • Documentation

    • Updated README(s) and docs to explain strict vs non-strict startup behavior, CLI flags and logging.
  • Tests

    • Added/updated tests covering strict-mode validation and stop-on-failure logging.

@leostar0412 leostar0412 self-assigned this May 21, 2026

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b111b6f0-6ea5-4c2f-9c57-a142cc48c6d3

📥 Commits

Reviewing files that changed from the base of the PR and between 3676d0b and d63fa60.

📒 Files selected for processing (12)
  • .env.example
  • README.md
  • boost_collector_runner/README.md
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_schedule_config.py
  • boost_collector_runner/tests/test_tasks.py
  • config/settings.py
  • docs/Workflow.md

📝 Walkthrough

Walkthrough

Adds strict-mode validation for the boost collector schedule YAML (env-controlled), startup health-check logging, a CLI --strict flag, enhanced stop-on-failure skip warnings, task API forwarding for --strict, and tests/docs covering these behaviors.

Changes

Schedule Strictness & Operational Safety

| Layer | File(s) | Summary |
| --- | --- | --- |
| Strict Mode Foundation: Settings, Error Class, and Strictness Logic | config/settings.py, boost_collector_runner/schedule_config.py | Introduces BOOST_COLLECTOR_SCHEDULE_STRICT, ScheduleConfigurationError, SCHEDULE_STARTUP_OK, is_schedule_strict(), and helpers (ensure_schedule_yaml_loaded, _load_schedule_yaml_data) and updates get_beat_schedule(strict=..., yaml_path=...). |
| Django Startup Validation & Schedule Summarization | boost_collector_runner/apps.py | BoostCollectorRunnerConfig.ready() runs once, loads/validates schedule, sets SCHEDULE_STARTUP_OK, logs beat/group/task counts on success, and writes BOOST_COLLECTOR_SCHEDULE_STARTUP_OK into settings. |
| CLI Command: Strict Mode, Validation, and Skip-on-Failure Logging | boost_collector_runner/management/commands/run_scheduled_collectors.py | Adds --strict to validate YAML before task resolution, adds _task_would_run() gating, iterates tasks with enumerate for index context, and WARNING-logs skipped eligible collectors with failed predecessor and reason when --stop-on-failure halts execution. |
| Celery Task Integration: Forward Strict Parameter | boost_collector_runner/tasks.py | run_scheduled_collectors_task() gains strict=False parameter and forwards --strict to the management command when enabled. |
| Comprehensive Testing: Strict Mode, YAML Validation, and Skip Logging | boost_collector_runner/tests/* | New/updated tests assert strict-mode errors for missing/invalid YAML, non-strict warns+returns {}, --stop-on-failure WARNING-logs skipped collectors with predecessor/reason, and task forwards --strict. |
| Documentation: Config, Setup, CLI, and Workflow Guidance | .env.example, README.md, boost_collector_runner/README.md, docs/Workflow.md | Document BOOST_COLLECTOR_SCHEDULE_STRICT, --strict, enhanced --stop-on-failure skip warnings, startup summary logging, and strict vs non-strict Celery Beat startup behavior. |
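
The "Celery Task Integration" row amounts to translating keyword arguments into CLI flags. A minimal sketch follows, with a hypothetical helper name (`build_collector_args`). The flag names `--schedule`, `--strict`, and `--stop-on-failure` are the ones mentioned in this PR, but how the real task assembles its arguments is an assumption here.

```python
from typing import List


def build_collector_args(
    schedule: str, *, strict: bool = False, stop_on_failure: bool = False
) -> List[str]:
    """Build CLI-style arguments for the run_scheduled_collectors command."""
    args = ["--schedule", schedule]
    if strict:
        args.append("--strict")
    if stop_on_failure:
        args.append("--stop-on-failure")
    return args


# In a Django project the result could be passed on as, for example:
#   call_command("run_scheduled_collectors", *build_collector_args("daily", strict=True))
```

Keeping `strict=False` as the default leaves existing callers of the task unaffected.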

Sequence Diagram(s)

sequenceDiagram
  participant CLI as run_scheduled_collectors (management command)
  participant Validator as ensure_schedule_yaml_loaded / schedule_config
  participant Resolver as get_beat_schedule
  participant Executor as task loop (enumerate)
  participant Logger as _log_skipped_after_stop_on_failure

  CLI->>Validator: --strict? call ensure_schedule_yaml_loaded()
  Validator-->>CLI: raises ScheduleConfigurationError (if strict & invalid) or returns
  CLI->>Resolver: get_beat_schedule(strict=..., yaml_path=...)
  Resolver-->>CLI: schedule entries
  CLI->>Executor: iterate tasks with index and _task_would_run gating
  Executor->>Executor: task fails (SystemExit/exception)
  Executor->>Logger: _log_skipped_after_stop_on_failure(failed_index, reason)
  Logger-->>Logger: WARNING logs for each skipped eligible collector with predecessor + reason
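
The task loop in the diagram can be sketched in plain Python. The function and logger names below are hypothetical, and the real command also applies `_task_would_run()` gating, which this sketch omits. What it shows is only the enumerate-based loop and the per-collector skip warnings.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger("collectors")

Task = Tuple[str, Callable[[], None]]


def run_collectors(tasks: Sequence[Task], *, stop_on_failure: bool) -> List[str]:
    """Run collectors in order and return the names of those that failed."""
    failed: List[str] = []
    for index, (name, run) in enumerate(tasks):
        try:
            run()
        except (Exception, SystemExit) as exc:
            failed.append(name)
            logger.error("Collector %s failed: %r", name, exc)
            if stop_on_failure:
                # Name every collector that will not run, and why.
                for skipped_name, _ in tasks[index + 1:]:
                    logger.warning(
                        "Skipping %s: predecessor %s failed (%r)",
                        skipped_name, name, exc,
                    )
                break
    return failed
```

`SystemExit` is listed explicitly because it does not derive from `Exception`, so a collector that exits that way would otherwise escape the handler.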

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • cppalliance/boost-data-collector#96: Introduces the YAML-driven scheduling system that this PR builds strict-mode validation and exception handling on top of.
  • cppalliance/boost-data-collector#222: Also modifies schedule loading/beat-building and adds tests/CI/health checks to ensure the schedule is non-empty; closely related to the strict startup behavior here.

Suggested reviewers

  • jonathanMLDev
  • wpak-ai

"🐰
I hopped through YAML fields at dawn,
strict checks awake where silence once yawned;
startup logs now shout the schedule's song,
skipped collectors warned — nothing hides long.
Hooray for clear skies and data reborn!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 56.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately describes the main change: strict schedule YAML validation in production prevents silent failures when configuration is missing or invalid. |
| Linked Issues check | ✅ Passed | The pull request comprehensively addresses all coding objectives from #215: strict schedule loading with ScheduleConfigurationError, per-collector skip logging, startup diagnostics, CLI --strict flag, and comprehensive test coverage. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with #215 requirements: strict mode validation, startup diagnostics, skip logging, CLI flags, and supporting documentation and tests. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.env.example:
- Around line 110-111: The example and comment are inconsistent: the comment
says to require config/boost_collector_schedule.yaml at startup but the example
sets BOOST_COLLECTOR_SCHEDULE_STRICT=false which disables strict mode; update
the .env.example so the example value matches the intent by changing
BOOST_COLLECTOR_SCHEDULE_STRICT to true (or alternatively adjust the comment to
state the variable is an optional override), and ensure you reference the
BOOST_COLLECTOR_SCHEDULE_STRICT example line so the example value and comment
are aligned.

In `@boost_collector_runner/apps.py`:
- Around line 59-64: The code currently only sets
settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK when the attribute is missing,
which can leave a stale/predefined value; in the ready() path replace the
guarded setattr with an unconditional assignment so the runtime-computed
sc.SCHEDULE_STARTUP_OK is always written to
settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK (i.e., remove the hasattr(...)
check and always call setattr or direct assignment inside the ready() method
where sc.SCHEDULE_STARTUP_OK is evaluated) so the startup health status reflects
the latest validation result.

In `@boost_collector_runner/README.md`:
- Line 47: Update the README entry for the flag `--stop-on-failure` to describe
the full logged behavior: state that when this flag is used the runner stops
after the first failing collector and that each subsequently skipped collector
is logged with both the identity of the failed predecessor and the reason for
skipping (matching the elsewhere-described skipped-collector log format); edit
the table row text for `--stop-on-failure` so it explicitly mentions "failed
predecessor and reason" rather than only "reason".

In `@boost_collector_runner/schedule_config.py`:
- Around line 102-126: Add an optional yaml_path parameter to
_load_schedule_yaml_data (e.g., def _load_schedule_yaml_data(*, strict: bool,
yaml_path: Optional[Path] = None)) so callers can pass a Path and avoid calling
_get_yaml_path(); inside the function use path = yaml_path if provided else
_get_yaml_path(), then proceed with the existing existence check, load_config
call and exception handling unchanged; update any call sites that run during
settings initialization to pass the known yaml_path and keep default behavior
identical when yaml_path is not supplied.
- Around line 460-476: get_beat_schedule currently always reads schedule YAML
via _load_schedule_yaml_data without ability to pass an explicit path; add a new
parameter yaml_path: str | None to get_beat_schedule signature and pass it
through to _load_schedule_yaml_data(..., yaml_path=yaml_path), preserving
existing strict handling via is_schedule_strict(strict); update any internal
calls (e.g., settings.py caller) to provide the yaml_path when needed and keep
default behavior when yaml_path is None.

In `@config/settings.py`:
- Around line 638-643: The call to get_beat_schedule() during settings
initialization triggers a circular dependency because it reads Django's settings
proxy; change the CELERY_BEAT_SCHEDULE assignment to call
get_beat_schedule(strict=BOOST_COLLECTOR_SCHEDULE_STRICT,
yaml_path=os.path.join(BASE_DIR, "path", "to", "schedule.yaml")) using the local
BASE_DIR and BOOST_COLLECTOR_SCHEDULE_STRICT variables instead of relying on
settings inside get_beat_schedule, and update
boost_collector_runner.schedule_config.get_beat_schedule(signature) to accept
and use the strict and yaml_path parameters (and raise
ScheduleConfigurationError on failures) so schedule loading no longer accesses
Django settings during import.
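
The last two findings share one idea: let the caller pass the path explicitly so that schedule loading never has to consult Django settings while settings are still being imported. A minimal sketch of that pattern follows. `resolve_yaml_path` and its `fallback` parameter are hypothetical names, not the project's API.

```python
from pathlib import Path
from typing import Callable, Optional, Union


def resolve_yaml_path(
    yaml_path: Optional[Union[str, Path]] = None,
    *,
    fallback: Callable[[], Path],
) -> Path:
    """Prefer the explicit path; call the fallback only when none was given.

    `fallback` stands in for a settings-based lookup (hypothetical parameter).
    Because it is only called lazily, passing `yaml_path` avoids touching
    settings during import.
    """
    if yaml_path is not None:
        return Path(yaml_path)
    return fallback()
```

Called from a settings module with a path built from the local `BASE_DIR`, the fallback is never invoked. Callers that omit `yaml_path` keep the previous behavior.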

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8bfcfb68-97d3-4d2c-8681-30ec7095bb43

📥 Commits

Reviewing files that changed from the base of the PR and between 70cc837 and 2ce8f47.

📒 Files selected for processing (12)
  • .env.example
  • README.md
  • boost_collector_runner/README.md
  • boost_collector_runner/apps.py
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/schedule_config.py
  • boost_collector_runner/tasks.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_schedule_config.py
  • boost_collector_runner/tests/test_tasks.py
  • config/settings.py
  • docs/Workflow.md

Comment thread .env.example Outdated
Comment thread boost_collector_runner/apps.py Outdated
Comment thread boost_collector_runner/README.md Outdated
Comment thread boost_collector_runner/schedule_config.py Outdated
Comment thread boost_collector_runner/schedule_config.py Outdated
Comment thread config/settings.py Outdated
@leostar0412 leostar0412 marked this pull request as ready for review May 21, 2026 19:14
Comment thread config/settings.py
@leostar0412 leostar0412 requested a review from wpak-ai May 21, 2026 20:23

wpak-ai commented May 21, 2026

@coderabbitai resume

coderabbitai Bot commented May 21, 2026

✅ Actions performed

Reviews resumed.


wpak-ai commented May 21, 2026

@coderabbitai full review

coderabbitai Bot commented May 21, 2026

✅ Actions performed

Full review triggered.


wpak-ai commented May 21, 2026

@coderabbitai review

coderabbitai Bot commented May 21, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@wpak-ai wpak-ai merged commit 2ddb9ee into cppalliance:develop May 21, 2026
5 checks passed
Development

Successfully merging this pull request may close these issues.

Silent Celery Beat Fallback Hides Missing Schedule Config

3 participants