Fail fast when collector schedule YAML is missing or invalid in production#227
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (12)
📝 WalkthroughWalkthroughAdds strict-mode validation for the boost collector schedule YAML (env-controlled), startup health-check logging, a CLI ChangesSchedule Strictness & Operational Safety
Sequence Diagram(s)sequenceDiagram
participant CLI as run_scheduled_collectors (management command)
participant Validator as ensure_schedule_yaml_loaded / schedule_config
participant Resolver as get_beat_schedule
participant Executor as task loop (enumerate)
participant Logger as _log_skipped_after_stop_on_failure
CLI->>Validator: --strict? call ensure_schedule_yaml_loaded()
Validator-->>CLI: raises ScheduleConfigurationError (if strict & invalid) or returns
CLI->>Resolver: get_beat_schedule(strict=..., yaml_path=...)
Resolver-->>CLI: schedule entries
CLI->>Executor: iterate tasks with index and _task_would_run gating
Executor->>Executor: task fails (SystemExit/exception)
Executor->>Logger: _log_skipped_after_stop_on_failure(failed_index, reason)
Logger-->>Logger: WARNING logs for each skipped eligible collector with predecessor + reason
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.env.example:
- Around line 110-111: The example and comment are inconsistent: the comment
says to require config/boost_collector_schedule.yaml at startup but the example
sets BOOST_COLLECTOR_SCHEDULE_STRICT=false which disables strict mode; update
the .env.example so the example value matches the intent by changing
BOOST_COLLECTOR_SCHEDULE_STRICT to true (or alternatively adjust the comment to
state the variable is an optional override), and ensure you reference the
BOOST_COLLECTOR_SCHEDULE_STRICT example line so the example value and comment
are aligned.
In `@boost_collector_runner/apps.py`:
- Around line 59-64: The code currently only sets
settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK when the attribute is missing,
which can leave a stale/predefined value; in the ready() path replace the
guarded setattr with an unconditional assignment so the runtime-computed
sc.SCHEDULE_STARTUP_OK is always written to
settings.BOOST_COLLECTOR_SCHEDULE_STARTUP_OK (i.e., remove the hasattr(...)
check and always call setattr or direct assignment inside the ready() method
where sc.SCHEDULE_STARTUP_OK is evaluated) so the startup health status reflects
the latest validation result.
In `@boost_collector_runner/README.md`:
- Line 47: Update the README entry for the flag `--stop-on-failure` to describe
the full logged behavior: state that when this flag is used the runner stops
after the first failing collector and that each subsequently skipped collector
is logged with both the identity of the failed predecessor and the reason for
skipping (matching the elsewhere-described skipped-collector log format); edit
the table row text for `--stop-on-failure` so it explicitly mentions "failed
predecessor and reason" rather than only "reason".
In `@boost_collector_runner/schedule_config.py`:
- Around line 102-126: Add an optional yaml_path parameter to
_load_schedule_yaml_data (e.g., def _load_schedule_yaml_data(*, strict: bool,
yaml_path: Optional[Path] = None)) so callers can pass a Path and avoid calling
_get_yaml_path(); inside the function use path = yaml_path if provided else
_get_yaml_path(), then proceed with the existing existence check, load_config
call and exception handling unchanged; update any call sites that run during
settings initialization to pass the known yaml_path and keep default behavior
identical when yaml_path is not supplied.
- Around line 460-476: get_beat_schedule currently always reads schedule YAML
via _load_schedule_yaml_data without ability to pass an explicit path; add a new
parameter yaml_path: str | None to get_beat_schedule signature and pass it
through to _load_schedule_yaml_data(..., yaml_path=yaml_path), preserving
existing strict handling via is_schedule_strict(strict); update any internal
calls (e.g., settings.py caller) to provide the yaml_path when needed and keep
default behavior when yaml_path is None.
In `@config/settings.py`:
- Around line 638-643: The call to get_beat_schedule() during settings
initialization triggers a circular dependency because it reads Django's settings
proxy; change the CELERY_BEAT_SCHEDULE assignment to call
get_beat_schedule(strict=BOOST_COLLECTOR_SCHEDULE_STRICT,
yaml_path=os.path.join(BASE_DIR, "path", "to", "schedule.yaml")) using the local
BASE_DIR and BOOST_COLLECTOR_SCHEDULE_STRICT variables instead of relying on
settings inside get_beat_schedule, and update
boost_collector_runner.schedule_config.get_beat_schedule(signature) to accept
and use the strict and yaml_path parameters (and raise
ScheduleConfigurationError on failures) so schedule loading no longer accesses
Django settings during import.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8bfcfb68-97d3-4d2c-8681-30ec7095bb43
📒 Files selected for processing (12)
.env.exampleREADME.mdboost_collector_runner/README.mdboost_collector_runner/apps.pyboost_collector_runner/management/commands/run_scheduled_collectors.pyboost_collector_runner/schedule_config.pyboost_collector_runner/tasks.pyboost_collector_runner/tests/test_commands.pyboost_collector_runner/tests/test_schedule_config.pyboost_collector_runner/tests/test_tasks.pyconfig/settings.pydocs/Workflow.md
…for strict schedule validation
|
@coderabbitai resume |
✅ Actions performedReviews resumed. |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Summary
DEBUG=FalseorBOOST_COLLECTOR_SCHEDULE_STRICT=True, missing or invalidconfig/boost_collector_schedule.yamlraisesScheduleConfigurationErrorduring settings import so Celery Beat cannot start with an emptyCELERY_BEAT_SCHEDULE. Local dev withDEBUG=Trueand strict off still logs a warning and uses{}.boost_collector_runnerlogs a schedule summary at app ready (path, group/task counts, Beat entry keys) and exposesBOOST_COLLECTOR_SCHEDULE_STARTUP_OKfor checks.run_scheduled_collectors --strictvalidates the YAML before resolving tasks (useful in CI whenDEBUGis True). Celery task forwardsstrict=True.--stop-on-failurenow logs WARNING for each skipped collector with the failed predecessor and reason.Motivation
Previously, a missing or broken schedule file could leave Beat running with no entries and no hard failure. Production and staging should fail at startup instead of silently scheduling nothing.
Configuration
DEBUG=False(default in prod), orBOOST_COLLECTOR_SCHEDULE_STRICT=trueDEBUG=TrueandBOOST_COLLECTOR_SCHEDULE_STRICTunset/falseDocumented in
.env.example,README.md,docs/Workflow.md, andboost_collector_runner/README.md.Test plan
pytest boost_collector_runner/tests/test_schedule_config.py(strict / non-strictget_beat_schedule)pytest boost_collector_runner/tests/test_commands.py(--stricton missing YAML)pytest boost_collector_runner/tests/test_tasks.py(Celery passes--strict)python manage.py shell -c "from django.conf import settings; print(len(settings.CELERY_BEAT_SCHEDULE))"→ > 0DEBUG=False→ Django startup raisesScheduleConfigurationErrorpython manage.py run_scheduled_collectors --schedule daily --strictwith missing YAML → non-zero exit / clear errorClose #215
Summary by CodeRabbit
New Features
Documentation
Tests