Skip to content

feat: production Docker hardening, /health/ readiness, and JSON logging#228

Merged
wpak-ai merged 8 commits into
cppalliance:developfrom
leostar0412:feat/docker-health-and-observability
May 22, 2026
Merged

feat: production Docker hardening, /health/ readiness, and JSON logging#228
wpak-ai merged 8 commits into
cppalliance:developfrom
leostar0412:feat/docker-health-and-observability

Conversation

@leostar0412
Copy link
Copy Markdown
Collaborator

@leostar0412 leostar0412 commented May 21, 2026

Summary

Hardens the VM Docker Compose production path and adds readiness/observability for scheduled collectors.

  • Docker: python:3.13-slim, non-root appuser, image HEALTHCHECKGET /health/, Gunicorn gthread via docker/gunicorn.conf.py; new docker-compose.prod.yml (resource limits, LOG_FORMAT=json, restart: always). Dev compose keeps root entrypoint for volume chown; Celery worker/beat disable the image HTTP healthcheck.
  • Readiness: GET /health/ checks PostgreSQL, Celery worker ping (configurable min count/timeout), and per–daily-group collector freshness from YAML schedule + CollectorGroupRunStatus. Optional bearer token; HEALTH_ENFORCE_COLLECTOR_FRESHNESS (default true; CI/smoke sets false until groups have run once).
  • Collector run tracking: CollectorGroupRunStatus model, boost_collector_runner/services.py, migration; run_scheduled_collectors --group records success/failure.
  • Logging: LOG_FORMAT=json uses CloudLoggingJsonFormatter (severity field) for GCP-friendly structured logs.
  • Deps: Pinecone SDK 6.x (pinecone>=6.0,<7); locks recompiled for Python 3.13.
  • Docs: docs/GCP_Production_Checklist.md, Deployment updates, generated docs/service_api/boost_collector_runner.md.

Deploy notes

  • Prod: docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d, then migrate; verify curl http://127.0.0.1:8000/health/.
  • After first successful daily collector runs, keep HEALTH_ENFORCE_COLLECTOR_FRESHNESS=true in production .env.
  • If upgrading from Pinecone 5.x locally, remove deprecated pinecone-plugin-inference from the venv after uv pip install -r requirements.lock.

Test plan

  • uv run pytest config/tests/test_health.py boost_collector_runner/tests/test_services.py -v
  • python scripts/generate_service_docs.py --check
  • docker compose -f docker-compose.yml -f docker-compose.ci.yml build && docker compose up -d && migrate && make health

Closes: #211

Summary by CodeRabbit

  • New Features

    • GET /health/ readiness endpoint with DB, Celery, collector-freshness and external-sync checks; optional Bearer token.
    • Persisted collector-group run statuses and service APIs to record/query outcomes.
    • Configurable JSON/cloud logging and externalized Gunicorn runtime config.
  • Documentation

    • Deployment and production checklist updated; added service API and health/readiness guidance.
  • Chores

    • Container/runtime and compose updates, non-root runtime user, Python 3.13 image bump, expanded ignore patterns, health make target adjusted, CI env tweak.
  • Tests

    • New/expanded tests for health endpoint, logging formatter, collector services, and command behavior.

Review Change Stack

@leostar0412 leostar0412 requested a review from jonathanMLDev May 21, 2026 20:27
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9492ed93-3346-4935-92d7-c62ac8340b88

📥 Commits

Reviewing files that changed from the base of the PR and between 2676cb0 and eebb4f5.

📒 Files selected for processing (2)
  • config/health.py
  • config/tests/test_health.py

📝 Walkthrough

Walkthrough

Implements a /health/ readiness endpoint, persists collector-group run outcomes, hardens Docker image/runtime and compose overlays with healthchecks, adds Cloud Logging JSON formatting, updates settings/CI/docs, and includes tests covering health and logging behavior.

Changes

Production Deployment Infrastructure

Layer / File(s) Summary
Collector group run status service
boost_collector_runner/models.py, boost_collector_runner/migrations/0001_initial.py, boost_collector_runner/services.py, boost_collector_runner/tests/test_services.py, boost_collector_runner/management/commands/run_scheduled_collectors.py, boost_collector_runner/tests/test_commands.py
CollectorGroupRunStatus model persists last run timestamps and exit codes per collector group; services.py provides record_group_success, record_group_failure, get_group_status, and list_group_statuses; the scheduled collectors management command records outcomes via the service layer; tests verify service and command behavior.
Health check endpoint and readiness logic
config/health.py, config/urls.py, config/tests/test_health.py
Implements /health/ endpoint that reports database connectivity (latency), Celery worker responsiveness, collector-group freshness from YAML schedules, and Pinecone sync status; supports optional Bearer-token gating and returns 200 when healthy or 503 when readiness fails. Tests cover healthy/unhealthy states, auth, and stable error payloads.
Health-related settings and CI env
config/settings.py, config/test_settings.py, .env.example, .github/workflows/actions.yml
Adds LOG_FORMAT selection and health settings (HEALTH_CHECK_TOKEN, HEALTH_CELERY_MIN_WORKERS, HEALTH_CELERY_INSPECT_TIMEOUT, HEALTH_COLLECTOR_STALE_HOURS, HEALTH_ENFORCE_COLLECTOR_FRESHNESS); test settings force freshness enforcement; .env.example documents placeholders; CI smoke job writes HEALTH_ENFORCE_COLLECTOR_FRESHNESS=false.
JSON logging for Cloud Logging
config/logging_formatters.py, config/tests/test_logging_json.py
Adds CloudLoggingJsonFormatter that emits severity and timestamped JSON; settings pick formatter via LOG_FORMAT; tests validate severity and message presence.
Docker image hardening and runtime config
Dockerfile, docker-entrypoint.sh, docker/gunicorn.conf.py
Upgrades base to python:3.13-slim, installs curl for healthchecks, creates appuser with uid/gid 10001 and switches to non-root USER, adds Docker HEALTHCHECK probing /health/, gates chown/user switch in entrypoint, and externalizes Gunicorn config via docker/gunicorn.conf.py.
Docker Compose production overlay and updates
docker-compose.yml, docker-compose.prod.yml
Adds docker-compose.prod.yml with resource limits, restart policies, JSON logging, and Celery task-recycling config; updates docker-compose.yml to allow gated root entrypoint and replace image HTTP healthchecks for Celery with process-based checks.
Makefile and dockerignore
Makefile, .dockerignore
make health now curls the web container /health/ (optional Bearer token) and checks Redis; .dockerignore expanded to exclude additional dev/test artifacts and compose variants.
Documentation, requirements, and contributing
docs/Deployment.md, docs/GCP_Production_Checklist.md, docs/service_api/*, README.md, CONTRIBUTING.md, requirements.in, requirements-dev.in
Adds production deployment guidance and GCP checklist, documents boost_collector_runner.services, updates README/CONTRIBUTING, bumps Pinecone to 6.x and updates example pip-compile Python version to 3.13.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant HealthEndpoint as /health/ endpoint
  participant HealthChecks as run_health_checks()
  participant Database
  participant Celery
  participant CollectorServices
  participant PineconeSyncStatus
  
  Client->>HealthEndpoint: GET /health/ (Bearer token optional)
  HealthEndpoint->>HealthChecks: check authentication, run checks
  HealthChecks->>Database: verify connectivity, measure latency
  Database-->>HealthChecks: ok/error result
  
  alt Database OK
    HealthChecks->>Celery: inspect().ping() for active workers
    Celery-->>HealthChecks: worker list and count
    
    HealthChecks->>CollectorServices: list_group_statuses()
    CollectorServices->>Database: fetch CollectorGroupRunStatus rows
    Database-->>CollectorServices: group staleness data
    
    HealthChecks->>PineconeSyncStatus: query sync timestamps
    PineconeSyncStatus-->>HealthChecks: app-type sync times
  end
  
  HealthChecks-->>HealthEndpoint: aggregated status payload
  HealthEndpoint-->>Client: 200 (healthy) or 503 (unhealthy) + JSON
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • jonathanMLDev
  • snowfox1003
  • wpak-ai

Poem

🐰 I hopped through code and left a trail,
Non-root boots and curl to tell the tale.
Health checks hum, collectors mark the score,
JSON logs twinkle, readiness at the door.
Carrots and tests — deployment, encore!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 21.05% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the three main changes: Docker hardening, /health/ readiness endpoint, and JSON logging.
Linked Issues check ✅ Passed The PR successfully implements all acceptance criteria from issue #211: hardened Dockerfile with non-root user and pinned packages, /health/ endpoint with collector/Celery/database checks, JSON logging via CloudLoggingJsonFormatter, docker-compose.prod.yml with resource limits, and verification documentation.
Out of Scope Changes check ✅ Passed All changes are within scope of #211 requirements. Docker upgrades (3.13), collector run tracking, health monitoring, logging infrastructure, and documentation changes all directly support production hardening and observability objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@leostar0412 leostar0412 requested review from snowfox1003 and wpak-ai and removed request for snowfox1003 and wpak-ai May 21, 2026 20:27
@leostar0412 leostar0412 self-assigned this May 21, 2026
… custom health checks in the main service configuration.
…ce health check logic in health.py. Added error handling for collector group status checks and updated test settings for collector freshness enforcement. Improved test coverage for health checks under failure conditions.
@leostar0412
Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
docker-compose.yml (1)

78-87: 💤 Low value

Healthcheck may fail spuriously if any process exits during scan.

The one-liner iterates all /proc/[0-9]*/cmdline paths and opens each file. If any process exits between glob() and open(), a FileNotFoundError propagates and fails the entire check—even if Celery is running fine.

With 5 retries this is unlikely to cause real issues, but a try/except would make it deterministic:

♻️ More robust alternative
 healthcheck:
   test:
     [
       "CMD-SHELL",
-      'python -c "import glob,sys; sys.exit(0 if any(b\"celery\" in (d:=open(p,\"rb\").read()) and b\"worker\" in d for p in glob.glob(\"/proc/[0-9]*/cmdline\")) else 1)"',
+      'python -c "import glob,sys; found=False
+for p in glob.glob(\"/proc/[0-9]*/cmdline\"):
+  try:
+    d=open(p,\"rb\").read()
+    if b\"celery\" in d and b\"worker\" in d: found=True; break
+  except: pass
+sys.exit(0 if found else 1)"',
     ]

Alternatively, use celery inspect ping for a more semantic check (though it requires broker connectivity).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docker-compose.yml` around lines 78 - 87, The healthcheck one-liner that
scans "/proc/[0-9]*/cmdline" can raise FileNotFoundError if a process exits
between glob() and open(); update the CMD-SHELL python snippet (the string under
healthcheck.test) to catch and ignore FileNotFoundError (and optionally
PermissionError) when opening/reading each cmdline so the scan continues rather
than aborting, preserving the original logic that returns 0 if any cmdline
contains both b"celery" and b"worker"; alternatively, replace the whole test
with a broker-aware check such as running "celery inspect ping" if you prefer a
semantic check.
config/tests/test_health.py (1)

97-97: ⚡ Quick win

Add trailing newline at end of file.

Python files should end with a newline character per PEP 8 convention. Most linters will flag this.

📝 Proposed fix
     assert payload["checks"]["collector_groups"] == {}
+
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/tests/test_health.py` at line 97, The file config/tests/test_health.py
is missing a trailing newline at EOF; open that test file (look near the
assertion line containing assert payload["checks"]["collector_groups"] == {}),
add a single newline character at the end of the file so it ends with a newline
(saving the file will ensure linters/PEP8 are satisfied).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@boost_collector_runner/management/commands/run_scheduled_collectors.py`:
- Around line 221-225: The current logic records a group success whenever
group_id is set and exit_code == 0, which falsely records success when all tasks
were skipped; modify the code around where collectors are executed to track
whether any collector actually ran (e.g., introduce a boolean like any_executed
or an integer executed_count updated inside the collector execution loop), then
change the final block that calls
collector_services.record_group_success(group_id) to require both exit_code == 0
and any_executed (or executed_count > 0); keep the failure path as-is
(record_group_failure with exit_code) when collectors ran or if you still want
failures recorded for attempted runs.

In `@config/health.py`:
- Around line 27-29: Replace all uses of returning str(exc) from the exception
handlers in config/health.py with stable, non-sensitive error responses (e.g.
return {"ok": False, "error": "internal_error"} or a specific error
code/message), and ensure the full exception is recorded via structured logging
(logger.exception(...) or logger.error(..., exc_info=True)) inside the
corresponding except Exception as exc blocks (the three occurrences shown by the
diff and the other blocks at the referenced ranges). Locate the except Exception
as exc blocks in config/health.py and: 1) log the detailed exception with
logger.exception or logger.error(..., exc_info=True); 2) return a generic error
code/message payload to the caller instead of str(exc).
- Around line 64-67: The try/except around load_config currently swallows
parsing/service errors by returning an empty set (causing any_stale to become
False); instead surface failures so readiness fails: in the except block for
load_config (and the similar blocks at the other locations), catch Exception as
e, log the error, and return None (or re-raise) rather than return set() so
callers can detect a None result and mark freshness/readiness as failed; update
callers that check the return to treat None as an error state (e.g., any_stale =
None -> unhealthy).

In `@config/tests/test_health.py`:
- Around line 47-53: The Celery mock in the test uses _check_celery_workers but
returns inconsistent data (workers: [] with responded: 1); update the mock
return so it's realistic and consistent—for example set "responded": 1 and
include a corresponding single worker entry in the "workers" list (or
alternatively set "responded": 0 if keeping an empty list). Locate the mock of
_check_celery_workers in the test and adjust the returned dict so "workers" and
"responded" match.

In `@docs/service_api/boost_collector_runner.md`:
- Line 25: Update the broken markdown link "[Contributing]" that currently
points to "../Contributing.md" so it uses the correct relative path and casing;
replace the target with "../../CONTRIBUTING.md" so the link resolves on
case-sensitive filesystems (change the link target only, leave the link text
"[Contributing]" unchanged).

In `@Makefile`:
- Line 100: The health-check curl invocation in the Makefile should include an
Authorization header when HEALTH_CHECK_TOKEN is set; update the recipe that
contains the $(COMPOSE) exec -T $(APP) sh -c 'curl ...' line to conditionally
add a header like -H "Authorization: Bearer $(HEALTH_CHECK_TOKEN)" when
HEALTH_CHECK_TOKEN is non-empty (e.g. create an AUTH_HDR variable that expands
to the header or empty string and insert it into the curl command), ensuring
quoting/escaping works inside the sh -c command so the token is passed into the
container.

---

Nitpick comments:
In `@config/tests/test_health.py`:
- Line 97: The file config/tests/test_health.py is missing a trailing newline at
EOF; open that test file (look near the assertion line containing assert
payload["checks"]["collector_groups"] == {}), add a single newline character at
the end of the file so it ends with a newline (saving the file will ensure
linters/PEP8 are satisfied).

In `@docker-compose.yml`:
- Around line 78-87: The healthcheck one-liner that scans "/proc/[0-9]*/cmdline"
can raise FileNotFoundError if a process exits between glob() and open(); update
the CMD-SHELL python snippet (the string under healthcheck.test) to catch and
ignore FileNotFoundError (and optionally PermissionError) when opening/reading
each cmdline so the scan continues rather than aborting, preserving the original
logic that returns 0 if any cmdline contains both b"celery" and b"worker";
alternatively, replace the whole test with a broker-aware check such as running
"celery inspect ping" if you prefer a semantic check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ba58817-f2e2-40c2-a494-c3017e1e3bd6

📥 Commits

Reviewing files that changed from the base of the PR and between 3676d0b and cda8945.

⛔ Files ignored due to path filters (2)
  • requirements-dev.lock is excluded by !**/*.lock
  • requirements.lock is excluded by !**/*.lock
📒 Files selected for processing (30)
  • .dockerignore
  • .env.example
  • .github/workflows/actions.yml
  • CONTRIBUTING.md
  • Dockerfile
  • Makefile
  • README.md
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/0001_initial.py
  • boost_collector_runner/models.py
  • boost_collector_runner/services.py
  • boost_collector_runner/tests/test_services.py
  • config/health.py
  • config/logging_formatters.py
  • config/settings.py
  • config/test_settings.py
  • config/tests/__init__.py
  • config/tests/test_health.py
  • config/tests/test_logging_json.py
  • config/urls.py
  • docker-compose.prod.yml
  • docker-compose.yml
  • docker-entrypoint.sh
  • docker/gunicorn.conf.py
  • docs/Deployment.md
  • docs/GCP_Production_Checklist.md
  • docs/service_api/README.md
  • docs/service_api/boost_collector_runner.md
  • requirements-dev.in
  • requirements.in

Comment thread boost_collector_runner/management/commands/run_scheduled_collectors.py Outdated
Comment thread config/health.py Outdated
Comment thread config/health.py Outdated
Comment thread config/tests/test_health.py
Comment thread docs/service_api/boost_collector_runner.md Outdated
Comment thread Makefile Outdated
leostar0412 and others added 2 commits May 21, 2026 14:29
…rization. Update health check error handling to provide stable error messages. Improve logging for database and celery worker checks. Add tests to ensure correct behavior when tasks are skipped and handle exceptions gracefully.
Resolve test_commands.py conflict: keep develop stop-on-failure caplog
assertions and --strict YAML test; keep feature on_release skip test.

Co-authored-by: Cursor <[email protected]>
@leostar0412
Copy link
Copy Markdown
Collaborator Author

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
config/tests/test_health.py (1)

69-73: ⚡ Quick win

Add an explicit invalid-token assertion for the bearer auth path.

This test currently validates missing token + valid token, but not wrong token. Add one assertion to lock down the unauthorized branch.

Suggested patch
 `@override_settings`(HEALTH_CHECK_TOKEN="secret-token")
 def test_health_view_requires_bearer_when_token_set(api_client):
     response = api_client.get("/health/")
     assert response.status_code == 401
+    response = api_client.get("/health/", HTTP_AUTHORIZATION="Bearer wrong-token")
+    assert response.status_code == 401
     response = api_client.get("/health/", HTTP_AUTHORIZATION="Bearer secret-token")
     assert response.status_code in (200, 503)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/tests/test_health.py` around lines 69 - 73, In
test_health_view_requires_bearer_when_token_set, add an explicit check for an
invalid bearer token by calling api_client.get("/health/",
HTTP_AUTHORIZATION="Bearer wrong-token") and asserting the response.status_code
== 401 so the unauthorized branch is covered; update the function
(test_health_view_requires_bearer_when_token_set) to include this extra
assertion alongside the existing missing-token and valid-token checks.
config/settings.py (1)

579-605: ⚡ Quick win

Validate LOG_FORMAT values explicitly.

Unknown values currently downgrade silently to text logging. Failing fast (or warning loudly) avoids accidental loss of structured logs in production.

Proposed fix
+from django.core.exceptions import ImproperlyConfigured
+
 # Logging format: text (default) or json (GCP Cloud Logging on stdout)
 LOG_FORMAT = (env("LOG_FORMAT", default="text") or "text").strip().lower()
+if LOG_FORMAT not in {"text", "json"}:
+    raise ImproperlyConfigured(
+        f"Invalid LOG_FORMAT={LOG_FORMAT!r}. Expected 'text' or 'json'."
+    )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/settings.py` around lines 579 - 605, LOG_FORMAT is currently
normalized but not validated, so unknown values silently fall back to text;
update the block around LOG_FORMAT, _LOG_FORMATTERS and _CONSOLE_FORMATTER to
validate allowed values and fail fast: define an allowed set (e.g.
{"text","json","verbose","simple"}), normalize LOG_FORMAT as already done, then
if LOG_FORMAT not in the allowed set raise a ValueError (or log a clear error
and raise) naming the invalid value and listing allowed values; keep the
existing behavior for valid values: when LOG_FORMAT == "json" add the
"cloud_json" entry to _LOG_FORMATTERS and set _CONSOLE_FORMATTER accordingly,
and map "text" to the same console formatter as "verbose" if you want to
preserve current mapping.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/health.py`:
- Around line 36-37: The numeric environment parsing for
HEALTH_CELERY_MIN_WORKERS and HEALTH_CELERY_INSPECT_TIMEOUT (variables
min_workers and timeout) is unguarded and can raise on invalid input; wrap the
int/float casts for min_workers and timeout (and any other numeric casts around
lines 87-88) in a try/except catching ValueError/TypeError, fall back to the
existing default values, and log a warning so the /health/ handler returns a
structured unhealthy response instead of bubbling an exception. Ensure you
update the same code paths that perform numeric casts on lines ~87-88 as well so
all numeric settings are protected.

In `@Dockerfile`:
- Around line 13-18: The Dockerfile installs apt packages without pinned
versions (libpq5, git, curl, gosu) which risks unreproducible builds; update the
RUN line to pin each package to a specific version (or use package=version
syntax and/or apt-get install with the distro's known version variables), and
add a comment explaining the chosen versions; reference the Dockerfile RUN step
that installs libpq5, git, curl, and gosu and ensure the pinned versions are
tested to exist in the base image's apt sources before committing.
- Around line 39-40: The HEALTHCHECK command should include an Authorization
bearer header when HEALTH_CHECK_TOKEN is present so unauthenticated probes don't
return 401; modify the Dockerfile HEALTHCHECK (the line starting with
HEALTHCHECK ... CMD curl ...) to conditionally add -H "Authorization: Bearer
$HEALTH_CHECK_TOKEN" (or equivalent) only when the env var is set, falling back
to the current curl command when it's empty, and apply the same conditional
header logic to the compose override healthcheck so both probes behave
identically.

---

Nitpick comments:
In `@config/settings.py`:
- Around line 579-605: LOG_FORMAT is currently normalized but not validated, so
unknown values silently fall back to text; update the block around LOG_FORMAT,
_LOG_FORMATTERS and _CONSOLE_FORMATTER to validate allowed values and fail fast:
define an allowed set (e.g. {"text","json","verbose","simple"}), normalize
LOG_FORMAT as already done, then if LOG_FORMAT not in the allowed set raise a
ValueError (or log a clear error and raise) naming the invalid value and listing
allowed values; keep the existing behavior for valid values: when LOG_FORMAT ==
"json" add the "cloud_json" entry to _LOG_FORMATTERS and set _CONSOLE_FORMATTER
accordingly, and map "text" to the same console formatter as "verbose" if you
want to preserve current mapping.

In `@config/tests/test_health.py`:
- Around line 69-73: In test_health_view_requires_bearer_when_token_set, add an
explicit check for an invalid bearer token by calling api_client.get("/health/",
HTTP_AUTHORIZATION="Bearer wrong-token") and asserting the response.status_code
== 401 so the unauthorized branch is covered; update the function
(test_health_view_requires_bearer_when_token_set) to include this extra
assertion alongside the existing missing-token and valid-token checks.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1ba6eb73-debb-44cb-90fc-4df64a24cfb1

📥 Commits

Reviewing files that changed from the base of the PR and between 2ddb9ee and 83e30d9.

⛔ Files ignored due to path filters (2)
  • requirements-dev.lock is excluded by !**/*.lock
  • requirements.lock is excluded by !**/*.lock
📒 Files selected for processing (31)
  • .dockerignore
  • .env.example
  • .github/workflows/actions.yml
  • CONTRIBUTING.md
  • Dockerfile
  • Makefile
  • README.md
  • boost_collector_runner/management/commands/run_scheduled_collectors.py
  • boost_collector_runner/migrations/0001_initial.py
  • boost_collector_runner/models.py
  • boost_collector_runner/services.py
  • boost_collector_runner/tests/test_commands.py
  • boost_collector_runner/tests/test_services.py
  • config/health.py
  • config/logging_formatters.py
  • config/settings.py
  • config/test_settings.py
  • config/tests/__init__.py
  • config/tests/test_health.py
  • config/tests/test_logging_json.py
  • config/urls.py
  • docker-compose.prod.yml
  • docker-compose.yml
  • docker-entrypoint.sh
  • docker/gunicorn.conf.py
  • docs/Deployment.md
  • docs/GCP_Production_Checklist.md
  • docs/service_api/README.md
  • docs/service_api/boost_collector_runner.md
  • requirements-dev.in
  • requirements.in

Comment thread config/health.py Outdated
Comment thread Dockerfile
Comment thread Dockerfile Outdated
…rization in Docker configuration. Update health check error handling in health.py to manage invalid settings gracefully. Add tests for invalid configuration scenarios in test_health.py to ensure stability and proper error reporting.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker-compose.prod.yml`:
- Around line 20-24: The Dockerfile HEALTHCHECK uses escaped Compose-style
variables ($${HEALTH_CHECK_TOKEN:-} and $${HEALTH_CHECK_TOKEN}) which get
transformed into $$ and literal braces when evaluated by the container shell;
update the HEALTHCHECK command in the Dockerfile (the HEALTHCHECK/CMD sh -c
block) to use shell variable expansion (${HEALTH_CHECK_TOKEN:-} and
${HEALTH_CHECK_TOKEN}) so the container runtime sees the actual token value and
the Authorization header is constructed correctly at runtime; keep
docker-compose.prod.yml unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9d9aac59-6dcb-4018-a9fb-4a5abdf3da6e

📥 Commits

Reviewing files that changed from the base of the PR and between 83e30d9 and bf8683d.

📒 Files selected for processing (4)
  • Dockerfile
  • config/health.py
  • config/tests/test_health.py
  • docker-compose.prod.yml

Comment thread docker-compose.prod.yml
@leostar0412 leostar0412 marked this pull request as ready for review May 21, 2026 23:20
Comment thread config/health.py
… such as stale status, freshness enforcement, and error reporting.
@leostar0412 leostar0412 requested a review from wpak-ai May 22, 2026 15:44
@wpak-ai wpak-ai merged commit 88b1490 into cppalliance:develop May 22, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Production Deployment Support: Docker Image Hardening + Monitoring

3 participants