feat: Add task state migration from MongoDB to PostgreSQL #128
Draft
Implement phased migration strategy for task states:
- Phase 0: Add feature flag TASK_STATE_STORAGE_PHASE
- Add TaskStateORM model and Alembic migration
- Create TaskStatePostgresRepository for PostgreSQL storage
- Create TaskStateDualRepository for phased rollout

Migration phases supported:
- mongodb: Legacy behavior (MongoDB only)
- dual_write: Write to both, read from MongoDB
- dual_read: Write to both, read from both with verification
- postgres: PostgreSQL only (target state)

Includes:
- Datadog StatsD metrics for dual_read verification
- Backfill script for existing MongoDB data
- Verification script for data consistency checks
- Unit tests for all repository operations and metrics
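For reviewers, here is a minimal sketch of how the four phases could map to read and write targets, and how a single dual_read comparison could be mapped to one of the metric names listed under Metrics. It is not the code in TaskStateDualRepository. The names Phase, write_targets, read_sources and classify_dual_read are hypothetical, and treating "missing from both stores" as a match is an assumption.

```python
from enum import Enum
from typing import Optional, Tuple


class Phase(str, Enum):
    """Values accepted by TASK_STATE_STORAGE_PHASE (from the commit message)."""
    MONGODB = "mongodb"
    DUAL_WRITE = "dual_write"
    DUAL_READ = "dual_read"
    POSTGRES = "postgres"


def write_targets(phase: Phase) -> Tuple[str, ...]:
    """Stores that receive writes in the given phase."""
    if phase is Phase.MONGODB:
        return ("mongodb",)
    if phase is Phase.POSTGRES:
        return ("postgres",)
    # dual_write and dual_read both write to both stores
    return ("mongodb", "postgres")


def read_sources(phase: Phase) -> Tuple[str, ...]:
    """Stores that are queried on reads in the given phase."""
    if phase in (Phase.MONGODB, Phase.DUAL_WRITE):
        return ("mongodb",)
    if phase is Phase.DUAL_READ:
        return ("mongodb", "postgres")
    return ("postgres",)


def classify_dual_read(mongo_state: Optional[dict],
                       pg_state: Optional[dict]) -> str:
    """Return the metric name for one dual_read comparison (hypothetical helper)."""
    if mongo_state is None and pg_state is None:
        # Assumption: a task absent from both stores counts as a match.
        return "task_state.dual_read.match"
    if pg_state is None:
        return "task_state.dual_read.mismatch.missing_postgres"
    if mongo_state is None:
        return "task_state.dual_read.mismatch.missing_mongodb"
    if mongo_state != pg_state:
        return "task_state.dual_read.mismatch.state_content"
    return "task_state.dual_read.match"
```

The sketch leaves out which store's value is returned to the caller during dual_read, because the commit message does not say.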
- Add benchmark scripts for comparing MongoDB vs PostgreSQL performance:
  - benchmark_task_state.py: Repository-level benchmarks
  - benchmark_api.py: API-level benchmarks with connection pooling
  - compare_results.py: Generate markdown comparison reports
  - locustfile.py: Cluster load tests with Locust
- Add storage_backend query parameter to dynamically switch backends:
  - Enables benchmarking without server restarts
  - Valid values: mongodb, dual_write, dual_read, postgres
- Fix FastAPI dependency injection in TaskStateDualRepository:
  - Remove 'from __future__ import annotations' which broke DI resolution
  - Use List from typing to avoid 'list' method name shadowing
- Update authorization_shortcuts to use DTaskStateDualRepository

Benchmark results show MongoDB ~20-30% faster than PostgreSQL at API level.
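This sketch shows one way a per-request storage_backend value could override the configured phase. It assumes the query parameter takes precedence over TASK_STATE_STORAGE_PHASE and that an unset environment variable falls back to mongodb; neither is confirmed by the commit message. resolve_storage_phase is a hypothetical helper, written without FastAPI so it stays self-contained.

```python
import os
from typing import Mapping, Optional

# Valid values as listed in the commit message.
VALID_BACKENDS = ("mongodb", "dual_write", "dual_read", "postgres")


def resolve_storage_phase(storage_backend: Optional[str] = None,
                          env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the effective phase for one request (hypothetical helper).

    storage_backend: raw value of the query parameter, or None if absent.
    env: environment mapping; defaults to os.environ.
    """
    env = os.environ if env is None else env
    if storage_backend is not None:
        # Assumption: an explicit query parameter wins over the env var.
        if storage_backend not in VALID_BACKENDS:
            raise ValueError(f"invalid storage_backend: {storage_backend!r}")
        return storage_backend
    # Assumption: the legacy phase is the default when the flag is unset.
    phase = env.get("TASK_STATE_STORAGE_PHASE", "mongodb")
    if phase not in VALID_BACKENDS:
        raise ValueError(f"invalid TASK_STATE_STORAGE_PHASE: {phase!r}")
    return phase
```

With this shape, a benchmark run can request postgres on a server configured for mongodb without a restart, and invalid values are rejected before any repository is chosen.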
Summary
Implements a phased migration strategy for task state storage from MongoDB to PostgreSQL with zero-downtime rollout capability.
Key changes:
- TASK_STATE_STORAGE_PHASE feature flag to control migration phases
- task_states PostgreSQL table with JSONB state column

Migration Phases
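- mongodb: Legacy behavior (MongoDB only)
- dual_write: Write to both, read from MongoDB
- dual_read: Write to both, read from both with verification
- postgres: PostgreSQL only (target state)

Files Changed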
Core Implementation
- src/adapters/orm.py - Added TaskStateORM model
- src/config/environment_variables.py - Added TASK_STATE_STORAGE_PHASE env var
- src/domain/repositories/task_state_postgres_repository.py - PostgreSQL repository (new)
- src/domain/repositories/task_state_dual_repository.py - Dual-write wrapper (new)
- src/domain/use_cases/states_use_case.py - Updated to use dual repository

Database Migration
- database/migrations/alembic/versions/2026_01_12_0000_add_task_states_table_*.py - Alembic migration

Scripts
- scripts/backfill_task_states.py - Backfill existing MongoDB data to PostgreSQL
- scripts/verify_task_states.py - Verify data consistency between databases

Tests
- tests/unit/repositories/test_task_state_postgres_repository.py - 2 tests
- tests/unit/repositories/test_task_state_dual_repository.py - 35 tests

Metrics (dual_read phase)
- task_state.dual_read.match
- task_state.dual_read.mismatch.missing_postgres
- task_state.dual_read.mismatch.missing_mongodb
- task_state.dual_read.mismatch.state_content
- task_state.dual_read.list_count_mismatch

Rollout Plan
- TASK_STATE_STORAGE_PHASE=mongodb (no behavior change)
- python scripts/backfill_task_states.py
- TASK_STATE_STORAGE_PHASE=dual_write
- TASK_STATE_STORAGE_PHASE=dual_read, monitor metrics
- TASK_STATE_STORAGE_PHASE=postgres

Rollback
Set TASK_STATE_STORAGE_PHASE back to the previous phase at any time.

Test plan