Overview
This tracking issue coordinates the migration of 5 legacy Python constant modules from the old JSON-as-data approach to our modern spec + code generation system.
Background
Current (legacy) approach:
- Constants defined in JSON files under le_utils/resources/
- Python modules load JSON at runtime using pkgutil.get_data()
- Manual Python constants must be kept in sync with JSON files
- JavaScript code cannot use these constants (no JS export)
- Tests verify Python/JSON sync (which is the pain point)
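To make the pain point concrete, here is a minimal sketch of the kind of sync check the legacy approach needs. The JSON shape, the `find_sync_errors` helper, and the constants are illustrative assumptions, not the actual le_utils test code; the inline bytes stand in for whatever `pkgutil.get_data()` would return for a file under le_utils/resources/.

```python
import json

# Stand-in for the bytes pkgutil.get_data() would return for a legacy
# resource file. The real file contents are not reproduced here.
RAW_JSON = b'{"mp4": {"mimetype": "video/mp4"}, "pdf": {"mimetype": "application/pdf"}}'

# Hand-maintained Python constants that must mirror the JSON keys.
MP4 = "mp4"
PDF = "pdf"
choices = ((MP4, "Mp4"), (PDF, "Pdf"))


def find_sync_errors(raw_json, declared_choices):
    """Return the ids that appear on only one side of the Python/JSON pair."""
    json_ids = set(json.loads(raw_json))
    python_ids = {value for value, _label in declared_choices}
    return sorted(json_ids ^ python_ids)


# Both sides agree, so nothing is reported.
in_sync = find_sync_errors(RAW_JSON, choices)

# Simulate someone adding "webm" to the JSON but forgetting the Python module.
data = json.loads(RAW_JSON)
data["webm"] = {"mimetype": "video/webm"}
drifted = find_sync_errors(json.dumps(data).encode("utf-8"), choices)
```

With a single spec file feeding the generator, this class of check becomes unnecessary because there is no second copy to drift.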
Target (modern) approach:
- Constants defined once in JSON spec files under spec/
- Code generation script (generate_from_specs.py) creates both Python and JavaScript files
- Single source of truth eliminates sync issues
- Automatic JavaScript export enables frontend use
- Already used successfully for 8 modules (modalities, labels, schemas, etc.)
Modules to Migrate
- file_formats.py (FOUNDATION - must be done first)
- licenses.py (blocked by file_formats)
- content_kinds.py (blocked by file_formats, enhances generation for metadata/mappings)
- format_presets.py (blocked by file_formats and content_kinds)
- languages.py (blocked by file_formats)
Migration Strategy
file_formats is the FOUNDATION issue that:
- Enhances generate_from_specs.py to support namedtuple-based constants
- Establishes the spec format pattern for all other issues
- Includes helper function generation (getformat())
- Must be completed before the rest can proceed
content_kinds further enhances generation:
- Adds support for metadata-driven code generation (MAPPING dict)
- Must be completed before format_presets
licenses and languages (can be done in parallel after file_formats completes):
- Follow the pattern established in file_formats
- Create spec file using the namedtuple format
- Run generation to create Python/JS files
- Update tests to verify against spec
- Delete old JSON resource file
format_presets must wait for both file_formats and content_kinds to complete.
All 5 modules share a common structure (namedtuples, {MODULE}LIST, choices), with progressive enhancement of the generation script.
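The blocking relationships above can be written down as a small dependency graph. The sketch below is only an illustration of the ordering, using the standard library's `graphlib` (Python 3.9+); the `BLOCKED_BY` mapping and `migration_waves` helper are hypothetical names, not part of the repository.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules that must be migrated before it,
# taken from the "blocked by" notes in this issue.
BLOCKED_BY = {
    "file_formats": set(),
    "licenses": {"file_formats"},
    "content_kinds": {"file_formats"},
    "format_presets": {"file_formats", "content_kinds"},
    "languages": {"file_formats"},
}


def migration_waves(blocked_by):
    """Group modules into waves; everything in one wave can proceed in parallel."""
    sorter = TopologicalSorter(blocked_by)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = migration_waves(BLOCKED_BY)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

This yields three waves: file_formats alone, then content_kinds, languages, and licenses together, then format_presets.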
Spec File Format
All migrated modules will use this consistent JSON structure in their spec files:
{
  "namedtuple": {
    "name": "Format",
    "fields": ["id", "mimetype"]
  },
  "constants": {
    "mp4": {"mimetype": "video/mp4"},
    "webm": {"mimetype": "video/webm"},
    "pdf": {"mimetype": "application/pdf"}
  }
}
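A spec in this shape is easy to sanity-check before generation. The following sketch is an assumption about how such a check could look, not a description of what generate_from_specs.py does; in particular it assumes the first namedtuple field (here "id") is supplied by the key of each "constants" entry, as in the example above. `validate_spec` is a hypothetical helper.

```python
import json

SPEC_TEXT = """
{
  "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
  "constants": {
    "mp4": {"mimetype": "video/mp4"},
    "webm": {"mimetype": "video/webm"},
    "pdf": {"mimetype": "application/pdf"}
  }
}
"""


def validate_spec(spec):
    """Return a list of problems; an empty list means the spec is consistent."""
    problems = []
    fields = spec["namedtuple"]["fields"]
    # Assumption: the first field comes from the entry's key, so each entry
    # must provide exactly the remaining fields.
    expected = set(fields[1:])
    for key, values in spec["constants"].items():
        missing = expected - set(values)
        extra = set(values) - expected
        if missing:
            problems.append(f"{key}: missing {sorted(missing)}")
        if extra:
            problems.append(f"{key}: unexpected {sorted(extra)}")
    return problems


good_spec = json.loads(SPEC_TEXT)
good_problems = validate_spec(good_spec)

# Drop a required field from one entry to show what a failure looks like.
bad_spec = json.loads(SPEC_TEXT)
del bad_spec["constants"]["pdf"]["mimetype"]
bad_problems = validate_spec(bad_spec)
```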
The generation script will use this to create:
- Python namedtuple class: class Format(namedtuple("Format", ["id", "mimetype"])): pass
- Python LIST variable: FORMATLIST = [Format(id="mp4", mimetype="video/mp4"), ...]
- Python constants: MP4 = "mp4", WEBM = "webm", etc.
- Python choices tuple: choices = ((MP4, "Mp4"), (WEBM, "Webm"), ...)
- JavaScript exports: export default { MP4: "mp4", WEBM: "webm", ... }
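As a rough illustration of the outputs listed above, the sketch below renders Python and JavaScript source from the example spec. It is not the real generate_from_specs.py: `render_python` and `render_js` are hypothetical, and several details are assumptions made to match the examples in this issue (the LIST name is the uppercased namedtuple name plus "LIST", labels are the capitalized key, and the first field is filled from the key).

```python
import json

SPEC = {
    "namedtuple": {"name": "Format", "fields": ["id", "mimetype"]},
    "constants": {
        "mp4": {"mimetype": "video/mp4"},
        "webm": {"mimetype": "video/webm"},
        "pdf": {"mimetype": "application/pdf"},
    },
}


def render_python(spec):
    """Render a Python module with the namedtuple, constants, choices and LIST."""
    name = spec["namedtuple"]["name"]
    fields = spec["namedtuple"]["fields"]
    constants = spec["constants"]
    lines = [
        "from collections import namedtuple",
        "",
        "",
        f"class {name}(namedtuple({name!r}, {fields!r})):",
        "    pass",
        "",
        "",
    ]
    # Assumption: keys are valid identifiers once uppercased.
    lines += [f"{key.upper()} = {key!r}" for key in constants]
    lines += ["", "choices = ("]
    lines += [f"    ({key.upper()}, {key.capitalize()!r})," for key in constants]
    lines += [")", "", f"{name.upper()}LIST = ["]
    for key, values in constants.items():
        # Assumption: the first field is filled from the entry's key.
        row = {fields[0]: key, **values}
        args = ", ".join(f"{field}={row[field]!r}" for field in fields)
        lines.append(f"    {name}({args}),")
    lines.append("]")
    return "\n".join(lines) + "\n"


def render_js(spec):
    """Render a JavaScript module with a default export of the constants."""
    lines = ["export default {"]
    lines += [f"  {key.upper()}: {json.dumps(key)}," for key in spec["constants"]]
    lines.append("};")
    return "\n".join(lines) + "\n"


python_source = render_python(SPEC)
js_source = render_js(SPEC)

# Execute the generated Python to confirm it is importable as written.
generated = {}
exec(python_source, generated)
```

Executing the rendered module is a cheap smoke test: if `generated["FORMATLIST"]` holds `Format` tuples with the expected fields, the template and the spec agree.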
Each module will have different namedtuple fields appropriate to its data:
- file_formats: ["id", "mimetype"]
- licenses: ["id", "name", "exists", "url", "description", "custom", "copyright_holder_required"]
- content_kinds: ["id", "name"] (plus metadata for MAPPING generation)
- format_presets: ["id", "kind_id", "allowed_formats", ...] (10+ fields)
- languages: ["lang_code", "lang_subcode", "readable_name", ...] (complex structure)
Post-Migration Cleanup
After all 5 modules are migrated:
- Remove package_data={"le_utils": ["resources/*.json"]} from setup.py
- Delete le_utils/resources/ directory
- Update README.md to remove manual sync warnings
- Update CHANGELOG.md with migration notes
Benefits
✅ Single source of truth (spec files)
✅ JavaScript export for all constants
✅ Eliminates manual sync requirement
✅ Consistent with modern modules
✅ Better developer experience for contributors
Disclosure
🤖 This issue was written by Claude Code, under supervision, review and final edits by @rtibbles 🤖