feat: recursive schema type system with cross-file model resolution #2788
tempusfrangit wants to merge 10 commits into main
The tree-sitter schema parser rejected unparameterized dict and list
return types with 'unsupported type'. This broke the resnet example
and any predictor returning -> dict or -> list.
- dict/Dict map to TypeAny (JSON Schema: {"type": "object"})
- list/List map to OutputList with TypeAny (JSON Schema: {"type": "array", "items": {"type": "object"}})
This matches the old Python schema gen behavior exactly:
dict -> Dict[str, Any] -> {"type": "object"}
list -> List[Any] -> {"type": "array", "items": {"type": "object"}}
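The mapping above can be sketched as a small lookup. This is a hedged illustration only: `bareContainerSchema` and `render` are hypothetical names, not functions from the PR, and the schema is modeled as a plain `map[string]any` rather than the parser's real types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bareContainerSchema returns the JSON Schema for an unparameterized
// container annotation, following the mapping in the commit message.
// Hypothetical helper: the real parser works on tree-sitter nodes.
func bareContainerSchema(name string) (map[string]any, bool) {
	switch name {
	case "dict", "Dict":
		// Bare dict is treated as an opaque JSON object.
		return map[string]any{"type": "object"}, true
	case "list", "List":
		// Bare list is an array whose items are opaque objects.
		return map[string]any{
			"type":  "array",
			"items": map[string]any{"type": "object"},
		}, true
	}
	return nil, false
}

// render marshals the schema to compact JSON. encoding/json sorts map
// keys, so the output is deterministic.
func render(name string) string {
	schema, ok := bareContainerSchema(name)
	if !ok {
		return "unsupported type: " + name
	}
	b, _ := json.Marshal(schema)
	return string(b)
}

func main() {
	for _, name := range []string{"dict", "List", "set"} {
		fmt.Println(name, "->", render(name))
	}
}
```

Anything outside the four spellings falls through to the "unsupported type" path, which is the behavior the old parser had for bare `dict` and `list` as well.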
…nsive tests

Replace the flat OutputType system with a recursive SchemaType algebraic data type that supports arbitrary nesting (dict[str, list[dict[str, int]]], etc.).

Key changes:
- SchemaType ADT with 7 kinds: Primitive, Any, Array, Dict, Object, Iterator, ConcatIterator
- ResolveSchemaType: recursive resolver replacing ResolveOutputType
- Cross-file model resolution: imports from local .py files are found on disk, parsed with tree-sitter, and BaseModel subclasses extracted automatically
- Handles all local import permutations: relative, dotted, subpackage, aliased
- Clear error messages for unresolvable types (includes import source and guidance)
- Remove legacy OutputType, OutputKind, ObjectField, ResolveOutputType
- Thread sourceDir through Parser -> ParsePredictor for filesystem access
- Rewrite architecture/02-schema.md for the static Go parser

Tests: 93 unit tests (12 recursive nesting, 5 unresolvable errors, 3 pydantic compat, 11 cross-file resolution, 1 end-to-end schema gen) + 1 integration test
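A minimal sketch of how a recursive schema type can render nested containers. It covers only four of the seven kinds, and the field names, constructors, and the use of `additionalProperties` for dict values are assumptions for illustration, not the PR's actual definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Kind tags a node in the recursive type tree (subset of the PR's 7 kinds).
type Kind int

const (
	KindPrimitive Kind = iota
	KindAny
	KindArray
	KindDict
)

// SchemaType is a recursive node: containers point at their element type.
// Field names here are hypothetical.
type SchemaType struct {
	Kind Kind
	Name string      // JSON Schema type name when Kind == KindPrimitive
	Elem *SchemaType // item type for arrays, value type for dicts
}

func Primitive(name string) SchemaType { return SchemaType{Kind: KindPrimitive, Name: name} }
func Array(elem SchemaType) SchemaType { return SchemaType{Kind: KindArray, Elem: &elem} }
func Dict(value SchemaType) SchemaType { return SchemaType{Kind: KindDict, Elem: &value} }

// JSONSchema walks the tree and emits one schema object per node.
func (s SchemaType) JSONSchema() map[string]any {
	switch s.Kind {
	case KindPrimitive:
		return map[string]any{"type": s.Name}
	case KindArray:
		return map[string]any{"type": "array", "items": s.Elem.JSONSchema()}
	case KindDict:
		// Assumption: dict values are expressed via additionalProperties.
		return map[string]any{"type": "object", "additionalProperties": s.Elem.JSONSchema()}
	default:
		// Any: opaque object, matching the bare-dict mapping.
		return map[string]any{"type": "object"}
	}
}

// render marshals a schema to compact JSON with sorted keys.
func render(s SchemaType) string {
	b, _ := json.Marshal(s.JSONSchema())
	return string(b)
}

func main() {
	// dict[str, list[dict[str, int]]]
	nested := Dict(Array(Dict(Primitive("integer"))))
	fmt.Println(render(nested))
}
```

Because each container holds another `SchemaType`, nesting depth is limited only by the input annotation, which is the property the flat OutputType representation lacked.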
Pull request overview
This PR refactors Cog’s static schema generation to use a recursive SchemaType algebraic data type for outputs, adds cross-file BaseModel resolution when outputs are imported from local modules, and improves type-resolution error messages.
Changes:
- Replace the legacy flat output type representation with a recursive SchemaType plus JSONSchema() generation.
- Update the Python tree-sitter parser and generator plumbing to support cross-file model resolution via a sourceDir parameter.
- Add/expand unit and integration tests for nested output types, cross-file imports, and new error messages.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| pkg/schema/types.go | Removes legacy OutputType and switches PredictorInfo.Output to SchemaType. |
| pkg/schema/schema_type.go | Introduces SchemaType, recursive JSON Schema generation, and ResolveSchemaType. |
| pkg/schema/python/parser.go | Adds sourceDir support and cross-file BaseModel discovery via resolveExternalModels. |
| pkg/schema/python/parser_test.go | Updates parser API usage and adds extensive tests for recursion + cross-file resolution. |
| pkg/schema/openapi.go | Uses SchemaType.JSONSchema() for the output component. |
| pkg/schema/openapi_test.go | Updates OpenAPI tests to construct outputs with SchemaType constructors. |
| pkg/schema/generator.go | Extends parser function signature to accept sourceDir and threads it through generation. |
| pkg/schema/generator_test.go | Updates generator tests for new parser signature and SchemaType output construction. |
| pkg/schema/errors.go | Adds ErrUnresolvableType and new actionable error helpers for unresolved outputs. |
| pkg/image/build.go | Minor formatting-only change in static schema generation path. |
| integration-tests/tests/multi_file_schema.txtar | New integration test validating multi-file output type resolution end-to-end. |
| architecture/02-schema.md | Documentation overhaul describing the new static parser pipeline and SchemaType system. |
…ursive model fields

- Propagate errors from dict value type resolution instead of silently falling back to opaque SchemaAny (dict[str, Tensor] now errors)
- Extract UnwrapOptional helper used by ResolveFieldType, resolveUnionSchemaType, and resolveFieldSchemaType (3 callsites)
- resolveModelToSchemaType now uses ResolveSchemaType via resolveFieldSchemaType, supporting dict/nested types inside BaseModel fields (previously limited to primitives, Optional[T], List[T])
- Fix stale comments: Optional rejected not nullable, Iterator allows nested types
- Fix garbled unicode in architecture docs, fix SchemaAny table entry
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.
- Fix nullable incorrectly set on non-required fields instead of field.Type.Nullable (debug: bool = False was appearing nullable)
- Remove dead CogArrayType/CogArrayDisplay struct fields (hardcoded in coreSchema, never read from struct)
- Distinguish os.ErrNotExist from permission errors in cross-file resolution; warn on parse failures instead of silently ignoring
- Fix bare dict/list JSON Schema examples in architecture docs
- Add regression tests: defaulted non-Optional field not nullable, Optional field nullable in JSON Schema, dict[str, Tensor] errors instead of silently producing SchemaAny (both top-level and inside BaseModel fields)
Four fuzz targets exercising the schema pipeline:
- FuzzResolveSchemaType: arbitrary TypeAnnotation trees through the recursive resolver
- FuzzJSONSchema: random SchemaType trees through JSON Schema rendering
- FuzzParsePredictor: arbitrary bytes as Python source through the tree-sitter parser
- FuzzParseTypeAnnotation: arbitrary return type strings in predict signatures

Includes:
- mise task 'test:fuzz' (FUZZTIME=30s per target by default)
- CI job 'fuzz-go' running 30s per target on Go changes
- Byte encoder/decoder for deterministic TypeAnnotation and SchemaType tree construction from fuzz corpus
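The byte decoder idea can be sketched as follows: each input byte picks a node kind, so the same corpus entry always produces the same tree. The `Node` type, the four-way kind split, and the depth cap are assumptions for illustration and do not reflect the PR's actual encoding.

```go
package main

import "fmt"

// Node is a simplified stand-in for a schema or annotation tree node.
type Node struct {
	Kind string
	Elem *Node
}

// String renders the tree compactly, e.g. array[dict[primitive]].
func (n Node) String() string {
	if n.Elem == nil {
		return n.Kind
	}
	return n.Kind + "[" + n.Elem.String() + "]"
}

// decode consumes bytes left to right and builds a tree. It returns the
// node and the unread remainder. Empty input or an exhausted depth budget
// yields a leaf, so decoding always terminates.
func decode(data []byte, depth int) (Node, []byte) {
	if len(data) == 0 || depth <= 0 {
		return Node{Kind: "any"}, data
	}
	tag, rest := data[0], data[1:]
	switch tag % 4 {
	case 0:
		return Node{Kind: "primitive"}, rest
	case 1:
		return Node{Kind: "any"}, rest
	case 2:
		elem, tail := decode(rest, depth-1)
		return Node{Kind: "array", Elem: &elem}, tail
	default:
		elem, tail := decode(rest, depth-1)
		return Node{Kind: "dict", Elem: &elem}, tail
	}
}

func main() {
	tree, _ := decode([]byte{2, 3, 0}, 8)
	fmt.Println(tree)
}
```

In a real fuzz target, a decoder like this would be called from inside `f.Fuzz(func(t *testing.T, data []byte) { ... })` so that Go's fuzzing engine can mutate the raw bytes while the code under test still receives well-formed trees.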
Haven't reviewed this in detail, but I can confirm this fixes resnet in cog-examples!
c52fc0c to 0045ae9
Summary
Improves the static schema generator to handle complex output types (dict, list, nested combinations, cross-file BaseModel imports).
What this does
Schema type system: Replaces the flat OutputType enum with a recursive SchemaType ADT that correctly handles dict[str, list[int]], list[dict[str, float]], and arbitrarily nested types; these previously collapsed to opaque {}.
Cross-file model resolution: When a predictor imports a BaseModel subclass from another local .py file, the parser finds and parses that file automatically. Handles from output_types import X, relative imports, subpackages, and aliases.
Tests