feat: recursive schema type system with cross-file model resolution#2788

Open
tempusfrangit wants to merge 10 commits into main from fix/dict-list-output-schema


@tempusfrangit tempusfrangit commented Feb 28, 2026

Summary

Improves the static schema generator to handle complex output types (dict, list, nested combinations, cross-file BaseModel imports).

What this does

Schema type system: Replaces the flat OutputType enum with a recursive SchemaType ADT that correctly handles dict[str, list[int]], list[dict[str, float]], and arbitrarily nested types, all of which previously collapsed to an opaque {} schema.

Cross-file model resolution: When a predictor imports a BaseModel subclass from another local .py file, the parser finds and parses that file automatically. Handles from output_types import X, relative imports, subpackages, and aliases.

Tests

  • 93 unit tests covering recursive nesting, cross-file resolution, error messages, and pydantic compatibility
  • 4 fuzz targets for the type resolver, JSON Schema generation, and parser
  • Integration test for multi-file resolution

The tree-sitter schema parser rejected unparameterized dict and list
return types with 'unsupported type'. This broke the resnet example
and any predictor returning -> dict or -> list.

- dict/Dict map to TypeAny (JSON Schema: {"type": "object"})
- list/List map to OutputList with TypeAny (JSON Schema: {"type": "array", "items": {"type": "object"}})

This matches the old Python schema gen behavior exactly:
  dict -> Dict[str, Any] -> {"type": "object"}
  list -> List[Any] -> {"type": "array", "items": {"type": "object"}}
@tempusfrangit tempusfrangit requested a review from a team as a code owner February 28, 2026 00:40
@tempusfrangit tempusfrangit requested a review from bfirsh February 28, 2026 00:41
@tempusfrangit tempusfrangit enabled auto-merge (squash) February 28, 2026 00:41
@tempusfrangit tempusfrangit added this to the 0.17.0 Release milestone Feb 28, 2026
…nsive tests

Replace the flat OutputType system with a recursive SchemaType algebraic data
type that supports arbitrary nesting (dict[str, list[dict[str, int]]], etc.).

Key changes:
- SchemaType ADT with 7 kinds: Primitive, Any, Array, Dict, Object, Iterator, ConcatIterator
- ResolveSchemaType: recursive resolver replacing ResolveOutputType
- Cross-file model resolution: imports from local .py files are found on disk,
  parsed with tree-sitter, and BaseModel subclasses extracted automatically
- Handles all local import permutations: relative, dotted, subpackage, aliased
- Clear error messages for unresolvable types (includes import source and guidance)
- Remove legacy OutputType, OutputKind, ObjectField, ResolveOutputType
- Thread sourceDir through Parser -> ParsePredictor for filesystem access
- Rewrite architecture/02-schema.md for the static Go parser

Tests: 93 unit tests (12 recursive nesting, 5 unresolvable errors, 3 pydantic
compat, 11 cross-file resolution, 1 end-to-end schema gen) + 1 integration test
@mfainberg-cf mfainberg-cf changed the title fix: support dict and bare list as prediction output types in schema gen feat: recursive schema type system with cross-file model resolution Feb 28, 2026
Copilot AI left a comment

Pull request overview

This PR refactors Cog’s static schema generation to use a recursive SchemaType algebraic data type for outputs, adds cross-file BaseModel resolution when outputs are imported from local modules, and improves type-resolution error messages.

Changes:

  • Replace the legacy flat output type representation with a recursive SchemaType plus JSONSchema() generation.
  • Update the Python tree-sitter parser and generator plumbing to support cross-file model resolution via a sourceDir parameter.
  • Add/expand unit and integration tests for nested output types, cross-file imports, and new error messages.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| pkg/schema/types.go | Removes legacy OutputType and switches PredictorInfo.Output to SchemaType. |
| pkg/schema/schema_type.go | Introduces SchemaType, recursive JSON Schema generation, and ResolveSchemaType. |
| pkg/schema/python/parser.go | Adds sourceDir support and cross-file BaseModel discovery via resolveExternalModels. |
| pkg/schema/python/parser_test.go | Updates parser API usage and adds extensive tests for recursion + cross-file resolution. |
| pkg/schema/openapi.go | Uses SchemaType.JSONSchema() for the output component. |
| pkg/schema/openapi_test.go | Updates OpenAPI tests to construct outputs with SchemaType constructors. |
| pkg/schema/generator.go | Extends parser function signature to accept sourceDir and threads it through generation. |
| pkg/schema/generator_test.go | Updates generator tests for new parser signature and SchemaType output construction. |
| pkg/schema/errors.go | Adds ErrUnresolvableType and new actionable error helpers for unresolved outputs. |
| pkg/image/build.go | Minor formatting-only change in static schema generation path. |
| integration-tests/tests/multi_file_schema.txtar | New integration test validating multi-file output type resolution end-to-end. |
| architecture/02-schema.md | Documentation overhaul describing the new static parser pipeline and SchemaType system. |


…ursive model fields

- Propagate errors from dict value type resolution instead of silently
  falling back to opaque SchemaAny (dict[str, Tensor] now errors)
- Extract UnwrapOptional helper used by ResolveFieldType,
  resolveUnionSchemaType, and resolveFieldSchemaType (3 callsites)
- resolveModelToSchemaType now uses ResolveSchemaType via
  resolveFieldSchemaType, supporting dict/nested types inside BaseModel
  fields (previously limited to primitives, Optional[T], List[T])
- Fix stale comments: Optional is rejected (not treated as nullable), and
  Iterator allows nested types
- Fix garbled unicode in architecture docs, fix SchemaAny table entry
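UnwrapOptional is named in this commit. A plausible shape, sketched with a hypothetical TypeAnnotation struct, recognizes both Optional[T] and a two-member Union with None (in either position) and returns the inner type plus a flag. The PR's actual signature may differ.

```go
package main

import "fmt"

// TypeAnnotation is a simplified parsed annotation (hypothetical).
type TypeAnnotation struct {
	Name string
	Args []TypeAnnotation
}

// UnwrapOptional returns (T, true) for Optional[T], Union[T, None] or
// Union[None, T]; otherwise it returns the input unchanged and false.
func UnwrapOptional(t TypeAnnotation) (TypeAnnotation, bool) {
	if t.Name == "Optional" && len(t.Args) == 1 {
		return t.Args[0], true
	}
	if t.Name == "Union" && len(t.Args) == 2 {
		if t.Args[1].Name == "None" {
			return t.Args[0], true
		}
		if t.Args[0].Name == "None" {
			return t.Args[1], true
		}
	}
	return t, false
}

func main() {
	inner, ok := UnwrapOptional(TypeAnnotation{Name: "Optional", Args: []TypeAnnotation{{Name: "int"}}})
	fmt.Println(inner.Name, ok)
}
```

Centralizing this in one helper means the three callsites listed above cannot drift apart in which spellings of "optional" they accept.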
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 6 comments.



- Fix nullable being set on non-required fields instead of being derived
  from field.Type.Nullable (debug: bool = False was appearing nullable)
- Remove dead CogArrayType/CogArrayDisplay struct fields (hardcoded
  in coreSchema, never read from struct)
- Distinguish os.ErrNotExist from permission errors in cross-file
  resolution; warn on parse failures instead of silently ignoring
- Fix bare dict/list JSON Schema examples in architecture docs
- Add regression tests: defaulted non-Optional field not nullable,
  Optional field nullable in JSON Schema, dict[str, Tensor] errors
  instead of silently producing SchemaAny (both top-level and inside
  BaseModel fields)
Four fuzz targets exercising the schema pipeline:
- FuzzResolveSchemaType: arbitrary TypeAnnotation trees through the
  recursive resolver
- FuzzJSONSchema: random SchemaType trees through JSON Schema rendering
- FuzzParsePredictor: arbitrary bytes as Python source through the
  tree-sitter parser
- FuzzParseTypeAnnotation: arbitrary return type strings in predict
  signatures

Includes:
- mise task 'test:fuzz' (FUZZTIME=30s per target by default)
- CI job 'fuzz-go' running 30s per target on Go changes
- Byte encoder/decoder for deterministic TypeAnnotation and SchemaType
  tree construction from fuzz corpus
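The byte decoder is what lets Go's native fuzzer explore structured inputs: each fuzz-supplied byte chooses the next node kind, so the same corpus entry always rebuilds the same tree. A minimal sketch of that idea, with a hypothetical node type and an explicit depth cap so random inputs cannot recurse without bound; the PR's actual encoding is not shown in this thread.

```go
package main

import "fmt"

// node is a tiny stand-in for a SchemaType tree (hypothetical).
type node struct {
	kind string // "int", "str", "list" or "dict"
	elem *node  // child for list/dict
}

const maxDepth = 8

// decode consumes bytes from data to build a tree deterministically.
// Running out of bytes or reaching maxDepth yields a leaf, so every input
// decodes to a finite, valid tree. It returns the unconsumed bytes.
func decode(data []byte, depth int) (*node, []byte) {
	if len(data) == 0 || depth >= maxDepth {
		return &node{kind: "int"}, data
	}
	b, rest := data[0], data[1:]
	switch b % 4 {
	case 0:
		return &node{kind: "int"}, rest
	case 1:
		return &node{kind: "str"}, rest
	default:
		kind := "list"
		if b%4 == 3 {
			kind = "dict"
		}
		elem, remaining := decode(rest, depth+1)
		return &node{kind: kind, elem: elem}, remaining
	}
}

func (n *node) String() string {
	if n.elem == nil {
		return n.kind
	}
	return n.kind + "[" + n.elem.String() + "]"
}

func main() {
	tree, _ := decode([]byte{3, 2, 1}, 0)
	fmt.Println(tree)
}
```

In a fuzz target this would sit inside `f.Fuzz(func(t *testing.T, data []byte) { ... })`, decoding `data` into a tree and then checking that rendering it never panics and always produces valid JSON.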
@bfirsh

bfirsh commented Mar 2, 2026

Haven't reviewed this in detail, but I can confirm this fixes resnet in cog-examples!

@tempusfrangit tempusfrangit disabled auto-merge March 2, 2026 18:28
@tempusfrangit tempusfrangit enabled auto-merge (rebase) March 2, 2026 18:28
@tempusfrangit tempusfrangit disabled auto-merge March 3, 2026 12:16
@markphelps markphelps force-pushed the fix/dict-list-output-schema branch from c52fc0c to 0045ae9 Compare March 3, 2026 21:54

Labels

None yet

Projects

None yet


3 participants