Rewrite StrataCoreToGoto for multi-procedure programs with contracts#657
Rewrite StrataCoreToGoto for multi-procedure programs with contracts#657tautschnig wants to merge 1 commit intomainfrom
Conversation
- Iterate over all procedures and functions in the Core program - Apply call elimination to inline contracts (configurable via --no-call-elim) - Re-type-check after call elimination for new variable discovery - Seed type environment with global variable types - Apply GOTO pipeline passes: function inlining, recursive inlining, datatype partial evaluation, datatype axiom/recursive function axiom generation, datatype expression rewriting - Deduplicate symbol table, add default symbols Tests: - test_multi_proc.core.st: multi-procedure Core program with contracts - test_strata_core_to_goto.sh: verifies translation, symbol table, contract annotations, and GOTO assertions Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Updates StrataCoreToGoto to support translating multi-procedure Strata Core programs (including procedure contracts) into aggregated CBMC GOTO JSON outputs, and adds a regression test exercising the new behavior.
Changes:
- Reworks
StrataCoreToGototo iterate over all procedures/functions, optionally run call elimination, re-type-check, run GOTO pipeline passes, and emit combined symtab/goto JSON. - Adds a
--no-call-elimflag to optionally skip call elimination. - Adds a multi-procedure Core test input plus a bash test that validates translation output, symbol table content, contract annotations, and presence of ASSERT instructions.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
StrataCoreToGoto.lean |
New multi-procedure translation + pipeline passes, symbol table aggregation/dedup, default symbol injection, new CLI flag. |
StrataTest/Backends/CBMC/GOTO/test_strata_core_to_goto.sh |
New end-to-end-ish test script validating multi-proc translation artifacts. |
StrataTest/Backends/CBMC/GOTO/test_multi_proc.core.st |
New Core input program with two procedures, contracts, a global var, and an assert. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let baseName := deriveBaseName file | ||
| let programName := opts.programName.getD baseName | ||
| if programName.any "/" || programName.any ".." then | ||
| IO.println "Error: --program-name must be a simple filename without path separators" | ||
| return 1 | ||
| let dir := System.FilePath.mk opts.outputDir | ||
| IO.FS.createDirAll dir |
There was a problem hiding this comment.
programName is now used directly in output filenames without any validation. A user-supplied --program-name containing path separators or .. can cause writes outside --output-dir (e.g., ../../foo.symtab.json). Please reintroduce a robust validation/sanitization step (and consider platform-specific separators) before constructing symTabFile/gotoFile.
| -- Deduplicate symbol table | ||
| let mut seen : Std.HashSet String := {} | ||
| let mut dedupPairs : List (String × Lean.Json) := [] | ||
| for (k, v) in symtabPairs do | ||
| if !seen.contains k then | ||
| seen := seen.insert k | ||
| dedupPairs := dedupPairs ++ [(k, v)] | ||
|
|
||
| -- Add CBMC default symbols | ||
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | ||
| if !seen.contains k then | ||
| seen := seen.insert k | ||
| dedupPairs := dedupPairs ++ [(k, Lean.toJson v)] | ||
|
|
There was a problem hiding this comment.
The symbol-table deduplication keeps the first entry for a given key. This can silently drop richer entries added later (e.g., collectExtraSymbols’s global variable symbol with isStaticLifetime := true can be discarded if an earlier per-procedure extraSymbols entry for the same global name exists). Consider merging via a map/TreeMap where later inserts override earlier ones, or explicitly prefer collectExtraSymbols entries for globals/datatypes over the generic collectSymbolRefs entries.
| -- Deduplicate symbol table | |
| let mut seen : Std.HashSet String := {} | |
| let mut dedupPairs : List (String × Lean.Json) := [] | |
| for (k, v) in symtabPairs do | |
| if !seen.contains k then | |
| seen := seen.insert k | |
| dedupPairs := dedupPairs ++ [(k, v)] | |
| -- Add CBMC default symbols | |
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | |
| if !seen.contains k then | |
| seen := seen.insert k | |
| dedupPairs := dedupPairs ++ [(k, Lean.toJson v)] | |
| -- Deduplicate and merge symbol table entries, preferring later entries on key conflicts | |
| let mut symtabMap : Std.HashMap String Lean.Json := {} | |
| for (k, v) in symtabPairs do | |
| symtabMap := symtabMap.insert k v | |
| -- Add CBMC default symbols only if not already present | |
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | |
| if !symtabMap.contains k then | |
| symtabMap := symtabMap.insert k (Lean.toJson v) | |
| let dedupPairs := symtabMap.toList |
| -- Add CBMC default symbols | ||
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | ||
| if !seen.contains k then | ||
| seen := seen.insert k | ||
| dedupPairs := dedupPairs ++ [(k, Lean.toJson v)] | ||
|
|
There was a problem hiding this comment.
Default CBMC symbols are currently only added when the key is not already present. Previously wrapSymtab would overwrite existing keys with the built-ins; now a user program (or earlier collected symbols) can shadow required __CPROVER_* symbols, which may break downstream CBMC tooling. Consider either (1) ensuring default symbols override on collision, or (2) detecting collisions with default symbols and failing fast with a clear error.
| -- Add CBMC default symbols | |
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | |
| if !seen.contains k then | |
| seen := seen.insert k | |
| dedupPairs := dedupPairs ++ [(k, Lean.toJson v)] | |
| -- Add CBMC default symbols, failing fast on collisions so required | |
| -- builtin symbols cannot be shadowed by user or collected symbols. | |
| let mut collision : Option String := none | |
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | |
| if seen.contains k then | |
| collision := some k | |
| else | |
| seen := seen.insert k | |
| dedupPairs := dedupPairs ++ [(k, Lean.toJson v)] | |
| match collision with | |
| | some k => | |
| IO.println s!"Error: symbol table already defines required CBMC builtin symbol '{k}'." | |
| IO.println "Please rename the conflicting program symbol or adjust the input before translation." | |
| return 1 | |
| | none => | |
| pure () |
| for p in procs do | ||
| let procName := Core.CoreIdent.toPretty p.header.name | ||
| match procedureToGotoCtx Env p (axioms := axioms) (distincts := distincts) | ||
| (varTypes := tcPgm.getVarTy?) with | ||
| | .error e => IO.println s!"Warning: skipping procedure {procName}: {e}" | ||
| | .ok (ctx, liftedFuncs) => | ||
| allLiftedFuncs := allLiftedFuncs ++ liftedFuncs | ||
| let ctx := rewriteDatatypeExprsInCtx gotoDtInfo ctx | ||
| let ctx ← lowerAddressOfInCtx ctx | ||
| let json ← IO.ofExcept (CoreToGOTO.CProverGOTO.Context.toJson procName ctx) | ||
| match json.symtab with | ||
| | .obj m => symtabPairs := symtabPairs ++ m.toList | ||
| | _ => pure () | ||
| match json.goto with | ||
| | .obj m => | ||
| match m.toList.find? (·.1 == "functions") with | ||
| | some (_, .arr fns) => gotoFns := gotoFns ++ fns | ||
| | _ => pure () | ||
| | _ => pure () | ||
|
|
||
| for f in funcs ++ allLiftedFuncs do | ||
| let funcName := Core.CoreIdent.toPretty f.name | ||
| match functionToGotoCtx Env f with | ||
| | .error e => IO.println s!"Warning: skipping function {funcName}: {e}" | ||
| | .ok ctx => |
There was a problem hiding this comment.
Procedure/function translation errors are currently logged as warnings and then translation continues, but the tool still exits with success and writes output. This can produce incomplete/incorrect GOTO JSON without failing the command. Please accumulate failures from procedureToGotoCtx/functionToGotoCtx and return a non-zero exit code (or stop immediately) when any translation fails.
| let Ctx := { Lambda.LContext.default with | ||
| functions := Core.Factory, knownTypes := Core.KnownTypes } | ||
| let Env := Lambda.TEnv.default |
There was a problem hiding this comment.
Ctx is defined but never used. Lean will emit an unused variable warning, and if CI treats warnings strictly this can fail the build. Please remove Ctx (and any other unused bindings) or use it where intended.
| let Ctx := { Lambda.LContext.default with | |
| functions := Core.Factory, knownTypes := Core.KnownTypes } | |
| let Env := Lambda.TEnv.default |
| dedupPairs := dedupPairs ++ [(k, v)] | ||
|
|
||
| -- Add CBMC default symbols | ||
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | ||
| if !seen.contains k then | ||
| seen := seen.insert k | ||
| dedupPairs := dedupPairs ++ [(k, Lean.toJson v)] | ||
|
|
||
| let symtab := Lean.Json.mkObj [("symbolTable", Lean.Json.mkObj dedupPairs)] |
There was a problem hiding this comment.
dedupPairs := dedupPairs ++ [(k, v)] appends to the end of a List inside a loop, which is O(n) each time and can lead to quadratic behavior as the symbol table grows. Consider using List.cons (and reverse at the end) or inserting into a map/TreeMap as you deduplicate.
| dedupPairs := dedupPairs ++ [(k, v)] | |
| -- Add CBMC default symbols | |
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | |
| if !seen.contains k then | |
| seen := seen.insert k | |
| dedupPairs := dedupPairs ++ [(k, Lean.toJson v)] | |
| let symtab := Lean.Json.mkObj [("symbolTable", Lean.Json.mkObj dedupPairs)] | |
| dedupPairs := (k, v) :: dedupPairs | |
| -- Add CBMC default symbols | |
| for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do | |
| if !seen.contains k then | |
| seen := seen.insert k | |
| dedupPairs := (k, Lean.toJson v) :: dedupPairs | |
| let symtab := Lean.Json.mkObj [("symbolTable", Lean.Json.mkObj dedupPairs.reverse)] |
| with open('$WORK/test_multi_proc.symtab.json') as f: | ||
| st = json.load(f)['symbolTable'] | ||
| assert 'g' in st, 'global variable g not in symtab' | ||
| print(' OK: global variable g in symbol table') |
There was a problem hiding this comment.
Test 3 currently only asserts that g exists in the symbol table. Given the translator also emits a generic per-procedure symbol entry for referenced globals, this check can pass even if the global symbol (with isStaticLifetime := true, initializer, etc.) was dropped during deduplication. Consider asserting on an attribute that distinguishes the real global symbol (e.g., st['g']['isStaticLifetime'] is True).
| print(' OK: global variable g in symbol table') | |
| assert st['g'].get('isStaticLifetime') is True, 'global variable g does not have static lifetime' | |
| print(' OK: global variable g with static lifetime in symbol table') |
| # 3. Contract annotations on procedures | ||
| # 4. Assertions in GOTO output | ||
|
|
||
| set -eo pipefail |
There was a problem hiding this comment.
For consistency with other bash-based tests in this repo, consider using set -euo pipefail instead of set -eo pipefail so unset variables fail fast during CI runs.
| set -eo pipefail | |
| set -euo pipefail |
| let json ← IO.ofExcept (CoreToGOTO.CProverGOTO.Context.toJson procName ctx) | ||
| match json.symtab with | ||
| | .obj m => symtabPairs := symtabPairs ++ m.toList | ||
| | _ => pure () |
There was a problem hiding this comment.
symtabPairs := symtabPairs ++ m.toList in this loop repeatedly appends to the end of a List, which is O(n) per iteration and can become quadratic for larger symbol tables. Consider accumulating into an Array/map (or building with List.cons and reversing once) to keep the overall runtime linear.
FUNCTION_CALL callees (e.g., print) had Empty type in the symbol table because coreStmtsToGoto created the callee expression with .Empty type. CBMC instrument_preconditions crashed trying to_code_type on Empty. Fix: add Ty.Identifier.code and Ty.mkCode, use it for the callee expression in coreStmtsToGoto. The code type flows through collectSymbolRefs into the symbol table naturally. Fixes 16 Python CBMC tests that use class-related function calls. See PRs like #657 for current failures on that. By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. --------- Co-authored-by: Kiro <kiro-agent@users.noreply.github.com> Co-authored-by: Josh Cohen <36058610+joscoh@users.noreply.github.com>
Tests:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.