Skip to content

Rewrite StrataCoreToGoto for multi-procedure programs with contracts#657

Open
tautschnig wants to merge 1 commit intomainfrom
tautschnig/stratacoretogoto
Open

Rewrite StrataCoreToGoto for multi-procedure programs with contracts#657
tautschnig wants to merge 1 commit intomainfrom
tautschnig/stratacoretogoto

Conversation

@tautschnig
Copy link
Copy Markdown
Contributor

  • Iterate over all procedures and functions in the Core program
  • Apply call elimination to inline contracts (configurable via --no-call-elim)
  • Re-type-check after call elimination for new variable discovery
  • Seed type environment with global variable types
  • Apply GOTO pipeline passes: function inlining, recursive inlining, datatype partial evaluation, datatype axiom/recursive function axiom generation, datatype expression rewriting
  • Deduplicate symbol table, add default symbols

Tests:

  • test_multi_proc.core.st: multi-procedure Core program with contracts
  • test_strata_core_to_goto.sh: verifies translation, symbol table, contract annotations, and GOTO assertions

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- Iterate over all procedures and functions in the Core program
- Apply call elimination to inline contracts (configurable via --no-call-elim)
- Re-type-check after call elimination for new variable discovery
- Seed type environment with global variable types
- Apply GOTO pipeline passes: function inlining, recursive inlining,
  datatype partial evaluation, datatype axiom/recursive function axiom
  generation, datatype expression rewriting
- Deduplicate symbol table, add default symbols

Tests:
- test_multi_proc.core.st: multi-procedure Core program with contracts
- test_strata_core_to_goto.sh: verifies translation, symbol table,
  contract annotations, and GOTO assertions

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates StrataCoreToGoto to support translating multi-procedure Strata Core programs (including procedure contracts) into aggregated CBMC GOTO JSON outputs, and adds a regression test exercising the new behavior.

Changes:

  • Reworks StrataCoreToGoto to iterate over all procedures/functions, optionally run call elimination, re-type-check, run GOTO pipeline passes, and emit combined symtab/goto JSON.
  • Adds a --no-call-elim flag to optionally skip call elimination.
  • Adds a multi-procedure Core test input plus a bash test that validates translation output, symbol table content, contract annotations, and presence of ASSERT instructions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
StrataCoreToGoto.lean New multi-procedure translation + pipeline passes, symbol table aggregation/dedup, default symbol injection, new CLI flag.
StrataTest/Backends/CBMC/GOTO/test_strata_core_to_goto.sh New end-to-end-ish test script validating multi-proc translation artifacts.
StrataTest/Backends/CBMC/GOTO/test_multi_proc.core.st New Core input program with two procedures, contracts, a global var, and an assert.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 56 to 59
let baseName := deriveBaseName file
let programName := opts.programName.getD baseName
if programName.any "/" || programName.any ".." then
IO.println "Error: --program-name must be a simple filename without path separators"
return 1
let dir := System.FilePath.mk opts.outputDir
IO.FS.createDirAll dir
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

programName is now used directly in output filenames without any validation. A user-supplied --program-name containing path separators or .. can cause writes outside --output-dir (e.g., ../../foo.symtab.json). Please reintroduce a robust validation/sanitization step (and consider platform-specific separators) before constructing symTabFile/gotoFile.

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +195
-- Deduplicate symbol table
let mut seen : Std.HashSet String := {}
let mut dedupPairs : List (String × Lean.Json) := []
for (k, v) in symtabPairs do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, v)]

-- Add CBMC default symbols
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, Lean.toJson v)]

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symbol-table deduplication keeps the first entry for a given key. This can silently drop richer entries added later (e.g., collectExtraSymbols’s global variable symbol with isStaticLifetime := true can be discarded if an earlier per-procedure extraSymbols entry for the same global name exists). Consider merging via a map/TreeMap where later inserts override earlier ones, or explicitly prefer collectExtraSymbols entries for globals/datatypes over the generic collectSymbolRefs entries.

Suggested change
-- Deduplicate symbol table
let mut seen : Std.HashSet String := {}
let mut dedupPairs : List (String × Lean.Json) := []
for (k, v) in symtabPairs do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, v)]
-- Add CBMC default symbols
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, Lean.toJson v)]
-- Deduplicate and merge symbol table entries, preferring later entries on key conflicts
let mut symtabMap : Std.HashMap String Lean.Json := {}
for (k, v) in symtabPairs do
symtabMap := symtabMap.insert k v
-- Add CBMC default symbols only if not already present
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !symtabMap.contains k then
symtabMap := symtabMap.insert k (Lean.toJson v)
let dedupPairs := symtabMap.toList

Copilot uses AI. Check for mistakes.
Comment on lines +190 to +195
-- Add CBMC default symbols
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, Lean.toJson v)]

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default CBMC symbols are currently only added when the key is not already present. Previously wrapSymtab would overwrite existing keys with the built-ins; now a user program (or earlier collected symbols) can shadow required __CPROVER_* symbols, which may break downstream CBMC tooling. Consider either (1) ensuring default symbols override on collision, or (2) detecting collisions with default symbols and failing fast with a clear error.

Suggested change
-- Add CBMC default symbols
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, Lean.toJson v)]
-- Add CBMC default symbols, failing fast on collisions so required
-- builtin symbols cannot be shadowed by user or collected symbols.
let mut collision : Option String := none
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if seen.contains k then
collision := some k
else
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, Lean.toJson v)]
match collision with
| some k =>
IO.println s!"Error: symbol table already defines required CBMC builtin symbol '{k}'."
IO.println "Please rename the conflicting program symbol or adjust the input before translation."
return 1
| none =>
pure ()

Copilot uses AI. Check for mistakes.
Comment on lines +141 to +165
for p in procs do
let procName := Core.CoreIdent.toPretty p.header.name
match procedureToGotoCtx Env p (axioms := axioms) (distincts := distincts)
(varTypes := tcPgm.getVarTy?) with
| .error e => IO.println s!"Warning: skipping procedure {procName}: {e}"
| .ok (ctx, liftedFuncs) =>
allLiftedFuncs := allLiftedFuncs ++ liftedFuncs
let ctx := rewriteDatatypeExprsInCtx gotoDtInfo ctx
let ctx ← lowerAddressOfInCtx ctx
let json ← IO.ofExcept (CoreToGOTO.CProverGOTO.Context.toJson procName ctx)
match json.symtab with
| .obj m => symtabPairs := symtabPairs ++ m.toList
| _ => pure ()
match json.goto with
| .obj m =>
match m.toList.find? (·.1 == "functions") with
| some (_, .arr fns) => gotoFns := gotoFns ++ fns
| _ => pure ()
| _ => pure ()

for f in funcs ++ allLiftedFuncs do
let funcName := Core.CoreIdent.toPretty f.name
match functionToGotoCtx Env f with
| .error e => IO.println s!"Warning: skipping function {funcName}: {e}"
| .ok ctx =>
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Procedure/function translation errors are currently logged as warnings and then translation continues, but the tool still exits with success and writes output. This can produce incomplete/incorrect GOTO JSON without failing the command. Please accumulate failures from procedureToGotoCtx/functionToGotoCtx and return a non-zero exit code (or stop immediately) when any translation fails.

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +76
let Ctx := { Lambda.LContext.default with
functions := Core.Factory, knownTypes := Core.KnownTypes }
let Env := Lambda.TEnv.default
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ctx is defined but never used. Lean will emit an unused variable warning, and if CI treats warnings strictly this can fail the build. Please remove Ctx (and any other unused bindings) or use it where intended.

Suggested change
let Ctx := { Lambda.LContext.default with
functions := Core.Factory, knownTypes := Core.KnownTypes }
let Env := Lambda.TEnv.default

Copilot uses AI. Check for mistakes.
Comment on lines +188 to +196
dedupPairs := dedupPairs ++ [(k, v)]

-- Add CBMC default symbols
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, Lean.toJson v)]

let symtab := Lean.Json.mkObj [("symbolTable", Lean.Json.mkObj dedupPairs)]
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dedupPairs := dedupPairs ++ [(k, v)] appends to the end of a List inside a loop, which is O(n) each time and can lead to quadratic behavior as the symbol table grows. Consider using List.cons (and reverse at the end) or inserting into a map/TreeMap as you deduplicate.

Suggested change
dedupPairs := dedupPairs ++ [(k, v)]
-- Add CBMC default symbols
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !seen.contains k then
seen := seen.insert k
dedupPairs := dedupPairs ++ [(k, Lean.toJson v)]
let symtab := Lean.Json.mkObj [("symbolTable", Lean.Json.mkObj dedupPairs)]
dedupPairs := (k, v) :: dedupPairs
-- Add CBMC default symbols
for (k, v) in CProverGOTO.defaultSymbols (moduleName := programName) do
if !seen.contains k then
seen := seen.insert k
dedupPairs := (k, Lean.toJson v) :: dedupPairs
let symtab := Lean.Json.mkObj [("symbolTable", Lean.Json.mkObj dedupPairs.reverse)]

Copilot uses AI. Check for mistakes.
with open('$WORK/test_multi_proc.symtab.json') as f:
st = json.load(f)['symbolTable']
assert 'g' in st, 'global variable g not in symtab'
print(' OK: global variable g in symbol table')
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test 3 currently only asserts that g exists in the symbol table. Given the translator also emits a generic per-procedure symbol entry for referenced globals, this check can pass even if the global symbol (with isStaticLifetime := true, initializer, etc.) was dropped during deduplication. Consider asserting on an attribute that distinguishes the real global symbol (e.g., st['g']['isStaticLifetime'] is True).

Suggested change
print(' OK: global variable g in symbol table')
assert st['g'].get('isStaticLifetime') is True, 'global variable g does not have static lifetime'
print(' OK: global variable g with static lifetime in symbol table')

Copilot uses AI. Check for mistakes.
# 3. Contract annotations on procedures
# 4. Assertions in GOTO output

set -eo pipefail
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with other bash-based tests in this repo, consider using set -euo pipefail instead of set -eo pipefail so unset variables fail fast during CI runs.

Suggested change
set -eo pipefail
set -euo pipefail

Copilot uses AI. Check for mistakes.
Comment on lines +150 to +153
let json ← IO.ofExcept (CoreToGOTO.CProverGOTO.Context.toJson procName ctx)
match json.symtab with
| .obj m => symtabPairs := symtabPairs ++ m.toList
| _ => pure ()
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

symtabPairs := symtabPairs ++ m.toList in this loop repeatedly appends to the end of a List, which is O(n) per iteration and can become quadratic for larger symbol tables. Consider accumulating into an Array/map (or building with List.cons and reversing once) to keep the overall runtime linear.

Copilot uses AI. Check for mistakes.
@tautschnig tautschnig self-assigned this Mar 25, 2026
github-merge-queue bot pushed a commit that referenced this pull request Mar 26, 2026
FUNCTION_CALL callees (e.g., print) had Empty type in the symbol table
because coreStmtsToGoto created the callee expression with .Empty type.
CBMC instrument_preconditions crashed trying to_code_type on Empty.

Fix: add Ty.Identifier.code and Ty.mkCode, use it for the callee
expression in coreStmtsToGoto. The code type flows through
collectSymbolRefs into the symbol table naturally.

Fixes 16 Python CBMC tests that use class-related function calls. See
PRs like #657 for current failures on that.

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Co-authored-by: Josh Cohen <36058610+joscoh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants