Summary
index_repository can crash the native process while indexing a large .sql file. In a full repository run I observed a Tree-sitter assertion:
Assertion failed: symbol < self->token_count, file internal/cbm/vendored/ts_runtime/src/./language.c, line 79
I then narrowed this down to a single large SQL file. I cannot attach the original file because it contains private/customer database schema and data scripts, but the failure is reproducible with that single file copied into an otherwise empty temporary repository.
Environment
- OS: Windows
- Shell: PowerShell
- Binary: C:/Users/lichen/.local/bin/codebase-memory-mcp.exe
- codebase-memory-mcp --version: codebase-memory-mcp 0.8.1
- MCP initialize serverInfo reported: codebase-memory-mcp 0.10.0
- Command mode: codebase-memory-mcp cli index_repository ...
Reproduction Shape
Full repository indexing crashed while scanning a repository with many SQL scripts:
codebase-memory-mcp.exe cli index_repository '{"repo_path":"C:/path/to/repo","mode":"full"}'
The full run reached SQL files under Src/Database and then aborted in the Tree-sitter runtime.
To reduce the input, I copied one SQL file into a new empty directory and indexed only that directory:
$root = "$env:TEMP/cbm-sql-repro"
New-Item -ItemType Directory -Force -Path $root | Out-Null
Copy-Item C:/path/to/CreateDB.sql "$root/CreateDB.sql" -Force
codebase-memory-mcp.exe cli index_repository "{`"repo_path`":`"$($root.Replace('\','/'))`",`"mode`":`"full`"}"
The single file is about 2.6 MiB and contains T-SQL database creation/schema/data script content.
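One way to shrink the input further than a single file is to split it into smaller pieces and index each piece separately. T-SQL scripts commonly separate batches with GO lines. GO is a separator recognized by sqlcmd and SSMS, not a T-SQL statement. Whether CreateDB.sql actually uses GO is an assumption, since the file is not available. The helpers below (is_go_line, split_go_batches) are hypothetical and are not part of codebase-memory-mcp. This is a minimal sketch that only locates the batch boundaries.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if line[0..len) is a bare T-SQL batch separator "GO"
 * (case-insensitive, surrounding whitespace including '\r' allowed).
 * Simplification: "GO <count>" and trailing comments are not recognized. */
static int is_go_line(const char *line, size_t len) {
    size_t i = 0;
    while (i < len && isspace((unsigned char)line[i])) i++;
    if (len - i < 2) return 0;
    if (toupper((unsigned char)line[i]) != 'G' ||
        toupper((unsigned char)line[i + 1]) != 'O') return 0;
    for (i += 2; i < len; i++)
        if (!isspace((unsigned char)line[i])) return 0;
    return 1;
}

/* Callback receiving one batch: a pointer into the original text and its length. */
typedef void (*batch_fn)(const char *start, size_t len, void *ctx);

/* Splits a NUL-terminated script at GO lines. Calls on_batch (if not NULL)
 * for each non-empty batch; the GO lines themselves are dropped.
 * Returns the number of batches found. */
static size_t split_go_batches(const char *text, batch_fn on_batch, void *ctx) {
    size_t count = 0;
    const char *batch_start = text;
    const char *p = text;
    while (*p) {
        const char *eol = strchr(p, '\n');
        const char *next = eol ? eol + 1 : p + strlen(p);
        size_t line_len = (size_t)((eol ? eol : next) - p);
        if (is_go_line(p, line_len)) {
            if (p > batch_start) {
                if (on_batch) on_batch(batch_start, (size_t)(p - batch_start), ctx);
                count++;
            }
            batch_start = next; /* next batch begins after the GO line */
        }
        p = next;
    }
    if (p > batch_start) { /* trailing batch without a final GO */
        if (on_batch) on_batch(batch_start, (size_t)(p - batch_start), ctx);
        count++;
    }
    return count;
}
```

To use it, write each batch to its own .sql file in the temporary repository and rerun the same index_repository command against it. This should show which batch triggers the crash. That batch can then be bisected by halves until the smallest failing piece remains.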
Actual Result
The single-file repro exits with native crash code -1073741571 (0xC00000FD, STATUS_STACK_OVERFLOW on Windows). The output stops during the definitions pass:
level=info msg=mem.init budget_mb=32767 total_ram_mb=65534
level=info msg=pipeline.discover files=1 elapsed_ms=1
level=info msg=pipeline.route path=full
level=info msg=pass.start pass=structure files=1
level=info msg=pass.done pass=structure nodes=2 edges=1
level=info msg=pass.timing pass=structure elapsed_ms=0
level=info msg=pipeline.mode mode=sequential files=1
level=info msg=pkgmap.scan_repo manifests=0
level=info msg=pkgmap.scan manifests_from_files=0 manifests_from_walk=0 entries=0
level=info msg=pass.start pass=definitions files=1
The full repository run also showed the Tree-sitter assertion above:
Assertion failed: symbol < self->token_count, file internal/cbm/vendored/ts_runtime/src/./language.c, line 79
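The single-file exit code and the hex value in parentheses are the same number. PowerShell reports process exit codes as signed 32-bit integers, so an NTSTATUS crash code with the high bit set appears negative. Adding 2^32 recovers the unsigned value: -1073741571 + 4294967296 = 3221225725 = 0xC00000FD, which is STATUS_STACK_OVERFLOW. The small sketch below does that conversion. The helper names are hypothetical, and only two well-known NTSTATUS codes are listed.

```c
#include <stdint.h>

/* Two well-known Windows NTSTATUS crash codes. */
#define NT_STATUS_ACCESS_VIOLATION 0xC0000005u
#define NT_STATUS_STACK_OVERFLOW   0xC00000FDu

/* Reinterprets a signed 32-bit exit code as the unsigned NTSTATUS value.
 * Converting a negative int32_t to uint32_t is well-defined in C
 * (reduction modulo 2^32). */
static uint32_t exit_code_to_ntstatus(int32_t exit_code) {
    return (uint32_t)exit_code;
}

/* Maps a status value to its symbolic name, or "unknown". */
static const char *ntstatus_name(uint32_t status) {
    switch (status) {
    case NT_STATUS_ACCESS_VIOLATION: return "STATUS_ACCESS_VIOLATION";
    case NT_STATUS_STACK_OVERFLOW:   return "STATUS_STACK_OVERFLOW";
    default:                         return "unknown";
    }
}
```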
Expected Result
A malformed, huge, or unsupported SQL file should not abort the whole indexing process. Ideally, the file should be skipped and the error recorded. Alternatively, Tree-sitter parse failures should be isolated so that index_repository can continue and report the failed files.
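A failed assert() calls abort(), and a stack overflow ends the process. Neither can be reliably caught inside the process that hit it. Isolation therefore generally means parsing in a separate process. The sketch below shows that idea with POSIX fork/waitpid and is an assumption, not how codebase-memory-mcp is structured. On Windows the analogous route would be a child process started with CreateProcess and checked with GetExitCodeProcess. parse_fn, parse_isolated, index_all, file_status and demo_parse are all hypothetical names. demo_parse only simulates the crash by calling abort().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-file parse step; returns 0 on success. */
typedef int (*parse_fn)(const char *path);

typedef enum { FILE_OK, FILE_PARSE_ERROR, FILE_CRASHED, FILE_SPAWN_FAILED } file_status;

/* Runs parse(path) in a child process so that an abort() from a failed
 * assert, or a stack overflow, terminates only the child. */
static file_status parse_isolated(parse_fn parse, const char *path, int *signal_out) {
    fflush(NULL); /* keep the child from re-flushing the parent's buffered output */
    pid_t pid = fork();
    if (pid < 0) return FILE_SPAWN_FAILED;
    if (pid == 0) _exit(parse(path) == 0 ? 0 : 1); /* child: parse, then exit */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return FILE_SPAWN_FAILED;
    if (WIFSIGNALED(status)) { /* e.g. SIGABRT from assert, SIGSEGV from overflow */
        if (signal_out) *signal_out = WTERMSIG(status);
        return FILE_CRASHED;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? FILE_OK : FILE_PARSE_ERROR;
}

/* Indexes every file, recording per-file status instead of stopping at the
 * first crash. Returns the number of files that did not index cleanly. */
static size_t index_all(parse_fn parse, const char *const *paths, size_t n,
                        file_status *results) {
    size_t failed = 0;
    for (size_t i = 0; i < n; i++) {
        int sig = 0;
        results[i] = parse_isolated(parse, paths[i], &sig);
        if (results[i] != FILE_OK) {
            fprintf(stderr, "skipped %s (status %d, signal %d)\n",
                    paths[i], (int)results[i], sig);
            failed++;
        }
    }
    return failed;
}

/* Demo stand-in for a real parser: simulates the Tree-sitter assertion by
 * calling abort() for any path ending in ".bad.sql". */
static int demo_parse(const char *path) {
    const char *suffix = ".bad.sql";
    size_t n = strlen(path), m = strlen(suffix);
    if (n >= m && strcmp(path + n - m, suffix) == 0) abort();
    return 0;
}
```

A real integration would also need to send the extracted definitions back to the parent, for example over a pipe. Forking once per file is also costly. A long-lived worker process that is restarted after a crash is a common compromise.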
Relevant Source Observations
From the current repository source:
- .sql maps to CBM_LANG_SQL in src/discover/language.c.
- SQL uses the vendored Tree-sitter SQL grammar.
- internal/cbm/vendored/grammars/sql/parser.c has TOKEN_COUNT = 429 and SYMBOL_COUNT = 770.
- The assertion is in internal/cbm/vendored/ts_runtime/src/language.c:79 inside ts_language_table_entry, which expects symbol < self->token_count.
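The counts above help explain the assertion. In Tree-sitter's generated parsers, symbol ids below TOKEN_COUNT are terminals (tokens), and ids from TOKEN_COUNT up to SYMBOL_COUNT are nonterminals. The assertion therefore suggests that the runtime looked up a parse-table entry for an id that is not a token of this grammar. With the SQL grammar's counts, that means an id of 429 or higher. The sketch below only restates that invariant. classify_symbol and table_entry_precondition_holds are hypothetical helpers. The real ts_language_table_entry may also treat Tree-sitter's built-in error symbols specially, which this ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Counts as reported for the vendored SQL grammar's parser.c. */
enum { SQL_TOKEN_COUNT = 429, SQL_SYMBOL_COUNT = 770 };

typedef uint16_t ts_symbol; /* Tree-sitter's TSSymbol is a uint16_t */

typedef enum { SYM_TERMINAL, SYM_NONTERMINAL, SYM_OUT_OF_RANGE } symbol_kind;

/* Terminals come first in the symbol numbering, then nonterminals. */
static symbol_kind classify_symbol(ts_symbol s) {
    if (s < SQL_TOKEN_COUNT) return SYM_TERMINAL;
    if (s < SQL_SYMBOL_COUNT) return SYM_NONTERMINAL;
    return SYM_OUT_OF_RANGE;
}

/* The precondition asserted at language.c:79: a table lookup keyed by a
 * lookahead symbol is only valid for terminals. */
static bool table_entry_precondition_holds(ts_symbol s) {
    return s < SQL_TOKEN_COUNT;
}
```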
This looks like a Tree-sitter SQL grammar/runtime crash path triggered by large/complex T-SQL input, not a normal SQL syntax error. Normal parse errors should produce ERROR nodes rather than aborting the process.
Notes
I can help test a patched binary or try to produce a minimized/redacted SQL repro if that would be useful. For now I avoided attaching the source SQL because it is private customer project material.