regression tests from cyber2 and cyber3 #267

Draft
AlexandreYang wants to merge 73 commits into main from alex/vuln-hunt-codex-cyber

@AlexandreYang
Member

@AlexandreYang AlexandreYang commented May 19, 2026

What does this PR do?

Adds vulnerability-hunting test coverage from the 2026-05-19-gpt-5.5-cyber-2 campaign across the shell interpreter and several builtins. Includes both Go tests and YAML scenarios.

Areas covered:

  • Interpreter:
    • Brace groups: redirect blocking (file / /dev/null / dynamic), fd duplication restoring streams, pipeline exit isolation, input redirect isolation, expansion output as data
    • Heredocs: blocked output redirects, pipe reader closing early, quoted unsupported param literals, readonly in unquoted cmdsubst, stdin restoration after statement
    • Functions: blocked in subshell, blocked with redirects (/dev/null and file), blocked with readonly body
    • until loops: composition with readonly body, redirected loop body writes, redirect to /dev/null, subshell isolation
    • while loops: additional clause coverage
    • Command substitution: hardening tests
    • Variable assignment: same-name multi-assign restores prior value
    • Signal handling: pipeline panic recovery
    • Variable size limits
  • Builtins:
    • cut: sandbox and special-file coverage
    • df: more resilient POSIX/-h parsing in tests
    • grep: numeric overflow rejection
    • pwd: symlink cwd coverage
    • strings: vuln hunt coverage
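
For orientation, here is a minimal sketch of the stock-shell reference behaviour behind three of the interpreter items above (fd duplication restoring streams, pipeline exit isolation, and same-name multi-assign). It describes a regular POSIX shell or bash, not rshell itself, and the function names are hypothetical rather than taken from the PR's scenarios.

```shell
#!/bin/sh
# Reference behaviour in a stock POSIX shell / bash (assumption: this is the
# baseline the scenarios compare against; rshell may intentionally diverge).

# fd duplication on a brace group applies to the group only:
# stdout is restored once the group finishes.
fd_dup_demo() {
  { echo "inside group"; } 1>&2   # written to stderr
  echo "after group"              # written to stdout again
}

# A brace group used as a pipeline stage runs in a subshell, so its
# `exit` does not terminate the calling shell.
pipeline_exit_demo() {
  { exit 3; } | cat
  echo "still running"
}

# Repeated prefix assignments of the same name: the command sees the
# last value, and the prior value is back once the command returns.
multi_assign_demo() {
  x=old
  x=1 x=2 sh -c 'echo "child sees $x"'   # prints: child sees 2
  echo "parent sees $x"                   # prints: parent sees old
}

fd_dup_demo
pipeline_exit_demo
multi_assign_demo
```

Running the three functions under bash gives a baseline to compare against when a scenario is marked as matching bash.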

Motivation

Part of the ongoing vuln-hunt campaign to harden the safe shell interpreter against attacks via shell control structures, redirects, and builtin edge cases.

Testing

  • New scenario YAMLs are asserted against bash by default, except those that intentionally diverge for sandbox reasons.
  • New Go tests run with `go test ./...`.
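
The idea of asserting a scenario against bash can be sketched roughly as follows. This is an illustration only: `assert_matches_bash` is a hypothetical helper, not the repository's actual harness, and it assumes `bash` is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not the repository's harness): run a script under
# bash and compare its stdout with the expected text.
# Note: $(...) strips trailing newlines, so they are not compared here.
assert_matches_bash() {
  script=$1
  expected=$2
  actual=$(bash -c "$script" 2>/dev/null)
  if [ "$actual" = "$expected" ]; then
    echo "PASS"
  else
    echo "FAIL: expected '$expected', got '$actual'"
    return 1
  fi
}

# Usage: an until loop whose body runs twice.
assert_matches_bash 'i=0; until [ "$i" -ge 2 ]; do echo "$i"; i=$((i+1)); done' '0
1'
```

A scenario that diverges from bash on purpose would skip this comparison and assert on the sandboxed output instead.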

Checklist

  • Tests added/updated
  • Documentation updated (if applicable)

@datadog-prod-us1-4

datadog-prod-us1-4 Bot commented May 19, 2026

Pipelines

⚠️ Warnings

🚦 4 Pipeline jobs failed

CI | Test (windows-latest)

5 tests failed due to unexpected errors related to file operations and syntax issues in test cases in the Windows environment.

Compliance | compliance

Header line mismatch in multiple test files: the expected license header was not found.

Fuzz Tests | Fuzz Differential (wc)

wc -w stdout mismatch: expected '1 input.txt\n' but got '2 input.txt\n' in wc_differential_fuzz_test.go:171.

🔗 Commit SHA: 710b093

@AlexandreYang AlexandreYang changed the title from "campaign with gpt 5.5 cyber" to "vuln-hunt: add coverage for brace groups, functions, heredocs, loops, and builtins" May 20, 2026
@AlexandreYang AlexandreYang changed the title from "vuln-hunt: add coverage for brace groups, functions, heredocs, loops, and builtins" to "test: add coverage for brace groups, functions, heredocs, loops, and builtins" May 20, 2026
@AlexandreYang
Member Author

@codex review this PR

Please use the following severity format for all findings:

  • P0 (red): Exploitable vulnerability with high impact (RCE, sandbox bypass, data breach). Blocking merge.
  • P1 (orange): Likely exploitable or high-risk — correctness bugs vs bash, data races, panics.
  • P2 (yellow): Potential vulnerability, bash divergence, missing test coverage, missing docs.
  • P3 (blue): Style, minor simplification, hardening suggestion, nice-to-have test.

Prefix each finding title with its priority label, e.g. "P0: ...", "P1: ...", etc.
Include a summary table at the top with columns: # | Priority | File | Finding.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e72f2db57

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

{ cat; } </etc/passwd
cat <ok.txt
expect:
stdout: "ok-payload"

P3: Use required block scalar for scenario output

AGENTS.md for /workspace/rshell says to “Always use the YAML |+ block scalar for input.script, expect.stdout, and expect.stderr values, even single-line ones.” This new scenario uses an inline scalar for expect.stdout here, and the same pattern appears in several of the added scenario files, so the tests do not follow the repository’s documented scenario format.
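
One way to locate the affected lines is a small grep-based check. The sketch below is an assumption-laden illustration: `find_inline_scalars` is a hypothetical helper that does not exist in the repository, and the sample scenario layout (an `input` mapping with `script`, an `expect` mapping with `stdout`) is inferred from the field names quoted above.

```shell
#!/bin/sh
# Hypothetical lint helper (not part of the repository): print every
# script/stdout/stderr line whose value is not a |+ block scalar.
find_inline_scalars() {
  grep -nE '^[[:space:]]*(script|stdout|stderr):' "$@" |
    grep -vE ':[[:space:]]*[|][+][[:space:]]*$'
}

# Sample scenario; the layout and values are illustrative assumptions.
scenario=$(mktemp)
cat >"$scenario" <<'EOF'
input:
  script: |+
    echo ok-payload
expect:
  stdout: "ok-payload"
EOF

find_inline_scalars "$scenario"   # prints: 5:  stdout: "ok-payload"
rm -f "$scenario"
```

When converting a flagged value, keep in mind that a non-empty `|+` block scalar always ends with a newline, so the expected text has to match the command's real trailing newline.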

@AlexandreYang AlexandreYang reopened this May 21, 2026
@AlexandreYang AlexandreYang changed the title from "test: add coverage for brace groups, functions, heredocs, loops, and builtins" to "regression tests from cyber2 and cyber3" May 21, 2026