RubyLLM Responses API

A RubyLLM provider for OpenAI's Responses API.

Installation

Add to your Gemfile:

gem 'ruby_llm-responses_api'

Quick Start

require 'ruby_llm-responses_api'

RubyLLM.configure do |config|
  config.openai_api_key = ENV['OPENAI_API_KEY']
end

chat = RubyLLM.chat(model: 'gpt-4o-mini', provider: :openai_responses)
response = chat.ask("Hello!")
puts response.content

All standard RubyLLM features work as expected (streaming, tools, vision, structured output).
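
For example, streaming uses the standard block form, and custom tools register through RubyLLM's tool DSL (the Weather tool below is a hypothetical illustration, not part of this gem):

# Streaming: the block receives chunks as they arrive
chat.ask("Explain pattern matching in Ruby") { |chunk| print chunk.content }

# A hypothetical custom tool using the standard RubyLLM tool DSL
class Weather < RubyLLM::Tool
  description "Returns the current weather for a city"
  param :city, desc: "City name, e.g. 'Berlin'"

  def execute(city:)
    { city: city, temperature_c: 21 }  # stubbed result for illustration
  end
end

chat.with_tool(Weather).ask("What's the weather in Berlin?")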

Stateful Conversations

Conversations automatically chain via previous_response_id:

chat = RubyLLM.chat(model: 'gpt-4o-mini', provider: :openai_responses)
chat.ask("My name is Alice.")
chat.ask("What's my name?")  # => "Your name is Alice."

Rails Persistence

For conversations that survive app restarts, add a migration:

class AddResponseIdToMessages < ActiveRecord::Migration[7.0]
  def change
    add_column :messages, :response_id, :string
  end
end
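
This assumes the standard RubyLLM Rails integration, where the Chat and Message models use RubyLLM's acts_as helpers; a minimal sketch:

# Hypothetical models following the stock RubyLLM Rails setup
class Chat < ApplicationRecord
  acts_as_chat
end

class Message < ApplicationRecord
  acts_as_message
end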

Then use normally:

# Day 1
chat = Chat.create!(model_id: 'gpt-4o-mini', provider: :openai_responses)
chat.ask("My name is Alice.")

# Day 2 (after restart)
chat = Chat.find(1)
chat.ask("What's my name?")  # => "Alice"

Built-in Tools

The Responses API provides built-in tools that don't require custom implementation. Pass them as hashes via with_params, or use the BuiltInTools helper module.

Web Search

chat.with_params(tools: [{ type: 'web_search_preview' }])
chat.ask("Latest news about Ruby 3.4?")

# Or with helper
tool = RubyLLM::ResponsesAPI::BuiltInTools.web_search(search_context_size: 'high')
chat.with_params(tools: [tool])

Code Interpreter

Execute Python code in a sandbox:

chat.with_params(tools: [{ type: 'code_interpreter' }])
chat.ask("Calculate the first 20 Fibonacci numbers and plot them")

File Search

Search through uploaded files (requires vector store setup):

chat.with_params(tools: [{ type: 'file_search', vector_store_ids: ['vs_abc123'] }])
chat.ask("What does the documentation say about authentication?")

Shell

Execute commands in hosted containers or local terminal environments. Requires GPT-5 family models.

# Auto-provisioned container (default)
chat = RubyLLM.chat(model: 'gpt-5.2', provider: :openai_responses)
chat.with_params(tools: [{ type: 'shell', environment: { type: 'container_auto' } }])
chat.ask("List all Python files in the project")

# Using helper
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell
chat.with_params(tools: [tool])

# Reuse an existing container
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(container_id: 'cntr_abc123')

# With networking (allow specific domains)
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(
  network_policy: {
    type: 'allowlist',
    allowed_domains: ['pypi.org', 'github.com'],
    domain_secrets: [
      { domain: 'github.com', name: 'GITHUB_TOKEN', value: ENV['GITHUB_TOKEN'] }
    ]
  }
)

# With memory limit
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(memory_limit: '4g')

# Local execution (you handle running commands yourself)
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(environment_type: 'local')

Apply Patch

Structured diff-based file editing. Requires GPT-5 family models.

chat = RubyLLM.chat(model: 'gpt-5.2', provider: :openai_responses)
chat.with_params(tools: [{ type: 'apply_patch' }])
chat.ask("Add error handling to the User#save method")

# Using helper
tool = RubyLLM::ResponsesAPI::BuiltInTools.apply_patch
chat.with_params(tools: [tool])

Image Generation

chat.with_params(tools: [{ type: 'image_generation' }])
chat.ask("Generate an image of a sunset over mountains")

MCP (Model Context Protocol)

tool = RubyLLM::ResponsesAPI::BuiltInTools.mcp(
  server_label: 'github',
  server_url: 'https://api.github.com/mcp',
  require_approval: 'never'
)
chat.with_params(tools: [tool])
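
The same tool can also be passed as a raw hash, following the hash form used by the other built-in tools above; the field names mirror the helper's arguments:

chat.with_params(tools: [{
  type: 'mcp',
  server_label: 'github',
  server_url: 'https://api.github.com/mcp',
  require_approval: 'never'
}])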

Combining Tools

chat.with_params(tools: [
  { type: 'web_search_preview' },
  { type: 'code_interpreter' },
  { type: 'shell', environment: { type: 'container_auto' } }
])
chat.ask("Research the latest sorting algorithms and benchmark them")

Server-Side Compaction

For multi-hour agent runs, enable server-side compaction to automatically compress conversation context when it exceeds a token threshold:

chat = RubyLLM.chat(model: 'gpt-4o', provider: :openai_responses)

# Pass directly
chat.with_params(
  context_management: [{ type: 'compaction', compact_threshold: 200_000 }]
)

# Or use the helper
chat.with_params(
  **RubyLLM::ResponsesAPI::Compaction.compaction_params(compact_threshold: 150_000)
)

# Now run a long agent loop without worrying about context limits
loop do
  response = chat.ask(next_prompt)
  break if done?(response)
end

When the token count crosses the threshold, the server automatically compacts the conversation. The compacted state is carried forward transparently via previous_response_id.

Containers API

Manage persistent execution environments for the shell tool and code interpreter:

chat = RubyLLM.chat(model: 'gpt-5.2', provider: :openai_responses)
provider = chat.instance_variable_get(:@provider)

# Create a container
container = provider.create_container(
  name: 'my-analysis-env',
  expires_after: { anchor: 'last_active_at', minutes: 60 },
  memory_limit: '4g'
)

# Use it with the shell tool
tool = RubyLLM::ResponsesAPI::BuiltInTools.shell(container_id: container['id'])
chat.with_params(tools: [tool])
chat.ask("Install pandas and run my analysis script")

# List files created in the container
files = provider.list_container_files(container['id'])

# Retrieve a specific file
content = provider.retrieve_container_file_content(container['id'], file_id)

# Clean up
provider.delete_container(container['id'])

Background Mode

For long-running tasks:

chat = RubyLLM.chat(model: 'gpt-4o', provider: :openai_responses)
chat.with_params(background: true)
response = chat.ask("Analyze this large dataset...")

# Poll for completion
provider = chat.instance_variable_get(:@provider)
result = provider.poll_response(response.response_id, interval: 2.0) do |status|
  puts "Status: #{status['status']}"
end
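
A sketch of reading the final result, assuming poll_response returns the completed response hash in the raw Responses API format:

# Assumption: result is the raw response body once polling finishes
if result['status'] == 'completed'
  text = result['output']
           .select { |item| item['type'] == 'message' }
           .flat_map { |item| item['content'] }
           .select { |part| part['type'] == 'output_text' }
           .map { |part| part['text'] }
           .join
  puts text
else
  warn "Background response ended with status: #{result['status']}"
end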

Parsing Built-in Tool Results

When the API returns results from built-in tools, use the parsers to extract structured data:

# Access raw response output (available via response.raw)
output = response.raw.body['output']

# Parse results by tool type
web_results    = RubyLLM::ResponsesAPI::BuiltInTools.parse_web_search_results(output)
code_results   = RubyLLM::ResponsesAPI::BuiltInTools.parse_code_interpreter_results(output)
file_results   = RubyLLM::ResponsesAPI::BuiltInTools.parse_file_search_results(output)
shell_results  = RubyLLM::ResponsesAPI::BuiltInTools.parse_shell_call_results(output)
patch_results  = RubyLLM::ResponsesAPI::BuiltInTools.parse_apply_patch_results(output)
image_results  = RubyLLM::ResponsesAPI::BuiltInTools.parse_image_generation_results(output)
citations      = RubyLLM::ResponsesAPI::BuiltInTools.extract_citations(message_content)

Batch API

Process many requests asynchronously at 50% lower cost with a 24-hour completion window:

# Create a batch
batch = RubyLLM.batch(model: 'gpt-4o', provider: :openai_responses)

# Add requests (auto-generates IDs or use your own)
batch.add("What is Ruby?")
batch.add("What is Python?", instructions: "Be brief", temperature: 0.5)
batch.add("Translate: hello", id: "translate_1")

# Submit (uploads JSONL file + creates batch)
batch.create!
batch.id  # => "batch_abc123"

# Poll until done
batch.wait!(interval: 60) { |b| puts "#{b.completed_count}/#{b.total_count}" }

# Get results as Messages keyed by custom_id
results = batch.results
results["request_0"].content  # => "Ruby is a dynamic..."
results["translate_1"].content  # => "Hola"

# Resume from a previous session
batch = RubyLLM.batch(id: "batch_abc123", provider: :openai_responses)
batch.results

# Cancel a running batch
batch.cancel!

# List existing batches
RubyLLM.batches(provider: :openai_responses)

Constraints: no web_search or code_interpreter tools, no previous_response_id chaining, a maximum of 50,000 requests per batch, and a 200 MB file limit.

WebSocket Mode

For agentic workflows with many tool-call round trips, WebSocket mode provides lower latency by maintaining a persistent connection instead of issuing a new HTTP request per turn.

Requires the websocket-client-simple gem:

gem 'websocket-client-simple'

Usage

Add transport: :websocket to your params; the standard chat.ask API works as-is:

chat = RubyLLM.chat(model: 'gpt-4o', provider: :openai_responses)
chat.with_params(transport: :websocket)

chat.ask("Hello!")
chat.ask("What's 2+2?")  # reuses the same WebSocket connection

Streaming works the same way:

chat.ask("Tell me a story") { |chunk| print chunk.content }

Direct WebSocket access

For advanced use cases (raw Responses API format, warmup, explicit connection management):

ws = RubyLLM::ResponsesAPI::WebSocket.new(api_key: ENV['OPENAI_API_KEY'])
ws.connect

ws.create_response(
  model: 'gpt-4o',
  input: [{ type: 'message', role: 'user', content: 'Hello!' }]
) { |chunk| print chunk.content }

# Pre-cache model weights
ws.warmup(model: 'gpt-4o')

ws.disconnect

Why Use the Responses API?

  • Built-in tools - Web search, code execution, file search, shell, apply patch without custom implementation
  • Stateful conversations - OpenAI stores context server-side via previous_response_id
  • Simpler multi-turn - No need to send full message history on each request
  • Server-side compaction - Run multi-hour agent sessions without hitting context limits
  • Containers - Persistent execution environments with networking and file management
  • WebSocket mode - Lower-latency persistent connections for agentic tool-call loops
  • Batch API - Process bulk requests at 50% lower cost with 24-hour turnaround

License

MIT
