This directory contains examples of the SOFAI sampling strategy - a two-tier approach using fast and slow models.
Complete example using SOFAI for constraint satisfaction problems (graph coloring).
Key Features:
- Two-tier model architecture (fast S1, slow S2)
- Iterative feedback loop with fast model
- Escalation to slow model when needed
- Custom validation with detailed feedback
- Consumer-grade hardware friendly
Concepts demonstrated:
- Two-Tier Architecture: Fast model for iteration, slow model for hard cases
- Escalation Strategy: When to switch from fast to slow model
- Feedback Loops: Providing validation feedback to guide generation
- Constraint Satisfaction: Solving CSP problems with LLMs
- Cost Optimization: Using expensive models only when necessary
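
The constraint-satisfaction and feedback ideas above can be illustrated without any model in the loop. The following is a minimal sketch, assuming a graph given as an edge list and a coloring given as a dict; `validate_coloring` is a hypothetical helper written for this illustration and is not part of mellea. It returns one message per violation, which is the kind of detailed feedback a fast model can act on in its next attempt.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def validate_coloring(
    edges: List[Edge], coloring: Dict[str, str], allowed: List[str]
) -> List[str]:
    """Return human-readable violations; an empty list means the coloring is valid."""
    errors: List[str] = []
    # Every node that appears in an edge must have an allowed color.
    nodes = sorted({node for edge in edges for node in edge})
    for node in nodes:
        if node not in coloring:
            errors.append(f"Node {node} has no color.")
        elif coloring[node] not in allowed:
            errors.append(f"Node {node} uses unknown color {coloring[node]!r}.")
    # Adjacent nodes must not share a color.
    for a, b in edges:
        if a in coloring and b in coloring and coloring[a] == coloring[b]:
            errors.append(f"Adjacent nodes {a} and {b} share color {coloring[a]!r}.")
    return errors


palette = ["red", "green", "blue"]
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
good = validate_coloring(triangle, {"A": "red", "B": "green", "C": "blue"}, palette)
bad = validate_coloring(triangle, {"A": "red", "B": "red"}, palette)
```

Here `good` is an empty list, while `bad` reports both the uncolored node and the conflicting edge, so the next attempt can address each problem by name.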
User Request
      ↓
S1 (Fast Model) ←──────┐
      ↓                │
  Validation           │
      ↓                │
Pass? ──No→ Feedback ──┘
  │
 Yes
  ↓
Success? ──No→ Escalate to S2 (Slow Model)
  │                     ↓
 Yes                Validation
  ↓                     ↓
Result                Result
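
The flow in the diagram can be sketched in plain Python. This is an illustration of the control flow only, not the library's implementation: `sofai_loop`, the stub models, and the `"s1"`/`"s2"`/`"failed"` labels are hypothetical names, and returning `None` when the slow model also fails validation is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Tuple

Generate = Callable[[str, List[str]], str]  # (prompt, feedback) -> candidate
Validate = Callable[[str], List[str]]       # candidate -> list of errors


def sofai_loop(
    prompt: str,
    s1: Generate,
    s2: Generate,
    validate: Validate,
    s1_loop_budget: int = 3,
) -> Tuple[Optional[str], str]:
    """Iterate with the fast model, then escalate once to the slow model."""
    feedback: List[str] = []
    for _ in range(s1_loop_budget):
        candidate = s1(prompt, feedback)
        feedback = validate(candidate)
        if not feedback:  # no errors: the fast model succeeded
            return candidate, "s1"
    # Fast-model budget exhausted: hand the last feedback to the slow model.
    candidate = s2(prompt, feedback)
    if not validate(candidate):
        return candidate, "s2"
    return None, "failed"


def check(candidate: str) -> List[str]:
    return [] if candidate == "42" else [f"Expected 42, got {candidate}."]


def stubborn_fast(prompt: str, feedback: List[str]) -> str:
    return "41"  # never improves, so escalation is required


def learning_fast(prompt: str, feedback: List[str]) -> str:
    return "42" if feedback else "41"  # corrects itself after feedback


def slow(prompt: str, feedback: List[str]) -> str:
    return "42"


escalated = sofai_loop("answer?", stubborn_fast, slow, check)
recovered = sofai_loop("answer?", learning_fast, slow, check)
```

With these stubs, `escalated` is resolved by the slow model and `recovered` is resolved by the fast model on its second attempt.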
from mellea import start_session
from mellea.backends.ollama import OllamaModelBackend
from mellea.stdlib.sampling import SOFAISamplingStrategy
from mellea.stdlib.requirements import req

# Create fast and slow backends
s1_backend = OllamaModelBackend(model_id="granite4:micro")
s2_backend = OllamaModelBackend(model_id="granite4:latest")

# Create SOFAI strategy
strategy = SOFAISamplingStrategy(
    s1_backend=s1_backend,
    s2_backend=s2_backend,
    s1_loop_budget=3,  # Try fast model 3 times
    escalate_on_failure=True
)

# Use with requirements
m = start_session()
result = m.instruct(
    "Solve the problem...",
    requirements=[req("Must satisfy constraint X")],
    strategy=strategy
)

SOFAISamplingStrategy(
    s1_backend=fast_backend,        # Fast model for iteration
    s2_backend=slow_backend,        # Slow model for escalation
    s1_loop_budget=3,               # Max attempts with fast model
    escalate_on_failure=True,       # Use slow model if fast fails
    feedback_strategy="all_errors"  # How to provide feedback
)

feedback_strategy options:
- "simple": Basic pass/fail feedback
- "first_error": Feedback on first failing requirement
- "all_errors": Detailed feedback on all failures
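
To make the difference between the three feedback modes concrete, here is a minimal sketch of how a list of validation errors might be turned into a feedback message under each mode. `build_feedback` and the exact message wording are hypothetical; the library's actual formatting may differ.

```python
from typing import List


def build_feedback(errors: List[str], feedback_strategy: str = "simple") -> str:
    """Format validation errors according to the chosen feedback mode."""
    if not errors:
        return "All requirements passed."
    if feedback_strategy == "simple":
        # Pass/fail only: no detail about what went wrong.
        return "Validation failed."
    if feedback_strategy == "first_error":
        return f"Validation failed: {errors[0]}"
    if feedback_strategy == "all_errors":
        return "Validation failed:\n" + "\n".join(f"- {e}" for e in errors)
    raise ValueError(f"Unknown feedback strategy: {feedback_strategy!r}")


errors = ["Node C has no color.", "Adjacent nodes A and B share a color."]
simple_msg = build_feedback(errors, "simple")
first_msg = build_feedback(errors, "first_error")
all_msg = build_feedback(errors, "all_errors")
```

More detailed modes give the fast model more to work with on each retry, at the cost of a longer prompt.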
Use cases:
- Constraint Satisfaction: Graph coloring, scheduling, planning
- Complex Reasoning: Multi-step problems requiring iteration
- Cost Optimization: Use expensive models only when needed
- Quality Assurance: Fast iteration with slow model fallback
Fast models (S1):
- granite4:micro
- llama3.2:3b
- mistral:7b
Slow models (S2):
- granite4:latest
- llama3:70b
- mixtral:8x7b
Tips:
- Choose an appropriate S1 budget: Balance speed vs. success rate
- Use specific requirements: Clear constraints help both models
- Provide good feedback: Detailed validation helps iteration
- Profile your use case: Measure when escalation is needed
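
As a rough aid for choosing the S1 budget, the sketch below estimates how often escalation would occur and how many fast-model calls a request uses on average. It assumes each fast-model attempt succeeds independently with a fixed probability, which is a simplification, since feedback usually changes the success rate between attempts. Both function names are hypothetical, and the success probability should come from your own measurements.

```python
def escalation_probability(p_success: float, s1_loop_budget: int) -> float:
    """Probability that every fast-model attempt fails (assumes independence)."""
    return (1.0 - p_success) ** s1_loop_budget


def expected_s1_calls(p_success: float, s1_loop_budget: int) -> float:
    """Average number of fast-model calls per request.

    Attempt k (0-indexed) only happens if the previous k attempts all failed.
    """
    return sum((1.0 - p_success) ** k for k in range(s1_loop_budget))


# Example: the fast model passes validation half of the time.
p_escalate = escalation_probability(0.5, 3)
avg_calls = expected_s1_calls(0.5, 3)
```

Under these assumptions, a budget of 3 with a 50% per-attempt success rate escalates one request in eight and averages 1.75 fast-model calls per request.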
- See mellea/stdlib/sampling/sofai.py for implementation
- See test/stdlib/sampling/test_sofai_*.py for more examples
- See mellea/stdlib/sampling/ for other sampling strategies