Build, deploy, and manage production AI agents with a single API. Multi-provider LLM orchestration, sandboxed code execution, versioned storage, and real-time streaming — all out of the box.
Works with every major LLM provider
Watch an agent build and deploy a live website — entirely through the API.
A complete backend for AI agents — from LLM calls to live deployment. No glue code, no duct tape.
Anthropic, OpenAI, Google, Ollama, Azure, OpenRouter. Switch providers in one field. Prompt caching cuts costs 85-95% automatically.
Python and Bash sandboxes with a virtual filesystem. Write open("public/page.html", "w") and it's live instantly. No deploy step.
Every change tracked with SHA-256 dedup. Up to 100 versions per object. Rollback to any previous state with a single API call.
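The dedup principle fits in a few lines of shell: identical bytes always hash to the same digest, so re-storing unchanged content costs nothing, while any change produces a new version. (Illustrative only; the real bookkeeping is server-side.)

```shell
# Content-addressed dedup sketch: hash the bytes, compare digests.
digest() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }
a=$(digest 'hello world')
b=$(digest 'hello world')
c=$(digest 'hello world!')
[ "$a" = "$b" ] && echo "identical content: stored once"
[ "$a" != "$c" ] && echo "changed content: new version"
```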
7 hook points — Python or Lua scripts that agents can rewrite themselves. Control stop conditions, tool filters, and result transforms on the fly.
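The shape of a tool-filter hook, sketched here in shell for brevity (actual Softref hooks are Python or Lua scripts): given a proposed tool call, decide whether it runs.

```shell
# Tool-filter sketch: allow only the sandboxed tools, block everything else.
filter_tool() {
  case "$1" in
    bash|python) echo allow ;;  # sandboxed tools pass
    *)           echo deny  ;;  # anything else is blocked
  esac
}
filter_tool python   # allow
filter_tool rm       # deny
```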
Agents spawn agents with configurable depth and count limits. Structured results let orchestrators read outcomes without parsing messages.
Per-project spend limits, per-session budgets, and sliding-window rate limiting. Agents pause gracefully — no runaway bills, ever.
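A sliding window admits a request only while the count of recent requests inside the window is under the limit. A minimal local sketch of the idea (enforcement happens server-side):

```shell
# Sliding-window rate limiter sketch: keep timestamps inside the window,
# admit a request only while fewer than LIMIT remain.
LIMIT=3 WINDOW=60
timestamps=""
allow() {
  now=$1 kept="" count=0
  for t in $timestamps; do
    [ $((now - t)) -lt $WINDOW ] && { kept="$kept $t"; count=$((count + 1)); }
  done
  timestamps=$kept
  if [ $count -lt $LIMIT ]; then
    timestamps="$timestamps $now"; echo allowed
  else
    echo throttled
  fi
}
allow 0; allow 10; allow 20   # three requests inside the window: allowed
allow 30                      # fourth within 60s: throttled
allow 70                      # window slid past t=0 and t=10: allowed
```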
No SDKs to install, no frameworks to learn. Just HTTP.
curl -X POST https://api.softref.com/v1/sessions \
-H "Authorization: Bearer $TOKEN" \
-d '{"model": "anthropic/claude-sonnet-4-6",
"system": "You are a creative web developer."}'
Pick any model from any provider. Add a system prompt, tools config, hooks, or budget — all optional.
curl -X POST https://api.softref.com/v1/sessions/$ID/messages \
-d '{"role": "user",
"content": "Build a landing page for a coffee shop with
a hero section, menu, and contact form."}'
curl -X POST https://api.softref.com/v1/sessions/$ID/run
The agent loop takes over — LLM reasoning, tool execution, file writes, all handled automatically.
curl https://api.softref.com/v1/sessions/$ID/stream
event: content_delta
data: {"text": "I'll build this with a modern, warm design..."}
event: tool_started
data: {"name": "bash"}
event: tool_completed
data: {"files_written": ["public/index.html"]}
event: session_completed
data: {"status": "idle"}
12 SSE event types give you full visibility into reasoning, tool calls, and results — as they happen.
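The stream is plain SSE, so even shell can consume it. A minimal parser, fed a simulated stream here; in practice you'd pipe the unbuffered curl output (curl -N) into it:

```shell
# Minimal SSE consumer: track the current event name, print each data line.
parse_sse() {
  while IFS= read -r line; do
    case "$line" in
      event:*) event=${line#event: } ;;
      data:*)  echo "[$event] ${line#data: }" ;;
    esac
  done
}
parse_sse <<'EOF'
event: tool_completed
data: {"files_written": ["public/index.html"]}
event: session_completed
data: {"status": "idle"}
EOF
```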
Architecture
Most agent frameworks run a while-loop in memory. If the process crashes, everything is lost. Softref uses a ratchet pattern — each LLM call is a persistent job. Server restart? The agent picks up exactly where it left off.
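The ratchet idea in miniature: persist after every step, resume from the persisted step. A local sketch of the pattern, not Softref's implementation:

```shell
# Ratchet-pattern sketch: progress is written to durable state after each
# step, so a restart resumes instead of replaying the whole loop.
state=$(mktemp)
echo 0 > "$state"
run_agent() {
  step=$(cat "$state")
  while [ "$step" -lt 3 ]; do
    step=$((step + 1))
    echo "step $step done"
    echo "$step" > "$state"   # the ratchet: progress only ever moves forward
  done
}
run_agent          # runs steps 1, 2, 3
echo 1 > "$state"  # simulate a crash that left the job after step 1
run_agent          # resumes at step 2 — nothing is repeated
```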
Beyond the basics — everything you need to run agents in production.
12 SSE event types — content deltas, tool starts/completions, thinking tokens. 30-second heartbeat. Works with every provider.
Connect from Claude Desktop, Cursor, or any MCP-compatible client. Full OAuth 2.1 with PKCE, dynamic registration, and token rotation.
Any object with a public/ prefix is served as a live web page. ETag caching, try_files resolution, custom domains.
HMAC-signed HTTP callbacks for session events — completions, errors, tool calls, spend limit breaches. Async delivery with retry.
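Verifying such a callback typically means recomputing the HMAC over the raw body with your shared secret and comparing it to the signature you received. A hypothetical check with openssl — the secret format, header name, and digest encoding here are assumptions, not Softref's documented scheme:

```shell
# Hypothetical webhook signature check: HMAC-SHA256 over the raw body.
secret='whsec_example'
body='{"event":"session_completed","status":"idle"}'
sign() { printf '%s' "$1" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'; }
received=$(sign "$body")   # stand-in for the signature header you'd receive
if [ "$(sign "$body")" = "$received" ]; then
  echo "signature valid"
else
  echo "rejected"          # body was tampered with, or wrong secret
fi
```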
Schedule agent runs on any cron expression. Execution history, manual triggers, and validation — all through the API.
Project-level chat rooms for humans and agents. Enable human-in-the-loop workflows, multi-agent coordination, and event-driven hooks.
Define custom tools as Lua scripts in object storage. Access objects, HTTP, and JSON APIs. Hot-reloaded on change — no redeploy needed.
Every API request returns a trace ID. Query spans, timings, and errors via the traces API. Full visibility into every LLM call and tool execution.
Branch a session at any message to explore alternative paths. Regenerate the last response, compact long histories, export full transcripts.
No vendor lock-in. No proprietary SDKs. Just a REST API that does what you expect.
Every feature is available through the REST API. The dashboard is just a client. Build your own UI, or go headless.
Use your own API keys for any provider. Stored encrypted as project secrets. Switch models per-session without code changes.
Automatic LLM-based summarization when conversations approach the context window. Model-aware thresholds. Custom compaction prompts via hooks.
Every LLM call logged with tokens, cost, latency. Every tool execution tracked. Session export gives you the complete history as JSON.