Now in public beta

The infrastructure for
AI agents

Build, deploy, and manage production AI agents with a single API. Multi-provider LLM orchestration, sandboxed code execution, versioned storage, and real-time streaming — all out of the box.

Works with every major LLM provider

Anthropic
OpenAI
Google
Azure
Ollama
OpenRouter

From prompt to production in seconds

Watch an agent build and deploy a live website — entirely through the API.

terminal
$ curl -X POST https://api.softref.com/v1/sessions/run \
  -d '{"message": "Build a landing page for a coffee shop"}'
 
event: content_delta
data: {"text": "I'll create a beautiful coffee shop landing page..."}
 
event: tool_started
data: {"name": "bash", "input": "python build_site.py"}
 
event: tool_completed
data: {"files_written": ["public/index.html", "public/style.css"]}
 
event: content_delta
data: {"text": "Your site is live! Here's the URL:"}
 
event: session_completed
data: {"status": "idle"}
 
$ https://coffee-shop.softref.com ✓ Live

Everything agents need to ship

A complete backend for AI agents — from LLM calls to live deployment. No glue code, no duct tape.

Multi-Provider LLM

Anthropic, OpenAI, Google, Ollama, Azure, OpenRouter. Switch providers in one field. Prompt caching cuts costs 85-95% automatically.

Sandboxed Execution

Python and Bash sandboxes with a virtual filesystem. Write open("public/page.html", "w") and it's live instantly. No deploy step.

Versioned Objects

Every change tracked with SHA256 dedup. Up to 100 versions per object. Rollback to any previous state with a single API call.
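The combination of SHA256 dedup and bounded version history can be pictured with a small in-memory sketch. The class below is purely illustrative and is not the Softref storage API; it just shows the technique: identical content is stored once, while every write still appends a version entry, capped at 100.

```python
import hashlib

class VersionedStore:
    """Illustrative sketch of SHA256-based dedup with capped history.
    Not the Softref API; names and structure are assumptions."""
    def __init__(self, max_versions=100):
        self.blobs = {}       # sha256 hex -> content (stored once)
        self.versions = {}    # object key -> list of digests, newest last
        self.max_versions = max_versions

    def put(self, key, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)   # dedup: identical bytes share a blob
        history = self.versions.setdefault(key, [])
        history.append(digest)
        del history[:-self.max_versions]         # keep at most N versions
        return digest

    def rollback(self, key, steps_back=1) -> bytes:
        history = self.versions[key]
        return self.blobs[history[-1 - steps_back]]

store = VersionedStore()
store.put("public/index.html", b"<h1>v1</h1>")
store.put("public/index.html", b"<h1>v2</h1>")
store.put("public/index.html", b"<h1>v1</h1>")  # same bytes as v1: no new blob
```

Note that the third write adds a version entry but no new blob, so rolling back one step returns the v2 content.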

Self-Programming Hooks

7 hook points — Python or Lua scripts that agents can rewrite themselves. Control stop conditions, tool filters, and result transforms on the fly.
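As a rough illustration, a stop-condition hook might look like the following. The function signature and the context fields used here are assumptions for the sketch, not the documented hook interface:

```python
def should_stop(context: dict) -> bool:
    """Illustrative should_stop hook. The context fields used here
    (iteration, max_iterations, last_stop_reason) are assumed names,
    not the documented Softref hook contract."""
    # Stop once the loop has run long enough...
    if context.get("iteration", 0) >= context.get("max_iterations", 25):
        return True
    # ...or when the model signals it is done.
    return context.get("last_stop_reason") == "end_turn"
```

Because hooks live in object storage, an agent can rewrite this logic mid-run, which is what "self-programming" means here.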

Sub-Agent Orchestration

Agents spawn agents with configurable depth and count limits. Structured results let orchestrators read outcomes without parsing messages.
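The guardrails reduce to a simple pre-spawn check. The function below is a sketch; the names and default limits are illustrative, not the platform's actual configuration keys:

```python
def can_spawn(depth: int, active_children: int,
              max_depth: int = 3, max_children: int = 5) -> bool:
    """Sketch of sub-agent guardrails: block spawns that would exceed
    the configured depth or fan-out limits (illustrative defaults)."""
    return depth < max_depth and active_children < max_children
```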

Budget & Rate Limits

Per-project spend limits, per-session budgets, and sliding-window rate limiting. Agents pause gracefully — no runaway bills, ever.
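Sliding-window rate limiting works by counting only the calls inside a trailing time window. A minimal sketch, assuming a simple allow/pause decision (not the Softref implementation):

```python
from collections import deque

class SlidingWindowLimiter:
    """Sketch of a sliding-window rate limiter: allow at most
    `limit` calls in any trailing `window` seconds."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.calls = deque()   # timestamps of recent calls

    def allow(self, now: float) -> bool:
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()          # drop calls that aged out of the window
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False                      # caller pauses and retries, not fails

lim = SlidingWindowLimiter(limit=2, window=60)
```

A `False` here maps to the "pause gracefully" behavior above: the agent snoozes until the oldest call ages out rather than erroring.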

Three API calls to a running agent

No SDKs to install, no frameworks to learn. Just HTTP.

1

Create a session

bash
curl -X POST https://api.softref.com/v1/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-6",
       "system": "You are a creative web developer."}'

Pick any model from any provider. Add a system prompt, tools config, hooks, or budget — all optional.

2

Send a message and run

bash
curl -X POST https://api.softref.com/v1/sessions/$ID/messages \
  -d '{"role": "user",
       "content": "Build a landing page for a coffee shop with
       a hero section, menu, and contact form."}'

curl -X POST https://api.softref.com/v1/sessions/$ID/run

The agent loop takes over — LLM reasoning, tool execution, file writes, all handled automatically.

3

Stream results in real time

bash
curl -N https://api.softref.com/v1/sessions/$ID/stream

event: content_delta
data: {"text": "I'll build this with a modern, warm design..."}

event: tool_started
data: {"name": "bash"}

event: tool_completed
data: {"files_written": ["public/index.html"]}

event: session_completed
data: {"status": "idle"}

12 SSE event types give you full visibility into reasoning, tool calls, and results — as they happen.
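Consuming the stream means grouping `event:` and `data:` lines into discrete events. The parser below is a minimal sketch of that grouping for illustration; a production client should use a proper SSE library:

```python
import json

def parse_sse(stream_text: str):
    """Minimal SSE parser sketch: collects `event:`/`data:` lines into
    (event_name, payload) pairs. A blank line terminates each event."""
    events, name, data = [], None, []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and name is not None:      # blank line ends an event
            events.append((name, json.loads("\n".join(data))))
            name, data = None, []
    return events

raw = (
    "event: content_delta\n"
    'data: {"text": "Hello"}\n'
    "\n"
    "event: session_completed\n"
    'data: {"status": "idle"}\n'
    "\n"
)
events = parse_sse(raw)
```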

Architecture

Crash-resilient by design

Most agent frameworks run a while-loop in memory. If the process crashes, everything is lost. Softref uses a ratchet pattern — each LLM call is a persistent job. Server restart? The agent picks up exactly where it left off.

  • Each iteration is a discrete, retryable job — not an in-memory loop
  • Rate limits trigger automatic snooze and retry, not failure
  • Messages persist before the next job enqueues — zero data loss
  • Transient errors (429, 500, 503) get exponential backoff automatically
Job starts → validate session
    check rate limits, budget, iteration count
Call LLM (streaming)
    content_delta, thinking_delta events
Execute tools (up to 5 concurrent)
    python, bash, http, objects, sub-agents
Persist messages → enqueue next job
    compact if needed, check should_stop hook
Repeat until end_turn or limit
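Exponential backoff for transient errors is a standard pattern; a sketch with full jitter follows. The base, cap, and retry count are illustrative defaults, not the platform's actual retry policy:

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0,
                   cap: float = 60.0, seed: int = None) -> list:
    """Sketch of exponential backoff with full jitter for transient
    errors such as 429/500/503. Defaults are illustrative only."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)   # 1, 2, 4, 8, 16 seconds
        delays.append(rng.uniform(0, ceiling))    # jitter spreads retry bursts
    return delays

delays = backoff_delays(seed=7)
```

Jitter matters here: without it, many agents rate-limited at the same moment would all retry at the same moment, too.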

The complete agent platform

Beyond the basics — everything you need to run agents in production.

Real-Time Streaming

12 SSE event types — content deltas, tool starts/completions, thinking tokens. 30-second heartbeat. Works with every provider.

MCP + OAuth 2.1

Connect from Claude Desktop, Cursor, or any MCP-compatible client. Full OAuth 2.1 with PKCE, dynamic registration, and token rotation.

Instant Publishing

Any object with a public/ prefix is served as a live web page. ETag caching, try_files resolution, custom domains.
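The try_files resolution order can be sketched roughly as follows; the exact lookup rules are an assumption here, not the documented behavior:

```python
def resolve(path: str, objects: set) -> str:
    """Sketch of try_files-style resolution: try the exact object key,
    then fall back to an index.html under that prefix (assumed order)."""
    for candidate in (path, path.rstrip("/") + "/index.html"):
        if candidate in objects:
            return candidate
    return None

objects = {"public/index.html", "public/style.css"}
```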

Webhooks

HMAC-signed HTTP callbacks for session events — completions, errors, tool calls, spend limit breaches. Async delivery with retry.
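Verifying an HMAC-signed callback looks roughly like this. The secret format and hex encoding are assumptions for the sketch; check the actual webhook docs for the real header name and scheme:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Sketch of HMAC-SHA256 webhook verification. Recompute the MAC
    over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"   # illustrative secret, not a real format
body = b'{"event": "session_completed", "session_id": "abc123"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
```

`hmac.compare_digest` avoids the timing side channel of a naive `==` comparison.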

Cron Scheduling

Schedule agent runs on any cron expression. Execution history, manual triggers, and validation — all through the API.

Channels

Project-level chat rooms for humans and agents. Enable human-in-the-loop workflows, multi-agent coordination, and event-driven hooks.

Custom Lua Tools

Define custom tools as Lua scripts in object storage. Access objects, HTTP, and JSON APIs. Hot-reloaded on change — no redeploy needed.

OpenTelemetry Tracing

Every API request returns a trace ID. Query spans, timings, and errors via the traces API. Full visibility into every LLM call and tool execution.

Session Branching

Branch a session at any message to explore alternative paths. Regenerate the last response, compact long histories, export full transcripts.

Built for developers

No vendor lock-in. No proprietary SDKs. Just a REST API that does what you expect.

API-first

Every feature is available through the REST API. The dashboard is just a client. Build your own UI, or go headless.

Bring your own keys

Use your own API keys for any provider. Stored encrypted as project secrets. Switch models per-session without code changes.

Context-aware compaction

Automatic LLM-based summarization when conversations approach the context window. Model-aware thresholds. Custom compaction prompts via hooks.
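A model-aware threshold check is the trigger side of this; summarization itself is an LLM call. The window sizes and 0.8 threshold below are assumptions for illustration, not the platform's actual values:

```python
# Illustrative context windows; real limits vary by provider and model.
CONTEXT_WINDOWS = {
    "anthropic/claude-sonnet-4-6": 200_000,
    "openai/gpt-4o": 128_000,
}

def should_compact(model: str, used_tokens: int, threshold: float = 0.8) -> bool:
    """Sketch of a model-aware compaction trigger: summarize once token
    usage crosses a fraction of the model's context window."""
    window = CONTEXT_WINDOWS.get(model, 128_000)   # conservative fallback
    return used_tokens >= window * threshold
```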

Full audit trail

Every LLM call logged with tokens, cost, latency. Every tool execution tracked. Session export gives you the complete history as JSON.

Ready to build?

Get started in under a minute. No credit card required.

Start building for free