Now in public beta

The infrastructure for
AI agents

Build, deploy, and manage production AI agents with a single API. Multi-provider LLM orchestration, sandboxed code execution, versioned storage, and real-time streaming — all out of the box.

Works with every major LLM provider

Anthropic
OpenAI
Google
Azure
Ollama
OpenRouter

From prompt to production in seconds

Watch an agent build and deploy a live website — entirely through the API.

terminal
$ curl -X POST https://api.softref.com/v1/sessions/run \
  -d '{"message": "Build a landing page for a coffee shop"}'
 
event: content_delta
data: {"text": "I'll create a beautiful coffee shop landing page..."}
 
event: tool_started
data: {"name": "bash", "input": "python build_site.py"}
 
event: tool_completed
data: {"files_written": ["public/index.html", "public/style.css"]}
 
event: content_delta
data: {"text": "Your site is live! Here's the URL:"}
 
event: session_completed
data: {"status": "idle"}
 
$ https://coffee-shop.softref.com ✓ Live

Everything agents need to ship

A complete backend for AI agents — from LLM calls to live deployment. No glue code, no duct tape.

Multi-Provider LLM

Anthropic, OpenAI, Google, Ollama, Azure, OpenRouter. Switch providers in one field. Prompt caching cuts costs 85-95% automatically.

Sandboxed Execution

Python and Bash sandboxes with a virtual filesystem. Write open("public/page.html", "w") and it's live instantly. No deploy step.

Versioned Objects

Every change tracked with SHA256 dedup. Up to 100 versions per object. Rollback to any previous state with a single API call.
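The combination of SHA256 dedup and bounded version history can be pictured with a small in-memory sketch. The class below is purely illustrative and is not the Softref storage API; it just shows the technique: identical content is stored once, while every write still appends a version entry, capped at 100.

```python
import hashlib

class VersionedStore:
    """Illustrative sketch of SHA256-based dedup with capped history.
    Not the Softref API; names and structure are assumptions."""
    def __init__(self, max_versions=100):
        self.blobs = {}       # sha256 hex -> content (stored once)
        self.versions = {}    # object key -> list of digests, newest last
        self.max_versions = max_versions

    def put(self, key, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)   # dedup: identical bytes share a blob
        history = self.versions.setdefault(key, [])
        history.append(digest)
        del history[:-self.max_versions]         # keep at most N versions
        return digest

    def rollback(self, key, steps_back=1) -> bytes:
        history = self.versions[key]
        return self.blobs[history[-1 - steps_back]]

store = VersionedStore()
store.put("public/index.html", b"<h1>v1</h1>")
store.put("public/index.html", b"<h1>v2</h1>")
store.put("public/index.html", b"<h1>v1</h1>")  # same bytes as v1: no new blob
```

Note that the third write adds a version entry but no new blob, so rolling back one step returns the v2 content.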

Self-Programming Hooks

7 hook points — Python or Lua scripts that agents can rewrite themselves. Control stop conditions, tool filters, and result transforms on the fly.
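As a rough illustration, a stop-condition hook might look like the following. The function signature and the context fields used here are assumptions for the sketch, not the documented hook interface:

```python
def should_stop(context: dict) -> bool:
    """Illustrative should_stop hook. The context fields used here
    (iteration, max_iterations, last_stop_reason) are assumed names,
    not the documented Softref hook contract."""
    # Stop once the loop has run long enough...
    if context.get("iteration", 0) >= context.get("max_iterations", 25):
        return True
    # ...or when the model signals it is done.
    return context.get("last_stop_reason") == "end_turn"
```

Because hooks live in object storage, an agent can rewrite this logic mid-run, which is what "self-programming" means here.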

Sub-Agent Orchestration

Agents spawn agents with configurable depth and count limits. Structured results let orchestrators read outcomes without parsing messages.
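The guardrails reduce to a simple pre-spawn check. The function below is a sketch; the names and default limits are illustrative, not the platform's actual configuration keys:

```python
def can_spawn(depth: int, active_children: int,
              max_depth: int = 3, max_children: int = 5) -> bool:
    """Sketch of sub-agent guardrails: block spawns that would exceed
    the configured depth or fan-out limits (illustrative defaults)."""
    return depth < max_depth and active_children < max_children
```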

Budget & Rate Limits

Per-project spend limits, per-session budgets, and sliding-window rate limiting. Agents pause gracefully — no runaway bills, ever.
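Sliding-window rate limiting works by counting only the calls inside a trailing time window. A minimal sketch, assuming a simple allow/pause decision (not the Softref implementation):

```python
from collections import deque

class SlidingWindowLimiter:
    """Sketch of a sliding-window rate limiter: allow at most
    `limit` calls in any trailing `window` seconds."""
    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.calls = deque()   # timestamps of recent calls

    def allow(self, now: float) -> bool:
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()          # drop calls that aged out of the window
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False                      # caller pauses and retries, not fails

lim = SlidingWindowLimiter(limit=2, window=60)
```

A `False` here maps to the "pause gracefully" behavior above: the agent snoozes until the oldest call ages out rather than erroring.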

Three API calls to a running agent

No SDKs to install, no frameworks to learn. Just HTTP.

1

Create a session

bash
curl -X POST https://api.softref.com/v1/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model": "anthropic/claude-sonnet-4-6",
       "system": "You are a creative web developer."}'

Pick any model from any provider. Add a system prompt, tools config, hooks, or budget — all optional.

2

Send a message and run

bash
curl -X POST https://api.softref.com/v1/sessions/$ID/messages \
  -d '{"role": "user",
       "content": "Build a landing page for a coffee shop with
       a hero section, menu, and contact form."}'

curl -X POST https://api.softref.com/v1/sessions/$ID/run

The agent loop takes over — LLM reasoning, tool execution, file writes, all handled automatically.

3

Stream results in real time

bash
curl -N https://api.softref.com/v1/sessions/$ID/stream

event: content_delta
data: {"text": "I'll build this with a modern, warm design..."}

event: tool_started
data: {"name": "bash"}

event: tool_completed
data: {"files_written": ["public/index.html"]}

event: session_completed
data: {"status": "idle"}

12 SSE event types give you full visibility into reasoning, tool calls, and results — as they happen.
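Consuming the stream means grouping `event:` and `data:` lines into discrete events. The parser below is a minimal sketch of that grouping for illustration; a production client should use a proper SSE library:

```python
import json

def parse_sse(stream_text: str):
    """Minimal SSE parser sketch: collects `event:`/`data:` lines into
    (event_name, payload) pairs. A blank line terminates each event."""
    events, name, data = [], None, []
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and name is not None:      # blank line ends an event
            events.append((name, json.loads("\n".join(data))))
            name, data = None, []
    return events

raw = (
    "event: content_delta\n"
    'data: {"text": "Hello"}\n'
    "\n"
    "event: session_completed\n"
    'data: {"status": "idle"}\n'
    "\n"
)
events = parse_sse(raw)
```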

Architecture

Crash-resilient by design

Most agent frameworks run a while-loop in memory. If the process crashes, everything is lost. Softref uses a ratchet pattern — each LLM call is a persistent job. Server restart? The agent picks up exactly where it left off.

  • Each iteration is a discrete, retryable job — not an in-memory loop
  • Rate limits trigger automatic snooze and retry, not failure
  • Messages persist before the next job enqueues — zero data loss
  • Transient errors (429, 500, 503) get exponential backoff automatically
Job starts → validate session
    check rate limits, budget, iteration count
Call LLM (streaming)
    content_delta, thinking_delta events
Execute tools (up to 5 concurrent)
    python, bash, http, objects, sub-agents
Persist messages → enqueue next job
    compact if needed, check should_stop hook
Repeat until end_turn or limit
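Exponential backoff for transient errors is a standard pattern; a sketch with full jitter follows. The base, cap, and retry count are illustrative defaults, not the platform's actual retry policy:

```python
import random

def backoff_delays(max_retries: int = 5, base: float = 1.0,
                   cap: float = 60.0, seed: int = None) -> list:
    """Sketch of exponential backoff with full jitter for transient
    errors such as 429/500/503. Defaults are illustrative only."""
    rng = random.Random(seed)
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)   # 1, 2, 4, 8, 16 seconds
        delays.append(rng.uniform(0, ceiling))    # jitter spreads retry bursts
    return delays

delays = backoff_delays(seed=7)
```

Jitter matters here: without it, many agents rate-limited at the same moment would all retry at the same moment, too.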

The complete agent platform

Beyond the basics — everything you need to run agents in production.

Real-Time Streaming

12 SSE event types — content deltas, tool starts/completions, thinking tokens. 30-second heartbeat. Works with every provider.

MCP + OAuth 2.1

Connect from Claude Desktop, Cursor, or any MCP-compatible client. Full OAuth 2.1 with PKCE, dynamic registration, and token rotation.

Instant Publishing

Any object with a public/ prefix is served as a live web page. ETag caching, try_files resolution, custom domains.
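The try_files resolution order can be sketched roughly as follows; the exact lookup rules are an assumption here, not the documented behavior:

```python
def resolve(path: str, objects: set) -> str:
    """Sketch of try_files-style resolution: try the exact object key,
    then fall back to an index.html under that prefix (assumed order)."""
    for candidate in (path, path.rstrip("/") + "/index.html"):
        if candidate in objects:
            return candidate
    return None

objects = {"public/index.html", "public/style.css"}
```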

Webhooks

HMAC-signed HTTP callbacks for session events — completions, errors, tool calls, spend limit breaches. Async delivery with retry.
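Verifying an HMAC-signed callback looks roughly like this. The secret format and hex encoding are assumptions for the sketch; check the actual webhook docs for the real header name and scheme:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Sketch of HMAC-SHA256 webhook verification. Recompute the MAC
    over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"   # illustrative secret, not a real format
body = b'{"event": "session_completed", "session_id": "abc123"}'
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()
```

`hmac.compare_digest` avoids the timing side channel of a naive `==` comparison.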

Cron Scheduling

Schedule agent runs on any cron expression. Execution history, manual triggers, and validation — all through the API.

Channels

Project-level chat rooms for humans and agents. Enable human-in-the-loop workflows, multi-agent coordination, and event-driven hooks.

Custom Lua Tools

Define custom tools as Lua scripts in object storage. Access objects, HTTP, and JSON APIs. Hot-reloaded on change — no redeploy needed.

OpenTelemetry Tracing

Every API request returns a trace ID. Query spans, timings, and errors via the traces API. Full visibility into every LLM call and tool execution.

Session Branching

Branch a session at any message to explore alternative paths. Regenerate the last response, compact long histories, export full transcripts.

Built for developers

No vendor lock-in. No proprietary SDKs. Just a REST API that does what you expect.

API-first

Every feature is available through the REST API. The dashboard is just a client. Build your own UI, or go headless.

Bring your own keys

Use your own API keys for any provider. Stored encrypted as project secrets. Switch models per-session without code changes.

Context-aware compaction

Automatic LLM-based summarization when conversations approach the context window. Model-aware thresholds. Custom compaction prompts via hooks.
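A model-aware threshold check is the trigger side of this; summarization itself is an LLM call. The window sizes and 0.8 threshold below are assumptions for illustration, not the platform's actual values:

```python
# Illustrative context windows; real limits vary by provider and model.
CONTEXT_WINDOWS = {
    "anthropic/claude-sonnet-4-6": 200_000,
    "openai/gpt-4o": 128_000,
}

def should_compact(model: str, used_tokens: int, threshold: float = 0.8) -> bool:
    """Sketch of a model-aware compaction trigger: summarize once token
    usage crosses a fraction of the model's context window."""
    window = CONTEXT_WINDOWS.get(model, 128_000)   # conservative fallback
    return used_tokens >= window * threshold
```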

Full audit trail

Every LLM call logged with tokens, cost, latency. Every tool execution tracked. Session export gives you the complete history as JSON.

Ready to build?

Get started in under a minute. No credit card required.

Start building for free