How Auto-Scaling Works for Paperclip Agents

Your Paperclip agent handles 10 requests per minute during normal hours. Then a Product Hunt launch sends 500 requests per minute. What happens?

If you’re self-hosting, probably a timeout cascade. If you’re on HostAgentes, auto-scaling kicks in. Here’s how it works.

The Problem with Static Capacity

Traditional hosting assigns fixed resources. Your agent runs on a single instance with fixed CPU, memory, and concurrent connection limits. When traffic exceeds those limits:

New requests queue up
Response times increase
Requests start timing out
Users see errors

The fix is manual: spin up more instances, update the load balancer config, and keep monitoring. By the time you react, the spike is over (and your users are gone).
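To make the queueing problem concrete, here is a rough back-of-the-envelope sketch. The 60 requests per minute capacity is an assumed number for illustration, not a measured limit of any real instance, and `backlog_over_time` is a hypothetical helper.

```python
def backlog_over_time(arrival_per_min: int, capacity_per_min: int, minutes: int) -> list[int]:
    """Return the number of queued requests at the end of each minute."""
    backlog = 0
    history = []
    for _ in range(minutes):
        # Anything that arrives beyond what the instance can serve piles up.
        backlog = max(0, backlog + arrival_per_min - capacity_per_min)
        history.append(backlog)
    return history


if __name__ == "__main__":
    # Assumed capacity: one instance that can serve 60 requests per minute.
    print(backlog_over_time(arrival_per_min=10, capacity_per_min=60, minutes=3))
    print(backlog_over_time(arrival_per_min=500, capacity_per_min=60, minutes=3))
```

Under that assumption, normal traffic never queues, while the launch spike adds 440 waiting requests every minute. That growing backlog is what turns into timeouts.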

How HostAgentes Auto-Scaling Works

Request Monitoring

We monitor every request to every agent in real time:

Request rate (requests/second)
Response latency (p50, p95, p99)
Queue depth (waiting requests)
Instance CPU and memory utilization
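If p50, p95, and p99 are unfamiliar, the sketch below shows one common way to compute them from raw latency samples, using the nearest-rank method. It is an illustration only; the function names are hypothetical and this is not a description of how HostAgentes computes its metrics internally.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize latency samples (in milliseconds) as p50, p95, and p99."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
    }


if __name__ == "__main__":
    # One slow request barely moves the median but dominates the tail.
    print(latency_summary([120, 80, 100, 900, 110]))
```

The median tells you what a typical request feels like. The tail percentiles tell you what your unluckiest users see, which is why they are useful for scaling decisions.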

Scaling Triggers

When any metric crosses a threshold, scaling initiates:

Trigger	Threshold	Action
Request rate	>80% of capacity	Scale up
P95 latency	>2x baseline	Scale up
Queue depth	>10 waiting	Scale up
CPU utilization	<20% for 5 min	Scale down
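The table above can be read as a small decision function. Here is a minimal sketch of that logic. The `Metrics` fields and the `scaling_decision` name are hypothetical, and letting scale-up win over scale-down when both apply is an assumption, not documented platform behavior.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    request_rate: float      # current requests/second
    rate_capacity: float     # requests/second the current instances can serve
    p95_latency_ms: float    # current p95 latency
    baseline_p95_ms: float   # p95 latency under normal load
    queue_depth: int         # requests waiting
    cpu_utilization: float   # 0.0 to 1.0
    low_cpu_minutes: float   # how long CPU has stayed under 20%


def scaling_decision(m: Metrics) -> str:
    """Return "scale_up", "scale_down", or "hold" using the thresholds in the table."""
    if m.request_rate > 0.80 * m.rate_capacity:
        return "scale_up"
    if m.p95_latency_ms > 2 * m.baseline_p95_ms:
        return "scale_up"
    if m.queue_depth > 10:
        return "scale_up"
    # Assumption: scale-down is only considered when no scale-up trigger fired.
    if m.cpu_utilization < 0.20 and m.low_cpu_minutes >= 5:
        return "scale_down"
    return "hold"


if __name__ == "__main__":
    spike = Metrics(90, 100, 200, 200, 0, 0.5, 0)
    print(scaling_decision(spike))
```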

Scale-Up Process

Detect — monitoring flags a threshold breach (under 1 second)
Provision — a pre-warmed instance is activated (2-5 seconds)
Route — new requests are distributed across instances
Verify — confirm metrics return to healthy levels

Total time from spike to scaled: under 10 seconds on Pro and Scale (Starter takes around 30 seconds).
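To picture the Provision and Route steps, here is a toy model of a pool with pre-warmed instances. Everything in it is hypothetical: the `AgentPool` class, the instance names, and the round-robin routing are assumptions chosen for simplicity, since the text only says requests are distributed across instances.

```python
class AgentPool:
    """Toy model of active instances plus a reserve of pre-warmed ones."""

    def __init__(self, active: list[str], prewarmed: list[str]):
        self.active = list(active)
        self.prewarmed = list(prewarmed)
        self._next = 0  # round-robin cursor

    def scale_up(self) -> bool:
        """Activate one pre-warmed instance; return False if the reserve is empty."""
        if not self.prewarmed:
            return False
        self.active.append(self.prewarmed.pop(0))
        return True

    def route(self) -> str:
        """Pick the next active instance in round-robin order (an assumed policy)."""
        instance = self.active[self._next % len(self.active)]
        self._next += 1
        return instance


if __name__ == "__main__":
    pool = AgentPool(active=["a1"], prewarmed=["w1", "w2"])
    pool.scale_up()
    print([pool.route() for _ in range(4)])
```

Because the instance is already warm, activation is just a move from one list to the other. That is the intuition behind provisioning taking seconds rather than minutes.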

Scale-Down Process

After traffic subsides:

Wait — observe for 5 minutes to confirm the spike is over
Drain — stop routing new requests to excess instances
Complete — let in-flight requests finish
Remove — deactivate the extra instances

This ensures no request is dropped during scale-down.
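The Drain, Complete, and Remove steps follow a standard graceful-shutdown pattern. A minimal sketch, assuming a hypothetical `Instance` class that tracks its own in-flight requests:

```python
class Instance:
    """Toy instance that can be drained before removal."""

    def __init__(self, name: str):
        self.name = name
        self.in_flight = 0
        self.draining = False

    def accept(self) -> bool:
        """Take a new request unless the instance is draining."""
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def finish(self) -> None:
        """Mark one in-flight request as completed."""
        if self.in_flight > 0:
            self.in_flight -= 1

    def can_remove(self) -> bool:
        """Safe to deactivate only once draining and nothing is in flight."""
        return self.draining and self.in_flight == 0


if __name__ == "__main__":
    extra = Instance("extra-1")
    extra.accept()
    extra.draining = True        # Drain: stop taking new requests
    print(extra.accept())        # new request is refused and goes elsewhere
    extra.finish()               # Complete: in-flight work finishes
    print(extra.can_remove())    # Remove: now safe to deactivate
```

The key point is the order: routing stops first, removal happens last, and nothing is cut off in between.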

What This Means for You

No Capacity Planning

You don’t need to predict traffic. Deploy your agent and let auto-scaling handle the rest. Whether it’s 1 request or 10,000 per minute, your agent stays responsive within your plan’s limits.

Predictable Pricing

On the Pro and Scale plans, auto-scaling is included. There are no per-instance charges or surprise bills. Your monthly price stays the same regardless of traffic.

Zero Configuration

No YAML files, no auto-scaling groups, no min/max instance counts. It works out of the box. Just deploy your agent and we handle the rest.

Auto-Scaling by Plan

Feature	Starter	Pro	Scale
Auto-scaling	Basic	Full	Full + priority
Max request rate	50 req/min	Unlimited	Unlimited
Scale-up speed	~30 sec	~10 sec	~5 sec
Pre-warmed instances	1	3	10+

When You Need Scale

The Scale plan (€45/month) is for teams that need:

Priority scaling during peak traffic
Dedicated pre-warmed instances
Custom scaling thresholds
Scale-to-zero during off-hours (cost savings)
Enterprise SLAs

Most teams do great on Pro. Start there and upgrade when you need priority scaling.

See all plans →