
How Auto-Scaling Works for Paperclip Agents

May 13, 2026 · HostAgentes Team

Your Paperclip agent handles 10 requests per minute during normal hours. Then a Product Hunt launch sends 500 requests per minute. What happens?

If you’re self-hosting, probably a timeout cascade. If you’re on HostAgentes, auto-scaling kicks in. Here’s how it works.

The Problem with Static Capacity

Traditional hosting assigns fixed resources. Your agent runs on a single instance with fixed CPU, memory, and concurrent connection limits. When traffic exceeds those limits:

  • New requests queue up
  • Response times increase
  • Requests start timing out
  • Users see errors

The fix is manual: spin up more instances, update the load balancer config, and keep watching the metrics. By the time you react, the spike is over (and your users are gone).
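To see how quickly a fixed-capacity instance falls behind, here is a minimal sketch of the backlog during a spike. The 500 requests per minute figure comes from the launch example above; the capacity of 100 requests per minute and the function name `simulate_backlog` are assumptions made up for illustration, not measurements of any real agent.

```python
def simulate_backlog(arrivals_per_min: int, capacity_per_min: int, minutes: int) -> list[int]:
    """Return the number of queued requests at the end of each minute."""
    backlog = 0
    history = []
    for _ in range(minutes):
        # Requests beyond what the instance can serve in a minute stay queued.
        backlog = max(0, backlog + arrivals_per_min - capacity_per_min)
        history.append(backlog)
    return history


# 500 requests/minute is the spike from the example above;
# 100 requests/minute of capacity is an assumed figure for illustration.
spike = simulate_backlog(arrivals_per_min=500, capacity_per_min=100, minutes=3)
print(spike)  # [400, 800, 1200]

# A request arriving at the end of minute 3 waits behind 1200 others.
wait_minutes = spike[-1] / 100
print(wait_minutes)  # 12.0
```

Under these assumed numbers the queue grows by 400 requests every minute, so any realistic client timeout is exceeded almost immediately.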

How HostAgentes Auto-Scaling Works

Request Monitoring

We monitor every request to every agent in real time:

  • Request rate (requests/second)
  • Response latency (p50, p95, p99)
  • Queue depth (waiting requests)
  • Instance CPU and memory utilization
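As a rough illustration of how the metrics above could be derived from raw samples, the sketch below summarizes one monitoring window. It is not our production code: the `MetricsSnapshot` and `summarize` names, the nearest-rank percentile method, and the 10-second window are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples in window")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass
class MetricsSnapshot:
    request_rate: float        # requests per second over the window
    p50_ms: float
    p95_ms: float
    p99_ms: float
    queue_depth: int           # requests waiting for an instance
    cpu_utilization: float     # 0.0 to 1.0
    memory_utilization: float  # 0.0 to 1.0


def summarize(latencies_ms: list[float], window_seconds: float,
              queue_depth: int, cpu: float, memory: float) -> MetricsSnapshot:
    """Collapse one window of per-request latencies into the monitored metrics."""
    return MetricsSnapshot(
        request_rate=len(latencies_ms) / window_seconds,
        p50_ms=percentile(latencies_ms, 50),
        p95_ms=percentile(latencies_ms, 95),
        p99_ms=percentile(latencies_ms, 99),
        queue_depth=queue_depth,
        cpu_utilization=cpu,
        memory_utilization=memory,
    )


# Hypothetical window: 100 requests in 10 seconds with latencies of 1..100 ms.
snapshot = summarize([float(ms) for ms in range(1, 101)], window_seconds=10,
                     queue_depth=3, cpu=0.42, memory=0.55)
print(snapshot.request_rate, snapshot.p50_ms, snapshot.p95_ms, snapshot.p99_ms)
```

Tracking p95 and p99 alongside p50 matters because a spike usually shows up in the slowest requests before it moves the median.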

Scaling Triggers

When any metric crosses a threshold, scaling initiates:

Trigger         | Threshold         | Action
Request rate    | >80% of capacity  | Scale up
P95 latency     | >2x baseline      | Scale up
Queue depth     | >10 waiting       | Scale up
CPU utilization | <20% for 5 min    | Scale down
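The trigger table can be read as a small decision function. The sketch below encodes the four thresholds as listed; the `Metrics` fields, the `decide` function, and the rule that scale-up checks take precedence over scale-down are assumptions for illustration rather than a description of the actual scheduler.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SCALE_UP = "scale_up"
    SCALE_DOWN = "scale_down"
    HOLD = "hold"


@dataclass
class Metrics:
    request_rate: float      # current requests per second
    capacity: float          # requests per second the current instances can serve
    p95_latency_ms: float
    baseline_p95_ms: float   # p95 latency under normal load
    queue_depth: int         # requests waiting
    cpu_utilization: float   # 0.0 to 1.0
    low_cpu_seconds: float   # how long CPU has stayed below 20%


def decide(m: Metrics) -> Action:
    """Apply the thresholds from the trigger table to one metrics reading."""
    # Assumption: any scale-up trigger wins over the scale-down trigger.
    if m.request_rate > 0.80 * m.capacity:
        return Action.SCALE_UP
    if m.p95_latency_ms > 2 * m.baseline_p95_ms:
        return Action.SCALE_UP
    if m.queue_depth > 10:
        return Action.SCALE_UP
    if m.cpu_utilization < 0.20 and m.low_cpu_seconds >= 5 * 60:
        return Action.SCALE_DOWN
    return Action.HOLD


# Hypothetical reading: the queue is short, but p95 latency has tripled.
reading = Metrics(request_rate=40, capacity=100, p95_latency_ms=900,
                  baseline_p95_ms=300, queue_depth=2,
                  cpu_utilization=0.65, low_cpu_seconds=0)
print(decide(reading))  # Action.SCALE_UP
```

Checking several independent signals means a spike can be caught by whichever metric moves first, for example latency rising before the queue has had time to build.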

Scale-Up Process

  1. Detect — monitoring flags a threshold breach (under 1 second)
  2. Provision — a pre-warmed instance is activated (2-5 seconds)
  3. Route — new requests are distributed across instances
  4. Verify — confirm metrics return to healthy levels

Total time from spike to scaled: about 10 seconds or less on the Pro and Scale plans, and about 30 seconds on Starter.
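The Provision and Route steps can be pictured with a toy model: a pool of pre-warmed instances that are activated on demand, and a router that spreads requests across whatever is active. The `Fleet` class and the round-robin routing policy are assumptions for this sketch; the steps above only say that requests are distributed across instances, not how.

```python
from collections import deque
from typing import Optional


class Fleet:
    """Toy model of active instances plus a pool of pre-warmed spares."""

    def __init__(self, active: list[str], prewarmed: list[str]) -> None:
        self.active = list(active)
        self.prewarmed = deque(prewarmed)
        self._counter = 0  # number of requests routed so far

    def scale_up(self) -> Optional[str]:
        """Activate one pre-warmed instance; return None if the pool is empty."""
        if not self.prewarmed:
            return None
        instance = self.prewarmed.popleft()
        self.active.append(instance)
        return instance

    def route(self) -> str:
        """Pick the next active instance in round-robin order (assumed policy)."""
        instance = self.active[self._counter % len(self.active)]
        self._counter += 1
        return instance


fleet = Fleet(active=["agent-1"], prewarmed=["agent-2", "agent-3"])
before = [fleet.route() for _ in range(2)]   # only agent-1 is available
fleet.scale_up()                             # agent-2 joins the active set
after = [fleet.route() for _ in range(4)]    # traffic now alternates
print(before, after)
```

Because the spare is already warm, activating it is just a change to the routing set, which is why provisioning can take seconds instead of the minutes a cold start would need.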

Scale-Down Process

After traffic subsides:

  1. Wait — observe for 5 minutes to confirm the spike is over
  2. Drain — stop routing new requests to excess instances
  3. Complete — let in-flight requests finish
  4. Remove — deactivate the extra instances

This ensures no request is dropped during scale-down.
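The Drain, Complete, and Remove steps amount to a simple rule: a draining instance receives no new requests and is removed only once its in-flight count reaches zero. Here is a minimal sketch under that reading. The `Pool` and `Instance` classes and the least-loaded routing choice are hypothetical, and the 5-minute observation window from the Wait step is left out.

```python
class Instance:
    def __init__(self, name: str) -> None:
        self.name = name
        self.in_flight = 0      # requests currently being served
        self.draining = False   # True once marked for removal


class Pool:
    def __init__(self, names: list[str]) -> None:
        self.instances = [Instance(name) for name in names]

    def start_request(self) -> Instance:
        """Route a new request to the least-loaded instance that is not draining."""
        candidates = [i for i in self.instances if not i.draining]
        if not candidates:
            raise RuntimeError("no instance available for new requests")
        target = min(candidates, key=lambda i: i.in_flight)
        target.in_flight += 1
        return target

    def finish_request(self, instance: Instance) -> None:
        instance.in_flight -= 1
        self._remove_if_idle(instance)

    def drain(self, instance: Instance) -> None:
        """Stop sending new requests to this instance."""
        instance.draining = True
        self._remove_if_idle(instance)

    def _remove_if_idle(self, instance: Instance) -> None:
        # Only a draining instance with nothing in flight is removed.
        if instance.draining and instance.in_flight == 0 and instance in self.instances:
            self.instances.remove(instance)


pool = Pool(["agent-1", "agent-2"])
first = pool.start_request()    # lands on agent-1
second = pool.start_request()   # lands on agent-2
pool.drain(second)              # agent-2 still has one request in flight
third = pool.start_request()    # new traffic goes to agent-1 only
pool.finish_request(second)     # agent-2 is now idle and gets removed
print([i.name for i in pool.instances])  # ['agent-1']
```

The key design point is that removal is driven by the in-flight count rather than a timer, so a slow request is never cut off by the scale-down.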

What This Means for You

No Capacity Planning

You don’t need to predict traffic. Deploy your agent and let auto-scaling handle the rest. Whether it’s 1 request per minute or 10,000, your agent stays responsive, within the limits of your plan.

Flat, Predictable Pricing

Full auto-scaling is included on the Pro and Scale plans, and Starter includes basic auto-scaling. There are no per-instance charges or surprise bills: your monthly price stays the same regardless of traffic.

Zero Configuration

No YAML files, no auto-scaling groups, no min/max instance counts. Auto-scaling works out of the box with default thresholds; custom thresholds are available on the Scale plan if you want them.

Auto-Scaling by Plan

Feature              | Starter    | Pro       | Scale
Auto-scaling         | Basic      | Full      | Full + priority
Max request rate     | 50 req/min | Unlimited | Unlimited
Scale-up speed       | ~30 sec    | ~10 sec   | ~5 sec
Pre-warmed instances | 1          | 3         | 10+

When You Need Scale

The Scale plan (€45/month) is for teams that need:

  • Priority scaling during peak traffic
  • Dedicated pre-warmed instances
  • Custom scaling thresholds
  • Scale-to-zero during off-hours (cost savings)
  • Enterprise SLAs

Most teams do great on Pro. Start there and upgrade when you need priority scaling.

