How to Monitor Your Paperclip Agents

Your agent is deployed and receiving traffic. But is it working well? Monitoring tells you. Here’s what to track and how to set it up.

Two Types of Monitoring

Infrastructure Monitoring

Is the server running? Is it responsive? This is traditional monitoring — CPU, memory, request latency, error rates. Necessary but insufficient.

Quality Monitoring

Is the agent making good decisions? Are users satisfied? Are tool calls succeeding? This is agent-specific monitoring — and it’s what separates good deployments from great ones.

You need both. Here’s how to approach each.

Infrastructure Metrics

These are the baseline health metrics:

Metric	Healthy	Warning	Critical
Response latency (p50)	<2s	2-5s	>5s
Response latency (p95)	<5s	5-10s	>10s
Error rate	<1%	1-5%	>5%
CPU utilization	<60%	60-80%	>80%
Memory utilization	<70%	70-85%	>85%
Queue depth	<5	5-20	>20
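If you check metrics you export yourself, for example in a script or a cron job, you can encode the table as a small lookup. This is a minimal Python sketch. The metric keys (`latency_p50_s`, `cpu_pct`, and so on) are hypothetical names chosen for illustration. A value that sits exactly on a boundary, such as 2s for p50 latency, is treated as Warning, which matches the table's ranges.

```python
from enum import Enum


class Status(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"


# Thresholds from the table: (healthy_below, critical_above).
# Anything between the two bounds, inclusive, is Warning.
# Metric keys are hypothetical names, not a HostAgentes schema.
THRESHOLDS = {
    "latency_p50_s": (2.0, 5.0),
    "latency_p95_s": (5.0, 10.0),
    "error_rate_pct": (1.0, 5.0),
    "cpu_pct": (60.0, 80.0),
    "memory_pct": (70.0, 85.0),
    "queue_depth": (5, 20),
}


def classify(metric: str, value: float) -> Status:
    """Map a metric reading onto the Healthy/Warning/Critical bands."""
    healthy_below, critical_above = THRESHOLDS[metric]
    if value < healthy_below:
        return Status.HEALTHY
    if value > critical_above:
        return Status.CRITICAL
    return Status.WARNING
```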

How to Monitor on HostAgentes

The built-in dashboard shows all infrastructure metrics in real-time:

Request volume over time
Latency distribution (p50, p95, p99)
Error rate trends
Active instances and scaling events
Token usage by agent

No setup required. It works from the moment you deploy.
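The dashboard summarizes the latency distribution as percentiles. To compute similar numbers from your own request logs, you can use a nearest-rank percentile, which is one simple method. This sketch assumes you have a list of per-request latencies in seconds. The sample values are made up for illustration, and HostAgentes may compute its percentiles with a different method.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative latencies in seconds (made-up values, not real measurements).
latencies_s = [0.8, 1.1, 1.3, 1.5, 1.7, 1.9, 2.4, 3.1, 4.2, 12.0]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_s, p):.1f}s")
```

For these ten requests, p50 is 1.7s, which is healthy. p95 and p99 both land on the single 12s outlier, which is critical. This is why the table sets separate thresholds for p50 and p95: a healthy median can hide a slow tail.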

Quality Metrics

These measure how well your agent is performing its job:

Conversation Completion Rate

What percentage of conversations reach a natural conclusion (vs. users abandoning mid-conversation)?

Good: >80%
Needs attention: 60-80%
Problem: <60%

Low completion rates suggest your agent isn’t meeting user expectations.
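To compute the completion rate, you first need a definition of "natural conclusion" for your agent. For example, the user confirms the issue is resolved, or the agent reaches a terminal step. The sketch below assumes each conversation record already carries that outcome as a boolean. The `Conversation` type and its fields are hypothetical and are not a HostAgentes data model.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    # Hypothetical record shape. "completed" is True when the conversation
    # reached a natural conclusion instead of being abandoned by the user.
    conversation_id: str
    completed: bool


def completion_rate(conversations: list[Conversation]) -> float:
    """Percentage of conversations that reached a natural conclusion."""
    if not conversations:
        raise ValueError("no conversations to measure")
    done = sum(1 for c in conversations if c.completed)
    return 100.0 * done / len(conversations)


def completion_status(rate_pct: float) -> str:
    # Bands from the article: >80% good, 60-80% needs attention, <60% problem.
    if rate_pct > 80:
        return "Good"
    if rate_pct >= 60:
        return "Needs attention"
    return "Problem"
```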

Tool Call Success Rate

What percentage of tool calls succeed?

Good: >98%
Needs attention: 95-98%
Problem: <95%

A declining tool call success rate means something in your tool chain is breaking.
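Tracking the success rate per tool, rather than as one aggregate, makes it easier to see which part of the tool chain is breaking. A single failing tool can hide inside a healthy-looking overall number. This is a minimal sketch that assumes you log each tool call as a `(tool_name, succeeded)` pair. The function names are hypothetical.

```python
from collections import defaultdict


def tool_success_rates(calls: list[tuple[str, bool]]) -> dict[str, float]:
    """Per-tool success rate in percent, from (tool_name, succeeded) pairs."""
    totals: dict[str, int] = defaultdict(int)
    successes: dict[str, int] = defaultdict(int)
    for tool, ok in calls:
        totals[tool] += 1
        if ok:
            successes[tool] += 1
    return {tool: 100.0 * successes[tool] / totals[tool] for tool in totals}


def failing_tools(rates: dict[str, float], threshold_pct: float = 95.0) -> list[str]:
    """Tools in the 'Problem' band (success rate below the threshold)."""
    return sorted(tool for tool, rate in rates.items() if rate < threshold_pct)
```

Consider an example with two hypothetical tools. A `search` tool gets 100 calls with one failure (99%), and a `crm_lookup` tool gets 10 calls with one failure (90%). Together they succeed on 108 of 110 calls, about 98.2%, which falls in the "Good" band. The per-tool view still singles out `crm_lookup`.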

User Satisfaction Score

Track explicit feedback (thumbs up/down or ratings) after conversations. For ratings on a 1-5 scale, use these thresholds for the average score:

Good: >4.0/5.0
Needs attention: 3.0-4.0
Problem: <3.0
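For 1-5 ratings, the satisfaction score is the mean rating, placed into the bands above. Thumbs up/down feedback is binary and is more naturally reported as a percentage of positive responses, so this sketch covers only the rating case. The function name is hypothetical.

```python
def satisfaction(ratings: list[int]) -> tuple[float, str]:
    """Mean of 1-5 post-conversation ratings, plus its band from the article."""
    if not ratings:
        raise ValueError("no ratings collected")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    score = sum(ratings) / len(ratings)
    if score > 4.0:
        band = "Good"
    elif score >= 3.0:
        band = "Needs attention"
    else:
        band = "Problem"
    return score, band
```

Watch the boundary: an average of exactly 4.0 falls in "Needs attention", because "Good" requires a score above 4.0.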

Hallucination Rate

Monitor for factual errors or fabricated information. This requires periodic human review of conversation logs. Flag conversations where the agent made unsupported claims, and track the flagged share of reviewed conversations as your hallucination rate.
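Because a person has to judge each hallucination, the rate is only as reliable as the sample you review. Here is a minimal sketch, assuming you can list conversation IDs and record a reviewer's yes/no verdict for each one. It draws a uniform random sample, so the review isn't skewed toward conversations users already complained about, and then reports the flagged share. The function names are hypothetical.

```python
import random
from typing import Optional


def sample_for_review(conversation_ids: list[str], k: int, seed: Optional[int] = None) -> list[str]:
    """Pick up to k conversations uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(conversation_ids, min(k, len(conversation_ids)))


def hallucination_rate(reviewed: dict[str, bool]) -> float:
    """Percent of reviewed conversations flagged for unsupported claims.

    `reviewed` maps conversation ID -> True if the reviewer flagged it.
    """
    if not reviewed:
        raise ValueError("nothing reviewed yet")
    flagged = sum(reviewed.values())
    return 100.0 * flagged / len(reviewed)
```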

Cost Per Conversation

Total LLM token cost divided by the number of conversations. Track this over time to catch cost regressions from prompt changes or model switches.
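The sketch below computes cost per conversation, assuming you can get input and output token counts for each turn. It also assumes your model bills input and output tokens at different rates, which is common but not universal. The prices are placeholders, not real rates for any model. Substitute your provider's pricing.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Token counts for one turn (hypothetical record shape).
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per 1M tokens; replace with your model's actual rates.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 4.00


def conversation_cost(turns: list[Usage]) -> float:
    """Token cost of one conversation in USD."""
    return sum(
        u.input_tokens * INPUT_PRICE_PER_M / 1_000_000
        + u.output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
        for u in turns
    )


def cost_per_conversation(conversations: list[list[Usage]]) -> float:
    """Total token cost divided by the number of conversations."""
    if not conversations:
        raise ValueError("no conversations")
    total = sum(conversation_cost(c) for c in conversations)
    return total / len(conversations)
```

Run this on a fixed daily or weekly window and compare it with the previous period. A jump right after a prompt change or model switch is the regression this metric is meant to catch.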

Setting Up Alerts

On HostAgentes, configure alerts for:

Error rate spike — error rate exceeds 5% for 5 minutes
Latency degradation — p95 latency exceeds 10 seconds
Tool failure — any tool’s success rate drops below 95%
Cost anomaly — daily token spend exceeds 2x the 7-day average
Agent unreachable — health check fails 3 times in a row

Alerts can be sent to email, Slack, or webhook endpoints.
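HostAgentes evaluates these alert conditions for you. To reproduce them elsewhere, for example against metrics you export to your own system, the rules reduce to a few checks. This sketch makes two assumptions. The article doesn't specify how the 5-minute window is measured, so here "for 5 minutes" means every one of the last five per-minute samples exceeded 5%. The 7-day average is taken over the seven days before today. The function names are hypothetical.

```python
def error_rate_spike(per_minute_error_pct: list[float], threshold: float = 5.0, minutes: int = 5) -> bool:
    """Fire when the error rate exceeded the threshold in each of the last N minutes."""
    recent = per_minute_error_pct[-minutes:]
    return len(recent) == minutes and all(r > threshold for r in recent)


def latency_degraded(p95_s: float, limit_s: float = 10.0) -> bool:
    """Fire when p95 latency exceeds the limit."""
    return p95_s > limit_s


def cost_anomaly(previous_daily_spend: list[float], today: float, factor: float = 2.0) -> bool:
    """Fire when today's spend exceeds factor x the average of the previous 7 days."""
    window = previous_daily_spend[-7:]
    if len(window) < 7:
        return False  # not enough history for a 7-day baseline yet
    return today > factor * (sum(window) / 7)


def agent_unreachable(health_checks: list[bool], consecutive: int = 3) -> bool:
    """Fire when the last N health checks all failed (True means the check passed)."""
    recent = health_checks[-consecutive:]
    return len(recent) == consecutive and not any(recent)
```

Requiring the condition to hold across the whole window, rather than on a single sample, keeps one bad minute or one dropped health check from paging you.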

Monitoring Dashboard Walkthrough

The HostAgentes dashboard shows:

Overview Panel

Active agents and their status
Total requests today / this week
Average response quality score
Token usage and costs

Agent Detail View

Request volume over time (1h, 6h, 24h, 7d)
Latency distribution chart
Tool call breakdown with success rates
Recent conversations with quality indicators

Scaling Events

When instances were added or removed
Traffic patterns that triggered scaling
Current instance count and utilization

Best Practices

Check the dashboard daily — even a 2-minute review catches issues early
Review conversation logs weekly — look for patterns in bad responses
Set up alerts, don't rely on dashboards alone: alerts notify you of problems, while dashboards only show you data when you look
Track quality over time — a single metric snapshot isn’t useful; trends are
Compare before/after changes — every agent update should be followed by a quality comparison
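A before/after comparison is easier to act on when it checks each metric in the direction that matters. A lower hallucination rate is good, but a lower completion rate is not. Here is a minimal sketch, assuming you capture a snapshot of the quality metrics from this article before and after each agent update. The metric keys and the 5% relative tolerance are illustrative choices, not HostAgentes defaults.

```python
# +1 means higher is better, -1 means lower is better (hypothetical metric keys).
DIRECTION = {
    "completion_rate_pct": +1,
    "tool_success_pct": +1,
    "satisfaction": +1,
    "hallucination_pct": -1,
    "cost_per_conversation_usd": -1,
}


def regressions(before: dict[str, float], after: dict[str, float], tolerance_pct: float = 5.0) -> list[str]:
    """Metrics that got worse by more than tolerance_pct (relative change) after an update."""
    worse = []
    for metric, sign in DIRECTION.items():
        if metric not in before or metric not in after or before[metric] == 0:
            continue  # missing data, or a zero baseline where relative change is undefined
        change_pct = 100.0 * (after[metric] - before[metric]) / abs(before[metric])
        if sign * change_pct < -tolerance_pct:
            worse.append(metric)
    return worse
```

The function skips metrics with a baseline of zero because a relative change is undefined there. Compare those in absolute terms instead.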

Getting Started with Monitoring

Every HostAgentes deployment includes monitoring from day one. Deploy an agent and the dashboard starts tracking immediately.
