Intelligence & Analytics
Relay provides a built-in intelligence layer that gives you deep visibility into your real-time infrastructure — traffic patterns, connection health, cost breakdown, and predictive insights.
All intelligence endpoints live under /api/v1/apps/{appId}/intelligence/ and require the analytics:read scope. Most are GET requests. The event query endpoint uses POST.
Channel Heatmap
Visualize message density across all channels in real time.
GET /api/v1/apps/{appId}/intelligence/heatmap?window=5
| Param | Default | Description |
|---|---|---|
| window | 5 | Time window in minutes (max 60) |
Response:
{
"heatmap": [
{"channel": "chat-room-1", "messages": 342, "subscribers": 28, "bytes": 51200, "heat": 0.95},
{"channel": "notifications", "messages": 89, "subscribers": 150, "bytes": 12800, "heat": 0.4}
],
"generated_at": "2024-03-15T14:30:00Z"
}
The heat value is a normalized intensity between 0.0 and 1.0. Use it to color-code a dashboard visualization.
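One way to color-code the heatmap is a linear blend from a "cold" color to a "hot" one. The sketch below assumes a blue-to-red scale. The function name `heat_to_color` and the palette are illustrative choices, not part of the Relay API.

```python
def heat_to_color(heat: float) -> str:
    """Map a normalized heat value (0.0-1.0) to a hex color, blue (cold) to red (hot)."""
    h = min(max(heat, 0.0), 1.0)  # clamp defensively in case of out-of-range input
    red = round(255 * h)
    blue = round(255 * (1 - h))
    return f"#{red:02x}00{blue:02x}"


# Example entries shaped like the heatmap response above.
heatmap = [
    {"channel": "chat-room-1", "heat": 0.95},
    {"channel": "notifications", "heat": 0.4},
]
for entry in heatmap:
    print(entry["channel"], heat_to_color(entry["heat"]))
```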
Predictive Scaling
Predicts future connection and message load from historical patterns. Samples are grouped by day-of-week and hour-of-day, and each slot's mean and variance are tracked with Welford's online algorithm.
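Relay's internal implementation isn't shown here. The sketch below illustrates how per-slot statistics can be maintained with Welford's algorithm, which updates mean and variance one sample at a time without storing history. The class name, the `record` helper, and the choice of sample (n - 1) rather than population variance are assumptions for illustration.

```python
import math
from collections import defaultdict
from datetime import datetime


class WelfordStats:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # Uses the already-updated mean; this is what keeps the update numerically stable.
        self.m2 += delta * (value - self.mean)

    @property
    def stddev(self) -> float:
        # Sample standard deviation; reported as 0.0 until there are 2+ samples.
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


# One accumulator per (day-of-week, hour-of-day) slot; Monday is weekday 0.
slots: dict[tuple[int, int], WelfordStats] = defaultdict(WelfordStats)


def record(ts: datetime, connections: float) -> None:
    """Hypothetical helper: fold one sample into its weekday/hour slot."""
    slots[(ts.weekday(), ts.hour)].update(connections)
```

Because each slot keeps only three numbers, the cost of tracking all 168 weekly slots stays constant no matter how much history accumulates.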
GET /api/v1/apps/{appId}/intelligence/predictions?hours_ahead=24
Response:
{
"predictions": [
{
"time": "2024-03-15T15:00:00Z",
"predicted_connections": 850,
"upper_bound": 1200,
"p95_connections": 1100,
"predicted_messages_per_min": 420,
"confidence": "high",
"based_on_samples": 16
}
]
}
Confidence levels:
- high — 12+ data points for this time slot
- medium — 6–11 data points
- low — 4–5 data points
- insufficient — fewer than 4 (prediction not available)
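The thresholds above map a slot's sample count to a confidence level. A client might mirror them, for example to skip predictions that the server would mark as unavailable. This is a sketch of the documented ranges; the function name is hypothetical.

```python
def confidence_level(samples: int) -> str:
    """Classify a time slot by the number of historical data points behind it."""
    if samples >= 12:
        return "high"
    if samples >= 6:
        return "medium"
    if samples >= 4:
        return "low"
    return "insufficient"  # no prediction is produced for this slot


predictions = [{"time": "2024-03-15T15:00:00Z", "predicted_connections": 850, "based_on_samples": 16}]
usable = [p for p in predictions if confidence_level(p["based_on_samples"]) != "insufficient"]
```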
Event Flamegraph
Trace the complete lifecycle of an event — from publish through the relay server to every subscriber, webhook, and edge function.
GET /api/v1/apps/{appId}/intelligence/traces/{traceId}
Response:
{
"trace_id": "550e8400-e29b-41d4-a716-446655440000",
"total_duration_us": 4520,
"span_count": 5,
"spans": [
{"id": "abc", "name": "publish", "channel": "orders", "event": "created", "duration_us": 120, "duration_ms": 0.12},
{"id": "def", "name": "relay", "parent_id": "abc", "duration_us": 2100, "duration_ms": 2.1},
{"id": "ghi", "name": "deliver", "parent_id": "def", "duration_us": 850, "duration_ms": 0.85},
{"id": "jkl", "name": "webhook", "parent_id": "def", "duration_us": 3200, "duration_ms": 3.2},
{"id": "mno", "name": "edge_function", "parent_id": "def", "duration_us": 450, "duration_ms": 0.45}
]
}
Span names: publish, relay, deliver, webhook, edge_function
Render this as a flamegraph or waterfall chart to identify bottlenecks.
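Because each span carries an optional parent_id, a trace can be rebuilt into a tree and printed as an indented waterfall. The sketch assumes spans without a parent_id are roots. The function names are hypothetical, and the sample data is the trace response above trimmed to the fields used.

```python
from collections import defaultdict

spans = [
    {"id": "abc", "name": "publish", "duration_us": 120, "duration_ms": 0.12},
    {"id": "def", "name": "relay", "parent_id": "abc", "duration_us": 2100, "duration_ms": 2.1},
    {"id": "ghi", "name": "deliver", "parent_id": "def", "duration_us": 850, "duration_ms": 0.85},
    {"id": "jkl", "name": "webhook", "parent_id": "def", "duration_us": 3200, "duration_ms": 3.2},
    {"id": "mno", "name": "edge_function", "parent_id": "def", "duration_us": 450, "duration_ms": 0.45},
]


def render_waterfall(spans: list[dict]) -> list[str]:
    """Return one line per span, with children indented under their parent."""
    children = defaultdict(list)
    roots = []
    for span in spans:
        parent = span.get("parent_id")
        if parent is None:
            roots.append(span)
        else:
            children[parent].append(span)

    lines: list[str] = []

    def walk(span: dict, depth: int) -> None:
        lines.append(f"{'  ' * depth}{span['name']} {span['duration_ms']:.2f} ms")
        for child in children[span["id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


def slowest_span(spans: list[dict]) -> dict:
    """The single longest span, a first candidate for a bottleneck."""
    return max(spans, key=lambda s: s["duration_us"])


print("\n".join(render_waterfall(spans)))
```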
Connection Quality Scores
Every WebSocket connection gets a quality score from 0 (terrible) to 100 (perfect), computed from these deductions:
- Latency: -1 point per 10 ms above 50 ms
- Reconnects: -5 points per reconnect
- Delivery rate: -2 points per percentage point below 100%
- Dropped messages: -2 points per dropped message
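A client-side sketch of the deductions listed above follows. It assumes the score starts at 100, latency is penalized in whole 10 ms steps, and the result is rounded and clamped to 0-100. Relay's exact rounding is not specified, so this may not reproduce server-reported scores to the point.

```python
def quality_score(avg_latency_ms: float, reconnects: int,
                  delivery_rate: float, dropped: int) -> int:
    """Approximate a connection quality score from the documented deductions."""
    score = 100.0
    # -1 per full 10 ms above the 50 ms baseline (whole steps is an assumption)
    score -= max(0.0, avg_latency_ms - 50) // 10
    score -= 5 * reconnects
    # -2 per percentage point of delivery rate below 100%
    score -= 2 * max(0.0, 100 - delivery_rate)
    score -= 2 * dropped
    return max(0, min(100, round(score)))
```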
GET /api/v1/apps/{appId}/intelligence/connection-quality
Response:
{
"average_quality": 87.5,
"total_connections": 142,
"poor_connections": 3,
"connections": [
{
"socket_id": "123.456",
"quality_score": 45,
"avg_latency_ms": 230,
"reconnects": 4,
"delivery_rate": 92.1,
"tags": {"user_id": "42", "device": "mobile"}
}
]
}
Connection Tags
Tag connections with metadata for filtering:
{"user_id": "42", "device": "mobile", "region": "us-east", "plan": "pro"}
Query connections by tag via the API to find patterns — e.g., all mobile connections with low quality scores.
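The server-side tag filter parameters aren't documented in this section, so the sketch below filters a connection-quality response on the client instead. The function name and the 60-point cutoff for "low quality" are arbitrary assumptions.

```python
def find_connections(response: dict, max_score: int = 60, **tags: str) -> list[dict]:
    """Connections scoring below max_score whose tags match every given key/value pair."""
    return [
        conn for conn in response["connections"]
        if conn["quality_score"] < max_score
        and all(conn.get("tags", {}).get(key) == value for key, value in tags.items())
    ]


response = {
    "connections": [
        {"socket_id": "123.456", "quality_score": 45,
         "tags": {"user_id": "42", "device": "mobile"}},
    ]
}
poor_mobile = find_connections(response, device="mobile")
```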
Channel Lifecycle
Track the full lifecycle of any channel: creation, first subscriber, peak activity, last subscriber, destruction.
GET /api/v1/apps/{appId}/intelligence/channels/{channelName}/lifecycle
Response:
{
"channel": "chat-room-42",
"timeline": [
{"type": "created", "at": "2024-03-15T10:00:00Z", "metadata": null},
{"type": "first_subscriber", "at": "2024-03-15T10:00:05Z", "metadata": {"socket_id": "123.1"}},
{"type": "peak_activity", "at": "2024-03-15T14:30:00Z", "metadata": {"subscribers": 28, "messages_per_min": 120}},
{"type": "last_subscriber", "at": "2024-03-15T22:00:00Z", "metadata": {"socket_id": "456.7"}}
],
"ghost_channels": ["old-room-1", "test-channel"]
}
Ghost Channel Detection
Ghost channels are channels with no subscribers for over 60 minutes. They waste resources and may indicate abandoned features. The ghost_channels array in the response helps you identify and clean them up.
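Relay computes ghost_channels for you. The check below only illustrates the 60-minute rule, for instance when auditing a channel registry you maintain yourself. It assumes "over 60 minutes" means strictly more than 60 minutes since the last subscriber left; the function names and inputs are hypothetical.

```python
from datetime import datetime, timedelta

GHOST_THRESHOLD = timedelta(minutes=60)


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_ghost(subscribers: int, last_subscriber_at: str, now: datetime) -> bool:
    """True if the channel has had no subscribers for more than 60 minutes."""
    return subscribers == 0 and now - parse_ts(last_subscriber_at) > GHOST_THRESHOLD
```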
Smart Anomaly Baselines
Instead of static alert thresholds ("alert if connections > 1000"), smart baselines learn your app's normal patterns and alert on statistical deviations.
How it works:
- Every 5 minutes, Relay records your connection count and message rate
- Patterns are grouped by day-of-week and hour-of-day
- Mean and standard deviation are calculated using Welford's online algorithm
- Anomalies are detected using z-scores:
| Z-Score | Severity | Meaning |
|---|---|---|
| >= 3.0 | Critical | Outside the band that holds ~99.7% of values under a normal distribution |
| >= 2.0 | Warning | Outside the band that holds ~95% of values under a normal distribution |
| >= 1.5 | Info | Notable deviation |
The system adapts over time — what's "normal" on Monday at 2pm is different from Sunday at 3am.
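Given a slot's learned mean and standard deviation, the z-score table reduces to a small classifier. This sketch assumes deviations in both directions count (it uses the absolute z-score) and returns None when there is no variance to compare against. The function name and lowercase labels are hypothetical.

```python
from typing import Optional


def severity(value: float, mean: float, stddev: float) -> Optional[str]:
    """Classify a sample against its slot baseline using the z-score thresholds."""
    if stddev == 0:
        return None  # baseline has no spread yet, so a z-score is undefined
    z = abs(value - mean) / stddev
    if z >= 3.0:
        return "critical"
    if z >= 2.0:
        return "warning"
    if z >= 1.5:
        return "info"
    return None
```

For example, if Monday 2pm averages 1,000 connections with a standard deviation of 100, a reading of 1,300 is critical, while the same 1,300 in a slot averaging 1,250 is unremarkable.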
Configure smart baselines in your app's Alerts tab. They replace static thresholds once you have 4+ weeks of data.
Cost Attribution
See exactly how much each channel and resource type costs in normalized cost units.
GET /api/v1/apps/{appId}/intelligence/cost-attribution
Response:
{
"by_channel": [
{"channel": "live-feed", "cost_units": 45.2, "resources": ["messages", "bandwidth"]},
{"channel": "chat-room-1", "cost_units": 12.8, "resources": ["messages", "connections"]}
],
"by_resource": [
{"type": "messages", "cost_units": 89.4, "quantity": 450000},
{"type": "connections", "cost_units": 22.1, "quantity": 1200},
{"type": "bandwidth", "cost_units": 15.6, "quantity": 52428800}
],
"date": "2024-03-15"
}
Resource types: messages, connections, bandwidth, storage, compute
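A common next step is expressing each resource type as a share of the day's total. The sketch below works on the by_resource array; the function name is hypothetical, and the sample values are taken from the response above.

```python
def cost_shares(by_resource: list[dict]) -> dict[str, float]:
    """Percentage of total cost units attributed to each resource type."""
    total = sum(r["cost_units"] for r in by_resource)
    if total == 0:
        return {r["type"]: 0.0 for r in by_resource}
    return {r["type"]: round(100 * r["cost_units"] / total, 1) for r in by_resource}


by_resource = [
    {"type": "messages", "cost_units": 89.4, "quantity": 450000},
    {"type": "connections", "cost_units": 22.1, "quantity": 1200},
    {"type": "bandwidth", "cost_units": 15.6, "quantity": 52428800},
]
print(cost_shares(by_resource))
```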
Event Query Language
Query your event history using a SQL-like syntax:
POST /api/v1/apps/{appId}/intelligence/query
Content-Type: application/json
{"query": "SELECT * FROM events WHERE channel LIKE 'orders-%' AND payload.amount > 1000 SINCE 5m AGO LIMIT 50"}
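The request above can be built with Python's standard library. In this sketch the host is a placeholder and the bearer-token Authorization header is an assumption: this section only states that the analytics:read scope is required, not how credentials are sent.

```python
import json
import urllib.request

BASE_URL = "https://relay.example.com"  # hypothetical host; substitute your own


def build_query_request(app_id: str, query: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the event query endpoint."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/apps/{app_id}/intelligence/query",
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; adjust to however your credentials are issued.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_query_request("my-app", "SELECT COUNT(*) FROM events SINCE 1h AGO", "TOKEN")
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```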
Syntax
[SELECT [aggregation]] FROM events
[WHERE condition [AND condition ...]]
[SINCE duration AGO]
[ORDER BY field [ASC|DESC]]
[LIMIT n]
Conditions
| Operator | Example |
|---|---|
| = | channel = 'orders' |
| != | event != 'heartbeat' |
| >, <, >=, <= | payload_size > 1024 |
| LIKE | channel LIKE 'chat-*' |
JSON Path Queries
Access nested payload fields with dot notation:
WHERE payload.user_id = '42'
WHERE data.amount > 100
WHERE payload.metadata.region = 'us-east'
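Conceptually, a dotted field name walks into the event one key at a time. The sketch below shows one plausible resolution rule that returns None, so the condition simply fails to match, when any segment is missing. How Relay actually treats missing or non-object fields isn't documented here, and the function name is hypothetical.

```python
from typing import Any


def resolve_path(event: dict, path: str) -> Any:
    """Follow a dotted path such as 'payload.metadata.region' through nested dicts."""
    node: Any = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # missing segment: treat the field as absent
        node = node[key]
    return node


event = {"payload": {"user_id": "42", "metadata": {"region": "us-east"}}}
```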
Aggregations
SELECT COUNT(*) FROM events WHERE channel = 'orders' SINCE 1h AGO
SELECT SUM(payload_size) FROM events SINCE 24h AGO
SELECT AVG(payload_size) FROM events WHERE channel LIKE 'uploads-*' SINCE 1d AGO
Time Durations
5s, 30m, 2h, 7d, 1w
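A client that validates queries before sending them could parse these durations as follows. The sketch assumes a duration is a single integer followed by one unit (s, m, h, d, w, as in the examples) and does not handle compound forms such as 1h30m. The function name is hypothetical.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}
_DURATION = re.compile(r"(\d+)([smhdw])")


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as '30m' or '7d' into a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```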
Latency Percentiles
Track delivery latency with percentile breakdowns per channel.
GET /api/v1/apps/{appId}/intelligence/latency?channel=chat-room-1
Response:
{
"current": {
"p50_ms": 12.5,
"p75_ms": 18.2,
"p90_ms": 25.8,
"p95_ms": 42.1,
"p99_ms": 98.3,
"avg_ms": 15.4,
"min_ms": 1.2,
"max_ms": 450.0,
"sample_count": 15000
},
"history": [
{"p50": 11.8, "p95": 38.5, "p99": 85.2, "avg": 14.1, "time": "2024-03-15T13:00:00Z"},
{"p50": 12.5, "p95": 42.1, "p99": 98.3, "avg": 15.4, "time": "2024-03-15T14:00:00Z"}
]
}
Omit the channel parameter to get app-wide latency percentiles.
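This section doesn't say how Relay computes its percentiles. Nearest-rank, sketched below, is one common definition; others (such as linear interpolation between ranks) give slightly different values on small samples, so don't expect exact agreement with the API.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```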
Use this data to:
- Set SLA targets (e.g., "p95 latency < 50ms")
- Detect performance regressions over time
- Compare latency across channels to find slow consumers
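The first two uses above can be automated against the latency response. This sketch checks current p95 against a target and flags a regression when the latest history entry's p95 rose by more than a given percentage over the previous one. The 50 ms target, the 10% tolerance, and the assumption that history is ordered oldest to newest are all illustrative.

```python
def check_latency(response: dict, p95_target_ms: float = 50.0,
                  regression_pct: float = 10.0) -> list[str]:
    """Return human-readable issues; an empty list means both checks passed."""
    issues = []
    p95 = response["current"]["p95_ms"]
    if p95 >= p95_target_ms:
        issues.append(f"p95 {p95} ms misses the {p95_target_ms} ms target")
    history = response["history"]
    if len(history) >= 2:
        prev, last = history[-2]["p95"], history[-1]["p95"]
        if prev > 0 and (last - prev) / prev * 100 > regression_pct:
            issues.append(f"p95 rose from {prev} to {last} ms")
    return issues


response = {
    "current": {"p95_ms": 42.1},
    "history": [
        {"p95": 38.5, "time": "2024-03-15T13:00:00Z"},
        {"p95": 42.1, "time": "2024-03-15T14:00:00Z"},
    ],
}
```

With the sample data, p95 is under the 50 ms target and rose about 9%, so neither check fires at the default settings.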
Dependency Graph
Relay automatically discovers relationships between channels based on:
- Event bridges — explicit cross-app routing
- Event traces — events that trigger other events
- Edge function pipelines — functions that publish to other channels
The graph is derived from EventTrace and EventBridge data. Query traces to see which channels interact.
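Once you have (source, target) channel pairs from your traces and bridges, a breadth-first search answers "which channels does this one feed, directly or indirectly?". How pairs are extracted from EventTrace and EventBridge data isn't specified here, so the edge list below is hypothetical.

```python
from collections import defaultdict, deque


def downstream(edges: list[tuple[str, str]], start: str) -> set[str]:
    """All channels reachable from start by following source -> target edges."""
    graph = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical edges, e.g. an edge function on "orders" publishing to "invoices".
edges = [("orders", "invoices"), ("invoices", "emails"), ("orders", "analytics")]
```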