Intelligence & Analytics

Relay provides a built-in intelligence layer that gives you deep visibility into your real-time infrastructure — traffic patterns, connection health, cost breakdown, and predictive insights.

All intelligence endpoints live under /api/v1/apps/{appId}/intelligence/ and require the analytics:read scope. Most are GET requests; the event query endpoint uses POST.
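As a minimal sketch, the URLs used throughout this section can be assembled with a small helper. The base URL `https://relay.example.com` and the name `intelligence_url` are placeholders, not part of Relay. Authentication is left out because this section only names the required scope.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL; substitute your own Relay host.
BASE_URL = "https://relay.example.com"


def intelligence_url(app_id: str, path: str, **params) -> str:
    """Build a URL under /api/v1/apps/{appId}/intelligence/."""
    app = quote(app_id, safe="")
    url = f"{BASE_URL}/api/v1/apps/{app}/intelligence/{path.lstrip('/')}"
    # Drop parameters left as None so optional filters can be omitted.
    query = {key: value for key, value in params.items() if value is not None}
    return f"{url}?{urlencode(query)}" if query else url
```

For example, `intelligence_url("my-app", "heatmap", window=5)` produces the heatmap URL shown in the next section.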


Channel Heatmap

Visualize message density across all channels in real time.

GET /api/v1/apps/{appId}/intelligence/heatmap?window=5

Param    Default  Description
window   5        Time window in minutes (max 60)

Response:

{
  "heatmap": [
    {"channel": "chat-room-1", "messages": 342, "subscribers": 28, "bytes": 51200, "heat": 0.95},
    {"channel": "notifications", "messages": 89, "subscribers": 150, "bytes": 12800, "heat": 0.4}
  ],
  "generated_at": "2024-03-15T14:30:00Z"
}

The heat value is a normalized intensity from 0.0 (coldest) to 1.0 (hottest). Use it to color-code a dashboard visualization.
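One possible way to turn heat into a dashboard color is a linear blue-to-red ramp. The function name and the color scheme below are illustrative choices, not something Relay prescribes.

```python
def heat_to_hex(heat: float) -> str:
    """Map a heat value in [0.0, 1.0] to a blue-to-red hex color."""
    clamped = min(max(heat, 0.0), 1.0)  # tolerate out-of-range input
    red = round(255 * clamped)
    blue = round(255 * (1.0 - clamped))
    return f"#{red:02x}00{blue:02x}"
```

A cold channel renders as pure blue and a channel at heat 1.0 as pure red.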


Predictive Scaling

Predict future connection and message load based on historical patterns. Predictions are built from day-of-week and hour-of-day analysis, with variance tracked by Welford's online algorithm.

GET /api/v1/apps/{appId}/intelligence/predictions?hours_ahead=24

Response:

{
  "predictions": [
    {
      "time": "2024-03-15T15:00:00Z",
      "predicted_connections": 850,
      "upper_bound": 1200,
      "p95_connections": 1100,
      "predicted_messages_per_min": 420,
      "confidence": "high",
      "based_on_samples": 16
    }
  ]
}

Confidence levels:

  • high — 12+ data points for this time slot
  • medium — 6–11 data points
  • low — 4–5 data points
  • insufficient — fewer than 4 (prediction not available)
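The thresholds above map directly onto a small lookup. This sketch only mirrors the documented cutoffs on the client side; the function name is hypothetical.

```python
def confidence_level(samples: int) -> str:
    """Translate a based_on_samples count into the documented confidence label."""
    if samples >= 12:
        return "high"
    if samples >= 6:
        return "medium"
    if samples >= 4:
        return "low"
    return "insufficient"
```

With the example response above, `confidence_level(16)` gives `"high"`, matching its confidence field.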

Event Flamegraph

Trace the complete lifecycle of an event — from publish through the relay server to every subscriber, webhook, and edge function.

GET /api/v1/apps/{appId}/intelligence/traces/{traceId}

Response:

{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "total_duration_us": 4520,
  "span_count": 5,
  "spans": [
    {"id": "abc", "name": "publish", "channel": "orders", "event": "created", "duration_us": 120, "duration_ms": 0.12},
    {"id": "def", "name": "relay", "parent_id": "abc", "duration_us": 2100, "duration_ms": 2.1},
    {"id": "ghi", "name": "deliver", "parent_id": "def", "duration_us": 850, "duration_ms": 0.85},
    {"id": "jkl", "name": "webhook", "parent_id": "def", "duration_us": 3200, "duration_ms": 3.2},
    {"id": "mno", "name": "edge_function", "parent_id": "def", "duration_us": 450, "duration_ms": 0.45}
  ]
}

Span names: publish, relay, deliver, webhook, edge_function

Render this as a flamegraph or waterfall chart to identify bottlenecks.
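A rough sketch of the first step toward such a chart: group spans by parent_id, walk the tree depth-first, and pick out the slowest span. The example response carries durations but no start offsets, so this produces an indented tree rather than a true timeline. The helper names are illustrative.

```python
from collections import defaultdict


def render_tree(spans):
    """Return one indented line per span, walking depth-first from the roots."""
    children = defaultdict(list)
    for span in spans:
        # Root spans have no parent_id and are grouped under None.
        children[span.get("parent_id")].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append(f"{'  ' * depth}{span['name']} {span['duration_us']}us")
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


def slowest_span(spans):
    """Return the span with the largest duration_us."""
    return max(spans, key=lambda span: span["duration_us"])
```

Applied to the example trace, the slowest span is the webhook call, which is where you would look first.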


Connection Quality Scores

Every WebSocket connection gets a quality score from 0 (terrible) to 100 (perfect), based on:

  • Latency: 1 point deducted per 10 ms above 50 ms
  • Reconnects: 5 points deducted per reconnect
  • Delivery rate: 2 points deducted per percentage point below 100%
  • Dropped messages: 2 points deducted per dropped message

GET /api/v1/apps/{appId}/intelligence/connection-quality

Response:

{
  "average_quality": 87.5,
  "total_connections": 142,
  "poor_connections": 3,
  "connections": [
    {
      "socket_id": "123.456",
      "quality_score": 45,
      "avg_latency_ms": 230,
      "reconnects": 4,
      "delivery_rate": 92.1,
      "tags": {"user_id": "42", "device": "mobile"}
    }
  ]
}
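To make the scoring rules concrete, here is a sketch that applies the four deductions to a starting score of 100. It assumes that latency and delivery-rate penalties scale proportionally (no rounding to whole steps) and that the result is clamped to 0–100; Relay's exact rounding is not documented here, so treat the output as an approximation.

```python
def quality_score(avg_latency_ms, reconnects, delivery_rate, dropped=0):
    """Approximate a connection quality score from the documented deductions."""
    score = 100.0
    score -= max(avg_latency_ms - 50.0, 0.0) / 10.0  # 1 point per 10 ms above 50 ms
    score -= 5.0 * reconnects                        # 5 points per reconnect
    score -= 2.0 * max(100.0 - delivery_rate, 0.0)   # 2 points per % below 100
    score -= 2.0 * dropped                           # 2 points per dropped message
    return min(max(score, 0.0), 100.0)               # assumed clamp to 0..100
```

The example response does not include a dropped-message count, so this sketch is not expected to reproduce its quality_score exactly.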

Connection Tags

Tag connections with metadata for filtering:

{"user_id": "42", "device": "mobile", "region": "us-east", "plan": "pro"}

Query connections by tag via the API to find patterns — e.g., all mobile connections with low quality scores.
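The query parameters for server-side tag filtering are not shown in this section, so the sketch below filters a connection-quality response on the client instead. The function name and the `max_score` parameter are hypothetical.

```python
def filter_connections(connections, max_score=None, **tags):
    """Keep connections matching every given tag and, optionally, a score ceiling."""
    matches = []
    for conn in connections:
        conn_tags = conn.get("tags") or {}
        # Skip connections that miss or mismatch any requested tag.
        if any(conn_tags.get(key) != value for key, value in tags.items()):
            continue
        if max_score is not None and conn["quality_score"] > max_score:
            continue
        matches.append(conn)
    return matches
```

For instance, `filter_connections(data["connections"], max_score=50, device="mobile")` would return the mobile connections scoring 50 or lower.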


Channel Lifecycle

Track the full lifecycle of any channel: creation, first subscriber, peak activity, last subscriber, destruction.

GET /api/v1/apps/{appId}/intelligence/channels/{channelName}/lifecycle

Response:

{
  "channel": "chat-room-42",
  "timeline": [
    {"type": "created", "at": "2024-03-15T10:00:00Z", "metadata": null},
    {"type": "first_subscriber", "at": "2024-03-15T10:00:05Z", "metadata": {"socket_id": "123.1"}},
    {"type": "peak_activity", "at": "2024-03-15T14:30:00Z", "metadata": {"subscribers": 28, "messages_per_min": 120}},
    {"type": "last_subscriber", "at": "2024-03-15T22:00:00Z", "metadata": {"socket_id": "456.7"}}
  ],
  "ghost_channels": ["old-room-1", "test-channel"]
}

Ghost Channel Detection

Ghost channels are channels with no subscribers for over 60 minutes. They waste resources and may indicate abandoned features. The ghost_channels array in the response helps you identify and clean them up.
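If you want to apply the same 60-minute rule to a lifecycle timeline yourself, a sketch might look like the following. It assumes the last_subscriber event marks the moment the channel became empty, which this section does not state explicitly; the helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

GHOST_THRESHOLD = timedelta(minutes=60)


def parse_ts(value: str) -> datetime:
    """Parse timestamps such as 2024-03-15T22:00:00Z into aware datetimes."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def idle_time(timeline, now):
    """Time since the latest last_subscriber event, or None if there is none."""
    times = [parse_ts(e["at"]) for e in timeline if e["type"] == "last_subscriber"]
    return now - max(times) if times else None


def looks_like_ghost(timeline, now):
    """True when the channel has been empty for longer than the threshold."""
    idle = idle_time(timeline, now)
    return idle is not None and idle > GHOST_THRESHOLD
```

Pass `datetime.now(timezone.utc)` as `now` when checking live data.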


Smart Anomaly Baselines

Instead of static alert thresholds ("alert if connections > 1000"), smart baselines learn your app's normal patterns and alert on statistical deviations.

How it works:

  1. Every 5 minutes, Relay records your connection count and message rate
  2. Patterns are grouped by day-of-week and hour-of-day
  3. Mean and standard deviation are calculated using Welford's online algorithm
  4. Anomalies are detected using z-scores:

Z-Score   Severity   Meaning
>= 3.0    Critical   Outside the range that covers about 99.7% of normal values
>= 2.0    Warning    Outside the range that covers about 95% of normal values
>= 1.5    Info       Notable deviation

The system adapts over time — what's "normal" on Monday at 2pm is different from Sunday at 3am.
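The steps above can be sketched in a few lines. This is an illustration of the technique, not Relay's implementation: it assumes the sample standard deviation (dividing by n - 1) and a two-sided z-score, and the names `RunningStats`, `slot_key`, and `severity` are made up for the example.

```python
import math
from datetime import datetime


class RunningStats:
    """Welford's online algorithm for running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        # Sample standard deviation; reported as 0.0 below two samples.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def slot_key(ts: datetime):
    """Group samples by (day-of-week, hour-of-day)."""
    return (ts.weekday(), ts.hour)


def severity(stats: RunningStats, value: float):
    """Classify a value against the baseline using the z-score thresholds."""
    sd = stats.stddev()
    if sd == 0.0:
        return None  # no spread yet, so no z-score can be computed
    z = abs(value - stats.mean) / sd
    if z >= 3.0:
        return "critical"
    if z >= 2.0:
        return "warning"
    if z >= 1.5:
        return "info"
    return None
```

Keeping one `RunningStats` per slot key, for example in a dictionary, gives each day-and-hour combination its own baseline without storing the raw samples.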

Configure smart baselines in your app's Alerts tab. They replace static thresholds once you have 4+ weeks of data.


Cost Attribution

See exactly how much each channel and resource type costs in normalized cost units.

GET /api/v1/apps/{appId}/intelligence/cost-attribution

Response:

{
  "by_channel": [
    {"channel": "live-feed", "cost_units": 45.2, "resources": ["messages", "bandwidth"]},
    {"channel": "chat-room-1", "cost_units": 12.8, "resources": ["messages", "connections"]}
  ],
  "by_resource": [
    {"type": "messages", "cost_units": 89.4, "quantity": 450000},
    {"type": "connections", "cost_units": 22.1, "quantity": 1200},
    {"type": "bandwidth", "cost_units": 15.6, "quantity": 52428800}
  ],
  "date": "2024-03-15"
}

Resource types: messages, connections, bandwidth, storage, compute
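A small sketch of how you might summarize this response: each resource type's share of the total, plus a per-unit cost. The function names are hypothetical, and the per-unit figure is plain division of cost_units by quantity, not a documented pricing formula.

```python
def cost_shares(by_resource):
    """Return each resource type's fraction of total cost units."""
    total = sum(item["cost_units"] for item in by_resource)
    if total == 0:
        return {}
    return {item["type"]: item["cost_units"] / total for item in by_resource}


def unit_costs(by_resource):
    """Return cost units per unit of quantity, skipping zero quantities."""
    return {
        item["type"]: item["cost_units"] / item["quantity"]
        for item in by_resource
        if item["quantity"]
    }
```

With the example data, messages account for roughly 70% of the day's cost units.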


Event Query Language

Query your event history using a SQL-like syntax:

POST /api/v1/apps/{appId}/intelligence/query
Content-Type: application/json

{"query": "SELECT * FROM events WHERE channel LIKE 'orders-*' AND payload.amount > 1000 SINCE 5m AGO LIMIT 50"}

Syntax

[SELECT [aggregation]] FROM events
  [WHERE condition [AND condition ...]]
  [SINCE duration AGO]
  [ORDER BY field [ASC|DESC]]
  [LIMIT n]
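As an illustration of the clause order above, a query string can be assembled from its parts and wrapped in the JSON body the endpoint expects. The helper names are hypothetical. Conditions are inserted as written, with no escaping, because quoting rules are not covered in this section.

```python
import json


def build_query(select="*", where=(), since=None, order_by=None, limit=None):
    """Assemble a query string following the documented clause order."""
    parts = [f"SELECT {select} FROM events"]
    if where:
        parts.append("WHERE " + " AND ".join(where))
    if since:
        parts.append(f"SINCE {since} AGO")
    if order_by:
        parts.append(f"ORDER BY {order_by}")
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts)


def query_body(query: str) -> str:
    """Wrap a query string in the JSON request body."""
    return json.dumps({"query": query})
```

For example, `build_query(select="COUNT(*)", where=["channel = 'orders'"], since="1h")` yields `SELECT COUNT(*) FROM events WHERE channel = 'orders' SINCE 1h AGO`.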

Conditions

Operator        Example
=               channel = 'orders'
!=              event != 'heartbeat'
>, <, >=, <=    payload_size > 1024
LIKE            channel LIKE 'chat-*'

JSON Path Queries

Access nested payload fields with dot notation:

WHERE payload.user_id = '42'
WHERE data.amount > 100
WHERE payload.metadata.region = 'us-east'

Aggregations

SELECT COUNT(*) FROM events WHERE channel = 'orders' SINCE 1h AGO
SELECT SUM(payload_size) FROM events SINCE 24h AGO
SELECT AVG(payload_size) FROM events WHERE channel LIKE 'uploads-*' SINCE 1d AGO

Time Durations

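A duration is a whole number followed by a unit: s (seconds), m (minutes), h (hours), d (days), or w (weeks). Examples: 5s, 30m, 2h, 7d, 1w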
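If you need to validate or convert these durations on the client, a sketch along these lines would work. It assumes a single number-plus-unit token with no combined forms such as `1h30m`; the function name is illustrative.

```python
import re
from datetime import timedelta

# Map each unit suffix to the matching timedelta keyword argument.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text: str) -> timedelta:
    """Convert strings such as '30m' or '1w' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})
```

Unrecognized input raises `ValueError`, which is convenient for rejecting a malformed SINCE clause before sending the query.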


Latency Percentiles

Track delivery latency with percentile breakdowns per channel.

GET /api/v1/apps/{appId}/intelligence/latency?channel=chat-room-1

Response:

{
  "current": {
    "p50_ms": 12.5,
    "p75_ms": 18.2,
    "p90_ms": 25.8,
    "p95_ms": 42.1,
    "p99_ms": 98.3,
    "avg_ms": 15.4,
    "min_ms": 1.2,
    "max_ms": 450.0,
    "sample_count": 15000
  },
  "history": [
    {"p50": 11.8, "p95": 38.5, "p99": 85.2, "avg": 14.1, "time": "2024-03-15T13:00:00Z"},
    {"p50": 12.5, "p95": 42.1, "p99": 98.3, "avg": 15.4, "time": "2024-03-15T14:00:00Z"}
  ]
}

Omit the channel parameter to get app-wide latency percentiles.

Use this data to:

  • Set SLA targets (e.g., "p95 latency < 50ms")
  • Detect performance regressions over time
  • Compare latency across channels to find slow consumers
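The first two uses can be sketched against the response shape above. The 50 ms target comes from the SLA example; the 10% regression tolerance and the function names are arbitrary choices for illustration.

```python
def meets_sla(current, p95_target_ms=50.0):
    """Check the current p95 latency against an SLA target."""
    return current["p95_ms"] < p95_target_ms


def p95_regressions(history, tolerance=0.10):
    """Return timestamps where p95 rose by more than `tolerance` over the prior point."""
    flagged = []
    for prev, cur in zip(history, history[1:]):
        if prev["p95"] > 0 and (cur["p95"] - prev["p95"]) / prev["p95"] > tolerance:
            flagged.append(cur["time"])
    return flagged
```

In the example response, p95 rises from 38.5 ms to 42.1 ms, about 9%, so nothing is flagged at the default tolerance.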

Dependency Graph

Relay automatically discovers relationships between channels based on:

  • Event bridges — explicit cross-app routing
  • Event traces — events that trigger other events
  • Edge function pipelines — functions that publish to other channels

The graph is derived from EventTrace and EventBridge data. Query traces to see which channels interact.
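As a rough sketch of deriving channel relationships from a trace, the function below links each span that names a channel to its nearest ancestor that names a different one. It assumes that downstream publish spans carry a channel field, which goes beyond what the example trace shows; the function name is hypothetical.

```python
def channel_edges(spans):
    """Return (source_channel, target_channel) pairs found in one trace."""
    by_id = {span["id"]: span for span in spans}
    edges = set()
    for span in spans:
        target = span.get("channel")
        if not target:
            continue
        # Walk up the parent chain to the nearest ancestor naming a channel.
        parent = by_id.get(span.get("parent_id"))
        seen = set()
        while parent is not None and parent["id"] not in seen:
            seen.add(parent["id"])  # guard against malformed parent cycles
            source = parent.get("channel")
            if source:
                if source != target:
                    edges.add((source, target))
                break
            parent = by_id.get(parent.get("parent_id"))
    return edges
```

Collecting these pairs across many traces gives an edge list you can feed into any graph library or visualization.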