Monitoring
This guide covers monitoring your La Suite Meet deployment with Prometheus, Grafana, and Sentry.
What to monitor
Keeping a Meet instance healthy requires monitoring at several layers:
| Layer | Key metrics |
|---|---|
| Application | Request rate, error rate, response time, active rooms |
| LiveKit | Participant count, track count, packet loss, bitrate |
| Database | Connection count, query time, replication lag |
| Redis | Memory usage, hit rate, connection count |
| Storage (Garage/S3) | Bucket size, request rate, error rate |
| Infrastructure | CPU, memory, disk I/O, network throughput |
Django / Backend metrics
The Django backend can expose Prometheus metrics through django-prometheus, if that package is installed and enabled.
Key metrics:
- django_http_requests_total — request count by method/status
- django_http_request_duration_seconds — response latency histogram
- django_db_execute_total — database query count
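As an illustration of how the histogram and counters above are typically consumed, here is a sketch of Prometheus recording rules. It assumes the metric names listed above, that `django_http_request_duration_seconds` exposes the usual `_bucket` series of a Prometheus histogram, and that `django_http_requests_total` carries a `status` label (the same label the alert example on this page relies on). The `meet:` rule names are hypothetical.

```yaml
groups:
  - name: meet-backend-recording
    rules:
      # p95 response latency over 5 minutes, aggregated across instances and views.
      - record: meet:django_request_duration_seconds:p95_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le) (rate(django_http_request_duration_seconds_bucket[5m]))
          )
      # Requests per second, broken down by HTTP status.
      - record: meet:django_requests:rate5m_by_status
        expr: sum by (status) (rate(django_http_requests_total[5m]))
      # Database queries per second.
      - record: meet:django_db_execute:rate5m
        expr: sum(rate(django_db_execute_total[5m]))
```

Precomputing these keeps dashboard panels cheap and gives alert rules a stable series to reference.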
Sentry integration
The backend integrates with Sentry for error tracking and performance monitoring. To enable it, configure the backend with your Sentry project's DSN.
Sentry also captures throttling rate failures (added in v1.6.0), helping you detect abuse or misconfiguration.
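A sketch of supplying the DSN on Kubernetes, assuming the backend reads it from an environment variable named `SENTRY_DSN`. That variable name, the Secret `meet-sentry` and its `dsn` key are assumptions to check against your backend settings and chart values.

```yaml
# Fragment of the backend container spec (hypothetical names).
env:
  - name: SENTRY_DSN          # assumed variable name read by the backend
    valueFrom:
      secretKeyRef:
        name: meet-sentry     # hypothetical Secret holding the DSN
        key: dsn
```

Keeping the DSN in a Secret rather than inline keeps it out of plain-text chart values.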
LiveKit metrics
LiveKit can expose a Prometheus metrics endpoint on a dedicated port.
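A minimal sketch of enabling it, assuming LiveKit's `prometheus_port` configuration option. Port 6789 is chosen to match the `livekit:6789` scrape target in the Prometheus configuration on this page.

```yaml
# livekit.yaml (fragment)
# Serve Prometheus metrics on a dedicated port; 6789 matches the
# livekit:6789 scrape target in prometheus.yml.
prometheus_port: 6789
```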
Key LiveKit metrics:
- livekit_rooms_total — current active rooms
- livekit_participants_total — current participants across all rooms
- livekit_published_tracks_total — active media tracks
- livekit_packet_loss_rate — packet loss percentage
- livekit_nack_total — retransmission requests (indicator of network quality)
- livekit_bytes_in / livekit_bytes_out — bandwidth usage
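The raw series above are easier to chart once aggregated. A sketch of two recording rules, assuming the metric names listed above, that `livekit_participants_total` and `livekit_rooms_total` are gauges, and that `livekit_nack_total` is a counter. The rule names are hypothetical.

```yaml
groups:
  - name: livekit-recording
    rules:
      # Average participants per active room; clamp_min keeps the divisor
      # at least 1 so the result is 0 rather than NaN when no rooms are open.
      - record: livekit:participants_per_room:avg
        expr: sum(livekit_participants_total) / clamp_min(sum(livekit_rooms_total), 1)
      # Retransmission requests per second, a rough network-quality signal.
      - record: livekit:nack:rate5m
        expr: sum(rate(livekit_nack_total[5m]))
```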
Prometheus configuration
Add scrape configs for the Meet backend, LiveKit, and Garage:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: meet-backend
    static_configs:
      - targets: ['meet-backend:8000']
    metrics_path: /metrics

  - job_name: livekit
    static_configs:
      - targets: ['livekit:6789']

  # Garage serves Prometheus metrics on its admin API port (3903 by default),
  # not on the S3 API port (3900).
  - job_name: garage
    static_configs:
      - targets: ['garage:3903']
```
Grafana dashboards
LiveKit dashboard
The LiveKit team provides an official Grafana dashboard; import it from grafana.com/dashboards using ID 12452.
Key panels:
- Active rooms and participants over time
- Bandwidth in/out
- Packet loss rate
- Track publish/subscribe counts
Django dashboard
Import dashboard ID 9528 for Django + Prometheus metrics.
Node exporter dashboard
Import dashboard ID 1860 for host-level metrics (CPU, memory, disk, network).
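Dashboards imported through the UI live only in Grafana's database. If you prefer to keep the downloaded dashboard JSON under version control, Grafana's file-based provisioning can load it at startup. A sketch, placed for example at `/etc/grafana/provisioning/dashboards/meet.yaml`; the provider name, folder and JSON directory are assumptions.

```yaml
apiVersion: 1
providers:
  - name: meet-dashboards
    folder: Meet
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      # Directory holding the dashboard JSON files downloaded from grafana.com
      path: /var/lib/grafana/dashboards/meet
```

Dashboards exported from grafana.com often reference their data source through an input such as `${DS_PROMETHEUS}`; when provisioning from files you may need to replace that placeholder with your Prometheus data source's name or UID.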
Kubernetes monitoring (Helm)
If you use the Kubernetes deployment, add monitoring with kube-prometheus-stack:
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
```
The Meet Helm chart includes PodMonitor / ServiceMonitor resources if Prometheus Operator is detected.
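If your chart version does not create these resources, or you run the backend outside the chart, you can write a ServiceMonitor by hand. A sketch: the namespace `meet`, the Service label `app.kubernetes.io/name: meet-backend` and the port name `http` are assumptions. The `release: monitoring` label matches the Helm release name used above, which kube-prometheus-stack's default ServiceMonitor selector requires.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: meet-backend
  namespace: monitoring
  labels:
    release: monitoring          # matched by kube-prometheus-stack's default selector
spec:
  namespaceSelector:
    matchNames:
      - meet                     # assumed namespace of the Meet release
  selector:
    matchLabels:
      app.kubernetes.io/name: meet-backend   # assumed Service label
  endpoints:
    - port: http                 # assumed name of the Service port serving /metrics
      path: /metrics
      interval: 30s
```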
Health checks
Backend health check
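A conservative Kubernetes sketch that does not assume any particular health URL: a TCP probe on port 8000, the port used by the `meet-backend` scrape target in the Prometheus configuration. It only proves the server accepts connections; switch to an `httpGet` probe once you have confirmed which endpoint your backend version serves for health checks. The timings are illustrative.

```yaml
# Fragment of the meet-backend container spec.
readinessProbe:
  tcpSocket:
    port: 8000
  periodSeconds: 10
  failureThreshold: 3
livenessProbe:
  tcpSocket:
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 20
  failureThreshold: 3
```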
LiveKit health check
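A sketch of an HTTP probe, assuming LiveKit listens on its default HTTP/WebSocket port 7880 and answers a plain `GET /` with status 200 when it is up. Confirm both with `curl -i http://<livekit-host>:7880/` before relying on it.

```yaml
# Fragment of the LiveKit container spec.
readinessProbe:
  httpGet:
    path: /
    port: 7880      # assumed default LiveKit HTTP port
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /
    port: 7880
  initialDelaySeconds: 10
  periodSeconds: 20
```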
Garage health check
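A sketch assuming a Garage release whose admin API (default port 3903) serves a `/health` endpoint that returns 200 while the node can serve requests. Both the endpoint and the port are assumptions; check the admin API documentation for your Garage version.

```yaml
# Fragment of the Garage container spec.
readinessProbe:
  httpGet:
    path: /health
    port: 3903      # assumed admin API port
  periodSeconds: 10
  failureThreshold: 3
```

Only a readiness probe is shown: if `/health` reflects cluster-wide state, a failing liveness probe would restart otherwise healthy nodes during a cluster-wide problem.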
Alerting
Recommended alerts
| Alert | Condition | Severity |
|---|---|---|
| Backend down | No healthy pods for 2 min | Critical |
| LiveKit down | No response for 1 min | Critical |
| High error rate | HTTP 5xx > 1% for 5 min | Warning |
| High packet loss | Packet loss > 5% for 10 min | Warning |
| Database connections near limit | >80% of max_connections | Warning |
| Disk space | <20% free on storage | Warning |
| Recording webhook failures | Webhook errors > 0 for 15 min | Warning |
Example Prometheus alert rule
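```yaml
groups:
  - name: meet
    rules:
      - alert: MeetBackendDown
        expr: up{job="meet-backend"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Meet backend is down"

      - alert: MeetHighErrorRate
        # Sum both sides before dividing: without sum(), each 5xx series is
        # matched with the denominator series carrying the same labels
        # (including status), so the ratio would always be 1.
        expr: |
          sum(rate(django_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(django_http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Meet backend error rate above 1%"
```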
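Rules for several other alerts in the table can follow the same pattern. A sketch, with these assumptions: LiveKit is scraped under `job="livekit"` as in the Prometheus configuration above; `livekit_packet_loss_rate` is a percentage, as the LiveKit metrics list describes it; disk metrics come from node_exporter (which the Node exporter dashboard also relies on); and database connection metrics come from postgres_exporter, which this page does not otherwise set up. Thresholds mirror the table.

```yaml
groups:
  - name: meet-extra
    rules:
      - alert: LiveKitDown
        expr: up{job="livekit"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LiveKit metrics endpoint is not responding"

      - alert: LiveKitHighPacketLoss
        # Assumes the metric is a percentage (5 = 5%).
        expr: max(livekit_packet_loss_rate) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "LiveKit packet loss above 5%"

      - alert: DatabaseConnectionsNearLimit
        # postgres_exporter metrics: open backends vs. max_connections, per server.
        expr: |
          sum by (instance) (pg_stat_database_numbackends)
            / max by (instance) (pg_settings_max_connections) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL connections above 80% of max_connections"

      - alert: DiskSpaceLow
        # node_exporter metrics; pseudo filesystems are excluded.
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"} < 0.20
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Less than 20% free disk on {{ $labels.instance }} {{ $labels.mountpoint }}"
```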
Log aggregation
For centralized logging, ship container logs to:
- Loki (with Grafana) — lightweight, integrates with the Prometheus stack
- Elasticsearch / OpenSearch — more powerful full-text search
- Cloud logging (CloudWatch, Stackdriver, etc.)
Key log sources:
- meet-backend — Django application logs
- livekit — WebRTC signaling and media events
- celery — Background task results and errors
- livekit-egress — Recording job logs
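For the Loki option on a Docker Compose host, a minimal Promtail sketch, assuming Promtail can read the Docker socket and Loki listens at `loki:3100`. It keeps only containers whose names contain the key log sources listed above; the container names are assumptions, and newer Grafana setups may use Grafana Alloy instead of Promtail.

```yaml
# promtail-config.yaml
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Docker reports names with a leading slash, e.g. "/meet-backend-1".
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container
      # Keep only Meet-related containers ("livekit" also matches "livekit-egress").
      - source_labels: ['container']
        regex: '.*(meet-backend|livekit|celery).*'
        action: keep
```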