Monitoring

This guide covers monitoring your La Suite Meet deployment with Prometheus, Grafana, and Sentry.

What to monitor

A healthy Meet instance requires monitoring at multiple layers:

| Layer | Key metrics |
| --- | --- |
| Application | Request rate, error rate, response time, active rooms |
| LiveKit | Participant count, track count, packet loss, bitrate |
| Database | Connection count, query time, replication lag |
| Redis | Memory usage, hit rate, connection count |
| Storage (Garage/S3) | Bucket size, request rate, error rate |
| Infrastructure | CPU, memory, disk I/O, network throughput |

Django / Backend metrics

The Django backend exposes metrics via django-prometheus (if installed):

GET /metrics

Key metrics:

  • django_http_requests_total — request count by method/status
  • django_http_request_duration_seconds — response latency histogram
  • django_db_execute_total — database query count
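These counters and histograms combine into the usual RED-style queries. A few illustrative PromQL expressions, assuming the metric names above (and the standard `_bucket` series for the latency histogram):

```promql
# Overall request rate (per second, averaged over 5 minutes)
sum(rate(django_http_requests_total[5m]))

# Share of requests answered with HTTP 5xx
sum(rate(django_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(django_http_requests_total[5m]))

# 95th-percentile response latency from the histogram
histogram_quantile(0.95,
  sum(rate(django_http_request_duration_seconds_bucket[5m])) by (le))
```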

Sentry integration

The backend integrates with Sentry for error tracking and performance monitoring. Configure:

SENTRY_DSN=https://your-dsn@sentry.io/project-id
SENTRY_ENVIRONMENT=production

Sentry also captures request-throttling failures (added in v1.6.0), helping you detect abuse or misconfiguration.

LiveKit metrics

LiveKit exposes a Prometheus metrics endpoint:

# livekit-server.yaml
prometheus_port: 6789
GET http://livekit:6789/metrics

Key LiveKit metrics:

  • livekit_rooms_total — current active rooms
  • livekit_participants_total — current participants across all rooms
  • livekit_published_tracks_total — active media tracks
  • livekit_packet_loss_rate — packet loss percentage
  • livekit_nack_total — retransmission requests (indicator of network quality)
  • livekit_bytes_in / livekit_bytes_out — bandwidth usage
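Two illustrative PromQL queries over these series, assuming the metric names above and that the byte counters are cumulative:

```promql
# Average participants per active room
sum(livekit_participants_total) / sum(livekit_rooms_total)

# Outbound bandwidth in bits per second, averaged over 5 minutes
rate(livekit_bytes_out[5m]) * 8
```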

Prometheus configuration

Add scrape configs for Meet and LiveKit:

# prometheus.yml
scrape_configs:
  - job_name: meet-backend
    static_configs:
      - targets: ['meet-backend:8000']
    metrics_path: /metrics

  - job_name: livekit
    static_configs:
      - targets: ['livekit:6789']

  - job_name: garage
    static_configs:
      - targets: ['garage:3900']

Grafana dashboards

LiveKit dashboard

The LiveKit team provides an official Grafana dashboard; import it from grafana.com/dashboards using dashboard ID 12452.

Key panels:

  • Active rooms and participants over time
  • Bandwidth in/out
  • Packet loss rate
  • Track publish/subscribe counts

Django dashboard

Import dashboard ID 9528 for Django + Prometheus metrics.

Node exporter dashboard

Import dashboard ID 1860 for host-level metrics (CPU, memory, disk, network).

Kubernetes monitoring (Helm)

If using the Kubernetes deployment, add monitoring via the kube-prometheus-stack:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

The Meet Helm chart includes PodMonitor / ServiceMonitor resources if Prometheus Operator is detected.
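If the operator is not auto-detected, you can point Prometheus at the backend yourself with a hand-written ServiceMonitor. A minimal sketch — the namespace, port name, and labels below are placeholders to adapt to your actual release:

```yaml
# servicemonitor.yaml — minimal sketch; namespace, selector labels and
# port name are assumptions, check them against your Helm release
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: meet-backend
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: [meet]
  selector:
    matchLabels:
      app.kubernetes.io/name: meet-backend
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```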

Health checks

Backend health check

curl https://meet.example.com/api/v1.0/healthz/
# Expected: HTTP 200

LiveKit health check

curl http://livekit:7880/
# Expected: HTTP 200

Garage health check

curl http://garage:3900/status
# Expected: HTTP 200
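For periodic probing (e.g. from cron or a sidecar), the three checks above can be wrapped in a small script. A sketch using only the Python standard library — the `check`/`report` names are ours, and the endpoint URLs mirror this guide's examples:

```python
import urllib.request

def check(url, timeout=5.0):
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, non-2xx...
        return False

# The three endpoints from this guide; hostnames are deployment-specific.
ENDPOINTS = {
    "backend": "https://meet.example.com/api/v1.0/healthz/",
    "livekit": "http://livekit:7880/",
    "garage": "http://garage:3900/status",
}

def report():
    """Probe every endpoint and return a name -> healthy mapping."""
    return {name: check(url) for name, url in ENDPOINTS.items()}
```

The script exits quietly on network errors instead of raising, so a caller can decide how to alert on a `False` entry.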

Alerting

| Alert | Condition | Severity |
| --- | --- | --- |
| Backend down | No healthy pods for 2 min | Critical |
| LiveKit down | No response for 1 min | Critical |
| High error rate | HTTP 5xx > 1% for 5 min | Warning |
| High packet loss | Packet loss > 5% for 10 min | Warning |
| Database connections near limit | > 80% of max_connections | Warning |
| Disk space | < 20% free on storage | Warning |
| Recording webhook failures | Webhook errors > 0 for 15 min | Warning |

Example Prometheus alert rule

groups:
  - name: meet
    rules:
      - alert: MeetBackendDown
        expr: up{job="meet-backend"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Meet backend is down"

      - alert: MeetHighErrorRate
        expr: |
          rate(django_http_requests_total{status=~"5.."}[5m]) /
          rate(django_http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Meet backend error rate above 1%"
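The packet-loss row from the alerting table can be expressed the same way. The metric name comes from the LiveKit section above; whether the gauge reports a fraction or a percentage should be verified against your LiveKit version:

```yaml
      # Append under the same `rules:` list as the examples above.
      # NB: the threshold assumes livekit_packet_loss_rate is a
      # percentage (0-100); use 0.05 instead if it is a fraction.
      - alert: LiveKitHighPacketLoss
        expr: livekit_packet_loss_rate > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "LiveKit packet loss above 5%"
```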

Log aggregation

For centralized logging, ship container logs to:

  • Loki (with Grafana) — lightweight, integrates with the Prometheus stack
  • Elasticsearch / OpenSearch — more powerful full-text search
  • Cloud logging (CloudWatch, Stackdriver, etc.)

Key log sources:

  • meet-backend — Django application logs
  • livekit — WebRTC signaling and media events
  • celery — Background task results and errors
  • livekit-egress — Recording job logs
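For the Loki route, a minimal Promtail configuration that tails Docker's JSON log files is sketched below; the Loki URL, log path, and labels are assumptions for a single-host Docker deployment:

```yaml
# promtail.yml — sketch for shipping container logs to Loki
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml   # where Promtail remembers read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: containers
    static_configs:
      - targets: [localhost]
        labels:
          job: meet
          __path__: /var/lib/docker/containers/*/*-json.log
```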