Monitoring

This guide covers monitoring your La Suite Meet deployment with Prometheus, Grafana, and Sentry.

What to monitor

A healthy Meet instance requires monitoring at multiple layers:

| Layer | Key metrics |
| --- | --- |
| Application | Request rate, error rate, response time, active rooms |
| LiveKit | Participant count, track count, packet loss, bitrate |
| Database | Connection count, query time, replication lag |
| Redis | Memory usage, hit rate, connection count |
| Storage (Garage/S3) | Bucket size, request rate, error rate |
| Infrastructure | CPU, memory, disk I/O, network throughput |

Django / Backend metrics

The Django backend exposes metrics via django-prometheus (if installed):

GET /metrics

Key metrics:

  • django_http_requests_total — request count by method/status
  • django_http_request_duration_seconds — response latency histogram
  • django_db_execute_total — database query count
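These counters and histograms combine into the usual RED-style queries. A few illustrative PromQL expressions, assuming the metric names above (and the standard `_bucket` series for the latency histogram):

```promql
# Overall request rate (per second, averaged over 5 minutes)
sum(rate(django_http_requests_total[5m]))

# Share of requests answered with HTTP 5xx
sum(rate(django_http_requests_total{status=~"5.."}[5m]))
  / sum(rate(django_http_requests_total[5m]))

# 95th-percentile response latency from the histogram
histogram_quantile(0.95,
  sum(rate(django_http_request_duration_seconds_bucket[5m])) by (le))
```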

Sentry integration

The backend integrates with Sentry for error tracking and performance monitoring. Configure:

SENTRY_DSN=https://your-dsn@sentry.io/project-id
SENTRY_ENVIRONMENT=production

Sentry also captures request-throttling failures (added in v1.6.0), helping you detect abuse or misconfiguration.

LiveKit metrics

LiveKit exposes a Prometheus metrics endpoint:

# livekit-server.yaml
prometheus_port: 6789
GET http://livekit:6789/metrics

Key LiveKit metrics:

  • livekit_rooms_total — current active rooms
  • livekit_participants_total — current participants across all rooms
  • livekit_published_tracks_total — active media tracks
  • livekit_packet_loss_rate — packet loss percentage
  • livekit_nack_total — retransmission requests (indicator of network quality)
  • livekit_bytes_in / livekit_bytes_out — bandwidth usage
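Two illustrative PromQL queries over these series, assuming the metric names above and that the byte counters are cumulative:

```promql
# Average participants per active room
sum(livekit_participants_total) / sum(livekit_rooms_total)

# Outbound bandwidth in bits per second, averaged over 5 minutes
rate(livekit_bytes_out[5m]) * 8
```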

Prometheus configuration

Add scrape configs for Meet and LiveKit:

# prometheus.yml
scrape_configs:
  - job_name: meet-backend
    static_configs:
      - targets: ['meet-backend:8000']
    metrics_path: /metrics

  - job_name: livekit
    static_configs:
      - targets: ['livekit:6789']

  - job_name: garage
    static_configs:
      - targets: ['garage:3900']

Grafana dashboards

LiveKit dashboard

The LiveKit team provides an official Grafana dashboard; import it from grafana.com/dashboards using dashboard ID 12452.

Key panels:

  • Active rooms and participants over time
  • Bandwidth in/out
  • Packet loss rate
  • Track publish/subscribe counts

Django dashboard

Import dashboard ID 9528 for Django + Prometheus metrics.

Node exporter dashboard

Import dashboard ID 1860 for host-level metrics (CPU, memory, disk, network).

Kubernetes monitoring (Helm)

If using the Kubernetes deployment, add monitoring via the kube-prometheus-stack:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

The Meet Helm chart includes PodMonitor / ServiceMonitor resources if Prometheus Operator is detected.
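If the operator is not auto-detected, you can point Prometheus at the backend yourself with a hand-written ServiceMonitor. A minimal sketch — the namespace, port name, and labels below are placeholders to adapt to your actual release:

```yaml
# servicemonitor.yaml — minimal sketch; namespace, selector labels and
# port name are assumptions, check them against your Helm release
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: meet-backend
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: [meet]
  selector:
    matchLabels:
      app.kubernetes.io/name: meet-backend
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```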

Health checks

Backend health check

curl https://meet.example.com/api/v1.0/healthz/
# Expected: HTTP 200

LiveKit health check

curl http://livekit:7880/
# Expected: HTTP 200

Garage health check

curl http://garage:3900/status
# Expected: HTTP 200
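For periodic probing (e.g. from cron or a sidecar), the three checks above can be wrapped in a small script. A sketch using only the Python standard library — the `check`/`report` names are ours, and the endpoint URLs mirror this guide's examples:

```python
import urllib.request

def check(url, timeout=5.0):
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, non-2xx...
        return False

# The three endpoints from this guide; hostnames are deployment-specific.
ENDPOINTS = {
    "backend": "https://meet.example.com/api/v1.0/healthz/",
    "livekit": "http://livekit:7880/",
    "garage": "http://garage:3900/status",
}

def report():
    """Probe every endpoint and return a name -> healthy mapping."""
    return {name: check(url) for name, url in ENDPOINTS.items()}
```

The script exits quietly on network errors instead of raising, so a caller can decide how to alert on a `False` entry.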

Alerting

| Alert | Condition | Severity |
| --- | --- | --- |
| Backend down | No healthy pods for 2 min | Critical |
| LiveKit down | No response for 1 min | Critical |
| High error rate | HTTP 5xx > 1% for 5 min | Warning |
| High packet loss | Packet loss > 5% for 10 min | Warning |
| Database connections near limit | > 80% of max_connections | Warning |
| Disk space | < 20% free on storage | Warning |
| Recording webhook failures | Webhook errors > 0 for 15 min | Warning |

Example Prometheus alert rule

groups:
  - name: meet
    rules:
      - alert: MeetBackendDown
        expr: up{job="meet-backend"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Meet backend is down"

      - alert: MeetHighErrorRate
        expr: |
          rate(django_http_requests_total{status=~"5.."}[5m]) /
          rate(django_http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Meet backend error rate above 1%"
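The packet-loss row from the alerting table can be expressed the same way. The metric name comes from the LiveKit section above; whether the gauge reports a fraction or a percentage should be verified against your LiveKit version:

```yaml
      # Append under the same `rules:` list as the examples above.
      # NB: the threshold assumes livekit_packet_loss_rate is a
      # percentage (0-100); use 0.05 instead if it is a fraction.
      - alert: LiveKitHighPacketLoss
        expr: livekit_packet_loss_rate > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "LiveKit packet loss above 5%"
```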

Log aggregation

For centralized logging, ship container logs to:

  • Loki (with Grafana) — lightweight, integrates with the Prometheus stack
  • Elasticsearch / OpenSearch — more powerful full-text search
  • Cloud logging (CloudWatch, Stackdriver, etc.)

Key log sources:

  • meet-backend — Django application logs
  • livekit — WebRTC signaling and media events
  • celery — Background task results and errors
  • livekit-egress — Recording job logs
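For the Loki route, a minimal Promtail configuration that tails Docker's JSON log files is sketched below; the Loki URL, log path, and labels are assumptions for a single-host Docker deployment:

```yaml
# promtail.yml — sketch for shipping container logs to Loki
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml   # where Promtail remembers read offsets
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: containers
    static_configs:
      - targets: [localhost]
        labels:
          job: meet
          __path__: /var/lib/docker/containers/*/*-json.log
```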