Operational procedures
Lekko uses monitoring, alerting, and mitigation procedures to detect and resolve failures quickly.
Monitoring and alerting
Lekko uses the following observability tools to measure, log, and alert on critical operational metrics across the platform:
- Rockset
- Prometheus
- AWS CloudWatch
- Honeycomb
- PagerDuty
Mitigation and recovery
Lekko maintains a weekly on-call rotation with escalation plans. On-call engineers are responsible for investigating and mitigating ongoing issues. Once an issue has been resolved, Lekko conducts a post-mortem analysis within 48 hours and communicates the results to affected users.