Evolution of application monitoring: Linkerd2

London Microservices User Group - April 2019. Hosted at Farfetch.

This talk focuses on how we have evolved our approach to application monitoring at Attest over the three years I’ve been working there.

Overview

Link to slides.

Starting with the history of Attest, the talk walks through the very basic approach we had to monitoring and alerting back in April 2016, then touches on each of the changes we made as we took the architecture and platform through:

  • StatusCake and baked AMIs on EC2 instances
  • Apps in Docker running on ECS, with Application Load Balancer metrics
  • Linkerd1 and our first experiences with a service mesh
  • Zipkin, and the trouble we had trying to get aggregated metrics out of a distributed tracing system
  • Kubernetes with Prometheus and Alertmanager
  • Linkerd2 and our current approach to building an “Internal PaaS” for the feature teams.


Demo

To show the power of Linkerd2, the talk includes a demo that starts from an empty Kubernetes cluster (minikube), where in under 4 minutes we:

  • Install the Linkerd2 control plane from scratch with zero config
  • Install a demo app in the cluster
  • Inject the Linkerd proxy into the demo app
  • View metrics and the Grafana dashboards
  • Use the Linkerd2 dashboard and CLI tooling to debug failing requests in the demo app
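The last step leans on the per-deployment success rate and request rate that the Linkerd2 dashboard and `linkerd stat` report. As a rough sketch of where such numbers come from, assume each proxy exposes a cumulative Prometheus counter `response_total` split by a `classification` label of `success` or `failure`; the rates then fall out of how much those counters grew between two scrapes. The `Scrape` type, the `golden_metrics` helper and the sample values are hypothetical and only for illustration. This is not Linkerd2's own code, which derives these figures from Prometheus queries.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Scrape:
    """Hypothetical snapshot of one deployment's cumulative response counters."""

    timestamp_s: float  # time of the scrape, in seconds
    success: int  # assumed: response_total{classification="success"}
    failure: int  # assumed: response_total{classification="failure"}


def golden_metrics(earlier: Scrape, later: Scrape) -> Tuple[Optional[float], float]:
    """Return (success_rate, requests_per_second) between two scrapes.

    The counters only ever grow, so the window's traffic is the difference
    between the two snapshots, similar in spirit to PromQL's increase().
    success_rate is None when no responses were seen in the window.
    """
    window_s = later.timestamp_s - earlier.timestamp_s
    if window_s <= 0:
        raise ValueError("the later scrape must come after the earlier one")
    ok = later.success - earlier.success
    failed = later.failure - earlier.failure
    if ok < 0 or failed < 0:
        # A counter went backwards (e.g. the proxy restarted). Prometheus
        # compensates for resets; this sketch simply refuses to guess.
        raise ValueError("counter reset detected")
    total = ok + failed
    success_rate = ok / total if total else None
    return success_rate, total / window_s


if __name__ == "__main__":
    before = Scrape(timestamp_s=0.0, success=1_000, failure=50)
    after = Scrape(timestamp_s=60.0, success=1_570, failure=80)
    rate, rps = golden_metrics(before, after)
    print(f"success rate {rate:.1%}, {rps:.1f} rps")  # success rate 95.0%, 10.0 rps
```

With 570 successes and 30 failures over a 60-second window, the sketch reports a 95% success rate at 10 requests per second. Counter resets are rejected rather than handled to keep it short; Prometheus's own `rate()` and `increase()` functions do compensate for them.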


Linkerd2

Linkerd2 has massively increased the confidence that we and our feature teams have in our ability to debug and resolve incidents. It provides metrics out of the box, which empowers our teams to own the services they create.

Without the metrics and observability tooling that Linkerd2 provides, we would not have been able to move as fast or with as much confidence. There are, of course, other tools (e.g. Envoy), but the simplicity and ease of installing and using Linkerd2 were a huge reason we chose it.