We have open-sourced this playbook under a Creative Commons license and encourage contributions that help us improve its content over time.
Who’s this playbook for?
We’ve created this playbook to help teams and organisations design, plan, execute and review a Chaos Day. It’s not just for engineers; it is for everyone involved in delivering software. Product owners can learn more about the risks and impacts of failure, testers can learn how to explore edge cases and test for resilience, and designers can gain a greater understanding of the user experience of failure and how to design interfaces that are adaptable.
This playbook is for any organisation, regardless of its tech stack or maturity. You don’t have to use containers or Kubernetes, or run in AWS, GCP, Azure or any other cloud platform, to gain the benefits of probing your system’s response to failure.
Chaos Days are great opportunities to run experiments that explore security threats. For a distillation of our thinking on how best to apply security within continuous delivery, look at our Secure Delivery Playbook.
Chaos Days can be run with co-located and distributed teams alike. If some or all of your team is remote, our Remote Working Playbook might be of interest.
A service of any size benefits from Chaos Engineering. This playbook describes an approach that can be scaled up from a single service to an entire platform. We have further advice on why, when, and how to build a Digital Platform in our Digital Platform Playbook.
For teams practising the You Build It, You Run It (YBIYRI) operating model to build, deploy, operate, and support their own digital services, a Chaos Day is a perfect tool for better understanding how those services respond to failure. You can learn more about the YBIYRI model in our You Build It, You Run It playbook, by Steve Smith and Bethan Timmins.