Google Book: Anatomy of an Incident - Site Reliability Engineering (2022)

When it comes to system design, failure is inevitable. Scientists and engineers implement solutions based on the available information, without a complete knowledge of the future. You can’t always anticipate the next zero-day event, viral media trend, weather disaster, or shift in technology. But you can be prepared to respond when incidents like these affect your systems.

With this report, SRE and DevOps practitioners, IT managers, and engineering leaders will explore methods to help their organizations prepare for, respond to, and recover from incidents. With advice from Ayelet Sachto, Adrienne Walcer, and Jessie Yang, you’ll learn how to handle failure if and when it happens.

Learn the stages of the incident management lifecycle: preparedness, response, recovery, and mitigation