DevOps vs. Site Reliability Engineering
Asmaa Nasr
I’m not a fan of pitting ideas against one another. Our goal should be to ensure that we are helping our organization deliver functionality in a timely and secure manner. The precise details of how we do that don’t really matter. Site Reliability Engineering (SRE) and DevOps are two major approaches that both attempt to achieve the same goal. While they share many similarities, each has some distinguishing characteristics.
Rather than trying to sell you on either, let’s just talk about them both and then figure out how we can decide which is a better fit for our organization.
Site Reliability Engineering
SRE was developed at Google to address the challenges of software development and ongoing operations. The emphasis is squarely on the use of tools to aid in the automation of processes. However, it is also regarded as a legitimate role within an organization. To successfully implement Site Reliability Engineering, a Site Reliability Engineer is required.
The overall goal of SRE is centered on the name’s middle word, Reliability. When implementing an SRE strategy, your first priority is to do everything possible to keep your systems online and available. Most of the automation and testing effort in SRE goes toward this goal.
SRE has several core principles that are worth noting:
- Automation: SRE approaches operations as a software problem, automating all aspects of operations from deployment to monitoring. Automation is a priority everywhere, and the team keeps looking for new things to automate.
- Service Level Objectives: Engineers on an SRE team will collaborate with other teams to define Service Level Agreements (SLA) and Service Level Indicators (SLI) that define what is required in terms of reliability and how that can be measured most effectively. The two are combined to create Service Level Objectives (SLO), which use the SLI and SLA to define what needs to be built.
- Monitoring: Because of the embrace of automation, distributed systems, cutting-edge tooling, and everything else that defines SRE, monitoring, particularly monitoring of distributed systems, has become a critical component of a successful SRE implementation. Furthermore, you can only know if you are meeting your SLO and SLA by using SLI, which you obtain through monitoring.
- Preparation: Embracing development and bringing developers into the operations team to use their skills to help prepare for outages is critical to SRE. All of this is done to ensure the reliability of the systems being developed.
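To make the relationship between SLIs, SLOs, and SLAs above a little more concrete, here is a minimal sketch in Python. It assumes availability is the indicator being measured and that the internal SLO is set tighter than the SLA agreed with the business. The names `RequestWindow`, `availability_sli`, and `check_targets`, along with all the numbers, are hypothetical and only for illustration; a real setup would pull these counts from a monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestWindow:
    """Request counts gathered by monitoring over one measurement window."""
    total: int
    successful: int


def availability_sli(window: RequestWindow) -> float:
    """SLI: fraction of requests that succeeded (1.0 when there was no traffic)."""
    if window.total == 0:
        return 1.0
    return window.successful / window.total


def check_targets(window: RequestWindow, slo: float, sla: float) -> dict:
    """Compare the measured SLI against the internal SLO and the agreed SLA."""
    sli = availability_sli(window)
    return {"sli": sli, "slo_met": sli >= slo, "sla_met": sli >= sla}


# Hypothetical numbers: 1,000,000 requests, 1,200 of which failed.
window = RequestWindow(total=1_000_000, successful=998_800)
# Assumed targets: 99.9% internally (SLO), 99.5% promised externally (SLA).
report = check_targets(window, slo=0.999, sla=0.995)
print(report)
```

With these made-up numbers the measured SLI misses the SLO while the SLA still holds, which is exactly the kind of early warning that monitoring is meant to provide.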
In a nutshell, SRE is a focus on reliability that employs automation and enlists the assistance of the development team.
DevOps
DevOps has a much more organic history, originating in a variety of organizations and disciplines. The emphasis is on development, but when done correctly it intentionally includes every IT team, as well as management and the business, as part of a fundamental shift in how functionality is defined and deployed. As a result, the process is much broader but far less well defined than SRE.
The principles of DevOps are as follows:
- Communication: The most fundamental aspect of implementing DevOps is breaking down metaphorical walls and eliminating silos. At the heart of the union of people, process, and products is the integration of multiple disciplines to improve communication between those disciplines.
- Automation: Automate everything, but particularly testing. The emphasis on automation is also critical to ensuring that silos between teams are eliminated.
- Continuous Delivery: By utilizing automation and communication, the ability to deliver software and services becomes both faster and safer.
- Fail Early and Often: To better protect production environments, DevOps encourages early failure in development and other non-production environments. Problems caught there never reach production.
To summarize DevOps, it is a focus on rapid development and deployment using automation to help bring in all the other teams.
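The "fail early" idea, combined with automated testing and continuous delivery, can be sketched as a pipeline that stops at the first failing stage. This is only an illustrative sketch, not how any particular CI/CD tool works: `run_pipeline` and the stage names are hypothetical, and the lambdas stand in for real test and deploy steps.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure, so later stages never run."""
    executed: List[str] = []
    for name, check in stages:
        executed.append(name)
        if not check():
            return False, executed
    return True, executed


# Hypothetical stages; the second one simulates a failure caught before production.
stages: List[Stage] = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: False),
    ("deploy to production", lambda: True),
]

ok, executed = run_pipeline(stages)
print(ok, executed)
```

Because the integration stage fails, the deploy stage is never reached, which is the protection for production that failing early is meant to buy.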
Choosing one over the other
Which would I choose if I had to choose between SRE and DevOps? Well, I’d argue that it’s a false choice.
Even though continuous delivery is not one of the core tenets of SRE, nothing in SRE prevents it. There’s also nothing in SRE that prevents you from enlisting more business involvement to help you achieve reliability. The rest of SRE is fully compatible with DevOps’ goals. Automation and preparation would fit perfectly within a DevOps framework.
Conversely, there are no rules in DevOps that say you can’t have SLAs, SLIs, or SLOs. If anything, DevOps calls for them: to meet the requirements of automated testing and continuous delivery, you should also implement monitoring. While nothing in DevOps defines specific roles, there is no rule that says you can’t have a Site Reliability Engineer supporting your DevOps process.
In short, I don’t believe you have to choose between these two approaches. I believe you can successfully implement either or both. It really depends on where your problems are and how best to address them within your organization.