Site Reliability Engineering (SRE) and its interconnected areas, such as Observability, Platform Engineering, and DevOps, have typically operated without Product Managers. I believe that’s happened because IT Operations was seen solely as a cost center and not as a source of competitive advantage.
With the rise of technology giants such as Google, Amazon, or Facebook, other companies started adopting similar SRE practices that improve the efficiency, security, development speed, and reliability of large-scale systems. Everyone is trying to move at the same speed as big tech and nimble startups. Bets on SRE or DevOps are now seen as investments with positive returns, rather than sunk costs.
There’s little to no literature coming from Google describing how Product Managers can be part of an SRE team. Although a lot has been written about SRE Team Lifecycles and their different topologies, there hasn’t been much about bringing non-engineers into this function. I think that’s going to change soon.
There’s an increasing number of product owners and program managers in SRE and Platform teams because they have to:
An SRE’s plate is already full, so the tasks listed above are arguably stealing time from reliability-ensuring activities. A few weeks ago, not thinking I’d be writing this blog today, I ran a poll on r/sre asking “How do you spend most of your time?”
The results were not that surprising. They validated that a large proportion of SREs do actually build and/or manage developer tooling, meaning that they must care for users. Those who commented also mentioned that a portion of their time is spent answering questions, doing admin tasks, and sitting in vendor meetings.
We expect our Technical Product Managers in the platform tribe to have a tight working relationship with the product engineering, infrastructure and security teams as they’re usually the key stakeholders and consumers of the products that our platform teams are building.
Product Managers supporting SRE and Platform teams are asked to bring traditional product management techniques, such as user research, roadmap prioritization, and stakeholder alignment, into the reliability world. According to several job descriptions I’ve analyzed, their responsibilities often include:
Note: Responsibilities, as well as job titles (SRE Product Lead, Technical Program Manager, SRE Product Owner, etc.), will vary from one organization to another.
Below is a visual example of how a Product Manager might be part of an SRE team and some of their responsibilities. Don’t take the SRE’s work areas as an absolute truth: I know many are missing, and some of these are always shared responsibilities across the team!
Given SRE’s principle of applying software to manage and automate IT, the function has successfully taken on many areas of responsibility, and it has done so with fewer people than would normally have been needed to move at the same speed reliably. As a result, complexity has increased drastically, and there’s now a need for a focused strategy, planning, and management function within SRE.
I believe that we will start seeing more and more product managers step into this area or, most likely, more engineers formally take on a technical product management role within reliability. My second hypothesis is that the SLO methodology will become the product manager’s best friend because it will allow them to:
More on the above, with demos of detech.ai, in a future blog post coming soon!
Jen Wohlner, Fastly
Grant Smith, nextgendevops.com
Isabel Lilles, PagerDuty