For sure! Google invented and defined SRE, and with it likely to be widely adopted over the next 3 years, knowing it will give you a leg up.

I have two books to recommend:

This is the initial book, basically written by Google:
https://www.amazon.com/Site-Reliability-Engineering-Production-Systems-ebook/dp/B01DCPXKZ6/ref=sr_1_1?crid=96VEB008DH8W&keywords=site+reliability+engineering+book&qid=1554953445&s=gateway&sprefix=site+reli%2Caps%2C149&sr=8-1

There is a free version online as well:
https://landing.google.com/sre/books/

Also, a few months ago I bought this follow-up, which is kind of an introspective view of the concepts in the first:

https://www.amazon.com/Seeking-SRE-Conversations-Running-Production-ebook/dp/B07GQ2YY1D

Between SRE and DevOps, there's going to be a TON of money to be made in the next 3-7 years. I highly recommend giving this strong consideration.

I'd say that for a fresh developer coming into an org, even as a pure dev, a background in SRE/DevOps will give you a leg up in most situations. You can be the guy who spots issues in planning sessions / scrums / architecture reviews / etc., and calls out holes or potential problems.

Devs in a large company by and large don't consider support or resiliency to the degree that someone who practices operations does. There's a million gotchas.

SRE/DevOps insight lets you understand the underpinnings of delivering and supporting an application at another level. The whole concept of these roles is to catch problems before they happen (via monitoring, instrumentation, on-call trees/escalation, dependencies, resolution practices, understanding of DR and availability, resiliency, yada yada), and to quickly correct and future-proof the code against similar or related issues going forward.

SRE isn't about being the first-line on-call guy. It's about building reliable systems and holding devs and product owners accountable for how they spend their time. The balance between features and "issues" is the error budget: the amount of downtime the app is allowed. Exceed that downtime constraint and they must cut into feature time to make the app more resilient.
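To make the error budget idea concrete, here is a minimal sketch in Python. The 99.9% availability target, the 30-day window, and the `ErrorBudget` class with its method names are all assumptions for illustration; they are not taken from the book or from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical helper: tracks allowed downtime for one service."""
    slo: float           # availability target, e.g. 0.999 (assumed value)
    window_minutes: int  # length of the measurement window in minutes

    @property
    def allowed_downtime(self) -> float:
        # The budget is whatever the availability target leaves over.
        return (1.0 - self.slo) * self.window_minutes

    def remaining(self, downtime_minutes: float) -> float:
        # Negative means the budget has been overspent.
        return self.allowed_downtime - downtime_minutes

    def feature_work_allowed(self, downtime_minutes: float) -> bool:
        # Once the budget is gone, time shifts from features to resiliency.
        return self.remaining(downtime_minutes) > 0


# Assumed example: 99.9% target over a 30-day window (about 43.2 minutes).
budget = ErrorBudget(slo=0.999, window_minutes=30 * 24 * 60)
print(round(budget.allowed_downtime, 1))
print(budget.feature_work_allowed(20))  # 20 minutes down: still in budget
print(budget.feature_work_allowed(50))  # 50 minutes down: budget exceeded
```

In practice the target and the window are whatever Dev, Product, and Ops agree on. The point is only that the pushback becomes a number everyone can see rather than an argument.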

It's a lot more than that, but that's the core of how Ops can push back on Dev (and Product), to the point where you don't need much more than a level 1 and a level 3 ops team. Level 1 is "something is amiss, is it fiber or a DC outage? No? Call level 2 (devs)". Level 3 is "it's not a code problem (or it is, and we can fix it quicker with infra until we can code around it), get infra/SRE involved".

The end goal is to make sure the code going into prod has been well thought out to cover scale, use cases, integration, dependencies, security (I added that), recovery time, data loss, etc.

I've been on both sides of the fence and straddled it for 15 years. Most issues are solved with code. Putting the dev teams on 2nd line is the best route. Aside from a fiber break, in a resilient cloud environment, the issue can be resolved fastest with code. Put them up front, and they will write more resilient code so they don't get woken up.

This is going to be table stakes in 5 years. If you are starting now, get in now at the forefront of how to tackle these problems. You will be the Golden Child.
