
How we leverage our Product Responder role to push our pace of development

By incident.io

Like many of our customers, incident.io is, at heart, a software company, which means our work is never truly “done.”

One of our primary goals is to help people coordinate their response to situations where things haven’t gone well, and make it easy to always do the right thing.

But we know there will always be bugs to fix, features to introduce and improvements to make, as our changelog shows. Based on the feedback we get from customers, we already do a good job of addressing all of these regularly.

That said, users frequently compliment the pace at which we deliver this value, and many seem genuinely curious about how we manage to do it consistently.

Here, I'm going to dive into one of the things we have in place to push the pace: our Product Responder role.

I'll explain its purpose, why it matters, and how it enables us to consistently delight customers by quickly resolving issues or implementing small feature requests that make our users' lives better.

What's a Product Responder?

A big part of why we can fix issues quickly is the “Product Responder”: a rotating role held by engineers who dedicate their week to resolving issues that undermine our product's reliability or affect a particular customer or group of customers.

Day to day, the role usually involves triaging and fixing bugs, and coordinating with our customer success team to help prioritise issues and communicate with customers.

The role described above isn’t necessarily unique to incident.io. What is unique, though, is the experience we’re able to provide to our customers and the pace at which we can deliver.

Why does this role exist?

Every software company has a version of this problem: the more successful it becomes, the more bugs, technical debt and edge cases it runs into.

Having a group of engineers who are always focused on reactive work has big benefits, namely:

  • Focus: a large part of the team can focus solely on non-reactive work
  • Learning: engineers are introduced to areas of the codebase that they wouldn’t otherwise interact with
  • Customer feedback loops: we listen to customers, and bugs actually get fixed, because a dedicated, rotating group of engineers tackles this backlog continually.

How does the role work?

Rather than covering the exact mechanisms or being prescriptive about tooling, I’d like to share the key factors that contribute to the success of our process.

Clarity: We aim for clarity at every step, from ticket creation templates that make it easy to do the right thing, to sitting with the customer success team so we can react quickly to potential issues, to frequently reviewing priorities in a cross-team stand-up.

Making sure everyone knows our ticket lifecycle and what “good” looks like lets us consistently keep tickets moving quickly from left to right.

Centralising the noise: One person within the Product Responder team is assigned the leader role and holds the pager, so that we always have an in-hours on-call engineer. This lets one person be the designated point of interruption while the rest of the responders focus on tackling issues. It also means that the out-of-hours on-call engineers have a group of people to hand over any ongoing incidents to during the working day.

Shared responsibility: Each team puts forward a dedicated Product Responder, and this person changes roughly every week. This makes the product’s reliability everybody’s responsibility, so we don’t see engineers “passing the buck” to other people. Whatever the area of the product, people are always willing to get involved.

Close the loop: Communicate when something is done. Whenever an issue is fixed, we let our users know. Users love hearing back when a problem they reported has been fixed, so there’s huge value in “closing the loop”, and it feels especially magical when it happens within a few hours. It also means action items rarely fall through the cracks, because we can always ask ourselves: “have we let the customer know?” And it gives a boost to the engineer who fixed the issue!

Acknowledging surges in demand: Occasionally we run into particularly busy weeks and our backlog starts to grow. When that happens, we acknowledge it and take the time to clear the backlog; these “backlog crush weeks”, as they’re known internally, are a common occurrence here. If needed, we can also ask the wider engineering team for help.

Focus on the developer experience: This is more of a wider engineering trait, but our team invests in patterns and abstractions that let us standardize and speed up development, which ultimately also contributes to our ability to move fast.

Want to keep a high bar? Try out Product Responder roles

We are building a product that customers love. We have put a lot of emphasis on our relationship with our users and will continue to do so.

Our Product Responder process is one way in which we continue to keep a high bar for our product’s reliability, and continue to support our users even when things aren’t working as expected.

Hopefully this article has given you some insight into how this process works.

If you’d like to know more about this, feel free to reach out to us on Twitter or LinkedIn!
