What we learnt from implementing Ray Dalio’s Error Log method to encourage failures and mistakes in the team, while learning from them.
Mark Zuckerberg, CEO of Facebook, once said: “Move fast and break things. Unless you are breaking stuff, you are not moving fast enough.”
This is a philosophy Facebook follows, and much of their speed of execution can be attributed to it. Startups love speed (it is one of their key strengths compared to incumbents in the market) and generally feel pumped to subscribe to this way of thinking. However, what rarely gets highlighted is the kind of systems built internally at Facebook to ensure that a developer has the freedom to build and break things without taking the business down. I learnt this from my conversations with engineers at Facebook. It seemed like the right step for Facebook to take as they scaled up, since they never wanted to lose the benefits of making mistakes.
When very early stage startups blindly follow this philosophy, they conveniently ignore that a small error on their side can sometimes be enough to take their business down (or a considerable part of it). We tend to forget that we are fragile.
At Typito we tried our best to be cautious whenever we pushed something to the public, be it a product update, an email campaign or anything else. Being over-cautious was certainly an overhead, but we were not sure if there was another way to go about it. Life is nice when we don’t commit mistakes or errors, right? But it turns out mistakes and errors are bound to happen and you need to learn to cope with them.
It was a fine morning in 2017. I got up around 6:30 AM and checked my email for any important messages. I saw 3 emails from customers asking if our app was down. I immediately opened a browser, tried opening Typito and realised that the SSL certificates had expired. My co-founder Srijith knew about this, but he thought we had one more day before expiry and he was traveling to his hometown that night. We realised the app had been down for more than 2 hours. I decided I had to vent this out on Srijith and did the silliest thing you can do at that moment: I went to the #general channel on Slack and shouted at him (typing in CAPS), as if that was going to solve the problem. He was reaching his hometown in Kerala in another hour and we had to wait it out before he could resolve the issue (~ 3.5 hours of downtime). After this incident our team started being extra cautious while executing and it slowed us down.
Looking back, I realise it was a miserable thing for me to do, and I was setting a precedent for our small team that mistakes would be penalised. All this without realising that we were still a small team and there was only so much we could do to ensure that we did not commit errors. I apologised to Srijith and my team later for the way I behaved in that instance. But the challenge remained: how do you build a culture that encourages failures and mistakes while being able to learn from them?
It was towards the end of 2017 that I got my hands on Principles, a book written by Ray Dalio, founder of the world’s biggest hedge fund firm, Bridgewater Associates. I had already read the summary version of Principles by then and was a big admirer of his efforts to build a team based on meritocracy. But one of the best parts of the book, and one I could relate to, was the section under “Work Principles” where he explains how he tried to build a culture in which it is okay to make mistakes and unacceptable not to learn from them.
He goes on to elaborate how his company ended up losing to the tune of hundreds of thousands of dollars because of a careless mistake by one of his employees. He understood that letting the person go would build a culture that’s averse to failures. Instead, he came up with a public log where everyone lists the mistakes they commit, explains in detail how each one impacted the customers / company and reflects on how it could be avoided in the future. Everyone is expected to read the error logs, and this has helped them avoid recurring mistakes or failures while also building a culture that encourages the team to move fast and break things (if needed). (Mark should be happy now!)
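For readers who think in code, the shape of such a log can be sketched in a few lines of Python. This is only an illustration of the three things each entry is meant to capture (what happened, the impact, and how to avoid it). The class and field names are hypothetical, it is not how Bridgewater or Typito actually store their logs, and the sample entry's "how to avoid" text is made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorLogEntry:
    # All field names here are hypothetical, chosen for illustration.
    owner: str          # person writing up the mistake
    what_happened: str  # summary of the mistake
    impact: str         # how it affected customers / the company
    how_to_avoid: str   # reflection on avoiding it in the future


@dataclass
class ErrorLog:
    entries: List[ErrorLogEntry] = field(default_factory=list)

    def add(self, entry: ErrorLogEntry) -> None:
        # An entry is only useful if all three parts are filled in.
        for name in ("what_happened", "impact", "how_to_avoid"):
            if not getattr(entry, name).strip():
                raise ValueError(f"'{name}' must not be empty")
        self.entries.append(entry)

    def search(self, keyword: str) -> List[ErrorLogEntry]:
        # Case-insensitive lookup so the team can check past mistakes.
        needle = keyword.lower()
        return [
            e for e in self.entries
            if needle in f"{e.what_happened} {e.impact} {e.how_to_avoid}".lower()
        ]


# Illustrative entry loosely based on the SSL incident described earlier;
# the how_to_avoid text is an assumption, not from the actual log.
log = ErrorLog()
log.add(ErrorLogEntry(
    owner="Srijith",
    what_happened="SSL certificates expired a day earlier than expected",
    impact="App was down for about 3.5 hours",
    how_to_avoid="Track the exact expiry date and renew ahead of time",
))
matches = log.search("ssl")
```

The point of the validation in `add` is the same as the point of the method itself: a mistake only counts as logged once its impact and the lesson are written down too.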
Let’s see how we adopted Error Logs by Ray Dalio as a culture experiment in Typito and how it turned out.
Following the Error Logs process from Ray’s Principles should help us (Typito) build a culture that tolerates errors or failures and encourages the team to move fast while not making the same mistakes twice.
1. Maintain 2 separate documents: one for logs related to tech / engineering and another for non-tech / marketing / growth. We found that keeping them separate makes look-up easier. The documents can be called anything you like and can live on any tool you like. In our case, we call these docs “Production Issue Logs” and maintain the engineering one on Dropbox Paper and the non-engineering one on Trello as a list.
2. Whenever a critical issue happens, the team first works towards resolving it. After that, the person most likely responsible for the issue spends time summarising what happened and adds a note to the Production Issue Log. Here’s an example of a non-engineering issue log related to an error I committed in April 2018 when we were experimenting with webinars on video marketing to help our customers:
We plan to continue with the Production Issue Log process going forward and we highly recommend this to other startups. It helps build better accountability, trust and discipline in your early days as a team.