Sometimes I meet software engineers who have no knowledge of the CAP theorem, even highly experienced engineers who consider themselves distributed systems developers or architects.
I find this shocking. Data storage and access is a critical element of nearly all the systems we build. How can you choose a suitable data store for your application if you aren’t able to assess its trade-offs, especially consistency and availability?
It turns out there are a lot of essential concepts, not just CAP, that all software engineers should be aware of: linearizability, serializability, read committed, snapshot isolation, and many more.
In Designing Data-Intensive Applications, Martin Kleppmann presents all of these essential concepts you need to know about modern data stores, and many more. You can spend the next 10 years slowly learning these concepts, often through systems breaking, or you can take a shortcut and read this book (your systems will still break, because databases and distributed systems cannot guarantee reliability).
For the benefit of your career, I highly recommend reading the book. You will realise how much you don’t know about databases and distributed systems. And you’ll learn to look beyond the vendor hype and the urban myths, so that you can choose the most appropriate data stores for the needs of your application.
One of the other great things about this book is the order in which concepts are introduced. Before diving into the core topics of the book (replication, partitioning, transactions, and distributed systems), the first few chapters show how to understand the benefits and trade-offs provided by data stores, and even the patterns used to implement them.
This book can be challenging, and I certainly need to read some sections a few more times, but despite Kleppmann being respected in academia, it is by no means a dry academic read. Quite the opposite. This book has a focus on evolvability: databases and patterns that enable us to evolve our products more easily.
Kleppmann also provides one of the best rationales and descriptions of Event Sourcing I have ever heard. He talks about the benefits of CDC (Change Data Capture) for keeping databases aligned, but then highlights that domain events can be superior because they capture atomic data changes at a level relevant to the business rather than low-level database changes.
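To make that contrast concrete, here is a minimal Python sketch of my own (it is not taken from the book). The names `RowChange`, `OrderCancelled`, and `describe` are hypothetical, and the record shapes are assumptions chosen for illustration rather than the format of any real CDC tool.

```python
from dataclasses import dataclass


# Hypothetical CDC-style record: it says which row changed and how,
# but not why the change happened.
@dataclass(frozen=True)
class RowChange:
    table: str
    primary_key: int
    before: dict
    after: dict


# Hypothetical domain event: it names the business fact directly.
@dataclass(frozen=True)
class OrderCancelled:
    order_id: int
    reason: str


def describe(change: RowChange) -> str:
    """Summarise a row change; the consumer must infer intent from column diffs."""
    changed = {
        column: (change.before.get(column), new_value)
        for column, new_value in change.after.items()
        if change.before.get(column) != new_value
    }
    return f"{change.table}#{change.primary_key} changed: {changed}"


cdc = RowChange(
    table="orders",
    primary_key=42,
    before={"status": "open", "note": ""},
    after={"status": "cancelled", "note": "out of stock"},
)
event = OrderCancelled(order_id=42, reason="out of stock")

print(describe(cdc))
print(event)
```

A consumer of the row change has to work out from the `status` column that an order was cancelled, whereas the domain event states it outright in terms the business would recognise.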
I just can’t recommend this book highly enough. Whether you are a junior or a senior, there are so many concepts in this book you need to at least be aware of, and nobody does a better job of packaging them up into a practical and accessible read than Martin Kleppmann.