Book Review: Designing Data-Intensive Applications

Designing Data-Intensive Applications


Martin Kleppmann

My Rating

5 out of 5 stars
"The distributed systems and databases book every software engineer needs to read"

Sometimes I meet software engineers who have no knowledge of the CAP theorem, even highly experienced engineers who consider themselves distributed systems developers or architects.

I find this shocking. Data storage and access are critical elements of nearly all the systems we build. How can you choose a suitable data store for your application if you aren’t able to assess its trade-offs, especially consistency and availability?

It turns out there are a lot of essential concepts, not just CAP, that all software engineers should be aware of: linearizability, serializability, read committed, snapshot isolation, and many more.

In Designing Data-Intensive Applications, Martin Kleppmann presents all of these essential concepts you need to know about modern data stores, and many more. You can spend the next 10 years slowly learning these concepts, often through systems breaking, or you can take a shortcut and read this book (your systems will still break, because databases and distributed systems cannot guarantee reliability).

For the benefit of your career, I highly recommend reading the book. You will realise how much you don’t know about databases and distributed systems. And you’ll learn to look beyond the vendor hype and the urban myths, so that you can choose the most appropriate data stores for the needs of your application.

One of the other great things about this book is the order in which concepts are introduced. Before diving into the real core topics of the book, like replication, partitioning, transactions, and distributed systems, the first few chapters show how to understand the benefits and trade-offs provided by data stores, and even the patterns used to implement them.

This book can be challenging, and I certainly need to read some sections a few more times. But although Kleppmann is respected in academia, it is by no means a dry academic read. Quite the opposite. This book has a focus on evolvability: databases and patterns which enable us to evolve our products more easily.

Kleppmann also provides one of the best rationales and descriptions of Event Sourcing I have ever heard. He talks about the benefits of CDC (Change Data Capture) for keeping databases aligned, but then highlights that domain events can be superior because they capture atomic data changes at a level relevant to the business rather than low-level database changes.
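To make that distinction concrete, here is a minimal sketch of my own (it is not taken from the book). The order-cancellation scenario, the table names, and the `RowChange` and `OrderCancelled` types are all hypothetical, chosen only to show how one business action can surface as several low-level row changes in a CDC stream but as a single domain event.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowChange:
    """A CDC-style record: which column changed in which table (hypothetical shape)."""
    table: str
    primary_key: int
    column: str
    old_value: str
    new_value: str


@dataclass(frozen=True)
class OrderCancelled:
    """A domain event: one atomic business fact, named in business language."""
    order_id: int
    reason: str


def cdc_changes_for_cancellation(order_id: int) -> List[RowChange]:
    # Assumed row-level changes a CDC stream might carry for one cancellation.
    return [
        RowChange("orders", order_id, "status", "PLACED", "CANCELLED"),
        RowChange("stock_reservations", order_id, "state", "HELD", "RELEASED"),
    ]


def domain_event_for_cancellation(order_id: int, reason: str) -> OrderCancelled:
    # The same business action expressed as a single event.
    return OrderCancelled(order_id=order_id, reason=reason)


changes = cdc_changes_for_cancellation(42)
event = domain_event_for_cancellation(42, "customer request")
print(len(changes), "row changes vs. one domain event:", event)
```

A consumer of the row changes has to infer that an order was cancelled, and it never learns why. A consumer of the event is told both directly.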

I just can’t recommend this book highly enough. Whether you are junior or senior, there are so many concepts in this book you need to at least be aware of, and nobody does a better job of packaging them up into a practical and accessible read than Martin Kleppmann.


"While many-to-many relationships and joins are routinely used in relational databases, document databases and NoSQL reopened the debate on how best to represent such relationships in a database. This debate is much older than NoSQL—in fact, it goes back to the very earliest computerized database systems."

"B-trees have stood the test of time very well. They remain the standard index implementation in almost all relational databases, and many nonrelational databases use them too"

"Monotonic Reads: Our second example of an anomaly that can occur when reading from asynchronous followers is that it’s possible for a user to see things moving backward in time."

"a sloppy quorum: writes and reads still require w and r successful responses, but those may include nodes that are not among the designated n “home” nodes for a value."

"Almost all relational databases today, and some nonrelational databases, support transactions. Most of them follow the style that was introduced in 1975 by IBM System R, the first SQL database"
