Decoding the code

Mariusz Krzanowski blog

Review of Designing Data-Intensive Applications by Martin Kleppmann

Today I finished reading the book ‘Designing Data-Intensive Applications’ by Martin Kleppmann, and I would like to share my review of it with you.

In my view, designing and developing any distributed system is hard work. My opinion is based on over a decade of experience in this area. The problem is that all modern applications are distributed in some way. If a database is hosted on a different computer than the web server, there is a communication link: you have two services – a database service and an application service – which share an unreliable network as a communication channel. A browser hosting client code that connects to the web server creates a distributed system as well.

Developing workflow without workflow engine

Disclaimer

Workflow engines are very advanced tools, and I have seen a lot of projects where they created great business value. They sometimes simplify development. In this article I do not argue that you should not use them. My goal is to show you that there is an alternative: transforming a workflow into a set of services distributed over time.

When aborted equals committed

Developers treat traditional SQL databases as fully safe storage. The ACID properties are intuitive for us and give us a sense of safety during application development. We know that if a network error occurs during a database transaction and we receive an exception, the whole transaction will be rolled back as an atomic unit. To work around network issues we can retry the operation later, and that should solve the problem. It is true in most cases, but there is one case in which it is not so easy. Let me drill down into this subject.
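To make the retry idea concrete, here is a minimal sketch of the pattern the excerpt hints at (the table names, the connection factory, and the idempotency key are my assumptions, not taken from the post): before retrying, it checks whether the previous attempt actually committed, because an exception raised during commit does not always mean the work was rolled back.

```python
import time

def transfer_with_retry(get_connection, operation_id, amount, retries=3):
    """Retry a transaction after network errors, guarding against the case
    where a 'failed' commit actually succeeded on the server."""
    for attempt in range(retries):
        try:
            conn = get_connection()           # hypothetical connection factory
            cur = conn.cursor()
            # Idempotency check: was this operation already committed by a
            # previous attempt whose acknowledgement we never received?
            cur.execute(
                "SELECT 1 FROM processed_operations WHERE operation_id = %s",
                (operation_id,),
            )
            if cur.fetchone():
                return  # already committed, do not apply it twice
            cur.execute(
                "UPDATE accounts SET balance = balance - %s WHERE id = 1",
                (amount,),
            )
            cur.execute(
                "INSERT INTO processed_operations (operation_id) VALUES (%s)",
                (operation_id,),
            )
            conn.commit()
            return
        except OSError:                       # stand-in for a network error
            time.sleep(2 ** attempt)          # back off and try again
    raise RuntimeError("operation failed after retries")
```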

SQL Trick – Multiple Values Query

Sometimes system flexibility makes the solution complicated 😉. Imagine a ‘simple’ situation: you have a system in which a client can define a new attribute and assign multiple values to it. Later, the client can assign these values to Customers. If you do not plan to develop a new attribute set every time a client wants to make a change, you will need to prepare a ‘dynamic’ structure in the database.
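One common way to build such a ‘dynamic’ structure is an attribute/value model; the sketch below is my illustration under that assumption (the schema and helper are not taken from the post) and shows a query that returns only customers holding all of the requested values.

```python
# Illustrative schema for client-defined attributes:
#   attributes(attribute_id, name)
#   attribute_values(value_id, attribute_id, value)
#   customer_attribute_values(customer_id, value_id)

def customers_with_all_values(cursor, value_ids):
    """Return customers that have *all* of the given attribute values assigned."""
    placeholders = ", ".join(["%s"] * len(value_ids))
    cursor.execute(
        f"""
        SELECT customer_id
        FROM customer_attribute_values
        WHERE value_id IN ({placeholders})
        GROUP BY customer_id
        HAVING COUNT(DISTINCT value_id) = %s
        """,
        (*value_ids, len(value_ids)),
    )
    return [row[0] for row in cursor.fetchall()]
```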

Resolving AD groups membership

Big companies have huge internal structures, and those structures have to be mapped into a permission model. One company I worked for had over 300k groups in Active Directory. As a worldwide organization, it had multiple domains in its AD forest. Of course, various groups have various memberships, so the structure was really complicated.
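Resolving who effectively belongs to what boils down to walking the nested-membership graph; the sketch below is a plain-Python illustration of that idea (the function, its inputs, and the example nesting are my assumptions, independent of any LDAP client library).

```python
from collections import deque

def resolve_effective_groups(direct_groups, parent_groups):
    """Resolve transitive (nested) group membership.

    direct_groups: groups the user is a direct member of
    parent_groups: mapping group -> groups that group is itself a member of
    Handles cycles, which do occur in large AD forests.
    """
    effective = set()
    queue = deque(direct_groups)
    while queue:
        group = queue.popleft()
        if group in effective:
            continue                      # already visited (cycle or diamond)
        effective.add(group)
        queue.extend(parent_groups.get(group, ()))
    return effective

# Example: user is in "Team-A", nested in "Dept-X", nested in "All-Staff"
nesting = {"Team-A": ["Dept-X"], "Dept-X": ["All-Staff"]}
print(resolve_effective_groups(["Team-A"], nesting))
```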

Scaling out your front end with an actor

Connecting the front end and the back end is difficult. The main problem is the protocol itself: it is stateless, and the connection is re-created each time. HTTP/2 changes this behavior a little, but only at the transport layer. Multiple requests and responses can be handled in a different order, which means there are many traps waiting for a developer.
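One way to tame that out-of-order behavior is to put session state behind an actor: a single mailbox processed by a single task. The sketch below is a minimal asyncio illustration of the idea (class and message names are my assumptions, not necessarily the approach described in the post).

```python
import asyncio

class SessionActor:
    """A tiny actor: one mailbox and one processing task per user session,
    so requests arriving concurrently over a stateless protocol are handled
    one at a time and state changes stay ordered."""

    def __init__(self):
        self.state = {}
        self.mailbox = asyncio.Queue()
        self._task = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            message, reply = await self.mailbox.get()
            # All state mutation happens here, in a single task: no locks needed.
            self.state[message["key"]] = message["value"]
            reply.set_result(dict(self.state))

    async def ask(self, message):
        reply = asyncio.get_running_loop().create_future()
        await self.mailbox.put((message, reply))
        return await reply

async def main():
    actor = SessionActor()
    # Two "HTTP requests" hitting the same session concurrently:
    results = await asyncio.gather(
        actor.ask({"key": "cart", "value": ["book"]}),
        actor.ask({"key": "cart", "value": ["book", "pen"]}),
    )
    print(results)

asyncio.run(main())
```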

World without DTC

Distributed Transaction Coordination (DTC) is sometimes slow, but it guarantees system consistency and frees you from caring about infrastructure details. I will describe a problem that appears when you have no DTC and two independent databases.
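To make the problem concrete, here is a minimal sketch of a cross-database update without DTC (the databases, tables, and compensation step are my assumptions); the comment marks the window in which a crash leaves the two databases inconsistent.

```python
def create_order_without_dtc(orders_db, billing_db, order):
    """Naive cross-database update without a distributed transaction.

    If the process dies between the two commits, the databases diverge:
    the order exists but was never billed. That window is exactly what DTC
    used to hide; without it you need idempotent retries, an outbox table,
    or a compensating action."""
    with orders_db.cursor() as cur:
        cur.execute(
            "INSERT INTO orders (id, customer_id) VALUES (%s, %s)",
            (order["id"], order["customer_id"]),
        )
    orders_db.commit()

    # <-- a crash here leaves the two systems inconsistent

    try:
        with billing_db.cursor() as cur:
            cur.execute(
                "INSERT INTO invoices (order_id, amount) VALUES (%s, %s)",
                (order["id"], order["amount"]),
            )
        billing_db.commit()
    except Exception:
        # Compensate in the first database (or record the failure for a retry job).
        with orders_db.cursor() as cur:
            cur.execute("DELETE FROM orders WHERE id = %s", (order["id"],))
        orders_db.commit()
        raise
```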

SQL – always sort by unique key to guarantee correct paging

The SQL query engine is designed to return correct results as quickly as possible, but correctness means correct from a mathematical perspective. The problem I will describe is obvious, yet it is not always what you expect from the SQL engine.
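A minimal illustration of the point (table and column names are my assumptions, and OFFSET/FETCH is just one common paging style): when the sort column is not unique, the engine may order ties differently on each execution, so rows can repeat or disappear between pages; appending a unique key to the ORDER BY makes the order deterministic.

```python
PAGE_SIZE = 20

# Non-deterministic paging: many customers can share the same last_name,
# so rows with equal last_name may move between pages across executions.
unstable_page = """
    SELECT customer_id, last_name
    FROM customers
    ORDER BY last_name
    OFFSET %(offset)s ROWS FETCH NEXT %(page_size)s ROWS ONLY
"""

# Deterministic paging: the unique customer_id breaks ties, so every
# execution produces the same total order and pages never overlap or skip rows.
stable_page = """
    SELECT customer_id, last_name
    FROM customers
    ORDER BY last_name, customer_id
    OFFSET %(offset)s ROWS FETCH NEXT %(page_size)s ROWS ONLY
"""

def fetch_page(cursor, page_number):
    cursor.execute(
        stable_page,
        {"offset": page_number * PAGE_SIZE, "page_size": PAGE_SIZE},
    )
    return cursor.fetchall()
```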

How to handle diagrams in sync with your code

Two weeks ago I watched a presentation on YouTube about how important diagrams are. I guess everyone agrees with the title of that presentation: diagrams deliver much more information than text.

Long ping response and slow network kill your application

Business requires fast software delivery. This sometimes leads to a situation in which developers focus less on performance and more on business requirements. The problem is that a badly designed architecture can make it impossible to fix performance later. Many times I have explained that a page with a total weight of about 2 MB is too large, and that the server side has limited bandwidth: with a 1 Gb/s connection on the server side, you can serve only about 50 concurrent users if each of them is to see the full content within 1 second. I also tell my colleagues that we have to reduce the number of requests, because each request consumes not only the browser connection pool limit but also server resources. When clients are located far from our server, pages can load very slowly. When I was in the Galapagos I realized (again) how slow a network connection can be and how important wise web design is.
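The arithmetic behind that "about 50 users" estimate can be checked in a few lines; the protocol-overhead factor below is my assumption, the rest follows from the numbers in the paragraph.

```python
link_speed_bits = 1_000_000_000          # 1 Gb/s server uplink
page_size_bytes = 2 * 1024 * 1024        # ~2 MB page weight
protocol_overhead = 0.8                  # assume ~20% lost to TCP/TLS/HTTP framing

usable_bytes_per_second = link_speed_bits / 8 * protocol_overhead
users_served_per_second = usable_bytes_per_second / page_size_bytes
print(f"~{users_served_per_second:.0f} users can get the full page within 1 second")
# prints roughly 48, matching the 'about 50 concurrent users' estimate
```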

