The traditional SQL databases are used by developers as a fully safe storage. The ACID properties are intuitive for us and give us a sense of safety during application development. We know that if during a database transaction a network error occurred and we received an exception, the whole transaction would be rolled back as atomic part of process. To avoid network issues we can retry the operation later and it should solve the problem. It is true in most cases, but there is one when it is not so easy. Let me drill down this subject. Network is unreliable and it can fail in multiple scenarios. Let’s start from a simple scenario where everything is OK.
- Client opens a connection to the database.
- Client starts a transaction.
- Client inserts new data to the database.
- Client commits the transaction.
- Client closes the connection to the database.
This scenario can be presented on following diagram. ACK represents acknowledge message sent back to confirm the success of the operation.
Of course operation can fail in the middle of the transaction. Our operation within the database can be a victim of a deadlock and we will receive an exception. In this case retrying the whole transaction would solve the problem.
Until now everything looks OK. Operation failed, we received an exception and it means that the Database Server rolled back our changes. Then, we successfully retry our operation and the database is in a consistent state. The ‘magic’ can happen when we commit the transaction and network connection fails. There is no problem if the connection fails while sending COMMIT command to SQL Server.
If SQL Server does not receive COMMIT command it rolls back all the changes. Client receives an exception during COMMIT command, so it will know that the transaction failed because of connectivity issue. The retry operation should fix the problem. But there is a really rare case when it becomes very painful. It is when the connection fails in one specific network frame. What would happen if Server received a COMMIT command and sent ACK to the Client, but connection failed just after sending ACK?
From the perspective of the SQL Server operation was completed successfully. Server accepted COMMIT, sent ACK to the Client. It means that SQL Server contains modification requested by the Client in the database. The problem is that the Client did not receive the ACK message. Exception occurred, so from Client’s standpoint operation was rolled back. If the Client retries the operation, a duplicate change appears in the database. The probability that this problem occurs is low, but not zero. It means that retrying operation on exception sometimes does not solve the problem. The trust we are giving ACID properties of the database, requires a careful approach. We should always be careful when a network is in the middle of conversation. The problem is that the network is always in the middle of conversation between services.
For insert operation you can of course add an extra field that will store the unique ID of the Client operation. In this case, when error appears after commit (because of network failure), before retrying, you can verify if this specific operation ID created by the Client is inserted into the database. If it is, you can assume that the database already contains your changes.
Other type of error
If you have unique index on one column and you are inserting a new record and described type of failure with missing ACK appears, than your retry operation will be finished with unique index violation error. If you saw unique index violation error in your log only once for billion inserts and you wondered what could had happen after reading this article your mysterious puzzle has probably been solved 😉 .