When retry never succeeds

Introduction

Software development is a craft in which we try to satisfy our clients' needs, and resiliency is one of those needs. We know that a database connection can occasionally fail, or that we can hit other transient errors, such as a transaction being chosen as a deadlock victim. For transient errors, a good practice is to retry the failed operation in the hope that the next attempt succeeds. In the .NET world, many developers use a library such as Polly (https://github.com/App-vNext/Polly) to add a retry policy to selected methods.
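Polly expresses this idea as a retry policy in C#. As a language-neutral illustration of the concept only (this is not Polly's API), here is a minimal Python sketch. `TransientError`, `retry`, and `flaky_save` are hypothetical names, and the exponential backoff between attempts is an assumption, not something the setup described here requires.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a deadlock victim)."""


def retry(operation: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Run `operation`, retrying only TransientError, up to `attempts` calls in total.

    The wait doubles after each failure (exponential backoff); any other
    exception propagates immediately without a retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # no attempts left: surface the last transient error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


calls = 0


def flaky_save() -> str:
    """Hypothetical operation that fails twice before succeeding."""
    global calls
    calls += 1
    if calls < 3:
        raise TransientError("chosen as deadlock victim")
    return "saved"


print(retry(flaky_save, attempts=3, base_delay=0.0))  # saved
print(calls)  # 3
```

Note that the sketch only re-invokes the operation; it knows nothing about any state the failed attempt leaves behind.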

Problem with IoC and Scoped Lifecycle

A bug we found a few months ago in legacy code proved that developers usually focus on success scenarios and rarely spend time analyzing failure modes in advance.

Tight deadlines push them to focus on the sunny-day scenarios and to handle errors only when they appear. They assume that frameworks will handle everything flawlessly: adding a retry mechanism to the ASP.NET Core middleware should solve all issues.

Of course, if everything had worked as expected, this article would never have been written. The code I am talking about used an Entity Framework DbContext injected via the IoC container. Maybe you already sense what was wrong. For those who have never hit this problem, here is the hint: EF Core's DbContext is registered with a scoped lifetime by default, so it can live, for example, for the entire HTTP request. As a consequence, when an error occurred and the retry mechanism was triggered, the scope was not reset to its initial state. All data already loaded into the DbContext was still in memory, still tracked by the context, and saving the context caused problems: the save operation called during the retry tried to write that broken data to the database as well. Another error occurred, and the retry never helped.

We were lucky in this case, because the operation was never stored in the database, so we never reached an inconsistent state. Imagine what could happen if a retry had succeeded by writing invalid data. Even if the retried method began and committed its own transaction inside a using statement, the broken data could become part of the committed state. A developer might assume that everything is fine because the transaction is new. The truth is that the save method, called after the transaction began in the retried execution, could also persist the dirty state left over from the previous attempt.
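To make the failure mode concrete, here is a small Python model of it. It is not EF Core: `FakeDbContext`, `handle_request`, and `retry_with_shared_scope` are hypothetical, and a plain list stands in for the change tracker. The key assumption, mirroring the scenario above, is that one scoped instance is reused by every attempt, and that a failed save leaves tracked entities in place.

```python
class FakeDbContext:
    """Hypothetical stand-in for an EF Core DbContext (not the real API)."""

    def __init__(self) -> None:
        self.pending: list[dict] = []   # plays the role of the change tracker
        self.database: list[dict] = []  # rows that were successfully saved

    def add(self, entity: dict) -> None:
        self.pending.append(entity)

    def save_changes(self) -> None:
        # Everything tracked is written together; one bad entity fails the batch,
        # and a failed save does NOT discard the pending changes.
        if any(e.get("invalid") for e in self.pending):
            raise RuntimeError("save failed: invalid entity still tracked")
        self.database.extend(self.pending)
        self.pending.clear()


def handle_request(ctx: FakeDbContext, attempt: int) -> None:
    """Hypothetical handler: attempt 1 leaves a half-built entity in the
    context and is then interrupted by a transient error before saving."""
    if attempt == 1:
        ctx.add({"id": 1, "invalid": True})
        raise RuntimeError("transient: chosen as deadlock victim")
    ctx.add({"id": 2})
    ctx.save_changes()


def retry_with_shared_scope(attempts: int) -> tuple[FakeDbContext, list[str]]:
    ctx = FakeDbContext()  # ONE scoped instance reused by every attempt (the bug)
    errors: list[str] = []
    for attempt in range(1, attempts + 1):
        try:
            handle_request(ctx, attempt)
            return ctx, errors
        except RuntimeError as exc:
            errors.append(str(exc))
    return ctx, errors


ctx, errors = retry_with_shared_scope(attempts=3)
for message in errors:
    print(message)
# transient: chosen as deadlock victim
# save failed: invalid entity still tracked
# save failed: invalid entity still tracked
print(ctx.database)  # []
```

Every retry re-adds its own data on top of the leftover broken entity, so no number of attempts can succeed.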

Conclusion

When you implement any retry mechanism, keep in mind that every instance with a scoped lifetime in the IoC container must be reset to its initial state. If you implement the retry policy in ASP.NET Core middleware, dispose of the old scope and its instances and create a new one for each attempt; it is the safest way to execute a retry. Remember that IoC containers let you create a scope manually via IServiceScope (https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.dependencyinjection.iservicescope?view=dotnet-plat-ext-7.0). Following this rule can save you from many errors in the future, especially when a new team member is not aware of the retry limitations caused by objects' different lifetimes.
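In ASP.NET Core, a new scope comes from `IServiceScopeFactory.CreateScope()` (or the `CreateScope()` extension on `IServiceProvider`), which returns a disposable `IServiceScope`. The Python model below only sketches the scope-per-attempt pattern under that assumption: `create_scope`, `FakeDbContext`, `handle_request`, and `retry_with_fresh_scope` are hypothetical, and a shared list stands in for the database, which outlives every scope.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class FakeDbContext:
    """Hypothetical stand-in for an EF Core DbContext (not the real API)."""

    def __init__(self, database: list[dict]) -> None:
        self.pending: list[dict] = []  # per-scope "change tracker"
        self.database = database       # shared store that outlives any scope

    def add(self, entity: dict) -> None:
        self.pending.append(entity)

    def save_changes(self) -> None:
        if any(e.get("invalid") for e in self.pending):
            raise RuntimeError("save failed: invalid entity still tracked")
        self.database.extend(self.pending)
        self.pending.clear()


@contextmanager
def create_scope(database: list[dict]) -> Iterator[FakeDbContext]:
    """Rough analogue of CreateScope(): every call yields a brand-new scoped
    instance, which is discarded when the with-block exits."""
    ctx = FakeDbContext(database)
    try:
        yield ctx
    finally:
        ctx.pending.clear()  # "Dispose": no tracked state leaks to the next attempt


def handle_request(ctx: FakeDbContext, attempt: int) -> None:
    """Hypothetical handler: attempt 1 leaves a broken entity, then fails transiently."""
    if attempt == 1:
        ctx.add({"id": 1, "invalid": True})
        raise RuntimeError("transient: chosen as deadlock victim")
    ctx.add({"id": 2})
    ctx.save_changes()


def retry_with_fresh_scope(
    database: list[dict],
    handler: Callable[[FakeDbContext, int], None],
    attempts: int,
) -> list[str]:
    errors: list[str] = []
    for attempt in range(1, attempts + 1):
        with create_scope(database) as ctx:  # new scope for EVERY attempt
            try:
                handler(ctx, attempt)
                return errors
            except RuntimeError as exc:
                errors.append(str(exc))
    return errors


database: list[dict] = []
errors = retry_with_fresh_scope(database, handle_request, attempts=3)
print(errors)    # ['transient: chosen as deadlock victim']
print(database)  # [{'id': 2}]
```

Because the broken entity left by the first attempt is discarded together with its scope, the second attempt saves only its own data.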
