Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take into account failSafeDefaultValue for factory timeout #67

Merged
merged 1 commit into from
Jul 10, 2022
Merged

Conversation

alexmaek
Copy link
Contributor

@alexmaek alexmaek commented Jul 6, 2022

Hi Jody!

We started using FusionCache at my company and encountered a strange case when dealing with factory tiemouts and failsafe values.

What happens is that when doing a GetOrSetAsync call specifying a factory and its soft timeout and a failsafe value, I expected the factory to terminate after the soft timeout expires and for the call to return the default failsafe value, but it keeps running until finished. This prevents being able to run a fail-fast scenario in first time cache calls.

I've prepared a PoC based on the scenarios on ZiggyCreatures.FusionCache.Playground, this is the simplest case I could came up with, no distributed cache, only in memory and no other options:

using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;
using Serilog;
using Serilog.Events;

namespace ZiggyCreatures.Caching.Fusion.Playground.Scenarios
{
	public static class SoftTimeoutTestScenario
	{
		private static readonly bool UseLogger = true;

		private static void SetupSerilogLogger(IServiceCollection services, LogEventLevel minLevel = LogEventLevel.Verbose)
		{
			Log.Logger = new LoggerConfiguration()
				.MinimumLevel.Is(minLevel)
				.Enrich.FromLogContext()
				.WriteTo.Console()
				.CreateLogger()
			;

			services.AddLogging(configure => configure.AddSerilog());
		}

		private static void SetupStandardLogger(IServiceCollection services, LogLevel minLevel = LogLevel.Trace)
		{
			services.AddLogging(configure => configure.SetMinimumLevel(minLevel).AddConsole(options => options.IncludeScopes = true));
		}

		public static async Task RunAsync()
		{
			Console.OutputEncoding = Encoding.UTF8;

			// DI
			var services = new ServiceCollection();

			SetupSerilogLogger(services);

			var serviceProvider = services.BuildServiceProvider();

			var logger = UseLogger ? serviceProvider.GetService<ILogger<FusionCache>>() : null;

			// CACHE OPTIONS
			var options = new FusionCacheOptions();

			using (var fusionCache = new FusionCache(options, logger: logger))
			{
				var sw = Stopwatch.StartNew();

				var myValue = await fusionCache.GetOrSetAsync(
					"thekey",
					async (ctx, ct) =>
					{
						await Task.Delay(TimeSpan.FromMilliseconds(5000));

						return "thevalue";
					},
					failSafeDefaultValue: null,
					options => options
						.SetFailSafe(true)
						.SetFactoryTimeouts(TimeSpan.FromMilliseconds(100))
						.SetDuration(TimeSpan.FromMinutes(1))
					);

				sw.Stop();
				Console.WriteLine($"Fusiocache returned an object with value {myValue} in {sw.ElapsedMilliseconds} ms");

				Console.WriteLine("\n\nTHE END");
			}
		}
	}
}

When running this code you'll get a message saying

Fusiocache returned an object with value thevalue in 5117 ms

or some similar time value, even though .SetFactoryTimeouts(TimeSpan.FromMilliseconds(100)) was specified.

Diving into the library's code if narrowed down the tiemout selection to this call: var timeout = options.GetAppropriateFactoryTimeout(memoryEntry is object || distributedEntry is object);. I understand that the factory soft timeout is only set up when there's a "normal" failsafe value, one that was returned correctly in a previous call. But it doesn't take into account a possible failSafeDefaultValue.

In this pull request proposal I've added the check to take into account the failSafeDefaultValue, but would like to get your input on this in case I'm missing something.

@jodydonetti
Copy link
Collaborator

Hi @alexmaek and thanks for using FusionCache!

Sorry for the delay in answering but covid finally got me so, yeah 🤷

Anyway, let me tell you: honestly, good catch!

When laid out like you did it makes total sense, and it feels so strange not to have thought about it before but yeah, I agree that failSafeDefaultValue should be taken into account when reasoning about fallback values and timeouts.
I think that is because the failSafeDefaultValue has been an addition I made later on because of a community request, so I missed connecting the dots immediately.

Let me reason about this a little bit more, to see if that can also impact somewhere else in the code, and I will get back to you asap.

Thanks!

@jodydonetti jodydonetti added bug Something isn't working enhancement New feature or request and removed bug Something isn't working labels Jul 10, 2022
@jodydonetti
Copy link
Collaborator

Ok, I took some time and everything seems to make sense!

I'm merging this and will release a new version soon along with some other minor things I'm finishing.

Good catch and thanks again!

@jodydonetti jodydonetti merged commit 5ebd71a into ZiggyCreatures:main Jul 10, 2022
@alexmaek
Copy link
Contributor Author

That's great, thanks to you for building this great library! I'll keep my eyes peeled for the next release.

@jodydonetti
Copy link
Collaborator

Hi @alexmaek , I just release v0.11.0 which includes the fix for this, and your very first contribution 🎉

Hope this helps.

@jodydonetti jodydonetti self-assigned this Jul 13, 2022
@jodydonetti
Copy link
Collaborator

jodydonetti commented Jul 13, 2022

Hi @alexmaek , now that I've released the new version with the fix I happened to stumble upon a certain case which I would like to discuss with you.

My scenario

In one of the apps where I'm using FusionCache I've found a GetOrSet call where I also specified a failSafeDefaultValue , which in that particular case was null. I did that simply as a way to say "hey, if nothing else works, just give me back null instead of throwing an exception" which I think makes sense, right?

The thing that surprised me was that in places where usually I got back some data in the first GetOrSet call I instead started getting back null: that was because I am using a soft timeout and previously null was only used as a "last chance" value (meaning the db call failed) to avoid throwing an exception, but now it's instead considered in the same way as a stale value, which (at least in my case) it is not.

So, I would like the failSafeDefaultValue to be used only as a last resort instead of throwing an exception, and not as something that would activate the soft timeout, which I would still prefer to be used only if there's a real stale data.

Next steps

Now, before changing it back immediately to the previous behaviour or start adding new options to configure how the failSafeDefaultValue should be used, I wanted to come back to you for some reasoning together, since I can see a reason for both expectations.

What do you think?

@alexmaek
Copy link
Contributor Author

Hi @jodydonetti I intended to take a look at this with a work colleague during the week, but we've been having quite a rough one.

We'll try to get to it the following one, but don't hesitate to take the decision you think best if you need to fix this quickly.

@jodydonetti
Copy link
Collaborator

Hi @alexmaek , thanks for the update.

I agree with you and so, to avoid unwanted surprises for existing users, I've quickly released v0.11.1 that (temporarily?) rollback the change introduced in v0.11.0.

When you and your colleague will have some time I'd like to discuss this a bit more, since as I've said the change was absolutely reasonable, but I think there should be a way to better control it or anyway a better heuristic for when to use it.

Take your time and thanks again.

@alexmaek
Copy link
Contributor Author

Hi @jodydonetti I've talked with my colleagues but couldn't find a catchall solution. We'll keep investigating alternative ways to make this work.

For now we've devised the following: We're using the soft timeout in order to end the cache query quickly, but continue executing the factory in background. In order to emulate this behavior, we could use a hard timeout to interrupt the cache query (and the factory call) and afterwards relaunch the factory manually in background, saving its result to cache once its finished.

It's not the greatest solution: the factory is called twice and interrupted once and some of these changes may negatively impact code readability. But for now it's all we could come up without proposing any deep internal changes to FusionCache.

As I said, we'll keep thinking it over, let's hope we come up with some other ideas. I'll get back to this thread in a few days.

@jodydonetti
Copy link
Collaborator

Hi @alexmaek , one quick thing I haven't thought of before, and it's very simple: why not just set both the soft + hard timeouts (to the same value) and provide a failSafeDefaultValue?

I mean, let's say you set both of them to 100 ms, ok? That would just return after 100 ms no matter what, and in case the factory has not yet finished running the failSafeDefaultValue will be returned, while keeping the factory running in the background (which will also update the cache automatically as soon as it will finish).

Wouldn't this cover every use case?

In order to emulate this behavior, we could use a hard timeout to interrupt the cache query (and the factory call) and afterwards relaunch the factory manually in background, saving its result to cache once its finished.

You should not need to do this manually, it should already work like this in the way I described above: just set both soft and hard timeouts and specify a failSafeDefailtValue.

Please give it a try and let me know.

@jodydonetti
Copy link
Collaborator

jodydonetti commented Jul 19, 2022

To be more explicit: it would be like saying "hey FusionCache, whether you have a stale value to use as a fallback or not, stop waiting after 100 ms. If you succeed running the factory then save the result and give it to me, and in case you stopped waiting after 100 ms, save the failSafeDefaultValue and give me that, and keep the factory running in the background and if it finish successfully update the cache."

Does this make sense?

@alexmaek
Copy link
Contributor Author

Well, that's exactly what we need. I've wrongly assumed that the hard timeout would interrupt the factory.

I've tested it and works like you described. I'll share the news with my colleague, but we can probably solve our use case this way.

If that's the case then I can only offer my apologies for this whole confusion 😅

@jodydonetti
Copy link
Collaborator

Well, that's exactly what we need. I've wrongly assumed that the hard timeout would interrupt the factory.

Nope, both would keep the factory running in the background or not, based on the AllowTimedOutFactoryBackgroundCompletion option, which is true by default (so you don't have to do anything).

Now that we are talking about this I'm thinking about making it more clear in the docs where it says:

In both cases it is possible to set the bool flag AllowTimedOutFactoryBackgroundCompletion: it is enabled by default, so you don't have to do anything, and it lets the timed-out factory keep running in the background and update the cached value as soon as it finishes. This will give you the best of both worlds: a fast response and fresh data as soon as possible.

Maybe I can add a more explicit note or an example, will think about that.

I've tested it and works like you described. I'll share the news with my colleague, but we can probably solve our use case this way.

Awesome! In case that is not the case let me know and we'll work something out.

If that's the case then I can only offer my apologies for this whole confusion 😅

Ahah no worries, and actually I'm sorry it took me this long to get your scenario: I'm the one who created FusionCache, so I should've suggested this solution way before.
Let's say I was extra tired because of Covid. Yeah, let's say that 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants