Advent of Haskell 2020 Day 11: The `retry` package.

Posted on December 11, 2020 by wjwh


The pure computations we all know and love have the attractive property that they will always return the same value when called with the same arguments. This allows the compiler to rearrange or memoize such functions, because the result will not change. Of course, not all computations are pure. In Haskell impure functions are typically represented with monads, and monadic computations can have internal state that affects the result of a function, access other machines over the internet, or vary their result based on the current time.

Especially in the context of distributed systems it is common that a call between systems might fail. There are, however, different types of failure: some errors are ‘fatal’ while others might be remedied if the request is tried again. For example, trying to fetch a file that has been deleted will never succeed. On the other hand, if a server is temporarily overloaded then backing off for a while might help and a second request might succeed. In this blog post we will have a look at the retry package, which implements some higher order functions that allow for retrying arbitrary monadic actions with various retry policies.

Various ways of retrying

Back in 2015 the folks over at AWS posted a famous article investigating different ways of retrying and their impact on total completion time. The most basic way of retrying actions is to just wait a predetermined time between consecutive tries, repeating until the action succeeds. Usually retrying indefinitely is not desirable, so you might want to abort the call and just let it fail after some predetermined number of retries or amount of time. This strategy does have a significant drawback: when a lot of systems all keep retrying calls to the same remote system, this can put a lot of extra load on that system, which is especially unwelcome if it is already overloaded. A common improvement is to slowly increase the time between consecutive retries; exponential increases are often used, simply doubling the delay after each attempt.

To prevent the thundering herd problem of many servers firing their retries at nearly the same moment, it is advisable to add a little randomness to each delay in addition to doubling it every time. This way of retrying, known as exponential jittered backoff, has since gained a lot of popularity and has proven to be an excellent default for structuring retry behavior in many situations.
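As a rough illustration of the “full jitter” variant described in that article (a hand-rolled sketch, not code from any library; jitteredDelay is a made-up helper), attempt n simply gets a random delay between zero and base * 2^n microseconds:

import System.Random (randomRIO)

-- Pick a "full jitter" delay for the given retry attempt: a random number
-- of microseconds between 0 and base * 2^attempt.
jitteredDelay :: Int -> Int -> IO Int
jitteredDelay baseMicros attempt = randomRIO (0, baseMicros * 2 ^ attempt)

In practice there is little reason to hand-roll this, which brings us to the retry package.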

The retry package

Implementing retry behavior is not particularly tricky, but there are still a few gotchas, and in any case it is not desirable to spend time reimplementing code that someone has already written. Luckily, the retry package has our backs. It exposes the Control.Retry module with various functionality for defining retry policies and retrying monadic actions based on custom conditions. The main function is retrying, which takes a RetryPolicyM, a function to decide whether to retry, and the action that may need retrying. This is a mouthful, so let’s look at an example:

import Control.Retry
import Data.Time.Clock (getCurrentTime, utctDayTime)

-- This is the function that will be retried.
-- It prints the number of seconds since midnight
-- to STDOUT, then returns True.
f :: RetryStatus -> IO Bool
f _ = do
  now <- utctDayTime <$> getCurrentTime
  putStrLn $ "Current time: " <> show now
  return True

-- The retry policy, not applying random jitter yet.
-- The 200000 is the initial delay in microseconds
exponentialPolicy :: RetryPolicy
exponentialPolicy = exponentialBackoff 200000 <> limitRetries 5

-- The function to decide whether to retry or not.
-- It receives the current status of the retries (how
-- often we have retried already and for how long) plus
-- the result of the last attempt, and returns whether
-- we should retry.
retryDecider :: Monad m => RetryStatus -> Bool -> m Bool
retryDecider _ x = return x

-- An example use:
main :: IO ()
main = do
  result <- retrying exponentialPolicy retryDecider f
  putStrLn $ show result

When run, this will result in something like the following:

*Main> main
Current time: 69410.008554412s
Current time: 69410.209382466s
Current time: 69410.610397411s
Current time: 69411.411808059s
Current time: 69413.013661382s
Current time: 69416.217071042s
True

We can see that it will start at a delay of 200ms and double the time waited after every iteration. After five retries it has exhausted the limitRetries 5 part of the policy, so the result is returned even though our retryDecider function would still indicate it needs to be retried. We can very easily add jitter to our delays by swapping exponentialBackoff for fullJitterBackoff, as sketched below. The documentation has a lot more combinators available to fine-tune the policy to just the way you want it.
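For example, a jittered variant of the policy above could look like the following sketch (one caveat to note: since fullJitterBackoff draws its randomness from IO, the policy is typed as RetryPolicyM IO rather than the plain RetryPolicy alias):

-- Same base delay and retry limit as before, but with full jitter applied.
jitteredPolicy :: RetryPolicyM IO
jitteredPolicy = fullJitterBackoff 200000 <> limitRetries 5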

Retrying based on context

The RetryPolicyM values we’ve seen so far are quite static; they are defined up front and don’t change during execution. However, in some cases it would be beneficial to be more dynamic about it, for example when an HTTP request returns a 429 (Too Many Requests) status code. Such a response may contain a “Retry-After” header, which specifies the number of seconds the client should wait before performing the next request. For cases like this the retryingDynamic function allows overriding the original policy and substituting a delay based on the result of the monadic action supplied.
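As a sketch of how this could look (hypothetical names, not from the original post: we assume the action returns Either Int a, with Left carrying the parsed Retry-After value in seconds):

-- Decide what to do based on the response: if the server told us how long
-- to wait, override the policy's delay (in microseconds); if the request
-- succeeded, stop retrying.
decideFromResponse :: Monad m => RetryStatus -> Either Int a -> m RetryAction
decideFromResponse _ (Left seconds) = return $ ConsultPolicyOverrideDelay (seconds * 1000000)
decideFromResponse _ (Right _)      = return DontRetry

-- Run a request, using the exponential policy from earlier as the base policy.
fetchWithRetryAfter :: (RetryStatus -> IO (Either Int a)) -> IO (Either Int a)
fetchWithRetryAfter request = retryingDynamic exponentialPolicy decideFromResponse request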

It is also worth mentioning that a RetryPolicyM is, as the name implies, parameterized over a monad: a policy is essentially a function from the current RetryStatus to a monadic computation that produces the next delay (or signals that we should give up). It technically works with any monad, but retrying and its variants restrict it to monads with a MonadIO instance. As such, you can base your retry timings on anything that you can get out of an IO action. You could have multiple threads coordinating their retries via a shared IORef, or even multiple machines coordinating through Redis or a database. The sky is the limit here.
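As a sketch of what that could look like (assuming, as the package documentation describes, that a RetryPolicyM IO wraps a function from the current RetryStatus to IO (Maybe Int), with the delay in microseconds and Nothing meaning “give up”; sharedDelayPolicy is a made-up name):

import Data.IORef (IORef, readIORef)

-- A policy that looks up its delay in a shared IORef on every attempt,
-- so other threads can tune the backoff while retries are in flight.
sharedDelayPolicy :: IORef Int -> RetryPolicyM IO
sharedDelayPolicy ref = RetryPolicyM $ \_status -> do
  delay <- readIORef ref
  return (Just delay)

Such a policy can still be combined with the stock combinators, for example sharedDelayPolicy ref <> limitRetries 5.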

Finally, if the action you are retrying does not just communicate errors through its return values but may throw exceptions too, the recovering function basically does what retrying does but allows for catching exceptions as well.
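A sketch of how that could look (hypothetical helper names; the Handler type used by recovering comes from Control.Monad.Catch):

import Control.Exception (IOException)
import Control.Monad.Catch (Handler (..))

-- Retry when an IOException is thrown, give up immediately on anything else.
ioExceptionHandler :: RetryStatus -> Handler IO Bool
ioExceptionHandler _ = Handler shouldRetry
  where
    shouldRetry :: IOException -> IO Bool
    shouldRetry _ = return True

-- Wrap an arbitrary IO action with the exponential policy from before.
retryOnIOException :: IO a -> IO a
retryOnIOException action = recovering exponentialPolicy [ioExceptionHandler] (const action)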

Conclusion

If your Haskell program accesses other systems via a network, chances are that retries are already something you need. The retry package offers a nice set of functions and combinators to ‘skip the boilerplate’ when writing retry logic and lets you focus on the original code.