Advent of Haskell 2020 Day 11: The `retry` package.
The pure computations we all know and love have the attractive property that they always return the same value when called with the same arguments. This allows the compiler to rearrange or memoize such functions, because the result will not change. Of course, not all computations are pure. In Haskell, impure functions are typically represented with monads, and monadic computations can have internal state that affects the result of a function, access other machines over the internet, or vary their result based on the current time.
Especially in the context of distributed systems it is common for a call between systems to fail. There are, however, different types of failure: some errors are ‘fatal’, while others might be remedied if the request is tried again. For example, trying to fetch a file that has been deleted will never succeed. On the other hand, if a server is temporarily overloaded, then backing off for a while might help and a second request might succeed. In this blog post we will have a look at the `retry` package, which implements some higher-order functions that allow for retrying arbitrary monadic actions with various retry policies.
Various ways of retrying
Back in 2015 the folks over at AWS posted a famous article investigating different ways of retrying and their impact on the total completion time. The most basic way of retrying actions would be to just wait a predetermined time between consecutive tries, repeating indefinitely until the action succeeds. Usually retrying for an indefinite amount of time is not desirable, so you might want to abort the call and just let it fail after some predetermined number of retries or amount of time. This strategy does have a significant drawback: when a lot of systems all keep trying to call a remote system, this puts a lot of load on the target system, which is especially unwelcome if it is already overloaded. A common improvement is to gradually increase the time between consecutive retries. Exponential increases are often used, by simply doubling the time between retries on each iteration.
To prevent the thundering herd problem of many servers scheduling their retries very close together, it is advisable to add a little bit of randomness to the retry delays on top of doubling them each time. This way of retrying, known as exponential jittered backoff, has since gained a lot of popularity and has proven to be one of the best ways of structuring retry behavior in many situations.
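To make that concrete, below is a minimal sketch of the ‘full jitter’ scheme from the AWS article, written independently of the retry package; the function name and the cap parameter are illustrative choices of my own, not something the package prescribes.

import System.Random (randomRIO)

-- A sketch of full-jitter backoff: the delay before attempt n is drawn
-- uniformly from the interval [0, base * 2^n], capped at some maximum.
-- All values are in microseconds; the names are made up for this example.
fullJitterDelay :: Int -> Int -> Int -> IO Int
fullJitterDelay base cap attempt =
  randomRIO (0, min cap (base * 2 ^ attempt))

With a base delay of 200 ms, for example, the wait before the fourth attempt (attempt number 3, counting from zero) would land somewhere between 0 and 1.6 seconds.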
The retry package
Implementing retry behavior is not particularly tricky, but there are still a few gotchas, and in any case it is not desirable to spend time remaking code that someone has already written. Luckily, the `retry` package has our backs. It exposes the `Control.Retry` module with various functionality for defining retry policies and retrying monadic actions based on custom conditions. The main function is `retrying`, which takes a `RetryPolicyM`, a function to determine whether to retry or not, and the action that may need retrying. This is a mouthful, so let's walk through it piece by piece.
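For reference, the type of `retrying` looks roughly like this (the exact constraints may differ slightly between versions of the package):

retrying :: MonadIO m
         => RetryPolicyM m                 -- when and how long to wait between tries
         -> (RetryStatus -> b -> m Bool)   -- given the last result, should we retry?
         -> (RetryStatus -> m b)           -- the action that may need retrying
         -> m b

The example below fills in each of these three arguments: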
-- Imports needed for the examples in this post:
-- Control.Retry comes from the retry package and
-- Data.Time from the time package.
import Control.Retry
import Data.Time (getCurrentTime, utctDayTime)

-- This is the function that will be retried.
-- It prints the number of seconds since midnight
-- to STDOUT, then returns True.
f :: RetryStatus -> IO Bool
f _ = do
  now <- utctDayTime <$> getCurrentTime
  putStrLn $ "Current time: " <> show now
  return True
-- The retry policy, not applying random jitter yet.
-- The 200000 is the initial delay in microseconds.
exponentialPolicy :: RetryPolicy
exponentialPolicy = exponentialBackoff 200000 <> limitRetries 5
-- The function to decide whether to retry or not.
-- It takes the current status of the retries (how often
-- we have retried already and for how long) as well as the
-- result of the action, and returns whether we should retry.
retryDecider :: Monad m => RetryStatus -> Bool -> m Bool
retryDecider _ x = return x
-- An example use:
main :: IO ()
main = do
  result <- retrying exponentialPolicy retryDecider f
  putStrLn $ show result
When run, this will result in something like the following:
*Main> main
Current time: 69410.008554412s
Current time: 69410.209382466s
Current time: 69410.610397411s
Current time: 69411.411808059s
Current time: 69413.013661382s
Current time: 69416.217071042s
True
We can see that it starts with a delay of 200 ms and doubles the time waited after every iteration. After five retries it has exhausted the `limitRetries 5` part of the policy, so the result is returned even though our `retryDecider` function would still indicate it needs to be retried. We can very easily add jitter to our delays by replacing `exponentialBackoff` with `fullJitterBackoff`; no other changes are required. The documentation has a lot more combinators available to fine-tune the policy to just the way you want it.
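As a sketch of that swap (the name `jitteredPolicy` is mine, and I am assuming the version of the package where `fullJitterBackoff` needs `IO` for its source of randomness, which also means the signature becomes `RetryPolicyM IO` rather than `RetryPolicy`):

-- The same policy as before, but with full jitter applied to the
-- exponentially growing delays.
jitteredPolicy :: RetryPolicyM IO
jitteredPolicy = fullJitterBackoff 200000 <> limitRetries 5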
Retrying based on context
The `RetryPolicyM` values we've seen so far are quite static; they are defined up front and don't change during execution. However, in some cases it would be beneficial to be more dynamic about it, for example if an HTTP request returns a 429 (Too Many Requests) status code. Such a response may contain a “Retry-After” HTTP header which specifies the number of seconds the client should wait before performing the next request. For cases like this, the `retryingDynamic` function allows overriding the original policy and substituting a delay based on the result of the monadic action supplied.
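A decider for `retryingDynamic` might look like the sketch below. The `Response` type and its fields are hypothetical stand-ins for whatever your HTTP client returns; only `RetryAction` and its constructors come from `Control.Retry`, and I am assuming a reasonably recent version of the package that exports them.

import Control.Retry (RetryAction (..), RetryStatus)

-- A hypothetical, minimal response type just for this sketch; a real
-- program would use its HTTP library's response type instead.
data Response = Response
  { statusCode        :: Int
  , retryAfterSeconds :: Maybe Int  -- parsed from the Retry-After header
  }

-- Honour Retry-After when present, fall back to the normal policy for
-- 429s and server errors, and give up on anything else.
dynamicDecider :: Monad m => RetryStatus -> Response -> m RetryAction
dynamicDecider _ resp =
  pure $ case retryAfterSeconds resp of
    Just secs -> ConsultPolicyOverrideDelay (secs * 1000000)  -- microseconds
    Nothing
      | statusCode resp == 429 || statusCode resp >= 500 -> ConsultPolicy
      | otherwise                                        -> DontRetry

Such a decider takes the place that `retryDecider` occupied in the earlier example.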
It is also worth mentioning that `RetryPolicyM`, as the name implies, is monadic: the delay computation itself runs in a monad. Technically it works with any monad, but `retrying` and its variants restrict it to monads that are an instance of `MonadIO`. As such, you can base your retry timings on anything that you can get out of an `IO` action. You could have multiple threads coordinating their retries via a shared `IORef`, or even multiple machines communicating through Redis or a database. The sky is the limit here.
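As an illustration, here is a sketch of a policy that reads its delay from a shared `IORef`. The name `sharedDelayPolicy` is mine, and it assumes the `RetryPolicyM` constructor is exported, wrapping a function from the current `RetryStatus` to a delay in microseconds (with `Nothing` meaning ‘stop retrying’).

import Control.Retry (RetryPolicyM (..))
import Data.IORef (IORef, readIORef)

-- A policy whose delay comes from a shared IORef, so other threads
-- (or some external control loop) can tune it while the program runs.
sharedDelayPolicy :: IORef Int -> RetryPolicyM IO
sharedDelayPolicy ref = RetryPolicyM $ \_status -> do
  delay <- readIORef ref
  pure (Just delay)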
Finally, if the action you are retrying does not just communicate errors through its return values but may throw exceptions too, the `recovering` function basically does what `retrying` does but allows for catching exceptions as well.
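A rough sketch of how that might look is given below. The handler list, the policy, and `fetchFromNetwork` are all illustrative, and I am assuming that `recovering` takes its handlers as `Handler` values from the exceptions package.

import Control.Exception (IOException)
import Control.Monad.Catch (Handler (..))
import Control.Retry

-- A stand-in for a real network call; reading a file that may not
-- exist is enough to produce an IOException for demonstration purposes.
fetchFromNetwork :: IO String
fetchFromNetwork = readFile "/some/flaky/resource"

-- Retry on any IOException; a real program would inspect the exception
-- and only retry the errors that are actually transient.
ioExceptionHandler :: RetryStatus -> Handler IO Bool
ioExceptionHandler _ = Handler $ \e -> pure (shouldRetry e)
  where
    shouldRetry :: IOException -> Bool
    shouldRetry _ = True

-- Keep trying fetchFromNetwork with the same exponential policy as
-- before, retrying whenever the handler above says so.
fetchWithRetries :: IO String
fetchWithRetries =
  recovering
    (exponentialBackoff 200000 <> limitRetries 5)
    [ioExceptionHandler]
    (\_status -> fetchFromNetwork)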
Conclusion
If your Haskell program accesses other systems via a network, chances are that retries are already something you need. The `retry` package offers a nice set of functions and combinators to ‘skip the boilerplate’ when writing retry logic and lets you focus on the original code.