Some tips and tricks for doing Advent of Code with Haskell

Posted on November 30, 2022 by wjwh

As I write this, it is almost time for the yearly Advent of Code again. If you have never done one of those, it is a yearly event where the organizers publish a new programming puzzle every day from the 1st of December until the 25th, with each day being slightly harder than the one before. There is a leaderboard for those who want to try and solve the puzzles as fast as possible, but most people don’t bother. Some use the puzzles to practice a new programming language, to make the most optimized version of the code they can, to make the code “beautiful” in some special way, or even (especially!) just for fun. It is an entirely voluntary activity, so you can do it in any way you want. I usually solve these puzzles in Haskell, as it is the language I enjoy most.

Since every year there seems to be a sizable contingent trying to learn Haskell, I thought it would be nice to share some tips and tricks that I have picked up over the years. This blog post is aimed at intermediate Haskellers; it assumes you understand the basic syntax and have a grasp of what monads and monoids are, but it doesn’t go into super deep type level stuff. It is also not meant to be a style guide for production-grade Haskell code, I merely show off some tricks that have helped me write clearer and more concise code for situations that seem to show up often in Advent of Code problems.

Reducing boilerplate with `<$>`

Many times, Advent of Code puzzles will involve reading some input from a file and then doing something for each line. A very simple example would be to read a file with an integer on each line and return their sum. Reading a file into a String is straightforward with the readFile function, which takes a FilePath and returns the contents of the file. You can then use lines to split the string into separate lines, convert the String to an Integer with read and finally sum them all together. A basic first try would be:

day1part1 = do
  contents <- readFile "input_day1.txt"
  let fileLines = lines contents
  -- Type annotation is needed to let GHC know we want to convert
  -- to a list of Integer and not (say) a list of Floats
  let integers = map read fileLines :: [Integer]
  return $ sum integers

This is probably close to how you would write this in an imperative language and it works fine. However, it is much longer than necessary and uses several intermediate variables that are not really required. One way to shorten this function would be to compose several of the pure functions into one by composing them with the . operator. We can also apply a function to the result of an IO operation with <$>.

day1part1 = do
  integers <- (map read) . lines <$> readFile "input_day1.txt" :: IO [Integer]
  return $ sum integers

This works because <$> works for all functors, readFile is an operation in the IO monad, and all monads are functors. Now it is much clearer what the solution is: just the sum of the lines of the file, after converting them to Integers. This is usually short enough for me, but if you wanted to make this a oneliner you could of course also compose the sum application:

day1part1 = sum . (map read) . lines <$> readFile "input_day1.txt" :: IO Integer

Do be careful though. It is easy to make very long function pipelines this way and they can make your code very difficult to comprehend.

Generating coordinates

Many puzzles in Advent of Code involve doing some operation on a grid. This can be a 2D grid, a 3D grid or even a grid with more dimensions than we can easily visualize. Almost always, you need to compute something for all coordinates on the grid. In imperative languages this would usually take the form of nested for-loops, but in Haskell these can be unwieldy. A more functional way of doing this is to define all the coordinates upfront and then map over them. The simplest way to generate these coordinates is with list comprehensions and with Applicatives.

List comprehensions follow a math-like notation and work a follows:

ghci> [ (x,y) | x <- [1..3], y <- [1..3]]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

Another very straightforward way is with Applicatives:

ghci> (,) <$> [1..3] <*> [1..3]
[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

This works by combining the applicative instance for lists and the fact that the tuple constructor (,) is a normal function taking two values. In the above example, [1..3] <*> [1..3] will generate a list of all combinations from both lists, then <$> will feed them to the tuple constructor to generate a list of tuples.

I personally slightly prefer the applicative version, as it is slightly shorter. All that typing does add up over 25 days!

Using laziness to your advantage

A very common pattern in Advent of Code puzzles is to have some state and a step function of type State -> State. The question will then usually be something like “after how many iterations will the state have some property?”. Conway’s Game of Life is a great example of this type of question, see for example … and … . A very neat pattern for solving this kind of question is to use the iterate function. It will take a starting state with some type a and a function from a -> a, and (lazily) generate an infinite list by repeatedly applying the function to the latest state:

ghci> take 5 $ iterate (*2) 1
[1,2,4,8,16]

Using iterate step startState, you can generate a list of all states that the system will reach. Then, the takeWhile and dropWhile functions from Data.List can be used to answer questions about the list of states. For example length . takeWhile verifySomeConditionIsNotSatisfiedYet $ iterate step startState finds how many iterations it takes until the condition is satisfied (for some function verifySomeConditionIsNotSatisfiedYet :: State -> Bool of course), while head . dropWhile verifySomeConditionIsNotSatisfiedYet $ iterate step startState will give you the first state for which the condition is True.

Parsing the input files with parsec

In Advent of Code problems, the input is often given in the form of a text file that will need some parsing before any useful computations can be done with the information. A good example is day 22 from 2021:

on x=-3..47,y=-33..16,z=-29..20
off x=33..46,y=-26..-13,z=-21..-3
on x=-36..16,y=-33..17,z=-7..42
off x=-20..-10,y=-25..-10,z=-11..6
<many more lines with the same format>

While this particular problem would lend itself well to being parsed with regular expressions, Advent of Code inevitably throws in a few problems each year (2021 day 16 for example) that cannot be solved with a regex. Also of course, if you encounter a problem and decide to use a regex you now have two problems. A more robust solution is to use one of the many parsing libraries available. My personal favorites are the ones based upon the idea of parser combinators, which let you build up a big parser from multiple smaller parsers. This allows for solutions that read very natural to humans, as they mirror the way a human would describe the structure of the text to be parsed. For the example above:

First the string “on” or “off”, designating the operation to be done;
A space;
The string "x=" followed by a range for x;
A comma;
The string “y=" followed by a range for y;
A comma;
The string "z=" followed by a range for z.

Where a range is defined as: 1. A signed integer designating the start of the range; 2. The string ".."; 3. Another signed integer designating the end of the range.

This translates almost directly to Haskell:

data Switch = On | Off deriving (Show,Eq)
data Instruction = Ins {
    switch :: Switch, 
    xRange :: (Int,Int),
    yRange :: (Int,Int),
    zRange :: (Int,Int)
  } deriving (Show,Eq)

switchOn :: Parser Switch
switchOn = try $ string "on" *> pure On
switchOff :: Parser Switch
switchOff = try $ string "off" *> pure Off

range :: Parser (Int,Int)
range = do
  min <- integer
  string ".."
  max <- integer
  return (min,max)

instruction :: Parser Instruction
instruction = do
  s <- switchOn <|> switchOff
  string " x="
  xr <- range
  string ",y="
  yr <- range
  string ",z="
  zr <- range
  return $ Ins s xr yr zr

Since parsing input files happens so often, I usually define some utility functions for applying a parser to a file and for applying a parser to each line in a file. Both are very straightforward:

-- Parse an entire file with the given Parser.
parseFile :: Parser a -> FilePath -> IO (Either ParseError a)
parseFile p fp = parse p fp <$> readFile fp

-- Parse each line of a file with the given Parser.
parseFileLines :: Parser a -> FilePath -> IO (Either ParseError [a])
parseFileLines p = parseFile p'
  where p' = p `sepBy` endOfLine >>= \ps -> eof >> return ps

If you want to know more about parser combinators, there are a lot of great posts available describing how they work and how to make custom parsers with them.

String vs Text vs ByteString

As a sidenote, Haskell has long had several different types to represent text. The String type is the oldest and is internally a linked list of Chars. This makes it easy to consume character by character or to pattern match on the first few characters, but is quite memory inefficient. Text is the type you want for storing UTF encoded text in a memory efficient manner. Finally, ByteString is also memory efficient but treats all its contents as just bytes or ASCII characters.

As a rule of thumb, you almost never want to use String. Human readable strings should almost always be Text and binary data should always be ByteString.

Miscellaneous tips and advice

This article is already getting quite long, so I’ll finish with a motley assortment of advice for solving:

For all but the most basic problems, take a moment to think about the best data structure to represent the problem. When your program needs to remember if you have seen an item before, lists and arrays don’t have the best credentials for inserting and lookup time but a (hash)set might be just what you want. The containers and unordered-containers packages offer a host of handy data structures.
The fgl package provides a plethora of useful graph algorithms.
If you are doing a depth first search and the sub-solutions overlap, try memoizing intermediate solutions to prevent having to search an exponentionally growing search space.
If you have a bug and just want to check the state of a variable at some point, Debug.Trace is a great way to quickly do some “printf debugging” even in pure functions where you usually wouldn’t be able to run IO functions.
Every so often, a question pops up that requires a hexagonal grid. The guide over at Red Blob Games has everything you would ever want to know about hexagonal grids and more.

Conclusion

In this post I wrote up a few tips and tricks to solve problems that come up often in Advent of Code problems. I hope you learned something interesting and even if you didn’t, I hope I managed to inspire you to take part in this years’ edition. Happy coding!