Streaming HTTP responses with Ruby and Rack

Posted on November 1, 2020 by wjwh


Almost every Ruby program that handles HTTP requests will use the Rack interface to do so. Rack is an interface describing what applications can expect an HTTP request to look like and how they should respond. Having a common interface for this allows any web framework (such as Rails or Sinatra) to interoperate with any of the available servers (like Unicorn, Puma, or Falcon). In this post we’ll look at a subset of HTTP responses, namely those where the response body is not completely known when the response starts or which are too large to completely fit into memory. These responses must therefore be generated dynamically and streamed to the client.

How Rack handles responses

All Rack responses are of the form [<response code>, <hash with headers>, <response body>], where the response body can be any object that yields zero or more Strings when each is called on it. The simplest example is an Array with one String in it, like ['Hello world!']. However, you can also write your own classes with an arbitrarily complex each method. This lets you implement more sophisticated behavior like ranged requests and streaming responses. For instance, the following class streams the numbers 1 to 10 at one-second intervals:

class SlowStreamer
  def each
    (1..10).each do |i|
      yield "#{i}\n"
      sleep 1
    end
  end
end

You could put a SlowStreamer.new into a Rack response and any Ruby server (e.g. Puma, Unicorn, Thin or Falcon) will stream it for you without problems. You don’t even need to add a Content-Length header, because these servers will automatically apply chunked encoding for you. This basic building block can be used for a lot of different applications.
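To make this concrete, here is a minimal sketch of a complete Rack application that returns such a body. The names CountingStreamer and STREAMING_APP, and the count and delay parameters, are made up for this illustration and do not come from any library. The header name is lowercase because Rack 3 requires that, and older Rack versions accept it too.

```ruby
# A variant of SlowStreamer with a configurable count and delay
# (both parameters are hypothetical, added for illustration).
class CountingStreamer
  def initialize(count: 10, delay: 1)
    @count = count
    @delay = delay
  end

  # The server calls #each with a block and writes every yielded
  # String to the client as soon as it arrives.
  def each
    return enum_for(:each) unless block_given?

    (1..@count).each do |i|
      yield "#{i}\n"
      sleep @delay
    end
  end
end

# A Rack app is any object that responds to #call(env) and returns
# [status, headers, body]. In a config.ru you would hand it to `run`.
STREAMING_APP = lambda do |_env|
  [200, { "content-type" => "text/plain" }, CountingStreamer.new(count: 3, delay: 0.1)]
end
```

Because the body is only required to respond to each, the server never needs to know how many chunks will follow or how long they take to produce.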

Zip streaming

A zip file consists of a number of sections, each made up of a header, the actual file contents and an optional footer. At the end of the file is another section called the “central directory”, which contains more metadata. All these values are just sequences of bytes, which means that they can be represented as Strings with the BINARY encoding. By yielding each of those values in turn, it’s possible to dynamically create and stream out an entire zip file in an HTTP response, and this is exactly what the ZipTricks gem provides:

body = ZipTricks::RackBody.new do |zip|
  zip.write_stored_file('mov.mp4') do |sink| # Those MPEG4 files do not compress that well
    File.open('mov.mp4', 'rb'){|source| IO.copy_stream(source, sink) }
  end
  zip.write_deflated_file('long-novel.txt') do |sink|
    File.open('novel.txt', 'rb'){|source| IO.copy_stream(source, sink) }
  end
end
[200, {}, body]

This snippet will take care of generating all the zip-specific bits you will need and yield them in turn. It is quite feasible to have a big list of files and just @files.each all of those files into the RackBody. No matter how large these files are, the amount of memory used for the response will remain small.

Ranged requests

HTTP requests can have a Range header attached to indicate that the client does not want the entire response but only a part of it. This is fairly easy to implement if the response is a short String or even a smallish file on disk, but if the response is composed of several (possibly many) long strings or files together, then things get a lot trickier. Luckily, there’s a gem for that too. The interval_response gem can, for example, be given a list of file paths to be composed together (when streaming log files, say) and will automatically compute which parts of which files are required. It will make sure that only one file at a time is opened and that only a constant amount of memory is required, no matter the size of the files.

lazy_files = log_paths.map { |path| IntervalResponse::LazyFile.new(path) }
interval_sequence = IntervalResponse::Sequence.new(*lazy_files)
response = IntervalResponse.new(interval_sequence, env)
response.to_rack_response_triplet

The trick is once again carefully choosing where and how to yield (or not, in this case) your Strings, so that in the end only the parts of the body that the user requested are actually computed.
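The bookkeeping behind “which parts of which file” can be sketched in plain Ruby. This is not how interval_response is implemented. It is only an illustration of the idea, and the Slice struct and slices_for_range helper are hypothetical names. It assumes the Range header has already been parsed into an inclusive first and last byte offset.

```ruby
# One slice of one segment: which segment, and which bytes within it.
Slice = Struct.new(:segment_index, :offset, :length)

# Given the byte sizes of the segments that make up the full body and an
# inclusive byte range (as in "Range: bytes=first-last"), return the
# slices that have to be read. Hypothetical helper, for illustration only.
def slices_for_range(segment_sizes, first_byte, last_byte)
  slices = []
  segment_start = 0 # offset of the current segment within the full body

  segment_sizes.each_with_index do |size, index|
    segment_end = segment_start + size - 1 # inclusive last byte of this segment

    # Only segments that overlap the requested range contribute a slice.
    if size > 0 && segment_end >= first_byte && segment_start <= last_byte
      from = [first_byte, segment_start].max
      to   = [last_byte, segment_end].min
      slices << Slice.new(index, from - segment_start, to - from + 1)
    end

    segment_start += size
  end

  slices
end
```

With three files of 10, 20 and 5 bytes, a request for bytes 5-14 needs the last 5 bytes of the first file and the first 5 bytes of the second. The third file never has to be opened at all.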

Rails and Sinatra streaming helpers

It’s fairly uncommon in the Ruby world to program directly against the Rack interface that web servers expose. Usually, we use a framework that implements a lot of the boilerplate code like routing, logging, etc. Both Rails and Sinatra have defined helpers to let you easily define streaming responses for your app.

Rails

In Rails, include ActionController::Live in any controller that needs to stream its responses, then provide a stream method that will actually write to the response:

def stream
  100.times do
    response.stream.write "hello world\n"
    sleep 1
  end
ensure
  response.stream.close
end

Sinatra

In Sinatra you can call stream in any route handler. It takes a block whose out parameter represents the output stream, and you append data to it with <<:

get '/my-endpoint' do
  stream do |out|
    100.times do |i|
      out << "hello world\n"
      sleep 1
    end
  end
end

Conclusion

Rack’s choice to allow anything that responds to each to function as a response body makes streaming very straightforward. It also allows us to create dynamic response bodies very easily. A small caveat is that you should be careful when creating a response body that will yield many small strings. Most servers will assume that you are doing this intentionally and will use the write() syscall on every string yielded, potentially causing a lot of OS overhead. In one example we were able to get a 4-fold increase in requests per second just by combining the smaller chunks into bigger ones before releasing them to the server.
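One way to combine small chunks is to wrap the original body in another body that buffers. The following is a rough sketch of that idea, not the code from the example mentioned above. The class name BufferedBody and the 64 KiB default threshold are assumptions made for illustration.

```ruby
# Wraps any Rack body and re-yields its output in chunks of at least
# `min_size` bytes. Class name and default size are hypothetical.
class BufferedBody
  def initialize(body, min_size: 64 * 1024)
    @body = body
    @min_size = min_size
  end

  def each
    return enum_for(:each) unless block_given?

    buffer = String.new # empty, mutable, binary-encoded buffer
    @body.each do |chunk|
      buffer << chunk.b # .b gives a binary copy, avoiding encoding clashes
      next if buffer.bytesize < @min_size

      yield buffer
      buffer = String.new
    end

    # Flush whatever is left once the wrapped body is exhausted.
    yield buffer unless buffer.empty?
  end

  # Servers call #close on the body if it responds to it, so pass it on.
  def close
    @body.close if @body.respond_to?(:close)
  end
end
```

Note the trade-off: buffering delays delivery until the threshold is reached, so it suits large downloads better than responses like SlowStreamer, where each chunk should reach the client immediately.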

If you need fancier response types such as websocket upgrades, you’ll probably want to look into socket hijacking, as that gives you a lot more control over the socket. Happy streaming!