Zip and download files from Amazon S3 with Ruby

01 Sep 16

I recently needed to download multiple files from an S3 bucket through Ruby. As handy as the AWS SDK is, it doesn’t offer a way to zip multiple files so you have a single download. To avoid downloading them one at a time, I decided to zip them and download that zip.

Since S3 has no native support for processing files into a zip, this has to be done on our server with Ruby. This process is relatively straightforward:

Connect to S3 and fetch the files we need from the buckets.
After all the files are fetched, create the zip.
With the zip created, all we have to do is to download it.

For the actual zipping of the files, I decided to go with RubyZip as it is the most popular and well supported.

Connect to S3 and fetch the files

First, let’s download the files from the S3 folder to the local disk so they can be zipped. Make sure the AWS SDK and RubyZip gems are installed and required.

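require 'aws-sdk-s3' # or 'aws-sdk' on version 2 of the SDK
require 'zip'        # provided by the rubyzip gem

# Configure aws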
Aws.config.update({
  region: 'eu-west-1',
  access_key_id: ENV['ACCESS_KEY_ID'],
  secret_access_key: ENV['SECRET_ACCESS_KEY']
})

s3 = Aws::S3::Resource.new

bucket = s3.bucket("files")

files = ["photo1.png", "photo2.png", "photo3.png", "photo4.png"]

folder = "uploads/images"

# Make sure the local folder exists
Dir.mkdir("tmp_dir") unless Dir.exist?("tmp_dir")

# Download the files from S3 to a local folder
files.each do |file_name|
  # Get the file object
  file_obj = bucket.object("#{folder}/#{file_name}")
  # Save it on disk
  file_obj.get(response_target: "tmp_dir/#{file_name}")
end

As a quick test to make sure everything is set up, you should be able to run the script and see the files download in the local directory.
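
That check can also be scripted. Here is a minimal sketch, assuming the files were saved under their original names in a single folder. `missing_downloads` is a hypothetical helper, not part of the AWS SDK:

```ruby
# Hypothetical helper: returns the names from file_names that are
# missing (or empty) in dir after the download step.
def missing_downloads(dir, file_names)
  file_names.reject do |name|
    path = File.join(dir, name)
    File.file?(path) && File.size(path) > 0
  end
end
```

Calling `missing_downloads("tmp_dir", files)` right after the download loop should return an empty array when every file arrived.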

Creating the zip

The zipping won’t be complex at all. We can use the basic zip creation example from the RubyZip readme for it.

# Create the zip
Zip::File.open("tmp_dir/photos.zip", Zip::File::CREATE) do |zipfile|
  files.each do |filename|
    # Add the file to the zip
    zipfile.add(filename, "tmp_dir/#{filename}")
  end
end

Download the zip or reupload it to S3

The zip now sits on the server’s file system, ready for download. If you want, you can also upload it back to S3:

# Create the object to upload
zip_obj = bucket.object("#{folder}/photos.zip")
# Upload it
zip_obj.upload_file("tmp_dir/photos.zip")

Or use Rails’ send_file method in a controller action to send it to the browser:

send_file "tmp_dir/photos.zip"

Streaming

The first method I tried, to see whether it would create the zip more quickly, was to stream the files from S3 instead of writing them to disk. It works like this:

Download the file objects from S3 in a stream.
Open an IO stream for the zip and write the streamed chunks from S3 into the zip.
At the end, you have an IO object that represents your zipped file. You can then download it or re-upload it if you choose.

When we tested with small files, this method zipped the files more quickly than writing them to disk. However, when we tested it with large files, the workers completely ran out of memory. So, in the end, we decided not to use this method, as it was too memory intensive for the workers (our files are huge). It may still be useful if you are planning on zipping small files, though.
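
The memory use has two sources in this approach: `body.read` loads each object as one string, and `write_buffer` keeps the whole archive in memory. Below is a minimal sketch of an alternative for the first part, copying in fixed-size chunks. `copy_in_chunks` and the 1 MB default are assumptions for illustration, and the StringIO objects stand in for a real download and zip stream:

```ruby
require "stringio"

# Hypothetical helper: copy from any readable IO to any object that
# responds to #write, holding at most chunk_size bytes at a time.
# Returns the total number of bytes copied.
def copy_in_chunks(source, sink, chunk_size = 1024 * 1024)
  total = 0
  while (chunk = source.read(chunk_size))
    sink.write(chunk)
    total += chunk.bytesize
  end
  total
end

# Stand-ins for a large download and a zip entry (assumptions for the demo)
source = StringIO.new("x" * 2_500_000)
sink = StringIO.new
copied = copy_in_chunks(source, sink)
```

Inside the zip block, the sink would be the `zip` stream, which should accept `write` like other IO objects. This only helps when the source itself is read lazily (a file on disk, for example), and the finished archive still sits in memory.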

Complete code:

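require 'aws-sdk-s3' # or 'aws-sdk' on version 2 of the SDK
require 'zip'        # provided by the rubyzip gem
require 'tempfile'

Aws.config.update({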
  region: 'eu-west-1',
  access_key_id: ENV['ACCESS_KEY_ID'],
  secret_access_key: ENV['SECRET_ACCESS_KEY']
})

s3 = Aws::S3::Resource.new

bucket = s3.bucket("files")

files = ["photo1.png", "photo2.png", "photo3.png", "photo4.png"]

folder = "uploads/images"

# Open the zip stream
zip_stream = Zip::OutputStream.write_buffer do |zip|

  # Loop through the files we want to zip
  files.each do |file_name|

    # Get the file object
    file_obj = bucket.object("#{folder}/#{file_name}")

    # Give a name to the file and start a new entry
    zip.put_next_entry(file_name)

    # Write the file data to zip
    zip.print file_obj.get.body.read
  end
end

# Create s3 bucket object
zip_file = bucket.object("#{folder}/photos_stream.zip")

# Rewind the IO stream so we can read it
zip_stream.rewind

# Create a temp file for the zip
temp_zip = Tempfile.new(['photos_stream', '.zip'])

# Write the StringIO data in binary mode
temp_zip.binmode
temp_zip.write zip_stream.read

# Flush buffered data to disk so the whole zip gets uploaded
temp_zip.flush

# Send it to s3
zip_file.upload_file(temp_zip.path)

# Clean up the tmp file
temp_zip.close
temp_zip.unlink
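
If the upload raises, the explicit cleanup at the end is skipped. One way to guard against that is an `ensure` block. Here is a minimal sketch, where `with_temp_zip` is a hypothetical helper and not part of either gem:

```ruby
require "tempfile"

# Hypothetical helper: writes data to a temp zip file, yields its path,
# and removes the file afterwards even if the block raises.
def with_temp_zip(data, basename = "photos_stream")
  temp_zip = Tempfile.new([basename, ".zip"])
  temp_zip.binmode
  temp_zip.write(data)
  temp_zip.flush # make sure everything is on disk before the path is used
  yield temp_zip.path
ensure
  if temp_zip
    temp_zip.close
    temp_zip.unlink
  end
end
```

With this in place, the upload step could be written as `with_temp_zip(zip_stream.read) { |path| zip_file.upload_file(path) }`.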

That’s pretty much it. If you have any questions or other techniques for zipping files from S3, please share them with us by leaving a comment or getting in touch on Twitter.

Denis Sellu

Tech Lead

Strategy game and fantasy fiction fan. Builds small internet gadgets that might take over the world.