I recently needed to download multiple files from an S3 bucket through Ruby. As handy as the AWS SDK is, it doesn’t offer a way to bundle multiple files into a single download. To avoid downloading them one at a time, I decided to zip them and download that zip.
Since S3 has no native support for processing files into a zip, this has to be done on our server with Ruby. This process is relatively straightforward:
- Connect to S3 and fetch the files we need from the bucket.
- After all the files are fetched, create the zip.
- With the zip created, all we have to do is to download it.
For the actual zipping of the files, I decided to go with RubyZip as it is the most popular and well supported.
Connect to S3 and fetch the files
First, let’s download the files in the folder so they can be zipped. Make sure the AWS-SDK and RubyZip gems are required.
require 'aws-sdk-s3' # AWS SDK v3; with SDK v2, use require 'aws-sdk'
require 'zip'        # RubyZip
require 'fileutils'
# Configure aws
Aws.config.update({
  region: 'eu-west-1',
  access_key_id: ENV['ACCESS_KEY_ID'],
  secret_access_key: ENV['SECRET_ACCESS_KEY']
})
s3 = Aws::S3::Resource.new
bucket = s3.bucket("files")
files = ["photo1.png", "photo2.png", "photo3.png", "photo4.png"]
folder = "uploads/images"
# Make sure the local folder exists
FileUtils.mkdir_p("tmp_dir")
# Download the files from S3 to a local folder
files.each do |file_name|
  # Get the file object
  file_obj = bucket.object("#{folder}/#{file_name}")
  # Save it on disk
  file_obj.get(response_target: "tmp_dir/#{file_name}")
end
As a quick test to make sure everything is set up, you should be able to run the script and see the files download in the local directory.
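If you would rather not check the directory by eye, the quick test can be scripted. The following is a minimal sketch using only Ruby's standard library; `missing_downloads` is a hypothetical helper name, not something from the AWS SDK or RubyZip, and it assumes the same `tmp_dir` folder and file names as above.

```ruby
# Hypothetical helper: returns the names that are missing or empty
# in the local download directory.
def missing_downloads(dir, file_names)
  file_names.reject do |name|
    path = File.join(dir, name)
    File.file?(path) && File.size(path) > 0
  end
end

files = ["photo1.png", "photo2.png", "photo3.png", "photo4.png"]
missing = missing_downloads("tmp_dir", files)

if missing.empty?
  puts "All #{files.size} files downloaded"
else
  puts "Missing or empty: #{missing.join(', ')}"
end
```

Treating an empty file as missing is a design choice for this sketch; it catches downloads that were cut off before any data was written.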
Creating the zip
The zipping won’t be complex at all. We can use the basic zip creation example from the RubyZip readme for it.
# Create the zip
Zip::File.open("tmp_dir/photos.zip", Zip::File::CREATE) do |zipfile|
  files.each do |filename|
    # Add the file to the zip
    zipfile.add(filename, "tmp_dir/#{filename}")
  end
end
Download the zip or reupload it to S3
The zip now exists on the server’s file system, ready for download. If you want, you can also upload it back to S3:
# Create the object to upload
zip_obj = bucket.object("#{folder}/photos.zip")
# Upload it
zip_obj.upload_file("tmp_dir/photos.zip")
Or use Rails’ send_file method to send it to the browser:
send_file "tmp_dir/photos.zip"
Streaming
The first method I tried, to see whether it would create the zip more quickly, was to stream the files from S3. It works like this:
- Download the file objects from S3 in a stream.
- Open an IO stream for the zip and write the streamed chunks from S3 into the zip.
- At the end, you have an IO object that represents your zipped object. You will also be able to download or reupload it if you so choose to.
When we tested with small files, this method was faster than writing them to disk. However, when we tested it with large files, the workers completely ran out of memory: each file is read fully into memory, and the whole zip is built in an in-memory buffer. So, in the end, we decided not to use this method, as it was too memory intensive for the workers (our files are huge). It may be useful if you are planning on zipping small files, though.
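One way to act on that trade-off is to look at the combined size of the files first and only build the zip in memory when it fits a budget. This is a rough sketch, not something we ran in production: `zip_strategy` is a hypothetical helper, and the 50 MB default is an assumed budget that you would tune to your own workers.

```ruby
# Hypothetical helper: picks :memory when the combined size of the files
# fits the budget, and :disk otherwise. The 50 MB default is an assumption.
def zip_strategy(sizes_in_bytes, budget: 50 * 1024 * 1024)
  sizes_in_bytes.sum <= budget ? :memory : :disk
end

# With the SDK, the sizes could be collected like this (not run here):
#   sizes = files.map { |name| bucket.object("#{folder}/#{name}").content_length }
sizes = [2 * 1024 * 1024, 3 * 1024 * 1024] # two small example files
puts zip_strategy(sizes) # prints "memory"
```

The sum of the file sizes is only a rough guide, since the in-memory approach holds both the file being read and the growing zip buffer, so it is worth leaving some headroom in the budget.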
Complete code for the streaming approach:
require 'aws-sdk-s3' # AWS SDK v3; with SDK v2, use require 'aws-sdk'
require 'zip'        # RubyZip
require 'tempfile'
Aws.config.update({
  region: 'eu-west-1',
  access_key_id: ENV['ACCESS_KEY_ID'],
  secret_access_key: ENV['SECRET_ACCESS_KEY']
})
s3 = Aws::S3::Resource.new
bucket = s3.bucket("files")
files = ["photo1.png", "photo2.png", "photo3.png", "photo4.png"]
folder = "uploads/images"
# Open the zip stream
zip_stream = Zip::OutputStream.write_buffer do |zip|
  # Loop through the files we want to zip
  files.each do |file_name|
    # Get the file object
    file_obj = bucket.object("#{folder}/#{file_name}")
    # Give a name to the file and start a new entry
    zip.put_next_entry(file_name)
    # Write the file data to zip
    zip.print file_obj.get.body.read
  end
end
# Create s3 bucket object
zip_file = bucket.object("#{folder}/photos_stream.zip")
# Rewind the IO stream so we can read it
zip_stream.rewind
# Create a temp file for the zip
temp_zip = Tempfile.new(['photos_stream', '.zip'])
# Write the StringIO data in binary mode
temp_zip.binmode
temp_zip.write zip_stream.read
# Flush buffered writes so the whole zip is on disk before uploading
temp_zip.flush
# Send it to s3
zip_file.upload_file(temp_zip.path)
# Clean up the tmp file
temp_zip.close
temp_zip.unlink
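One thing to watch in the code above is that the temp file is only removed when the upload succeeds. A minimal sketch of one way around that, using an ensure block, is shown below. `with_temp_zip` is a hypothetical helper name, and the usage line in the comment assumes the `zip_stream` and `zip_file` objects from the streaming code.

```ruby
require 'tempfile'

# Hypothetical helper: writes the zip bytes to a temp file, yields its path,
# and removes the file afterwards, even if the block raises.
def with_temp_zip(zip_bytes)
  temp_zip = Tempfile.new(['photos_stream', '.zip'])
  temp_zip.binmode
  temp_zip.write(zip_bytes)
  temp_zip.flush # make sure the data is on disk before the path is read
  yield temp_zip.path
ensure
  if temp_zip
    temp_zip.close
    temp_zip.unlink
  end
end

# Usage with the objects from the streaming code (not run here):
#   with_temp_zip(zip_stream.string) { |path| zip_file.upload_file(path) }
```

Yielding the path rather than the Tempfile itself keeps the caller from closing or deleting the file by accident.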
That’s pretty much it. If you have any questions or other methods/techniques for zipping files from S3, please share them with us by leaving a comment or getting in touch on Twitter.