aws-glacier-multipart-upload

Script for uploading large files to AWS Glacier

Motivation

The one-liner upload-archive isn't recommended for files over 100 MB; for anything larger you should use a multipart upload instead. The difficult part of a multipart upload is that it is really three major commands, with the second needing to be repeated for every chunk of the file, and a custom byte range has to be calculated for each chunk being uploaded. For example, with a 4 MB part size (4194304 bytes), the first three parts need the following arguments. This is repeated 1945 times for my 8 GB file.

  • aws glacier upload-multipart-part --body partaa --range 'bytes 0-4194303/*' --account-id - --vault-name media1 --upload-id [your upload id here]
  • aws glacier upload-multipart-part --body partab --range 'bytes 4194304-8388607/*' --account-id - --vault-name media1 --upload-id [your upload id here]
  • aws glacier upload-multipart-part --body partac --range 'bytes 8388608-12582911/*' --account-id - --vault-name media1 --upload-id [your upload id here]
  • 1941 commands later...
  • aws glacier upload-multipart-part --body partzbxu --range 'bytes 8153726976-8157921279/*' --account-id - --vault-name media1 --upload-id [your upload id here]

We need a script to handle the math and generate these commands automatically.
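
As a rough illustration of the bookkeeping the script takes care of, here is a minimal sketch (not the script itself): it splits an archive into fixed-size parts and prints one upload-multipart-part command per part. The 4 MB part size, the part/upload-commands.txt file names, and the UPLOAD_ID variable are placeholder assumptions; the vault name media1 is taken from the examples above.

# sketch only: assumes UPLOAD_ID was already returned by initiate-multipart-upload
PART_SIZE=4194304
split -b "$PART_SIZE" -a 4 my-backup.tar.gz part
i=0
for f in part*; do
  start=$((i * PART_SIZE))
  end=$((start + $(wc -c < "$f") - 1))    # the last part is usually shorter
  echo "aws glacier upload-multipart-part --body $f" \
       "--range 'bytes ${start}-${end}/*'" \
       "--account-id - --vault-name media1 --upload-id $UPLOAD_ID"
  i=$((i + 1))
done > upload-commands.txt                # one upload command per line

The other two of the "three major commands" are initiate-multipart-upload, which returns the upload id, and complete-multipart-upload, which finalizes the archive once every part has been received.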

This script leverages GNU parallel, so my 1945 upload commands are kicked off in parallel, but they are queued so that a new one starts only once a core has finished its previous one. There is even a built-in progress bar that shows what percentage is complete and an estimated time remaining.
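
As a sketch of how that hand-off could look (again using the hypothetical upload-commands.txt file from above, not the script's real internals):

# run one queued upload command per core, with a progress bar on stderr
parallel --bar < upload-commands.txt

With no command template given, parallel treats each input line as a command to run and, by default, keeps one job per core, so new parts start uploading only as earlier ones finish.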

Prerequisites

Everything in this section only needs to be done once to set things up.

The script depends on jq for parsing JSON and parallel for submitting the upload commands in parallel. If you are using Fedora/CentOS/RHEL, run the following:

sudo dnf install jq
sudo dnf install parallel

If you are using a Mac, you can install both with Homebrew:

brew install jq parallel
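
To confirm both tools are available before continuing (just a sanity check, not something the script requires), you can print their versions:

jq --version
parallel --version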

The script assumes you have:

  1. an AWS account
  2. the AWS Command Line Interface (CLI) installed on your machine
  3. your AWS CLI configured to pass credentials automatically
  4. Java installed

To install the AWS CLI:

pip3 install awscli --upgrade --user

To configure your AWS CLI:

aws configure --profile your-aws-profile-name
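
The configure command walks you through four prompts; the values below are placeholders, and the region should match the one your vault was created in (us-east-1 in the example output further down):

AWS Access Key ID [None]: <your access key id>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: us-east-1
Default output format [None]: json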

Before jumping into the script, verify that your connection works by describing the vault you have created, which is backups in my case. Run this describe-vault command and you should see similar JSON results.

aws glacier describe-vault --vault-name backups --account-id -
{
    "SizeInBytes": 11360932143,
    "VaultARN": "arn:aws:glacier:us-east-1:<redacted>:vaults/backups",
    "LastInventoryDate": "2015-12-16T01:23:18.678Z",
    "NumberOfArchives": 7,
    "CreationDate": "2015-12-12T02:22:24.956Z",
    "VaultName": "backups"
}

Install

For Mac:

sudo ln -s $(pwd)/glacierupload.sh /usr/local/bin/glacierupload

Script Usage

Tar and gzip the files you want to upload:

tar -zcvf my-backup.tar.gz /location/to/zip/*

Then run the script:

export AWS_PROFILE=your-aws-profile-name
glacierupload my-backup.tar.gz your-vault-name

More Advanced Usage

First of all, compress the file that you want to archive:

tar cf - <file> | pigz -11 -p 32 > <file>.tar.gz

Back up to Glacier:

AWS_PROFILE=default ./glacierupload.sh "<file>" "<vault>" "logs/result.txt" "logs/database.txt" "8"
AWS_PROFILE=default ./glacierupload.sh "<file>" "<vault>" "" "" "1"
AWS_PROFILE=default ./glacierupload.sh "/home/ddebny/workspace-23-11-2019.veracrypt" "workspace" "logs/result.txt" "logs/database.txt" "128" "workspace-23-11-2019.veracrypt"
