Data pipeline that streams a file from an S3 bucket to Google Cloud Storage using AWS Lambda whenever the file is uploaded to S3
The goal is a pipeline that transfers a file from S3 to Google Cloud Storage (GCS) whenever the file is uploaded to S3. Several good tools exist for this kind of transfer, but they either do not support triggering on an S3 upload event, are third-party tools, or are big data tools that my organization was reluctant to use. I therefore developed a streaming application that downloads the contents of the file from the S3 bucket in chunks, sized according to the memory of the Lambda function, uploads each chunk to GCS, and repeats until the file has been copied completely from S3 to Google Cloud Storage. It is built on the Node.js stream library.
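The chunked copy described above can be illustrated with a minimal sketch built on the Node.js stream module. This is not the project's actual code: `copyStream`, `makeFakeSource`, `makeFakeDestination`, and `demo` are hypothetical names, and the in-memory streams only stand in for the real S3 and GCS streams so the example runs without cloud credentials.

```javascript
const { Readable, Writable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Copy everything from `source` to `destination`, chunk by chunk.
// pipeline() applies backpressure, so only a bounded number of chunks is
// buffered in memory at any time, regardless of the total file size.
async function copyStream(source, destination) {
  await pipeline(source, destination);
}

// Hypothetical stand-in for the S3 download stream. In a real function this
// could be, for example, s3.getObject({ Bucket, Key }).createReadStream()
// from the AWS SDK v2 (an assumption about the SDK in use).
function makeFakeSource(totalBytes, chunkBytes) {
  let sent = 0;
  return new Readable({
    read() {
      if (sent >= totalBytes) {
        this.push(null); // signal end of file
        return;
      }
      const size = Math.min(chunkBytes, totalBytes - sent);
      sent += size;
      this.push(Buffer.alloc(size, 1));
    },
  });
}

// Hypothetical stand-in for the GCS upload stream. In a real function this
// could be, for example, storage.bucket(name).file(key).createWriteStream()
// from @google-cloud/storage (again an assumption).
function makeFakeDestination(stats) {
  return new Writable({
    write(chunk, _encoding, callback) {
      stats.bytes += chunk.length;
      stats.chunks += 1;
      callback();
    },
  });
}

// Copy 1 MiB in 64 KiB chunks and report what the destination received.
async function demo() {
  const stats = { bytes: 0, chunks: 0 };
  await copyStream(
    makeFakeSource(1024 * 1024, 64 * 1024),
    makeFakeDestination(stats)
  );
  return stats;
}

demo().then((stats) => console.log(stats));
```

Because the destination only asks for more data after it has handled the previous chunk, memory use stays roughly proportional to the chunk size and not to the file size, which is what lets a small Lambda function copy a large file.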
- Create a destination bucket in Google Cloud Storage
- Create a service account with write access to Google Cloud Storage
- Install Serverless (See references for how to install)
- Save the private_key and client_email of the GCP service account in AWS Secrets Manager
- Run `npm install`
- Replace the following parameters in the serverless.yml file:
  - gcsBucket: Destination bucket in Google Cloud Storage.
  - role: IAM role to be associated with the Lambda function.
  - S3SourceBucket: Source S3 bucket (check the Serverless documentation if the bucket already exists).
  - projectId: Project ID of the GCP project.
  - secretName: Name of the secret in AWS Secrets Manager that stores the service account details.
- Optional: Replace other parameters such as the service name, function name, and environment variables as required.
- Run `sls deploy`
- Test the code with `sls invoke -f functionName --logs`
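When the function is triggered by an upload, it needs the bucket and key from the S3 event and the service account details from the stored secret. The sketch below shows one way to extract both. It assumes the secret is stored as a JSON string with the keys `client_email` and `private_key`. The helper names, the sample event, and the bucket name are hypothetical and not taken from the project.

```javascript
// Extract the bucket, key, and size from the first record of an S3 event.
function parseS3Event(event) {
  const record = event.Records[0];
  return {
    bucket: record.s3.bucket.name,
    // S3 URL-encodes object keys in event notifications and uses '+' for spaces.
    key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' ')),
    size: record.s3.object.size,
  };
}

// Convert the SecretString returned by AWS Secrets Manager into a credentials
// object. Assumption: the secret is JSON with client_email and private_key.
function parseServiceAccountSecret(secretString) {
  const secret = JSON.parse(secretString);
  if (!secret.client_email || !secret.private_key) {
    throw new Error('Secret must contain client_email and private_key');
  }
  return {
    client_email: secret.client_email,
    // Assumption: restore newlines if the key was stored with literal "\n".
    private_key: secret.private_key.replace(/\\n/g, '\n'),
  };
}

// Hypothetical, trimmed-down S3 "ObjectCreated" event for local testing.
const sampleEvent = {
  Records: [
    {
      s3: {
        bucket: { name: 'my-source-bucket' },
        object: { key: 'reports/2020+q1.csv', size: 1024 },
      },
    },
  ],
};

console.log(parseS3Event(sampleEvent));
```

An event like `sampleEvent` can be saved to a JSON file and passed to `sls invoke` with the `--path` option so that the function receives a realistic payload. The parsed credentials could then be given to the GCS client together with projectId.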
The table below shows how long the Lambda function ran when a 500 MB file was uploaded to S3, for different amounts of memory allocated to the function.
File size (MB) | Lambda memory (MB) | Run duration (ms) |
---|---|---|
500 | 128 | 80500 |
500 | 256 | 41200 |
500 | 512 | 20800 |
500 | 1024 | 12200 |
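To make the measurements easier to compare, the short sketch below derives the effective throughput and the memory-time product (GB-seconds, the unit in which Lambda duration is billed) for each row. The numbers are copied from the table. The `summarize` helper is hypothetical, and it treats 1 GB as 1024 MB.

```javascript
// Measurements copied from the table above.
const runs = [
  { fileMb: 500, memoryMb: 128, durationMs: 80500 },
  { fileMb: 500, memoryMb: 256, durationMs: 41200 },
  { fileMb: 500, memoryMb: 512, durationMs: 20800 },
  { fileMb: 500, memoryMb: 1024, durationMs: 12200 },
];

// Derive throughput (MB/s) and memory-time (GB-seconds) for one run.
function summarize(run) {
  const seconds = run.durationMs / 1000;
  return {
    memoryMb: run.memoryMb,
    throughputMbPerSec: run.fileMb / seconds,
    gbSeconds: (run.memoryMb / 1024) * seconds,
  };
}

const summary = runs.map(summarize);

for (const row of summary) {
  console.log(
    `${row.memoryMb} MB: ${row.throughputMbPerSec.toFixed(1)} MB/s, ` +
      `${row.gbSeconds.toFixed(1)} GB-seconds`
  );
}
```

Throughput comes out at roughly 6.2, 12.1, 24.0, and 41.0 MB/s, so it about doubles with each doubling of memory up to 512 MB. The memory-time product stays between roughly 10 and 12 GB-seconds, which suggests that, for these measurements, allocating more memory shortens the run without a large change in billed compute.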