Alright, it’s time for the second post in our series on AWS options for setting up pipelines in a server-less fashion. The topics we are covering throughout this series are:
- Part 1: Python Lambda to load data into an AWS Redshift data warehouse
- Part 2: Terraform setup of a Lambda function for automatic triggering
- Part 3: Example AWS Step Functions setup to schedule a cron pipeline with AWS Lambda
In this post we complement the previous one by providing infrastructure-as-code with Terraform for deployment purposes. We are strong believers in applying a DevOps approach to Data Engineering as well, a practice also known as “DataOps”. So we thought it would make perfect sense to share a sample Terraform module along with the Python code.
To recap, so far we have Python code that, when triggered by an AWS event for a new S3 object, connects to Redshift and issues a SQL COPY statement to load that data into a given table. Next we are going to show how to configure this with Terraform code.
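As a refresher, here is a minimal sketch of what such a handler could look like. It is not the exact code from Part 1: the `psycopg2` driver, the environment variable names (`REDSHIFT_HOST`, `TARGET_TABLE`, `COPY_IAM_ROLE`, and so on) and the CSV options in the COPY statement are all assumptions made for illustration.

```python
import os
from urllib.parse import unquote_plus


def build_copy_statements(event, table, iam_role):
    """Build one Redshift COPY statement per S3 object listed in the event."""
    statements = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event notifications URL-encode object keys (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        # The CSV format and header handling are assumed; adapt to your files.
        statements.append(
            f"COPY {table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' FORMAT AS CSV IGNOREHEADER 1;"
        )
    return statements


def handler(event, context):
    """Lambda entry point: load every new S3 object into the target table."""
    # Imported lazily so the statement builder can be used without the driver.
    # psycopg2 is an assumed choice of Redshift client.
    import psycopg2

    # All environment variable names below are hypothetical.
    statements = build_copy_statements(
        event, os.environ["TARGET_TABLE"], os.environ["COPY_IAM_ROLE"]
    )
    conn = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        port=os.environ.get("REDSHIFT_PORT", "5439"),
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            for sql in statements:
                cur.execute(sql)
        conn.commit()
    finally:
        conn.close()
    return {"loaded_objects": len(statements)}
```

Keeping the statement builder separate from the connection logic makes it possible to check the generated SQL locally with a sample S3 event, without needing a Redshift cluster.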
As usual, all the code for this post is publicly available in this GitHub repository. If you haven’t yet, you will need to install Terraform in order to follow along with this post.