AWS Server-less data pipelines with Terraform to Redshift – Part 2

Alright, it’s time for the second post of our sequence focusing on AWS options to setup pipelines in a server-less fashion. The topics that we are covering throughout this series are:

In this post we complement the previous one, by providing infrastructure-as-code with Terraform for deployment purposes. We are strong believers of a DevOps approach also to Data Engineering, also known as “DataOps”. Thus we thought it would make perfect sense to share a sample Terraform module along with Python code.

To recap, so far we have Python code that, if triggered by a AWS event on a new S3 object, will connect to Redshift, and issue SQL Copy command statement to load that data into a given table. Next we are going to show how to configure this with Terraform code.

As usual, all the code for this post is available publicly in this github repository. In case you haven’t yet, you will need to install terraform in order follow along this post.

Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 2”

AWS Server-less data pipelines with Terraform to Redshift – Part 1

This post is the first of sequence of posts focusing on AWS options to setup pipelines in a serverless fashion. The topics that we all cover throughout the whole series are:

In this post we lean towards another strategy to setup data pipelines, namely event triggered. That is, rather than being scheduled to execute with a given frequency, our traditional pipeline code is executed immediately triggered by a given event. Our example consists of a demo scenario for immediately and automatically loading data that is stored in S3 into Redshift tutorial. Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 1”