AWS Server-less data pipelines with Terraform to Redshift – Part 2

Alright, it’s time for the second post of our series on AWS options for setting up pipelines in a server-less fashion. The topics that we are covering throughout this series are:

In this post we complement the previous one by providing infrastructure-as-code with Terraform for deployment purposes. We are strong believers in applying a DevOps approach to Data Engineering as well, also known as “DataOps”. Thus we thought it would make perfect sense to share a sample Terraform module along with the Python code.

To recap, so far we have Python code that, when triggered by an AWS event on a new S3 object, connects to Redshift and issues a SQL COPY statement to load that data into a given table. Next we are going to show how to configure this with Terraform code.
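The flow in that recap can be sketched roughly as follows. This is only an illustration, not the code from the repository: the table name, IAM role ARN, CSV options and the `execute` callback (a stand-in for a Redshift cursor's `execute` method) are all assumptions made for the example.

```python
from urllib.parse import unquote_plus

# Hypothetical settings, for illustration only.
TARGET_TABLE = "public.events"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/example-redshift-copy"


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every S3 record in a notification event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3")
        if s3 is None:
            continue  # not an S3 record
        # Object keys arrive URL-encoded in S3 notifications.
        objects.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return objects


def build_copy_statement(table, bucket, key, iam_role_arn):
    """Build a Redshift COPY statement for one S3 object (CSV format assumed)."""
    return (
        "COPY {table} "
        "FROM 's3://{bucket}/{key}' "
        "IAM_ROLE '{role}' "
        "FORMAT AS CSV "
        "IGNOREHEADER 1;"
    ).format(table=table, bucket=bucket, key=key, role=iam_role_arn)


def handler(event, context, execute=print):
    """Lambda-style entry point; `execute` stands in for cursor.execute."""
    statements = []
    for bucket, key in extract_s3_objects(event):
        statement = build_copy_statement(TARGET_TABLE, bucket, key, IAM_ROLE_ARN)
        execute(statement)
        statements.append(statement)
    return statements


# A trimmed-down S3 notification event, for illustration only.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/data.csv"},
            },
        }
    ]
}

handler(sample_event, None)
```

In a real function the statement would be run over an actual Redshift connection, and the bucket and key would need validating before being interpolated into SQL.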

As usual, all the code for this post is publicly available in this GitHub repository. In case you haven’t yet, you will need to install Terraform in order to follow along with this post.

Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 2”

Get AppScaled ECS Tasks served by AWS Network Load Balancer

This article is intended to be a quick and dirty snippet for anyone going through the struggle of getting an ECS service, which might have one or more containers running the same app (as part of an Auto Scaling Group), served by a Network Load Balancer (instead of the more common ELB or ALB).

ECS Service/Task Definition

Another particularity of this implementation is that I also decided to set the ECS task’s network mode to awsvpc. In case you are not acquainted with this new option, this means that:

  • Your container will get its own network interface and its own IP address;
  • The Host port and the Container port need to be the same, since there is no middleware mapping ports between the two entities.

The cherry on top is that the ECS Service now has the option of automatically registering and deregistering LB targets by their IP address, which fits the intention described perfectly.
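To make the port constraint concrete, here is a small sketch that checks a task definition for it. The dictionary uses the same field names as an ECS task definition (networkMode, containerDefinitions, portMappings), but the family, image and port values are made up for the example, and the checker function itself is hypothetical.

```python
def check_awsvpc_port_mappings(task_definition):
    """Return a list of port-mapping problems for an awsvpc task definition."""
    problems = []
    if task_definition.get("networkMode") != "awsvpc":
        return problems  # the constraint checked here only applies to awsvpc
    for container in task_definition.get("containerDefinitions", []):
        for mapping in container.get("portMappings", []):
            container_port = mapping["containerPort"]
            # hostPort may be left out; when present it must match containerPort.
            host_port = mapping.get("hostPort", container_port)
            if host_port != container_port:
                problems.append(
                    "{}: hostPort {} != containerPort {}".format(
                        container["name"], host_port, container_port
                    )
                )
    return problems


# Hypothetical task definition, for illustration only.
task_definition = {
    "family": "example-app",
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "app",
            "image": "example/app:latest",
            "portMappings": [
                {"containerPort": 8080, "hostPort": 8080, "protocol": "tcp"}
            ],
        }
    ],
}

print(check_awsvpc_port_mappings(task_definition))  # prints []
```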

Network Load Balancer

This post isn’t about the technical details of what a Network Load Balancer is, but about the caveats of using it in this scenario: because an NLB is a layer 4 load balancer, you won’t be able to define Security Groups at the NLB level (at least at the time of writing). Instead, you’ll have to make sure your tasks/containers are secure by attaching the security groups to them – remember that with the awsvpc network mode, each container will get its own NIC.
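As a rough sketch of where those pieces end up, the dictionaries below mirror the parameter names of the boto3 `create_target_group` (elbv2) and `create_service` (ecs) calls. Nothing is sent to AWS here, and every identifier (names, ARNs, subnet, security group and VPC IDs) is a placeholder invented for the example.

```python
def build_service_arguments(service_name, task_definition, target_group_arn,
                            container_name, container_port,
                            subnets, security_groups, desired_count=2):
    """Build keyword arguments shaped like those of the ECS create_service call."""
    return {
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        # ECS registers/deregisters each task's IP in this target group.
        "loadBalancers": [
            {
                "targetGroupArn": target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ],
        # With awsvpc the security groups are attached to the task's own NIC.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
    }


# Target group for an NLB: TCP protocol, targets registered by IP address.
target_group_arguments = {
    "Name": "example-app-tg",
    "Protocol": "TCP",
    "Port": 8080,
    "VpcId": "vpc-0123456789abcdef0",  # placeholder
    "TargetType": "ip",
}

service_arguments = build_service_arguments(
    service_name="example-app",
    task_definition="example-app:1",
    target_group_arn="arn:aws:elasticloadbalancing:eu-west-1:123456789012:"
                     "targetgroup/example-app-tg/0123456789abcdef",  # placeholder
    container_name="app",
    container_port=8080,
    subnets=["subnet-0123456789abcdef0"],         # placeholder
    security_groups=["sg-0123456789abcdef0"],     # placeholder
)

print(service_arguments["networkConfiguration"])
```

The point of the sketch is the placement: the security groups live in the service's awsvpc network configuration rather than on the load balancer, and the target group uses the ip target type so that each task's address can be registered.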

Implementation

As for the actual code snippet to support what I’m trying to achieve: Continue reading “Get AppScaled ECS Tasks served by AWS Network Load Balancer”

AWS Server-less data pipelines with Terraform to Redshift – Part 1

This post is the first of a sequence of posts focusing on AWS options for setting up pipelines in a serverless fashion. The topics that we will cover throughout the whole series are:

In this post we lean towards another strategy for setting up data pipelines, namely event-triggered. That is, rather than being scheduled to execute with a given frequency, our traditional pipeline code is executed immediately when triggered by a given event. Our example consists of a demo scenario for immediately and automatically loading data that is stored in S3 into Redshift. Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 1”