List all your AWS resources with Go

Picking up on Diogo’s last post on how to obliterate all resources on your AWS account, I thought it could also be useful to, instead, list everything you have running.

Since I’m long overdue on a Go post, I’m going to share a one-file app that uses the AWS SDK for Go to crawl each region for all taggable resources and pretty-print them on stdout, organised by service type (e.g. EC2, ECS, ELB, etc.) and product type (e.g. instance, NAT, subnet, cluster, etc.).

The AWS SDK allows us to retrieve the ARNs of all taggable resources, so that’s all the information I’ll use for our little app.

Note: if you prefer jumping straight to the full code, please scroll to the end and read the running instructions first.

The objective

The main goal is to get structured information out of the ARNs retrieved, so the first thing is to create a type that serves as a blueprint for what I’m trying to achieve. Because I want to keep it simple, let’s call this type SingleResource.
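
The original gist is not embedded in this excerpt, so here is a minimal sketch of what such a type could look like (the exact field names are my own assumption):

    // SingleResource is the structured representation of the information
    // carried by one ARN.
    type SingleResource struct {
        Region  string
        Service string // e.g. ec2, ecs, elasticloadbalancing
        Product string // e.g. instance, subnet, cluster
        ID      string
        ARN     string
    }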

Since we are taking care of the basics, we can also define the TraceableRegions that we want the app to crawl through.
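
Again as a sketch, with an example set of regions (trim or extend it to whatever you actually use):

    // TraceableRegions lists the AWS regions the app will crawl through.
    var TraceableRegions = []string{
        "us-east-1", "us-east-2", "us-west-1", "us-west-2",
        "eu-west-1", "eu-west-2", "eu-central-1",
        "ap-southeast-1", "ap-southeast-2", "ap-northeast-1",
    }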

Finally, to round off the objective, let’s also create a function that accepts a []*SingleResource slice and prints it out as a table on stdout:
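
The gist for this function is also missing from this excerpt; a minimal sketch built on the standard library’s text/tabwriter (the function name and columns are my assumption) could look like this:

    // at the top of the single main.go file:
    import (
        "fmt"
        "os"
        "text/tabwriter"
    )

    // PrintResourcesTable pretty-prints a slice of resources as a table on stdout.
    func PrintResourcesTable(resources []*SingleResource) {
        w := tabwriter.NewWriter(os.Stdout, 0, 4, 2, ' ', 0)
        fmt.Fprintln(w, "SERVICE\tPRODUCT\tREGION\tID")
        for _, r := range resources {
            fmt.Fprintf(w, "%s\t%s\t%s\t%s\n", r.Service, r.Product, r.Region, r.ID)
        }
        w.Flush()
    }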

Continue reading “List all your AWS resources with Go”

Keeping AWS costs low with AWS Nuke

A common pattern in several companies using AWS services is having several distinct AWS accounts, partitioned not only by team, but also by environment, such as development, staging and production.

This can very easily explode your budget with unused resources. A classic example occurs when automated pipelines – think of terraform apply, CI/CD procedures, etc. – fail or time out, and all the resources created in the meantime are left behind.

Another frequent example happens in companies that have recently moved to the cloud. They create accounts for the sole purpose of familiarizing and educating developers on AWS and doing quick and dirty experiments. Understandably, after clicking around and creating multiple resources, it becomes hard to track exactly what was instantiated, and so unused zombie resources are left lingering around.


AWS Nuke to the rescue

There are several tools to assist you with cleaning up an AWS environment, such as aws-nuke from rebuy-de and cloud-nuke from gruntwork. From the documentation, aws-nuke supports destroying many more types of AWS resources than cloud-nuke, which suited my intended use case. However, cloud-nuke is being developed with a broader scope, with support for Azure and GCP in mind. When this post was released, that still remained a declaration of intent, though.

AWS Nuke is quite easy and intuitive to work with. To install it:
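
The original install instructions are not embedded in this excerpt. At the time of writing, the simplest route was grabbing a pre-built binary from the project’s GitHub releases page; as a sketch (the version and asset name will differ for you):

    wget https://github.com/rebuy-de/aws-nuke/releases/download/v2.7.0/aws-nuke-v2.7.0-linux-amd64
    chmod +x aws-nuke-v2.7.0-linux-amd64
    sudo mv aws-nuke-v2.7.0-linux-amd64 /usr/local/bin/aws-nuke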

aws-nuke supports AWS CLI credentials as well as profiles, which is typical when managing several AWS accounts. Here is how you can run a first scan of the resources to be deleted:

aws-nuke --config config.yaml --profile <your-aws-account-specific-profile>


When you run it for the first time you might stumble upon the following message:

aws-nuke version v2.7.0 - Fri Nov 23 10:28:30 UTC 2018 - 0b0806d56f85a329de6d1eedbf8559d46988a7f4

Error: The specified account doesn’t have an alias. For safety reasons you need to specify an account alias. Your production account should contain the term ‘prod’.

As explained here, the root cause is very likely that you do not have an AWS account alias configured.
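
If that is the case, you can create one with the AWS CLI (the alias below is just a placeholder):

    aws iam create-account-alias --account-alias my-dev-account --profile <your-aws-account-specific-profile>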

If things went as they should, you will be prompted with:

Do you really want to nuke the account with the ID 12345678910 and the alias ‘<your-aws-account-specific-profile>’?
Do you want to continue? Enter account alias to continue.


At this point, two things are important to mention. The first is that, depending on how many resources are to be removed, this might take a while. The second is that no resources will actually be deleted at this point. You should get a message similar to the following:

Scan complete: 436860 total, 436860 nukeable, 0 filtered. The above resources would be deleted with the supplied configuration. Provide --no-dry-run to actually destroy resources.

Thus, to conduct the irreversible destruction:

aws-nuke --config config.yaml --profile <your-aws-account-specific-profile> --no-dry-run

Here is an example of a config.yaml template to remove all resources except a given IAM user:
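
The original template is not embedded in this excerpt; a minimal sketch of what it could look like (account IDs, region and user name are placeholders, and key names may differ between aws-nuke versions – newer releases renamed account-blacklist to account-blocklist, for instance) is:

    regions:
      - eu-west-1
      - global

    account-blacklist:
      - "999999999999" # production account, never nuke

    accounts:
      "123456789012": # the account you actually want to clean
        filters:
          IAMUser:
            - "my-admin-user"
          IAMUserPolicyAttachment:
            - "my-admin-user -> AdministratorAccess"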

Alternatively, you can also specify only certain target resource types to be removed. Here is an example:
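
Again as a sketch (the resource type names must match the ones aws-nuke itself reports, and the IDs are placeholders):

    regions:
      - eu-west-1

    account-blacklist:
      - "999999999999"

    resource-types:
      # only these resource types will be considered for deletion
      targets:
        - S3Object
        - S3Bucket
        - EC2Instance
        - EC2Volume

    accounts:
      "123456789012": {}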

That’s all for today. For more information about aws-nuke, have a look at the repository.


Decrypting correctly parameters from AWS SSM

Today’s post is yet another short one, but it will hopefully save a whole lot of headaches for some people.

Scenario: you have stored the contents of a string using the AWS SSM Parameter Store (side note: if you are not using it yet, you should definitely have a look), but when retrieving it decrypted via the CLI, you notice that the string has newlines (‘\n’) substituted by spaces (‘ ‘).

In my case, I was storing an encrypted private SSH key to integrate with some Ansible scripts triggered via AWS CodePipeline + CodeBuild. CodeBuild makes it really easy to access secrets stored in the SSM store; however, it was retrieving my key incorrectly, which in turn domino-crashed my Ansible scripts.

Here you can also confirm that more people are facing this issue. After following the suggestion of using the AWS SDK – in my case with Python’s boto3 – it finally worked. So here is a gist to overwrite an AWS SSM parameter and then retrieve it back:
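
The gist itself is not embedded in this excerpt. As an equivalent, hedged sketch (written here with the AWS SDK for Go rather than boto3; the parameter name and value are placeholders), the important part is writing and reading the parameter through the SDK with decryption explicitly enabled:

    package main

    import (
        "fmt"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/aws/session"
        "github.com/aws/aws-sdk-go/service/ssm"
    )

    func main() {
        sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("eu-west-1")}))
        svc := ssm.New(sess)

        // Overwrite the encrypted parameter with the original multi-line content.
        _, err := svc.PutParameter(&ssm.PutParameterInput{
            Name:      aws.String("/my/ssh/private-key"), // placeholder name
            Value:     aws.String("-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"),
            Type:      aws.String("SecureString"),
            Overwrite: aws.Bool(true),
        })
        if err != nil {
            panic(err)
        }

        // Retrieve it back, decrypted, with the newlines intact.
        out, err := svc.GetParameter(&ssm.GetParameterInput{
            Name:           aws.String("/my/ssh/private-key"),
            WithDecryption: aws.Bool(true),
        })
        if err != nil {
            panic(err)
        }
        fmt.Println(*out.Parameter.Value)
    }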

Hope this helps!

AWS Server-less data pipelines with Terraform to Redshift – Part 2

Alright, it’s time for the second post of our sequence focusing on AWS options to set up pipelines in a serverless fashion. The topics that we are covering throughout this series are:

In this post we complement the previous one by providing infrastructure as code with Terraform for deployment purposes. We are strong believers in applying a DevOps approach to Data Engineering as well, also known as “DataOps”. Thus we thought it would make perfect sense to share a sample Terraform module along with the Python code.

To recap: so far we have Python code that, when triggered by an AWS event on a new S3 object, will connect to Redshift and issue a SQL COPY statement to load that data into a given table. Next we are going to show how to configure this with Terraform code.

As usual, all the code for this post is publicly available in this GitHub repository. In case you haven’t yet, you will need to install Terraform in order to follow along with this post.

Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 2”

Get AppScaled ECS Tasks served by AWS Network Load Balancer

This article is intended to be a quick and dirty snippet for anyone going through the struggle of getting an ECS service – which might have one or more containers running the same app as part of an Auto Scaling Group – served by a Network Load Balancer (instead of the more common ELB or ALB).

ECS Service/Task Definition

Another particularity of this implementation is that I also decided to set the ECS task’s network mode to awsvpc. In case you are not acquainted with this new option, it means that:

  • Your container will get its own network interface and its own IP address;
  • The host port and the container port need to be the same, since there is no middleware managing port mapping between the two entities.

The cherry on top is that the ECS service now has the option of automatically registering and deregistering LB targets by their IP address, which fits perfectly with the intention described.
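
To make those constraints concrete, here is a minimal, hypothetical fragment of a task definition using awsvpc mode (family, name, image and port are placeholders):

    {
      "family": "my-app",
      "networkMode": "awsvpc",
      "containerDefinitions": [
        {
          "name": "my-app",
          "image": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-app:latest",
          "portMappings": [
            { "containerPort": 8080, "hostPort": 8080, "protocol": "tcp" }
          ]
        }
      ]
    }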

Network Load Balancer

This post isn’t really about describing the technical details of what a Network Load Balancer is, but about the caveats of using one in this scenario: because the NLB is a layer 4 load balancer, you won’t be able to define Security Groups at the NLB level. Instead, you’ll have to make your tasks/containers secure by attaching the security groups to them – remember that with the awsvpc network mode, each container gets its own NIC.
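
In practice, that means the security groups go on the service’s network configuration rather than on the load balancer; a hypothetical fragment of an ECS service definition (subnet and security group IDs are placeholders) would be:

    "networkConfiguration": {
      "awsvpcConfiguration": {
        "subnets": ["subnet-0123456789abcdef0"],
        "securityGroups": ["sg-0123456789abcdef0"],
        "assignPublicIp": "DISABLED"
      }
    }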

Implementation

As for the actual code snippet to support what I’m trying to achieve: Continue reading “Get AppScaled ECS Tasks served by AWS Network Load Balancer”

Getting started with AWS IoT and a dockerized device

IoT isn’t a new term and has, actually, been one of the buzzwords of the 21st century, right alongside cryptocurrency. The two are seen hand in hand, for better and for worse, from interesting crypto projects focused on IoT usage to actual IoT devices being hacked to mine cryptocurrencies. But this article is actually about IoT: it seems the last few years have given us the perfect momentum between the three main ingredients for its popularity: the exponential increase in network-capable devices, easily available processing power and, finally, the hungry-for-data dynamic we see from most parties nowadays.

For someone working as a Data Engineer (or in a related role), there isn’t much of a more end-to-end project than one which goes from setting up an actual not-so-intelligent device, to designing the methods to connect these devices to a network, and on to designing all the streaming/batch processing infrastructure needed to deal with all the data. It really is a huge challenge.

In the following lines, I’ll focus on how AWS IoT has come to help with the first two challenges, show an example of how to emulate an actual IoT device with the help of Docker, and walk through some of the nice and easy features AWS IoT offers, such as IoT Core, device management, topics and rules, as well as integration with other AWS services such as Kinesis, Firehose or Lambda.

Continue reading “Getting started with AWS IoT and a dockerized device”

AWS Server-less data pipelines with Terraform to Redshift – Part 1

This post is the first of a sequence of posts focusing on AWS options to set up pipelines in a serverless fashion. The topics that we cover throughout the series are:

In this post we lean towards another strategy for setting up data pipelines, namely event-triggered ones. That is, rather than being scheduled to execute with a given frequency, our traditional pipeline code is executed immediately, triggered by a given event. Our example consists of a demo scenario for immediately and automatically loading data that is stored in S3 into Redshift. Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 1”