Airflow: create and manage Data Pipelines easily

This bootstrap guide was originally published at GoSmarten but as the use cases continue to increase, it’s a good idea to share it here as well.

What is Airflow

The need to perform operations or tasks, either simple and isolated or complex and sequential, is present in all things data nowadays. If you or your team work with lots of data on a daily basis there is a good chance you’re struggled with the need to implement some sort of pipeline to structure these routines. To make this process more efficient, Airbnb developed an internal project conveniently called Airflow which was later fostered under the Apache Incubator program. In 2015 Airbnb open-sourced the code to the community and, albeit its trustworthy origin played a role in its popularity, there are many other reasons why it became widely adopted (in the engineering community). It allows for tasks to be set up purely in Python (or as Bash commands/scripts).

What you’ll find in this tutorial

Not only we will walk you through setting up Airflow locally, but you’ll do so using Docker, which will optimize the conditions to learn locally while minimizing transition efforts into production. Docker is a subject for itself and we could dedicate a few dozen posts to, however, it is also simple enough and has all the magic needed to show off some of its advantages combining it with Airflow. Continue reading “Airflow: create and manage Data Pipelines easily”

Summary Terraform vs Spinakker

I recently needed to build this summary, so thought I’d rather share with more people as well. Please feel free to add any points you see fitting.

TL;DR

Rather then putting on versus another assuming mutual exclusivity, many companies are adopting both tools simultaneously.Terraform is usually used for static cloud Infrastructure setup and updates, such as networks/VLANs, Firewalls, Load Balancers, storage buckets, etc. Spinakker is used for setting up more complex deployment pipelines, mainly orchestration of software packages and application code to setup on servers.Though there is intersection (Spinakker can also deploy App environment), Terraform provides an easy and clean way to setup Infrastructure-as-Code. Continue reading “Summary Terraform vs Spinakker”