Build a Data Pipeline with AWS Athena and Airflow (part 1)

In this post, I build on the knowledge shared in the post on creating Data Pipelines with Airflow and introduce new technologies that help with the Extraction part of the process, with cost and performance in mind. I’ll go through the available options and then introduce a specific solution using AWS Athena. First, we’ll establish the dataset and organize our data in S3 buckets. Afterwards, you’ll learn how to make this data queryable through AWS Athena while making sure it is updated daily.
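One common way to organize daily data in S3 so that Athena can query it incrementally is a Hive-style partition layout, where each day's files land under a `dt=YYYY-MM-DD/` prefix. The sketch below is illustrative only and is not taken from the rest of this series. It builds such object keys and the matching Athena `ALTER TABLE ... ADD PARTITION` statement. It assumes a table already declared with `PARTITIONED BY (dt string)`. The prefix `raw/events`, the bucket `my-bucket` and the table `analytics.events` are all hypothetical.

```python
from datetime import date


def partition_prefix(base_prefix: str, day: date) -> str:
    """Hive-style daily prefix, e.g. 'raw/events/dt=2019-01-01/' (layout is an assumption)."""
    return f"{base_prefix.rstrip('/')}/dt={day.isoformat()}/"


def object_key(base_prefix: str, day: date, filename: str) -> str:
    """S3 key under which a daily dump file would be uploaded."""
    return partition_prefix(base_prefix, day) + filename


def add_partition_sql(table: str, bucket: str, base_prefix: str, day: date) -> str:
    """Athena DDL that registers one day's prefix as a partition of `table`."""
    location = f"s3://{bucket}/{partition_prefix(base_prefix, day)}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{day.isoformat()}') LOCATION '{location}'"
    )


if __name__ == "__main__":
    day = date(2019, 1, 1)
    print(object_key("raw/events", day, "dump.csv"))
    print(add_partition_sql("analytics.events", "my-bucket", "raw/events", day))
```

A daily job, such as an Airflow task, could upload the day's dump to `object_key(...)` and then run the statement from `add_partition_sql(...)`. That way, queries filtering on `dt` only scan the matching prefix.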

Dump files of not-so-structured data are a common byproduct of Data Pipelines that include extraction. This happens by design: business-wise, and for us as Data Engineers, there is never too much data. From an investment standpoint, however, object-relational database systems can become increasingly costly to maintain, especially if we aim to keep performance up while the data grows.

That said, this is not a new problem. Both Apache and Facebook have developed open source software that is extremely efficient at dealing with very large amounts of data. While this software is written in Java, it exposes an abstracted interface that relies on traditional SQL to query data kept on file-based storage, such as HDFS or, in our example, S3. That data can be stored in a wide range of formats, such as CSV.
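To make "SQL over files in S3" concrete, here is a minimal sketch of how it looks with Athena. Athena's query engine is based on Presto, and its tables are declared with Hive-style DDL. The first function builds a `CREATE EXTERNAL TABLE` statement over CSV files. The second submits a statement through boto3's `start_query_execution`. The table name, columns, bucket, database, results location and the `dt` partition column are hypothetical placeholders. The sketch also assumes AWS credentials and a region are already configured for boto3.

```python
def create_table_sql(table, columns, bucket, base_prefix):
    """Hive-style DDL for CSV files under s3://bucket/base_prefix/ (names are hypothetical)."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "PARTITIONED BY (dt string)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION 's3://{bucket}/{base_prefix.rstrip('/')}/'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def run_athena_query(sql, database, output_location):
    """Submit a statement to Athena and return its execution id (requires AWS access)."""
    import boto3  # imported here so the DDL helper above works without boto3 installed

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    ddl = create_table_sql(
        "analytics.events",
        [("id", "string"), ("payload", "string")],
        "my-bucket",
        "raw/events",
    )
    print(ddl)
    # run_athena_query(ddl, "analytics", "s3://my-bucket/athena-results/")
```

No data is loaded or copied when the table is created. Athena reads the CSV files in place at query time and bills by the amount of data scanned, which is why the layout of the files in S3 matters for cost.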

Today we have many options to tackle this problem, and I’m going to show how to approach it in today’s serverless world with AWS Athena. For this, we need to quickly rewind and go through the technology…

Cota – Wireless Power baby!

I haven’t had the opportunity to return to blogging as much as I wanted. Still, whatever your tech inclination inside the DC, whether you’re Cisco/Dell/IBM/HP/VMware/… oriented, I think you’ll find this technology a cool one. Yes, I can’t resist jumping out of the Datacenter to share an exciting startup: Cota.

These guys want to bring to market a power charger that works via… Wi-Fi. Yes, you read that right. We’re talking about using spectrum similar to current Wi-Fi (2.4 GHz or 5 GHz) to charge your mobile devices. And yes, no line of sight is needed. Just as a Wi-Fi signal can cross walls, so can this, though it likewise suffers from interference, which reduces your charging power.

Can’t help but wonder: if Pebble got $10M on Kickstarter, how much would this get? (And I do buy into the whole wearable tech concept!)

 

Cheers