Integrating IAM user/roles with EKS

To be completely honest, this article grew out of some troubleshooting frustration, so hopefully it will save others some headaches.

The scenario: after having configured an EKS cluster, I wanted to provide permissions for more IAM users. After creating a new IAM user that belonged to the intended IAM groups, the CLI threw the following errors:

kubectl get svc
error: the server doesn't have a resource type "svc"
kubectl get nodes
error: You must be logged in to the server (Unauthorized)


AWS profile config

First, configure your local AWS profile. This is also useful if you want to test with different users and roles.


# aws configure --profile <profile_name>
# for example:
aws configure --profile dev

If this is your first time, this will generate two files,

~/.aws/config and ~/.aws/credentials

Otherwise, it will simply append to them, which means that you can of course edit the files manually as well if you prefer. The way you alternate between these profiles in the CLI is:

# export AWS_PROFILE=<profile_name>
# for example:
export AWS_PROFILE=dev
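
For reference, the "dev" profile above maps to entries like the following in ~/.aws/config and ~/.aws/credentials (just a sketch; the region and keys are placeholders):

# ~/.aws/config
[profile dev]
region = <REGION>
output = json

# ~/.aws/credentials
[dev]
aws_access_key_id = <ACCESS_KEY_ID>
aws_secret_access_key = <SECRET_ACCESS_KEY>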

Now, before you move on to the next section, validate that you are referencing the correct user or role in your local AWS configuration:

# aws --profile <profile_name> sts get-caller-identity
# for example:
aws --profile dev sts get-caller-identity

{
    "Account": "REDACTED",
    "UserId": "REDACTED",
    "Arn": "arn:aws:iam:::user/john.doe"
}


Validate AWS permissions

Validate that your user has the correct permissions, namely that it is allowed to describe the target cluster:

# aws eks describe-cluster --name=<cluster_name>
# for example:
aws eks describe-cluster --name=eks-dev
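
If that call fails with an access denied error, the user (or one of its groups) is missing EKS read permissions. A minimal policy sketch, assuming you are happy to allow describing and listing all clusters (scope the Resource down to your cluster ARN if you prefer):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListClusters"
      ],
      "Resource": "*"
    }
  ]
}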

Add IAM users/roles to cluster config

If you managed to add worker nodes to your EKS cluster, then this part should already be familiar: as the AWS documentation describes, the mapping of IAM identities to the cluster is done through the aws-auth ConfigMap, which you apply with:

kubectl apply -f aws-auth-cm.yaml
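
For context, here is a sketch of what that ConfigMap can look like once you add an extra IAM user next to the worker node role. The ARNs and account ID are placeholders, and system:masters grants full cluster admin, so pick a more restricted group where appropriate:

# a sketch of aws-auth-cm.yaml; ARNs and account ID are placeholders
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  # mapRoles keeps the existing worker node role mapping
  mapRoles: |
    - rolearn: arn:aws:iam::<ACCOUNT_ID>:role/<NODE_INSTANCE_ROLE>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  # mapUsers is where the extra IAM user is added
  mapUsers: |
    - userarn: arn:aws:iam::<ACCOUNT_ID>:user/john.doe
      username: john.doe
      groups:
        - system:masters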

While troubleshooting, I saw some people trying to use the cluster's role in the "-r" flag. However, you cannot assume the role used by the cluster, as this is a role reserved/trusted for the instances. You need to create your own role, add a trusted entity (the account root or the specific user), and add permission for the user/group to assume it, for example with a trust policy such as the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com",
        "AWS": "arn:aws:iam:::user/john.doe"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
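
The trust policy above is only half of the story: the user (or a group the user belongs to) also needs an identity policy allowing it to assume the new role. A sketch, assuming the role is called eks-access (a hypothetical name used throughout these examples) and with the account ID as a placeholder:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::<ACCOUNT_ID>:role/eks-access"
    }
  ]
}

Remember that such a role then also has to be mapped in the mapRoles section of the aws-auth ConfigMap, just like the user in the earlier example.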


Kubernetes local config

Then, generate a new kube configuration file. Note that the following command will create a new file in ~/.kube/config:

aws --profile=dev eks update-kubeconfig --name eks-dev

AWS suggests isolating your configuration in a file named "config-<cluster_name>". So, assuming our cluster name is "eks-dev", then:

export KUBECONFIG=~/.kube/config-eks-dev
aws --profile=dev eks update-kubeconfig --name eks-dev


This will then create the config file in ~/.kube/config-eks-dev rather than ~/.kube/config.
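
As a side note, if you went down the route of a dedicated role as described above, update-kubeconfig can bake the role into the generated file via its --role-arn flag (eks-access is again the hypothetical role name from the earlier example):

aws --profile=dev eks update-kubeconfig --name eks-dev --role-arn arn:aws:iam::<ACCOUNT_ID>:role/eks-access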

As described in the AWS documentation, your kube configuration should look something like the following:
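
(This is a trimmed sketch of what update-kubeconfig generates; the endpoint, certificate data, region and account ID are placeholders, and newer CLI versions generate an "aws eks get-token" exec section instead of aws-iam-authenticator.)

apiVersion: v1
kind: Config
preferences: {}
clusters:
- name: arn:aws:eks:<REGION>:<ACCOUNT_ID>:cluster/eks-dev
  cluster:
    server: <CLUSTER_ENDPOINT_URL>
    certificate-authority-data: <CERTIFICATE_DATA>
contexts:
- name: arn:aws:eks:<REGION>:<ACCOUNT_ID>:cluster/eks-dev
  context:
    cluster: arn:aws:eks:<REGION>:<ACCOUNT_ID>:cluster/eks-dev
    user: arn:aws:eks:<REGION>:<ACCOUNT_ID>:cluster/eks-dev
current-context: arn:aws:eks:<REGION>:<ACCOUNT_ID>:cluster/eks-dev
users:
- name: arn:aws:eks:<REGION>:<ACCOUNT_ID>:cluster/eks-dev
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
        - token
        - -i
        - eks-dev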

If you want to make sure you are using the correct configuration:

export KUBECONFIG=~/.kube/config-eks-dev
kubectl config current-context

This will print whatever alias you gave to the context in the config file.

Last but not least, update the new config file and add the profile you used.
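
In practice this means adding an env entry to the exec section of the user in the kubeconfig sketch above, along these lines (assuming the "dev" profile from before):

    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
        - token
        - -i
        - eks-dev
      env:
        # make the authenticator use the same AWS profile as the CLI
        - name: AWS_PROFILE
          value: dev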

The last step is to confirm you have permissions:

export KUBECONFIG=~/.kube/config-eks-dev
kubectl auth can-i get pods
# Ideally you get "yes" as the answer.
kubectl get svc


Troubleshooting

To make sure you are not working in an environment with hidden environment variables that you are not aware of and that may conflict, unset them as follows:

unset AWS_ACCESS_KEY_ID && unset AWS_SECRET_ACCESS_KEY && unset AWS_SESSION_TOKEN

Also, if you are getting an error like the following:

could not get token: AccessDenied: User arn:aws:iam:::user/john.doe is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam:::user/john.doe

Then it means you are passing a user ARN to the -r flag in your kube config file. That flag should only be used with role ARNs.
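
For completeness, a correct use of -r in the exec args of the kube config would reference a role ARN, for example (eks-access again being the hypothetical role from earlier; the account ID is a placeholder):

      args:
        - token
        - -i
        - eks-dev
        - -r
        - arn:aws:iam::<ACCOUNT_ID>:role/eks-access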

Hopefully this short article was enough to unblock you, but in case it wasn't, here is a collection of further potentially useful articles:

Getting started with AWS IoT and a dockerized device

IoT isn’t a new term and has actually been one of the buzzwords of the 21st century, right alongside cryptocurrency. The two of them are seen hand in hand through the good and the bad, from interesting crypto projects focused on IoT usage to actual IoT devices being hacked to mine cryptocurrencies. But this article is actually about IoT: it seems the last few years have given us the perfect momentum between three main ingredients for its popularity: the exponential increase of network-capable devices, the easily available processing power and, finally, the hungry-for-data dynamic we see from most parties nowadays.

For someone working as a Data Engineer (or in a related role), there isn’t much more of an end-to-end project than one which goes from setting up an actual not-so-intelligent device, to designing the methods to connect these devices to a network, to designing all the streaming/batch processing infrastructure needed to deal with all the data. It really is a huge challenge.

In the following lines, I’ll focus on how AWS IoT has come to help with the first and second challenges, show an example of how to emulate an actual IoT device with the help of Docker, and walk through some of the nice and easy features AWS IoT offers, such as IoT Core, management, topics and rules, as well as integration with other AWS services such as Kinesis, Firehose or Lambda.


Build a Data Pipeline with AWS Athena and Airflow (part 2)

After learning the basics of Athena in Part 1 and understanding the fundamentals of Airflow, you should now be ready to integrate this knowledge into a continuous data pipeline.

The idea is for it to run on a daily schedule, checking if there’s any new CSV file in a folder-like structure matching the day for which the task is running. For example, if the task is running for 2010-01-31, then it will check if there is any file in s3://data/year=2010/month=01/day=31/*. If it finds a file there, it will add the “folder” as a partition to Athena so we can keep querying it.

Remind me again: why Athena?

At this point, if you are still wondering why Athena is so useful when you already have a pipeline in place to dump data somewhere (maybe a DB?), well, remember that Athena is a “pay as you go” solution that will scale automatically for the queries you are running. The underlying costs are only associated with the S3 file hosting itself plus the execution of queries. Such queries, when combined with the Hive Metastore, provide a fast solution for querying heavy loads of data stored in several different types of files in an S3 bucket. On the other hand, provisioning a database for dumping data will have fixed costs such as processing power, memory and storage, which will surpass the former in case you are not using/needing the full-blown features of a proper database engine.

Before proceeding, there are three important assumptions, which are detailed in the full post.

Build a Data Pipeline with AWS Athena and Airflow (part 1)

In this post, I build on the knowledge shared in the post about creating Data Pipelines on Airflow and introduce new technologies that help in the Extraction part of the process with cost and performance in mind. I’ll go through the options available and then introduce a specific solution using AWS Athena. First we’ll establish the dataset and organize our data in S3 buckets. Afterwards, you’ll learn how to make this information queryable through AWS Athena, while making sure it is updated daily.

Data dumps of not-so-structured data are a common byproduct of Data Pipelines that include extraction. This happens by design: business-wise and as Data Engineers, it’s never too much data. From an investment standpoint, object-relational database systems can become increasingly costly to keep, especially if we aim at keeping performance while the data grows.

Having said this, this is not a new problem. Both Apache and Facebook have developed open source software that is extremely efficient in dealing with extreme amounts of data. While such software is written in Java, it maintains an abstracted interface to the data that relies on traditional SQL to query data stored on filesystem storage, such as S3 in our example, in a wide range of different formats from HDFS to CSV.

Today we have many options to tackle this problem, and I’m going to go through how to approach it in today’s serverless world with AWS Athena. For this we need to quickly rewind back in time and go through the technology involved.
