Integrating IAM user/roles with EKS

To be completely honest, this article grew out of some troubleshooting frustration, so hopefully it will save others some headaches.

The scenario: after having configured an EKS cluster, I wanted to grant permissions to more IAM users. After creating a new IAM user which belonged to the intended IAM groups, the following errors were thrown in the CLI:

kubectl get svc
error: the server doesn't have a resource type "svc"
kubectl get nodes
error: You must be logged in to the server (Unauthorized)


AWS profile config

First, configure your local AWS profile. This is also useful if you want to test different users and roles.


# aws configure --profile <profile-name>
# for example:
aws configure --profile dev

If this is your first time, this will generate two files:

~/.aws/config and ~/.aws/credentials

Subsequent runs will simply append to them, which means you can also edit the files manually if you prefer. The way you alternate between these profiles in the CLI is:

# export AWS_PROFILE=<profile-name>
# for example:
export AWS_PROFILE=dev

Now, before you move on to the next section, validate that you are referencing the correct user or role in your local AWS configuration:

# aws --profile <profile-name> sts get-caller-identity
# for example:
aws --profile dev sts get-caller-identity

{
    "Account": "REDACTED",
    "UserId": "REDACTED",
    "Arn": "arn:aws:iam:::user/john.doe"
}


Validate AWS permissions

Validate that your user has the correct permissions; at a minimum, it must be able to describe the target EKS cluster:

# aws eks describe-cluster --name=<cluster-name>
# for example:
aws eks describe-cluster --name=eks-dev
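
If that call fails with an authorization error, an IAM identity policy along these lines grants what is needed here (a minimal sketch; scope the Resource down to your cluster's ARN as appropriate):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:DescribeCluster",
        "eks:ListClusters"
      ],
      "Resource": "*"
    }
  ]
}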

Add IAM users/roles to cluster config

If you managed to add worker nodes to your EKS cluster, then this process should be familiar already: you have probably applied the aws-auth ConfigMap to allow the nodes to join the cluster, as the AWS documentation describes. The same ConfigMap is also used to map IAM users and roles to Kubernetes users and groups; after editing it, apply it with:

kubectl apply -f aws-auth-cm.yaml
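
For reference, a minimal sketch of that ConfigMap adding our IAM user could look as follows (account ID and role name are placeholders; note that system:masters grants full admin, so pick a more restricted group where appropriate):

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::<account-id>:role/<node-instance-role>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::<account-id>:user/john.doe
      username: john.doe
      groups:
        - system:masters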

While troubleshooting I saw some people trying to use the cluster's role in the “-r” flag (the role to assume, referenced later in your kube config). However, you cannot assume a role used by the cluster, as this role is reserved/trusted for instances. You need to create your own role, add the root account (or a specific user) as a trusted entity, and grant the user/group permission to assume it, for example as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com",
        "AWS": "arn:aws:iam:::user/john.doe"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
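
The trust policy above controls who the role trusts; the user (or group) additionally needs permission to assume the role, which a policy like the following provides (the role name is a placeholder):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::<account-id>:role/<eks-access-role>"
    }
  ]
}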


Kubernetes local config

Then, generate a new kube configuration file. Note that the following command will create the file in ~/.kube/config by default:

aws --profile=dev eks update-kubeconfig --name eks-dev

AWS suggests isolating your configuration in a file named “config-<cluster name>”. So, assuming our cluster name is “eks-dev”, then:

export KUBECONFIG=~/.kube/config-eks-dev
aws --profile=dev eks update-kubeconfig --name eks-dev


This will then create the config file in ~/.kube/config-eks-dev rather than in ~/.kube/config.

As described in AWS documentation, your kube configuration should be something similar to the following:
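
Something along these lines (endpoint and ARNs are placeholders; depending on your CLI version the exec command may be aws-iam-authenticator, as here, or aws eks get-token). The commented-out “-r” lines show where a role to assume would go, and the env section is where the profile comes in:

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <REDACTED>
    server: https://<endpoint>.<region>.eks.amazonaws.com
  name: arn:aws:eks:<region>:<account-id>:cluster/eks-dev
contexts:
- context:
    cluster: arn:aws:eks:<region>:<account-id>:cluster/eks-dev
    user: arn:aws:eks:<region>:<account-id>:cluster/eks-dev
  name: arn:aws:eks:<region>:<account-id>:cluster/eks-dev
current-context: arn:aws:eks:<region>:<account-id>:cluster/eks-dev
kind: Config
preferences: {}
users:
- name: arn:aws:eks:<region>:<account-id>:cluster/eks-dev
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
        - token
        - -i
        - eks-dev
        # - -r
        # - arn:aws:iam::<account-id>:role/<role-to-assume>
      env:
        - name: AWS_PROFILE
          value: dev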

If you want to make sure you are using the correct configuration:

export KUBECONFIG=~/.kube/config-eks-dev
kubectl config current-context

This will print the alias (context name) defined in the config file.

Last but not least, update the new config file to set the AWS profile to be used (the env section in the example above).

The final step is to confirm you have permissions:

export KUBECONFIG=~/.kube/config-eks-dev
kubectl auth can-i get pods
# Ideally you get "yes" as the answer.
kubectl get svc


Troubleshooting

To make sure you are not working in an environment with hidden environment variables that you are unaware of and that may conflict, unset them as follows:

unset AWS_ACCESS_KEY_ID && unset AWS_SECRET_ACCESS_KEY && unset AWS_SESSION_TOKEN

Also, if you are getting an error like the following:

could not get token: AccessDenied: User arn:aws:iam:::user/john.doe is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam:::user/john.doe

then it means you are specifying the “-r” flag in your kube config file; this flag should only be used for roles.

Hopefully this short article was enough to unblock you, but in case not, here is a collection of further potentially useful articles:

Getting Started with Spark (part 4) – Unit Testing

Alright, quite a while ago (counting years already), I published a tutorial series focused on helping people get started with Spark. Here is an outline of the previous posts:

In the meantime Spark's popularity has not decreased, so I thought I would continue the same series. In this post we cover an essential part of any ETL project, namely unit testing.

For that I created a sample repository, which is meant to serve as boilerplate code for any new Python Spark project.
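
To give a flavour, here is a minimal sketch of a pytest-based Spark test (my own illustration of the approach, not necessarily the repository's exact code):

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local SparkSession shared across the whole test session
    session = (
        SparkSession.builder
        .master("local[2]")
        .appName("unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()

def test_filter_adults(spark):
    df = spark.createDataFrame([("alice", 17), ("bob", 30)], ["name", "age"])
    adults = df.filter(df.age >= 18)
    assert adults.count() == 1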

Continue reading “Getting Started with Spark (part 4) – Unit Testing”

Decrypting correctly parameters from AWS SSM

Today's is yet another short one, but ideally it will save a whole lot of headaches for some people.

Scenario: You have stored the contents of a string using the AWS SSM Parameter Store (side note: if you are not using it yet, you should definitely have a look), but when retrieving it decrypted via the CLI, you notice that the string has newlines (‘\n’) substituted by spaces (‘ ‘).

In my case, I was storing a private SSH key encrypted to integrate with some Ansible scripts triggered via AWS CodePipeline + CodeBuild. CodeBuild makes it really easy to access secrets stored in the SSM store; however it was retrieving my key incorrectly, which in turn domino-crashed my Ansible scripts.

Here you can also confirm that more people are facing this issue. After following the suggestion of using the AWS SDK – in my case with Python boto3 – it finally worked. So here is a gist to overwrite an AWS SSM parameter and then retrieve it back:
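
The gist boils down to something like the following sketch (parameter name, region and key file are hypothetical):

import boto3

# Hypothetical parameter name and region, for illustration only
PARAM_NAME = "/dev/ansible/ssh-private-key"
ssm = boto3.client("ssm", region_name="eu-west-1")

# Overwrite (or create) the parameter as an encrypted SecureString
with open("id_rsa") as f:
    ssm.put_parameter(
        Name=PARAM_NAME,
        Value=f.read(),
        Type="SecureString",
        Overwrite=True,
    )

# Retrieve it back decrypted; this way the newlines come back intact
response = ssm.get_parameter(Name=PARAM_NAME, WithDecryption=True)
print(response["Parameter"]["Value"])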

Hope this helps!

Container orchestration in AWS: comparing ECS, Fargate and EKS

Before rushing into the new cool kid, namely AWS EKS, AWS's hosted Kubernetes offering, you might want to understand how it works underneath and how it compares to the already existing offerings. In this post we focus on distinguishing between the different AWS container orchestration solutions out there, namely AWS ECS, Fargate, and EKS, as well as comparing their pros and cons.

Introduction

Before we dive into comparisons, let us summarize what each product is about.

ECS was the first container orchestration offering from AWS. It essentially consists of EC2 instances which have Docker already installed and run a Docker container which talks with the AWS backend. Via the ECS service you can launch either tasks – unmonitored containers usually suited for short-lived operations – or services – containers which AWS monitors and guarantees to restart if they go down for any reason. Compared to Kubernetes it is much simpler, which has both advantages and disadvantages.

Fargate was the second service offering to come, and is intended to abstract everything below the container (the EC2 instances where it runs) away from you. In other words, a pure Container-as-a-Service, where you do not care where that container runs. Fargate followed two core technical advancements made in ECS: the possibility to assign an ENI directly and dedicated to a container, and the integration of IAM at the container level. We will get into more detail on this later.

The following image, sourced from the AWS blog here, illustrates the difference between ECS and Fargate services.

[Image: ECS vs. Fargate service architecture]

EKS is the latest offering, and is still only available in some regions. With EKS you can abstract away some of the complexities of launching a Kubernetes cluster, since AWS will now manage the master nodes – the control plane. Kubernetes is a much richer container orchestrator, providing features such as network overlays, allowing you to isolate container communication, and storage provisioning. Needless to say, it is also much more complex to manage, requiring a bigger investment in terms of DevOps effort.

As with any Kubernetes cluster, you can use kubectl to communicate with an EKS cluster; you will just need to configure the AWS IAM authenticator locally to authenticate against it.

Continue reading “Container orchestration in AWS: comparing ECS, Fargate and EKS”

Versioning in data projects

Reproducibility is a pillar in science, and version control via git has been a blessing to it. For pure software engineering it works perfectly. However, machine learning projects are not only about code, but also about the data. The same model trained with two distinct data sets can produce completely different results.

So it comes as no surprise when I stumble upon CSV files in the git repos of data teams, as they struggle to keep track of code and metadata. However, this cannot be done for today's enormous datasets. I have seen several hacks to solve this problem, none of them bulletproof. This post is not about those hacks, but rather about an open source solution for the problem: DVC.

[Image: DVC flow diagram]

Let us exemplify by using a Kaggle challenge: predicting house prices with Advanced Regression Techniques.
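
A first DVC workflow then looks roughly like this (bucket and file names are hypothetical):

# Inside an existing git repository
dvc init
dvc add data/train.csv          # creates the data/train.csv.dvc metafile
git add data/train.csv.dvc .gitignore
git commit -m "Track training data with DVC"

# Push the actual data to remote storage; git only tracks the metafile
dvc remote add -d s3remote s3://my-dvc-bucket/house-prices
dvc push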

Continue reading “Versioning in data projects”

AWS Server-less data pipelines with Terraform to Redshift – Part 2

Alright, it’s time for the second post of our sequence focusing on AWS options to set up pipelines in a server-less fashion. The topics that we are covering throughout this series are:

In this post we complement the previous one by providing infrastructure-as-code with Terraform for deployment purposes. We are strong believers in applying a DevOps approach to Data Engineering as well (also known as “DataOps”), and thus we thought it would make perfect sense to share a sample Terraform module along with the Python code.

To recap, so far we have Python code that, when triggered by an AWS event on a new S3 object, will connect to Redshift and issue a SQL COPY command to load that data into a given table. Next we are going to show how to configure this with Terraform code.
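
Conceptually, the Python side boils down to something like the following sketch (connection details, table name, and IAM role ARN are hypothetical placeholders; see the repository linked below for the real code):

import psycopg2

def lambda_handler(event, context):
    # Extract the bucket and key of the newly created S3 object
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    # Connection details are hypothetical; in practice fetch them from
    # environment variables or a secret store
    conn = psycopg2.connect(
        host="my-cluster.abc123.eu-west-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="loader", password="...",
    )
    with conn, conn.cursor() as cur:
        cur.execute(
            f"COPY staging.events FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
            "FORMAT AS JSON 'auto';"
        )
    conn.close()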

As usual, all the code for this post is available publicly in this GitHub repository. In case you haven't yet, you will need to install Terraform in order to follow along with this post.
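
As for the Terraform side, the core wiring is roughly the following sketch (resource and bucket names are placeholders, not the actual module):

# An S3 bucket whose new objects trigger the loader Lambda
resource "aws_s3_bucket" "data" {
  bucket = "my-data-landing-bucket"
}

resource "aws_iam_role" "lambda_exec" {
  name = "s3-to-redshift-lambda"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_lambda_function" "s3_to_redshift" {
  function_name = "s3-to-redshift-loader"
  handler       = "lambda_function.lambda_handler"
  runtime       = "python3.6"
  filename      = "lambda.zip"
  role          = aws_iam_role.lambda_exec.arn
}

# Allow S3 to invoke the function...
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowExecutionFromS3"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.s3_to_redshift.function_name
  principal     = "s3.amazonaws.com"
  source_arn    = aws_s3_bucket.data.arn
}

# ...and fire it on every new object
resource "aws_s3_bucket_notification" "new_object" {
  bucket = aws_s3_bucket.data.id
  lambda_function {
    lambda_function_arn = aws_lambda_function.s3_to_redshift.arn
    events              = ["s3:ObjectCreated:*"]
  }
}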

Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 2”

AWS Server-less data pipelines with Terraform to Redshift – Part 1

This post is the first of a sequence of posts focusing on AWS options to set up pipelines in a server-less fashion. The topics that we will cover throughout the series are:

In this post we lean towards another strategy to set up data pipelines, namely event-triggered ones. That is, rather than being scheduled to execute with a given frequency, our traditional pipeline code is executed immediately, triggered by a given event. Our example consists of a demo scenario for immediately and automatically loading data that is stored in S3 into Redshift.

Continue reading “AWS Server-less data pipelines with Terraform to Redshift – Part 1”