Get AppScaled ECS Tasks served by AWS Network Load Balancer

This article is intended as a quick-and-dirty snippet for anyone going through the struggle of getting an ECS service, which might run one or more containers with the same app (as part of an Auto Scaling Group), served by a Network Load Balancer (instead of the more common ELB or ALB).

ECS Service/Task Definition

Another particularity of this implementation is that I also decided to set the ECS task's network mode to awsvpc. In case you are not acquainted with this new option, this means that:

  • Your task (and therefore its containers) gets its own network interface and its own IP address;
  • The host port and the container port need to be the same, since there is no middleware mapping ports between the two.

The cherry on top is that the ECS service now has the option of automatically registering and deregistering LB targets by their IP address, which fits perfectly with the intention described above.

Network Load Balancer

This post isn't really about the technical details of what a Network Load Balancer is, but about the caveats of using one in this scenario: because NLB is a layer 4 load balancer, you won't be able (at the time of writing) to define Security Groups at the NLB level. Instead, you'll have to make sure your tasks/containers are secure by attaching the security groups to them – remember that with the awsvpc network mode, each task gets its own NIC.
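As a hedged sketch of the setup described above (not the author's original snippet), the functions below only build the request parameters for the three pieces involved: an awsvpc task definition, an NLB target group of type "ip", and an ECS service that registers task IPs in that target group. All names, ports, subnets and security groups are hypothetical placeholders; the dict keys follow the boto3 request shapes for `register_task_definition`, `create_target_group` and `create_service`.

```python
# Hedged sketch: request parameters for an awsvpc ECS service behind an NLB.
# Assumption: every name, port, subnet and security group is a hypothetical
# placeholder; keys follow the boto3 shapes of ecs.register_task_definition,
# elbv2.create_target_group and ecs.create_service.


def task_definition_params(family, container, image, port, host_port=None):
    """Task definition using the awsvpc network mode."""
    host_port = port if host_port is None else host_port
    if host_port != port:
        # awsvpc: no port translation between the task ENI and the container
        raise ValueError("in awsvpc mode hostPort must equal containerPort")
    return {
        "family": family,
        "networkMode": "awsvpc",
        "containerDefinitions": [{
            "name": container,
            "image": image,
            "memory": 512,  # hypothetical sizing
            "essential": True,
            "portMappings": [
                {"containerPort": port, "hostPort": host_port, "protocol": "tcp"}
            ],
        }],
    }


def target_group_params(name, vpc_id, port):
    """NLB target group (TCP) whose targets are task IPs, not instances."""
    return {"Name": name, "Protocol": "TCP", "Port": port,
            "VpcId": vpc_id, "TargetType": "ip"}


def service_params(cluster, service, task_def, target_group_arn, container,
                   port, subnets, security_groups, desired_count=2):
    """Service that lets ECS register/deregister task IPs in the target group."""
    return {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_def,
        "desiredCount": desired_count,
        "networkConfiguration": {"awsvpcConfiguration": {
            "subnets": subnets,
            # Security groups attach to each task's ENI; the NLB has none here.
            "securityGroups": security_groups,
            "assignPublicIp": "DISABLED",
        }},
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": container,
            "containerPort": port,
        }],
    }
```

With boto3, each dict would be unpacked into the matching client call, e.g. `boto3.client("ecs").create_service(**service_params(...))`. An NLB listener forwarding to the target group is still required and is not covered by this sketch.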


As for the actual code snippet to support what I'm trying to achieve, it continues in the full version of this post.

Resilience measures with HP IRF: ISSU, GR and MAD (Part I)

In an effort to come down to earth and cover a topic that can be useful for the majority of today's enterprises that have HP gear, I will cover resilience features one can/should use in an HP Networking environment along with IRF.

Though I don't argue that this is a Best Practice for all cases, clustering HP (formerly 3Com) switches with IRF can be a great solution to a lot of problems. How? Basically by using a simple yet effective ingredient that speaks for itself: drastic reduction of topology complexity. Aggregating devices to function as one can be an early Christmas in many cases: you get to do so many more LAGs (preferably MLAGs), your Spanning Tree is much simpler (I'm not that big a fan of the HPN marketing papers issuing STP's death certificate because of IRF; you gotta account for human error!), your link-state DB gets much simpler, and you have fewer devices to configure and manage (since you get centralized control & management planes).

Creating a Non-stop network environment
I'm not going to focus on HP's claims about convergence time. That should be left for a demo by the HP guys. What I want to focus on are the software features that can be used along with IRF to create an even more resilient network, and which require manual configuration.
These features are In-Service Software Upgrade (ISSU) – so yes, you get this bonus feature on standalone switches (where it is not natively present) when you set up an IRF cluster – Graceful Restart (GR), and Multi-Active Detection (MAD), commonly known as split-brain detection.

First, you have to take into consideration that IRF virtual clustering of standalone switches works exactly the way chassis-based switches work: the master MPU controls the management & control planes and synchronizes in real time with the standby MPU, while the forwarding plane is active on all LPUs. The difference is that with standalone switches, one of the switches acts as the master and all the other members act as standby “MPUs”.
So when you start a software upgrade on an IRF stack, the first members to be upgraded are the standby nodes. After this job is completed, one of the standby nodes is elected as the new master, and failover occurs. While the newly elected node acts as master, the former master is upgraded. After this is done, a preemption-like behavior occurs, and the former master is reelected to the master role.
Note that the virtual cluster runs with a virtual bridge MAC address, so the L2 destination remains the same, and the only changes that might occur are in link forwarding inside MLAGs. This should be negligible if you have a solid config. Note also that the routing process (if any) running on the master will not be restarted during the service upgrade.
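To make the upgrade order described above easier to follow, here is a toy model of it. It is only an illustration of the sequence as described in this post, not HP's implementation; in particular, the election rule (first standby in the list becomes the interim master) is a hypothetical simplification.

```python
# Toy model of the ISSU order described above (not HP's code).
# Assumption: the "election" simply picks the first standby member.

def issu_sequence(members, master):
    """Return the ordered list of (action, member) events for an IRF ISSU."""
    standbys = [m for m in members if m != master]
    if not standbys:
        raise ValueError("ISSU needs at least one standby member")
    events = []
    # 1. Standby members are upgraded while the master keeps running.
    for m in standbys:
        events.append(("upgrade", m))
    # 2. A standby is elected master (hypothetical rule: first one wins).
    interim = standbys[0]
    events.append(("failover", interim))
    # 3. The former master is upgraded while the interim master runs the stack.
    events.append(("upgrade", master))
    # 4. Preemption-like behavior returns the master role to the former master.
    events.append(("preempt", master))
    return events
```

The point the model captures is that at no step is the node currently holding the master role the one being upgraded.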


Comment Relative To former post “HP MSM Controllers Initial Setup Considerations”

This post is an answer to a comment on the former post “HP MSM Controllers Initial Setup Considerations”. (I wanted to add some drawings to make the answer clearer, so I ended up writing a separate post.) Thank you for your comment. Please let me know if I understood it well, or got it all wrong.

George P Isaac’s comment was:

As per my understanding..In option 2 we have to do following config.

1.We should tag particular VLAN in either internet port or access port
2.we should assign IP address to particular VLAN and gateway..
3.Nothing is required in VSC mapping in AP group.

then Where will I specify VLAN mapping..??

“extending the ingress interface to the egress interface “– is this option used when internet port and tunneled network in same VLAN??

OK, to be fair, my post could have been clearer (probably because I'm not an expert on HP's MSM solution). I took the following figure from an MSM Controller Config manual (section 4-30), which in my opinion summarizes pretty well what the options are for access-controlled traffic.

Access-controlled Flow of traffic

So let me recap: “Option 2” should actually have been Option 2 A) and B). Essentially these options are:

2. A) Having access-controlled clients do Web Auth on the HP MSM Controller, and then being sent straight to the default gateway of a different interface that bypasses the corporate network. In summary, what the egress VLAN does is define a new default gateway (for instance, the routing device connected to the Internet) for the controller to use specifically for the clients assigned to that VLAN. The reasoning for this option might be that the network admin is worried that routing through the corporate network may allow clients to access corporate resources, that the controller's default gateway might not have the appropriate ACLs for handling client traffic, or simply that you prefer to save that router's CPU. In any case, the main goal is to bypass the corporate network.

In this option, clients still have an IP address in a subnet different from the egress VLAN subnet. This is why you should enable NAT on that interface (to keep things simple): otherwise clients are placed on a subnet to which the gateway has no route. Alternatively, you can configure a route on that gateway to the client subnet.
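The reasoning above boils down to a reachability check: if the client subnet is neither inside the egress subnet nor covered by a route on the egress gateway, return traffic has nowhere to go, so NAT (or a route) is needed. A minimal sketch using Python's standard `ipaddress` module; the function name and all addresses are hypothetical.

```python
from ipaddress import ip_network

# Hypothetical helper: does the egress gateway need NAT (or a new route)
# to reach access-controlled clients?

def needs_nat_or_route(client_subnet, egress_subnet, gateway_routes=()):
    client = ip_network(client_subnet)
    if client.subnet_of(ip_network(egress_subnet)):
        # Option 2 B): clients are addressed inside the egress subnet itself.
        return False
    # Option 2 A): reachable only if the gateway already has a covering route.
    return not any(client.subnet_of(ip_network(r)) for r in gateway_routes)
```

For example, clients in 10.10.0.0/24 behind an egress VLAN of 192.168.50.0/24 need NAT unless the gateway has a route such as 10.10.0.0/16 pointing back to the controller.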

2. B) Having access-controlled clients do Web Auth on the HP MSM Controller, and then being placed in a restricted VLAN while receiving IP addresses from the egress VLAN. Hence the term “extending the egress VLAN to clients”. It is sort of like non-access-controlled scenarios, except that here the controller is actually routing in the background, though clients can't notice it.

Please note that options 2 A) and B) have one essential thing in common: clients bypass corporate resources. However, in option B clients actually receive an IP address in the VLAN where they are placed, assigned either by a DHCP server residing on that VLAN or by the egress VLAN's gateway, which implements DHCP relay.

“then Where will I specify VLAN mapping..??”

So in both cases, you have two alternatives for specifying the VLAN mapping: at the VSC level or at the user level. At the VSC level, you simply take all authenticated users and forward them on the same VLAN. With user-based egress VLANs, you get more granularity by assigning customized VLAN IDs to user account profiles. Or you can combine both, in which case the user-based settings override the VSC-level egress VLAN definition.
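The precedence rule when both mappings are combined can be expressed in a couple of lines. This is a hypothetical illustration of the override behavior described above, not MSM controller code; the function name and VLAN IDs are made up.

```python
# Hypothetical illustration of egress VLAN precedence (not MSM code):
# a user-based egress VLAN, when defined, overrides the VSC-level one.

def egress_vlan(username, vsc_egress_vlan, user_egress_vlans):
    """vsc_egress_vlan: VLAN ID set on the VSC, or None if not configured.
    user_egress_vlans: dict mapping username -> VLAN ID from account profiles."""
    return user_egress_vlans.get(username, vsc_egress_vlan)
```

With `{"alice": 30}` as the user profiles and VLAN 20 on the VSC, alice lands on VLAN 30 while every other authenticated user lands on VLAN 20.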

However in option B – where you extend the Egress VLAN IP addressing to the clients – you have some additional settings to configure, which I specified in the previous post:

  • In the global DHCP relay settings, select the checkbox for extending the ingress interface to the egress interface. After this is enabled, you will no longer be able to specify the IP address and subnet mask settings on VSCs.
  • Disable NAT on the egress VLAN IP interface.
  • Set the MSM controller's IP address as the default gateway and DNS server (it will forward traffic and queries to the correct ones).

“extending the ingress interface to the egress interface “– is this option used when internet port and tunneled network in same VLAN??

Well, the egress VLAN (whether extended to the client or not) can be implemented on either port (LAN or Internet). I would rather say that this option makes sense when you want the “bypass the main corporate network” feature and the non-access-controlled-like behavior together. This solution might greatly simplify your manual setup when, for instance, you want certain clients to access some corporate resources: having the same subnet as those corporate resources might be advantageous for your setup.

Hope this helps. Cheers!

VMware NSX

It's VMworld showtime, which means awesome stuff being announced. VMware's overlay networking and SDN sagas are clearly just starting, and so VMware today announced its network virtualization solution: NSX.

I haven't been able to get hold of any of VMware's own technical material yet; however, from what I have understood so far, we are talking about the first true signs of Nicira's integration into the current VMware networking portfolio. VMware's overlay networking technology, VXLAN, has been integrated with Nicira's NVP platform. However, VXLAN is not mandatory: you get two additional encapsulation flavors to choose from besides VXLAN, namely GRE and STT.

As can be seen from the NSX datasheet, key features are similar to those of Nicira's NVP platform:

“• Logical Switching – Reproduce the complete L2 and L3 switching functionality in a virtual environment, decoupled from underlying hardware
• NSX Gateway – L2 gateway for seamless connection to physical workloads and legacy VLANs
• Logical Routing – Routing between logical switches, providing dynamic routing within different virtual networks.
• Logical Firewall – Distributed firewall, kernel enabled line rate performance, virtualization and identity aware, with activity monitoring
• Logical Load Balancer – Full featured load balancer with SSL termination.
• Logical VPN – Site-to-Site & Remote Access VPN in software
• NSX API – RESTful API for integration into any cloud management platform”

Ivan Pepelnjak gives a very interesting preview of what's under the hood. Though NSX is new, integration with networking partners is already being leveraged. Examples are Arista, Brocade, Cumulus, Dell, and Juniper on the pure networking side, and Palo Alto Networks on the security side.

Exciting times for the networking community!


I found last week's Packet Pushers podcast about Avaya's Software Defined Datacenter & Fabric Connect with Paul Unbehagen really interesting. He points out some of the differences between VMware's VXLAN overlay approach and the SPB overlay approach on physical routing switches (which Avaya naturally uses). Among these were the different encapsulation methods (VXLAN stacks considerably more headers) and the ability to support multicast environments (such as PIM); most importantly, he raises the central question: where do you want to control your routing and switching, in the virtual layer or in the physical layer? Even though I'm theoretically in favor of virtual, the arguments for keeping some functions in the physical layer still make a lot of sense in many scenarios.

The podcast also features a lot of interesting Avaya automation features, the result of a healthy promiscuous relationship between VMware and OpenStack. Also, if you want to go into more detail about SPB, Paul Unbehagen covers lots of technical details in his blog.

Overlay Virtual Networks – VXLAN

Overlay Virtual Networks (OVN) are gaining more and more attention, from virtual and physical networking providers alike. Here are my notes on a specific VMware solution: Virtual eXtensible LAN (VXLAN).


None of this would be an issue if it weren't for large/huge datacenters. We're talking about some big-ass companies as well as cloud providers: enterprises where the number of VMs scales beyond thousands. This is where the ability to scale the network, to change it rapidly, and to isolate different tenants becomes crucial. So the main motivators are:

  • First, the L2 communication requirement is an intransigent pusher, which drags the 4k 802.1Q VLAN-tagging limitation plus L2 flooding along with it
  • Ability to change configurations quickly without burning the rest of the network – e.g. easy isolation
  • Ability to change without being “physically” constrained by hardware limitations
  • Ability to scale to a large number of VMs, and to isolate different tenants
  • Unlimited workload mobility

These requirements demand a change in the architecture. They demand not being bound to physical hardware constraints and, as such, an abstraction layer run by software that can be mutable – in other words, a virtualization layer.

Network Virtualization

Professor Nick Feamster – who today ended his SDN MOOC on Coursera – goes further and describes network virtualization as the “killer app” for SDN. As a side note, here is an interesting comment from a student of this course.

Thus it is no surprise that hypervisor vendors were the first to push such technologies.

It is also no wonder that their approach was to treat the physical network as a dumb network, unaware of the virtual segmentation done within the hypervisor.

In conclusion, the main goal is really to move away from the dumb VLAN-aware L2 vSwitch toward a smart edge (internal VM host networking), without having to rely on smart datacenter fabrics (supporting, for instance, EVB).

Overall solutions

There is more than one vendor using an OVN approach to solve the problems stated above. VMware was probably one of the first hypervisor vendors to do so, with its vCloud Director Network Isolation (vCDNI): a MAC-in-MAC solution, i.e. L2 networks over L2. Unfortunately this wasn't a successful attempt, so VMware quickly changed its solution landscape. VMware currently has two network virtualization solutions, namely VXLAN and the more advanced Nicira NVP. Though I present both as OVN solutions, this is actually quite an abuse, as they are quite different from each other. In this post I will restrict myself to VXLAN.

As for Microsoft, shortly after VXLAN was introduced it proposed its own network virtualization solution called NVGRE. Finally, Amazon uses an L3 core with IP-over-IP communication.


Virtual eXtensible LAN (VXLAN) was developed jointly by Cisco and VMware and submitted to the IETF as a draft. Its encapsulation header is supposed to be similar to the one used by OTV/LISP on the Nexus 7k, allowing the Nexus 7k to act as a VXLAN gateway.

VXLAN introduces an additional kernel software layer between the ESX vSwitch (which can be either the VMware Distributed vSwitch or Cisco's Nexus 1000v) and the physical network card. This kernel code is able to introduce additional L2 virtual segments, beyond the 4k 802.1Q limitation, over standard IP networks. Note that these segments exist solely within the hypervisor, which means that for a physical server to communicate with these VMs you will need a VXLAN gateway.

So the VXLAN kernel module is aware of port groups on the VM side, where it introduces a VXLAN segment ID (VNI), and on the NIC side it introduces an adaptor for IP communication: the VXLAN Tunnel End Point (VTEP). The VTEP has an IP address and encapsulates the L2 traffic generated by a VM, adding a VXLAN header and a UDP header plus a traditional IP envelope to talk through the physical NIC. The receiving host where the destination VM resides performs the same process in reverse.
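The VXLAN header the VTEP inserts is small: 8 bytes carrying a flags byte (the "I" flag, 0x08, marks the VNI as valid) and a 24-bit VNI, with the remaining bits reserved. A minimal sketch of building and stripping that header with Python's `struct` module, assuming the RFC 7348 layout; the outer UDP/IP headers (UDP destination port 4789 is the IANA-assigned one) are left to the host's IP stack and are not built here.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port (RFC 7348); not used below
FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni):
    """8 bytes: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit field")
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame, vni):
    # The result is the UDP payload sent from the local VTEP to the remote one.
    return vxlan_header(vni) + inner_frame


def decapsulate(udp_payload):
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, udp_payload[8:]


# 802.1Q VLAN IDs are 12 bits wide, VNIs are 24 bits wide.
print(2 ** 12, 2 ** 24)  # 4096 16777216
```

The last line shows why VXLAN gets past the 4k 802.1Q limitation: the segment ID space grows from 4096 to roughly 16.7 million values.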

Also note that it transforms broadcast traffic into IP multicast traffic, with each segment associated with a multicast group.

It is thus a transparent layer between VMs and the network. However, since there is no centralized control plane, VXLAN used to require IP multicast in the DC core for L2 flooding, although this has changed with recent enhancements in Nexus OS.
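Without a control plane, each VTEP learns in classic flood-and-learn fashion: when it decapsulates a frame it remembers behind which remote VTEP IP the inner source MAC lives, and anything it does not know yet (or broadcast) goes to the segment's multicast group. A toy model of that per-VNI table; the class name, MACs and IP addresses are hypothetical.

```python
# Toy flood-and-learn table for one VXLAN segment (hypothetical class).

BROADCAST = "ff:ff:ff:ff:ff:ff"


class FloodAndLearnVtep:
    def __init__(self, vni, mcast_group):
        self.vni = vni
        self.mcast_group = mcast_group  # multicast group mapped to this segment
        self.mac_to_vtep = {}

    def learn(self, inner_src_mac, outer_src_ip):
        # On decapsulation: the VM MAC lives behind the sending VTEP.
        self.mac_to_vtep[inner_src_mac] = outer_src_ip

    def destination(self, inner_dst_mac):
        # Known unicast goes straight to the remote VTEP; broadcast and
        # unknown destinations are flooded to the segment's multicast group.
        if inner_dst_mac == BROADCAST:
            return self.mcast_group
        return self.mac_to_vtep.get(inner_dst_mac, self.mcast_group)
```

This flooding step is exactly what required IP multicast in the DC core.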

Here’s VMware’s VXLAN Deployment Guide, and Design Guide.

Finally please do note that not everyone is pleased with VXLAN solution.


SDN Playground – getting started with OpenFlow

Almost every big networking company has announced something related to SDN. Whether simple marketing or concrete, legit solutions, it's only a question of time until the market is filled with SDN-related products. It is thus essential to start getting familiar with it, and you know damn well there's nothing like getting your hands dirty. So here are some helper notes on getting started with sandboxing OpenFlow (OF) environments.

To do so I'm using Mininet, a VM created as part of an open-source project to emulate a complete environment with a switch, an OF controller, and even three Linux hosts. Also note I'm using my desktop as the host, with VirtualBox.

So what you’ll need:

  • If you don't have it yet, download VirtualBox or another desktop hypervisor such as VMware Player. VirtualBox has the advantage of being free for Windows, Linux and Mac.
  • Download Mininet VM OVF image.
  • After decompressing the image, import the OVF.

VB Import Appliance

  • In order to establish a terminal session to your VM, you'll need to add a Host-only Adaptor on the Mininet VM. So first (before adding the adaptor on the VM itself) go to VirtualBox > Preferences, select the Networking tab, and add an adaptor.


  • Next, edit the VM settings and add a Host-only Adaptor. Save it and boot the VM.
  • User: mininet       Password: mininet
  • Type sudo dhclient eth1 to request an IP address via DHCP on that interface (if you didn't add another adaptor and simply changed the default adaptor from NAT to Host-only, type eth0 instead of eth1).
  • Type ifconfig eth1 to get the IP address of the adaptor.
  • Establish an SSH session to the Mininet VM: open a terminal and type ssh -X [user]@[IP-Address-Eth1], where the default Mininet user is “mininet” and the IP address is what you got from ifconfig. So in my case it was: ssh -X mininet@
  • Mininet has its own basics tutorial – the Walkthrough. Also interesting is the OpenFlow tutorial.

The Mininet Walkthrough is designed to take less than an hour. Here are some simple shortcuts to speed up your playing around:

  • Type sudo mn --topo single,3 --mac --switch ovsk --controller remote. This will fire up the emulated environment with the switch, the OF controller, and 3 Linux hosts.

OF topology

  • Type nodes to confirm it: “h” stands for hosts, “s” for switch and “c” for controller. If you want, for instance, to know the addresses of a specific node such as Host2, type h2 ifconfig. If you want to open a terminal on that same host, type xterm h2. Note that the xterm command only works if you established the SSH session with ssh -X (X11 forwarding).
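Once the CLI feels familiar, the same topology can also be scripted with Mininet's Python API. A minimal sketch, assuming it runs as root inside the Mininet VM: `SingleSwitchTopo`, `OVSKernelSwitch`, `RemoteController`, `CLI` and the `autoSetMacs` option (the API counterpart of --mac) come from Mininet itself, while the helper names `mn_command` and `run` are hypothetical. The Mininet imports are deferred into `run()` so the file can be loaded on a machine without Mininet.

```python
# Sketch of `sudo mn --topo single,3 --mac --switch ovsk --controller remote`
# via Mininet's Python API. Helper names are hypothetical; run() needs root
# and a Mininet installation.


def mn_command(hosts=3):
    """The equivalent CLI invocation (note the double hyphens)."""
    return ["sudo", "mn", "--topo", f"single,{hosts}", "--mac",
            "--switch", "ovsk", "--controller", "remote"]


def run(hosts=3):
    # Imported lazily so this module loads even where Mininet is absent.
    from mininet.net import Mininet
    from mininet.topo import SingleSwitchTopo
    from mininet.node import OVSKernelSwitch, RemoteController
    from mininet.cli import CLI

    net = Mininet(topo=SingleSwitchTopo(k=hosts),  # one switch, k hosts
                  switch=OVSKernelSwitch,          # --switch ovsk
                  controller=RemoteController,     # --controller remote
                  autoSetMacs=True)                # --mac
    net.start()
    CLI(net)  # the usual mininet> prompt: nodes, h2 ifconfig, xterm h2, ...
    net.stop()
```

Calling `run()` as root drops you into the same `mininet>` prompt as the one-liner above.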

This should already get you started.

Have fun!

Basic BGP Concepts

Here is a very short introduction to the Border Gateway Protocol (BGP) – or Bloody Good Protocol, as some like to call it. BGP is a routing protocol used mainly for:

  • Sharing prefixes (networks) between ISPs, thus enabling the Internet to scale;
  • Multi-homing an organization to several ISPs (whereby Internet prefixes are learned from the ISPs, and the organization's own networks are advertised)
  • Scaling internally in very large organizations

BGP is an Exterior Gateway Protocol (EGP), which differs from IGPs (such as RIP, OSPF, IS-IS and EIGRP) mainly in the following ways:

  • Uses TCP (port 179) for transport ensuring reliable delivery of BGP messages between peers (Routers)
  • Can scale to hundreds of thousands of routes (without crashing like IGPs would)
  • Peers are configured manually – there is no automatic peer discovery
  • Besides prefix, mask and metric, BGP carries several additional attributes. Though a major advantage over other protocols, attributes also have the disadvantage of making BGP more complex to configure
  • BGP is “political in nature” when it comes to finding best paths, meaning best paths can be flexibly changed (using attributes). IGPs, on the contrary, have a fixed best-path algorithm (e.g. Shortest Path First in OSPF and IS-IS) and choose paths by metric. It is much harder to manually influence the best path chosen by an IGP (for instance, you can change the cost of an interface, but you cannot set a different cost on the same interface for different destinations), whereas in BGP it is much easier.
  • BGP may converge more slowly when failures occur, whereas IGPs usually converge faster
  • Since BGP is not a link state protocol, BGP does not share every prefix in its BGP table with every peer. Instead, it only shares the best routes with peers (even though it might know several paths to the same destination).

BGP carries several attributes with each prefix. Since there is no room in the routing table to hold all those attributes, BGP keeps its own table where it stores prefixes with all their attributes. However, the BGP table is not used directly to route IP packets. Instead, BGP places only the best prefixes in the routing table, with a default administrative distance (on Cisco routers) of 20 for eBGP-learned routes and 200 for iBGP-learned routes. So if the same prefix is learned via both BGP and an IGP, the IGP route beats an iBGP route, while an eBGP route beats the IGP route. Meanwhile BGP keeps all prefixes in its own table, which allows for redundancy as well as load-balancing capabilities.
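How a router picks between the same prefix offered by BGP and by an IGP can be sketched as a lowest-administrative-distance lookup. The values below are Cisco IOS defaults (other vendors use different numbers); the dictionary and function names are hypothetical.

```python
# Cisco IOS default administrative distances (lower = more trusted).
# Other vendors use different defaults; names here are hypothetical.
CISCO_DEFAULT_AD = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "eigrp_internal": 90,
    "ospf": 110,
    "isis": 115,
    "rip": 120,
    "ibgp": 200,
}


def installed_source(candidates):
    """Given the protocols offering the same prefix, return the one installed."""
    return min(candidates, key=CISCO_DEFAULT_AD.__getitem__)
```

So `installed_source(["ospf", "ibgp"])` returns "ospf", whereas `installed_source(["ospf", "ebgp"])` returns "ebgp".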

Attributes are what makes BGP so flexible and thus interesting. Most are optional, only the first three are mandatory. Here is a list of BGPs attributes:

  • Origin (mandatory) – indicates how the prefix was originally introduced into BGP: whether it was explicitly originated by the administrator (IGP, “i”), imported from another routing protocol or static routes (incomplete, “?”), or learned from the obsolete EGP protocol (“e”)
  • AS Path (mandatory) – allows eBGP to be loop-free. Subsequent ASes can see how the route propagated through an ordered list of ASes, from the nearest AS (first number in the AS Path) to the AS that originated the destination prefix (last number in the AS Path).
  • Next hop (mandatory) – the IP address that should be used for packets destined to a certain prefix. It allows a peer to deduce the interface to use to send packets to the appropriate border router.
  • Multi-Exit Discriminator (MED) (optional) – used to influence inbound traffic from a neighboring AS. It only influences directly neighboring peers. It is a low-priority attribute (it comes late in the decision process), but can be useful in organizations that are multi-homed to the same ISP. The lowest MED value wins.
  • Local preference (optional) – used to influence outbound traffic, also in organizations multi-homed to the same ISP. The local preference value is only advertised among iBGP peers and is not advertised to a neighboring AS. The prefix with the highest local preference value wins the decision process, with the advantage of being able to load balance traffic while maintaining redundancy.
  • Atomic Aggregate (optional) – Used in prefix summarization to warn throughout the Internet that a certain prefix is an aggregate
  • Aggregator (optional) – also used in prefix summarization; shared throughout the Internet, it includes the router ID and AS number of the router that performed the summarization
  • AS4 Path (optional) – used to carry the longer 32-bit AS numbers through ASes that support only 16-bit AS numbers
  • Communities (optional) – a special marking for policies, usually deployed by ISPs. It allows grouping prefixes together in order to give a set of prefixes special, common treatment
  • Extended communities (optional) – as the name indicates, it extends the Communities attribute length, allowing for additional provider offerings such as MPLS VPNs, etc.
  • Originator ID (optional) – attribute intended for iBGP environments where Route Reflectors (RR) are used. It protects against RR misconfiguration by having a router ignore prefixes that it advertised itself and received back.
  • Cluster List (optional) – helps prevent loops when using multiple clusters of Route Reflectors (in redundant HA mode). Cluster List operates much like AS Path does, collecting the sequence of Cluster IDs the update has traversed. This attribute is also exclusive to iBGP environments and will not traverse to eBGP peers
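The three loop-prevention attributes above translate into three simple checks on a received update. A minimal sketch with a hypothetical `Update` type and function name; real implementations apply these checks at different points of update processing, which this sketch does not model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Update:
    """Hypothetical, trimmed-down BGP update: only loop-related attributes."""
    as_path: List[int]
    originator_id: Optional[str] = None  # set by a route reflector
    cluster_list: List[str] = field(default_factory=list)


def is_loop(update, local_as, router_id, cluster_id=None):
    if local_as in update.as_path:
        return True   # AS Path: our own AS is already in the path
    if update.originator_id == router_id:
        return True   # Originator ID: our own route reflected back to us
    if cluster_id is not None and cluster_id in update.cluster_list:
        return True   # Cluster List: already passed through our RR cluster
    return False
```

An update is dropped as soon as any one of the checks matches.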

Finally, the BGP decision process hierarchy, from highest to lowest. BGP chooses the best path by considering the many attributes associated with the multiple copies of one prefix, instead of a cost or metric like IGPs. Since the attributes can be changed by the administrator, the best path is indeed based on the administrator's preferences. BGP also maintains several paths in its table, so that when a prefix is no longer available (for example due to a link failure, which BGP detects through keepalive messages in the TCP session) a new best path is populated in the routing table.

Whenever there is a tie, move to the next level down to choose the best path that will populate the routing table:

  • Next hop reachable – a route must exist to next hop IP address, and will not be considered if not reachable
  • Preferred Value – the highest preferred value will be chosen. It is a proprietary parameter and local to the router.
  • Local preference – the highest local preference value will be chosen. The policy is local to the AS
  • Locally originated – prefix originated by the local router
  • Shortest AS Path – the fewest ASes between the local AS and the destination
  • Origin – “i” preferred over “?”
  • Multi-Exit Discriminator – the lowest value wins; influences the neighboring AS only
  • External BGP versus internal BGP – eBGP preferred over iBGP
  • Router-ID – the lowest value will be chosen. It is the final tie-breaker
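The hierarchy above maps neatly onto a tuple comparison: each tuple element is one step, and Python only looks at the next element when the previous ones tie. A simplified sketch with a hypothetical `Path` type; it follows exactly the list above, and by simplification it compares MED across all paths, whereas real BGP by default compares MED only between paths from the same neighboring AS, and omits vendor-specific steps.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address
from typing import List

ORIGIN_RANK = {"i": 0, "e": 1, "?": 2}  # lower is better


@dataclass
class Path:
    """Hypothetical candidate path for one prefix."""
    name: str
    next_hop_reachable: bool = True
    preferred_value: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    as_path: List[int] = field(default_factory=list)
    origin: str = "i"
    med: int = 0
    ebgp: bool = True
    router_id: str = "0.0.0.0"


def decision_key(p):
    # Tuples compare element by element: one element per step, in order.
    # Negate values where "highest wins"; False sorts before True.
    return (
        -p.preferred_value,
        -p.local_pref,
        not p.locally_originated,
        len(p.as_path),
        ORIGIN_RANK[p.origin],
        p.med,  # simplification: compared across all neighbors
        not p.ebgp,
        IPv4Address(p.router_id),
    )


def best_path(paths):
    candidates = [p for p in paths if p.next_hop_reachable]
    return min(candidates, key=decision_key) if candidates else None
```

For instance, a path with local preference 200 wins over one with the default 100 even if its AS Path is longer, because local preference is checked first.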

iBGP basics

BGP peers that belong to the same Autonomous System (AS) are considered iBGP peers. Why is this important? Because iBGP behavior is different from eBGP, even though the commands might be quite similar. Here is a summary of the differences:

  • AS-Path is only prepended at the eBGP border, not in iBGP. iBGP thus has its own loop prevention mechanism, which consists of prohibiting iBGP peers from advertising iBGP-learned prefixes amongst themselves
  • eBGP sets the TTL of its messages' IP packets to one (1) by default, so that sessions are restricted to one hop. In iBGP the TTL is set to the maximum value of 255, as iBGP peers may be multiple hops away
  • BGP attributes are not changed within iBGP communications. Next-hop remains the eBGP next-hop. Moreover, Local preference attribute will only remain within iBGP peers and will not traverse to neighboring AS.
  • The route selection process will prefer an eBGP route over an iBGP route when the AS-Path is the same.

Since best practices recommend using loopback interfaces for establishing connections with iBGP peers, for redundancy reasons, remember to advertise each peer's loopback interface into the IGP.

It is the IGP's responsibility to find a path to the peer's loopback interface. However, if the IGP process fails, iBGP will also fail. So the first troubleshooting task should be confirming that both routers can reach each other's loopback interface.

The loop prevention mechanism also differs between eBGP and iBGP. eBGP uses the AS Path attribute to guarantee loop-free behavior, whereas iBGP relies on almost sacred rules:

  • Prefixes received from an eBGP peer are always advertised to all other BGP peers, both iBGP and eBGP
  • Prefixes received from iBGP peers are only sent to eBGP peers.

So by not advertising iBGP-learned prefixes amongst iBGP peers, iBGP is able to prevent loops. The problem is that this mechanism requires a full-mesh topology between iBGP peers. Without a full mesh, an iBGP router two hops away may not receive certain eBGP routes, breaking reachability.
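The effect of these two rules on a partial mesh can be shown with a small propagation simulation. The function names and router names are hypothetical; the only logic is the rule that an iBGP-learned prefix is never passed on to another iBGP peer.

```python
# Toy propagation of one eBGP-learned prefix inside an AS (hypothetical names).


def may_advertise(learned_from, to):
    """learned_from / to: 'ebgp' or 'ibgp'."""
    if learned_from == "ibgp":
        return to == "ebgp"  # iBGP-learned prefixes only go to eBGP peers
    return True              # eBGP-learned prefixes go to everyone


def propagate_ibgp(sessions, entry):
    """sessions: router -> set of iBGP neighbors (symmetric).
    entry: router that learned the prefix from an eBGP peer.
    Returns the set of routers that end up knowing the prefix."""
    learned = {entry: "ebgp"}
    queue = [entry]
    while queue:
        router = queue.pop()
        for nbr in sessions[router]:
            if nbr not in learned and may_advertise(learned[router], "ibgp"):
                learned[nbr] = "ibgp"
                queue.append(nbr)
    return set(learned)
```

In a chain R1–R2–R3 where R1 learns the prefix via eBGP, only R1 and R2 end up with it; adding the R1–R3 session (full mesh) lets R3 learn it too.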

However, since the full-mesh requirement makes iBGP quite hard to scale, another mechanism was created that explicitly breaks the iBGP loop prevention rules stated above: Route Reflectors (RR). An RR is allowed to send prefixes learned from one iBGP peer to another iBGP peer (its client). So an RR client does not need to be fully meshed; it only needs to maintain a session with the RR.

Route reflectors should reflect prefixes to their RR clients with BGP attributes unchanged. Note that you can configure a 1:N relationship between one RR and several RR clients. Also noteworthy in terms of scalability is that an RR client can itself act as an RR for other clients, reflecting the same routes further down. However, the more you cascade, the bigger the risk, since RRs represent a Single Point of Failure (SPOF). It is always a best practice to configure a redundant route reflector, with both acting as a cluster (so that updates are not duplicated by the reflectors).
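Which iBGP peers a route reflector passes a best path on to depends on where the path came from. A minimal sketch of the reflection rules from RFC 4456 (BGP Route Reflection); the function name and peer names are hypothetical, and advertisements to eBGP peers are left out.

```python
# Sketch of route reflector advertisement rules (RFC 4456); hypothetical names.
# Only iBGP targets are returned; eBGP peers are not modeled here.


def reflect_targets(source, clients, non_clients):
    """source: 'ebgp', or the name of the iBGP peer the best path came from."""
    clients, non_clients = set(clients), set(non_clients)
    if source == "ebgp":
        return clients | non_clients             # eBGP-learned: all iBGP peers
    if source in clients:
        return (clients - {source}) | non_clients  # from a client: everyone else
    return clients                               # from a non-client: clients only
```

The last rule is what keeps the non-client routers behaving like a traditional full mesh.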

Moreover, you can have a hybrid iBGP AS, where you use Route Reflectors for some routers that are not fully meshed, and a full mesh for another set of routers that follow the traditional iBGP rules.

Finally, the Originator ID and Cluster List attributes help prevent loops caused by misconfiguration when using RRs.