NSX for Newbies – Part 9: L2-VPN and stretched Logical Networks (on 6.1+)

Topics covered in this article:

  • NSX L2 VPN Overview
  • NSX L2 VPN Use cases
  • NSX L2 VPN Topology
  • NSX L2 VPN Server Configuration
  • NSX L2 VPN Client Configuration
  • Testing L2 VPN Connectivity

NSX L2 VPN Overview

One cool feature of the NSX ESG (Edge Services Gateway) is L2 VPN, which lets you stretch an L2 subnet over L3, tunnelled through an SSL VPN. The two sites form an L2 adjacency; this can be either intra-DC (within the same data centre) or across data centre locations.
Back when I did my ICM class, NSX for vSphere 6.0 only supported stretching (read: trunking) one VLAN, but as of NSX for vSphere 6.1 it’s possible to trunk multiple logical networks, whether that’s VLAN to VLAN, VLAN to VXLAN or VXLAN to VXLAN. It’s also possible to deploy a standalone ESG on a remote site without that site being “NSX enabled”, that is, connecting to a VLAN (see the Use Cases section).
I won’t hide that I struggled at the beginning to understand the following two new concepts, primarily because I couldn’t find proper documentation or articles covering the topic, so I decided to wrap up my discoveries in this post.

  • Trunk interface: allows multiple internal networks (either VLAN or VXLAN) to be trunked. The key for me is internal! Having learnt networking (CCNA) from Cisco, every time I hear the word trunk I think of an uplink. That is effectively what the trunk interface ultimately does but, because the Uplink interface type already existed in previous versions, this got me really confused until I realised that yes, it’s a trunk, but still for internal networks!
  • Local Egress Optimization: this is easier to understand. It enables the ESG to route any packets sent towards the Egress Optimization IP address locally, and send everything else over the tunnel. Why? VM mobility! If the default gateway for the virtual machines (belonging to the subnets you’re stretching) is the same across the two sites, you need this setting to ensure traffic is routed locally on each site. Should you need to migrate a VM from one site to the other, you can do so without touching the guest OS network configuration. Nice, eh?!
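
To make the Local Egress Optimization idea a bit more concrete, here is a minimal Python sketch of the decision as I described it above. It is a simplified model, not how the ESG is actually implemented: the function name and the "local"/"tunnel" labels are made up for illustration, and only the 172.16.10.254 address comes from my lab.

```python
from ipaddress import ip_address

# Gateway address used in this lab; it is set as the egress optimization IP.
EGRESS_OPTIMIZATION_IP = ip_address("172.16.10.254")


def handle_on_trunk(target_ip: str) -> str:
    """Decide what happens to traffic arriving from a stretched network.

    Simplified model: traffic aimed at the egress optimization IP stays on
    the local site, everything else is bridged over the L2 VPN tunnel.
    """
    if ip_address(target_ip) == EGRESS_OPTIMIZATION_IP:
        return "local"
    return "tunnel"


print(handle_on_trunk("172.16.10.254"))  # the gateway: handled by the local ESG
print(handle_on_trunk("172.16.10.12"))   # a VM on the other site: sent over the tunnel
```

The point of the sketch is that the same gateway IP can exist on both sites without the two ESGs stepping on each other, because traffic for that one address never crosses the tunnel.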

NOTE: One “disadvantage” that I have noticed is that stretching a logical network requires its gateway interface to “reside” on the ESG, which means no DLR LIF: the interface will not be distributed, nor will it perform at line rate. In other words, instead of east-west traffic it’s north-south, like it used to be with the vCloud vShield Edge. If you have some tiers at the Distributed Logical Switch and others at the ESG level, from a design standpoint pay attention to traffic going northbound, as the ESG will be your bottleneck!
That said, two things: one, you could implement ECMP to relieve this bottleneck; two, I’m led to believe L2 VPN is meant to be a temporary solution as opposed to a permanent one (think of migrations, for example). I look forward to hearing comments on these two points.

NSX L2 VPN Use Cases

One big use case is “cloud bursting”, where a Private Cloud service bursts to a Public Cloud when demand spikes. Effectively a Hybrid Cloud solution.

The following diagrams are taken from NSX 6.1 administration guide.

As you can see in this scenario, VLAN 10 on site A is stretched to VXLAN 5010 on site B. Similarly for VLAN 11 stretched to VXLAN 5011 on site B.
Again, this is an example where an NSX data centre is extended to a non-NSX data centre.

In this scenario, which could be used for a Private Cloud to Private Cloud migration or DR, the VXLANs (5010, 5011) have been stretched to site B and mapped to the same VNIs.
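
If it helps, the first scenario above (VLAN 10 and 11 on site A stretched to VXLAN 5010 and 5011 on site B) can be written down as a small pairing table. This is only a sketch: I’m assuming here that the networks on the two sites are paired through a shared tunnel ID on the sub-interfaces, and the tunnel ID values 1 and 2 and all class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StretchedNetwork:
    kind: str       # "VLAN" or "VXLAN"
    net_id: int     # VLAN ID or VXLAN VNI
    tunnel_id: int  # hypothetical pairing key, must match on both sites


# Values from the first use case; the tunnel IDs are made up.
SITE_A = [StretchedNetwork("VLAN", 10, 1), StretchedNetwork("VLAN", 11, 2)]
SITE_B = [StretchedNetwork("VXLAN", 5010, 1), StretchedNetwork("VXLAN", 5011, 2)]


def pair_networks(site_a, site_b):
    """Pair each site A network with the site B network sharing its tunnel ID."""
    by_tunnel = {net.tunnel_id: net for net in site_b}
    return [(a, by_tunnel[a.tunnel_id]) for a in site_a if a.tunnel_id in by_tunnel]


for a, b in pair_networks(SITE_A, SITE_B):
    print(f"{a.kind} {a.net_id} <-> {b.kind} {b.net_id} (tunnel ID {a.tunnel_id})")
```

Notice that the network type and ID on each side are independent of each other; the pairing key is the only thing the two sites have to agree on in this model.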

The concept is very similar to Cisco OTV; the nice thing is you don’t need expensive Cisco hardware because it’s all software (yay, SDN rocks!).
Sounds cool, eh? Let’s get into the nitty-gritty 🙂

NSX L2 VPN Topology

This is the topology I’m working with, which is a VXLAN to VXLAN extension. As you can see, I’m stretching VXLAN 5004 at Site B (Branch Web Tier) to VXLAN 5003 at Site A (Web Tier).

Let’s see how to configure it.

L2 VPN Server configuration (Site A)

Select the Edge Gateway > Manage > Settings > Interfaces. We need to create the Trunk interface and inside it the sub-interface mapped to the logical switch Web-Tier(5003).

  1. Select an unused interface (for me it’s vNIC2) > Edit. Select type Trunk and the distributed port group it connects to (here Mgmt_vDS_L2VPN_Trunk)
  2. Click on + to configure the sub-interface. This is where you map the sub-interface to the VXLAN (5003) and give it the IP address that will be the “stretched default gateway”.

    NOTE on Enable Send Redirect: I had no clue what this option was. The explanation in nsx_61_admin.pdf is pretty poor, it only says: “Enable Send Redirect to convey routing information to hosts”. What the heck does this mean?
    Turns out this option is about enabling ICMP Redirect on the Edge, which means the ESG will inform the ESXi hosts that a better route to a particular subnet (in my case 172.16.10.0/24) is available and that the best gateway to reach it is 172.16.10.254. Subsequent packets from ESXi destined for hosts residing on the stretched subnet 172.16.10.0/24 will be sent directly to 172.16.10.254. Although it may sound like a nice feature, from a security perspective it’s not. Why? Attackers could maliciously alter routing tables and spoof traffic by injecting routes into hosts to redirect traffic. That is why this option is disabled by default. There’s a good article from Cisco that explains this in more detail.
    I need to thank NSX guru Michael Haines (Snr. Architect within NSBU at VMware) for clarifying this for me! Thanks Michael 🙂

  3. Click on Trunk; the configuration should look like this
  4. Create your self-signed SSL certificate for the VPN: Settings > Certificates > Actions > Generate CSR (Certificate Signing Request)
  5. Fill in the usual information
  6. With the CSR selected > Actions > Self Sign Certificate, and enter how many days you want the certificate to be valid for. This is what it should look like:

  7. VPN tab > L2 VPN. Under L2VPN Mode select Server, then Change.
    Decide which IP address the VPN Server should listen on, the port, the encryption algorithm and the self-signed certificate

  8. Site Configuration > + symbol: this is where we configure the site details. Username and password must match on the client side.

    and it should look like this
  9. Last, enable the VPN and publish the changes

  10. Because the VPN Client isn’t configured yet, the tunnel is expected to be in a down state. Click on Show L2VPN Statistics to see this

L2 VPN Client configuration (Site B)

  1. On the ESG on site B acting as VPN Client repeat steps 1 to 3 done for the Server in order to create a Trunk interface with the same IP address 172.16.10.254 mapped to VXLAN 5004 (Branch Web Tier)

  2. VPN tab > L2VPN > set L2VPN Mode to Client. Then Change. Here we put the Server listener IP, we select the stretched interface, set the optimization gateway address to be 172.16.10.254 and provide the same credentials used on the server.

  3.  Enable the Service and publish the changes. If you’re lucky after some seconds, fetching the tunnel status should reveal the tunnel as UP

  4. Likewise on the Server side

    So here we go! We have the L2 VPN up and running. Time to test it!
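
Before moving on to testing, here is a rough Python sketch of the things that have to line up between the two sides, based on the steps above. It is just a paper checklist turned into code, not an NSX API: the class and field names are hypothetical, port 443 and the credentials are placeholders, and only the 192.168.100.10 listener and the 172.16.10.254 gateway come from my lab. The last check reflects the choice made in this lab (egress optimization IP equal to the stretched gateway IP).

```python
from dataclasses import dataclass


@dataclass
class L2VpnServer:
    listener_ip: str
    listener_port: int
    user: str
    password: str
    gateway_ip: str  # IP of the stretched sub-interface


@dataclass
class L2VpnClient:
    server_ip: str
    server_port: int
    user: str
    password: str
    gateway_ip: str  # IP of the stretched sub-interface
    egress_optimization_ip: str


def find_mismatches(server, client):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    if client.server_ip != server.listener_ip:
        problems.append("client points at the wrong server IP")
    if client.server_port != server.listener_port:
        problems.append("client points at the wrong server port")
    if (client.user, client.password) != (server.user, server.password):
        problems.append("credentials differ between client and server")
    if client.gateway_ip != server.gateway_ip:
        problems.append("stretched gateway IP differs between sites")
    if client.egress_optimization_ip != client.gateway_ip:
        problems.append("egress optimization IP is not the stretched gateway IP")
    return problems


# Placeholder port and credentials; IP addresses are the ones used in this lab.
server = L2VpnServer("192.168.100.10", 443, "l2vpn-user", "example-password",
                     "172.16.10.254")
client = L2VpnClient("192.168.100.10", 443, "l2vpn-user", "example-password",
                     "172.16.10.254", "172.16.10.254")
print(find_mismatches(server, client))  # empty list when everything lines up
```

If the tunnel refuses to come up, walking through a list like this is usually quicker than staring at the statistics screen.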

Testing L2 VPN Connectivity

Again, this is the topology

If the L2 VPN tunnel is up I should be able to ping web-sv-03a (172.16.10.12) from web-sv-01a (172.16.10.10)


Duplicate packets (DUP!) are expected due to the environment being nested (Promiscuous Mode Accept).
Checking the ARP, we can see that 172.16.10.254 is mapped to MAC address 00:50:56:a1:11:cb which belongs to the Trunk interface (vNIC2) of the Perimeter ESG (VPN Server).

Now from web-sv-03a:

and again let’s check the ARP table:

This time 172.16.10.254 is mapped to MAC address 00:50:56:a1:4a:bc which is the Branch ESG vNIC2
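
The same ARP comparison can be scripted. Below is a small Python sketch that pulls the gateway’s MAC address out of `arp -n` style output and checks that the two sites resolve 172.16.10.254 to different MACs. The two sample tables are illustrative text I typed in the usual Linux `arp -n` layout (the eth0 interface name is an assumption); only the IP and the two MAC addresses come from the lab above.

```python
import re

MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def gateway_mac(arp_output: str, gateway_ip: str):
    """Return the MAC address listed for gateway_ip, or None if it is absent."""
    for line in arp_output.splitlines():
        fields = line.split()
        if fields and fields[0] == gateway_ip:
            match = MAC_RE.search(line)
            if match:
                return match.group(0).lower()
    return None


# Illustrative output, one table per site (layout and eth0 are assumptions).
SITE_A_ARP = """\
Address          HWtype  HWaddress          Flags Mask  Iface
172.16.10.254    ether   00:50:56:a1:11:cb  C           eth0
"""
SITE_B_ARP = """\
Address          HWtype  HWaddress          Flags Mask  Iface
172.16.10.254    ether   00:50:56:a1:4a:bc  C           eth0
"""

mac_a = gateway_mac(SITE_A_ARP, "172.16.10.254")
mac_b = gateway_mac(SITE_B_ARP, "172.16.10.254")
print(mac_a, mac_b, "local egress OK" if mac_a != mac_b else "same gateway MAC!")
```

Different MACs for the same gateway IP is exactly what we want: each VM is using the ESG on its own site as its default gateway.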

This concludes this article on L2 VPN on NSX but stay tuned as more will come! 🙂


21 Comments

  1. hi ,

    Please tell me why we need to associate the trunk interface with a Standard or Distributed port group?

    Br
    Mok

  2. Hi Mok,
    thanks for your question. Simply because the Trunk interface you are creating still lives (resides) on a virtual machine (the NSX Edge Gateway) at the end of the day, so you need to tell it where to go out from. Back to basics: on ESXi you have either Standard or Distributed port groups.

    I hope I answered your question, let me know if not.

  3. Hi Giuliano, I keep getting an error whether I use a self generated cert or a self signed: configuration failed on NSX Edge vm xxxxx failed to add L2 vpn server configuration. Invalid key or certificate

  4. Hi ,

    Your initial setup suggests to leave MTU at default 1500. While I am uncertain how much overhead SSL/TLS at large will add to the equation (as it seems odd to use SSL for L2VPN to start with) – I hazard a guess here that the most prominent usecases will be DCI over the internet and not across your own L3 core (where you could bump it up to 1600). Provided that there are other performance related constraints however this is likely not a major concern 🙂
    Any plans to look at proper L2VPN technologies such as EVPN? I know for a fact that Cisco is already looking at it for their ACI offering.

    Cheers

    R/

    • Hi Rikherlaar,

      thanks for your comments. My L3 core was just an example indeed, that could easily have been an external L3 subnet outside my DC if you like, hence the need to secure the traffic (ssl).
      I am not VMware so I can’t really answer the question regarding EVPN but I know in the next upcoming releases of NSX there will be MPLS integration therefore allowing virtual networks inside the DC to connect to L3VPN service in the WAN.

  5. Giuliano, thanks so much for posting, there isn’t a lot out there and this was invaluable.
    So if we lose the dLR and place the gateway on the edge(s), how can we route between tiers? Your topology diagram seems to indicate we can… I’m missing something here…
    I’ve created a post to complement yours, let me know if you see errors
    http://virtualizationgains.com/nsx/nsx-lab-configure-layer-2-vpn/

    • >> so if we lose the dLR and place the gateway on the edge(s), how can we route between tiers?
      Don’t forget that the dLR is distributed, so do you mean you lose one of the hosts or the actual Logical Router Control VM? In the first case HA will take care of it; in the latter, the Control VM is just the control plane, right? So you won’t learn any new routes etc… but the forwarding plane will carry on.
      Did I answer your question?

      • No, my question was regarding routing between the edge and dLR, I couldn’t ping between web-1 172.16.10.10 and app-1 172.16.20.10 and I thought I needed to reconfigure ospf or something. It turned out that I had configured a default gateway on web-1 .254 and in my case the local egress optimization ip was .1, once I reset the gateway on web-1 to 172.16.10.1, I was able to ping the vms on the other tiers/segments.

  6. EXCELLENT POST. Nice work.

  7. One question – where did the “listener address” / “server address” of 192.168.100.10 come from? I see that the uplink int on the L2VPN server is assigned to .3. But again not sure where .10 comes from.

    • If you look at the screenshot at step 3 for Site A, you will see that 192.168.100.10 is an additional IP address configured on the HQ Uplink interface. I could have used .3 as well, but I decided to use .10 for the sake of “rule separation”, if you like.
      I’m glad you liked the article, thank you!

  8. Hi Giuliano, if I have a number of segments that I want to stretch with L2VPN, must the gateway be placed on the Edge instead of the dLR?
    If yes, must it be in trunk mode with sub-interfaces?

    Another question: how many segments can I stretch on a single Edge? One segment per Edge, or can there be multiple segments on one Edge?

    Appreciate for your help.

  9. I love your article! I’m starting out on NSX and lots of your writeups are so much better than VMware’s.
    Thank you and keep the articles coming.
    Thumbs up!

  10. Hi, I have followed this step by step and my tunnel is up, but I can’t ping VMs on the other side. Any recommendations?

  11. Hi !

    1- So, if I understand correctly, this way, you can extend VXLAN over a layer 3 network. Can this L3 be MTU 1500 ??? Or it needs to be 1600 ?

    2- Is VXLAN encapsulated in the tunnel, or does it switch to VLAN before going into the tunnel and switch back to VXLAN on the other side ?

    3- And with this method, all VXLAN networks need to have the edge as default gateway, so no DLR. Right ?

    Thanks

    • 1) yes it can be 1500
      2) Assume we are mapping VLAN 20 (vm A) to VXLAN 2020 (vm B). There is always an internal VLAN represented by the Tunnel ID; for this example let’s say 10.
      a. The IP packet leaves vm A and gets tagged with VLAN 20 at the dvPortGroup.
      b. The IP packet tagged with VLAN 20 is received by the sink port in the distributed switch and forwarded to the trunk interface.
      c. The NSX Edge looks up the L2VPN mapping table and encapsulates the IP packet with the SSL tunnel internal VLAN tag, in this case VLAN 10.
      d. The NSX Edge at the server site receives the packet encapsulated with VLAN 10 (SSL tunnel internal VLAN tag), looks up the mapping table and adds a pseudo header for VXLAN 2020.
      e. The VXLAN-formatted packet is forwarded to the dvPortGroup, where it is decapsulated and passed on to vm B.

      3) Yes and no. There is a corner case where you are allowed (starting with 6.2.2) to stretch a DLR gateway interface, but it’s ONLY allowed on one side of the tunnel, not on both sides.

  12. Hi,

    I have done some tests using the HOLs and I noticed that you can stretch a Logical Switch connected to a DLR and it works fine. I haven’t found a document where VMware mentions that this is not a recommended practice. I’m doing some tests to see whether I get any unexpected behavior with this config.

    • Erwin,
      thanks for your comment first of all. Please note that the whole NSX for Newbies series is based on NSX-v 6.0 (it was 2015). With 6.2 Cross-VC you can now simply have a Universal LS stretched across sites, so I would definitely prefer this approach to an L2 VPN.


2 Trackbacks

  1. NSX for Newbies (The Series) | blog.bertello.org (Pingback)
  2. NSX Link-O-Rama | vcdx133.com (Pingback)