Friday, December 23, 2016

OTV implementation notes!

Quick fire hose to the face of terminology (a minimal config skeleton tying these terms to actual CLI follows the list)...

  • OTV edge device: The device that performs the OTV functions and operations.  The OTV edge device receives L2 Ethernet traffic for all VLANs that need to be extended between OTV peers.
  • OTV internal interface: The "inside" interface that faces the local site and carries the VLANs extended through OTV.  This is typically a trunk (if configured on a Nexus switch).
  • OTV join interface: The L3 interface used to join the overlay; it is referenced in the overlay interface configuration.
  • OTV overlay interface: Logical interface where all the OTV configuration is placed.  It encapsulates the site's L2 frames in IP unicast or multicast packets that are sent to the other sites.
  • OTV site VLAN: This is used to discover other edge devices in a multihomed topology.  It is recommended that this VLAN NOT overlap with any other VLANs in the topology and that it be unique per location.  Furthermore, the authoritative edge device is elected over this VLAN.
  • OTV site-id: Each OTV edge device located in the same site must be configured with the same site-identifier (site-id).  The site-id protects against scenarios in which the OTV site VLAN becomes partitioned between the OTV edge devices in the same site.  If the site-id is NOT configured, the overlay interface will not come up!
  • OTV authoritative edge device (AED): The AED is responsible for forwarding L2 traffic, including unicast, multicast, and broadcast traffic.  The AED is also responsible for advertising MAC-address reachability to the remote locations for the VLANs it is active for.  The AED election assigns each edge device an ordinal value of either 0 (zero) or 1 (one).  This value cannot be manually configured; the AED with the ordinal value of 0 will be the AED for the even-numbered VLANs, and the AED with the ordinal value of 1 will be the AED for all odd-numbered VLANs.
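
To see where each of those knobs actually lives before the full configs later in this post, here's a minimal IOS-XE skeleton for a single edge device (the interface names, addresses, and VLAN/bridge-domain numbers are simply the ones used in this post's topology):

! site bridge-domain (site VLAN): used to discover edge devices and elect the AED locally
otv site bridge-domain 555
! site-identifier: must match on all edge devices in the same site
otv site-identifier 0000.0000.1111
!
! join interface: the L3 uplink that carries the encapsulated overlay traffic
interface GigabitEthernet0/0/3
 ip address 10.255.255.122 255.255.255.252
!
! overlay interface: the rest of the OTV configuration hangs off of this
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 ! each extended VLAN gets a matching service instance here and on the internal interface
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200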

Example topology

[Diagram: multihomed OTV topology with two ASR edge devices per data center (New York and Palo Alto), labeled as routers A-D, and point-to-point join links between edge device 1 in each DC and between edge device 2 in each DC]

In the topology above, we have a multihomed topology between two locations.  To tie the terminology points above together...
  • The edge devices are pretty straightforward...as they're labeled!
  • The OTV internal interfaces are gi0/0/2 on all the edge devices.
  • The OTV join interfaces are gi0/0/3 on all the edge devices.
  • The OTV overlay interface is Overlay1 on all the edge devices.
  • The OTV site VLAN is 555 for both datacenters.  This is NOT best practice...I was just lazy and made sure NOT to allow VLAN 555 over the OTV overlay interface.
  • The OTV site-id is 0000.0000.1111 for the New York DC and 0000.0000.2222 for the Palo Alto DC.
  • The OTV authoritative edge device (AED) is determined as follows...
Note: DAL-BH01 and DAL-BH02 are the New York OTV edge devices 1 and 2, respectively:

[Screenshots: per-VLAN AED status from DAL-BH01 and DAL-BH02]

As we can see, edge device 1 is the AED for VLAN 231 and edge device 2 is the AED for VLAN 200.  We can see why this is the case if we perform a "show otv site" on the devices:


[Screenshots: "show otv site" output from both New York edge devices, showing their ordinal values]

As mentioned previously, the device with an ordinal of 0 is the AED for even VLANs and the device with an ordinal of 1 is the AED for odd VLANs.  Since edge device 1 has an ordinal of 1, it is the AED for VLAN 231 (an odd VLAN).  Inversely, edge device 2 is the AED for VLAN 200 (an even VLAN), since it has an ordinal of 0.

Configurations...

NOTE: These are OTV configurations on an ASR100X...which are somewhat different from those on Nexus!

NOTE #2: I have NOT yet discussed the following configuration items...I'll come back to these shortly!
  • "otv fragmentation join-interface GigabitEthernet0/0/3"
  • " otv adjacency-server unicast-only"
  • " otv use-adjacency-server <IP address> unicast-only"

New York

Edge device 1:

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.1111
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 cdp enable
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
!
interface GigabitEthernet0/0/3
 description TO Palo Alto Edge device 1
 ip address 10.255.255.122 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip pim sparse-dense-mode
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv adjacency-server unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

Edge device 2

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.1111
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 cdp enable
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
!
interface GigabitEthernet0/0/3
 description TO Palo Alto Edge device 2
 ip address 10.255.255.126 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv use-adjacency-server 10.255.255.122 unicast-only
 otv adjacency-server unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

Palo Alto

Edge device 1:

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.2222
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 cdp enable
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
 !
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
 !
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
 !
!
interface GigabitEthernet0/0/3
 description TO New York Edge device 1
 ip address 10.255.255.121 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip pim sparse-dense-mode
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv use-adjacency-server 10.255.255.122 10.255.255.126 unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

Edge device 2:

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.2222
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
!
interface GigabitEthernet0/0/3
 description To New York Edge device 2
 ip address 10.255.255.125 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv use-adjacency-server 10.255.255.122 10.255.255.126 unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

YOU NEVER SAID WHAT THOSE THINGS THAT YOU DIDN'T SAY WHAT THEY WERE?!


  • "otv fragmentation join-interface GigabitEthernet0/0/3"
OTV adds a 42-byte header with the Do Not Fragment (DF) bit set to all encapsulated packets. In order to transport 1500-byte packets through the overlay, the transit network must support an MTU of 1542 or higher. OTV does not support fragmentation by default, so it will NOT work if the transit path has to fragment packets. Since I didn't know whether the provider supported a >1500 MTU, enabling "otv fragmentation join-interface <interface>" allows OTV-encapsulated packets to be fragmented and reassembled.
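
A quick way to test whether the transit can actually carry the full 1542 bytes is an extended ping with the DF bit set from one join interface to the other (10.255.255.121 being Palo Alto edge device 1's join-interface address from the configs above). If the 1542-byte ping fails while a 1500-byte one succeeds, you'll want the fragmentation command:

ping 10.255.255.121 size 1542 df-bit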
  • " otv adjacency-server unicast-only"
OTV's "magic" boils down to the use of a control protocol to advertise MAC address reachability information and packet switching of IP encapsulated L2 traffic for data forwarding.  In addition, OTV uses a control protocol to map MAC address destinations to IP next hops.  This process can be thought of as MAC routing in which the destination is a MAC address where the next-hop is an IP address.  Cool, huh?  Well, before we can start sharing MAC address reachability, all OTV devices must become "adjacent" with one another.  The method of forming adjacency can be accomplished in two ways, that are dependent on the nature of the transport network interconnecting the sites:  Multicast mode or Unicast mode.  Now, there are quite a bit of information on the various pros and cons of the two methods...but the underlying transport MUST support multicast to use Multicast mode (duh). 


"...I and some customers are torn between unicast and multicast-based OTV. For many sites, multicast-based OTV has clear benefits. On the other hand, many of us (well, me and a couple others I’ve talked to?)  feel that IPmc code in general is less mature, likely to be less robust, and it adds complexity, suggesting there is less overall risk to doing unicast-based OTV in the absence of any factors making IPmc-based more attractive. Such as “many” datacenter sites, or need to transport IPmc flows."

In our scenario, I wasn't entirely sure whether the provider supported multicast, and the deployment was a bit time sensitive (a DC migration was pending my OTV deployment), so I went with unicast!  If you're going to have >3 sites...the stance from Cisco is typically to go with multicast mode.
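
For contrast, had the provider supported multicast, the overlay would drop the adjacency-server commands in favor of a control group and data group (which may explain the "ip pim sparse-dense-mode" leftovers on two of the join interfaces above). A rough sketch only; the group addresses are made up for illustration, and the transit's multicast design (PIM, SSM ranges, etc.) has to support them:

interface GigabitEthernet0/0/3
 ip address 10.255.255.122 255.255.255.252
 ! IGMPv3 on the join interface so the edge device can join the SSM data groups
 ip igmp version 3
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 ! ASM group used for control-plane hellos/adjacencies (replaces the adjacency server)
 otv control-group 239.1.1.1
 ! SSM range used to carry multicast data traffic between sites
 otv data-group 232.0.0.0/28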

Knowing Cisco...this page will be a 404 within the year, so I apologize if/when it's down.
  • " otv use-adjacency-server <IP address> unicast-only"
See above....this is where you specify the edge device to use as the adjacency server.  You can also configure multiple IP addresses for redundancy purposes: "otv use-adjacency-server 10.255.255.122 10.255.255.126 unicast-only."
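
Once the overlay comes up, it's worth confirming that every edge device actually shows up as a neighbor. Both of these are valid on the ASR (output omitted here since it varies by release):

show otv
show otv adjacency

"show otv" summarizes the overlay itself (join interface, site-id, AED capability), while "show otv adjacency" lists the other edge devices discovered through the adjacency server.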


"How do I see what traffic is being learned over the OTV tunnel?"


First-Hop Routing Protocol Localization

"Wait...I have to send ALL my traffic over the OTV tunnel...what if I don't want to!?"

Well, you don't have to...but it ain't pretty.

The default setup would basically have the FHRP gateway active in ONE data center, with all traffic to/from the hosts traversing the OTV interface.  There is a new feature that can be enabled under the overlay interface to filter FHRP messages: "otv filter-fhrp".  This allows the same FHRP gateway IP address to exist in BOTH data centers simultaneously.
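
On the ASR, that's a one-liner per edge device under the overlay interface (minimal sketch below). Worth noting: HSRPv1 virtual MACs follow the 0000.0c07.acXX pattern, the same vMAC family that shows up in the troubleshooting section later, and what a supplemental MAC ACL (next paragraph) would match on:

interface Overlay1
 ! drop HSRP/VRRP/GLBP hellos at the overlay so each DC elects its own active gateway
 otv filter-fhrp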

In addition, you're going to want to implement a MAC ACL on the OTV routers to further prevent any HSRP communication, for example, between the sites.  This will allow you to have the SAME gateway IP address in both DCs without any end-host ARP table issues.  I personally ran this with HSRP in one DC and a 6500 VSS (no HSRP) in the other, and ran into a TON of issues: hosts in one DC would ARP for their gateway and get two MAC addresses, one for the local Nexus core and one for the remote 6500 VSS core.  What was the resolution?  Run HSRP on the 6500, even though there are no real benefits to doing so, purely so that the gateway IP address resolves to the same virtual MAC address in both data centers.

Special-Case Unicast Topology notes

If the drawing didn't give it away...we do not have a full mesh of AEDs in our scenario.  We have point-to-point links between ASR 1 in NY and ASR 1 in Palo Alto (and likewise between the two ASR 2s).  As a result, it is entirely possible that the OTV auto-selected ordinal values will NOT match between sites (as happened in our scenario).  For example, if you were to run a "show otv site" you would see that NY device 1 is the AED for odd VLANs while the Palo Alto device 1 is the AED for even VLANs.


Crap...what do we do?  This is an elected process!  Furthermore, this mismatch may even blackhole traffic!  Cisco provides a couple of steps to avoid these scenarios (a consolidated config sketch follows the list):

  1. Enable BFD on the join-interfaces: "bfd interval 50 min_rx 50 multiplier 5"
  2. Configure BFD within the IGP connecting the data centers: "router ospf 555" "bfd all-interfaces"
  3. Configure the following EEM script (Note: I had to modify the one Cisco provided in the documentation, as I was not getting the "new adjacency" alert in the syslog.  Furthermore, the down notification was not "BFD peer node down" but rather "BFD node down"):

      no event manager applet WatchBFDdown
      no event manager applet WatchBFDup
      event manager environment _OverlayInt Overlay1
      !
      event manager applet WatchBFDdown authorization bypass
       description "Monitors BFD status, if it goes down, bring OVERLAY int down"
       event syslog pattern "BFD node down" period 1
       action 1.0 cli command "enable"
       action 2.0 cli command "config t"
       action 2.1 syslog msg "EEM: WatchBFDdown will shut int $_OverlayInt"
       action 3.0 cli command "interface $_OverlayInt"
       action 4.0 cli command "shutdown"
       action 5.0 syslog msg "EEM WatchBFDdown COMPLETE ..."
      !
      event manager applet WatchBFDup authorization bypass
       description "Monitors BFD status, if it goes up, bring OVERLAY int up"
       event syslog pattern "GigabitEthernet0/0/3 from LOADING to FULL" period 1
       action 1.0 cli command "enable"
       action 2.0 cli command "config t"
       action 2.1 syslog msg "EEM: WatchBFDup bringing up int $_OverlayInt"
       action 3.0 cli command "interface $_OverlayInt"
       action 4.0 cli command "no shutdown"
       action 5.0 syslog msg "EEM WatchBFDup COMPLETE ..."

    • The purpose of this EEM script is to track syslog messages.  Should the BFD adjacency to the other DC go down, it shuts down the overlay interface.  Inversely, when it sees the OSPF adjacency re-establish (the "LOADING to FULL" message), it performs a "no shutdown" on the overlay interface.
  4. Now to force the ordinal value on all the devices so that even VLANs and odd VLANs take the correct path (Note: Routers A, B, C, and D are indicated in the drawing above):
    • The OTV ISIS net identifier should be configured on all the OTV routers. Care should be taken when configuring the identifier so that all OTV routers will still recognize each other.
      OTV router A:
      otv isis Site
       net 49.0001.0001.0001.000a.00
      OTV router B:
      otv isis Site
       net 49.0001.0001.0001.000b.00
      OTV router C:
      otv isis Site
       net 49.0001.0001.0001.000c.00
      OTV router D:
      otv isis Site
       net 49.0001.0001.0001.000d.00
      The leading portion of the identifier (49.0001.0001.0001) must match across all OTV routers participating in the overlay, while the next-to-last byte group (000a through 000d above) is the part you modify per router. The lowest network identifier at a site will get ordinal number 0 and, in turn, forward the even-numbered VLANs. The highest network identifier at a site will get ordinal number 1 and forward the odd-numbered VLANs.
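
Pulling steps 1, 2, and 4 together, here's roughly what lands on each edge device (router A shown); a sketch that simply reuses the interface, OSPF process, and NET values from above:

interface GigabitEthernet0/0/3
 ! step 1: fast failure detection on the join interface
 bfd interval 50 min_rx 50 multiplier 5
!
router ospf 555
 ! step 2: register OSPF as a BFD client so the IGP (and the EEM script's
 ! syslog patterns) react to a dead peer in milliseconds
 bfd all-interfaces
!
! step 4: pin the AED ordinal by controlling the OTV IS-IS NET
otv isis Site
 net 49.0001.0001.0001.000a.00

Afterwards, "show bfd neighbors" and "show otv site" should confirm that BFD is up and that the ordinals landed on the devices you intended.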

Troubleshooting Tips

I had a major issue with duplicate IP addresses..so I'm going to shed some light on how this was resolved:

1. ARP-ND-CACHE...what does this do?  It essentially works like proxy ARP.  The ASRs are aware of what lives in the other DC, as it is learned via the OTV link.  In an attempt to cut down on unnecessary ARP traffic, should the ASR get ARP requests for hosts that it (the ASR) knows live across the OTV link, it'll answer the ARP requests itself with the MAC address it has within the arp-nd-cache.  Why?  If we didn't use the arp-nd-cache, we'd have a ton of ARP/broadcast traffic going across our WAN (not fun).  (The command for inspecting the cache is shown after this list.)
2. While reviewing the arp-nd-cache entries, we found that we had a gateway IP address local to one DC being learned via OTV...but with a different MAC address than what we had in the local DC...wuttt...
3. After logging into the other DC and tracking the MAC address, we found a trunk link to another switch with an SVI using the SAME IP ADDRESS.
4. How did this happen?  When hosts in VLAN 2, 4, 6, 237, 241, 242, or 1105 would ARP for their gateway, they would get two responses: aaaa.bbbb.cccc & 0000.0c07.ac04.  As a result of having two entries in the host ARP table, flapping occurs.
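
The cache itself can be inspected directly on the ASR; each entry is an IP-to-MAC binding that the router will proxy-answer for on behalf of hosts it learned across the overlay:

show otv arp-nd-cache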

LESSONS LEARNED: Weird crap happens when you extend your L2...there may be IP addresses out there configured for God knows what reason.  VERIFY YOUR ARP-ND-CACHE TABLE...or, if you have access to the hosts, do an "arp -a" to see what each host has in its ARP table.

Comments:

  1. Hi,

    Many thanks for this post. I am working on a similar setup right now and just have a small query: if the link between A & C goes down, then traffic will pass from A to B. I guess we only need to configure IP reachability between A and B for failover. Please share your thoughts on the configuration we need on the inter-link between A and B. Thanks.
  2. Great and insightful work.
    Very well done.