Friday, December 23, 2016

OTV implementation notes!

Quick fire hose to the face of terminology..

  • OTV edge device: Device that performs the OTV function and operations.  The OTV edge device receives L2 Ethernet traffic for all VLANs that need to be extended between OTV peers.
  • OTV internal interface: The "inside" interface that faces the local site and carries the VLANs extended through OTV.  This is typically a trunk (if configured on a Nexus switch).  
  • OTV join-interface: The L3 interface that is used in the overlay interface configuration.
  • OTV overlay interface: Logical interface where all the OTV configuration is placed.  It encapsulates the site L2 frames in IP unicast or multicast packets that are sent to other sites.
  • OTV site VLAN: This is used to discover other edge devices in a multihomed topology.  It is recommended that this VLAN NOT overlap with any other VLANs in the topology and that it be unique per location.  Furthermore, the authoritative edge device is elected by using this VLAN.
  • OTV site-id: Each OTV edge device located in the same site must be configured with the same site-identifier (site-id).  The site-id protects against the case where the OTV site VLAN is partitioned between the OTV edge devices in the same site.  If the site-id is NOT configured, the overlay interface will not come up!  
  • OTV authoritative edge device (AED): The AED is responsible for forwarding of L2 traffic including unicast, multicast, and broadcast traffic.  The AED is also responsible for the advertisement of mac-address-reachability to the remote locations for the VLANs it is active for.  The discovery elects an ordinal value of either 0 (zero) or 1 (one).  This value cannot be manually configured, but the AED that has the value of 0 will be the AED for the even-numbered VLANs; the AED with the ordinal value of 1 will be the AED for all odd-numbered VLANs. 

Example topology



In the topology above, we have a multihomed topology between two locations.  To tie the terminology points above together...
  • The edge devices are pretty straightforward..as they're labeled!
  • The OTV internal interfaces are gi0/0/2 on all the edge devices.
  • The OTV join interfaces are gi0/0/3 on all the edge devices.
  • The OTV overlay interface is Overlay1 on all the edge devices.
  • The OTV site vlan is 555 for both datacenters.  This is NOT best practice...I was just lazy and made sure to NOT allow VLAN 555 over the OTV overlay interface.
  • The OTV site-id is 0000.0000.1111 for the New York DC and 0000.0000.2222 for the Palo Alto DC.
  • The OTV authoritative edge device (AED) is determined as follows...
Note: DAL-BH01 and DAL-BH02 are the New York OTV edge devices 1 and 2, respectively:








As we can see, edge device 1 is the AED for vlan 231 and edge device 2 is the AED for vlan 200.  We can see why this is the case if we perform a "show otv site" on the devices:









As mentioned previously, the device with an ordinal of 0 is the AED for even VLANs and the device with an ordinal of 1 is the AED for odd VLANs.  Since edge device 1 has an ordinal of 1, it is the AED for VLAN 231 (an odd VLAN).  Conversely, edge device 2 is the AED for VLAN 200 (an even VLAN), since it has an ordinal of 0. 
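
To make the even/odd rule concrete, here is a small Python sketch of it.  The function name and the ordinal-to-device mapping are just for illustration; the mapping uses the ordinals reported for the New York site above.

```python
def aed_ordinal_for_vlan(vlan_id):
    """Return the ordinal (0 or 1) of the edge device that is AED for a VLAN."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    # Even VLANs belong to ordinal 0, odd VLANs belong to ordinal 1.
    return vlan_id % 2


# Ordinals as described for the New York site in this post.
ordinal_to_device = {1: "edge device 1", 0: "edge device 2"}

for vlan in (200, 231):
    print(vlan, "->", ordinal_to_device[aed_ordinal_for_vlan(vlan)])
# 200 -> edge device 2
# 231 -> edge device 1
```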

Configurations...

NOTE: These are OTV configurations on an ASR100X..which are somewhat different than on Nexus!

NOTE #2: I did NOT yet discuss the following configuration items...I'll come back to these shortly!
  • "otv fragmentation join-interface GigabitEthernet0/0/3"
  • " otv adjacency-server unicast-only"
  • " otv use-adjacency-server <IP address> unicast-only"

New York

Edge device 1:

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.1111
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 cdp enable
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
!
interface GigabitEthernet0/0/3
 description TO Palo Alto Edge device 1
 ip address 10.255.255.122 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip pim sparse-dense-mode
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv adjacency-server unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

Edge device 2

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.1111
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 cdp enable
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
!
interface GigabitEthernet0/0/3
 description TO Palo Alto Edge device 2
 ip address 10.255.255.126 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv use-adjacency-server 10.255.255.122 unicast-only
 otv adjacency-server unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

Palo Alto

Edge device 1:

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.2222
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 cdp enable
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
 !
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
 !
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
 !
!
interface GigabitEthernet0/0/3
 description TO New York Edge device 1
 ip address 10.255.255.121 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 ip pim sparse-dense-mode
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv use-adjacency-server 10.255.255.122 10.255.255.126 unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

Edge device 2:

otv site bridge-domain 555
otv fragmentation join-interface GigabitEthernet0/0/3
otv site-identifier 0000.0000.2222
!
interface GigabitEthernet0/0/2
 no ip address
 negotiation auto
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!
 service instance 555 ethernet
  encapsulation dot1q 555
  bridge-domain 555
!
interface GigabitEthernet0/0/3
 description To New York Edge device 2
 ip address 10.255.255.125 255.255.255.252
 no ip redirects
 no ip proxy-arp
 ip flow ingress
 ip flow egress
 no negotiation auto
 cdp enable
!
interface Overlay1
 no ip address
 otv join-interface GigabitEthernet0/0/3
 otv use-adjacency-server 10.255.255.122 10.255.255.126 unicast-only
 service instance 200 ethernet
  encapsulation dot1q 200
  bridge-domain 200
!
 service instance 231 ethernet
  encapsulation dot1q 231
  bridge-domain 231
!

YOU NEVER SAID WHAT THOSE THINGS THAT YOU DIDN'T SAY WHAT THEY WERE?!


  • "otv fragmentation join-interface GigabitEthernet0/0/3"
OTV adds a 42-byte header and sets the Do Not Fragment (DF) bit on all encapsulated packets.  In order to transport 1500-byte packets through the overlay, the transit network must support an MTU of 1542 or higher.  By default, OTV does not fragment packets.  To allow fragmentation across OTV, you must enable otv fragmentation join-interface <interface>.  Since I did not know whether the provider supported an MTU above 1500, I used this command to allow fragmentation.
  • " otv adjacency-server unicast-only"
OTV's "magic" boils down to the use of a control protocol to advertise MAC address reachability information and packet switching of IP encapsulated L2 traffic for data forwarding.  In addition, OTV uses a control protocol to map MAC address destinations to IP next hops.  This process can be thought of as MAC routing in which the destination is a MAC address and the next hop is an IP address.  Cool, huh?  Well, before we can start sharing MAC address reachability, all OTV devices must become "adjacent" with one another.  Adjacency can be formed in two ways, depending on the nature of the transport network interconnecting the sites: Multicast mode or Unicast mode.  Now, there is quite a bit of information on the various pros and cons of the two methods...but the underlying transport MUST support multicast to use Multicast mode (duh). 


"...I and some customers are torn between unicast and multicast-based OTV. For many sites, multicast-based OTV has clear benefits. On the other hand, many of us (well, me and a couple others I’ve talked to?)  feel that IPmc code in general is less mature, likely to be less robust, and it adds complexity, suggesting there is less overall risk to doing unicast-based OTV in the absence of any factors making IPmc-based more attractive. Such as “many” datacenter sites, or need to transport IPmc flows."

In our scenario, I wasn't entirely sure if the provider supported multicast and the deployment was a bit time sensitive (DC migration pending my OTV deployment), so I went with unicast!  If you're going to have >3 sites...the stance from Cisco is typically to go with the multicast mode.  

Knowing Cisco..this page will be 404 unknown URL within the year, so I apologize if/when it's down.  
  • " otv use-adjacency-server <IP address> unicast-only"
See above....this is where you specify the edge device to use as the adjacency server.  You can also configure multiple IP addresses for redundancy purposes: "otv use-adjacency-server 10.255.255.122 10.255.255.126 unicast-only."
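
Going back to the fragmentation point above, the MTU arithmetic is simple enough to sketch in Python.  The 42-byte overhead and the 1500-byte payload are the numbers from this post; the function names are made up for the example.

```python
OTV_OVERHEAD_BYTES = 42  # header size quoted in this post


def required_transit_mtu(payload_mtu=1500):
    """MTU the transit network needs so an encapsulated packet is not fragmented."""
    return payload_mtu + OTV_OVERHEAD_BYTES


def fits_without_fragmentation(transit_mtu, payload_mtu=1500):
    """True if the transit MTU can carry the payload plus the OTV header."""
    return transit_mtu >= required_transit_mtu(payload_mtu)


print(required_transit_mtu())            # 1542
print(fits_without_fragmentation(1500))  # False
print(fits_without_fragmentation(1542))  # True
```

If the second check is False for your provider link (or you simply don't know the answer), that is the case the fragmentation command is meant for.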


"How do I see what traffic is being learned over the OTV tunnel?"


First-Hop Routing Protocol Localization

"Wait...I have to send ALL my traffic over the OTV tunnel...what if I don't want to!?"

Well, you don't have to...but it ain't pretty.

The default setup would basically have FHRP exist in ONE data center, with all traffic to/from the hosts traversing the OTV interface.  There is a new feature that can be enabled under the overlay interface to filter FHRP messages: "otv filter-fhrp".  This would allow the same FHRP gateway IP address to exist in BOTH data centers simultaneously.

In addition, you're going to want to implement a MAC-ACL on the OTV routers to further prevent any HSRP communication, for example, between the sites.  This will allow you to have the SAME IP address without any end-host ARP table issues.  I personally ran this with HSRP on one end and a 6500 on the other end and ran into a TON of issues; hosts in one DC would ARP for their gateway and get two MAC addresses: one for the local Nexus core and one for the remote 6500 VSS core.  What is the resolution for this?  Run HSRP on the 6500, even though there are no other benefits to this, purely so that the gateway IP address resolves to the same virtual MAC address.

Special-Case Unicast Topology notes

If the drawing didn't give it away...we do not have a full mesh of AEDs in our scenario.  We have point-to-point links between ASR 1 in NY and ASR 1 in Palo Alto.  As a result...it is entirely possible that the OTV auto-selected ordinal values will NOT match between sites (as happened in our scenario).  For example, if you were to run a "show otv site" you would see that NY device 1 is the AED for odd VLANs and the Palo Alto device 1 is the AED for even VLANs.  


Crap...what do we do...this is an elected process!  Furthermore...this may even blackhole traffic!  Cisco provides a few steps to avoid these scenarios:

  1. Enable BFD on the join-interfaces: "bfd interval 50 min_rx 50 multiplier 5"
  2. Configure BFD within the IGP connecting the data centers: "router ospf 555" "bfd all-interfaces"
  3. Configure the following EEM script (Note: I had to modify the one Cisco provided in the documentation as I was not getting the "new adjacency" alert in the syslog.  Furthermore, the down notification was not "BFD peer node down" but rather "BFD node down").
    • no event manager applet WatchBFDdown
      no event manager applet WatchBFDup
      event manager environment _OverlayInt Overlay1
      event manager applet WatchBFDdown authorization bypass
       description "Monitors BFD status, if it goes down, bring OVERLAY int down"
       event syslog pattern "BFD node down" period 1
       action 1.0 cli command "enable"
       action 2.0 cli command "config t"
       action 2.1 syslog msg "EEM: WatchBFDdown will shut int $_OverlayInt"
       action 3.0 cli command "interface $_OverlayInt"
       action 4.0 cli command "shutdown"
       action 5.0 syslog msg "EEM WatchBFDdown COMPLETE ..."
      event manager applet WatchBFDup authorization bypass
       description "Monitors BFD status, if it goes up, bring OVERLAY int up"
       event syslog pattern "GigabitEthernet0/0/3 from LOADING to FULL" period 1
       action 1.0 cli command "enable"
       action 2.0 cli command "config t"
       action 2.1 syslog msg "EEM: WatchBFDup bringing up int $_OverlayInt"
       action 3.0 cli command "interface $_OverlayInt"
       action 4.0 cli command "no shutdown"
       action 5.0 syslog msg "EEM WatchBFDup COMPLETE ..."
    • The purpose of this EEM script is to track syslog messages.  Should the BFD adjacency to the other DC go down..it shuts down the overlay interface.  Conversely, when it sees the OSPF adjacency re-establish, it performs a "no shutdown" on the overlay interface.
  4. Now to force the ordinal value on all the devices so that even VLANs and odd VLANs take the correct path (Note: Router A, B, C, D are indicated in the drawing above):
    • The OTV ISIS net identifier should be configured on all the OTV routers. Care should be taken when configuring the identifier so that all OTV routers will still recognize each other.
      OTV router A:
      otv isis Site
       net 49.0001.0001.0001.000a.00
      OTV router B:
      otv isis Site
       net 49.0001.0001.0001.000b.00
      OTV router C:
      otv isis Site
       net 49.0001.0001.0001.000c.00
      OTV router D:
      otv isis Site
       net 49.0001.0001.0001.000d.00
      In the identifiers above, only the 000a through 000d portion differs between routers; the rest of the identifier must match across all OTV routers participating in the overlay. The lowest network identifier at a site will get ordinal number 0 and, in turn, forward the even-numbered VLANs. The highest network identifier at a site will get ordinal number 1 and forward the odd-numbered VLANs.
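
As a rough Python sketch of that last rule (lowest network identifier at a site gets ordinal 0, highest gets ordinal 1): the function name is made up, and the site grouping below (A and B in one site, C and D in the other) is an ASSUMPTION for the example, so check it against your own drawing.  Comparing the identifiers as lowercase strings only works because they all share the same fixed-width format.

```python
def assign_ordinals(site_nets):
    """Map router name -> ordinal for the routers of ONE site.

    site_nets is a dict of router name -> NET string. The lowest NET gets
    ordinal 0 (even VLANs), the next one gets ordinal 1 (odd VLANs).
    """
    ranked = sorted(site_nets, key=lambda name: site_nets[name].lower())
    return {name: ordinal for ordinal, name in enumerate(ranked)}


# ASSUMPTION: routers A and B share one site, C and D share the other.
site_1 = {
    "A": "49.0001.0001.0001.000a.00",
    "B": "49.0001.0001.0001.000b.00",
}
site_2 = {
    "C": "49.0001.0001.0001.000c.00",
    "D": "49.0001.0001.0001.000d.00",
}

print(assign_ordinals(site_1))  # {'A': 0, 'B': 1}
print(assign_ordinals(site_2))  # {'C': 0, 'D': 1}
```

Under that assumed grouping, A and C would both end up forwarding the even-numbered VLANs and B and D the odd-numbered ones, which is the alignment this step is after.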

Troubleshooting Tips

I had a major issue with duplicate IP addresses..so I'm going to shed some light on how this was resolved:

1. ARP-ND-CACHE...what does this do?  It essentially works like proxy-arp.  The ASRs are aware of what lives in the other DC, as it is learned via the OTV link.  In an attempt to cut down on unnecessary ARP traffic..should the ASR get ARP requests for hosts that it (the ASR) knows across the OTV link, it'll respond to the ARP requests with the MAC address it has within the arp-nd-cache.  Why?  If we didn't use arp-nd-cache, we'd have a ton of ARP/broadcast traffic going across our WAN (not fun).
2. While reviewing the arp-nd-cache entries, we found that we had a gateway IP address local to a DC being learned via OTV.....but with a different MAC address than what we have in the local DC...wuttt....
3. After logging into the other DC and tracking the MAC address, we found a trunk link to another switch with an SVI using the SAME IP ADDRESS.
4. How did this happen?  When hosts in VLAN 2,4,6,237,241,242, or 1105 would arp for their gateway they would get two responses: aaaa.bbbb.cccc &  0000.0c07.ac04.  As a result of having two entries in the host arp table, flapping occurs.

LESSONS LEARNED: Weird crap happens when you extend your L2....there may be IP addresses out there configured for god knows why.  VERIFY YOUR ARP-ND-CACHE TABLE...or, if you have access to the hosts, do an arp -a to see what the host has in its ARP table.
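
If you end up with a dump of IP-to-MAC entries (from the arp-nd-cache or from hosts), the check itself is easy to script.  This is only a sketch: the function name is mine, the IP addresses are hypothetical, and it assumes you have already parsed the entries into (ip, mac) pairs.  The two gateway MACs are the ones from the story above.

```python
from collections import defaultdict


def find_duplicate_ips(entries):
    """Return {ip: sorted list of MACs} for every IP seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in entries:
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical IPs; already parsed into (ip, mac) pairs.
entries = [
    ("10.0.4.1", "aaaa.bbbb.cccc"),
    ("10.0.4.1", "0000.0c07.ac04"),
    ("10.0.4.25", "0011.2233.4455"),
]

print(find_duplicate_ips(entries))
# {'10.0.4.1': ['0000.0c07.ac04', 'aaaa.bbbb.cccc']}
```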

Tuesday, November 22, 2016

Moving a service profile to a different UCS blade

Basic guide:
https://supportforums.cisco.com/document/29926/what-are-quick-steps-moving-service-profile-different-blade

First off, our scenario was a bit different.

Normally, one would simply disassociate the service profile and associate the service profile to the new blade....but this was a B200-M4 replacing a B200-M3.

You'll get an alert saying there is a BIOS issue if you try and associate the service profile..the issue here has to do with the host firmware package.  The default one being used in this case did not have a software package to support the M4 blade.

Since the service profile was created from the template, we can modify the service profile to point to a newly created host firmware package without impacting the other blades.

Once complete...it goes through the normal process of associating the SP to a blade...with some errors.  The KVM at this point is displaying the pre-POST message "Configuring and verifying memory."  There are also a number of errors...mainly that the memory and processors were using unknown or unsupported FRUs.

My first thought was to update the capability catalog..no change.

Then I found these release notes:

http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/release/notes/ucs_2_2_rn.html#pgfId-390503

The column on the left indicates the processor type with a minimum and recommended software version...we were running 2.2(6c).

Whoops!

As seen above, I updated the blade package on the host firmware package associated with the B200-M4 servers.  That way, as long as I associate the same host firmware package, it will use the correct version.  

Now, with the host firmware package "b200m4" updated, it re-associated the service profile, rebooted, and came up successfully.

Tuesday, November 15, 2016

Storage notes!

Intro

So far I've come across/implemented FC, FCoE, iSCSI, and NFS.  Here are some basic notes on those technologies!

FCoE

Implementation on a Nexus 5K..

We typically will have two VSANs and two FCoE VLANs, for example:

With regards to UCS..your cabling should look something like this:



Note: These should be separate links from the ones used to connect the FIs to the Nexus 5Ks for Ethernet connectivity (cross-connecting to the N5Ks & using vPC).

Now that we're cabled...

1. Enable the features we need:

feature lacp
feature npiv 
feature fcoe


2. Create the FCoE VLANs (304 & 305 on FAB-A & FAB-B, respectively):

Fabric A
vlan 304
name FCoE-VLAN_304 
exit

Fabric B
vlan 305
name FCoE-VLAN_305
exit


3. Create the VSANs (4 & 5 on FAB-A & FAB-B, respectively) and associate the VSAN with FCoE VLANs

Fabric A
vsan database 
vsan 4 
vsan 4 name General-Storage 
exit 

vlan 304
fcoe vsan 4 
exit

Fabric B
vsan database 
vsan 5 
vsan 5 name General-Storage 
exit 

vlan 305
fcoe vsan 5
exit


4. Create a port-channel containing physical member interfaces:

interface ethernet2/1 
description FCoE Link to FI-A eth1/33 
channel-group 33 mode active 
no shutdown 

interface ethernet2/2 
description FCoE Link to FI-A eth1/34 
channel-group 33 mode active 
no shutdown

Note: Perform the same action on FAB-B N5K


5. Configure the port-channel to trunk and allow the FCoE VLAN:

interface port-channel 33 
description FCoE EtherChannel Link to FI-A 
switchport mode trunk 
switchport trunk allowed vlan 304 
spanning-tree port type edge trunk

Note: Perform the same action on FAB-B N5k


6. Create a virtual fibre channel (vfc) interface, bind it to the port-channel we just created, and allow the VSAN associated with the fabric:

interface vfc 33 
bind interface port-channel33
switchport trunk allowed vsan 4 
switchport mode F 
no shutdown

Note: Perform the same action on FAB-B N5K
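
Since steps 2 through 6 are repeated on FAB-B with only the VLAN and VSAN numbers changed, here is a rough Python sketch that renders the per-fabric lines from one table.  The function name is made up, and reusing port-channel/vfc number 33 on FAB-B is an ASSUMPTION for the example (the post only shows FAB-A's numbering).  The command lines themselves are copied from the steps above.

```python
# VLAN/VSAN pairs from this post.
FABRICS = {
    "A": {"fcoe_vlan": 304, "vsan": 4},
    "B": {"fcoe_vlan": 305, "vsan": 5},
}


def render_fcoe_config(fabric, port_channel):
    """Render the FCoE VLAN, port-channel and vfc lines for one fabric."""
    values = FABRICS[fabric]
    lines = [
        "vlan {fcoe_vlan}".format(**values),
        "fcoe vsan {vsan}".format(**values),
        "exit",
        "interface port-channel {}".format(port_channel),
        "switchport mode trunk",
        "switchport trunk allowed vlan {fcoe_vlan}".format(**values),
        "spanning-tree port type edge trunk",
        "interface vfc {}".format(port_channel),
        "bind interface port-channel{}".format(port_channel),
        "switchport trunk allowed vsan {vsan}".format(**values),
        "switchport mode F",
        "no shutdown",
    ]
    return "\n".join(lines)


# ASSUMPTION: FAB-B also uses port-channel/vfc 33.
print(render_fcoe_config("B", 33))
```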


7. Verify..

Some useful show commands...

show flogi database
Use this command to see fabric logins.  We'll use this information to set up our zoning!  If you aren't seeing logins...check host connectivity.  For the case of UCS & the FIs, you'll see multiple FLOGIs on the same vfc.  
show zoneset active
Once the zones have been created and we see FLOGIs, I'll use this command to verify my zoneset has been committed.  If there is an issue with the FLOGI, you'll likely NOT see an FCID associated with the zone member.  If you set up your zoning and don't configure FCoE..you'll simply see the PWWN/Alias you configured but no FCID.
show interface vfc38
This will give you an indicator if your trunks are allowing your VSANs and the current state.  I had a scenario where the state was "initializing" on my FCoE VLAN.  Further investigation found that the host connected to the vfc and the respective physical interface had NOT performed a FLOGI.  A rescan did not prove to be helpful...but a reboot did :)

As seen here..our vfc is 38, bound interface is eth1/38, and we're allowing VSAN across the trunk and it is up:





8. Wait....what does my host using FCoE need to do?  

Well, when you configure your CNA/HBA to use FCoE, you'll need to tell it which FCoE VLAN to use.  The benefit to the CNA is that it can carry both FCoE traffic as well as ethernet traffic.

9. Lastly....zoning..I'm just not sure how to do it..



The gist: Each initiator needs to have access ONLY to its target.  In the above example, we have 3 zones per fabric (3 initiators) and we've allowed the hosts access to both storage processors on our storage array.  Why?  Well, while the LUN on the storage array is "owned" by a storage processor, we want the host to have access to both storage processors, should one path become unavailable.
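
The zoning layout described above (one zone per initiator, each zone containing that initiator plus both storage processors) can be sketched in Python like this.  All names (host1, array_spa, and so on) are hypothetical stand-ins for your own aliases, and this only builds the plan, it does not generate switch commands.

```python
def build_zones(initiators, targets):
    """One zone per initiator; each zone holds that initiator plus every target."""
    return {"zone_" + name: [name] + list(targets) for name in initiators}


# Hypothetical aliases: 3 hosts and the 2 storage processors of one array.
initiators = ["host1", "host2", "host3"]
targets = ["array_spa", "array_spb"]

zones = build_zones(initiators, targets)
for zone_name, members in zones.items():
    print(zone_name, members)
# zone_host1 ['host1', 'array_spa', 'array_spb']
# zone_host2 ['host2', 'array_spa', 'array_spb']
# zone_host3 ['host3', 'array_spa', 'array_spb']
```

The point of the layout is that no zone ever contains two initiators, so the hosts can reach the array but not each other.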


FC

Wait, why should I use FC?  Well---this one is open to debate.  Some people think that the requirement for FC is dying.  While we typically have 8Gb or 16Gb FC connections..there seems to be a race between FC and ethernet.  Back in the day, FC was the declared winner..but with ethernet capabilities allowing for 10Gb, 25Gb, 40Gb, 100Gb...it's possible that we may no longer see native FC!  But for the time being...it's pretty damn easy and straightforward to set up (which is why a lot of people care for it!)

That being said...cable that thing in the same way you would FCoE:
Furthermore, our N5Ks will have crossconnected paths to our storage like the FCoE diagram.  One thing worth noting....if we have a "UP" or Unified Ports capability 5K, changing the port type from ethernet to FC requires a reboot!  We also work left to right for ethernet and right to left for FC!

Now for the configuration..it's pretty darn straightforward:

1. Enable the required features:

feature npiv 
feature fport-channel-trunk 
feature fcoe

2. Configure the port-channel (in this case it's to the FIs)..With NPIV enabled, you must assign a virtual SAN (VSAN) to the SAN port channels that connect to the fabric interconnects:

interface san-port-channel 29 
channel mode active 
switchport trunk mode on 
switchport trunk allowed vsan 1
switchport trunk allowed vsan add 4

3. Add the SAN port channel to an existing VSAN database on the data center core Cisco Nexus 5500UP-A switch:

vsan database 
vsan 4 
interface san-port-channel 29

4. On the data center core Cisco Nexus 5500UP-A switch, configure the SAN port channel on physical interfaces. The Fibre Channel ports on the Cisco Nexus 5500UP switches are set to negotiate speed by default. 

interface fc1/29
switchport trunk mode on 
channel-group 29 force 
!
interface fc1/30 
switchport trunk mode on 
channel-group 29 force


Misc.


 As performed from an MDS switch.  Here we can see that the local domain ID of this particular switch is 0x91 and the peer switch is 0x63.  Any hosts that perform a FLOGI to this switch will be given an FCID with the 0x91 prefix.

For example...there is a host logged in on fc2/3.  The 0x910000 is the FCID for this host.  We can tell from this output that this device is directly attached to the SAN switch "MDS1."
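
Here is a small Python sketch of how an FCID like 0x910000 breaks down.  An FCID is 3 bytes: domain ID, area, and port, one byte each.  The function name is just for illustration.

```python
def split_fcid(fcid):
    """Split a 24-bit FCID into its (domain, area, port) bytes."""
    if not 0 <= fcid <= 0xFFFFFF:
        raise ValueError("FCID must fit in 24 bits")
    domain = (fcid >> 16) & 0xFF  # top byte: domain ID of the switch
    area = (fcid >> 8) & 0xFF     # middle byte
    port = fcid & 0xFF            # low byte
    return domain, area, port


domain, area, port = split_fcid(0x910000)
print(hex(domain), hex(area), hex(port))  # 0x91 0x0 0x0
```

The domain byte matching the switch's own domain ID (0x91 here) is what tells us the host logged in directly on that switch.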


From the output below, we can determine that the MDS switch, "MDS3" has two equal paths to the 0x91 domain, via fc2/13 and fc2/14.

Below, we can see the output of the show run on the SAN switch facing ports on a UCS FI.  As indicated, mode "NP" equates to node proxy.  What does this do for us?  Well, since the domain ID portion of the FCID is only 1 byte long, there is a theoretical maximum of 256 domain IDs in a fabric (note: there are some reserved values, so the actual number is less than this).  Node proxy allows the FI to proxy FLOGIs in a fashion similar to NAT, rather than consuming a domain ID of its own.


Below, NPIV (Node port ID virtualization) is the magic that allows us to have multiple fabric logins on an individual host-facing port and NPV (Node port virtualization) is the proxy portion.

iSCSI

To come..

NFS

To come..

Friday, August 5, 2016

Python notes as it pertains to networking!

Getting it setup on my Raspberry Pi
So python comes pre-installed on the raspberry pi!  Neat..but after following some guides here...

https://pynet.twb-tech.com/blog/automation/netmiko.html

I've decided to give this a try to see if I can use Python to SSH into my existing devices.  This will help facilitate anything I'll need to do down the road!  One thing I've read is that one of the benefits Python has over PowerShell is that Python supports SSH!

So one thing that I've learned (in my ignorant Python experience) is that you can use the Paramiko library to SSH into "stuff."  I tried messing around with it and was able to successfully SSH into my home 2811...but the syntax was...not very user-friendly (for someone who is as new to Python as I am).

I wanted to get into using "Netmiko" (see link above)..but found that it wouldn't work!  Now, I'll go over some things I tried..

1. I downloaded the setup.py file and manually installed it "python setup.py install --user."
2. Once it successfully installed, I tried launching python via "python" to see if it would import the ConnectHandler function from the Netmiko module.
3. No bueno!  Every time I'd get to calling the ConnectHandler...I'd get the following error:


No handlers could be found for logger "paramiko.transport"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/admin/.local/lib/python2.7/site-packages/netmiko-0.5.6-py2.7.egg/netmiko/ssh_dispatcher.py", line 84, in ConnectHandler
    return ConnectionClass(*args, **kwargs)
  File "/home/admin/.local/lib/python2.7/site-packages/netmiko-0.5.6-py2.7.egg/netmiko/base_connection.py", line 68, in __init__
    self.establish_connection(verbose=verbose, use_keys=use_keys, key_file=key_file)
  File "/home/admin/.local/lib/python2.7/site-packages/netmiko-0.5.6-py2.7.egg/netmiko/base_connection.py", line 177, in establish_connection
    self.remote_conn_pre.connect(**ssh_connect_params)
  File "/home/admin/.local/lib/python2.7/site-packages/paramiko-2.0.2-py2.7.egg/paramiko/client.py", line 338, in connect
    t.start_client()
  File "/home/admin/.local/lib/python2.7/site-packages/paramiko-2.0.2-py2.7.egg/paramiko/transport.py", line 493, in start_client
    raise e
AttributeError: 'EntryPoint' object has no attribute 'resolve'

Now, a little googling presented the following:

https://github.com/mozilla/sops/issues/67

Should the link die..

"Yep, downgrading to cryptography 1.2.1 fixed it. Thank you!"

Also, after discussing my conundrum in a reddit post, someone else recommended installing PIP, a Python package manager.

sudo apt-get install python-pip

Hindsight is 20/20..but I could have simply used PIP to install netmiko via "pip install netmiko."

So finally, with PIP installed..

sudo pip install cryptography==1.2.1

Huzzah, it works!

>>> from netmiko import ConnectHandler
>>> my_router = {
...     'device_type': 'cisco_ios',
...     'ip': '10.0.0.1',
...     'username': 'admin',
...     'password': '<password here>',
...     }
>>> net_connect = ConnectHandler(**my_router)
>>> output = net_connect.send_command("show ip int brief")
>>> print output
Interface                  IP-Address      OK? Method Status                Protocol
FastEthernet0/0            <My public>    YES DHCP   up                    up
FastEthernet0/1            unassigned      YES NVRAM  administratively down down
FastEthernet1/0            unassigned      YES unset  up                    up
FastEthernet1/1            unassigned      YES unset  up                    up
FastEthernet1/2            unassigned      YES unset  up                    down
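
For reuse outside the interactive prompt, the same session could be wrapped in a couple of small functions.  This is only a sketch: build_device and run_command are names I made up, it is written for Python 3 (the session above was Python 2.7), and it assumes netmiko is installed and the device is reachable.  ConnectHandler, send_command and disconnect are the netmiko calls; the import sits inside run_command so the dictionary helper can be used on its own.

```python
def build_device(ip, username, password, device_type="cisco_ios"):
    """Build the connection dictionary that ConnectHandler expects."""
    return {
        "device_type": device_type,
        "ip": ip,
        "username": username,
        "password": password,
    }


def run_command(device, command):
    """Connect, run one show command, and always close the SSH session."""
    # Imported here so build_device works even without netmiko installed.
    from netmiko import ConnectHandler

    connection = ConnectHandler(**device)
    try:
        return connection.send_command(command)
    finally:
        connection.disconnect()
```

Usage would look like print(run_command(build_device("10.0.0.1", "admin", "<password here>"), "show ip int brief")).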

Wednesday, June 1, 2016

Using robocopy for file system migrations

The idea..

We have file system A (old) & file system B (new).  We wish to migrate the files & permissions of the OLD file system (on an old array) to the NEW file system (on a new array).

I've found that we want to copy as HIGH as we can get in the file system, so that robocopy can not only copy the directory and files..but ALSO the permissions!

In this particular implementation the old file system is called "fileserv1" and the new file system is called "fileserv2"

This particular customer's file system is only used for home drives that are managed via AD.  The customers know this drive as their "M:/ drive."

Now, as you can imagine, the directory that holds all of these sub-user directories is quite massive; there is a folder for EVERY user.  In addition to each user's folder, the folder contains the security permissions of the individual + domain admin..we want robocopy to do all this work for us!

What is Robocopy?

Robocopy (the name is short for Robust File Copy) was introduced with the Windows Server 2003 Resource Kit and is included in all editions of Windows 7. Its many strengths include the ability to copy all NTFS file attributes and to mirror the contents of an entire folder hierarchy across local volumes or over a network. If you use the right combination of options, you can recover from interruptions such as network outages by resuming a copy operation from the point of failure after the connection is restored. 

The Robocopy syntax takes some getting used to. If you’re familiar with the standard Copy and Xcopy commands, you’ll have to unlearn their syntax and get used to Robocopy’s unconventional ways. The key difference is that Robocopy is designed to work with two directories (folders) at a time, and the file specification is a secondary parameter. In addition, there are dozens of options that can be specified as command-line switches.

Shamelessly stolen from Microsoft's TechNet...see here for the full article!

Preliminary

Once I've created the CIFS share per best practices (HIDE the ETC and the Lost & Found!), I'm going to navigate to the highest level that I can navigate to and mirror the permissions.


The reason that I've set the permissions on this to "Domain Admins" is so that when robocopy copy/creates the subdirectories..I want each directory to inherit the "Domain Admins" permission so that any user in the Domain Admins group can access/manage the files.  In addition, we do NOT want any non-Domain Admin user to have access to the "myhomefolder."  

Furthermore, where do we run the script?  It does not require much..so I'd simply pick a windows server that has access to both source/destination file systems.  Be sure to log into the server as a domain admin! 

NOTE: If you're copying in a windows environment, you're going to have to pass credentials BEFORE you can use robocopy

The syntax for that is..

net use \\server\ipc$ /user:username password

Example: "net use \\10.1.5.1\c$\Users\Bob\Desktop /user:john.doe Ilovepotatoes"

The script

:loop
robocopy C:\Users\kbarnes\Desktop\Source C:\Users\kbarnes\Desktop\Destination /MT /copyall /MIR /TEE /FFT /R:1 /W:5 /XD ~snapshot /NP /log+:C:\Users\kbarnes\Desktop\Robo.txt
echo complete
echo %date% %time%
goto loop

I feel it is probably best to just present what I use and try to explain its functionality.  I'm using this script as an example, hence the source/destination of two folders on my computer's desktop.  In a REAL scenario, it may look something like this:

robocopy \\<source FS>\myhomefolder \\<destination FS>\c$\cifs_home_fs\myhomefolder

Now...for the "weird" part: Robocopy switches.

Switches are the parameters that the administrator can modify to make Robocopy do what he/she wants.  It is honestly as simple as that.  My example is one that I've been using that works for me and I'll explain:

  • /MT This is Robocopy's "multithread" switch.  The default is 8 simultaneous file copies.  We can use a value between 1-128; "/MT:32", for example, uses 32 threads.
  • /copyall This copies all.  Yea, no shit.  What I mean is that it copies every piece of file information and not just the data: attributes, timestamps, NTFS permissions (ACLs), owner, and auditing information.
  • /MIR This mirrors a directory tree.  It copies subdirectories, including empty ones, and removes files and directories from the destination that no longer exist in the source.
  • /TEE Writes the status output to the console window, as well as to the log file.
  • /FFT Assumes FAT file times (two-second precision), so timestamps that differ by up to two seconds are treated as equal.
  • /R:1 Specifies the number of retries on failed copies.  The default is 1,000,000 (one million retries).  We used a retry value of "1" as we found that we'd rather pick up the failed files in another sweep than wait.
  • /W:5 Specifies the wait time between retries, in seconds.  The default is 30 (wait time 30 seconds).  Same reasoning as the retry count...we'd rather not wait forever on a single file.
  • /XD ~snapshot We found that there is NO reason to copy the snapshot directory.  Using /XD ~snapshot excludes the hidden snapshot directory.
  • /NP This suppresses the per-file progress percentage of the copy job.  We just found it not necessary in our uses.
  • /log+ When using log+ followed by a location, the output is sent to a text file.  The + sign says to append any output to the existing file, versus creating a new one/replacing the old one every cycle.
Lastly, I've included some things in the script to make it loop and to output time, dates, and a string value of "complete." 

Example use of script


1. Log into a Windows server as a domain admin user.
2. Right-click on the .bat file and "Run as Administrator."
3. Assuming your script is set up correctly, you should see the log file pop up wherever you've indicated.

In my example, I have two folders: Source and Destination.  I've put in a couple files in the source and I want to see them pop over to the destination!


After we've run the script we can see a couple of things:  The text files have been copied and the "Date modified" has also been copied!  If we had permissions on the text files, they would have been copied over as well!

4. Review the output log


Here, we can see that the 8 files have been completed successfully and that there were no failures!
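
If you'd rather not eyeball the log every time, the same verdict is available from Robocopy's exit code, which is a bitmask.  Below is a minimal Python sketch that decodes it.  The bit meanings are the documented ones; the function name describe_exit_code is just something I made up for illustration.

```python
# Robocopy's exit code is a bitmask; each bit reports one condition.
FLAGS = {
    1: "files copied",
    2: "extra files or directories found at the destination",
    4: "mismatched files or directories found",
    8: "some files or directories could not be copied",
    16: "serious error, nothing was copied",
}


def describe_exit_code(code):
    """Return (ok, reasons) for a Robocopy exit code.

    ok is True when no failure bit (8 or 16) is set.
    """
    reasons = [text for bit, text in FLAGS.items() if code & bit]
    if not reasons:
        reasons = ["nothing to copy, source and destination already match"]
    return code < 8, reasons
```

In the .bat file the same value shows up in %ERRORLEVEL% right after the robocopy line, so anything at 8 or above means at least one file did not make it.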

What if we run the script again...will it attempt to copy the files a second time?

Here we can see that it skipped ALL the files, as there was nothing new on the source directory!

What if we change one of the files?


Here, I simply put some random text into one of the source text files called "New Text Document - Copy.txt"

Let's run the script again...


NICE!  We can see that it skipped the 7 unmodified files but copied over the one file that was changed!  We can also see the file that was changed with an indicator that it is "Newer."

Lastly..what if we delete one of the files on the source?

Here we can see that the file we deleted on the source is shown as "EXTRA File."  If we navigate to the destination directory "Destination" we can confirm that the file we deleted on the source has also been deleted on the destination.  This is made possible by the /MIR switch we used.

Yep!  It's gone!
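
To tie the three test runs together (skip, copy the changed file, delete the extra), here is a rough Python model of the decision a mirror pass makes.  This is a simplified sketch and NOT how Robocopy is implemented: it only looks at file names and modified times, whereas the real tool also compares file sizes.  The function name mirror_plan and the dictionary layout are made up for illustration.

```python
def mirror_plan(source, destination, tolerance=2.0):
    """Classify files the way a mirror pass would (simplified model).

    source and destination map file name -> last-modified time in seconds.
    tolerance mimics /FFT: timestamps within two seconds count as equal.
    """
    plan = {"copy": [], "skip": [], "delete": []}
    for name, mtime in sorted(source.items()):
        if name not in destination:
            plan["copy"].append(name)      # new file on the source
        elif abs(mtime - destination[name]) > tolerance:
            plan["copy"].append(name)      # changed since the last pass
        else:
            plan["skip"].append(name)      # unchanged, nothing to do
    for name in sorted(destination):
        if name not in source:
            plan["delete"].append(name)    # "EXTRA File", purged by /MIR
    return plan
```

Calling it with an unchanged file, a modified file, and a file that only exists on the destination gives one entry in each of the three buckets, which is exactly what the log showed across the runs above.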

How to perform the migration?

It is important to know that the first pass is quite time consuming.  It has to copy EVERY file/directory that we've indicated to the destination.  You can absolutely do this during business hours.  Because of the loop in the .bat file, it's going to start over as soon as it finishes.  I typically like to kick it off as I'm leaving for the day so that I can come in and verify the progress.  

Here is a real world example of the first pass:

Note: This first pass took ~5 hours.  There were some skips because I did a couple tests before running the full on script.


Once we get the first pass done, I let robocopy continue to run until I'm ready to begin my migration.  The migration IS impacting.  We have to somehow quiesce writes to the old filesystem..or else we'll lose some files in the migration.  I've found that this can be achieved by making the source file system read-only (DO THIS AFTER HOURS TO SAVE YOU A HEADACHE!).  Once the file system is configured as read-only...verify.  Can you create a new file or does it bark at you?  If it barked at you..GOOD!
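
The "does it bark at you?" test can be scripted too.  Here is a small Python sketch that tries to create a scratch file in a directory and reports whether the write was refused.  The function name is_writable is made up, and the path you pass in would be your own share.

```python
import tempfile


def is_writable(directory):
    """Return True if a scratch file can be created in directory."""
    try:
        # The scratch file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers permission errors and read-only file systems.
        # A path that does not exist also lands here and returns False.
        return False
```

After flipping the source to read-only, this should return False for the share.  If it returns True, writes are still getting through and the final pass is not safe yet.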

Once the source file system is read-only, run robocopy ONE FINAL TIME!  Since it is in read-only mode, there should be no new files after running this script.  If you're insane and want to be double-safe...run it one more time after this step.  We should see ALL skips and 0 copies.
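
If you want that "ALL skips and 0 copies" check without scrolling through the log, the sketch below pulls the counters out of the summary table Robocopy writes at the end of each run.  It assumes the usual "Files :" row with six whole-number columns (Total, Copied, Skipped, Mismatch, FAILED, Extras); the function names are made up for illustration.  Since /log+ keeps appending to the same file, it reads the LAST summary in the file.

```python
import re

COLUMNS = ("total", "copied", "skipped", "mismatch", "failed", "extras")


def files_summary(log_text):
    """Return the counters from the last 'Files :' summary row, or None."""
    rows = re.findall(
        r"^\s*Files\s*:\s*((?:\d+\s+){5}\d+)\s*$", log_text, re.MULTILINE
    )
    if not rows:
        return None
    return dict(zip(COLUMNS, map(int, rows[-1].split())))


def final_pass_is_clean(log_text):
    """True when the last run copied nothing, failed nothing, skipped all."""
    summary = files_summary(log_text)
    if summary is None:
        return False
    return (
        summary["copied"] == 0
        and summary["failed"] == 0
        and summary["skipped"] == summary["total"]
    )
```

Feed it the contents of the log file (Robo.txt in my example) after the final run.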

Now the next part really depends on the environment.  I've had some environments that either use DNS or DFS (Distributed File System).  In this (and my last) file system migration it was via DNS.  If the computers are using "FILESERVER" as the hostname to point to the source filesystem..simply update the DNS entry to make "FILESERVER" point to the IP address of the new CIFS server.
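
Once the record is changed, it's worth confirming that the name really resolves to the new server from a client's point of view.  A minimal Python sketch is below.  The function name points_at is made up, and the hostname and IP you pass in are whatever applies to your environment.

```python
import socket


def points_at(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True if hostname currently resolves to expected_ip."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # A failed lookup (socket.gaierror) is an OSError subclass.
        return False
```

Keep in mind that a client may keep returning the old address from its resolver cache until the record's TTL expires, so a False right after the change does not necessarily mean the DNS update is wrong.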

Lastly.....do NOT TURN OFF THE OLD FILE SYSTEM!  Should a user call post-migration that a file is missing/corrupt...have the old file system handy for recovery purposes (SHOULDN'T be needed..but I'm paranoid).

Good luck!