Basic guide:
https://supportforums.cisco.com/document/29926/what-are-quick-steps-moving-service-profile-different-blade
First off, our scenario was a bit different.
Normally, you would simply disassociate the service profile from the old blade and associate it with the new one...but this was a B200-M4 replacing a B200-M3.
If you try to associate the service profile, you'll get an alert saying there is a BIOS issue. The problem here has to do with the host firmware package: the default one used in this case did not include a software package that supports the M4 blade.
Since the service profile was created from a template, we can modify the service profile to point to a newly created host firmware package without impacting the other blades.
Once that's complete, it goes through the normal process of associating the SP with a blade...with some errors. At this point the KVM displays the pre-POST message "Configuring and verifying memory." There are also a number of faults, mainly that the memory and processors are unknown or unsupported FRUs.
My first thought was to update the capability catalog...no change.
Then I found these release notes:
http://www.cisco.com/c/en/us/td/docs/unified_computing/ucs/release/notes/ucs_2_2_rn.html#pgfId-390503
The column on the left indicates the processor type, along with the minimum and recommended software versions...we were running 2.2(6c).
Whoops!
As seen above, I updated the blade package in the host firmware package associated with the B200-M4 servers. That way, as long as I associate that same host firmware package, the blades will use the correct version.
Now, with the host firmware package "b200m4" updated, UCS re-associated the service profile and rebooted the blade, and it came up successfully.
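The release-notes check above boils down to comparing UCS Manager-style version strings such as 2.2(6c) against a minimum. A minimal sketch, assuming every version follows the major.minor(build + optional patch letter) pattern; `parse_ucs_version` and `meets_minimum` are hypothetical helpers, and the 2.2(7b) minimum in the example is a placeholder, not the value from Cisco's release notes, so look up the real one for your processor.

```python
import re
from typing import Tuple

# Matches UCS-style versions such as "2.2(6c)" or "3.1(2b)".
# Assumption: the format is always major.minor(build[patch letter]).
_UCS_VERSION = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]?)\)$")


def parse_ucs_version(version: str) -> Tuple[int, int, int, str]:
    """Split a UCS version string into parts that sort correctly as a tuple."""
    match = _UCS_VERSION.match(version.strip())
    if not match:
        raise ValueError(f"Unrecognized UCS version: {version!r}")
    major, minor, build, patch = match.groups()
    return int(major), int(minor), int(build), patch


def meets_minimum(running: str, minimum: str) -> bool:
    """True if the running version is at or above the required minimum."""
    return parse_ucs_version(running) >= parse_ucs_version(minimum)


if __name__ == "__main__":
    # Placeholder minimum, for illustration only.
    print(meets_minimum("2.2(6c)", "2.2(7b)"))  # False
```

Tuples compare element by element, so the build number is checked before the patch letter; that mirrors how these releases are ordered.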
Tuesday, November 22, 2016
Tuesday, November 15, 2016
Storage notes!
Intro
So far I've come across/implemented FC, FCoE, iSCSI, and NFS. Here are some basic notes on those technologies!
FCoE
Implementation on a Nexus 5K..
We typically will have two VSANs and two FCoE VLANs, for example:
With regard to UCS, your cabling should look something like this:
Note: These should be separate links from the ones used to connect the FIs to the Nexus 5Ks for Ethernet connectivity (cross-connected to the N5Ks & using vPC).
Now that we're cabled...
1. Enable the features we need:
feature lacp
feature npiv
feature fcoe
2. Create the FCoE VLANs (304 & 305 on FAB-A & FAB-B, respectively):
Fabric A
vlan 304
name FCoE-VLAN_304
exit
Fabric B
vlan 305
name FCoE-VLAN_305
exit
3. Create the VSANs (4 & 5 on FAB-A & FAB-B, respectively) and associate the VSAN with FCoE VLANs
Fabric A
vsan database
vsan 4
vsan 4 name General-Storage
exit
!
vlan 304
fcoe vsan 4
exit
Fabric B
vsan database
vsan 5
vsan 5 name General-Storage
exit
!
vlan 305
fcoe vsan 5
exit
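Because the two fabrics differ only in their VLAN/VSAN numbers, the step 2 and step 3 configuration can be rendered from a small table instead of being typed twice. A minimal sketch, assuming the per-fabric values used above (VLAN 304/VSAN 4 on Fabric A, VLAN 305/VSAN 5 on Fabric B); `FABRICS` and `render_fcoe_vlan_vsan` are hypothetical names, and the function only prints NX-OS commands, it does not push anything to a switch.

```python
# Per-fabric FCoE VLAN / VSAN pairs from the steps above.
FABRICS = {
    "A": {"vlan": 304, "vsan": 4},
    "B": {"vlan": 305, "vsan": 5},
}


def render_fcoe_vlan_vsan(vlan: int, vsan: int, vsan_name: str = "General-Storage") -> str:
    """Return NX-OS commands that create the VSAN and the FCoE VLAN, then map them."""
    lines = [
        "vsan database",
        f"  vsan {vsan}",
        f"  vsan {vsan} name {vsan_name}",
        "exit",
        f"vlan {vlan}",
        f"  name FCoE-VLAN_{vlan}",
        f"  fcoe vsan {vsan}",  # ties this FCoE VLAN to the fabric's VSAN
        "exit",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    for fabric, ids in FABRICS.items():
        print(f"! Fabric {fabric}")
        print(render_fcoe_vlan_vsan(ids["vlan"], ids["vsan"]))
```

Keeping the numbers in one table makes it harder to accidentally map Fabric B's VLAN to Fabric A's VSAN.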
4. Create a port-channel containing physical member interfaces:
interface ethernet2/1
description FCoE Link to FI-A eth1/33
channel-group 33 mode active
no shutdown
!
interface ethernet2/2
description FCoE Link to FI-A eth1/34
channel-group 33 mode active
no shutdown
Note: Perform the same action on FAB-B N5K
5. Configure the port-channel to trunk and allow the FCoE VLAN:
interface port-channel 33
description FCoE EtherChannel Link to FI-A
switchport mode trunk
switchport trunk allowed vlan 304
spanning-tree port type edge trunk
Note: Perform the same action on FAB-B N5k
6. Create a virtual fibre channel (vfc) interface, bind it to the port-channel we just created, and allow the VSAN associated with the fabric:
interface vfc 33
bind interface port-channel33
switchport trunk allowed vsan 4
switchport mode F
no shutdown
Note: Perform the same action on FAB-B N5K
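Steps 4 through 6 follow the same pattern on each N5K, so they can also be rendered from a few parameters. A minimal sketch, assuming Fabric A's values from above (ethernet2/1-2, port-channel 33, VLAN 304, VSAN 4) and that the vfc number matches the port-channel number as it does here; Fabric B's member interfaces and port-channel aren't given in the post, so they are parameters. `render_fcoe_uplink` is a hypothetical helper, and it drops the FI port number from the member descriptions for brevity.

```python
from typing import List


def render_fcoe_uplink(members: List[str], pc_id: int, vlan: int, vsan: int,
                       fi_name: str) -> str:
    """Render member interfaces, the trunking port-channel, and the bound vfc."""
    lines: List[str] = []
    for member in members:
        lines += [
            f"interface {member}",
            f"  description FCoE Link to {fi_name}",
            f"  channel-group {pc_id} mode active",  # LACP
            "  no shutdown",
            "!",
        ]
    lines += [
        f"interface port-channel {pc_id}",
        f"  description FCoE EtherChannel Link to {fi_name}",
        "  switchport mode trunk",
        f"  switchport trunk allowed vlan {vlan}",
        "  spanning-tree port type edge trunk",
        "!",
        # Assumption: vfc number == port-channel number, as in this post.
        f"interface vfc {pc_id}",
        f"  bind interface port-channel{pc_id}",
        f"  switchport trunk allowed vsan {vsan}",
        "  switchport mode F",
        "  no shutdown",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_fcoe_uplink(["ethernet2/1", "ethernet2/2"], 33, 304, 4, "FI-A"))
```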
7. Verify..
Some useful show commands...
show flogi database
Use this command to see fabric logins. We'll use this information to set up our zoning! If you aren't seeing logins...check host connectivity. In the case of UCS & the FIs, you'll see multiple FLOGIs on the same vfc.
show zoneset active
Once the zones have been created and we see FLOGIs, I use this command to verify that my zoneset has been committed. If there is an issue with the FLOGI, you'll likely NOT see an FCID associated with the zone member. If you set up your zoning but don't configure FCoE, you'll simply see the pWWN/alias you configured but no FCID.
show interface vfc38
This gives you an indication of whether your trunks are allowing your VSANs, along with the current state. I had a scenario where the state was "initializing" on my FCoE VLAN. Further investigation found that the host connected to the vfc and its physical interface had NOT performed a FLOGI. A rescan did not prove helpful...but a reboot did :)
As seen here, our vfc is 38, the bound interface is eth1/38, the VSAN is allowed across the trunk, and the interface is up.
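Since the zoning in step 9 is built from these logins, it can help to pull the pWWNs out of `show flogi database` programmatically rather than retyping them. A minimal sketch, assuming the usual INTERFACE, VSAN, FCID, PORT NAME, NODE NAME column order; the sample output is made up for illustration (not captured from the environment above), and `parse_flogi` is a hypothetical helper.

```python
import re
from typing import Dict, List

# A WWN is eight colon-separated hex bytes.
_WWN = r"(?:[0-9a-f]{2}:){7}[0-9a-f]{2}"
# One FLOGI row: interface, VSAN, FCID, port WWN, node WWN.
_ROW = re.compile(
    rf"^(?P<interface>\S+)\s+(?P<vsan>\d+)\s+(?P<fcid>0x[0-9a-f]{{6}})\s+"
    rf"(?P<pwwn>{_WWN})\s+(?P<nwwn>{_WWN})",
    re.IGNORECASE,
)


def parse_flogi(output: str) -> List[Dict[str, str]]:
    """Return one dict per FLOGI row; headers, separators, and totals are skipped."""
    entries = []
    for line in output.splitlines():
        match = _ROW.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


# Made-up sample output for illustration only.
SAMPLE = """
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
vfc38            4     0x2a0001  20:00:00:25:b5:0a:00:1f 20:00:00:25:b5:00:00:1f
vfc38            4     0x2a0002  20:00:00:25:b5:0a:00:2f 20:00:00:25:b5:00:00:2f

Total number of flogi = 2.
"""

if __name__ == "__main__":
    for entry in parse_flogi(SAMPLE):
        print(entry["interface"], entry["fcid"], entry["pwwn"])
```

Two rows on the same vfc is what you'd expect behind a UCS FI, where several vHBAs log in through one link.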
8. Wait....what does my host using FCoE need to do?
Well, when you configure your CNA/HBA to use FCoE, you'll need to tell it which FCoE VLAN to use. The benefit of a CNA is that it can carry both FCoE traffic and Ethernet traffic.
9. Lastly....zoning..I'm just not sure how to do it..
The gist: each initiator should have access ONLY to its target(s). In the above example, we have 3 zones per fabric (3 initiators), and we've allowed the hosts access to both storage processors on our storage array. Why? Well, while the LUN on the storage array is "owned" by one storage processor, we want the host to have access to both storage processors should one path become unavailable.
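The single-initiator layout described above (one zone per host initiator, each containing that initiator plus both storage-processor ports) lends itself to generation. A minimal sketch with hypothetical zone/zoneset names and made-up pWWNs; `render_zoning` is a hypothetical helper that only prints NX-OS commands. It assumes basic zoning; if the VSAN uses enhanced zoning, you'd also need a `zone commit vsan <id>` before checking `show zoneset active`.

```python
from typing import Dict, List


def render_zoning(vsan: int, zoneset: str, initiators: Dict[str, str],
                  targets: List[str]) -> str:
    """One zone per initiator, each containing that initiator and every target port."""
    lines: List[str] = []
    zone_names: List[str] = []
    for host, pwwn in initiators.items():
        zone = f"{host}_to_array"  # hypothetical naming convention
        zone_names.append(zone)
        lines.append(f"zone name {zone} vsan {vsan}")
        lines.append(f"  member pwwn {pwwn}")
        for target in targets:  # both storage processors, for path redundancy
            lines.append(f"  member pwwn {target}")
    lines.append(f"zoneset name {zoneset} vsan {vsan}")
    for zone in zone_names:
        lines.append(f"  member {zone}")
    lines.append(f"zoneset activate name {zoneset} vsan {vsan}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up WWNs: three host initiators and two storage-processor ports.
    hosts = {
        "esx01": "20:00:00:25:b5:0a:00:01",
        "esx02": "20:00:00:25:b5:0a:00:02",
        "esx03": "20:00:00:25:b5:0a:00:03",
    }
    sps = ["50:06:01:60:3e:a0:00:01", "50:06:01:68:3e:a0:00:01"]
    print(render_zoning(4, "FABRIC_A_ZS", hosts, sps))
```

Run it once per fabric with that fabric's VSAN and WWNs; the result is three zones per fabric, matching the example above.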
FC
Wait, why should I use FC? Well, this one is open to debate. Some people think that the need for FC is dying. While we typically have 8Gb or 16Gb FC connections, there seems to be a race between FC and Ethernet. Back in the day, FC was the declared winner...but with Ethernet now offering 10Gb, 25Gb, 40Gb, and 100Gb, it's possible that we may no longer see native FC! But for the time being, it's pretty damn easy and straightforward to set up (which is why a lot of people like it!)
That being said...cable that thing in the same way you would FCoE:
Furthermore, our N5Ks will have cross-connected paths to our storage, like in the FCoE diagram. One thing worth noting: if we have a 5K with "UP" (Unified Ports) capability, changing a port type from Ethernet to FC requires a reboot! Also, Ethernet ports are allocated from left to right and FC ports from right to left!
Now for the configuration...it's pretty darn straightforward:
1. Enable the required features:
feature npiv
feature fport-channel-trunk
feature fcoe
2. Configure the port-channel (in this case, it goes to the FIs). With NPIV enabled, you must assign a virtual SAN (VSAN) to the SAN port channels that connect to the fabric interconnects:
interface san-port-channel 29
channel mode active
switchport trunk mode on
switchport trunk allowed vsan 1
switchport trunk allowed vsan add 4
3. Add the SAN port channel to an existing VSAN database on the data center core Cisco Nexus 5500UP-A switch:
vsan database
vsan 4 interface san-port-channel 29
4. On the data center core Cisco Nexus 5500UP-A switch, configure the SAN port channel on physical interfaces. The Fibre Channel ports on the Cisco Nexus 5500UP switches are set to negotiate speed by default.
interface fc1/29
switchport trunk mode on
channel-group 29 force
!
interface fc1/30
switchport trunk mode on
channel-group 29 force
Misc.
This output was taken from an MDS switch. Here we can see that the local domain ID of this particular switch is 0x91 and the peer switch's is 0x63. Any host that performs a FLOGI to this switch will be given an FCID with the 0x91 prefix.
For example...there is a host logged in on fc2/3. The 0x910000 is the FCID for this host. We can tell from this output that this device is directly attached to the SAN switch "MDS1."
From the output below, we can determine that the MDS switch, "MDS3" has two equal paths to the 0x91 domain, via fc2/13 and fc2/14.
Below, we can see the output of the show run on the SAN-switch-facing ports of a UCS FI. As indicated, mode "NP" equates to node proxy. What does this do for us? Well, the domain ID portion of the FCID is 1 byte long, so there is a theoretical maximum of 256 domains (some values are reserved, so only 239 are usable in a fabric). In NPV mode, the FI doesn't consume a domain ID of its own; it proxies its hosts' FLOGIs up to the SAN switch in a fashion similar to NAT, and the hosts receive FCIDs from the upstream switch's domain.
Below, NPIV (Node port ID virtualization) is the magic that allows us to have multiple fabric logins on an individual host-facing port and NPV (Node port virtualization) is the proxy portion.
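To see where the 0x91 prefix lives, it helps to split an FCID into its three 1-byte fields: domain, area, and port. A minimal sketch using the 0x910000 host FCID from the MDS output above; `FCID` and `decode_fcid` are hypothetical names.

```python
from typing import NamedTuple


class FCID(NamedTuple):
    domain: int  # switch domain ID, e.g. 0x91 for MDS1 above
    area: int
    port: int


def decode_fcid(fcid: str) -> FCID:
    """Split a 24-bit Fibre Channel ID into its domain, area, and port bytes."""
    value = int(fcid, 16)  # int() accepts the "0x" prefix when base is 16
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"FCID out of range: {fcid}")
    return FCID(domain=(value >> 16) & 0xFF, area=(value >> 8) & 0xFF, port=value & 0xFF)


if __name__ == "__main__":
    parts = decode_fcid("0x910000")
    print(hex(parts.domain), hex(parts.area), hex(parts.port))  # 0x91 0x0 0x0
```

Every host behind an NPV-mode FI ends up with the upstream switch's domain byte in that first field, which is why the FI itself never needs a domain ID.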
iSCSI
To come..
NFS
To come..
Friday, August 5, 2016
Python notes as they pertain to networking!
Getting it setup on my Raspberry Pi
So Python comes pre-installed on the Raspberry Pi! Neat...but after following some guides here...
https://pynet.twb-tech.com/blog/automation/netmiko.html
I've decided to give this a try to see if I can use Python to SSH into my existing devices. This will help with anything I'll need to do down the road! One thing I've read is that one of the benefits Python has over PowerShell is that Python supports SSH!
So one thing that I've learned (in my ignorant Python experience) is that you can use the Paramiko library to SSH into "stuff." I tried messing around with it and was able to successfully SSH into my home 2811...but the syntax was...not very user-friendly (for someone who is as new to Python as I am).
I wanted to start using Netmiko (see the link above)...but found that it wouldn't work! Here are some of the things I tried:
1. I downloaded the setup.py file and manually installed it "python setup.py install --user."
2. Once it successfully installed, I tried launching python via "python" to see if it would import the ConnectHandler function from the Netmiko module.
3. No bueno! Every time I'd get to calling the ConnectHandler...I'd get the following error:
No handlers could be found for logger "paramiko.transport"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/admin/.local/lib/python2.7/site-packages/netmiko-0.5.6-py2.7.egg/netmiko/ssh_dispatcher.py", line 84, in ConnectHandler
    return ConnectionClass(*args, **kwargs)
  File "/home/admin/.local/lib/python2.7/site-packages/netmiko-0.5.6-py2.7.egg/netmiko/base_connection.py", line 68, in __init__
    self.establish_connection(verbose=verbose, use_keys=use_keys, key_file=key_file)
  File "/home/admin/.local/lib/python2.7/site-packages/netmiko-0.5.6-py2.7.egg/netmiko/base_connection.py", line 177, in establish_connection
    self.remote_conn_pre.connect(**ssh_connect_params)
  File "/home/admin/.local/lib/python2.7/site-packages/paramiko-2.0.2-py2.7.egg/paramiko/client.py", line 338, in connect
    t.start_client()
  File "/home/admin/.local/lib/python2.7/site-packages/paramiko-2.0.2-py2.7.egg/paramiko/transport.py", line 493, in start_client
    raise e
AttributeError: 'EntryPoint' object has no attribute 'resolve'
Now, a little googling presented the following:
https://github.com/mozilla/sops/issues/67
Should the link die..
"Yep, downgrading to cryptography 1.2.1 fixed it. Thank you!"
Also, after I discussed my conundrum in a Reddit post, someone recommended installing pip, a Python package manager.
sudo apt-get install python-pip
Hindsight is 20/20..but I could have simply used PIP to install netmiko via "pip install netmiko."
So finally, with PIP installed..
sudo pip install cryptography==1.2.1
Huzzah, it works!
>>> from netmiko import ConnectHandler
>>> my_router = {
... 'device_type': 'cisco_ios',
... 'ip': '10.0.0.1',
... 'username': 'admin',
... 'password': '<password here>',
... }
>>> net_connect = ConnectHandler(**my_router)
>>> output = net_connect.send_command("show ip int brief")
>>> print output
Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 <My public> YES DHCP up up
FastEthernet0/1 unassigned YES NVRAM administratively down down
FastEthernet1/0 unassigned YES unset up up
FastEthernet1/1 unassigned YES unset up up
FastEthernet1/2 unassigned YES unset up down
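Since `send_command` just returns the CLI text, a natural next step is turning that text into something a script can act on. A minimal sketch that parses IOS `show ip interface brief` output into dicts, assuming the six-column layout shown above; it anchors on the YES/NO "OK?" column so multi-word fields on either side (the "<My public>" placeholder, "administratively down") still parse. `parse_ip_int_brief` is a hypothetical helper; newer Netmiko releases can also return parsed output through TextFSM templates (`use_textfsm=True`), which may be worth a look instead.

```python
from typing import Dict, List


def parse_ip_int_brief(output: str) -> List[Dict[str, str]]:
    """Parse IOS 'show ip interface brief' text into one dict per interface."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        if not parts or parts[0] == "Interface":
            continue  # skip blank lines and the header row
        # The OK? column is always YES or NO, so use it as an anchor.
        ok_index = next((i for i, p in enumerate(parts) if i >= 2 and p in ("YES", "NO")), None)
        if ok_index is None or len(parts) < ok_index + 4:
            continue  # not an interface row
        rows.append({
            "interface": parts[0],
            "ip_address": " ".join(parts[1:ok_index]),
            "ok": parts[ok_index],
            "method": parts[ok_index + 1],
            "status": " ".join(parts[ok_index + 2:-1]),
            "protocol": parts[-1],
        })
    return rows


# The output captured above.
SAMPLE = """Interface IP-Address OK? Method Status Protocol
FastEthernet0/0 <My public> YES DHCP up up
FastEthernet0/1 unassigned YES NVRAM administratively down down
FastEthernet1/0 unassigned YES unset up up
FastEthernet1/1 unassigned YES unset up up
FastEthernet1/2 unassigned YES unset up down"""

if __name__ == "__main__":
    for row in parse_ip_int_brief(SAMPLE):
        if row["status"] == "up" and row["protocol"] == "down":
            print(row["interface"])  # FastEthernet1/2
```

In the live session this would be `parse_ip_int_brief(net_connect.send_command("show ip int brief"))`.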
Wednesday, June 1, 2016
Using robocopy for file system migrations
The idea..
We have file system A (old) & file system B (new). We wish to migrate the files & permissions of the OLD file system (on an old array) to the NEW file system (on a new array).
I've found that we want to copy as HIGH as we can get in the file system, so that robocopy can not only copy the directory and files..but ALSO the permissions!
In this particular implementation the old file system is called "fileserv1" and the new file system is called "fileserv2"
This particular customer's file system is used only for home drives that are managed via AD. The customers know this drive as their "M:\ drive."
Now, as you can imagine, the directory that holds all of these user subdirectories is quite massive; there is a folder for EVERY user. Each user's folder carries the security permissions of that individual plus Domain Admins...and we want robocopy to do all this work for us!
What is Robocopy?
Robocopy (the name is short for Robust File Copy) was introduced with the Windows Server 2003 Resource Kit and is included in all editions of Windows 7. Its many strengths include the ability to copy all NTFS file attributes and to mirror the contents of an entire folder hierarchy across local volumes or over a network. If you use the right combination of options, you can recover from interruptions such as network outages by resuming a copy operation from the point of failure after the connection is restored.
The Robocopy syntax takes some getting used to. If you’re familiar with the standard Copy and Xcopy commands, you’ll have to unlearn their syntax and get used to Robocopy’s unconventional ways. The key difference is that Robocopy is designed to work with two directories (folders) at a time, and the file specification is a secondary parameter. In addition, there are dozens of options that can be specified as command-line switches.
Shamelessly stolen from Microsoft's TechNet...see here for the full article!
Preliminary
Once I've created the CIFS share per best practices (HIDE the ETC and the Lost & Found!), I'm going to navigate to the highest level that I can navigate to and mirror the permissions.
The reason I've set the permissions on this folder to "Domain Admins" is that when robocopy copies/creates the subdirectories, I want each directory to inherit the "Domain Admins" permission so that any user in the Domain Admins group can access/manage the files. In addition, we do NOT want any non-Domain Admin user to have access to "myhomefolder" itself.
Furthermore, where do we run the script? It does not require much..so I'd simply pick a windows server that has access to both source/destination file systems. Be sure to log into the server as a domain admin!
NOTE: If you're copying in a windows environment, you're going to have to pass credentials BEFORE you can use robocopy
The syntax for that is..
net use \\server\ipc$ /user:username password
Example: "net use \\10.1.5.1\c$\Users\Bob\Desktop /user:john.doe Ilovepotatoes"
The script
:loop
robocopy C:\Users\kbarnes\Desktop\Source C:\Users\kbarnes\Desktop\Destination /MT /copyall /MIR /TEE /FFT /R:1 /W:5 /XD ~snapshot /NP /log+:C:\Users\kbarnes\Desktop\Robo.txt
echo complete
echo %date% %time%
goto loop
It's probably best to just present what I use and try to explain its functionality. I'm using this script as an example, hence the source/destination of two folders on my computer's desktop. In a REAL scenario, it may look something like this:
robocopy \\<source FS>\myhomefolder \\<destination FS>\c$\cifs_home_fs\myhomefolder
Now...for the "weird" part: Robocopy switches.
Switches are the parameters that the administrator can modify to make Robocopy do what he/she wants. It is honestly as simple as that. My example is one that I've been using that works for me and I'll explain:
- /MT This is Robocopy's "multithread" switch. The default is 8 simultaneous file copies; we can specify a value between 1 and 128, e.g. "/MT:32" for 32 threads.
- /copyall This copies all. Yea, no shit. What I mean is that in addition to the data, it copies the attributes, timestamps, security (NTFS ACLs), owner, and auditing information of each file.
- /MIR This mirrors a directory tree: it copies subdirectories, including empty ones, and removes destination files that no longer exist in the source directory.
- /TEE Writes the status output to the console window as well as to the log file.
- /FFT Assumes FAT file times (two-second precision).
- /R:1 Specifies the number of retries on failed copies. The default value of N is 1,000,000 (one million retries). We used a value of "1" as we found that we'd rather try to get the files in another sweep than wait.
- /W:5 Specifies the wait time between retries, in seconds. The default value of N is 30 (wait time 30 seconds). Same reasoning as /R: we'd rather not wait forever on a single file.
- /XD ~snapshot We found that there is NO reason to copy the snapshot directory. Using /XD ~snapshot ignores the hidden snapshot directory.
- /NP This suppresses the progress (percentage copied) display of the copy job. We just found it unnecessary for our uses.
- /log+ When using log+ followed by a location, the output is sent to a text file. The + sign says to append any output to the same file, versus creating a new one/replacing the old one every cycle.
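If you'd rather drive robocopy from Python than from the .bat loop (for example, to stop looping on a real failure), the switches above can be assembled into an argument list and the exit code checked. A minimal sketch, assuming Python 3 on the Windows server doing the copy; `build_robocopy_args`, `robocopy_succeeded`, and `run_once` are hypothetical helpers. Robocopy's exit code is a bit field (1 = files copied, 2 = extra files/dirs at the destination, 4 = mismatches, 8 = some copies failed, 16 = fatal error), so anything below 8 counts as success.

```python
import subprocess
from typing import List


def build_robocopy_args(source: str, destination: str, log_file: str) -> List[str]:
    """Assemble the same switches used in the .bat script above."""
    return [
        "robocopy", source, destination,
        "/MT", "/copyall", "/MIR", "/TEE", "/FFT",
        "/R:1", "/W:5",
        "/XD", "~snapshot",
        "/NP",
        f"/log+:{log_file}",  # append to the same log on every pass
    ]


def robocopy_succeeded(exit_code: int) -> bool:
    """Exit codes are bit flags; 8 (copy failures) and 16 (fatal) signal trouble."""
    return exit_code < 8


def run_once(source: str, destination: str, log_file: str) -> bool:
    """Run one pass (Windows only) and report whether robocopy considered it a success."""
    result = subprocess.run(build_robocopy_args(source, destination, log_file))
    return robocopy_succeeded(result.returncode)
```

Passing a list to `subprocess.run` avoids quoting problems with UNC paths that contain spaces.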
Lastly, I've included some things in the script to make it loop and to output time, dates, and a string value of "complete."
Example use of script
1. Log into windows server as a domain admin user.
2. Right-click on the .bat file and "Run as Administrator."
3. Assuming your script is set up correctly, you should see the log file pop up wherever you've indicated.
In my example, I have two folders: Source and Destination. I've put in a couple files in the source and I want to see them pop over to the destination!
After we've run the script we can see a couple of things: The text files have been copied and the "Date modified" has also been copied! If we had permissions on the text files, they would have been copied over as well!
4. Review the output log
Here, we can see that all 8 files were copied successfully and that there were no failures!
What if we run the script again...will it attempt to copy the files a second time?
Here we can see that it skipped ALL the files, as there was nothing new on the source directory!
What if we change one of the files?
Here, I simply put some random text into one of the source text files called "New Text Document - Copy.txt"
Lets run the script again...
NICE! We can see that it skipped the 7 unmodified files but copied over the one file that was changed! We can also see the file that was changed with an indicator that it is "Newer."
Lastly..what if we delete one of the files on the source?
Here we can see that the file we deleted on the source is shown as "EXTRA File." If we navigate to the destination directory "Destination" we can confirm that the deleted file on the source has been deleted on the destination. This is made possible by the /MIR switch we used.
Yep! It's gone!
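Once the log grows over many passes, the summary table at the end of each pass is what tells you whether a pass was clean (for the final read-only pass: all skips, zero copies, zero failures). A minimal sketch that pulls the counts from the last `Files :` summary line, assuming the English-language robocopy log layout with Total/Copied/Skipped/Mismatch/FAILED/Extras columns; the sample is illustrative, not taken from the logs above, and `parse_files_summary` and `final_pass_clean` are hypothetical helpers.

```python
import re
from typing import Dict, Optional

_COLUMNS = ("total", "copied", "skipped", "mismatch", "failed", "extras")
# Six integers after "Files :"; the header line "Files : *.*" won't match.
_FILES_LINE = re.compile(r"^\s*Files\s*:\s*((?:\d+\s*){6})$")


def parse_files_summary(log_text: str) -> Optional[Dict[str, int]]:
    """Return the counts from the last 'Files :' summary line in a robocopy log."""
    summary = None
    for line in log_text.splitlines():
        match = _FILES_LINE.match(line)
        if match:  # keep overwriting so the most recent pass wins (/log+ appends)
            summary = dict(zip(_COLUMNS, map(int, match.group(1).split())))
    return summary


def final_pass_clean(summary: Optional[Dict[str, int]]) -> bool:
    """A clean final pass copies nothing and fails nothing."""
    return summary is not None and summary["copied"] == 0 and summary["failed"] == 0


# Illustrative summary block only.
SAMPLE_LOG = """
               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         0         1         0         0         0
   Files :         8         1         7         0         0         0
"""

if __name__ == "__main__":
    counts = parse_files_summary(SAMPLE_LOG)
    print(counts, final_pass_clean(counts))
```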
How to perform the migration?
It is important to know that the first pass is quite time-consuming: it has to copy EVERY file/directory that we've indicated to the destination. You can absolutely do this during business hours. Because of the loop in the .bat file, it's going to start over as soon as it finishes. I typically like to kick it off as I'm leaving for the day so that I can come in the next morning and verify the progress.
Here is a real world example of the first pass:
Note: This first pass took ~5 hours. There were some skips because I did a couple tests before running the full on script.
Once we get the first pass done, I let robocopy continue to run until I'm ready to begin my migration. The migration IS impacting. We have to somehow quiesce writes to the old filesystem..or else we'll lose some files in the migration. I've found that this can be achieved by making the source file system read-only (DO THIS AFTER HOURS TO SAVE YOU A HEADACHE!). Once the file system is configured as read-only...verify. Can you create a new file or does it bark at you? If it barked at you..GOOD!
Once the source file system is read-only, run robocopy ONE FINAL TIME! Since it is in read-only mode, there should be no new files after running this script. If you're insane and want to be double-safe...run it one more time after this step. We should see ALL skips and 0 copies.
Now the next part really depends on the environment. I've had environments that use either DNS or DFS (Distributed File System). In this (and my last) file system migration, it was via DNS. If the computers are using "FILESERVER" as the hostname to point to the source file system, simply update the DNS entry to make "FILESERVER" point to the IP address of the new CIFS server.
Lastly.....do NOT TURN OFF THE OLD FILE SYSTEM! Should a user call post-migration that a file is missing/corrupt...have the old file system handy for recovery purposes (SHOULDN'T be needed..but I'm paranoid).
Good luck!
Thursday, April 14, 2016
My first IWAN deployment
First off, let me get some of my resources out in the open. These were INVALUABLE, and I'd have failed if they had not been made available:
http://www.cisco.com/c/dam/en/us/td/docs/solutions/CVD/Feb2016/CVD-IWANDesignGuide-FEB16.pdf
http://www.cisco.com/c/dam/en/us/td/docs/solutions/CVD/Feb2016/CVD-IWANConfigurationFilesGuide-FEB16.pdf
http://www.cisco.com/c/dam/en/us/solutions/collateral/enterprise-networks/intelligent-wan/cvd-iwan-diadesignguide-mar15.pdf
http://docwiki.cisco.com/wiki/PfR3:Solutions:IWAN
Next, this is the personal blog of a friend of mine who convinced me to take on this project. GOOD READ on all things IWAN: http://spanport.net -- He is the one who told me about IOS-XE 3.13 having MANY bugs...seriously, upgrade to 3.16.02.S. We couldn't get MFR or site-to-site PfR working on the 3.13 version.
Lastly, I attended Cisco's IWAN "Design & Deploy for Impact" training. Anything indicated in red (like this) I've included POST training, as it contradicts my earlier understanding. I want you to know what I THOUGHT and what I was told (just in case you find yourself thinking the same thing). Much thanks to Denise Fishburne, David Prall, Tom Kunath, & Mani Ganesan for setting me straight!
Now that we've got that out of the way, the statement of work essentially lists the following for our proof of concept:
- 6 remote sites & 2 hub routers (1 MPLS & 1 INET).
- Master controller for PfR resides on the MPLS HUB. This goes against Cisco best practices, but will work for the purpose of our POC.
- We're utilizing ISR 4331s for the remote routers as well as the hub routers. Should this go prod we'll upgrade the hubs to something beefier.
- We're using pre-shared keys for IPSEC auth...this will get upgraded to PKI once this goes prod.
- We're using Cisco Prime for "Management of the IWAN infrastructure & spoke deployment."
That being said, you may be asking, "What is IWAN?" IWAN is Cisco's flavor of software-defined WAN (SD-WAN). Cisco's Intelligent WAN, or IWAN, is made up of the following pillars:
- Transport Independence
You may have heard of the term "transport agnostic." IWAN allows you to run an overlay network on top of any given provider, regardless of the underlying connectivity. For example, you can connect sites with 4G internet, commercial internet, MPLS, and/or a combination of these. Unlike MPLS, where we're heavily tied to the provider and have negotiated strict service-level agreements (SLAs), IWAN allows us to essentially hedge our bets, using multiple paths with different providers to ensure application performance.
- Intelligent Path Control
PfRv3 is the magic behind IWAN. We essentially use smart probing in addition to active data flows to measure delay, loss, and jitter. Should a path fall out of pre-determined metrics, IWAN will know preemptively if there is a better path for a given service. We can create policies for different application profiles based on differing amounts of delay and/or jitter, with different actions to perform should our application fall out of the acceptable bounds.
- Application Optimization
AVC, or Application Visibility and Control (essentially NBAR2), allows the network administrator to identify >1400 applications. Based on NBAR classification, we can use differentiated services code point (DSCP) markings to ensure our applications fall under the PfR policies that are created on our master controller (MC; read further). Should NBAR not match on your home-grown application, you can still use the ISR's modular QoS to identify traffic. It's VERY flexible!
- Secure connectivity
Secure connectivity is established with the use of Zone based firewalls and front door VRFs (FVRFs).
Our reason/goal for implementing IWAN is to provide an alternate solution for remote site connectivity. Tired of spending $1000 on remote site T1 connections? Well, with IWAN, we can theoretically have 2 commercial internet connections and utilize smart probing to determine the BEST path for our chosen applications.
"But..but what about my SLA..." SCREW your SLA. If/when commercial internet #1 is determined to be lossy, for example's sake, PfR's probing will say "HEY, VOICE, START USING THE OTHER PATH!", overriding what exists in the routing table! The beauty of this product is that the policies are centrally managed, so we can have an environment with hundreds of spokes without having to statically control the routing! If spoke A has a bad internet connection, the rest of the network will know to avoid that connection until it is determined to be resolved!
In addition to "getting rid of the T1s," we also want to get some control/visibility of our WAN! We've all become soft and comfortable with the thought of simply handing our traffic to the ISP! With IWAN (and other SD-WAN technologies, for that matter), our goals are to 1) get a better understanding of what we're running on our network and 2) gain visibility of these applications so that we can better manage & troubleshoot.
Ok, off my soap box.
But seriously, stop paying for those expensive ass T1s!!!!!
Now..if you review the configuration guide for IWAN there are a TON of configs on there..and if you aren't already comfortable with DMVPN, QoS, and basic EIGRP routing..you might want to go ahead and review those topics. My goal is to talk IWAN (and honestly..PfR..as I'd never even messed with this before). Furthermore, we'll discuss QoS..as there is quite a bit of QoS that is required to allow PfR to do its magic.
Let's talk about our hubs
As I said in our overview, we'll have two hubs that we need to squeeze into an existing network: One for MPLS connectivity and one for INET connectivity.
Let's talk placement...
In an ideal world, both the routers would live at the edge of the network. HAH, good luck with that. The biggest thing I'm looking for in the placement is to provide physical redundancy. In other words, I don't want an SFP, cable, and/or switch failure to cause loss of connectivity. In this particular customer's network, we decided to hang the devices off a pair of Nexus 5Ks (and their respective FEXs). ***WARNING*** this goes against Cisco best practice..but for our proof of concept purposes..it'll do!
Tom mentioned that while it may seem ideal to "clean up" the WAN edge at the hub once we've migrated all our sites to IWAN, having the IWAN routers "sit" behind the CE allows for flexibility in the future. The biggest benefit, as Tom explained, is that since we can't use nested QoS policies on the physical interface (it breaks per-tunnel QoS), a separate CE router gives us the flexibility of a hierarchical QoS policy that we would otherwise not be able to have!
Basic L2/L3 connectivity..
Now, here is where stuff gets exciting! IWAN uses a concept called front-door VRF. Essentially, this is a security mechanism that places the public-facing interface (yes, we'll do this for the MPLS one too..) in a separate VRF. Logically, the "outside" and "inside" legs into the network are completely separate... but PHYSICALLY, they are identical! I accomplished this by using port-channel sub-interfaces. Since our connection to the Nexus DC switches is purely L2, I created two sub-interfaces on our MPLS hub (po20.951 & po20.953), for example. To create the inside & outside L3 legs, I created two SVIs on the core L3 switch, 951 and 953. After assigning the SVIs IP addresses on the core, I put po20.951 into the "OUTSIDE" VRF. Lastly, we assigned IP addresses on the hub routers in the corresponding subnets. If you've completed everything correctly at L2 (created the VLANs), you SHOULD be able to ping from the hub router to the core router in both the global routing table (v953) and the OUTSIDE VRF (v951).
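Here is a minimal sketch of that front-door VRF layout on the MPLS hub. It assumes the OUTSIDE VRF name and the po20.951/po20.953 sub-interfaces from the text; the IP addresses are hypothetical placeholders. It mirrors the POC layout, port-channel and all (see David's note below on why that is a bad idea at the WAN edge).

```
! Front-door VRF for the WAN-facing leg
vrf definition OUTSIDE
 address-family ipv4
 exit-address-family
!
! Outside leg (VLAN 951). "vrf forwarding" wipes any existing IP,
! so apply it before the address.
interface Port-channel20.951
 encapsulation dot1Q 951
 vrf forwarding OUTSIDE
 ip address 192.0.2.2 255.255.255.252
!
! Inside leg (VLAN 953) stays in the global routing table
interface Port-channel20.953
 encapsulation dot1Q 953
 ip address 198.51.100.2 255.255.255.252
```

Quick checks with these placeholder addresses would be "ping vrf OUTSIDE 192.0.2.1" for v951 and "ping 198.51.100.1" for v953.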
David Prall laughed when I told him about this setup! "Do NOT use port-channels on the WAN edge of your IWAN hub." While we can port-channel the INSIDE interfaces, we cannot port-channel the WAN side, as it negates our per-tunnel QoS! Well...shit! What do we do to provide redundancy? While we cannot use port-channels, we CAN have a separate physical path; we just can't channel the links together! This can be accomplished by using "tunnel source loopback#" with separate paths to reach that loopback, using a higher AD (a floating static route) on one of the paths. I didn't include it, but one might want to use a track statement on the static routes so that we aren't just relying on the physical interface going up/down!
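This is a rough sketch of David's suggestion, including the optional tracking mentioned above. It assumes:

- two separate routed links in the OUTSIDE VRF (hypothetical GigabitEthernet0/0/0 and 0/0/1);
- a hypothetical Loopback100 as the tunnel source;
- an upstream device that routes to that loopback over both links.

All addresses and object numbers are placeholders.

```
! Probe the primary next hop from inside the front-door VRF
ip sla 10
 icmp-echo 192.0.2.1 source-interface GigabitEthernet0/0/0
  vrf OUTSIDE
  frequency 5
ip sla schedule 10 life forever start-time now
track 10 ip sla 10 reachability
!
interface Loopback100
 vrf forwarding OUTSIDE
 ip address 203.0.113.10 255.255.255.255
!
interface GigabitEthernet0/0/0
 vrf forwarding OUTSIDE
 ip address 192.0.2.2 255.255.255.252
interface GigabitEthernet0/0/1
 vrf forwarding OUTSIDE
 ip address 192.0.2.6 255.255.255.252
!
! Primary default is withdrawn if track 10 goes down...
ip route vrf OUTSIDE 0.0.0.0 0.0.0.0 GigabitEthernet0/0/0 192.0.2.1 track 10
! ...and the floating default (AD 250) over the second link takes over
ip route vrf OUTSIDE 0.0.0.0 0.0.0.0 GigabitEthernet0/0/1 192.0.2.5 250
!
interface Tunnel10
 tunnel source Loopback100
 tunnel vrf OUTSIDE
```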
The "last" (I'll say that a million times, I'm sure) thing you have to do is connect the hub router to whatever routing protocol is used internally. In this scenario the customer is using OSPF as the IGP, so I'll need an OSPF adjacency for the global routing table; I'll use static routing for my OUTSIDE VRF connectivity. We'll be using EIGRP as our routing protocol over DMVPN. The hub routers will be our point of redistribution... we'll come back to this, as it requires some delicate handling.
Why EIGRP, though? The "main" reason seems to be that if you're using OSPF...the customer probably doesn't want EIGRP on their network (trying to stay off proprietary protocols?). An alternative is BGP! Aside from the reason I listed, one might argue that BGP offers benefits that EIGRP cannot--primarily being the granular nature in one's ability to control the routing.
MPLS
This is REALLY going to depend on your MPLS environment. For this customer they're using BGP for PE-CE communication while advertising a default route to all spokes. My goal from a hub perspective is simply to make sure I have a route OUT to the core. I simply slapped a default route on my OUTSIDE vrf with a next-hop of the L3 core...and since my core has a route to all my spoke sites MPLS addresses...that'll do it!
INET
So far we've ONLY discussed our MPLS router...as this is the easiest. Heck, the INET hub placement is IDENTICAL, with the only difference being that NAT is involved. What?! NAT?! Yea, we're statically NATing the "source" of our DMVPN connectivity & allowing the basic GRE/IPsec "stuff" on the firewall. I used another static default on the OUTSIDE VRF but with a next-hop of our firewall's inside interface. In addition to the firewall..we have a basic ACL applied on the outside interface allowing the same GRE/IPsec "stuff."
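For the INET hub, here is a sketch of the OUTSIDE VRF default route and a basic inbound ACL. It assumes the firewall's inside interface is 192.0.2.1 and the WAN interface is a hypothetical GigabitEthernet0/0/0.

- Because of the static NAT, IPsec should end up on NAT-T (UDP 4500).
- The GRE line only matters for the crypto-less first phase described below.
- The ICMP lines are there for troubleshooting.

```
! Front-door VRF default points at the firewall's inside interface
ip route vrf OUTSIDE 0.0.0.0 0.0.0.0 192.0.2.1
!
ip access-list extended ACL-INET-IN
 permit udp any any eq isakmp
 permit udp any any eq non500-isakmp
 permit esp any any
 ! Only needed while testing DMVPN before crypto is applied
 permit gre any any
 permit icmp any any echo
 permit icmp any any echo-reply
 permit icmp any any ttl-exceeded
 permit icmp any any unreachable
!
interface GigabitEthernet0/0/0
 ip access-group ACL-INET-IN in
```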
Once my hub routers are on the network, the first thing I do is configure my tunnel interfaces without any crypto or routing configured. My goal is to get this working in phases: first get DMVPN connectivity, then apply the crypto, then get my routing configured, and THEN worry about QoS/PfR. If you try to apply ALL the configurations out of the gate, good luck troubleshooting anything if (WHEN) it doesn't work.
The hubs are in place...
Assuming we did everything correctly on the hub... let's set up a spoke! This customer site had a test environment that proved to be invaluable! In our test environment we have both an MPLS connection (T1) and a commercial internet connection.
With the device connected to both providers, we set up both "OUTSIDE" interfaces the same way we did on the hub: IWAN-TRANSPORT-1 for the MPLS interface and IWAN-TRANSPORT-2 for the INET interface. Because we are now using a VRF for MPLS connectivity, we have to modify BGP to peer under the address-family for the MPLS VRF instance (IWAN-TRANSPORT-1)!
router bgp 12345
no bgp default ipv4-unicast
!
address-family ipv4 vrf IWAN-TRANSPORT-1
neighbor 1.2.3.4 remote-as 54321
neighbor 1.2.3.4 description TO_MPLS_PROVIDER
neighbor 1.2.3.4 password 7 2304982034820384
neighbor 1.2.3.4 version 4
neighbor 1.2.3.4 activate
Now....assuming we can ping 1.2.3.4 if we source our pings from VRF IWAN-TRANSPORT-1 and our password/remote-as is correct..BGP SHOULD come up.
We can verify our BGP adjacency by performing a "show bgp vpnv4 unicast all summary". As I said earlier, we are only advertising a default route into BGP at the hub site...so we should expect to see "1" PfxRcd.
On the internet side...you should simply need to slap an IP address on the interface and verify you can ping out to the internet sourced from that VRF: ping vrf IWAN-TRANSPORT-2 8.8.8.8.
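A sketch of the spoke's two front-door VRFs and WAN interfaces follows. The VRF names and the BGP peer 1.2.3.4 come from the text; the interface names, the local addresses, and the RD value are hypothetical. An RD is included because IOS generally wants one on the VRF before BGP will accept "address-family ipv4 vrf".

```
vrf definition IWAN-TRANSPORT-1
 rd 65511:101
 address-family ipv4
 exit-address-family
!
vrf definition IWAN-TRANSPORT-2
 address-family ipv4
 exit-address-family
!
! MPLS-facing interface; BGP peers with 1.2.3.4 inside this VRF
interface GigabitEthernet0/0/0
 vrf forwarding IWAN-TRANSPORT-1
 ip address 1.2.3.5 255.255.255.248
!
! Internet-facing interface, with its own default route inside its VRF
interface GigabitEthernet0/0/1
 vrf forwarding IWAN-TRANSPORT-2
 ip address 203.0.113.2 255.255.255.252
!
ip route vrf IWAN-TRANSPORT-2 0.0.0.0 0.0.0.0 203.0.113.1
```

From here, "ping vrf IWAN-TRANSPORT-1 1.2.3.4" and "ping vrf IWAN-TRANSPORT-2 8.8.8.8" are the sanity checks.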
One thing I'll note is that how you configure the internet-facing interface "depends." If you're going to have central internet connectivity, i.e. sending all internet traffic through the hub, then you'll only need an ACL like the one we used on the INET hub. BUT... if we decide to go with direct internet access (DIA; we'll come back to this), then we'll use a zone-based firewall on that outside interface!
The first phase...DMVPN
Now that our hub and our spokes should have basic connectivity, it's time to put on the next layer: DMVPN. Honestly, by the time you're done implementing everything, your tunnel interfaces are going to look ridiculous. You'll have per-tunnel QoS, multicast, and IPsec... but as I said, let's start without all the gobbledygook. Should you have any issues getting DMVPN connectivity, the first thing to check is whether you have "tunnel vrf <VRF>" applied. This little command is what tells DMVPN "Hey, use this VRF to form the underlay!" If DMVPN is not forming, verify you have reachability using good ole' ICMP: can you ping the hub's "tunnel source" from the spoke's "tunnel source" (in the front-door VRF)? Lastly, verify that you have the NHRP nhs and nbma addresses in the correct order!
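Here is a stripped-down phase 3 DMVPN sketch with no crypto and no routing yet, loosely modeled on the CVD. The tunnel numbers, the network-id/tunnel key of 101, and all addresses are hypothetical. On the spoke, note the order: "nhs" takes the hub's tunnel IP and "nbma" takes the hub's tunnel-source (underlay) address.

```
! --- Hub (MPLS); tunnel source is the front-door VRF leg (POC layout) ---
interface Tunnel10
 ip address 10.6.34.1 255.255.254.0
 ip mtu 1400
 ip tcp adjust-mss 1360
 ip nhrp network-id 101
 ip nhrp map multicast dynamic
 ! Phase 3: tell spokes when a better (spoke-to-spoke) path exists
 ip nhrp redirect
 tunnel source Port-channel20.951
 tunnel mode gre multipoint
 tunnel key 101
 tunnel vrf OUTSIDE
!
! --- Spoke ---
interface Tunnel10
 ip address 10.6.34.11 255.255.254.0
 ip mtu 1400
 ip tcp adjust-mss 1360
 ip nhrp network-id 101
 ip nhrp nhs 10.6.34.1 nbma 192.0.2.2 multicast
 ! Phase 3: act on the hub's redirects
 ip nhrp shortcut
 tunnel source GigabitEthernet0/0/0
 tunnel mode gre multipoint
 tunnel key 101
 tunnel vrf IWAN-TRANSPORT-1
```

"show dmvpn" on either side should show the peer in the UP state before you move on to crypto.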
The second phase...IPsec
Now that we have DMVPN connectivity, let's put on our IPsec layer! As I said earlier, we're currently using pre-shared keys, and I'll update this once we get cert-based auth working! But honestly, there isn't a whole lot to say here, as Cisco's configuration guide makes this EASY. The only thing I'll say is that if you are doing this to a remote site... do the remote site first! Once you apply the tunnel protection profile, if both sides aren't IPsec-ready, you'll lose connectivity! If you have ANY issues, verify your phase 1 and phase 2 configurations; the transform sets should match on both sides! Be sure to verify that your traffic is actually being encrypted: run "show crypto ipsec sa" and verify the encaps/decaps counters are incrementing!
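A minimal pre-shared-key sketch follows, assuming IKEv2 as in the CVD. The keyring, profile, and transform-set names are hypothetical, and the key is obviously a placeholder. The profile is bound to the front-door VRF with "match fvrf"; the last stanza is the step that cuts connectivity if the far side isn't ready yet.

```
crypto ikev2 keyring DMVPN-KEYRING
 peer ANY
  address 0.0.0.0 0.0.0.0
  pre-shared-key REPLACE-WITH-YOUR-KEY
!
crypto ikev2 profile FVRF-IKEV2-TRANSPORT-1
 match fvrf IWAN-TRANSPORT-1
 match identity remote address 0.0.0.0
 authentication remote pre-share
 authentication local pre-share
 keyring local DMVPN-KEYRING
!
! Must match on hub and spoke
crypto ipsec transform-set AES256-SHA-TRANSPORT esp-aes 256 esp-sha-hmac
 mode transport
!
crypto ipsec profile DMVPN-PROFILE-TRANSPORT-1
 set transform-set AES256-SHA-TRANSPORT
 set ikev2-profile FVRF-IKEV2-TRANSPORT-1
!
interface Tunnel10
 tunnel protection ipsec profile DMVPN-PROFILE-TRANSPORT-1
```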
The third phase...Routing
This is where stuff can get squirrely: Routing. I do NOT want to introduce any routing loops into my network...so there will be NO redistribution until I have all my routes tagged appropriately!
Here are my goals with route-tagging:
- Do NOT let my spokes advertise out anything they've learned from the hub.
- Configure the spokes as EIGRP stubs.
- Block anything with tags 101, 102, 103, or 104 outbound.
- Do NOT let my spokes advertise a default route.
- Block the prefix 0.0.0.0/0 outbound.
- Do NOT let my hub routers learn anything advertised from the OTHER hub routers.
- Block 101 & 102 inbound on the tunnel interfaces.
- Do NOT let my hub routers redistribute anything BACK into OSPF that was learned via OSPF.
- Tag & block on the hub routers (Tag 10 & block 20, while inverse on the other hub).
As you can see..some of this is a bit redundant. For instance, I'm not allowing either IWAN hub to learn anything from the other IWAN hub...even though my spokes are blocking learned routes from being advertised. The point of this is to have MULTIPLE layers of blocking, should a spoke be added that doesn't mirror other spoke configurations.
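Below is a sketch of the spoke-side goals from the list, plus the hub tagging that makes them work, in EIGRP named mode. The EIGRP name, the AS 400, the route-map names, and "network 10.0.0.0" are hypothetical.

- It assumes your code honors "set tag" in an EIGRP distribute-list route-map; confirm the tag shows up in "show ip eigrp topology" on a spoke.
- The hub-to-hub inbound block is left out on purpose, given the David/Denise correction further down.

```
! --- Spoke ---
ip prefix-list DEFAULT-ROUTE seq 5 permit 0.0.0.0/0
!
route-map SPOKE-EIGRP-OUT deny 10
 description Never re-advertise routes tagged by the hubs
 match tag 101 102 103 104
route-map SPOKE-EIGRP-OUT deny 20
 description Never advertise a default route
 match ip address prefix-list DEFAULT-ROUTE
route-map SPOKE-EIGRP-OUT permit 30
!
router eigrp IWAN-EIGRP
 address-family ipv4 unicast autonomous-system 400
  topology base
   distribute-list route-map SPOKE-EIGRP-OUT out
  exit-af-topology
  network 10.0.0.0
  eigrp stub connected summary
 exit-address-family
!
! --- MPLS hub: tag everything it sends down Tunnel10 ---
route-map HUB-MPLS-SET-TAG permit 10
 set tag 101
!
router eigrp IWAN-EIGRP
 address-family ipv4 unicast autonomous-system 400
  topology base
   distribute-list route-map HUB-MPLS-SET-TAG out Tunnel10
  exit-af-topology
 exit-address-family
```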
Again, follow the CVD for the EIGRP configuration--you can't go wrong! I'm likely a bit too obsessed with routing-loops...so I put in a couple more things to avoid it!
You might notice in the CVD that we're summarizing on the hub toward the spokes. We'll get more into this once we get to the PfR configuration. This is possible because we're using phase 3 DMVPN. Remember, if we had been using phase 2, each spoke would need a more specific route to allow for spoke-to-spoke communication! If we weren't using DIA, I'd have simply advertised a default to all my spokes!
At this point we should have EIGRP adjacency to my spokes. If we followed the CVD, we should see that our MPLS path (tunnel 10) is what is installed in the RIB, as the CVD has us configure a higher delay on the inet path tunnel interfaces. This is important, as while PfR will allow us to override the routing table, we want to ensure we don't have asymmetric routing if our destination is not yet PfR controlled (i.e. you haven't cut all of your remote sites over to IWAN!).
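This sketch shows the two knobs from the last two paragraphs: the hub summary toward the spokes and the delay bias that keeps MPLS preferred in the RIB. The summary matches the 10.0.0.0/8 example used later; the delay values are hypothetical placeholders (IOS "delay" is in tens of microseconds), so take the real ones from the CVD.

```
! Hub: advertise one summary down the tunnel; phase 3 shortcuts
! still let spokes find each other
router eigrp IWAN-EIGRP
 address-family ipv4 unicast autonomous-system 400
  af-interface Tunnel10
   summary-address 10.0.0.0 255.0.0.0
  exit-af-interface
 exit-address-family
!
! Hubs and spokes: INET tunnel gets a higher delay than MPLS
interface Tunnel10
 delay 1000
interface Tunnel20
 delay 2000
```

The same "delay" knob with a much larger value is what the route-poisoning approach described below relies on.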
***Per Cisco, it is on the roadmap for EIGRP to include EIGRP stub-site & stub-site wan-interface configurations...this will do what we're doing with route tagging!!***
David/Denise--you got me again! While my design DOES stop routing loops...it also could potentially break routing in general between spokes! OK--here is our scenario:
Under normal conditions there are no problems: spoke A can talk to spoke B & each spoke can talk to the hub JUST fine. Now, what if spoke A's MPLS path is down & spoke B's INET path is down? Well....shit. Instead of blocking ALL routes from being learned, we should simply poison them (delay 25000 on the upstream link) or advertise a summary to the BR (remember, longest prefix wins). That way, should there be a path-down scenario like the one we discussed, spoke A can still talk to spoke B (in a hub-and-spoke fashion) by transiting through the other BRs! To summarize, my original route-tagging/blocking would stop each hub BR from learning about the paths via the other hub. We WANT them to learn the path... just NEVER use it under normal conditions.
The fourth phase...QoS/PfR
QoS and PfR are the magic of IWAN. Seriously, you can get DMVPN/EIGRP set up in a day..but fine tuning your PfR policies can be a non-stop process.
One thing worth noting is that you may see that I have a nested child policy on the spoke..but not on the WAN hubs. The reason for this is to do with per-tunnel QoS. Per Cisco, we cannot have a nested child policy on the WAN hubs, as this "Breaks per-tunnel QoS."
Furthermore, I used port-channels on my WAN edge at the hub... this is a no-no for the same reason as using hierarchical QoS: it breaks per-tunnel QoS. When this goes production, I'll be sure to have a separate physical path for the inside & outside!
This is the gist of what we're going to try and accomplish. The first thing I want to talk about is QoS tagging. Cisco recommends an end-to-end QoS policy, where we're marking/classifying as close to the source as possible. Unfortunately... I'm not going to re-do this customer's QoS policy; that is just wwwwwwwwwwwaaaaay out of scope. To get around the fact that they lack a true QoS design, I created a marking policy, "DSCP-MARKUP." While the ISR 4331s support NBAR2, or "Next Generation NBAR," we aren't going down that road for the POC.
The main applications this customer has running across their network are Exchange, Citrix, Voice, Video, and McAfee. I simply went with using an ACL to classify/mark the inbound traffic on the devices. That being said, Cisco Prime has some REALLY good templates that you can "borrow" that get into some really neat classification using NBAR! The point of this markup is ENTIRELY for PfR purposes, as we'll discuss now.
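Here is a minimal sketch of an ACL-based marking policy along those lines. The "DSCP-MARKUP" name comes from the text. Everything else is an assumption:

- the port choices (Cisco-style RTP range 16384-32767, Citrix ICA 1494 and CGP 2598);
- the DSCP values;
- the LAN-facing interface name.

Match it to what you actually see on the wire.

```
ip access-list extended ACL-VOICE
 permit udp any any range 16384 32767
ip access-list extended ACL-CITRIX
 permit tcp any any eq 1494
 permit tcp any any eq 2598
!
class-map match-any CM-MARK-VOICE
 match access-group name ACL-VOICE
class-map match-any CM-MARK-CITRIX
 match access-group name ACL-CITRIX
!
policy-map DSCP-MARKUP
 class CM-MARK-VOICE
  set dscp ef
 class CM-MARK-CITRIX
  set dscp af21
!
! Mark on the way IN from the LAN, so PfR sees the DSCP at the WAN
interface GigabitEthernet0/0/2
 service-policy input DSCP-MARKUP
```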
PfR configuration is scarily easy. Seriously, follow the CVD. The only point worth mentioning is the loopback reachability, prefix-list application, and policy creation.
First off, just make sure all your BRs have reachability to the MC's loopback that you're using for PfR. That's it!
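A skeleton of the PfRv3 domain roles follows, matching this POC: the hub MC and a BR share the MPLS hub, the INET hub is a BR only, and each spoke is its own MC/BR. The domain name IWAN comes from the "show domain IWAN ..." commands later in this post. The loopback address is hypothetical, and that loopback has to be reachable from every BR.

```
! --- MPLS hub: hub MC + BR on the same box ---
interface Loopback0
 ip address 10.6.32.251 255.255.255.255
!
domain IWAN
 vrf default
  master hub
   source-interface Loopback0
  border
   source-interface Loopback0
   master local
!
interface Tunnel10
 domain IWAN path MPLS
!
! --- INET hub: BR only, pointing at the MC loopback ---
domain IWAN
 vrf default
  border
   source-interface Loopback0
   master 10.6.32.251
!
interface Tunnel20
 domain IWAN path INET
!
! --- Spoke: local MC + BR, registering with the hub MC ---
domain IWAN
 vrf default
  border
   source-interface Loopback0
   master local
  master branch
   source-interface Loopback0
   hub 10.6.32.251
```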
For the longest time I could not for the life of me figure out what the prefix-lists were for...here is my attempt at explaining that:
The site-prefixes are the prefixes that your hub is advertising to the spokes. These prefixes are what is used for smart probing. You have different ways of approaching this: create a summary route in EIGRP for 10.0.0.0/8 and use a prefix-list that only includes 10.0.0.0/8, OR have a HUGE prefix-list that includes every prefix that the hub advertises to the spokes.
Why would you need this? Well the documentation on this is fuzzy at best..but my interpretation (and those that my peers seem to agree with) is that while the spokes learn about other spoke prefixes dynamically, the site-prefix is that of the hub, or data center learned networks. This part is NOT dynamic.
So if we talk about the first option (summarized 10.0.0.0/8), we'll be sending probes for this prefix and the respective traffic-classes. For example, if we have DSCP markings for EF, AF41, AF31, and 0....we'd have probing for the 10.0.0.0/8 network for the 4 traffic-classes. Alternatively, if we included all the subnets in the prefix list (second option), we'd have probing for each traffic-class of each prefix. But what does that mean? My interpretation is that this prefix-list is a balancing act; create too small a prefix-list and your probing isn't sufficient. Create too large a prefix-list and you'll kill your router's CPU with probes.
That being said, your prefix-list MUST match what is in the RIB. For example, if you aren't summarizing 10.0.0.0/8 but include 10.0.0.0/8 in the prefix-list, then the only thing PfR will probe for is the EXACT 10.0.0.0/8 prefix, nothing with a longer prefix!!!!
David Prall said I'm incorrect on this! The only thing you need to do is ensure that you have a parent route for any site-prefixes learned from the hub! While "overloading" the hub is unrealistic, having too small a site-prefix list IS an issue. For example, say we summarized 10.0.0.0/8 from the hub & used a site-prefix of 10.0.0.0/8. Should ANY source-destination traffic for a particular marking from the hub fall out of policy, EVERYTHING with that marking moves over to the alternate path. To make this perfectly clear: you have a summary for 10.0.0.0/8 & your site-prefix is 10.0.0.0/8, but you have voice traffic going to 10.0.0.27 (in 10.0.0.0/24) & 10.100.5.9 (in 10.100.5.0/24). If there is voice latency toward 10.100.5.9, it's going to swing that voice traffic AND the 10.0.0.27 voice traffic over to the alternate path (assuming that path is better). Alternatively, if we had a site-prefix list including 10.0.0.0/24 & 10.100.5.0/24 and experienced latency to 10.100.5.9, we'd only swing 10.100.5.0/24 to the alternate path, leaving 10.0.0.0/24 where it is!
Furthermore, the site-prefix does not have to have a 1-to-1 match in the RIB--you simply need a parent route!
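Applied to David's example, the finer-grained site-prefix list might look like the sketch below. The prefix-list name is hypothetical. The hub is assumed to keep advertising the 10.0.0.0/8 summary as the parent route.

```
ip prefix-list DC1-PREFIXES seq 10 permit 10.0.0.0/24
ip prefix-list DC1-PREFIXES seq 20 permit 10.100.5.0/24
!
domain IWAN
 vrf default
  master hub
   site-prefixes prefix-list DC1-PREFIXES
```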
Please see the PfR wiki for more information on the probing!
Now let's discuss the enterprise-prefix. The enterprise-prefix list is, in my understanding, mainly used to differentiate enterprise traffic from internet traffic. If a destination falls OUTSIDE of this prefix-list, then the traffic will show as "INTERNET" and will be load-balanced. If your prefix is within the range but not learned via a site-prefix, then it will not be included in PfR's probing/control, but will simply fall back on the routing table to avoid asymmetric routing. Ultimately, this won't matter once all of your spokes are IWAN/PfR controlled, but it's a stopgap until you have all of your spokes converted.
Thanks Mani on this one! By default, traffic matched by your enterprise prefix-list falls back to the routing table (as I said). BUT, you can configure "load-balancing" under the PfR policy to load-balance this traffic. The only traffic that is load-balanced is the non-"performance" traffic, i.e. traffic that isn't tracking delay, jitter, or loss. Because we aren't tracking delay, loss, or jitter for this traffic, we're only tracking reachability.
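A sketch of the enterprise-prefix definition on the hub MC follows, using the 10.0.0.0/8 range that shows up later as "*10.0.0.0/8" in "show domain IWAN master site-prefix". The prefix-list name is hypothetical.

```
ip prefix-list ENTERPRISE-PREFIX seq 10 permit 10.0.0.0/8
!
domain IWAN
 vrf default
  master hub
   enterprise-prefix prefix-list ENTERPRISE-PREFIX
```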
Tom brought up an interesting scenario on this topic! Salesforce.com resolves to a public IP address (outside of our enterprise prefix-list range). What if we ONLY want salesforce.com traffic on our INET path, never on MPLS? We can add an entry in our PfR site-prefix list for the specific prefix! Once added, we can add an entry in our PfR policy with path preference, as this is now technically "PfR controlled!"
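Tom's trick, sketched under assumptions:

- 203.0.113.0/24 is a documentation range standing in for the real Salesforce block, which you'd look up yourself;
- the class name and sequence number are hypothetical;
- the bulk-data template is only a placeholder choice of threshold profile.

Keep in mind that a DSCP-based class applies to every PfR-controlled prefix carrying that marking, not only the Salesforce range.

```
! Pull the public SaaS range under PfR control
ip prefix-list DC1-PREFIXES seq 30 permit 203.0.113.0/24
!
domain IWAN
 vrf default
  master hub
   site-prefixes prefix-list DC1-PREFIXES
   class DEFAULT-DATA sequence 40
    match dscp default policy bulk-data
    path-preference INET fallback MPLS
```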
Lastly, let's discuss the PfR MC policy! I'll first say that I have not modified, and will not modify, the default policies (voice, video, low-latency-data, and/or bulk-data). The most I've done is modify the policies to include the DSCP markings from my "MARKUP" policy. For example, this customer made it clear that they ONLY wanted voice/video on the MPLS and the rest to take the internet path. Well, that's easy enough: I simply made sure my path-preference was MPLS fallback INET for my voice/video classes and that the rest were the inverse.
Now, what if your prefix is PfR controlled BUT there is nothing in the PfR policy? For instance, 10.1.0.0/16 is in our site-prefix list, but there is nothing for DSCP 0? By default, it will use the routing table to determine the path to use. What if we don't want it to ONLY use the path in the routing table? Cisco's recommendation is to use "load-balancing" within the PfR policy! By doing so, PfR load-balances this traffic across BOTH paths and tries to use a variance of 20% between the paths. THIS CAN CAUSE ASYMMETRIC ROUTING....just an FYI..but yea, get over it!
EVEN if we have load-balancing configured---it will ONLY load-balance our non-performance "stuff." "Stuff" being things within our policy that do not have priority1, priority2, etc...like voice, for example. IF you have path preference, though, it will not load-balance (obviously).
Furthermore, one thing we can look into is using "INET fallback routing" so that we don't have to rely on probing across our MPLS path!
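A sketch of the MC policy described in the last few paragraphs: MPLS first for voice/video, the inverse for data, and "load-balance" for traffic classes without a performance policy. The class names, sequence numbers, and DSCP values are hypothetical; the policy templates are the stock voice, real-time-video, and low-latency-data ones.

```
domain IWAN
 vrf default
  master hub
   ! Spread non-performance traffic across both paths
   load-balance
   ! Voice/video: MPLS first, INET only as a fallback
   class VOICE sequence 10
    match dscp ef policy voice
    path-preference MPLS fallback INET
   class VIDEO sequence 20
    match dscp af41 policy real-time-video
    path-preference MPLS fallback INET
   ! Data: the inverse
   class DATA sequence 30
    match dscp af21 policy low-latency-data
    path-preference INET fallback MPLS
```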
Once you have PfR connectivity, a few things worth checking:
show domain IWAN master policy
>This will verify that the spokes have learned the policy that has been configured on the MC.
show domain IWAN master traffic-classes summary
>We expect to see a correlation between the DSCP values to the exit, matching the policy on the MC.
show domain IWAN master traffic-classes dscp <dscp value>
>We expect to see information about that exact DSCP value, including its history; should we have issues, we'll see the changing exits here.
show domain IWAN master traffic-classes route-change <reason>
>This will give us a higher-level view of the PfR domain. If you see multiple traffic-classes with changes due to issue X..then it will give you a starting point in troubleshooting.
show domain IWAN master site-prefix
>This gives a great view of prefixes learned either dynamically or via the MC's site-prefix list. One thing worth noting is the "*10.0.0.0/8" entry with a site-id of 255.255.255.255. This is from the MC's enterprise prefix-list!
The last piece I'm going to discuss...per-tunnel QoS "stuff"
Ok, like I said, one can get LOST in the web of QoS that is involved in IWAN. The first thing we'll discuss is per-tunnel QoS. What is the purpose? Well, imagine a remote site with a T1 (1.5 Mbps) that has DMVPN connectivity to a hub site with a 100 Mbps MPLS connection. Is it possible that the hub could send traffic faster than the remote site's T1 can handle? Absolutely.
Per-tunnel QoS is simply a method of allowing the spoke to communicate with the hub to say "Hey, send traffic to me at rate X." In our scenario, we created multiple per-tunnel QoS policies, given the varying bandwidth allowances for the different POC locations. For example, a site with 50Mbps down/10Mbps up would subscribe to the 50 Mbps policy, as it could potentially receive 50Mbps from the hub.
When creating these policies on the hub, we do two things: Allocate bandwidth percentages & set dscp tunnel values. The first portion is so that we can guarantee bandwidth to the important classes (voice and video) while allowing a remaining percentage to our mission critical/bulk data. Secondly, we assign dscp tunnel values so that if/when the traffic gets to someone who CARES about DSCP markings (i.e. our MPLS provider)..that they treat the traffic according to the contracted SLA!
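Here is a sketch of one such per-tunnel QoS group: a hub-side child policy with bandwidth percentages and "set dscp tunnel" markings, a 50 Mbps parent shaper, and the spoke subscribing to that group at NHRP registration. The group, policy, and class names and all percentages are hypothetical placeholders.

```
! --- Hub ---
class-map match-any PTQ-VOICE
 match dscp ef
class-map match-any PTQ-VIDEO
 match dscp af41
!
policy-map PTQ-CHILD
 class PTQ-VOICE
  priority percent 10
  set dscp tunnel ef
 class PTQ-VIDEO
  bandwidth remaining percent 30
  set dscp tunnel af41
 class class-default
  bandwidth remaining percent 70
!
! Parent shaper for spokes that can receive up to 50 Mbps
policy-map RS-GROUP-50MBPS-POLICY
 class class-default
  shape average 50000000
  service-policy PTQ-CHILD
!
interface Tunnel10
 ip nhrp map group RS-GROUP-50MBPS service-policy output RS-GROUP-50MBPS-POLICY
!
! --- Spoke (50 Mbps down) ---
interface Tunnel10
 ip nhrp group RS-GROUP-50MBPS
```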
Aside from our per-tunnel QoS, we're also doing some shaping on the physical interfaces! The purpose of this is to avoid policing at the ISP edge.
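A sketch of that physical-interface shaping follows. Interface names, rates, and policy names are hypothetical.

- The hub gets a flat class-default shaper only, per the note above about nested policies breaking per-tunnel QoS.
- The spoke shapes slightly under a hypothetical 10 Mbps circuit and nests a child queuing policy. That child, called WAN-QUEUING here, is a hypothetical name assumed to exist already.

```
! Hub WAN interface: shaper only, no nested child
policy-map SHAPE-HUB-WAN
 class class-default
  shape average 100000000
!
interface GigabitEthernet0/0/0
 service-policy output SHAPE-HUB-WAN
!
! Spoke WAN interface: shape just below the provider rate, queue underneath
policy-map SHAPE-SPOKE-WAN
 class class-default
  shape average 9500000
  service-policy WAN-QUEUING
!
interface GigabitEthernet0/0/1
 service-policy output SHAPE-SPOKE-WAN
```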
Things to avoid...
no ip unreachables
I'm guilty of this myself... it's a habit, I get it. DO NOT configure this on your physical WAN interfaces... IT BREAKS PMTUD. Look into "ip icmp rate-limit unreachable" instead.
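A sketch of the alternative: leave unreachables on and throttle them globally. The interface name and the 100 ms interval are hypothetical; the "df" variant covers the "fragmentation needed" messages that PMTUD depends on.

```
! Keep unreachables enabled on the WAN interface (this is the default)
interface GigabitEthernet0/0/0
 ip unreachables
!
! Throttle them instead: at most one per 100 ms in this example
ip icmp rate-limit unreachable 100
ip icmp rate-limit unreachable df 100
```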
no next-hop-self
This is phase 2 DMVPN. Honestly, phase 2 has no place in modern DMVPN implementations, as it doesn't support summarization from the hub. On top of that, phase 2 traffic is process switched until the NBMA next hop is resolved...why put that CPU overhead in the mix?
Miscellaneous notes...
- If using multicast, set the spoke PIM DR priority (ip pim dr-priority) to 0 so the hub wins the DR election on the tunnel. Hell, set it to 0 just in case.
- NHRP registration no-unique...allows a branch to overwrite its own existing NHRP mapping (for example, when its NBMA address changes).
- START with zone-based firewall; you may want DIA (direct internet access) one day...
- Per Cisco, max of 10 PfRv3 interfaces
- Tunnel key used to differentiate AFTER encryption
- If we don't disable NHRP route-watch, we CANNOT use spoke-to-spoke communication. By disabling it, we're telling NHRP to ignore the check to see if there is a parent route. Furthermore, we tell PfR to TAKE control to validate the path with smart probes.
- "Future is that all devices are BRs AND MCs at branches---dedicated MC at hub for sake of sparing the CPU"
- CANNOT USE A PORT-CHANNEL WITH ECMP. Instead, weight one link, and run a separate link to the other router on its own L3 interface with a floating static route, sourcing the tunnel from a loopback.
- Configure "BW ingress" on an interface so the numbers are correct in the show interface output.
- QoS with port-channels...UGH. See the "load-balancing vlan-manual" global config.
- path-pref MPLS1 MPLS2 fallback INET: listing MPLS1 MPLS2 together means OR. If a site has links to only one of them, PfR chooses the one it has. If it has BOTH...it will try to load-share across both.
- If you have a site with a single data VLAN, you may find that it is not "load-balancing" traffic across both paths. WHY!? Because with DSCP 0, for example, and a single site VLAN, we don't have enough granularity! The only traffic class available is pinned to one path! If we want more granular load-sharing, break up that site /24 into 2x /25s...now we can load-balance across both paths (should balancing require it).
- How can we "trick" IWAN into controlling a public IP address (e.g. Salesforce)? Well, by default, Salesforce's IP address is...well, public! As a result, PfR will say "LOAD BALANCE THAT BAD BOY!!!" If we don't want it load-balanced, we need to 1) "trick" IWAN into controlling it by adding the public IP address to the MC site-prefix list, and 2) create a policy that matches DSCP 0 and says PATHA fallback PATHB.
- Probing....
- Probing fills the gaps when there is no active traffic, up until the age-out timer expires (5 minutes by default). We can modify this timer...but do we want constant probe traffic?
- While there is data traffic, probes are sent at 1 packet every 1/3 of the monitor interval (default 30 seconds)...we can lower this value for more critical applications; this is called "quick monitor."
We can configure only ONE quick monitor interval. Say it's 4 seconds: we can assign multiple DSCP values to it, but we can't add any other intervals (not 1 second, not 2 seconds, etc.). Just know that this increases traffic to the MC, since the monitor interval is how often information is collected and sent to the MC to make decisions!
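For the spoke DR-priority note, here's a Python sketch of the PIM DR election rule (RFC 7761): highest DR priority wins, and the highest IP address breaks ties. It assumes every router advertises a priority. The addresses are a hypothetical mGRE segment with the hub at 10.0.0.1, and IOS's default priority of 1 applies wherever nothing is configured.

```python
import ipaddress


def elect_dr(neighbors: dict) -> str:
    """PIM DR election per RFC 7761: highest DR priority wins, highest IP breaks ties."""
    return max(neighbors, key=lambda ip: (neighbors[ip], ipaddress.ip_address(ip)))


# Hypothetical mGRE segment: hub 10.0.0.1, two spokes. IOS default priority is 1.
defaults = {"10.0.0.1": 1, "10.0.0.11": 1, "10.0.0.12": 1}
spokes_zeroed = {"10.0.0.1": 1, "10.0.0.11": 0, "10.0.0.12": 0}

print(elect_dr(defaults))       # 10.0.0.12 -> a spoke wins on the IP tie-break
print(elect_dr(spokes_zeroed))  # 10.0.0.1 -> the hub
```

With everything left at default, any spoke with a higher tunnel IP than the hub takes the DR role. That's exactly what setting the spokes to 0 prevents.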
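The path-pref note boils down to a small decision rule, sketched here in Python. This is a simplified model, not PfR's actual code. Primary entries are OR'ed together. Several usable primaries means PfR tries to share load across them. The fallback list only comes into play when no primary is usable; as an assumption of this sketch, "usable" means present at the site and not out of policy. The same shape covers the Salesforce policy (PATHA fallback PATHB).

```python
def choose_paths(site_links, primary, fallback, out_of_policy=frozenset()):
    """Simplified model of 'path-preference MPLS1 MPLS2 fallback INET'.

    Primary entries are OR'ed: use whichever of them the site actually has.
    More than one usable primary means PfR tries to load-share across them.
    Fallback paths are used only when no primary path is usable.
    """
    healthy = set(site_links) - set(out_of_policy)
    usable = [path for path in primary if path in healthy]
    if usable:
        return usable
    return [path for path in fallback if path in healthy]


PRIMARY, FALLBACK = ["MPLS1", "MPLS2"], ["INET"]
print(choose_paths({"MPLS1", "INET"}, PRIMARY, FALLBACK))           # ['MPLS1']
print(choose_paths({"MPLS1", "MPLS2", "INET"}, PRIMARY, FALLBACK))  # ['MPLS1', 'MPLS2']
print(choose_paths({"MPLS1", "MPLS2", "INET"}, PRIMARY, FALLBACK,
                   out_of_policy={"MPLS1", "MPLS2"}))                # ['INET']
```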
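To see why the single-VLAN site gets stuck on one path, here's a Python sketch that groups flows into traffic classes keyed by (destination site prefix, DSCP). That key is a simplification: PfRv3 can also classify by application. The 10.20.30.0/24 site and its flows are hypothetical. Splitting the /24 into two /25s doubles the number of classes available to spread across paths.

```python
import ipaddress
from collections import Counter


def traffic_classes(flows, site_prefixes):
    """Group (destination IP, DSCP) flows into traffic classes keyed by
    (destination site prefix, DSCP). Simplified: PfRv3 can also key on application."""
    nets = [ipaddress.ip_network(p) for p in site_prefixes]
    classes = Counter()
    for ip, dscp in flows:
        addr = ipaddress.ip_address(ip)
        prefix = next(net for net in nets if addr in net)  # assumes a match exists
        classes[(str(prefix), dscp)] += 1
    return classes


# Hypothetical site: one data VLAN, everything marked DSCP 0.
flows = [(f"10.20.30.{host}", 0) for host in (5, 60, 130, 200)]
halves = [str(n) for n in ipaddress.ip_network("10.20.30.0/24").subnets(new_prefix=25)]

print(dict(traffic_classes(flows, ["10.20.30.0/24"])))  # one TC -> one path
print(halves)                                            # ['10.20.30.0/25', '10.20.30.128/25']
print(dict(traffic_classes(flows, halves)))              # two TCs -> can use two paths
```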
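And a quick Python sketch of the probing arithmetic. It uses the figures from the notes: while data is flowing, one probe goes out every 1/3 of the monitor interval; the default monitor interval is 30 seconds, and the quick-monitor example uses 4 seconds. The "per monitored path" framing is an assumption of this sketch.

```python
DEFAULT_MONITOR_INTERVAL_S = 30  # default monitor interval from the note above
QUICK_MONITOR_INTERVAL_S = 4     # the single quick-monitor value from the example


def probe_period_s(monitor_interval_s: float) -> float:
    """With data traffic present: one probe every 1/3 of the monitor interval."""
    return monitor_interval_s / 3


def probes_per_hour(monitor_interval_s: float) -> float:
    """Probes per hour for one monitored path at the given monitor interval."""
    return 3 * 3600 / monitor_interval_s


for interval in (DEFAULT_MONITOR_INTERVAL_S, QUICK_MONITOR_INTERVAL_S):
    print(f"monitor {interval:>2}s: probe every {probe_period_s(interval):.2f}s, "
          f"{probes_per_hour(interval):.0f} probes/hour")
```

Going from 30 seconds to 4 seconds is a 7.5x jump in probes (360 to 2700 per hour). That's the "do we want constant traffic?" trade-off in numbers.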
Wednesday, January 20, 2016
Expanding a VMware datastore!
We have 3 datastores (DS1, DS2, and DS3) that we wish to grow. In this scenario they are 1 TB each. What can we do when these datastores are approaching their configured capacity?
1. We can present new LUNs to the hosts, create new datastores, and Storage vMotion the VMs off the old LUNs.
2. We can expand the existing datastores (what we did in this scenario).
First, we have to grow the LUNs that the VMFS file system (ATM machine, ha!) is stored on. This is pretty straightforward for pool LUNs...but a bit more complex for RAID group LUNs. For RAID group LUN expansion, you can stripe or concatenate two LUNs together.
Striping:
Pro-Performance benefits of any additional spindles are shared by all data.
Con-Re-striping takes time...delayed availability of additional space.
Concatenation:
Pro-SIMPLE; additional capacity immediately available.
Con-No performance benefit to existing data.
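The pros and cons above fall out of how each layout maps logical blocks onto the member LUNs, sketched here in Python. The layout is simplified and assumes two equal 1000-block LUNs with a hypothetical 64-block stripe unit. Concatenation leaves the existing data exactly where it was, on the original LUN. Striping spreads it over both members, which is also why existing data has to be rewritten (re-striped) before the new space is usable.

```python
from collections import Counter


def concat_map(block: int, lun_sizes: list) -> tuple:
    """Concatenation: LUN 0 fills first, new capacity is simply appended after it."""
    for lun, size in enumerate(lun_sizes):
        if block < size:
            return lun, block
        block -= size
    raise IndexError("block is beyond the total capacity")


def stripe_map(block: int, n_luns: int, stripe_blocks: int) -> tuple:
    """Striping: fixed-size stripe units are laid round-robin across all members."""
    stripe, offset = divmod(block, stripe_blocks)
    return stripe % n_luns, (stripe // n_luns) * stripe_blocks + offset


# Hypothetical: two equal 1000-block LUNs, 64-block stripe unit.
# The first 1000 blocks stand in for the data that existed before the expansion.
existing = range(1000)
print(Counter(concat_map(b, [1000, 1000])[0] for b in existing))  # all on LUN 0
print(Counter(stripe_map(b, 2, 64)[0] for b in existing))         # spread over both
```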
Lucky for us...we have pool LUNs, so it's as simple as right-clicking the LUN, choosing Expand, and entering the NEW LUN size.
Once we've expanded the LUN size...we have to tell VMware to use this new space!
Let's go back to the vCenter Web Client or the vSphere Client. We need to rescan the storage devices.
Now that we've done a rescan, right-click on the datastore that we expanded the LUN on, and choose properties.
If we click Increase, we'll see the Fibre Channel disk with the ACTUAL capacity of the LUN, versus the capacity of the datastore.
If we click next, we'll see the VMFS partition size (the amount of space we sized the datastore) as well as the free space that was made available by expanding the LUN!
Once you complete the process, you'll need to do a rescan on the hosts to verify the expansion.
But that's it! Pretty simple.
Additional notes:
If you're having issues expanding your datastore...we found that disabling storage filters may help!