Tuesday, September 11, 2012

CCDP (Designing Cisco Network Service Architectures)

Chapter 1: Cisco SONA and the Cisco Enterprise Architecture


SONA: Service-Oriented Network Architecture; a framework that allows a business to build an intelligent enterprise network infrastructure by separating the network architecture into 3 modules: network infrastructure, infrastructure services, and business applications.

  • Network infrastructure: Layer where all IT services are interconnected across a converged network foundation.  Objective for this module is to provide anywhere, anytime connectivity. 
  • Interactive / Infrastructure services: Enables efficient allocation of resources to applications and business processes delivered through the network architecture.  (security, mobility, storage, ID, voice services)
    • Security services:  Increase the integrity of the network by protecting network services
    • ID services: Map resources to the user and device
    • Storage services: Provide distributed and virtual storage
    • Compute services: Connect and virtualize compute resources. 
  • Business Applications: Objective for customers in this layer is to meet business requirements and achieve efficiencies by leveraging the interactive services layer.  (instant messaging, unified messaging, etc.)
Benefits of SONA:
  • Functionality: Supports the organizational requirements
  • Scalability: Built into layers, it facilitates growth
  • Availability: Available services anywhere, anytime
  • Performance: Provides per-application utilization through the network infrastructure
  • Management: Provides control, performance monitoring, and fault detection
  • Efficiency: Provides step-by-step network services growth.  

Using SONA and its modular approach, we are able to build a more intelligent network infrastructure.

The Hierarchical Model:

  • Modular view of the network, making it easier to design and build a deterministic scalable infrastructure. 
  • Enables flexibility in design & facilitates ease of implementation and troubleshooting
  • 3 layers make up the hierarchical model: Access, Distribution, & Core
    • Access: Used to grant users access to the network.  This may include end-user workstations, IP phones, servers, WAPs, etc.
    • Distribution: Aggregates the wiring closets, using switches to segment workgroups & helps to isolate issues.  Also, it helps segregate WAN connections at the edge of the campus and provides policy-based connectivity.  Policy-based control enables you to prioritize traffic from the access layer to the core.  
    • Core (or backbone): High speed, designed to switch packets as fast as possible.  Provides scalability and fast convergence.  
Cisco Enterprise Architecture:
  • Enterprise Campus: Provides for high availability through a resilient multi-layer design, redundant hardware, and software features.
  • Enterprise Edge: Offers connectivity to voice, video, and data services outside the enterprise.  QOS, service levels, and security are the main issues in this module. 
  • Enterprise WAN/MAN: This is part of the edge architecture.  It enables corporate staff to work efficiently wherever they are located.  Security is provided with multiservice VPNs over Layer 2 and Layer 3 WANs, in hub-and-spoke or full-mesh topologies.
  • Enterprise Data Center: Supports the requirements for consolidation, business continuance, and security.  Provides staff with secure access to applications and resources.  This solution allows the enterprise to scale without major changes to the infrastructure.  
  • Enterprise Branch: Allows enterprises to extend head-office applications and services to remote locations and users.  
  • Enterprise Teleworker: Allows enterprise to securely deliver voice and data services to SOHO environments.  VPNs / Cisco IP phones provide cost-effective access to centralized IP communication systems.
PPDIOO: Cisco network life cycle; Prepare, Plan, Design, Implement, Operate, Optimize.  To design a network that meets customer needs, the organizational goals, organizational constraints, technical goals, and technical constraints must be identified.


Prepare: Establish organizational requirements, high level conceptual architecture, establish financial justification.
Plan: Identify network requirements based on goals, facilities, user needs, and so on.  Perform a gap analysis to see whether the current sites, facilities, and infrastructure can support the new system.  Establish milestones that align with the scope, cost, and resource parameters proposed in the prepare phase.
Design: Comprehensive detailed design that incorporates specifications to support availability, reliability, security, scalability, and performance.  The design specifications are the basis for the implementation activities.
Implement: Build according to design with a goal of not interrupting operation or creating points of vulnerability.
Operate: Final test of the design.  Fault detection, correction, and performance monitoring provide data for the optimize phase.
Optimize: Involves proactive management of the network.  Main goal is to identify issues & resolve them before they affect the organization.  In PPDIOO, the optimize phase may trigger another network redesign.

There are many benefits to the life-cycle approach, but a few key points are:

  • Lower cost of ownership
    • Accelerating successful implementation
    • Improving the efficiency of your network and of the staff supporting it
  • Increased network availability
    • Improving staff skills
    • Staging and testing the proposed system before deployment
  • Improved business agility
    • Continually enhancing performance
    • Readying sites to support the system you want to implement
  • Faster access to applications and services
    • Managing & resolving problems affecting your system & keeping software applications current
    • Improving the availability, reliability, and stability of the network and the applications running on it
Using the Design Methodology under PPDIOO (Prepare, Plan, Design)
Step 1: ID customer requirements.  Key decision makers ID the initial requirements, and based on these a high-level conceptual architecture is proposed.  This is normally done in the PPDIOO prepare phase.
How do we ID the customer requirements?
Data gathering steps are:
1. ID applications & network services.  Also determine which applications may have constraints in the traffic flows.  
2. Define organizational goals
3. Define organizational constraints
4. Define technical goals
5. Define technical constraints
Note:  Data gathering is not uni-directional.  One may go back to previous steps if information has been missed.

Step 2: Characterize existing network & sites.  Perform gap analysis and network audits to determine if existing sites, facilities, and architecture can support the proposed system.  Network behavior is also analyzed.  This is typically done in the PPDIOO plan phase.
How do we characterize existing network & sites?
First, gather as much information as possible regarding the current network.  Reviewing existing documentation, network audits, and traffic analysis can provide key information.  With this information in hand, create a summary report that describes the health of the network.  With this, you can propose hardware and software upgrades to support the network & organizational requirements.

Step 3: Design the network topology and solutions.  Create the detailed design of the proposed system.  One may even create a pilot or prototype to verify the design.  You also write the design document.
How do we design the network topology?
There are various methods, but the top-down approach is useful: it clarifies the design goals and initiates the design from the perspective of the required applications and network solutions.  By doing so, it divides the design task into related, less-complex components.

Once the design is completed in the PPDIOO design phase, the next step is to develop the implementation plan and migration plan in as much detail as possible.



Chapter 2: Enterprise Campus Network Design

The Cisco hierarchical network model enables the design of high-availability modular topologies.  The modular approach makes the network easier to scale, understand, and troubleshoot, and it facilitates problem isolation and network management.

Access Layer:  Entry point into the network for access devices (PCs, IP phones, etc.); aggregates end users and provides uplinks to the distribution layer.
Important features: High availability, Convergence, Security, QoS, and IP multicast.  
Distribution Layer:  Aggregates traffic from all nodes and uplinks from the access layer and provides policy-based connectivity.
Important features: Availability (through dual paths), load balancing, and QoS (through policy-based connectivity).  Also, routes from the access layer are summarized at the distribution layer.
Core Layer:  The core devices implement scalable protocols and technologies, alternate paths, and load balancing.  The core layer helps in scalability during future growth.  
Important features: Scalability, high availability and fast convergence.
Note: The core layer can be collapsed with the distribution layer, but this requires a great deal of port density.

High-Availability Considerations
A focus on minimizing link and node failures and optimizing recovery time to minimize convergence & downtime.  

Implement optimal redundancy:  
Recommended design is redundant distribution layer switches and redundant connections to the core, with a layer 3 link between the distribution switches.  It's worth mentioning that redundant supervisor modules may cause longer convergence times than a finely tuned IGP.  However, in a non-redundant topology, Cisco NSF & SSO can provide significant resiliency improvements.
Provide alternate paths:
The recommended distribution layer design is redundant distribution layer switches, each with redundant connections to the core, plus a layer 3 link between the distribution switches.  Device redundancy alone is not enough; it is the additional links that provide the alternate paths.
Avoid single points of failure:
Cisco NSF & SSO have the most impact in the access layer, as an access layer switch is a single point of failure whose loss causes outages for the end devices connected to it.
Cisco NSF with SSO:
SSO allows the standby RP (route processor) to take control of the device after a hardware or software fault on the active RP.
NSF works with SSO to continue forwarding IP packets following an RP failover.  NSF is supported by EIGRP, OSPF, BGP, and IS-IS.  A router running these protocols will continue forwarding traffic once an internal switchover is detected.
Routing protocols requirements for Cisco NSF:
The main purpose of NSF is to avoid routing interruptions, known as routing flaps.  Data traffic is forwarded while the standby RP assumes control.  While the control plane builds a new routing protocol database and restarts peering agreements, the data plane relies on pre-switchover forwarding-table synchronization to continue forwarding traffic.  After the routing protocols have converged, CEF updates the FIB (which is essentially a copy of the routing table), removes stale route entries, and then updates the line cards with refreshed FIB information.  A known issue is that black holes / transient routes may be introduced before the FIB is updated.  For all of this to take place, a device's neighbors must be NSF aware (so that the neighbor devices can send state information to help rebuild the routing tables).
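As a rough sketch, enabling SSO with NSF on a dual-supervisor chassis and an NSF-capable IGP might look like the following (the OSPF process number is a placeholder, and exact syntax varies by platform and IOS version):

    redundancy
     mode sso
    !
    router ospf 1
     nsf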
Cisco IOS Software Modularity Architecture:
Enables several Cisco IOS control plane subsystems to run in independent processes.  Cisco IOS Software Modularity boosts operational efficiency and minimizes downtime.
Control Plane:  Logical interface that connects physical chassis components and software functions into a unified logical unit.
Data Plane:  This is where packet forwarding takes place.  It is the path that packets take through the routing system from the physical layer interface module to the modular services card to the switch fabric.
Management Plane: Where control/configuration of the platform takes place.

Benefits of software modularity?
Operational consistency:  Does not change operational point of view.
Protected memory:  Enables a memory architecture where processes make use of a protected address space.  With this, memory corruption across process boundaries is nearly impossible.  
Fault containment:  Problems within one process cannot affect other parts of the system.  If a less-critical system process fails or is not operating as expected, critical functions required to maintain packet forwarding are not affected.
Process restartability:  Allows process restartability without having to perform a system restart.  

Developing an optimum design for layer 2

Layer 2 architectures rely on the following technologies: STP, trunking (ISL/802.1q), UDLD, and Etherchannel. 

Spanning-tree protocols (STP) toolkit
  • Portfast:  Configured on access ports; purpose is to skip the listening & learning states and begin forwarding immediately.  Only use PortFast when connecting a single end station to a layer 2 access port.
  • UplinkFast:  Provides fast convergence after a direct link failure and achieves load balancing between redundant layer 2 uplinks.  This is applied on the uplink ports from the access switches to the distribution switches.
  • BackboneFast:  Provides fast convergence after an indirect link failure; initiated when a root port or blocked port on a network device receives an inferior BPDU from its designated bridge.
  • Loop Guard:  Prevents an alternate or root port from becoming designated in the absence of BPDUs; instead of transitioning to forwarding, the port is placed in a loop-inconsistent blocking state.  This feature is applied on layer 2 ports between distribution switches and on the ports from distro to access devices.
  • Root Guard:  Secures the root on a specific switch by preventing external switches from becoming roots.  This is applied on the uplink ports from the access switches to the distribution switches.  
  • BPDU Guard:  When configured on a PortFast port, it will error-disable the port in the event BPDUs are received on the port.
  • UDLD:  Monitors the physical configuration of the line, checking for one-way connections.  When a unilateral link is detected, the port is shut down.  
Recommended Practices for Trunk Configuration


  • The current recommended BEST PRACTICE is to use IEEE 802.1Q trunks
  • Also, as a recommended BEST PRACTICE, when configuring switch-to-switch interconnections to carry multiple VLANs, set DTP to desirable/desirable with encapsulation negotiation to support DTP negotiation.  While hard-coding both ends to on/on could save seconds of convergence time, DTP would not be actively monitoring the state of the trunk, making it difficult to ID a misconfigured trunk.  (See the sketch after this list.)
  • Alternative practice is to set one side of the link to auto and the other end to desirable.  This setting allows for automatic trunk formation.  
  • Prune unused VLANs from trunked interfaces to avoid broadcast propagation.  
  • Disable trunks on host ports.  
  • Layer 2 port modes:
    • Trunk-Permanent trunking mode
    • Desirable-Actively attempts to form a trunk
    • Auto-Port is willing to become a trunk if the neighboring port is configured for trunk/desirable
    • Access-Access mode specifies that the port never becomes a trunk.  
    • Nonnegotiate-Disables DTP on the port.  Can only become a trunk if the neighboring port is manually configured as a trunk.
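A minimal trunk configuration following these practices might look like this (interface name and VLAN list are placeholders); the allowed-vlan list is what prunes unused VLANs from the trunk:

    interface GigabitEthernet1/0/49
     description uplink to distribution
     switchport trunk encapsulation dot1q
     switchport mode dynamic desirable
     switchport trunk allowed vlan 10,20,110,120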
Recommended practices for EtherChannel
  • Create channels containing up to 8 parallel links between switches.  
  • Two variants: PAgP (Port Aggregation Protocol) & LACP (Link Aggregation Control Protocol).  (See the sketch after this list.)
  • PAgP modes
    • On
    • Desirable
    • Auto
    • Off
  • LACP modes
    • On
    • Passive
    • Active
    • Off
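A minimal EtherChannel sketch using LACP (interface names and channel-group number are placeholders; for PAgP, channel-protocol pagp with channel-group 1 mode desirable would be the rough equivalent):

    interface range GigabitEthernet1/0/49 - 50
     channel-protocol lacp
     channel-group 1 mode active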

Developing an optimum design for layer 3

To achieve high availability & fast convergence for the Cisco enterprise campus network, the main objectives should be:
  • Managing over subscription & bandwidth 
  • Supporting link load balancing
  • Routing protocol design
  • FHRPs
Managing oversubscription & bandwidth
Rule of thumb: 20:1 for access ports on access-to-distribution uplinks & 4:1 for the distribution-to-core links.  If congestion is infrequent, QoS is needed to manage the occasional peaks; if congestion is frequent, the design does not have sufficient bandwidth.  As bandwidth from the distribution to the core increases, some design changes must be made.  While simply adding more uplinks is not a solution, EtherChannels are useful because they bundle the links into a single logical interface.  Simply adding more physical (routed) links has the following effect on routing protocols:
  • OSPF: In the event of a failed link, traffic will be rerouted, and this design will lead to a convergence event.
  • EIGRP: While a link failure may not cause a reconvergence, lost links might also overload remaining links.
The best, yet costly, solution would be to upgrade to 10 Gigabit interfaces.  These do not increase routing complexity & the number of routing peers is not increased.

Link load balancing
The recommended BEST PRACTICE is to use Layer 3 plus Layer 4 load balancing to provide as much information as possible as input to the EtherChannel hash, giving a more even distribution of traffic across the channel members.
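A minimal sketch (the src-dst-mixed-ip-port keyword is Catalyst 6500-specific; available hash options vary by platform, with src-dst-port as a Layer 4-only alternative on some switches):

    port-channel load-balance src-dst-mixed-ip-port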

Routing Protocol Design
  • Build redundant triangles, not squares, to avoid IGP reconvergence.
  • Peer only on transit links.  By default, the distribution layer switches send routing updates and attempt to peer across the uplinks from the access switches to the remote distribution switches on every VLAN.  This is unnecessary and wastes CPU processing time.  The BEST PRACTICE is to configure the ports toward layer 2 access switches as passive, which will suppress the advertising of routing updates.  (See the sketch after this list.)
  • Summarize at the distribution layer to advertise a single summary route to represent multiple IP networks within the building (switch block).  This will keep an individual distribution node from advertising loss of connectivity to a single VLAN or subnet.    
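A minimal sketch of both practices on a distribution switch (the AS number, interface names, and summary prefix are placeholders):

    router eigrp 100
     passive-interface default
     no passive-interface TenGigabitEthernet1/1
     no passive-interface TenGigabitEthernet1/2
    !
    interface TenGigabitEthernet1/1
     description uplink to core
     ip summary-address eigrp 100 10.1.0.0 255.255.0.0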
First-Hop Redundancy
First-hop redundancy or default-gateway redundancy is an important component in convergence in a highly available hierarchical network design.
Note:  A FHRP is only needed if the design implements a layer 2 link between the access switch and the distribution switch.
Methods:
  • HSRP (Hot Standby Router Protocol):  Recommended protocol over VRRP, as it is Cisco proprietary.
  • VRRP (Virtual Router Redundancy Protocol): Similar to HSRP; one notable difference is that preempt is enabled by default.
  • GLBP (Gateway Load Balancing Protocol): While backup routers in HSRP / VRRP remain idle while the active router is online, GLBP load balances across gateways by handing out different virtual MAC addresses when endpoints use ARP to learn the physical MAC address of their default router.  This allows a group of routers to act as one virtual router, sharing one VIP address while using multiple virtual MAC addresses for traffic forwarding.  Note: active forwarding routers are called Active Virtual Forwarders (AVF), and a secondary virtual forwarder is called an SVF.  In the event of a failure, the SVF takes over traffic destined for the impacted virtual MAC.  BEST PRACTICE: In an environment where VLANs span multiple access switches, HSRP is the preferred FHRP, as GLBP can result in a two-hop path at layer 2 for upstream traffic in the event STP is blocking.
Note:  It is important to synchronize RSTP & HSRP: the RSTP root should be the same device as the HSRP primary.  If they are not synchronized after failure recovery, the interconnection between the distro switches can become a transit link, and traffic takes a multihop layer 2 path to its default gateway.  It is recommended practice to measure the system boot time and set the HSRP preempt delay to 50 percent greater than this value.
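A minimal sketch tying these together on the primary distribution switch (VLAN, addresses, priority, and the 180-second delay are placeholders; the delay should be derived from the measured boot time as described above):

    spanning-tree vlan 10 root primary
    !
    interface Vlan10
     ip address 10.1.10.2 255.255.255.0
     standby 1 ip 10.1.10.1
     standby 1 priority 110
     standby 1 preempt delay minimum 180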

Layer 2 to Layer 3 Boundary Design Models

  • Layer 2 Distribution Switch Interconnection:  Preferred method if you have multiple VLANs spanning access switches.  This model uses a layer 2 link between the distribution layer switches.  It is preferred to configure the active HSRP devices to also be the STP roots to avoid using the inter-distribution link for transit.  If possible, implement RPVST+ to further reduce convergence times in the event of a link failure.
  • Layer 3 Distribution Switch Interconnection (HSRP):  Time-proven topology where NO VLANs span between access switches.  Here, the root for each STP instance is configured on the active HSRP device.  BEST PRACTICE: This recommended design provides the highest availability.  With this design, a distro-to-distro link is required for route summarization.  A recommended practice is to map the layer 3 subnet to the layer 2 VLAN number (e.g., 10.1.10.0 for VLAN 10 & 10.1.20.0 for VLAN 20).
  • Layer 3 Distribution Switch Interconnection (GLBP):  This method is actually less preferred, even though it can load balance, due to its less deterministic behavior arising from the random distribution of ARP responses.  Because VLANs do NOT span multiple access switches, STP convergence is not required for uplink failure and recovery.
  • Layer 3 Access to Distribution Interconnection:  This method provides the fastest network convergence, as a properly tuned IGP can achieve better convergence results than designs that rely on STP.  Also, FHRP is not required with this method, as the access switch is the gateway for network devices.  The main reason this method is not widely implemented is due to the complexity introduced with IP addressing and subnetting and the loss of flexibility associated with this design. 
    • Design recommendations when using EIGRP for a routed access layer:  When tuned, it can achieve sub-200 ms convergence.  EIGRP in the access layer is similar to EIGRP in the branch, but it's optimized for fast convergence using these 3 rules (see the sketch after this list):
      • Limit the scope of queries to a single neighbor.  Configure all access switches as EIGRP stub nodes to reduce the queries sent to them.
      • Control route propagation to access switches using distribute lists, as the access layer switch only needs a default route to the distribution layer switches.
      • Configure hello and hold timers to 1 and 3 seconds, respectively, to help speed up convergence (default is 5/15 for a LAN).  The purpose of this is to trigger convergence events quickly and to protect against soft failures (physical links remain up, but hello and route processing has stopped).
    • Design recommendations when using OSPF for a routed access layer: Similar to EIGRP, when finely tuned, OSPF can achieve convergence times of <200 ms.  With OSPF, summarization and limits to the diameter of LSA propagation are provided through the implementation of layer 2 to layer 3 boundaries at Area Border Routers (ABR).  Design rules (see the sketch after this list):
      • Control the number of routes and routers in each area.
      • Configure each distribution block as a separate, totally stubby area.  Do NOT extend area 0 to the access switch because the access layer is not used as a transit area in a campus environment.  With each access layer switch configured in its own separate totally stubby area, LSAs are isolated to each access layer switch, so that a link flap for another switch will not be communicated beyond the distro pairs.
      • Similar to EIGRP, fine-tune OSPF timers as a secondary mechanism to improve convergence.
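A minimal routed-access sketch of both variants (AS/process numbers, interface names, prefix-list name, and area number are placeholders):

    ! EIGRP: access switch as stub, with tuned timers on the uplink
    router eigrp 100
     eigrp stub connected summary
    !
    interface GigabitEthernet1/0/49
     ip hello-interval eigrp 100 1
     ip hold-time eigrp 100 3
    !
    ! EIGRP: on the distribution switch, advertise only a default route downstream
    ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
    router eigrp 100
     distribute-list prefix DEFAULT-ONLY out GigabitEthernet1/0/1
    !
    ! OSPF: totally stubby area on the distribution ABR
    ! (the access switch uses "area 120 stub" instead)
    router ospf 1
     area 120 stub no-summary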

Potential Design Issues

  • Daisy Chaining Access Layer Switches: Run the risk of black holes.  With a layer 3 link between the distribution switches, and a layer 2 failure, we can experience 50% packet loss.  To prevent this, implement a layer 2 link between the distros, or provide alternate connectivity across the stack in the form of a loop-back cable running from the top to the bottom of the stack.
  • Cisco StackWise Technology in the Access Layer: With StackWise, we eliminate the danger that black holes occur in the access layer in the event of a link / node failure.  Also, it eliminates the need for a loop-back between the stacks when using a layer 3 link between the distros.  By using StackWise, we are essentially combining multiple hardware devices to create a logical device.  
  • Too Much Redundancy: Too much redundancy adds unnecessary complexity to the design.  For example, with 3 distribution layer switches, it becomes unclear which switch should be the STP root and HSRP active router, and which links should be blocking.  It also makes fault detection & isolation difficult.
  • Too Little Redundancy: It is generally accepted that a link between the distribution layer switches is REQUIRED for redundancy.  Without it, the possibility of black holes is increased.
Using the example above, we have too little redundancy.  With the link between Dist B & Access B being blocked, we will experience a black hole if we were to lose connectivity between Access A & Dist A, as shown below.  

  1. When the link failure occurs, Access A will begin sending traffic to Dist B, attempting to get to Dist A.  Because Dist B knows no path to Dist A (STP block & link failure), packets will be dropped.  The passive device (Dist B) then becomes the primary (as a result of no incoming HSRP hellos from dist A).  
  2. Once the indirect link failure is detected by Access B (max_age timer expires), Access B will remove the block to Dist B.  This may take anywhere from 50 seconds to as little as 1 second, depending on whether STP or RSTP is used.
  3. With the down link, and connectivity between Dist A and Access A & B, HSRP will preempt on Dist A.  Because Dist A will be the active HSRP router and STP root, we will experience problems when passing packets for Access A.  In order to communicate with its STP root / active HSRP router, Access A will have to go through Dist B and Access B just to get to Dist A, making the link between Dist B and Access B a transit link.  
Note: Also, incoming traffic with a destination of Access A will be dropped 50% of the time until STP converges between Dist B & Access B.  Once converged, 50% of traffic will need to go through Access B to get to Access A.  This may impact mission-critical applications and services as a result of link saturation.  The conclusion from this scenario: if VLANs traverse multiple access switches, we MUST use a layer 2 link between the distros.

  • Asymmetric Routing (Unicast Flooding): Another possible issue when spanning VLANs across access switches.  When traffic arrives on a standby HSRP, VRRP, or alternate, nonforwarding GLBP peer, and the content-addressable memory (CAM) entry ages out before the ARP entry for the end node, the peer must flood the traffic to all access layer switches and endpoints in the VLAN.  When an entry ages out of the CAM table, the switch forwards frames for that MAC out all ports in the VLAN; because most of the access switches will not have an entry for this MAC in their CAM tables, they in turn flood it to all interfaces in the common VLAN.  This unicast flooding can have a significant performance impact on the connected end stations, because they may receive a large amount of traffic that is not intended for them.  Prevention: Do not span VLANs across access switches.  If this is not avoidable, configure the ARP timers so that they are equal to or less than the CAM aging timer.  The shorter timer causes the standby HSRP peer to ARP for the target IP address before the CAM entry expires and the MAC entry is removed.  (A sketch follows.)
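A minimal sketch of the timer tuning (VLAN number is a placeholder; on most Catalyst switches the CAM aging default is 300 seconds, while the IOS ARP timeout default is 14400 seconds):

    interface Vlan10
     arp timeout 270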

Supporting Infrastructure Services

This section focuses on security, IP phone telephony, & QoS

IP Telephony Considerations

High availability, redundancy, and fast convergence are essential for IP telephony.  While IP phones affect the entire enterprise network, they have the most impact at the network edge, or the access layer of the network.  The access layer supports device attachment and phone detection, inline power for devices, and QoS features such as classification, scheduling, and the trust boundary.

The process of IP phone detection is as follow:
  1. Phone connected to access switch
  2. Switch detects the phone and applies power via PoE
  3. Device provides information, via CDP
  4. Device placed in proper VLAN
  5. DHCP request & Cisco Unified Communications Manager Registration
PoE Requirements: Power can be provided using the IEEE 802.3af standard or the Cisco prestandard method, with CDP used to negotiate power requirements (this is optimal, because it allows the switch to reserve only the power needed by the device).  With 802.3af, the power-supplying side of a PoE link is called the PSE (power-sourcing equipment).
Power Budget and Management: Switches manage power by what is allocated, not by what is currently used.  To best budget power, Cisco recommends using Cisco IPM (Intelligent Power Management).  This allows a network and facilities manager to effectively and economically manage the power resources within a wiring closet and helps PSE-capable switches meet the objectives of the network.  When planning, it is important to plan for the MAX theoretical power draw; if there is insufficient power in a chassis, the power management system will deactivate line cards.
Multi-VLAN Access Port: Multiservice switches support a parameter for IP telephony support that makes the access port a multi-VLAN access port.  Via the native VLAN configured on an 802.1Q trunk, we are able to identify data services, while voice services are identified by an auxiliary (voice) VLAN.  These multi-VLAN access ports are not trunk ports, even though the hardware is set to dot1q trunk.  (A sketch follows.)
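A minimal sketch of a multi-VLAN access port (interface and VLAN numbers are placeholders):

    interface GigabitEthernet1/0/5
     switchport mode access
     switchport access vlan 10
     switchport voice vlan 110
     spanning-tree portfast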

QoS Considerations

Rule of thumb: 20:1 for access links and 4:1 at the distribution-to-core links.  While most campus links are underutilized, it is important to implement QoS on links that are occasionally congested.  If a link is frequently congested, the design lacks sufficient bandwidth.


  • Recommended practices for QoS:  
    • Implement end-to-end to be effective
    • Ensure mission-critical applications are not impacted by link congestion
    • Use multiple queues with configurable criteria
    • Enforce QoS at aggregation and rate transition points 
  • Transmit Queue Congestion: Most common type of congestion is called transmit-queue starvation (Tx-queue starvation).  This type of congestion affects both LAN & WAN networks.  With regard to the WAN, a router has to make the rate transition from 10/100 Ethernet to WAN speeds; when this happens, the router must queue up packets to apply QoS.  Tx-queue starvation occurs when more incoming packets are queued than outgoing packets are transmitted.  With regard to the LAN infrastructure, packets are queued when transitioning from 10 Gb/s or 1 Gb/s at the distro level to the slower 10/100 Mb/s at the access layer.
  • QoS Role in the Campus: QoS is necessary to prioritize traffic according to its relative importance, and to provide preferential treatment using congestion management techniques.  For example, it is important to prioritize voice and video so to optimize usage.  Although services like voice / video take top priority, it is important to implement QoS to provide an adequate level of services for all network traffic.  
  • Campus QoS Design Considerations:  Design is primarily concerned with classification, marking, and policing.  Queuing is enabled at any node that has the potential for congestion.  By creating multiple queues, you can guarantee voice quality, protect mission-critical data, and throttle abnormal sources.  (A trust-boundary sketch follows this list.)
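A minimal access-port trust-boundary sketch (Catalyst 3560/3750-style mls qos syntax assumed; the interface number is a placeholder):

    mls qos
    !
    interface GigabitEthernet1/0/5
     mls qos trust device cisco-phone
     mls qos trust cos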

Cisco Catalyst Integrated Security Features

Port Security: Can be used to prevent MAC-based attacks.  Port security allows the network engineer to restrict a port to specific MAC addresses.  Without port security, an attacker can send floods of bogus MAC addresses to a switch, overloading the CAM table.  Port security can also limit the number of MACs learned on a port; once this limit has been reached, various actions can be triggered, such as shutting down the port or sending an SNMP trap alert.
DHCP snooping:  Can be used to protect against rogue and malicious DHCP servers.  Without DHCP snooping, any device connected to the network can respond to DHCP requests.  This can cause a client to mistakenly use a malicious DHCP server as its default gateway, creating a man-in-the-middle scenario.  Configuring DHCP snooping allows only authorized DHCP servers (on trusted ports) to respond to DHCP requests and to distribute network information to clients.
Dynamic ARP inspection:  Can provide protection against ARP poisoning.  Because ARP has no form of authentication, it is simple for a malicious user to spoof addresses and poison the ARP tables of other hosts on the same VLAN.  By answering ARP requests meant for other hosts, the spoofing client can stage various man-in-the-middle attacks.  Dynamic ARP inspection works by intercepting all ARP requests and replies on untrusted ports and verifying them against known IP-to-MAC bindings.  Denied ARP packets can then be either logged or dropped.
IP Source Guard:  Helps mitigate IP spoofing.  Helps prevent a malicious host from attacking the network by hijacking its neighbor's IP address.  This security feature is usually deployed on untrusted ports within the access layer.  (A combined sketch of these features follows.)
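A minimal combined sketch of these four features (VLAN and interface numbers are placeholders; IP Source Guard and dynamic ARP inspection here rely on the DHCP snooping binding table):

    ip dhcp snooping
    ip dhcp snooping vlan 10
    ip arp inspection vlan 10
    !
    interface GigabitEthernet1/0/5
     switchport mode access
     switchport port-security
     switchport port-security maximum 2
     switchport port-security violation restrict
     ip verify source
    !
    interface GigabitEthernet1/0/49
     description uplink / path to the authorized DHCP server
     ip dhcp snooping trust
     ip arp inspection trust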

Chapter 3: Developing an Optimum Design for Layer 3

Designing Advanced IP Addressing

IP Address Planning as a Foundation
With a well-planned IP addressing scheme, networks may benefit from the route summarization features inherent in many routing protocols.  A well-planned IP addressing scheme is also the foundation for greater efficiency in operating and maintaining the network.  By summarizing, we reduce router workload, make the network more stable, and achieve faster network convergence.

  • Summary Address Blocks:  Take the octet that differs and determine whether the range falls within a power-of-two block (128, 64, 32, 16, ...).  Using the example of 172.19.160.0 and 172.19.191.0, we would focus on the 3rd octet.  The range 160-191 covers 32 consecutive values (and 160 is a multiple of 32), so a block of 32 works.  Subtracting 32 from 256 gives us 224, which will be our subnet mask octet.  With a subnet mask of 255.255.224.0, our summary address is 172.19.160.0.  Another method is to match the binary bits within the interesting octet.  Comparing 160 (10100000) and 191 (10111111), only the first 3 bits match, giving 10100000 = 160.  Using this, our summary address is 172.19.160.0, and the prefix length is the 16 network bits plus the 3 matching bits: 255.255.224.0 or /19.  (See the sketch after this list.)
  • Changing IP Address Needs:  IP telephony / Layer 3 switching at the edge, NAC, and various corporate requirements are reasons to need additional subnets.  
  • Planning Addresses:  It is advantageous to build a pattern into role-based addressing and other addressing schemes. 
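A minimal sketch of advertising the 172.19.160.0/19 summary from above (the EIGRP AS number, OSPF process/area numbers, and interface are placeholders):

    interface Serial0/0
     ip summary-address eigrp 100 172.19.160.0 255.255.224.0
    !
    router ospf 1
     area 1 range 172.19.160.0 255.255.224.0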

Applications of Summary Address Blocks: Summary address block addressing can be used to support several network applications, such as separate voice & data VLANs, bit splitting for route summarization, NAT, and addressing for VPN clients.  

  • Implementing Role-Based Addressing:  Using a class A address as an example, we could use the following template: 10.number_for_closet.VLAN.x/24.  Another approach, using a class B address, is the following: 172.xxxx xxxx.xxxx xxxx.xxhh hhhh.  If you don't need to indicate the closet number, simply use 16 for the second octet: 172.16.cccc cccR.RRhh hhhh.
  • Bit Splitting for Route Summarization
  • Example: Bit Splitting for Area 1:  Using 172.16._ _ _ _ xxxx . xxhh hhhh, area 1 would fill in to look like this:  172.16.0001 xxxx . xxhh hhhh.  We would have an area 1 range of 172.16.16.0 through 172.16.31.255.  Using the same logic, area 0 would have the range of 172.16.0.0 through 172.16.15.255, and area 2 would be 172.16.32.0 through 172.16.47.255.  With a /26 subnet mask, there are 2^10 = 1024 possible subnets across the 172.16.0.0/16 space.  Our first 5 subnets for area 1 would then be 172.16.16.0/26, 172.16.16.64/26, 172.16.16.128/26, 172.16.16.192/26, and 172.16.17.0/26.
  • Addressing for VPN Clients:  Arising need for different groupings of VPN clients.  
  • NAT in the Enterprise:  Potentially dangerous, as internal NAT can make troubleshooting confusing and difficult.  It is also recommended to isolate servers reached through content devices using source NAT or destination NAT.
  • NAT with External Partners:  Can be used to convert all partner addresses on traffic into a range of locally assigned addresses.  NAT blocks would be created for different external partners.  Doing so supports faster internal routing convergence by keeping partner subnets out of the enterprise routing table.

Designing Advanced Routing

Route Summarization and Advanced Routing:  Helps reduce the load on routers and the perceived complexity of the network.  Its importance increases as network size increases.

  • Originating Default:  Useful for summarization in routing.  The recommended approach is to configure each ISP-connected router with a static default route and redistribute that into the dynamic routing protocol.  With OSPF, the default-information originate command redistributes the default route into the dynamic routing protocol.  (See the sketch after this list.)
  • Stub Areas and Default Route:  The point of using OSPF stub areas is to reduce the amount of routing information advertised into an area.  The information that is suppressed is replaced by a default route.  Note:  Any filtering must be done at the ABR.  With EIGRP, the ip default-network [network number] command can be used to configure the last-resort gateway or default route.  Filtering unnecessary routes saves bandwidth and router CPU, and increases the stability of the network.
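A minimal sketch of originating a default into OSPF (the next-hop address and process number are placeholders; without the always keyword, the default is advertised only while the static route is present):

    ip route 0.0.0.0 0.0.0.0 192.0.2.1
    !
    router ospf 1
     default-information originate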

Route Filtering in the Network Design:  Can be used to manage traffic flows, avoid inappropriate transit traffic, and provide a defense against inaccurate routing updates.  

  • Inappropriate Transit Traffic:  External traffic passing through a network or site.  For instance, using the example below, we would not want to accidentally use the remote site as a transit.  With OSPF, there is little to no control over intra-area traffic.  With EIGRP, it is helpful to use stub routing.  With BGP, our main concern is when we have more than one ISP; if not configured correctly, our site may become a transit network.  The best approach to avoid this is to filter routes advertised outbound to the ISPs, ensuring that only the company or site prefixes are advertised outward.  Also, it is important to filter all routes incoming from the ISP, so as to accept only routes that the ISP should be sending you.
  • Defensive Filtering:  Protects the network from disruptions due to incorrect advertisements from others.  For example, you would not accept routing updates about how to get to your own prefixes or about default routing.  The approach of suppressing these route advertisements is called route hiding or route starvation.  Packet-filtering ACLs should also be used to supplement security by route starvation.

Designing Redistribution:  Should be used with planning and some degree of caution, as it is VERY easy to create routing loops with redistribution.  It is also accepted that it is much better to have distinct pockets of routing protocols and redistribute between them than to have a random mix of routers and routing protocols with ad hoc redistribution.

  • Filtered Redistribution:  When using bi-directional redistribution, NEVER re-advertise information back into the routing protocol region or autonomous system it originally came from.  For example, routes redistributed from EIGRP into OSPF should NOT be redistributed back into EIGRP (manual split horizon).  By tagging routes, the tag information is passed along in routing updates; by matching tagged routes, a device can keep them from being re-redistributed.  (See the sketch after this list.)
  • Migrating Between Routing Protocols:  Instead of redistributing, migrating by AD runs both routing protocols simultaneously.  The routing protocol with the lower AD is the preferred protocol.  
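A minimal tag-filtering sketch for the OSPF-to-EIGRP direction (tag values, AS/process numbers, and the seed metric are placeholders; a mirror-image route-map would be applied when redistributing EIGRP into OSPF):

    route-map OSPF-TO-EIGRP deny 10
     match tag 100
    route-map OSPF-TO-EIGRP permit 20
     set tag 200
    !
    router eigrp 100
     redistribute ospf 1 route-map OSPF-TO-EIGRP metric 100000 100 255 1 1500

Here tag 100 marks routes that originated in EIGRP (set when they were redistributed into OSPF), so the deny clause keeps them from leaking back.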

Designing Scalable EIGRP Designs

Scalable EIGRP Designs:  Tolerant of arbitrary topologies for small / medium networks.  As the scale increases, so does instability.  It is wise to use a structured hierarchical topology with route summarization.  The main scaling issue with EIGRP is its query behavior: when a feasible successor is not present, it floods the network with queries.

  • EIGRP Fast Convergence:  More summarization!!!  Speeds up CPU operations, shrinks entries in the routing table, and speeds convergence.  Also, it is generally unwise to have a large number of EIGRP peers.  
  • EIGRP Fast-Convergence mode:  Without a feasible successor, convergence time increases proportionally to the number of devices in the network.  The recommended EIGRP minimum hello and hold timers are 2 and 6 seconds, respectively.  Note: There is no such thing as subsecond settings.

Scaling EIGRP with Multiple Autonomous Systems:  A route that is installed into the EIGRP topology database first gets placed into the routing table.

  • Filtering EIGRP Redistribution with Route Tags:  Outbound route tags can be used to filter redistribution and support EIGRP scaling with multiple EIGRP AS.  By matching the outgoing tags to incoming tags, one can prevent redistributing back into an AS.  
  • Filtering EIGRP Routing Updates with Inbound Route Tags:  Similar to using route tags to stop redistributed routes, distribute lists can be used to stop unwanted routing updates.

Reasons for Multiple EIGRP Autonomous Systems:  A migration strategy after a merger or acquisition, different groups administering the different EIGRP ASs, and dividing a very large company's network.

Designing Scalable OSPF Design

Factors Influencing OSPF Scalability:  Scaling is determined by the utilization of three router resources:  Memory, CPU, and interface bandwidth.  The workload that OSPF imposes on a router depends on these factors:  

  • Number of adjacent neighbors for any one router:  OSPF floods all link-state changes to all routers in an area.  In general, any one router should have no more than 60 neighbors.  
  • Number of adjacent routers in an area:  The larger and more unstable the area, the greater the likelihood for performance problems associated with routing protocol recalculation
  • Number of areas supported by any one router:  Because a router must run the link state algorithm for each area in which the router resides, a single router should not be in more than three areas.  
  • DR selection:  Select routers that are not heavily loaded with CPU-intensive activities.

Number of Adjacent Neighbors and DRs:  Each OSPF adjacency means resources expended to support exchanging hellos, synchronizing link-state databases, reliably flooding LSA changes, and advertising the router and network LSAs.  On LANs, choose the most robust, or most lightly used, routers as DR candidates.

Routing Information in the Area and Domain:  Use of stub / totally stubby areas are useful in reducing the workload of an OSPF router, as these areas import less information.  Area size and layout design, area types, route types, redistribution, and summarization all affect the size of the LSA database in an area.  

Designing Areas:  Geographic and functional boundaries should be considered in determining OSPF area placement.  Make it simple, Make non-backbone areas stub areas (or totally stubby), and make it summarized.

Area Size: How many routers in an area?  It is a good idea to keep each OSPF router LSA under the IP maximum transmission unit (MTU) size.  As a general rule, each area, including the backbone, should contain no more than 50 routers.  If link quality is high and the amount of routing information is small, the number of routers may be increased.

OSPF Hierarchy:  ABRs provide opportunities to support route summarization or create stub or totally stub areas.  A structured IP addressing scheme needs to align with the areas for effective route summarization.  Best practice is to assign a separate network number for each area.  

Area and Design Summarization:  Summarization should be configured into and out of areas at the ABR or ASBR.  It is important to keep the following in mind:  Configure an addressing scheme so that the range of subnets assigned within an area is contiguous, create an address space that can be split as the network grows, and plan ahead for the addition of new routers to the OSPF environment.

OSPF Hub-and-Spoke Design:  With a hub-and-spoke design, any change at one spoke site is passed up the link to the area hub and then replicated to the other spoke sites.  To prevent unnecessary network traffic, the various stub area types are useful, if not necessary, to minimize the amount of information within an area.  Also, it is important to note that each spoke requires a separate interface on the hub router.

Number of Areas in an OSPF Hub-and-Spoke Design:  As the number of sites goes up, you have to start breaking the network into multiple areas.

Issues with Hub-and-Spoke Design:  Worst known issue is low-speed links combined with a large number of spoke sites.  It is important to balance the number of areas, the router impact of maintaining an LSA database and doing the Dijkstra calculations per area, and the number of remote routers in each area.

OSPF Hub-and-Spoke Network Types:  
  • Point-to-Point:  No DR/BDR election, separate subnets, and considered more stable, though laborious to implement since each link needs its own subnet.  Because each point-to-point connection uses a separate subnet, more address space gets used up.
  • Point-to-Multipoint:  Longer hello/dead timers (slower to converge after link failure), simple to configure, and conserves IP space.  
  • Broadcast / NBMA:  (Default) Best to be avoided, as they lead to less stable networks where certain failure modes have odd consequences.  

OSPF Area Border Connection Behavior:  Essentially, after traffic leaves area 0, traffic cannot return through area 0.  

Using the above example, once the link from Router D to Router F goes down, traffic from Router A cannot take the path A-D-E-F.  A solution to this problem is either to implement a virtual link between D and E (not optimal) or to add a link between D and E that is in area 1.

OSPF Area Filtering:  The LSDB must be identical for every router in an area; if they do not match, OSPF has the possibility of routing loops.  One consequence is that any area filtering must be done at the ABR.  The two types of OSPF filtering are:  border area filtering, done with the area range command, and interarea filtering, done with prefix lists.

Application of Interarea Filtering:  For example, suppose you do not want certain type 3 LSAs advertised into another area.  By applying area 0 filter-list prefix AREA_0_OUT out & area 0 filter-list prefix AREA_0_IN in, we can use prefix lists to block the propagation of LSAs.  The reduction in routing information makes a more stable and faster-converging OSPF area.  (A sketch follows.)
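A minimal sketch of the outbound case, using the AREA_0_OUT name from above (the denied prefix is a placeholder):

    ip prefix-list AREA_0_OUT seq 5 deny 10.2.0.0/16 le 32
    ip prefix-list AREA_0_OUT seq 10 permit 0.0.0.0/0 le 32
    !
    router ospf 1
     area 0 filter-list prefix AREA_0_OUT out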

Full-Mesh Topology and Mesh Group:  Complex and does not scale well.  Can be helped with mesh groups, but still stupid.  

OSPF Flooding Reduction:  Helpful when LSA flooding is having too great an impact on CPU or bandwidth.  The benefit of OSPF Flooding Reduction is that it eliminates the periodic refresh of unchanged LSAs.  Note:  OSPF Flooding Reduction fixes symptoms, rather than the underlying problem.

Fast Convergence in OSPF

  • Fast Convergence with Fast Hellos:  Most helpful in scenarios with a moderate number of neighbors.  However, a good OSPF design limits the number of adjacencies.     
  • Fast Convergence with SPF:  Lab testing suggests that the SPF calculation is the biggest remaining source of delay in OSPF convergence.  As expected, SPF calculation time increases for additional nodes.  Partial SPF is much faster than full SPF.  
  • Overview of OSPF incremental SPF:  iSPF provides more rapid SPF computations.  The iSPF computation uses a modified Dijkstra algorithm to recompute only the part of the path tree that has changed.  Because only a portion of the tree rather than the entire tree is recomputed, CPU resources are saved.
Bidirectional Forwarding Detection
Helps speed up routing convergence.  A significant factor in routing convergence is the detection of link or node failure.  BFD uses fast Layer 2 link hellos to detect failed or one-way links, which is generally what routing protocol fast hellos detect, but with less overhead.  BFD provides a method for network administrators to configure subsecond Layer 2 failure detection between adjacent network nodes.  Also, administrators can configure their routing protocols to respond to BFD notifications and begin Layer 3 route convergence almost immediately.  (A sketch follows.)
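A minimal sketch enabling BFD for OSPF (interface, timer values, and process number are placeholders; the interval values are in milliseconds):

    interface TenGigabitEthernet1/1
     bfd interval 100 min_rx 100 multiplier 3
    !
    router ospf 1
     bfd all-interfaces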

Designing Scalable BGP Designs

Scalable BGP Designs:  Provides a controlled interconnection between multiple routing domains.  For IBGP, a full mesh of IBGP peers is needed because IBGP routers do not re-advertise routes learned via IBGP to other IBGP peers.  This behavior prevents routing information from circulating between IBGP-speaking routers in a routing information loop or cycle.  Unlike IBGP, EBGP relies on the AS_PATH to prevent loops.

Full-Mesh IBGP Scalability:  Because IBGP requires a full mesh of peers, scalability is a large concern.  Each peer would need the CPU, memory, and bandwidth to handle updates and peer status for all other routers; this is not cost-effective to scale for large networks.  Alternatives: route reflectors & confederations.

Scalable IBGP with Route Reflectors:  A BGP route reflector reflects (repeats) routes learned from IBGP peers to some of its other IBGP peers.  To prevent loops, an originator ID and a cluster list are added to the routes that it reflects between IBGP speakers.  Unlike confederations, route reflectors are relatively easy to implement.  Also, to avoid a single point of failure, redundant route reflectors are typically used.

BGP Route Reflector Definitions:  

  • Cluster:  Route reflector together with its clients.  The route reflector relieves the route reflector client routers of needing to be interconnected via the IBGP full mesh. 
  • Nonclient router:  Any IBGP peer of the route reflector that is not a route reflector client of that route reflector.  Note:  Route reflectors are typically nonclients with regard to the other route reflectors in the network.

Route Reflector Basics:  

  • If received from an EBGP peer:  Passes that route to the clients within the cluster and to nonclients.
  • If received from a nonclient:  Passes that route to route reflector clients, but not to other nonclients.
  • If received from a route reflector client:  Reflects the route to the other clients within the cluster and to nonclients.  (See the sketch after this list.)
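A minimal route reflector sketch (the AS number and neighbor addresses are placeholders; the clients themselves need no special configuration):

    router bgp 65000
     neighbor 10.1.1.2 remote-as 65000
     neighbor 10.1.1.2 route-reflector-client
     neighbor 10.1.1.3 remote-as 65000
     neighbor 10.1.1.3 route-reflector-client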

Scaling IBGP with Confederations:  Basic idea is a division of a normal BGP AS into multiple sub-autonomous systems.  The outer AS is called the confederation AS (this is what is visible to the outside world).  Each of the inner ASs is a smaller sub-autonomous system that uses a different AS number, typically chosen from the private AS number range.

BGP Confederation Definitions

  • Peers within the same sub-autonomous system are confederation internal peers.  
  • IBGP peers that are in different sub-ASs are confederation external peers.  
Confederation Basics:  

  • A route learned from EBGP peer is advertised to all confederation external and internal peers.
  • A route learned from a confederation internal peer is advertised to all confederation external peers, and also to EBGP peers.
  • A route learned from a confederation external peer is advertised to all confederation internal peers, and to EBGP peers.  

Confederations Reduce Meshing
Both confederations and route reflectors reduce the amount of IBGP meshing needed.  Routers in different sub-autonomous systems do not peer with each other, except at sub-AS borders.  It is recommended to use two or three links between sub-AS borders.  When you use sub-ASs for confederations, the meshing is restricted to within each sub-AS, with some additional peering between sub-AS border routers.
Note:  It is even possible to use route reflectors WITHIN confederations

Deploying Confederations:  Using confederation sub-autonomous systems has other advantages.  The IBGP policies can differ internally within and between the sub-ASs.  In particular, MED acceptance and stripping, local preference, route dampening, etc. can vary between sub-ASs.  In other words, the greatest advantage of confederations is the ease of transition in, say, a hypothetical acquisition or merger.  (A sketch follows.)
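A minimal confederation sketch for a router in sub-AS 65001 of confederation AS 100 (all AS numbers and neighbor addresses are placeholders):

    router bgp 65001
     bgp confederation identifier 100
     bgp confederation peers 65002 65003
     neighbor 10.1.1.2 remote-as 65002
     neighbor 10.2.2.2 remote-as 65001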

Chapter 4: Advanced WAN Services Design Considerations

Optical technologies to consider: Synchronous Optical Network (SONET), Synchronous Digital Hierarchy (SDH), Coarse Wavelength Division Multiplexing (CWDM), Dense Wavelength Division Multiplexing (DWDM), Resilient Packet Ring (RPR), and Metro Ethernet.

Advanced WAN Services Layers

Essentially, there are various benefits to delivering WAN services over Ethernet to customers using Ethernet user-network interfaces (UNI): customers can use their existing equipment, familiar protocols are implemented, higher bandwidth is available than with traditional WAN links, lower bits-per-second costs can be supported, and the underlying optical technologies allow the service provider to deliver these services on their existing fiber infrastructure.

Enterprise Optical Interconnections

SDH:  While SONET is the North American high-speed baseband standard, SDH is the European standard for digital optical links.
DWDM & CWDM:  Increase info-carrying capacity of existing fiber links by transmitting & receiving data on different light wavelengths on a single fiber strand.  
Dynamic Packet Transport (DPT) and Resilient Packet Ring (RPR): Designed for service providers to deliver scalable Internet services, reliable IP-aware optical transport, and simplified network operations, principally for metro-area applications.
  • Overview of SONET and SDH:  TDM technique for framing voice / data onto a single fiber strand.  SONET can provide reliable transport with TDM bandwidth guarantees for TDM voice and public-safety voice and radio traffic.  While transmission distance depends on the quality of fiber, SONET can transmit 50 miles or more on single-mode fiber, and 500 meters or more on multimode fiber.  While both a drawback and a benefit, SONET uses dual rings to protect the bandwidth.
  • Enterprise view of SONET:  From the customer's view, SONET is the transport underlying some other form of connection (T1 or T3) or it may be one of the various types of Ethernet services offered by a single service provider.  Questions to ask your ISP that is offering SONET:
    • Is the service based on end-to-end SONET rings?  (Need to consider if there are single points of failure)
    • What path does your service follow?  (If this is for redundancy, we want to make sure the path differs from our primary, as there would be little redundancy if the path is the same as the primary)
    • Is there oversubscription and sharing, or is the bandwidth dedicated?
  • WDM Overview:  Different from TDM; uses a multiplexer (mux) at the transmitter to place multiple optical signals on a fiber, and a demultiplexer (demux) at the receiver to split them off of the fiber.  The signals use different wavelengths.  It is important to note that before being multiplexed, source signals might be converted from electrical to optical format, or from optical format to electrical format and back to optical format.
  • CWDM Technical Overview:  Optical technology for transmitting up to 16 channels.  It enables enterprises and ISPs to increase the bandwidth of an existing Gigabit Ethernet optical infrastructure without adding new fiber strands.  Unlike DWDM (which can support up to 160 channels), CWDM is relatively easy and inexpensive to implement.  CWDM multiplexing is accomplished by using glass devices known as filters, which direct light from many incoming and outgoing fibers to a common transmit and receive trunk port.  The main disadvantage of CWDM is that it is not compatible with erbium-doped fiber amplifier (EDFA) technology (technology that amplifies the light transmission, making repeaters obsolete).
  • DWDM Technical Overview:  Similar to CWDM; however, DWDM spaces the wavelengths more tightly, yielding up to 160 channels.  The tighter channel spacing in DWDM requires more sophisticated, precise, and therefore more expensive transceiver designs.  Also, because DWDM supports EDFA, the transmission distance is greater than with CWDM.  Typically, DWDM is used between sites and data centers.
  • DWDM Systems:  DWDM typically uses a transponder, mux/demux, and an amplifier:
    • Transponder:  Receives the input optical signal, converts the signal into the electrical domain, and retransmits the signal using a 1550-nm band laser.
    • Multiplexer:  Takes the laser outputs and places them onto a single-mode (SM) fiber.  An OADM (optical add/drop multiplexer, a passive mux / demux) extracts a channel of the signal and inserts an outgoing signal from a site.
    • Amplifier:  Provides power amplification of the multiwavelength optical signal.
The primary challenge with mux/demux design is to minimize crosstalk and maximize channel separation so that the system can distinguish each wavelength.

RPR Overview

Layer 2 architecture providing packet-based transmission based on a dual counter-rotating ring topology.  
  • RPR in the Enterprise
RPR overcomes some limitations of SONET/SDH.  Because SONET/SDH is designed to support the characteristics of voice traffic, SONET and SDH are limited in their ability to efficiently carry bursty data traffic.  Voice traffic typically has consistent, well-characterized usage patterns, but data traffic bursts as large files are transferred.  RPR efficiently supports traffic on a service provider network because RPR can take advantage of the QoS and CoS features of data traffic.  Also, RPR is based on a statistical multiplexing approach that behaves more like Ethernet and does not provide TDM-style bandwidth guarantees.  

Metro Ethernet Overview

Flexible transport architecture that uses some combination of optical, Ethernet, and IP technologies in the metro area.  
  • Metro Ethernet Service Model:  Metro Ethernet leverages a service provider multiservice core.  Metro Ethernet is a large market for the SP, because there is an opportunity to provide services to customers with millions of existing Ethernet interfaces.  
  • Metro Ethernet Architecture:  The SP may use SONET/SDH or point-to-point links, WDM, or RPR for its Metro Ethernet Architecture.  Edge aggregation devices or user provider edge (UPE) devices may multiplex multiple customers into one optical circuit to the network provider edge (NPE) device.  NPE devices connect to core provider (P) devices.  An actual implementation for the Metro Ethernet MAN service may be based on one of the following:
    • Pure Ethernet MAN:  Uses only layer 2 switching for its internal structure.  STP is used to keep the network loop free.
    • SONET/SDH-based Ethernet MAN:  Used as an intermediate step in the transition to a modern statistical network such as Ethernet.  
    • MPLS-based Metro Ethernet:  Uses layer 2 MPLS VPNs in the provider (P) network.  
Each of these approaches allows for different oversubscription characteristics.  

Metro Ethernet LAN Services

Various Metro Ethernet Forum (MEF) types: 
Ethernet Private Line Service (EPL):  Maps layer 2 traffic directly onto a TDM circuit.
Ethernet Relay Service (ERS):  Point-to-point VLAN-based service to create point-to-point connections between customer routers.  
Ethernet Wire Service (EWS):  Point-to-point service, primarily used to link remote LANs over P-networks.
Ethernet Multipoint Service (EMS):  Multipoint-to-multipoint port-based emulated LAN (ELAN) service that is used for transparent LAN applications.
Ethernet Relay Multipoint Service (ERMS):  Multipoint-to-multipoint VLAN-based ELAN service that is used for establishing multipoint-to-multipoint connection between customer routers.  

Metro Ethernet services are characterized by the UNI and Ethernet Virtual Circuit (EVC) attributes.  When implementing service provider Ethernet services, customers must decide whether they want to outsource routing to the service provider or do their own routing.  Outsourced routing, or routing in cooperation with the SP, is typically done with Layer 3 MPLS.

  • Ethernet Private Line Service:  EPL typically uses SONET/SDH transport.  SONET protection can provide high availability for EPL services.  EPL is ideal for transparent LAN interconnection and data service integration, for which wire-speed performance and VLAN transparency are important.  EPL services are usually used for mission-critical links. 
  • Ethernet Relay Service:  ERS is a point-to-point VLAN-based E-Line service that supports service multiplexing, meaning that many connections can be provided over one link.  Similar to Frame Relay, the multiplexed UNI supports point-to-point or point-to-multipoint connections between two or more specified sites.  Instead of a DLCI, the connection identifier is a VLAN tag.  Service multiplexing provides scalability for large sites.  ERS is ideal for interconnecting routers in an enterprise network, and for connecting to ISPs and other service providers for direct Internet access.  
  • Ethernet Wire Service:  Point-to-point connection between a pair of sites.  Differs from EPL in that it is typically provided over a shared, switched infrastructure within the SP network that can be shared among customers.  The benefit of EWS to the customer is that it is typically offered with a wider choice of committed bandwidth levels, up to wire speed.  To segregate each subscriber's traffic, the SP applies VLAN tagging on each EVC, using 802.1Q-in-802.1Q (QinQ) tunneling.  
  • Ethernet Multipoint Service:  Multipoint-to-multipoint service provided over a shared, switched infrastructure.  EMS is a multipoint version of EWS.  With EMS, the P-network acts like a virtual switch for the customer, providing the ability to connect multiple customer sites and allow any-to-any communication.  The enabling technology is Virtual Private LAN Service (VPLS), implemented at the NPE.  EMS is typically used for multipoint LAN extension, LAN extension over the WAN, and disaster recovery.  
  • Ethernet Relay Multipoint Service:  Hybrid of EMS and ERS.  Provides the any-to-any connectivity of EMS and the service multiplexing of ERS.  ERMS can be used for many applications, including branch layer 2 VPNs, layer 3 VPNs for intranet and extranet access, Internet access through the ISP, and disaster recovery.  
  • End-to-End QoS:  The SP can use 802.1Q tunneling to support end-to-end QoS.  The CE device adds an 802.1Q tag to all frames and supports the CoS across the network.  The UPE devices add a second 802.1Q tag to support QinQ encapsulation of the customer traffic.  The outer 802.1Q tag added by the UPE acts as a customer ID.  Switches and other devices in the SP backbone transport the encapsulated Ethernet frames based on the outer tag and its CoS.  The outer 802.1Q tag is stripped off when the frame reaches the destination or destinations indicated in the outer tag.  At the remote UPE, the Ethernet frame is transparently forwarded based on the original CE 802.1Q tag with the original CoS.  The destination MAC is preserved end to end, so multicast traffic will be seen by the provider network as having a multicast destination MAC address.  With QinQ encapsulation, the customer VLANs are preserved across the network, and the network supports VLAN transparency.  
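To make the QinQ mechanics concrete, here is a minimal sketch of a tunnel port on a Cisco IOS UPE switch; the interface name, the outer customer-ID VLAN (100), and the MTU value are assumptions:

  ! Customer-facing tunnel port: all customer 802.1Q frames are
  ! encapsulated inside outer tag (VLAN) 100
  vlan 100
  !
  interface GigabitEthernet1/0/1
   switchport access vlan 100
   switchport mode dot1q-tunnel
   ! Optionally carry customer CDP/STP across the tunnel
   l2protocol-tunnel cdp
   l2protocol-tunnel stp
  !
  ! The system MTU must typically accommodate the extra 4-byte tag
  ! (command and range are platform dependent)
  system mtu 1504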

VPLS Overview

Multipoint architecture that connects two or more customer devices using Ethernet bridging techniques over an MPLS network.  In VPLS, the P-network emulates an Ethernet bridge, with each EMS being analogous to a VLAN.  

VPLS Architecture Model

In the VPLS architecture model, UPE devices act as standard bridges or switches.  The devices are interconnected with a full mesh of pseudowires (PWs).  From the point of view of the UPEs, these PWs are just Ethernet connections to another switch.  VPLS self-learns source MAC address-to-port associations, and frames are forwarded based on the destination MAC address.  If the destination address is unknown, or is a broadcast or multicast address, the frame is flooded to all ports associated with the virtual bridge.  To simplify processing, the VPLS core does not use STP; instead, it uses split-horizon forwarding.   

VPLS in the Enterprise

VPLS is used as an enterprise WAN connectivity service.  VPLS looks like an Ethernet switch to the customer, with the same inherent layer 2 core issues.  Points to consider:
    • VPLS based on MPLS scales better than with STP
    • Important to ask what happens to traffic in the event of an outage
    • Multicast / Broadcast control
    • Design VPLS in a hierarchical fashion so that it is more scalable.  
    • Determine what the result would be of a spanning-tree loop, as all customers share bandwidth. 
    • Ensure provider is aware of and has implemented adequate layer 2 security measures.

  • Hierarchical VPLS Overview:  H-VPLS provides scaling by interconnecting only the core MPLS NPE routers with a full mesh of PWs.  The main advantage of the H-VPLS approach for the SP is that the core of the network is an MPLS network.  The MPLS core also serves to limit any edge STP domains, speeding up STP convergence and reducing any potential instability.  H-VPLS provides an extremely flexible architectural model that enables multipoint Ethernet services (VPLS), point-to-point Ethernet layer 2 VPN services, and Ethernet access to layer 3 VPN services.   
  • Scaling VPLS:  Service provider VPLS design must address 3 major scaling factors:
    • Scaling of the full mesh of PWs between PE devices:  As the number of PE devices scales, each edge device must form an adjacency with all other PE devices, meaning each edge device must have the IP addresses of all remote PEs in its routing table.  H-VPLS helps address this issue by using UPE devices to spread the edge workload across multiple devices.
    • Frame replication and forwarding:  H-VPLS needs a lower number of PWs, because only the NPE devices are connected in a full mesh.  This helps reduce the burden on the core for frame replication and forwarding.
    • MAC address table size:  H-VPLS allows MAC tables to be spread across multiple inexpensive devices to scale the edge.  Using MPLS in the core removes the MAC learning requirement from the P devices. 
  • QoS Issues with EMS or VPLS:  While QoS is relatively easy to implement with point-to-point links, multipoint networks are harder because it requires coordination between multiple devices with unpredictable and rapidly changing traffic patterns.  If a customer wants QoS, the SP has to partner to provide it.  Customers should expect that QoS is a premium service, and determine bandwidth levels for each QoS class to help manage.
  • EMS or VPLS and Routing Implications:  If some sites experience greater packet loss levels, OSPF processing may consume more router CPU.  In the extreme case, packet loss or delay might cause significant OSPF instability.  It is a best practice to limit the number of adjacencies for both OSPF and EIGRP.
  • VPLS and IP Multicast:  In a campus switched network, IP multicast in a VLAN floods to all ports in the VLAN unless IGMP snooping is in use.  However, IGMP snooping is not an option with VPLS or EMS.  When a broadcast or multicast is sent into the VPLS cloud, the frame is sent out to every CE device that has a port associated with the virtual switch.  To avoid wasting bandwidth and router CPU cycles discarding unnecessary traffic, it is necessary to use administrative scoping to limit the propagation of multicast packets (a boundary-ACL sketch follows this list).  Otherwise, sufficient bandwidth for unnecessary multicast traffic at the edge links will be required.
  • VPLS Availability:  Because the underlying technology in VPLS is the pseudowire (PW), if there is a failure in the P-network, traffic will automatically be routed along available backup paths in the P-network.  Unfortunately, this may result in unicast flooding, and an increase in unicast flooding may impact customer traffic.
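As referenced in the multicast item above, here is a minimal sketch of administrative scoping on a CE router facing the VPLS cloud; the addresses, ACL number, and interface name are assumptions:

  ! Requires ip multicast-routing; ACL 10 blocks administratively
  ! scoped groups (239.0.0.0/8) at the boundary to the VPLS cloud
  access-list 10 deny   239.0.0.0 0.255.255.255
  access-list 10 permit 224.0.0.0 15.255.255.255
  !
  interface GigabitEthernet0/1
   description Link to VPLS/EMS cloud
   ip multicast boundary 10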
MPLS VPN Overview
MPLS VPN services are based on MPLS label paths that are automatically formed based on IP routing.  MPLS VPNs experience the same level of stability as exhibited by layer 3 networks in general.  The characteristics of MPLS VPNs vary depending on whether they are implemented at layer 3 or layer 2.
    • Layer 3:  Layer 3 MPLS VPNs only forward IP packets.  CE routers become peers with the MPLS VPN provider routers.  In this scenario, routing may be a cooperative venture.  Layer 3 VPNs can support any access or backbone technology.  Service providers can use layer 3 VPNs as a foundation to provide advanced WAN services.  
    • Layer 2:  Layer 2 MPLS VPNs can forward any network protocol based on layer 2 frames.  With layer 2 connectivity, there is no peering with the service provider.  MPLS layer 2 VPNs provide point-to-point service where the access technology is determined by the VPN type.  
The choice between layer 2 and layer 3 depends on how much control the enterprise wishes to retain.  To offload routing responsibilities to the SP, a layer 3 design may be the best choice.  An enterprise with a robust IT staff can retain responsibility with a layer 2 design, maintaining control of its own layer 3 policies.  
  • Customer Considerations with MPLS VPNs
    • Who does the routing?  The two main options are simple static routing or dynamic routing with the service provider (both are sketched after these considerations).  
    • Who manages the CE devices?  Depending on the size and routing experience of the customer, they may choose to manage their own CE devices, or buy managed services from their provider. 
    • Should one or two MPLS VPN providers be used?  While redundancy is desirable, two providers can add complexity to the design.  FHRP is necessary if multi-homed with the SP.
    • Is QoS needed?  Using layer 3 VPNs allows for QoS internally.
    • Is IP multicast supported?  Multicast is supported over Layer 3 MPLS VPNs.  Multicast over Layer 2 MPLS VPNs requires service provider support and may cost extra.  
  • Routing Considerations: Backdoor Routes:  It is important to consider backdoor routes, as internal routes (even if slower) will be preferred over the external routes.  In general, sites with one WAN router do not have this problem.  It is a recommended practice to minimize locations where you implement route redistribution, as doing so at many locations can adversely impact network stability.  
  • Routing Considerations: Managed Router Combined with Internal Routing:  Adds to the complexity and cost of the network.  
  • Routing Considerations: Managed Router From Two Service Providers:  FHRP issues may arise, and it can be challenging to get two providers to cooperate with each other to provide FHRP services.  It is important to negotiate this at contract time; it is easier to get providers to cooperate before the contracts have been signed than after.
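As a rough illustration of the routing options above, here is a minimal CE-side sketch of both approaches for a Layer 3 MPLS VPN service; the PE address, AS numbers, and site prefix are assumptions:

  ! Option 1: simple static routing -- default route toward the PE
  ip route 0.0.0.0 0.0.0.0 192.0.2.1
  !
  ! Option 2: dynamic routing with the SP -- eBGP peering to the PE
  router bgp 65001
   neighbor 192.0.2.1 remote-as 64500
   ! Advertise the site prefix into the provider VPN
   network 10.10.0.0 mask 255.255.0.0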

Implementing Advanced WAN Services

  • Advanced WAN Service Selection:  Simply put, do not rely on the information the sales team pitches; assume they will not cover the weak or problem areas.  It is up to the customer to ask good questions and do the research.  Make decisions based on the underlying technologies, the reputation of the vendor, customer references, trade literature, and so on.  One advisable plan is to use two providers, letting you experience their technical and customer service levels; if one is inadequate, it is easier to migrate to the other provider.  Finally, it is important to emphasize the SLA, with good escape clauses in the contract should the SLA not be met.  
  • Business Risk Assessment:  A key part of design is to assess business risk and base design decisions on how critical the service will be.  Due diligence is required to assess the chance of an outage, through questioning, surveys, and site assessments.  Such questions may include:  What expert staffing is available?  What lower-level staff?  How neat is the cabling?  Where are the devices and cables located?  How old is the equipment?  How big is the provider?
It is worth mentioning that gaining more than surface-level information can be challenging, and patience is required to get the desired information.  In general, you should match risk to the purpose of the design (route non-critical internal Internet traffic through a low-cost but riskier provider, and more critical traffic through a more established provider).  
  • WAN Features and Requirements:  Obviously, it is important to make sure all parties agree as to what the requirements are for the new WAN service being purchased.  Important questions to ask:
    • Will routing work as it does currently, or will changes be required?
    • Will existing VLANs be transparently supported across the SP core?
    • What QoS levels can be supported?
    • Multicast available?  At what cost?
    • What level of security is provided to isolate customer traffic?  At what cost?
    • Management services (SNMP, IDS, etc)?
Also, it is important to clarify which WAN features are standard, and which are optional upgrades.  In addition, it is necessary to clarify certain details:
    • What base services does the provider implement, such as ping testing, SNMP polling, and other functions?
    • What is the process for change, cost, and time to implement such changes?
    • Can we review the configs of the managed router?
    • Determine how the provider defines QoS.
SLA Overview
The SLA is a statement of intent from the provider. 
SLA should cover common managed-service metrics:
Mean time to repair (MTTR):  How long it takes to repair failures
Mean time between failures (MTBF):  Measure of how often failures occur.  
By clearly defining SLA, customers have concrete information to discuss at contract renewal time.

A good SLA also includes technical metrics:
Packet Loss:  When packets fail to reach their destination
Latency:  Measure of how much time it takes for a packet to get from one point to another
Jitter:  Variable delay that can occur between successful packets
IP Availability:  Measure of the availability of IP services end-to-end across a network

It is important to avoid SLAs that average results across many devices or links.  When averaging availability across enough routers, a device can be down for days and not violate some SLAs.  

SLA Monitoring 
Network measurements let customers know what kind of service they are actually receiving.  Some service providers allow their customers to view SLA performance data through a web portal.  Internal monitoring can serve to activate an early response to a major outage.  Internal customer measurements can also provide evidence of network issues to discuss with the ISP, especially if the service provider has not actively been measuring the service. 
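A minimal sketch of an internal Cisco IOS IP SLA probe for this kind of monitoring; the target address, port, and frequency are assumptions (the far-end device must run ip sla responder):

  ip sla 10
   udp-jitter 10.20.1.1 16384
   frequency 60
  ip sla schedule 10 life forever start-time now
  !
  ! Inspect latency, jitter, and loss results with:
  ! show ip sla statistics 10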


Chapter 5: Enterprise Data Center Design

Home to the computational power, storage, and applications necessary to support an enterprise business.  The data center network design is based on a proven layered approach.  This chapter focuses on the 3 layers of the data center architecture, the one rack unit (1RU) access switch design, the modular design, and scaling options.

Designing the Core and Aggregation Layers

The core layer provides a high-speed layer 3 fabric for packet switching.  The aggregation layer extends spanning-tree or layer 3 routing protocols into the access layer.  

Data Center Architecture Overview

Layered approach to improve scalability, performance, flexibility, resiliency, and maintenance.  The access layer network infrastructure can support both layer 2 and layer 3 topologies, and the layer 2 adjacency requirements that fulfill the various server broadcast domain or administrative requirements.

Benefits of the Three-Layer Model

    • Layer 2 domain sizing:  3 layer design helps to prevent extending layer 2 through the core.
    • Service module support:  Lowers cost of ownership by sharing services across the entire access layer of switches.
    • Support for a mix of access layer models:  Permits a mix of layer 2/3 access models with 1RU and modular platforms.
    • Support for network interface (NIC) teaming and high availability clustering

Data Center Core Layer Design
Recommended when multiple aggregation modules are used for scalability.  It might also be appropriate to use the campus core for connecting the data center fabric.  Important considerations include 10 Gigabit Ethernet port density (the campus core may not have enough ports to support the aggregation layer), administrative domains and policies, and future growth.  
  • Layer 3 Characteristics for the Data Center Core
    • It is important to consider where to place the layer 2-to-layer 3 boundary.  It is accepted that the core will be layer 3, with the boundary above or below the aggregation layer modules.  Layer 3 links allow the core to achieve bandwidth scalability and quick convergence, and to avoid path blocking and the risk of uncontrolled broadcast issues.  Obviously, layer 2 should be avoided in the core, as an STP loop could result in an entire data center outage.  
    • The core layer should use an IGP, such as EIGRP or OSPF.
    • Load balance using CEF-based hashing algorithms.
  • OSPF Routing Protocol Design Recommendations
    • Use NSSA from the core down; it limits LSA propagation but permits route redistribution.  
    • Use auto-cost reference-bandwidth 10000, as OSPF needs to be tuned to support 10 Gb links.
    • Apply the same reference bandwidth so that the interswitch SVIs also reflect 10 Gb Ethernet.
    • Use a loopback interface for the router ID to simplify troubleshooting.
    • Enable routing only on interfaces that need to participate in the routing process (make the rest passive interfaces).
    • Tune OSPF to subsecond SPF timers with the timers throttle spf command.
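A minimal sketch pulling these OSPF recommendations together; the process ID, area number, router ID, and interface name are assumptions:

  router ospf 10
   ! Loopback-derived router ID for easier troubleshooting
   router-id 10.0.0.1
   ! Cost calculation tuned for 10 Gb links and SVIs
   auto-cost reference-bandwidth 10000
   ! NSSA from the core down: limits LSAs, permits redistribution
   area 10 nssa
   ! Run OSPF only on interfaces that need it
   passive-interface default
   no passive-interface TenGigabitEthernet1/1
   ! Subsecond SPF throttle timers (values in milliseconds)
   timers throttle spf 10 100 5000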
  • EIGRP Routing Protocol Design Recommendations
    • Advertise a default summary route into the data center access layer with the ip summary-address eigrp interface command. 
    • Filter other external routes with distribute lists.
    • Like OSPF, configure passive interfaces so that EIGRP information is propagated only on the necessary interfaces.
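A corresponding minimal EIGRP sketch; the AS number, ACL, prefixes, and interface name are assumptions:

  router eigrp 100
   ! Run EIGRP only on the necessary interfaces
   passive-interface default
   no passive-interface TenGigabitEthernet1/1
   ! Filter external routes advertised toward the access layer
   distribute-list 20 out TenGigabitEthernet1/1
  !
  access-list 20 permit 10.10.0.0 0.0.255.255
  !
  ! Advertise only a default summary route into the data center
  interface TenGigabitEthernet1/1
   ip summary-address eigrp 100 0.0.0.0 0.0.0.0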
Aggregation Layer Design

  • Scaling the Aggregation Layer:  Multiple aggregation modules allow the data center architecture to scale as additional servers are added.
    • Spanning-tree scaling:  Failure exposure can be limited to a smaller domain
    • Access layer density scaling:  As the access layer grows, port density at the aggregation layer may become an issue.
    • HSRP scaling:  Cisco recommends limiting the number of HSRP instances in an aggregation module to approximately 500 (a single-instance sketch follows this list).  
    • Application services scaling:  Use of virtual contexts to support applications on service modules across multiple access layer switches.  
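For the HSRP scaling item above, each instance amounts to one SVI configuration such as this minimal sketch; the VLAN, addressing, and priority are assumptions:

  ! One HSRP instance; keep the total per aggregation module near 500
  interface Vlan10
   ip address 10.10.10.2 255.255.255.0
   standby 10 ip 10.10.10.1
   standby 10 priority 110
   standby 10 preempt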
  • STP Design
    • Aggregation layer carries the largest burden
    • RSTP preferred over MST 
  • Integrated Service Modules
    • Provide services such as content switching, firewall, SSL offload, intrusion detection, and network analysis.  May be deployed in one of two scenarios:
      • Active/Standby:  Traditional deployment model
      • Active/Active:  Newer deployment model
  • Service Module Placement Consideration
  • Active STP, HSRP, and Service Context Alignment:  Recommended to align the active STP, HSRP, and service context in the aggregation layer to provide a more deterministic environment.  
    • Active component alignment prevents session flow from entering one aggregation switch and then hopping to a second aggregation switch to reach a service context.  
    • This recommended model provides a more deterministic environment and offers more efficient traffic flow and simplified troubleshooting.
  • Active/Standby Service Module Design:  
    • Predictable deployment model
    • Simplifies troubleshooting
    • Underutilized access layer uplinks because it does not use both uplinks
    • Underutilized service modules and switch fabrics because it does not use both modules
  • Active/Active Service Module Design
    • Distributes the service and processing and increases overall service performance
    • Supports uplink load balancing
  • Establishing Inbound Path Preference
    • Important for path preference to a particular server application in an active/standby solution.  Using Cisco Content Switching Modules (CSM), clients connect to the virtual IP address of the virtual server.  The CSM then chooses a real physical server based on configured load-balancing algorithms.  
  • Using VRFs in the Data Center:  
    • Virtual routing and forwarding (VRF) can be used to map the virtualized data center to an MPLS MAN or WAN cloud.  

Designing The Access Layer

Overview of the Data Center Access Layer

Typically one of three models:
    • Layer 2 looped:  VLANs extended to the aggregation layer.  Layer 3 first performed at the aggregation layer.
    • Layer 2 loop free:  VLANs not extended to the aggregation layer.  Layer 3 first performed at the aggregation layer.
    • Layer 3:  Layer 3 routing is first performed at the access layer.
Layer 2 Looped Designs
    • Benefits:
      • VLAN extension:  Flexibility to extend VLANs between switches.  The layer 3 boundary in the aggregation layer is above the trunk link that connects the aggregation switches.  
      • Layer 2 adjacency requirements:  NIC teaming, clusters are examples that typically require NICs to be in the same broadcast domain or VLAN.  
      • Custom Applications:  Custom applications may not take layer 3 networks into consideration.
      • Service modules:  Layer 2 access permits services provided by service modules to be shared across the entire access layer.
      • Redundancy:  Looped designs are inherently redundant.
  • Layer 2 Looped Topologies
    • Looped Triangle:
      • Supports VLAN extension
      • Resiliency is achieved
      • Quick convergence with RSTP
      • Proven and widely used
    • Looped Square:
      • Supports VLAN extension
      • Resiliency is achieved with dual homing and STP
      • Quick convergence with 802.1w and 802.1s
      • Preferred for active/active uplinks
  • Layer 2 Looped Design Issues
    • Fault has a severe impact across the entire layer 2 domain
    • Important to use loop-prevention mechanisms (BPDU guard, root guard, etc)
Layer 2 Loop-Free Designs
The main reason for using a loop-free design is to avoid STP issues, often when staff are less experienced with STP.  Even with a loop-free design, it is still necessary to run STP as a loop-prevention tool (in the event of a cabling issue).
A loop-free layer 2 access topology has these attributes:

  • Active uplinks:  All uplinks are active, with none blocking
  • Layer 2 server adjacency
  • Stability:  Provides fewer chances for loop conditions
Loop-Free Technologies

  • Loop-Free U
    • Layer 3 link between aggregation layer devices
    • VLANs are contained in switch pairs
    • No STP blocking
    • Layer 2 service modules black-hole traffic on uplink failure.  (Service modules can be configured to switch over roles by using interface monitoring features)
    • Cisco does not recommend the use of loop-free layer 2 access design with active/standby layer 2 service module implementation.
  • Loop-Free inverted U
    • Layer 2 link between aggregation layer devices
    • Supports VLAN extension
    • No STP blocking
    • Access switch uplink failure black-holes single-attached servers
    • ISL scaling considerations
Layer 2 FlexLink Designs
Layer 2 alternative to the looped access layer.  FlexLinks provide an active/standby pair of uplinks defined on a common access layer switch.
Notes:

  • Interfaces can belong to only one FlexLink.
  • Pair can be of the same or different interface types
  • STP disabled (no BPDU propagated)
  • Failover from active to standby takes 1-2 seconds
  • Aggregation layer unaware of FlexLink configurations (configured on access side)
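A minimal FlexLink sketch, configured entirely on the access switch; the interface names are assumptions:

  ! Gi1/0/49 is the active uplink; Gi1/0/50 is its FlexLink standby
  ! (STP sends no BPDUs on the pair)
  interface GigabitEthernet1/0/49
   switchport backup interface GigabitEthernet1/0/50
  !
  ! Verify with: show interfaces switchport backup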
FlexLink Issues and Considerations

  • STP issues may arise if a new link is added between access layer switches, or between the aggregation and access layers (since there is no STP on FlexLinks).
  • ISL bandwidth requirements should be considered under failover conditions
  • Only advised for environments with high administrative control.  
Comparison of Layer 2 Access Designs


Layer 3 Access Layer Designs
Layer 2 trunks between pairs of access switches support the layer 2 adjacency requirements in the data center.  Cisco recommends running STP as a loop-prevention tool when you are using a layer 3 access model.  STP will only be active on the interswitch trunks and server ports on the access layer switches.  The Layer 3 design is mainly used to contain broadcast domains to a particular size.  With a layer 3 access design, all uplinks are active and use CEF load balancing up to the equal cost multipath (ECMP) maximum.  Layer 3 designs can provide convergence times faster than STP, although RSTP is close.

Multicast Source Support
With layer 2, IGMP snooping can be used at the access switch to automatically limit multicast flow to interfaces with registered clients in the VLAN.  One significant driver for layer 3 designs is situations where IGMP snooping is not available at the access layer.

Benefits of Layer 3 Access
  • Minimize broadcast domain sizes
  • Server stability requirements
  • All uplinks are available paths and are active up to the ECMP maximum
  • Fast uplink convergence is supported for failover and fallback
Drawbacks of Layer 3 Access
  • IP address space management is more difficult
  • Layer 2 adjacency is limited to access pairs, which will limit clustering and NIC teaming capabilities

Blade Server Overview

A blade server is a chassis that houses many servers on a blade or module.  They reduce operational cost and save rack space.

Blade Server Connectivity Options

  • It is important to avoid stacked or dual-tier layer 2 access switches, as they complicate STP, increase the failure domain, and may lead to oversubscription of uplinks and trunks.  
  • May implement trunk failover capabilities, allowing for rapid failover to the redundant switch in the blade enclosure if all uplinks to the primary switch fail.  
Blade Servers with InfiniBand
Standards-based protocol that provides high-throughput and low-latency transport for efficient data transfer between server memory and I/O devices.  InfiniBand uses remote direct memory access (RDMA) to offload data movement from the server CPU to the InfiniBand host channel adapter (HCA).

Blade Server Trunk Failover Feature
Provides layer 2 redundancy in the network when used with server NIC adapter teaming.  If all the upstream ports become unavailable, trunk failover automatically puts all the associated downstream ports in an err-disabled state, causing the server NIC teaming software to fail over to the secondary interface.  Without the failover feature, if the upstream interfaces lose connectivity, the link states of the downstream interfaces remain unchanged, and traffic will be black-holed.  While this is a loop-free design, STP should be enabled for loop protection. 
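On Cisco blade-enclosure access switches, trunk failover is typically configured with link-state tracking; a minimal sketch, where the group number and interfaces are assumptions:

  ! Define the tracking group
  link state track 1
  !
  ! Uplinks toward the aggregation layer
  interface GigabitEthernet0/25
   link state group 1 upstream
  !
  ! Server ports to err-disable if all upstream links fail
  interface GigabitEthernet0/1
   link state group 1 downstream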

Layer 2 or Layer 3 Access Designs

Scaling the Data Center Architecture

EtherChannel technologies and service module designs provide options for scaling the bandwidth and density of the data center aggregation layer.  

Modular Versus 1RU Designs

Tradeoffs to consider:
    • Cabling designs:  Issues may become apparent with a higher density of servers per rack.
    • Cooling: Cable bulk increases cooling challenges.  1RU may improve the cabling design.
    • Power:  Power capacity at the server rows to support an increased component density.
    • Density:  Considering the maximum number of interfaces used per rack and per row can help determine modular vs 1RU.
    • Resiliency:  With a single access switch, we have a single point of failure.  Consider redundant power / processors.
Cabinet Designs with 1 RU Switching
Main advantage: minimizes cabling between cabinets.  With 1RU switching, the access layer switch is located inside the cabinet with the servers, and cabling from the servers to the access switch is contained in the cabinet.  While this is more efficient cabling-wise, which supports improved cooling of the cabinet, the disadvantage of the 1RU switching model is the need to support an increased number of devices under network management and in STP processing.

Cabinet Design with Modular Access Model
Unlike the 1RU model, switches are located outside of the cabinet.  The modular switching design minimizes the number of required switches.  Cabling from the servers to the access switch is run under the floor or in the ceiling.  While there is reduced management complexity, decreased STP processing, and more redundancy options, there is also increased cable bulk and there are cooling constraints.
The main advantages of the modular approach are fewer devices to manage and fewer uplinks to the aggregation layer.

The main disadvantage is that there are challenges in implementing and managing the cabling from the servers to the access switches.

Server NIC Density
The number of NICs required per server will affect how many servers can be supported per switch.  When planning for NIC support on switches, one should plan for at least 3 NICs per server.  

Hybrid Example with Separate OOB Switch
The 1 RU switches provide lower power consumption, while the modular switches provide options for redundant CPUs and power supplies for critical applications.  The design also supports NIC teaming.  

Oversubscription and Uplinks
Oversubscription ratio per server = number of server Gb connections / total aggregated uplink bandwidth on the access layer switch.  Example: an access switch with 4 10-Gb Ethernet uplinks supporting 336 server access ports works out to 336 Gb of server connections / 40 Gb of uplink bandwidth = 8.4:1 oversubscription.  

Optimizing EtherChannel Utilization with Load Balancing
To optimize EtherChannel, it is recommended to enable the layer 3 IP plus layer 4 port-based CEF hashing algorithm for EtherChannel ports.  This will improve load distribution because it presents more unique values to the hashing algorithm.  
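As a one-line sketch of this recommendation (the keyword shown is from the Catalyst 6500; other platforms use variants such as src-dst-port):

  ! Hash on source/destination IP plus Layer 4 ports
  port-channel load-balance src-dst-mixed-ip-port
  !
  ! Verify the algorithm in use with: show etherchannel load-balance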

Optimizing EtherChannel with Min-Links
This feature allows a minimum number of available ports to be specified for a port channel to be considered a valid path.  The Min-Links feature works at the physical interface level and is independent of spanning-tree path selection.
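A minimal sketch, assuming an LACP channel (Min-Links requires LACP) and a threshold of two links:

  interface Port-channel1
   ! Require at least 2 bundled links before the channel is usable
   port-channel min-links 2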

Scaling with Service Layer Switches
Moving classic bus-based service modules out of the aggregation layer switch increases the number of available slots and improves aggregation layer performance.  Service switches are useful when a farm of classic bus-based Cisco CSMs or SSL offload modules are required. 

Scaling Service on Cisco ACE Modules
The Cisco Application Control Engine (ACE) module can consolidate the functions of SLB, SSL acceleration, and firewall services such as protocol inspection and filtering into one service module.  This saves time, processing power, and memory. 

Scaling Spanning Tree and High Availability

Scalability 
Ability to span VLANs across access switches is necessary to meet layer 2 adjacency, but also to meet application requirements.  STP design should answer the following questions:
    • How many VLANs can be supported in a single aggregation module?
    • Can a "VLAN anywhere" model be supported to avoid pruning?
    • How many access switches can be supported in each aggregation module?
    • What is the maximum number of logical ports and virtual ports in STP?
STPs in the Data Center
RPVST and MST are the recommended STP modes for the data center.

RPVST:
  • Scales to a large size (10,000 logical ports)
  • Proven solution that is easy to implement and scale
MST:  
  • Permits large-scale STP implementation (30,000 logical ports) -- useful for service providers
  • Not as flexible as RSTP
  • Has service module implications for firewalls in transparent mode.  
STP Scaling
In a layer 2 looped topology design, STP processing instances are created for each active VLAN; these instances are referred to as active logical ports and virtual ports.  These values are usually of concern only on switches that have a large number of trunks and VLANs configured (the aggregation layer).  Logical ports are a systemwide value that reflects the total number of spanning-tree processing instances used in the whole system.  


You can determine the active logical interfaces on a switch by using the show spanning-tree summary totals command.  Virtual ports are STP instances allocated to each trunk port on a line card.  Virtual ports are a per-line-card value that reflects the total number of spanning-tree processing instances on a particular line card.  You can determine the virtual ports on a switch module by using the show vlan virtual-port slot command. 

STP Scaling with 120 Systemwide VLANs
Example: 120 VLANs systemwide, 45 access switches connected to each aggregation switch using 4 GECs.
Total number of active logical interfaces is calculated like this...
(120*45 access links)+(120 instances on link to AGG2)=5400+120=5520 (Less than the max number of logical interfaces for RSTP).
Total number of virtual ports is calculated like this...
(120*48 access links) = 5760 (More than the max number of virtual ports for RSTP--might consider MST).

Also, the STP design recommendations are exceeded with 120 VLANs.  

STP in 1RU Designs
Important to consider in STP design due to the increased number of physical interfaces; a higher number of access-link trunks will increase the STP logical port count. 

STP Scaling Design Guidelines
    • Add aggregation modules
    • Limit HSRP instances
    • Perform manual pruning on trunks
    • Use MST if it meets the requirements

High Availability in the Data Center

Common points of failure in the data center are on the path from the server to the access switch.  These network failures can be addressed with NIC teaming.  

Common NIC Teaming Configurations

If one NIC fails, the secondary NIC assumes the IP address of the server and takes over operation without disruption.  All NIC teaming solutions require the NICs to have Layer 2 adjacency with each other.
    • Adapter Fault Tolerance (AFT):  Two NICs connect to the same switch.  Common IP / MAC; one adapter is active while the other is standby.
    • Switch Fault Tolerance (SFT):  Similar to AFT, except connects to two separate switches.
    • Adaptive load balancing (ALB):  One port receives and all ports transmit, using one IP address and multiple MAC addresses.
  • Server Attachment Methods
    • EtherChannel is another means of providing scalable bandwidth.  It allows servers to bundle multiple links for higher throughput between servers and clients, and for redundancy.  Be sure to enable layer 3 IP plus layer 4 port-based CEF hashing!
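A minimal sketch of a server-facing LACP EtherChannel on the access switch; the interfaces and VLAN are assumptions:

  ! Bundle two server-facing ports for throughput and redundancy
  interface range GigabitEthernet1/0/1 - 2
   switchport mode access
   switchport access vlan 10
   channel-group 10 mode active
  !
  interface Port-channel10
   switchport mode access
   switchport access vlan 10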
High Availability and Failover Times: 


High Availability and Cisco NSF and SSO
The recommended data center design that uses service modules has a minimum convergence time of about 6-7 seconds.  If Cisco NSF with SSO is implemented, the service modules do not converge in the event of a supervisor failure.  Implementing dual supervisors that use NSF with SSO can achieve increased high availability in the data center network.

Currently, HSRP is not maintained by Cisco NSF with SSO.

You should not set the IGP timers so aggressively, or tune them so low, that NSF with SSO is defeated.  The IGP process should not react to an adjacent node failure before an SSO switchover can complete.   


Chapter 6: SAN Design Considerations

A storage-area network (SAN) is a set of protocols and technologies that permit storage devices to have direct connections with servers over some distance.  VSANs (virtual SANs) allow a group of discrete SANs to be connected together using a virtual fabric.  To enable a unified fabric, switched 10 Gigabit Ethernet can be used as a foundation, and Fibre Channel SAN traffic can be transported across the Ethernet switched network using the FCoE protocol. 

Identifying SAN Components and Technologies

  • Managing a SAN tends to be more complex than managing a DAS (directly attached storage).
  • SANs provide a lower total cost of ownership (TCO) because the storage is not captive to one server and there is a more efficient utilization of storage.
  • SAN backup can be easier (does not require dedicated network bandwidth / does not tie up host capacity).
  • Hardware that connects servers to storage devices in a SAN is referred to as a fabric.  

SAN Components

  • 3 main components to a fibre channel SAN: host bus adapters (HBA), data storage devices, and storage subsystems.    
    • HBA: I/O adapter that provides connectivity between a host server and a storage device.
    • Storage subsystems:  Just a bunch of disks (JBOD), storage arrays, or RAID

Raid Overview

  • Inexpensive way to put together a set of physical hard drives into a logical array of storage devices.  
  • RAID provides fault tolerance compared to stand alone drives.
    • RAID 0: Striping; no redundancy, increases storage and performance
    • RAID 1: Mirroring; increased redundancy, but requires twice the space requirements
    • RAID 3: Error detection; data is striped across multiple disks, with one disk used for maintaining error-correction (parity) information.  (No recovery if the parity disk is lost.)
    • RAID 5: Error correction; data and parity information are striped across multiple disks.
    • RAID 6: Error correction; similar to RAID 5, except there are two parity blocks.  This results in more redundancy at the cost of storage efficiency.

Storage Topologies

DAS - Topology where storage devices connect directly to the server.

  • Commonly referred to as captive storage. 
  • Provide little or no mobility to other servers and little scalability.
  • Can be complex to implement and manage.

NAS - Storage devices that can be attached to the IP networks

  • Device dedicated to file sharing
  • Allow access at a file level using Network File System (NFS) or Common Internet File Systems (CIFS) across an IP network.  
  • NAS devices respond to requests by providing portions of the file system.  To retrieve a file, a NAS device has to open a directory, read it, locate the file, check permissions, and then transfer the file. 

SAN Technologies

SCSI Overview

  • Parallel interface technology used to connect hosts to peripheral devices.
  • Specs: 
    • Up to 75 feet
    • 320 MBps shared bandwidth
    • 16 devices per SCSI bus
    • Half-duplex operation

Fibre Channel Overview

  • Supports the extension of SCSI technologies.
  • Provides high-speed transport for the SCSI payload, but overcomes the distance and device limitations that come with parallel SCSI technology.
  • Integration of best attributes of host channel and networking technologies.  
  • 3 major topologies:
    • point-to-point
    • arbitrated loop
    • switched fabric.
  • Fibre Channel communication is similar to TCP
    • communications are point-to-point oriented 
    • Supports logical node connections (sockets)

VSAN

  • Provides isolation among devices that are physically connected to the same fabric.
  • A good SAN design is required to build a large SAN and ultimately use a higher number of ports.  
  • VSANs cut down on the physical connections needed to various targets shared by different islands.  

IVR

  • Provides connectivity between fabrics without merging them.  
  • IVR permits routing between VSANs
  • Without IVR, you would be forced to merge the separate fabrics to share information between VSANs
  • IVR used with Fibre Channel over IP (FCIP) can provide efficient business continuity or disaster recovery solutions.  

FSPF

  • Fabric Shortest Path First (FSPF) is the standard path-selection protocol used by Fibre Channel fabrics.  IVR uses FSPF to calculate the best path to a remote fabric.  
  • Supports multiple paths
  • Provides a preferred route when there are two equal paths available
  • Runs on a per VSAN basis
  • Uses Dijkstra algorithm for path selection (same as OSPF)

Zoning

  • Logical grouping of fabric-connected devices within a SAN or VSAN.
  • Means of restricting visibility and connectivity between devices connected to a common Fibre Channel SAN or VSAN.  
  • While software and hardware based zoning are available, hardware zoning is considered to be more secure.  
  • VSANs are first created as isolated logical fabrics within a common physical topology; then, individual unique zone sets can be applied as necessary within each VSAN. 

FICON

  • Fiber Connectivity (FICON) is an upper-layer protocol that uses the lower layers of Fibre Channel transport for connecting IBM mainframes with control units.  

SANTap

  •  Allows 3rd-party data storage applications to be integrated into the SAN.  SANTap enables data being written to a storage device to be duplicated at another appliance within the fabric.  The appliance need not be in the primary data path. 
  • Benefits:
    • Transparent insertion and provisioning of appliance-based storage applications
    • No disruption of the primary I/O from the server to the storage array
    • On-demand storage services
    • Scalable commodity appliance-based storage applications

Designing SAN and SAN Extension

  • Plan a network topology that can handle the number of ports for both present and future needs.
  • Design a network topology with a given end-to-end performance and throughput level in mind, taking into account any physical requirements of a design.
  • Provide the necessary connectivity with remote data centers to handle the business requirements of continuity and disaster recovery.

Port Density and Topology Requirements

  • Recommended practice is to design the SAN switches with the capacity to support future requirements.
  • SAN design should consider topology & physical space (buildings/floors/etc)
  • Design choice is a business decision based on the forecasting of capacity vs initial capital expenditure.

Device Oversubscriptions

  • Recommended fan-out ratio of subsystem client-side ports to server connections is in the range of 7:1 to 15:1.
  • Group applications or servers that burst high I/O rates at different time slots within the daily production cycle.

Traffic Management

  • The main problem with storage consolidation is that faults are no longer isolated.  
  • Logically isolating devices that are physically connected (VSANs) enable consolidation of storage while increasing security and stability.

Convergence and Stability

  • Minimize processing required within a given SAN topology by minimizing the number of switches in a SAN.
  • Implement appropriate levels of redundancy.

Comprehensive SAN Security

  • Secure roles-based management
  • Centralized authentication of devices connected to the network
  • Traffic isolation and access controls
  • Encryption of all data leaving the storage network

Simplified SAN Management

  • Provisioning and operating a storage infrastructure while also ensuring availability, reliability, recoverability, and optimal performance. 
  • Use centralized, easy to use, management tools that provide significant troubleshooting capabilities.

SAN Extension

  • Refers to transporting storage traffic over distances such as MANs and WANs.  
  • Typically uses multimode fiber over short distances, or single-mode fiber over longer distances.  
  • SAN extension across a MAN or WAN allows the enterprise to support applications such as distributed replication, backups, and remote storage.

SAN Extension Protocols

The FCIP and iSCSI stacks support block-level storage for remote devices.  Both use TCP/IP as a transport mechanism, allowing them to leverage the existing network infrastructure. 

Fibre Channel over IP

  • Fibre Channel encapsulated in IP
  • Provides connectivity between two separate SANs over a WAN
  • Fibre Channel packet is encapsulated into FCIP, which is then carried over TCP/IP (similar to trunking Fibre Channel between switch fabrics over the WAN).
  • Requires high throughput with few or no drops, low latency, and low jitter.
  • Means of providing a SAN extension over an IP infrastructure (disaster recovery).

iSCSI

  • Protocol used to carry SCSI commands, responses, and data over an IP network.  
  • Transport is provided by TCP/IP rather than a Fibre Channel network.  
  • Advantages over FCIP:
    • Lower overall cost of ownership
    • Standards-based
    • Standard network equipment can be used in the iSCSI network.

High-Availability SAN Extension

  • Achieved with dual fabrics
  • Augment design with additional network protection via port channels and optical protection schemes.

Integrated Fabric Designs Using Cisco Nexus Technology Overview

I/O Consideration in the Data Center

  • Benefits of a unified fabric (consolidation of similar cabling, network adapters, and access switches):
    • Reduced cabling
    • Fewer access layer switches
    • Fewer network adapters per server
    • Power and cooling savings
    • Management integration
    • Wire once

Challenges when Building a Unified Fabric Based on 10 G Ethernet

  • Challenges:
    • Integration issues with existing storage networks
    • High availability and a lossless network are required (Fibre Channel is extremely sensitive to packet loss).  
    • Architecture of the ethernet switches must provide low port-to-port latencies to support IPC traffic.
    • Must integrate easily into existing Fibre Channel-based SANs.

SAN Protocol Stack Extensions

  • FCoE, FCIP, and Fibre Channel all use the Fibre Channel Protocol (FCP) and Fibre Channel framing.  
  • FCoE and FCIP can easily be integrated into existing Fibre Channel SANs.
  • FCIP primarily used for switch-to-switch protocol
  • FCIP uses TCP and IP on the network and transport layer, thus having higher overhead than FCoE.

FCoE Components: Converged Network Adapter

  • Allows a server to support Ethernet-based LAN traffic and FCoE-based SAN traffic on a single connection.  
  • Presents itself to the OS as two separate devices.

FCoE Components: Fibre Channel Forwarder

  •  Combines the functions of an Ethernet and Fibre Channel switch.  
  •  Can be used to connect Fibre Channel SANs, FCoE hosts, and FCoE servers.
  • FCF consists of an Ethernet bridge, which connects to the Ethernet ports, and a Fibre Channel switch, which connects to the Fibre Channel switches or nodes.

Chapter 7: E-Commerce Module Design

The e-commerce module enables organizations to support e-commerce applications through the Internet.  

E-Commerce High-Availability Requirements

E-commerce applications represent the public face of the organization, thus downtime is particularly harmful.  

Components of High Availability

  • Redundancy:  Ultimate goal is to avoid single points of failure.  Designs must trade off costs versus benefits.  
  • Technology:  Examples of technologies to improve availability are SSO and NSF.  Technologies to detect failures are also necessary; these may include service monitoring on server load balancers (SLBs) and Cisco IOS IP SLA probes.  Other technologies that contribute to high availability include fast routing convergence. 
  • People:  Important to consider training, documentation, communication, testing, etc.
  • Processes:  Implementing sound, repeatable processes is important to achieving high availability (Example: PPDIOO).  It is important to build repeatable processes, use labs appropriately, implement good change control processes, and manage operational changes.
  • Tools:  Component monitoring (monitoring redundant devices so that they can be replaced before both points of redundancy are gone), performance thresholds, packet loss / latency monitoring and good documentation are useful tools.  

Common E-Commerce Module Designs

Includes routing, switching, firewall, and server content load balancing components. 

Common E-Commerce Firewall Designs

  • The e-commerce module is typically implemented in a data center facility.
  • Usually has multiple firewall layers (a firewall sandwich).
  • It is a good idea to use different operating systems on different firewall layers, to prevent OS-specific attacks.

Using a Server as an Application Gateway

This scenario covers firewall layers whose traffic passes through servers.  Each server has interfaces in separate VLANs.  For example, one interface provides web services, and a separate interface connects through another firewall to application servers.  
  • Increases security; hacker would have to penetrate the firewall and the web server OS to attack the middle layer of firewalls.  
  • Variation would be a single connection from the switch to each server; Port-specific ACL on firewalls to provide security.
A physical Cisco firewall or Cisco Application Control Engine (ACE) module can be virtualized, or divided, into separate firewall contexts.  
  • Firewall contexts can be used to separate different internet-facing e-commerce blocks and different layers of a firewall sandwich.  
  • Contexts retain the secure separation of rules and other customer features as separate physical firewall devices.
Firewall modes:
  • Transparent (Bridged): Firewall bridges two VLANs together, switching traffic at Layer 2 between the two VLANs, which together constitute a single IP subnet.
    • Described as a bump-in-the-wire mode
    • Any traffic that goes through the firewall is subject to stateful IP address based ACL features
    • Can isolate less-secure servers from more-secure servers within the same VLAN and subnet.  
  • Routed Mode: Firewall routes traffic between VLANs (or interfaces).
    • Traffic subject to stateful IP address based ACLs
    • Used by most current designs

Common E-Commerce Server Load Balancer Designs

A Server Load Balancer (SLB) supports both scaling and high availability by distributing client requests for active servers.  The SLB intelligently passes traffic to a pool of physical servers, based on the load and on configuration rules.

SLB Design Models:
  • Router mode: SLB device routes between outside and inside subnets
    • services' VIP addresses are usually in a globally routable public IP subnet
    • SLB internal IP used as gateway
  • Bridge mode: Transparent bridging
    • Acts much like the firewall transparent bridging mode.  
    • Content load balancer acts as a 'bump in the wire' between the servers and the upstream firewall or layer 3 device (i.e., a router).  
    • Servers use the IP address of the firewall or layer 3 device as their default gateway.
    • Physical servers are in a globally routable subnet.  
    • STP must be considered if SLB devices are configured in a redundant fashion.  
  • One-armed or two-armed mode: Replies from servers pass back through the SLB on their way to the end user.  Gateway can be set to the SLB device, PBR can be used, or client NAT can be used.
    • Out-of-band approach
    • Not directly in line with the traffic path
    • SLB VIP and the physical servers are in the same VLAN or subnet.
    • Inbound end-user traffic is routed to the VIP of the SLB device.  SLB then translates the IP destination address to a physical server IP address and forwards the traffic to the physical server, similar to routed mode.  
    • Main difference from routed mode is that traffic must be forced to go to the SLB device so that the source IP of traffic from the physical server can be translated back to the VIP.  
    • Simplest way to go through the SLB is to set the server default gateway to the SLB device, rather than the router (alternative ways are to use client NAT or to use PBR).
    • Main advantage is that not all inbound and outbound traffic has to go through the SLB device.  
    • Another advantage is that scaling by adding SLB devices is simple.
The Cisco Application Control Engine (ACE) design is the recommended approach for appliance-based content load balancers.  The real servers typically use the SLB inside address as their default gateway.  As reply traffic passes back through the SLB, the source real IP is changed to the VIP address (there is no way for the end user to tell there is an SLB device in the path).  

Common E-Commerce Design Topologies for Connecting to Multiple ISPs

One Firewall per ISP: Commonly used in small sites, as it is easy to implement.  External DNS resolves the organization's site name to an address from either ISP's external address block.
Stateful Failover with Common External Prefix:  The firewall pair and NAT devices support some form of stateful failover.  NAT devices translate addresses to a block that both ISPs are willing to advertise for the site.  
Distributed Data Center:  Used by very large e-commerce sites with critical services (banks).  Also protects against regional problems.  

Design Option: Distributed Data Centers

  • More failover flexibility than having one chassis with dual components, or two-chassis deployment
  • Allows for non-service impacting maintenance 
  • Technology allows active-active hot databases, as opposed to an active database with a mirrored hot-spare database.
  • Recommended design approach is to tie together the redundant sites via an internal WAN link to avoid the need for external failover response.  

Additional Data Center Services

SSL offload: 
  • Commonly used to encrypt sensitive web application traffic.  Since SSL encryption and decryption are CPU intensive, Cisco ACE can offload this task from the servers to improve server efficiency.  
  • Useful to ensure that a web application firewall (WAF) or IPS can inspect the payload of the SSL session for malicious content.  
Cisco ACE WAF: Provides full-proxy firewall solutions for both HTML- and XML-based web applications.  

IPS: Provides deep packet anomaly inspection and protects against common and complex embedded attacks.

Chapter 8: Security Services Design

Designing Firewalls

  • Accomplishes network defense by comparing corporate policies on network access rights for users to the connection information surrounding each access attempt.

Firewall Modes

  • Transparent mode:  Firewall is a layer 2 device, not a router hop.  Essentially just connects two VLANs at layer 2 and utilizes security policy without creating separate layer 3 subnets.
    • One IP address to the entire bridge group
    • Can allow certain traffic that would normally be blocked by routed mode
    • Useful if it is required to pass or filter IP multicast traffic
  • Routed mode:  Firewall is considered to be a layer 3 device in the network.  Able to support NAT and multiple subnets, and requires an IP address on each connected subnet.  
    • Max of 256 VLANs assignable to a single context
Zone-Based Policy Firewall:  Assigns interfaces to zones, and applies an inspection policy to traffic that is moving between zones (unlike the older interface-based model).  Allows a more flexible, easily configured model.  The zone-based policy firewall default policy between zones is to deny all (a significant departure from the implicit allow experienced with stateful inspection).
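A minimal zone-based policy firewall sketch (zone, class-map, and policy-map names are hypothetical; TCP/UDP/ICMP from inside to outside is inspected, and everything else falls to the default deny):

    ! Define the zones and attach interfaces to them
    zone security INSIDE
    zone security OUTSIDE
    !
    ! Classify the traffic to be inspected
    class-map type inspect match-any IN-OUT-CLASS
     match protocol tcp
     match protocol udp
     match protocol icmp
    !
    ! Stateful inspection policy for traffic moving between the zones
    policy-map type inspect IN-OUT-POLICY
     class type inspect IN-OUT-CLASS
      inspect
    !
    ! Bind the policy to the inside-to-outside zone pair
    zone-pair security IN-TO-OUT source INSIDE destination OUTSIDE
     service-policy type inspect IN-OUT-POLICY
    !
    interface GigabitEthernet0/0
     zone-member security INSIDE
    interface GigabitEthernet0/1
     zone-member security OUTSIDE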

Virtual Firewall Overview

  • Allows a physical firewall to be partitioned into multiple standalone firewalls.  
  • Commonly known as 'Contexts'
  • Specific VLANs are tied to a specific security context.  
  • Each context has its own policies, such as NAT, access lists, and protocol fixups.
Design considerations:
  • Since an anomaly / attack on a single context may affect the whole system, it is important to limit the use of resources per context (see the sketch after the MSFC placement notes).
MSFC Placement: 
  • Can be placed on either side of the FWSM
  • Placing the MSFC outside the FWSM makes design and management easier
  • Placing the MSFC outside the FWSM means the MSFC routes between the internet and the switched network.
  • If the MSFC is placed inside the FWSM, no traffic goes through the FWSM unless it is destined for the internet.
  • For multiple contexts, if the MSFC is placed inside the FWSM, it should connect to only a SINGLE context, or the MSFC may route between the contexts (which may not be your intention).
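A minimal virtual firewall sketch in FWSM-style syntax, tying specific VLANs to each context and capping per-context resources as noted above (context names, VLANs, and limits are hypothetical):

    ! Enable multiple-context mode (requires a reload)
    mode multiple
    !
    ! Resource class so one context cannot exhaust the system
    class SILVER
     limit-resource conns 100000
    !
    context CUSTOMER-A
     member SILVER
     allocate-interface vlan10
     allocate-interface vlan11
     config-url disk:/customer-a.cfg
    !
    context CUSTOMER-B
     member SILVER
     allocate-interface vlan20
     allocate-interface vlan21
     config-url disk:/customer-b.cfg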
Active/Active Firewall Topology
  • Use of two firewalls, both actively providing firewall services
  • The MAC address of the primary unit is used by all interfaces in the active contexts.  
  • Design supports preemption
  • Both devices can pass network traffic; this design can support load balancing

Asymmetric Routing with Firewalls

  • 32 ASR groups supported
  • Each ASR group supports a maximum of eight interfaces
  • Asymmetric routing is supported in the active/active design & is supported in both the routed and transparent modes of firewall operation.  
  • Interfaces inside a common ASR group in an active/active topology support asymmetric routing; packets that arrive on a different interface in the same ASR group as the original flow are still matched to the existing session.

Performance Scaling with Multiple FWSMs

  • Up to four FWSMs can be installed in a single chassis
  • Can use policy-based routing (PBR) to steer traffic through multiple FWSMs (see the sketch after this list).
  • Can also use basic routing, such as static or equal cost multipath (ECMP) routing, to direct traffic flows.
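A sketch of PBR-based steering across two FWSMs (subnets, VLAN, and next-hop addresses are hypothetical); each source range is forced through a specific firewall by setting the next hop:

    ! Classify traffic by source subnet
    access-list 101 permit ip 10.1.0.0 0.0.255.255 any
    access-list 102 permit ip 10.2.0.0 0.0.255.255 any
    !
    route-map FWSM-STEER permit 10
     match ip address 101
     ! next hop = inside address of FWSM 1
     set ip next-hop 192.168.1.1
    route-map FWSM-STEER permit 20
     match ip address 102
     ! next hop = inside address of FWSM 2
     set ip next-hop 192.168.1.5
    !
    interface Vlan100
     ip policy route-map FWSM-STEER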
PVLANs:  Allow layer 2 isolation between ports within a VLAN, without assigning ports to separate VLANs or assigning individual hosts to different layer 3 IP networks.  Ports that belong to a PVLAN are also associated with a primary VLAN.  
Types of secondary VLANs:
Isolated VLANs: Cannot communicate with other ports on the switch other than the promiscuous port.
Community VLANs: Can communicate with one another in the community but cannot communicate with ports in other communities or isolated ports.

Types of PVLAN ports:
Promiscuous: Communicates with all other PVLAN ports; sends traffic using the primary VLAN.
Isolated: Complete layer 2 isolation; use a secondary VLAN to send traffic and blocks any traffic coming from the secondary VLAN.
Community: Separate secondary VLAN is allocated for each community.
Note:  A broadcast sent out from the promiscuous port reaches all community / isolated ports.  
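A Catalyst PVLAN configuration sketch matching the port types above (VLAN numbers and interfaces are hypothetical):

    ! Secondary VLANs
    vlan 101
     private-vlan isolated
    vlan 102
     private-vlan community
    ! Primary VLAN, associated with its secondaries
    vlan 100
     private-vlan primary
     private-vlan association 101,102
    !
    ! Promiscuous port (for example, toward the default gateway)
    interface GigabitEthernet1/1
     switchport mode private-vlan promiscuous
     switchport private-vlan mapping 100 101,102
    !
    ! Isolated host port
    interface GigabitEthernet1/2
     switchport mode private-vlan host
     switchport private-vlan host-association 100 101
    !
    ! Community host port
    interface GigabitEthernet1/3
     switchport mode private-vlan host
     switchport private-vlan host-association 100 102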

Designing NAC Services

Use of the network infrastructure to enforce security policy compliance on all devices seeking network resources.  NAC allows network access to only compliant endpoints, while restricting access of non-compliant devices.

Network Security with Access Control

  • Identify:  Network access rights of the user must be verified.
  • Enforce:  Enforce the use of security software before allowing network access.
  • Isolate:  Noncompliant devices are isolated and forced to meet compliance before being allowed access
Cisco Identity-Based Networking Services (IBNS) is an integrated tool that offers authentication, access control, and user policies.  While tools such as 802.1X authentication prove who and what a user and device are, they do not assess the device's condition.  The NAC Appliance Agent (NAA) queries antivirus versions, OS versions, etc.  If any issues are found via the NAA, it directs the device to where it can receive the update.  If any device is found with a security issue, it can be quarantined until rectified.  
  
NAC Comparison

  • Cisco NAC Framework:  Embedded software within NAC-enabled products; integrates network solutions from more than 60 manufacturers of antivirus, security solutions, etc.
  • Cisco NAC appliance:  Turnkey solution for controlling and securing networks by condensing NAC capabilities into an appliance.  
Cisco NAC Appliance Fundamentals
Important Components:

  • Cisco NAC Appliance Manager (NAM): Single point of administration & authentication proxy to the authentication servers that reside on the back end.
  • Cisco NAC Appliance Server (NAS): Enforcement server between the untrusted and the trusted network.  NAS enforces the policies defined by the NAM.  
  • Cisco NAC Appliance Agent (NAA): Resides on windows client machines.  Checks applications, files, services, or registry keys to ensure machine meets network and software requirements.
  • Cisco NAC Profiler: Real-time, contextual inventory of network devices.
  • Cisco NAC Appliance policy updates: Used to check up-to-date status of OS and antivirus client software.
  • Cisco NAC Guest Server: Facilitates the creation of guest accounts for temporary network access.
Cisco NAC process flow:
1. End user attempts to access network
2. User is redirected to a login page
3. If the device is noncompliant with corporate policies or the login is incorrect, the user is denied access to the network and assigned to a quarantine role with access only to online remediation resources.  If the device is compliant, the user is permitted access.

Cisco NAS Scaling
The number of users supported per server is influenced by many factors that consume CPU and server resources.  The number of users supported on a server is a measure of concurrent users that have been scanned for posture compliance, not network appliances like printers or phones.  

Cisco NAS Deployment Options

Cisco NAS can operate as a logical layer 2 or layer 3 device, depending on the gateway mode configured:

  • Virtual gateway mode: Operates as a layer 2 ethernet bridge typically when the untrusted network already has a layer 3 gateway.  This is the most common and easiest to deploy, but it may require additional equipment and may also hinder throughput.  
  • Real IP gateway mode:  NAS device operates as the layer 3 default gateway for untrusted network clients.  Cisco NAS can perform DHCP services or act as a DHCP relay.
  • NAT gateway mode:  With NAT, clients are assigned IP addresses dynamically from a private address pool.  The NAS device performs the translations between the private and public addresses as traffic is routed between the internal and external network.  This method is limited in the number of connections it can handle, and is not often used in production networks.  
Operating Modes:
Cisco NAS has two traffic-flow deployment models: in-band or out-of-band.  The prime difference is that an in-band NAS stays in line with the traffic path, while an out-of-band NAS is only in the traffic path during posture assessment.  In-band operation allows for ongoing ACL filtering and bandwidth throttling.  

NAS Client Access Modes:
  • Layer 2:  The MAC address of the client is used to uniquely ID the device.  This is the most common method.  This mode supports both virtual and real IP gateway operations with in-band and out-of-band deployments.  
  • Layer 3:  The client device is not layer 2 adjacent to the NAS device.  This mode supports both virtual and real IP gateway operations with in-band and out-of-band deployments.  
Physical Deployment Models:
  • Edge Deployment: Physically and logically inline with the traffic path.  Can become complex when there are multiple access closets.
  • Central Deployment:  Most common option and the easiest deployment option.  NAS is logically inline, but not physically inline.
NAC Framework Overview
Architecture-based framework solution designed to take advantage of an existing base of both Cisco network technologies and existing deployments of security and management solutions from other manufacturers.  

IPS and IDS Overview

Designed to ID and stop worms, network viruses, and other malicious traffic. 

Threat Detection and Mitigation

Architecture systems work together to ID and dynamically respond to attacks.  The architecture has the ability to ID the source of the threat, visualize the attack path, and to suggest response actions.  

IDSs

  • Passively listen to network traffic
  • Not in the traffic path; listens to copies of the network traffic
  • Sensor does not affect the packet flow
  • Cannot stop malicious traffic
Intrusion-Prevention Systems

  • Active devices in the traffic path
  • Listens to inline traffic and permits or denies traffic flows and packets into the network.
  • Unlike IDSs, IPS can actually stop traffic
  • May adversely affect packet-forwarding rates
  • Allows for deep packet analysis (Layers 3-7) to stop packets that might have been previously overlooked.

IDS and IPS Overview
Sensors:  Host based (Cisco Security Agent) or network based (IPS appliance)

  • Signature-based: IDS or IPS looks for specific predefined patterns in network traffic.  Unable to detect day-zero attacks.
  • Anomaly-based: Checks for defects or anomalies in packets or packet sequence.
  • Policy-based:  Based on network security policy and detects traffic that does not match the policy.  
Security management and monitoring infrastructure: Configures the sensors and serves as the collection point for alarms for security management and monitoring.  CSM is used to centrally provision device configurations and security policies for Cisco firewalls, VPNs, and IPSs and provides some light monitoring functions.

IDS or IPS sensors are placed in the network where they can effectively support the underlying security policy.  Deployment decisions are based on where you need to detect or stop an intrusion as soon as possible.  Typical placements are at the perimeter of the network outside a firewall, internal to the network inside the firewall, at boundaries between zones of trust, and at critical servers.  It is important to consider where IPSs are placed, as an in-path device introduces risk of latency and loss of connectivity in the event of its failure.

IDS or IPS Deployment Considerations

  • IDS inside the firewall can show firewall failures by showing what they let through
  • IDS outside the firewall can detect all attacks and will generate a lot of alarms but is useful for analyzing what kind of attacks are reaching the organization.  
  • Internet and extranet connections are generally protected first because of their exposure, followed by management networks and data centers.  Lastly, IPS deployment at remote and branch offices protects the branch from corporate incidents and vice versa.  

IPS Appliance Deployment Options

  • Two layer 2 devices (no trunk):  Typical campus design.  Can be between the same VLAN on two different switches or be between different VLANs with the same subnet on two different switches.  
  • Two layer 3 devices: Common in the internet, campus, and server farm designs.  The two layer 3 devices are in the same subnet.  Easy to implement as integration can take place without touching other devices.
  • Two VLANs on the same switch:  Allows a sensor to bridge VLANs on a switch.  
  • Two layer 2 devices (trunked):  Common scenario providing protection of several VLANs from a single location.

IPS Deployment Challenges
Asymmetric traffic patterns and high availability are the main challenges.

Preferred design places the monitoring interface on the outside network, and the management interface on a separate inside VLAN.  This isolates the management interface on a dedicated IPS management VLAN, separate from the rest of the inside network.  It is also recommended to use the SSH or SSL protocol for management.

IDS and IPS Monitoring and Management

  • Cisco Security MARS (Monitoring, Analysis, and Response System): 
    • Useful for small to medium sized organizations
    • Provides multivendor event correlation and proactive response, distributing IPS signatures to mitigate active threats.  
  • CSM (Cisco Security Manager):
    • Enables organizations to manage security policies on Cisco security devices.
    • Supports the management and configuration of Cisco IPS sensors.

Chapter 9: IPsec and SSL VPN Design

VPNs are an alternative WAN infrastructure, replacing or augmenting existing private networks that use dedicated WANs based on leased-line, Frame Relay, ATM, or other technologies.

Designing Remote-Access VPNs

  • Enables enterprise to reduce communication expenses
  • Leverage the packet-switching infrastructure of the ISP

Remote-Access VPN Overview

  • Permit secure, encrypted connections between mobile or remote users and their corporate network across public networks
  • Technology that connects the VPN headend and the end clients:
    • IPsec: Used for data confidentiality and device authentication
    • SSL: Provides security for web traffic
  • Both IPsec and SSL work towards providing secure communication: Authenticity, Integrity, Confidentiality
SSL VPN Overview
  • Protocol designed to enable secure communication on an insecure network such as the internet
  • Provides encryption between a web server and a web browser
    • Web server uses certificates to authenticate the ID of a website to visiting browsers.  Via SSL technology, the browser is able to authenticate the ID of the web server and encrypt information.  
    • Aside from basic website support, SSL VPNs can also support complex applications (corporate directory services, calendar systems, etc)
    • Access Mechanisms:
      • Clientless Access:  System connects to a web server, downloads and translates a web page, and then transfers it over an SSL connection to the browser of the end user.  The SSL VPN overwrites or translates content so that the internal address and names on a web page are accessible to the end users.  Also, does not require special software on the client machine.
      • Thin Client:  Depends on a small application to enable port forwarding, which acts as a local proxy server.  The application listens for connections on a port defined for an application on a local host address and tunnels packets that arrive on this port inside of an SSL connection to the SSL VPN device, which unpacks them and forwards them to the real application server.  
      • Thick Client: Automatically delivered through the web page and does not need to be manually distributed or installed.  Thick client should be used when users need full application access and IT wants tighter integration with the operating system for additional security and ease of use.

Remote-Access VPN Design Considerations
  • Can be deployed in parallel with a firewall, inline with a firewall, or in a DMZ (best practice is to place the public side of the VPN termination device in a DMZ behind a firewall).
    • Firewall should limit traffic coming to and from the VPN termination device to IPsec and SSL (see the sketch after this list).  
  • Use static routes for the address blocks of remote devices, pointing to the private interface of the headend device for nonlocal subnet IP addresses.  
  • Authentication:
    • Can authenticate using digital certificates, one-time passwords (OTP), or even active directory for convenience.  Static passwords should be avoided.
  • Access Control:
    • Possible to maintain all access rules on an internal firewall based on the source IP of the client.  
    • Access control rules can be defined at a per-group basis on the VPN headend device.
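A sketch of the rules above (ASA-style ACL syntax; the headend outside address 192.0.2.10, inside address 10.1.1.10, and remote-client block 10.100.0.0/16 are hypothetical):

    ! Firewall: permit only IPsec (IKE, NAT-T, ESP) and SSL to the headend
    access-list OUTSIDE-IN extended permit udp any host 192.0.2.10 eq isakmp
    access-list OUTSIDE-IN extended permit udp any host 192.0.2.10 eq 4500
    access-list OUTSIDE-IN extended permit esp any host 192.0.2.10
    access-list OUTSIDE-IN extended permit tcp any host 192.0.2.10 eq 443
    access-group OUTSIDE-IN in interface outside
    !
    ! Internal router: static route for the remote-client address block,
    ! pointing at the headend's private interface
    ip route 10.100.0.0 255.255.0.0 10.1.1.10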

Designing Site-to-Site VPNs

Alternative WAN infrastructure used to connect branch offices, home offices, or business partners to all or portions of an enterprise network.  VPNs do not change private WAN requirements, but instead meet these requirements more cost-effectively and with greater flexibility, though possibly with lower performance or weaker SLAs.  

Site-to-Site VPN Applications

  • Replace costly WAN services
  • Provide backup for disaster recovery purposes
  • Used to support regulatory constraints and business policies (OSHA / HIPAA) 
Site-to-Site VPN Design Considerations
  • Organization's routing and addressing schema, size, scale, performance expectations.  
    • Addressing and routing:
      • IPsec is an overlay on an existing IP network.
      • VPN termination devices need a routable IP for the outside Internet connection (can be accomplished via NAT).  
      • IPsec tunnel mode encapsulates an entire IP packet, hiding the pre-encrypted packet header
      • IPsec cannot support most IGP routing protocols because IPsec does not support multicast.  To overcome this, GRE is commonly used in conjunction with IPsec.  
    • Scaling, Sizing, and Performance
      • Must consider the number of remote sites, access connection speeds, routing peer limits, encryption engine throughput, supported features, and applications transported.
        • Redundant headend device should be deployed in a configuration that results in CPU utilization less than 50% 
        • Branch devices should be deployed in a configuration with less than 65% CPU utilization.
    • Cisco router performance with IPsec VPNs
      • Packets-per-second (PPS) matters more than throughput bandwidth for connection speeds being terminated or aggregated.  In general, routers and crypto engines have upper boundaries for processing a given number of PPS.  
    • Design Topologies:
      • Remote peers are typically connected to the central site over a shared infrastructure in a hub-and-spoke topology with tunnels from the multiple spokes to the headend hub.  
      • If there are multiple locations with high traffic, a partial mesh can be used to eliminate the performance penalty of two encryption/decryption cycles for spoke-to-spoke traffic.  Partial mesh is similar to hub-and-spoke, except there is some direct spoke-to-spoke connectivity.
      • Full mesh is the most difficult to provision and has scaling issues as the number of sites increases.

VPN Device Placement Design

  • Parallel to a firewall
    • Adv: Simple to deploy (firewall addressing does not need to change) and high scalability (multiple VPNs can be deployed)
    • Disadv: Decrypted traffic is not firewall inspected
  • On a firewall DMZ
    • Adv: Firewall can inspect decrypted VPN traffic and allows for moderate to high scalability.
    • Disadv:  Configuration complexity increases, and bandwidth restrictions are imposed by the firewall
  • Integrated firewall and VPN
    • Adv: Firewall can inspect decrypted VPN traffic and easier to manage (fewer hardware devices).
    • Disadv: Scalability & configuration complexity.

Using IPsec VPN Technologies

IPsec VPN overview

  • Provides data encryption at the IP packet level.
  • With IPsec, the network manager can define which traffic should be protected by configuring ACLs and applying these ACLs to interfaces by way of crypto maps.
  • Standard IPsec VPNs support only unicast traffic.
Cisco Easy VPN

  • Simple VPN deployment for remote offices and teleworkers with little IT support, or for large deployments where it is impractical to individually configure multiple remote devices.
  • Allows Cisco routers to act as remote VPN clients
  • Can receive predefined security policies and configuration parameters from the VPN headend at the central site.

GRE over IPsec Design Considerations

  • Used in conjunction with IPsec to allow support for IGP dynamic routing protocols (IPsec does not inherently allow multicast traffic, which is needed for IGP communication).
  • Encapsulates non-IP, IP multicast, and broadcast packets into IP unicast packets.  These GRE packets can then be encrypted by IPsec.  
  • With GRE over IPsec design, the hub router uses a single GRE interface for each spoke.
  • In an aggressive design, the headend routing protocol can scale up to 500 peers
  • EIGRP is the recommended routing protocol (conservative use of router CPU network bandwidth, fast convergence, summarization options, and route filtering options.)
  • If using static routing, can use GRE keepalives for failure detection
  • When using multiple headend devices for redundancy, one must be preferred in order to avoid asymmetric routing; the routing metric should be consistent both upstream and downstream. 
  • Hub-and-spoke is the most commonly used deployment model (see the sketch below).  
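A minimal hub-side GRE over IPsec sketch with EIGRP running over the tunnel (all addresses, keys, and names are hypothetical):

    crypto isakmp policy 10
     encryption aes
     authentication pre-share
     group 2
    crypto isakmp key EXAMPLE-KEY address 203.0.113.2
    !
    crypto ipsec transform-set TSET esp-aes esp-sha-hmac
    !
    ! Encrypt only the GRE traffic between the tunnel endpoints
    ip access-list extended GRE-TRAFFIC
     permit gre host 203.0.113.1 host 203.0.113.2
    !
    crypto map VPNMAP 10 ipsec-isakmp
     set peer 203.0.113.2
     set transform-set TSET
     match address GRE-TRAFFIC
    !
    ! One GRE interface per spoke on the hub
    interface Tunnel0
     ip address 10.255.0.1 255.255.255.252
     tunnel source 203.0.113.1
     tunnel destination 203.0.113.2
    !
    interface GigabitEthernet0/0
     ip address 203.0.113.1 255.255.255.0
     crypto map VPNMAP
    !
    ! IGP hellos (multicast) ride inside the GRE tunnel
    router eigrp 100
     network 10.0.0.0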

DMVPN

  • Static hub-and-spoke designs do not scale when there are more than 10 sites.
  • DMVPN solution should be considered when there is 20 percent spoke-to-spoke traffic.  
  • On demand tunnel creation; reduces maintenance and configuration on the hubs.
  • Combination of IPsec, GRE, and Next Hop Resolution Protocol (NHRP).
  • Easier to add a node than with static hub-and-spoke
  • Backbone hub-and-spoke that allows direct spoke-to-spoke functionality (see the hub sketch below) 
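A hub-side DMVPN sketch combining mGRE, NHRP, and IPsec (network ID, authentication string, and addresses are hypothetical; assumes an ISAKMP policy like the one in the GRE over IPsec sketch above):

    crypto ipsec transform-set TSET esp-aes esp-sha-hmac
    crypto ipsec profile DMVPN-PROF
     set transform-set TSET
    !
    ! A single multipoint GRE interface serves all spokes
    interface Tunnel0
     ip address 10.255.0.1 255.255.255.0
     ip nhrp map multicast dynamic
     ip nhrp network-id 1
     ip nhrp authentication EXAMPLE
     tunnel source GigabitEthernet0/0
     tunnel mode gre multipoint
     tunnel protection ipsec profile DMVPN-PROF

Spokes point at the hub as their next-hop server (ip nhrp nhs 10.255.0.1) and register dynamically, which is why adding a spoke requires no change on the hub.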
Virtual Tunnel Interface Overview

  • Provide a routable interface type for terminating IPsec tunnels and an easy way to define protection between sites to form an overlay network.  
  • IPsec tunnel associated with a tunnel interface
  • Allow interface commands to be applied directly to the IPsec tunnel
  • Supports QoS, multicast, and other routing functions that previously required GRE.
  • Dynamic or static IP routing can be used to route the traffic to the virtual interface
  • Simplifies design and configuration
  • Improves network scaling
  • Supports interoperability with standards-based IPsec installations of other vendors (see the sketch below).  
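A static VTI sketch (addresses are hypothetical; assumes the ISAKMP policy and transform set TSET from the GRE over IPsec sketch above).  QoS, multicast, and routing can be applied directly to the tunnel interface:

    crypto ipsec profile VTI-PROF
     set transform-set TSET
    !
    interface Tunnel1
     ip address 10.255.1.1 255.255.255.252
     tunnel source GigabitEthernet0/0
     tunnel destination 203.0.113.2
     ! IPsec replaces GRE as the tunnel encapsulation
     tunnel mode ipsec ipv4
     tunnel protection ipsec profile VTI-PROF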
Group Encrypted Transport VPN

  • Provides tunnel-less technology for end-to-end security for voice, data, and video on a fully meshed network.  
  • Largely suited for an enterprise running MPLS, as it preserves the original source and destination in the encryption header for optimal routing.  
  • Uses a group management model in which the Group Domain of Interpretation (GDOI) protocol operates between a group member and a group controller or key server.  

Managing and Scaling VPNs

Recommendations for Managing VPNs

  • Use dedicated management interfaces if possible for out-of-band management.
  • Use caution when managing VPNs across the internet.  If you cannot use IPsec to connect to remote devices, use SSH or SSL.
  • Use static public IP addresses at remote sites and crypto maps at the headend to manage remote devices through a VPN tunnel.
Considerations for Scaling VPNs

  • Number of branch offices
    • Affects routing plan, high-availability design, and throughput that must be aggregated by the VPN headend router.
  • Connection speeds and packets per second
    • Since IPsec VPN connections do not have a bandwidth associated with them, it is important to consider the overall physical interface connection speeds.  
  • IGP routing peers
    • Must be maintained by VPN headend
  • High availability
    • To maintain high availability, there may be multiple aggregation tunnels to the various redundant headend devices
  • Supported applications
    • Important in determining packet-per-second rates as well as multicast support

Chapter 10: IP Multicast Design

IP Multicast Technologies

Introduction to Multicast

  • Packets are not duplicated for every receiver, but replicated on the links where receiving hosts exist.
  • Multicast groups are identified by Class D IP addresses (224.0.0.0-239.255.255.255).
  • Enables data to be sent over networks to a group of destinations in the most efficient way.
  • The source address for multicast packets is always a unicast source address
  • Applications:
    • One-to-many: 
      • One sender sends data to many receivers.
      • Typical for audio/video distribution, push media, etc.
    • Many-to-many:
      • Any number of hosts send to the same multicast group.
      • Increases application complexity.
  • Advantages:
    • Enhanced efficiency: Bandwidth used more efficiently.
    • Optimized performance: Due to traffic redundancy elimination, less processing power is required for the equivalent amount of multicast traffic.
    • Support for distributed applications
  • Disadvantages:
    • Best-effort UDP delivery can result in packet drops
    • UDP's lack of congestion control
    • Occasional duplicate packet delivery
    • Out-of-sequence delivery of packets
    • Security issues
  • Local address scope is 224.0.0.0-224.0.0.255 (packets never leave the local network)
  • Globally scoped addresses include 224.0.1.0-224.0.1.255, 232.0.0.0/8, and 233.0.0.0/8
  • 32:1 overlap of IP multicast addresses to Ethernet MAC addresses (see the example below)
  • Layer 3 IP multicast addresses are typically assigned statically
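Worked example of the 32:1 overlap: only the low-order 23 bits of the layer 3 group address are copied into the multicast MAC address (01:00:5e plus 23 bits), so 5 of the 28 significant class D bits are lost, and 2^5 = 32 IP group addresses share each MAC address.  For instance, 224.1.1.1, 225.1.1.1, and 224.129.1.1 all map to the same MAC address, 01:00:5e:01:01:01.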
IGMP and CGMP
  • IGMP is used between hosts and their local router
  • Hosts use IGMP to register with the router to join (and leave) specific multicast groups
  • IGMP versions 1, 2, and 3:
    • 1: 
      • Hosts send 'reports' to the group they wish to join
      • Must be at least one active group member on a local segment
      • No mechanism for hosts to leave a multicast group (membership times out when there is no query response)
    • 2:
      • Addresses leave and join latency issues
      • Group specific queries (query members in a single group instead of in all groups)
    • 3:
      • Adds the ability to filter multicasts based on the multicast source, so hosts can indicate that they want to receive traffic only from particular sources within a multicast group.

Multicast with Layer 2 Switches

  • IGMP is a layer 3 protocol (switches are not aware of which groups hosts belong to)
  • An administrator can manually associate a multicast MAC address with multiple ports.  (Not scalable)
  • Solutions for scaling: IGMP snooping & CGMP

IGMP Snooping

  • Dynamically "snoops" on IGMP messages sent between routers and hosts and updates its MAC address table dynamically.
  • Any multicast traffic received by the switch destined for the particular multicast group is forwarded only to the ports associated with the group in the multicast table.
  • Allows a layer 2 switch to more efficiently handle IP multicast.
  • May have a negative impact on switch performance, as every layer 2 multicast packet must be inspected.  

CGMP

  • Cisco proprietary protocol that runs between a multicast router and a switch.
  • Hosts register using IGMP with the router, which informs downstream switches of this association using CGMP.  
PIM Routing Protocol

  • Used by routers that are forwarding multicast packets
  • Uses the normal routing table in its multicast routing calculations
  • Multicast traffic does not flow to the destination until join messages are sent toward the source to set up the flow paths for the traffic.
  • PIM dynamically creates trees that control the path that IP multicast traffic takes through the network to deliver traffic to all receivers.  
    • Source Tree: A tree is created for each source sending to each multicast group.  The tree has its root at the source and has branches through the network to the receivers.
      • A separate tree is built for every source S sending to group G (referred to as the (S,G) forwarding state, where S is the IP address of the source and G is the multicast group address)
      • Advantage of creating the optimal path between source and destination
      • Overhead required to maintain path information for each source is the biggest disadvantage.
    • Shared Tree: Single tree that is shared by all sources for each multicast group.  A shared tree has a single common root, called a rendezvous point (RP).  Sources initially send their multicast packets to the RP, which in turn forwards data through the shared tree to the members of the group.
      • Forwarding state is identified by the notation (*,G)
      • Advantage of requiring minimal amount of state in each router.
      • Path between source and receivers may not be the optimal path.  
      • Source trees form from the source to the RP, and Shared trees form between the RP and the receiver.
  • Reverse Path Forwarding (RPF): Forwarding multicast traffic away from the source, rather than to the receiver.  (The source IP address denotes the source, and the destination IP address denotes a group of unknown receivers.)
  • RPF loops are prevented by inspecting packets arriving at a router.  If the packet arrives on an interface that does not match the reverse path to the source, the packet is dropped.  

Deploying PIM and RPs

PIM Deployment Models

  • Any Source Multicast (ASM):  Uses a combo of shared and source trees. Also known as PIM-SM
    • Uses a "pull" model to send multicast traffic.  Only sends traffic to receivers that request data.
    • RP administratively configured on the network.
    • Routers use their unicast routing table to determine if they have a better path to the RP or to the source directly.
    • When an edge router receives a request from a receiver to join a group, the last-hop router knows the IP address of the RP router for that group and sends a (*,G) join for the group toward the RP.
    • Once the source is known, the edge router can send an (S,G) join message toward that source.
    • Supports SPT (shortest-path tree) switchover (capability to form a source tree from the last-hop router toward the first-hop router at the source when the traffic rate through the RP exceeds a set threshold).
  • Bidir-PIM: Uses shared trees exclusively.  Drastically reduces the total (S,G) state information needed on the network.
    • In classic PIM-SM, shared trees are unidirectional
    • To support many-to-many applications over unidirectional trees, a separate source tree to the RP would be required for each source; Bidir-PIM avoids this by allowing traffic to flow both up and down the shared tree
    • Similar to STP, Bidir-PIM uses a designated forwarder on each link, to ensure that sources can reach the RP.  
  • SSM:  Uses source trees exclusively.  Greatly simplifies the network.
    • Enables a receiver to receive content directly from a specific source, instead of receiving it from a shared RP.
    • SSM has the capability to exclude particular sources, as traffic from a source to a group that is not explicitly listed on the include list will not be forwarded to uninterested receivers.  
    • Recommended practice for one-to-many applications
    • Easy to install and provision (does not require the network to maintain knowledge of which active sources are sending to which multicast groups).
  • PIM Dense:  Flood and prune (obsolete)
    • Builds source-based multicast distribution based on flood-and-prune principle.
    • Multicast packets from a source are flooded to all areas of a PIM dense mode network.  PIM routers that receive multicast packets and have no directly connected multicast group members or PIM neighbors send a prune message back up the source-based distribution tree toward the source.
    • Reflooding of unnecessary traffic consumes network bandwidth.

RP Considerations

  • Only required in networks with shared trees (not required for SSM).  
  • Methods for RP deployment:
    • Static RP addressing: Each router is statically configured with the RP address.  No RP redundancy in the event it fails, unless MSDP is running between each RP (see the sketch at the end of this section).
    • Anycast RP:  All downstream routers are configured with the anycast address of the RP.  IP routing will automatically select the topologically closest physical RP for each source and receiver to use.  Basic routing re-convergence will be performed in the event of a loss of connectivity to the RP.   
    • Auto-RP:  Dynamic way for every router in the network to learn RP information.
      • Candidate RPs announce their willingness to become the RP, and a separate "mapping agent" builds a table with the information it learns.  
      • Mapping agents then send out the consistent multicast group-to-RP mappings to all other routers.
    • Bootstrap Router (BSR):
      • RP selection protocol that supports interoperability between vendors.  
      • Similar to Auto-RP, in that it uses candidate routers for the RP function and for relaying RP information for a group.  
      • Different from Auto-RP, in that BSR does not elect the active RP for a group; this task is left to each individual router in the network.  Each router elects the currently active RP for a particular group range by running the same algorithm against the same list of C-RPs, so every router selects the same RP.  (With Auto-RP, the mapping agent elects the active RP for a group range.)  
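A sketch of the static RP and anycast RP methods described above (all addresses are hypothetical):

    ! Static RP: configured identically on every router
    ip multicast-routing
    interface GigabitEthernet0/0
     ip pim sparse-mode
    ip pim rp-address 10.0.0.100
    !
    ! Anycast RP: each RP advertises the same loopback address into the
    ! IGP, and MSDP between the RPs synchronizes active-source state
    interface Loopback1
     ip address 10.0.0.100 255.255.255.255
    ip msdp peer 10.0.0.2 connect-source Loopback0
    ip msdp originator-id Loopback0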

Securing IP Multicast

Security Considerations for IP Multicast

  • Main goal is to keep multicast operating even if there are configuration errors, malfunctions, or network attacks.
  • Involves managing network resources and access control for multiple senders and receivers by defining what multicast traffic is to be allowed on the network.
  • Questions to ask, to manage multicast security:
    • Why?  To support an organization's policies or protect access to resources.
    • How?  Policing, filtering, and encryption techniques.
    • What?  Content of service, control plane of network devices, and links between devices.
    • Where?  Local to router or switch.
  • While unicast is mainly concerned with the ingress packet rate, multicast is also concerned with the egress packet rate, and the ability to replicate outgoing packets.
  • Egress state information increases with the number of applications across the number of receiver branches on outgoing interfaces.
  • For multicast traffic, receiver filtering must be placed after the last replication point toward other potential receivers.  Conversely, filtering sources must happen before the first replication point so that source traffic is blocked throughout the network.  
  • With SSM, unknown source attacks are not possible because receivers must join a specific source host in a specific multicast group.
  • With PIM-SM or Bidir-PIM, an end device receives traffic when it joins a group, meaning an attacker cannot attack a specific receiver host but can direct an attack against a multicast group or the entire network.
  • Types of attacks:
    • Attacks against content
    • Attacks against bandwidth
    • Attacks against routers and switches
  • Address scoping and ACLs can be configured as a form of access control for applications.  

Multicast Access Control

  • If the packet-based ACL that filters traffic is deployed at the network ingress interface on the data plane before multicast processing, no state is created for the dropped traffic.
    • Adv: Simplicity and clarity
    • Disadv: Effort required to apply the ACL to every inbound interface on which a multicast source might reside, including all subnets on which users or servers reside.
  • IGMP access groups can be used to provide host receiver-side access control.
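A receiver-side access control sketch using an IGMP access group (the group range and interface are hypothetical); hosts on the interface can join only the permitted groups:

    ! Permit joins only to the 239.1.1.0/24 group range
    access-list 10 permit 239.1.1.0 0.0.0.255
    !
    interface GigabitEthernet0/1
     ip igmp access-group 10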
Multicast over IPsec VPNs

  • Multicast is not supported for most types of IPsec VPNs.
  • Key used to encrypt the packet is not recognized by all the receiving routers.  
  • RPF (reverse path forwarding) issues.  RPF verifies whether a multicast packet received on an interface points back to the source.  Packets are never forwarded back out the RPF interface.
  • To overcome replication issues, point-to-point IPsec tunnels can be used.
  • Because of RPF, packets can never be replicated out the interface on which they were received.  For nonbroadcast networks, this replication is necessary for operation.  IPsec virtual tunnel interfaces (VTI) remove this limitation; the association allows routing protocol adjacency and PIM adjacency to be established across the logical point-to-point links between the peers.  

Multicast over IPsec GRE

  • GRE is a tunneling mechanism that can be used to establish a logical point-to-point link.
  • Allows routing protocols to establish routing adjacency.
  • Consider the following:
    • Multicast replication takes place before encryption.  With a large number of spokes in a hub-and-spoke design, a stream from the hub site to many spokes generates a burst of packets that all need to be encrypted at the same time.  The overhead associated with encrypting and replicating the multicast traffic can add up quickly.
    • Topology of the overlay VPN determines unicast traffic paths and multicast distribution trees, which may result in suboptimal routing.  Using the hub-and-spoke design, spokes have to traverse the hub to communicate with one another.  

Multicast over DMVPN

  • Enables dynamic tunnel creation, which reduces configuration on the hubs.  
  • Backbone hub-and-spoke design, but permits spoke to spoke functionality using tunneling.
  • Combination of IPsec, GRE, and Next Hop Resolution Protocol (NHRP)
  • Unlike unicast traffic, multicast traffic does not trigger the establishment of spoke to spoke tunnels and cannot be sent directly from spoke to spoke.
  • Multicast is only partially supported on DMVPN due to spoke to spoke limitations.

Multicast using GET VPN

  • Provides secure transport for unicast and multicast traffic on an existing private network.
  • Introduces the concept of trusted groups to eliminate point-to-point tunnels.
  • Only provides data security; relies on the underlying network for routing.
  • Does not create an overlay VPN, unlike IPsec direct encapsulation, VTI, IPsec GRE, and DMVPN.
  • Does not suffer from scalability issues caused by multicast replication.
  • Can be used across an existing semiprivate network, such as MPLS, but is less suited to be used across public networks, such as the internet.


Chapter 11: Network Management Capabilities Within Cisco IOS Software

Cisco IOS Embedded Management Tools

  • Network management affects the performance, reliability, and security of the entire network.
  • Embedded management helps the network manager better manage devices on the network.

Embedded Management Rationale

  • Verify network is working well
  • Characterize performance
  • Understand amount and location of traffic
  • Provide tools and information to troubleshoot
Network Management Functional Areas

  • ISO standard for network management: FCAPS
    • Fault Management:  Detecting, diagnosing, and correcting network and system faults.
    • Configuration Management:  Installation, identification, inventory removal, and configuration of hardware, software, firmware, and services.
    • Accounting Management:  Tracking the use of resources in a network (billing).
    • Performance Management:  Concerned with the measurement of both short-term and long-term network and system statistics related to utilization, response time, availability, and error rates.
    • Security Management:  Concerned with controlling access to network resources.
Designing Network Management Solutions
The requirements that a network management solution design is based on come from several sources: business goals, network operations (log of configuration changes), network architecture (how layout affects management design), and device technology (support of management features on equipment).

Cisco IOS Software Support of Network Management
Embedded management software subsystems within Cisco IOS Software that help manage, monitor, and automate network management:

  • Syslog
  • NetFlow
  • Network Based Application Recognition (NBAR)
  • Cisco IOS SLAs
Application Optimization and Cisco IOS technologies
Increasing interest in supporting application optimization in the network:

  • Baseline application traffic:  Snapshot taken to understand basic traffic and application flows
  • Optimize to meet objectives:  Apply policies and prioritize traffic so that each application has an optimal portion of network resources.
  • Measure, adjust, and verify: Use ongoing measurements and proactive adjustments to verify that the optimization techniques provide network resources needed to meet service objectives.
  • Deploy new applications: Profile each new application's traffic and repeat the cycle.
Syslog Considerations

  • Embedded Syslog Manager (ESM) provides a programmable framework that allows a network manager to filter, escalate, correlate, route, and customize system logging messages
  • Messages can be saved either locally or to a remote logging server
Cisco IOS Syslog Message Standard
Made up of the following: [Facility][Severity][Mnemonic][Message-text]; a time stamp may also be enabled.
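For example, in the hypothetical message below, LINK is the facility, 3 (error) is the severity, UPDOWN is the mnemonic, and the remainder is the message text (the leading time stamp appears when service timestamps log datetime is configured):

    Mar  1 18:46:11: %LINK-3-UPDOWN: Interface FastEthernet0/0, changed state to up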

Issues with Syslog

  • Severity of message is not consistent across platforms.
  • May provide too many informational messages for a specific problem.
  • Delivery mechanism is based on UDP (unreliable delivery)

NetFlow:  Provides visibility into network behavior and how network assets are being used.

Netflow Overview

  • Provides a key set of services for IP applications, including network traffic accounting, usage-based network billing, network planning, security, DoS monitoring capabilities, and network monitoring.
  • Answers question of what, when, where, and how traffic is flowing in the network.
Principal Netflow Uses

  • Use is dependent on the organizational needs.
  • Can be used to help diagnose slow network performance.
  • Service provider use for customer accounting and billing.
  • ID applications causing congestion.
  • Help detect unauthorized WAN traffic
  • Flows are identified by a combination of seven matching attributes: source IP address, destination IP address, source port, destination port, layer 3 protocol field, ToS byte, and input interface.
  • NetFlow information is read either through show commands or by exporting to a Flow Collector server (see the sketch after this list).
  • Typically used on a central site
  • Two-tier architecture: collectors placed near key sites in the network, then aggregate and forward data to a reporting server.
  • Two-tier architecture allows remote aggregation of data and can help manage WAN bandwidth.
  • Ingress measurement technology that should be deployed on edge, aggregation, or WAN access routers.
  • May be deployed incrementally (interface by interface, router by router, etc).
  • Network designer should determine key routers and key interfaces where NetFlow should be implemented based on customer traffic-flow patterns and network topology and architecture.  
  • Important to ensure that flows are not double-counted.
  • If the reporting collection server is centrally located, implementing NetFlow close to the reporting collector is optimal.
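A minimal traditional NetFlow sketch on a WAN edge router (the collector address and port are hypothetical):

    ! Capture flows on ingress at the edge interface
    interface Serial0/0
     ip flow ingress
    !
    ! Export version 5 flow records to the collector
    ip flow-export version 5
    ip flow-export source Loopback0
    ip flow-export destination 10.1.1.50 9996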

NBAR:  Provides visibility into how network assets are used by applications.

NBAR Overview

  • Traffic classification can help organizations answer many questions about network resources:
    • Which applications run on the network?
    • What is the resource utilization?
    • Are users following application usage policies?
    • How much bandwidth should be assigned to different QoS classes?
    • How should applications be allocated and deployed most efficiently?
  • Provides full packet inspection.
  • Foundation for applying QoS policies to traffic flows on the network.
  • Enables network administrators to ID the variety of protocols and the amount of traffic that is generated by each protocol.  
NBAR Packet Inspection

  • IDs applications and protocols using info from layer 3-7.
    • Statically assigned TCP/UDP
    • Dynamically assigned TCP/UDP
    • Subport and deep inspection into layer 3 payloads
    • Native and non-native Packet Description Language Modules
NBAR Protocol Discovery

  • Maintains the following per-protocol statistics for each enabled interface:
    • Total number of input packets and bytes
    • Total number of output packets and bytes
    • Input bit rates
    • Output bit rates
  • Statistics gathered are used to define traffic classes and traffic policies for each traffic class.
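A sketch tying NBAR protocol discovery to a QoS policy, as described above (class and policy names, the matched protocol, and the police rate are hypothetical; available protocol names depend on the IOS release and loaded PDLMs):

    ! Per-protocol discovery statistics on the interface
    interface GigabitEthernet0/0
     ip nbar protocol-discovery
    !
    ! Use NBAR classification to police an unwanted application
    class-map match-any SCAVENGER-CLASS
     match protocol bittorrent
    policy-map WAN-EDGE
     class SCAVENGER-CLASS
      police 128000 conform-action transmit exceed-action drop
    !
    interface GigabitEthernet0/0
     service-policy output WAN-EDGE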
NetFlow and NBAR Differentiation
NetFlow is a passive technology that monitors layers 2-4, with a focus on providing visibility into network behavior.  The purpose of NBAR is to identify and classify traffic based on payload attributes and protocol characteristics; it is an active technology that works with QoS to optimize application performance.

NBAR and Cisco AutoQoS

  • AutoQoS VoIP:  Provides a means for simplifying the implementation and provisioning of QoS for VoIP traffic.
  • AutoQoS for the Enterprise:  Automates deployment of QoS in a general business environment.  Creates class maps and policy maps that are based on best practices after using NBAR protocol discovery on lower-speed WAN links.

IP SLA Considerations:  IP SLA measurements generate probe traffic in a continuous, reliable, and predictable manner.

IP SLA Overview

  • Defines minimum and expected level of service
  • Basis for planning budgets and justifying network expenditures
  • Specifics of an SLA vary depending on the applications an organization is supporting in the network
  • Typically, the technical components of an SLA contain a guaranteed level of network availability, performance in terms of round-trip time (RTT), and response in terms of latency, jitter, and delay.

Cisco IOS IP SLA Measurements

  • Network latency and response time
  • Packet-loss statistics
  • Network jitter and voice-quality scoring
  • Statistical end-to-end metrics of performance information
  • End-to-end network connectivity

IP SLA SNMP Features

  • Unlike NetFlow, which simply passively monitors the network, SLA measurements actively send data across the network to measure performance between network locations.
  • Provide a proactive notification feature with an SNMP trap.
  • Can be configured to run a new IP SLA operation automatically when the threshold is crossed a configured number of times.
Deploying IP SLA Measurements

  • First ask the question: "What must be monitored?"
  • Jitter and packet loss can be more noticeable with VoIP and Video (UDP) than with normal data transfer.
  • Most common deployment is UDP jitter
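A UDP jitter operation sketch (newer 'ip sla' syntax; the target address, port, and operation ID are hypothetical; the IP SLA responder must be enabled on the target router):

    ! Source router: simulate a G.711 voice stream every 60 seconds
    ip sla 10
     udp-jitter 10.2.2.2 16384 codec g711ulaw
     frequency 60
    ip sla schedule 10 life forever start-time now
    !
    ! Target router
    ip sla responder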

Scaling IP SLA Deployments

  • A dedicated router for sourcing IP SLA measurement operations is often deployed in large hub-and-spoke networks at the hub site.
    • Easy to upgrade the Cisco IOS software release on the dedicated router
    • Management and deployment flexibility
    • Allows scalability with endpoints
    • Separate memory and CPU from hardware in the switching path

Hierarchical Monitoring with IP SLA Measurements

  • For extremely large sites, you can use a mesh of IP SLA measurements at multiple points in the network.
  • Hierarchical approach allows regional aggregation routers to source IP SLA measurement traffic for access routers in each region.
    • Gives an approximate answer for end-to-end measurement.
Network Management Applications Using IP SLA Measurements

  • CiscoWorks IPM and a wide variety of vendor partners