Network resilience

Can anyone point me in the direction of any good reads/tutorials on creating network resilience?

Fairly basic please; although I understand the concepts, reasoning and general ideas, I'm not all-knowing.

Thanks :)

(Preferably with pretty pictures, ha-ha)
 
I'm aware of spanning tree; I'm looking more for implementing resilience at the design phase, i.e. the actual physical linking of switches and routers to create network resilience.

Thanks though I will read what you've posted!
 
Depends what you want to do. If you're just interested in the concepts, have a read up on HSRP and EtherChannel (I'd stay well clear of spanning tree, because basically if you ever need to use it then you've done a **** job of designing your network).

Resiliency is relatively easy to design, it's just expensive in terms of equipment. The main choice is between network resiliency (HSRP etc) or device resiliency (basically building a completely resilient router - dual power supplies, dual routing engines, multiple line cards etc).
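
To make the "network resiliency" option a bit more concrete, here is a minimal Python sketch of the idea behind HSRP: two routers share one virtual gateway, and the live router with the highest priority (highest interface IP as the tie-break) owns it. The router names, addresses and the `elect_active` helper are made up for illustration, and the sketch assumes preemption is enabled; it is a toy model, not how any real implementation is written.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Router:
    name: str
    ip: IPv4Address
    priority: int = 100   # HSRP's default priority
    alive: bool = True


def elect_active(routers):
    """Return the live router that should own the virtual gateway, or None."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    # Highest priority wins; the highest interface IP breaks a tie.
    return max(candidates, key=lambda r: (r.priority, r.ip))


# Hypothetical pair of gateways for one LAN.
r1 = Router("r1", IPv4Address("10.0.0.2"), priority=110)
r2 = Router("r2", IPv4Address("10.0.0.3"))
group = [r1, r2]

print(elect_active(group).name)   # r1 is active while it is up
r1.alive = False                  # simulate r1 failing
print(elect_active(group).name)   # r2 takes over the virtual gateway
```

Hosts keep pointing at the same virtual gateway address throughout, which is why the failover is invisible to them.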
 
I have to disagree with the spanning tree thing there.

Spanning tree is a great protocol and you can do a lot of things with it - very flexible indeed.

I've worked in a few datacenters on a vast range of customers, from single servers to vast arrays of servers, and almost all of those designs with some switch resilience built in are going to be using the spanning tree protocol somewhere to good effect.

Designing a good network doesn't just start at layer 3 - you need to think about traffic flows at all levels, especially those with link redundancy.

jimjamuk
 
My major issue with spanning tree is that it shuts down a link completely until it's needed. If you're dealing with the sort of infrastructure I do, then switch-to-switch links are 10Gbit, which makes a link doing nothing very bad value for money. I haven't used spanning tree in a design for a long time. I do like Cisco switch stacks for some things, but I'm firmly a believer in the 'route if you can, switch if you must' philosophy...
 
It depends on the network, I guess. I've done tons with all different types of spanning tree. I support the UK's biggest carrier Ethernet network, which is layer 2 and runs the entire length of the country; it runs RSTP and QinQ and goes up to 10Gbps. The failover is a lot quicker as it doesn't need to wait for routing convergence, as opposed to an MPLS network where BGP carries everything on the edge (1+ minute convergence). It's great for high-bandwidth customers who are hundreds of miles apart, but it's only really VLANs and trunking, so the only way we can do it is with spanning tree. If the customer needs routing and high-bandwidth links, they normally pay for a big leased line and go via MPLS.

If I was putting anything in on a campus network of some kind, I'd avoid spanning tree like the plague. Cisco don't recommend it now that full layer 3 switching has come right down in price (3560s etc). You're far better off running EIGRP/OSPF in the core layer, with 6500s leading into 4500s/3560s in the distribution layer running SVIs; it's far neater and easier to configure and troubleshoot. However much spanning tree has saved my bacon in the past when layer 3 switches weren't available, when it goes wrong it goes wrong big time and it doesn't recover. If you think nailing a routing loop is bad, try nailing a layer 2 loop when every single interface on every single switch in the network is sending traffic in circles. Remember, with layer 3 you usually have plenty of tools to help nail a loop (trace etc); with layer 2 there's hardly anything that will even work, lol.
That said, spanning tree can be quicker than any routing protocol if it's set up correctly, but that's half the battle. EIGRP on a medium-sized LAN is plenty quick enough.
And remember, if you're using spanning tree, you're most likely running PVST, which means you have layer 3 gateways somewhere that will need layer 3 routing anyway, so it makes sense to have everything layer 3 where you can.
 
It's the area I want to eventually be involved in, with the ultimate aim of specialising in security (firewalls, IDS etc).

Obviously you are all far more knowledgeable than me, but I thought the point of spanning-tree was to prevent broadcast storms when there is network resilience in place? (By effectively shutting down a link between switches if I remember rightly).
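
That is the gist of it. As a rough illustration, the Python sketch below takes a small looped topology, picks the switch with the lowest ID as root, keeps each switch's least-cost path to the root, and reports every other link as blocked. It is a simplification under stated assumptions (a connected network, one link per switch pair, hypothetical switch names and costs, no timers or BPDUs), not an implementation of the real protocol.

```python
import heapq


def spanning_tree(links):
    """links: {(switch_a, switch_b): cost}. Returns (root, forwarding, blocked)."""
    adj = {}
    for (a, b), cost in links.items():
        adj.setdefault(a, []).append((b, cost))
        adj.setdefault(b, []).append((a, cost))

    root = min(adj)            # lowest bridge ID becomes the root
    upstream = {}              # switch -> neighbour its root port faces
    done = set()
    heap = [(0, "", root)]     # (path cost, upstream bridge ID, switch)
    while heap:
        cost, via, sw = heapq.heappop(heap)
        if sw in done:
            continue
        done.add(sw)
        if sw != root:
            upstream[sw] = via
        for nbr, link_cost in adj[sw]:
            if nbr not in done:
                # Ties on cost fall back to the lowest upstream bridge ID.
                heapq.heappush(heap, (cost + link_cost, sw, nbr))

    forwarding = {frozenset(pair) for pair in upstream.items()}
    blocked = {frozenset(link) for link in links} - forwarding
    return root, forwarding, blocked


# Three switches cabled in a triangle: resilient, but a loop at layer 2.
triangle = {("sw1", "sw2"): 4, ("sw1", "sw3"): 4, ("sw2", "sw3"): 4}
root, forwarding, blocked = spanning_tree(triangle)
print("root:", root)
print("blocked:", [sorted(link) for link in blocked])
```

With the triangle above, the sw2-sw3 link is the one left blocked; if either link to sw1 were removed from the input, the sw2-sw3 link would be used instead.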

I understand what you are saying about that being a wasted link, though, and that it results in traffic that could have taken that route heading another way.

I think I need to read up on layer-3 switching.
 
Not always the case. If you're running a largish spanning-tree network, you're more than likely going to have many different VLANs. If you run PVST (per-VLAN spanning tree) you can configure links with different costs and priorities, so that if you had a switch with 4 VLANs and 2 links, you could have traffic for 2 VLANs on each port. If either port fails, the remaining one carries traffic for all 4 VLANs, because a PVST port only goes into blocking for specific VLANs and stays forwarding for the others. So link bandwidth isn't always wasted; it's basic load balancing with spanning tree.
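
The 4-VLAN, 2-link example can be sketched in a few lines of Python. The VLAN numbers, uplink names and the `forwarding_link` helper are hypothetical; the "preferred" table just stands in for whatever per-VLAN costs and priorities would be configured on the real switches.

```python
# Hypothetical per-VLAN preference, standing in for tuned costs/priorities.
preferred = {10: "uplink1", 20: "uplink1", 30: "uplink2", 40: "uplink2"}


def forwarding_link(vlan, links_up):
    """Uplink a VLAN forwards on: its preferred one if up, else a survivor."""
    choice = preferred[vlan]
    if choice in links_up:
        return choice
    survivors = sorted(links_up)
    return survivors[0] if survivors else None


def vlan_map(links_up):
    """Which uplink carries each VLAN for a given set of working links."""
    return {vlan: forwarding_link(vlan, links_up) for vlan in preferred}


print(vlan_map({"uplink1", "uplink2"}))   # two VLANs on each uplink
print(vlan_map({"uplink2"}))              # uplink1 down: all four on uplink2
```

With 4 VLANs this table is trivial to keep in your head; the same table with a couple of hundred VLANs is where the configuration overhead starts to bite.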

Layer 3 switching is pretty much the same as layer 3 routing. In the Cisco world, switches use CEF, which is basically a forwarding table (the FIB) built ahead of time from the routing table, so packets are forwarded by a fast lookup in that table rather than by processing each one against the routing table itself. (The "route first, switch after" model, where the first packet in a flow is routed and the result cached for the rest, is the older fast-switching approach; CEF doesn't need that first slow packet.) Very complex and very fast... Some of our switch interfaces are currently approaching 1 million packets per second without breaking a sweat, all done via CEF.
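
Whatever the caching details, the per-packet job of a layer 3 switch comes down to a longest-prefix match against a pre-built forwarding table. The Python sketch below shows that lookup with the standard `ipaddress` module; the prefixes and next hops are made up, and a linear scan is used purely for readability (real hardware uses tries or TCAM).

```python
from ipaddress import ip_address, ip_network


def build_fib(routes):
    """Pre-sort prefixes longest-first so the first match is the best match."""
    entries = [(ip_network(prefix), nh) for prefix, nh in routes.items()]
    return sorted(entries, key=lambda e: e[0].prefixlen, reverse=True)


def lookup(fib, dst):
    """Return the next hop for dst, or None if nothing matches."""
    addr = ip_address(dst)
    for net, next_hop in fib:
        if addr in net:
            return next_hop
    return None


# Hypothetical routes: a default, a /8 and a more specific /16.
fib = build_fib({
    "0.0.0.0/0": "192.0.2.1",
    "10.0.0.0/8": "192.0.2.2",
    "10.1.0.0/16": "192.0.2.3",
})

print(lookup(fib, "10.1.2.3"))   # matches the /16
print(lookup(fib, "10.9.9.9"))   # falls back to the /8
print(lookup(fib, "8.8.8.8"))    # only the default route matches
```

The table is built once when routes change, and every packet after that is just a lookup, which is where the speed comes from.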
 

Well MPLS isn't too tied to BGP reconvergence, it depends more on the convergence time of the IGP. The next hop injected into the routing table from BGP is still valid whatever the path to that next hop and that path is calculated by the IGP rather than BGP (usually, it's how we do things anyway).
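
A toy Python model of that recursive lookup, with made-up prefixes, loopback addresses and link names: the BGP entry only says "reach this prefix via that next hop", and the IGP table decides which way to get to the next hop. When the IGP reconverges, the BGP entry does not change at all.

```python
# Hypothetical tables for one PE router.
bgp_routes = {"203.0.113.0/24": "10.255.0.2"}   # prefix -> BGP next hop
igp_routes = {"10.255.0.2": "core-link-a"}      # next hop -> outgoing link


def resolve(prefix):
    """Resolve a BGP prefix to an outgoing link via the IGP, or None."""
    next_hop = bgp_routes[prefix]
    return igp_routes.get(next_hop)


print(resolve("203.0.113.0/24"))           # core-link-a

# A core link fails and the IGP picks another path to the same next hop.
igp_routes["10.255.0.2"] = "core-link-b"
print(resolve("203.0.113.0/24"))           # core-link-b, BGP entry untouched
```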

There's a lot you can do to optimise IGP convergence. We don't use Cisco in the core, so we don't like EIGRP and use OSPF instead. It's difficult to measure full convergence time, but our worst case is about 4 seconds (the price we pay for that is a relatively chatty network, but we have the bandwidth to play with).

True, you can load balance by choosing different blocking ports with PVST, but when you're up to a couple of hundred VLANs and 3 or 4 possible paths that's a hell of a lot of configuration and complexity to worry about. If you've got decent automated provisioning software that helps, but it's still a bit of a nightmare.
 
Well MPLS isn't too tied to BGP reconvergence, it depends more on the convergence time of the IGP. The next hop injected into the routing table from BGP is still valid whatever the path to that next hop and that path is calculated by the IGP rather than BGP (usually, it's how we do things anyway).

Agreed, I was more referring to PE-CE routing, where an actual BGP session is killed somewhere and stuff gets blackholed for a short amount of time. As it's such a slow protocol, even though the core has converged correctly and almost instantly, it still takes time for the other CEs and PEs to react. Especially with multiple head office/DR sites, the failover time has always been an issue.

There's a lot you can do to optimise IGP convergence. We don't use Cisco in the core, so we don't like EIGRP and use OSPF instead. It's difficult to measure full convergence time, but our worst case is about 4 seconds (the price we pay for that is a relatively chatty network, but we have the bandwidth to play with).

We run IS-IS and OSPF in our core (ex NTL/Telewest). You'd be insane to run EIGRP with that many routes across that many routers anyway; EIGRP was only really designed for LANs and smallish WANs.

True, you can load balance by choosing different blocking ports with PVST, but when you're up to a couple of hundred VLANs and 3 or 4 possible paths that's a hell of a lot of configuration and complexity to worry about. If you've got decent automated provisioning software that helps, but it's still a bit of a nightmare.

Totally agree, hence why I said avoid it like the plague in the first place, but it can be done....
 