Posted by: Eric Siegel
End of the month, and I've been going over the list of the "dialogues" I had with Burton Group clients to see if there's something that might be of general interest.
So I uncovered one that I participated in with my colleague Ken Agress of Burton Group's consulting organization, and it's interesting because I think that the topic of QoS congestion on links into remote branches isn't given enough consideration in QoS network design.
The basic symptom of the problem was that despite QoS on the routers, there was excessive packet loss on high-priority flows into the branches on a hub-and-spoke network design.
The cause was that the server-room router was trying to push more data into the network cloud than a specific branch office's access link could carry. The discards weren't occurring at the server-room boundary router into the cloud, because the fat access link from the server room will easily accommodate lots of traffic to the branch, especially if the other branches aren't receiving much traffic at that time.
If you have a large bandwidth access link into a cloud, but just a small bandwidth access link out of the cloud to a branch office destination, the cloud will need to discard data packets on high-bandwidth flows or during bursts, because the cloud itself (i.e., the backbone routers) has minimal buffering.
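To get a feel for how quickly this goes wrong, here is a rough back-of-the-envelope sketch in Python. The link speeds and the buffer size are made-up numbers for illustration, not measurements from the client's network, and the function name is my own.

```python
def seconds_until_overflow(ingress_bps, egress_bps, buffer_bytes):
    """Time for an empty queue to fill when data arrives faster than it drains."""
    if ingress_bps <= egress_bps:
        return float("inf")  # the queue drains at least as fast as it fills
    # The queue grows at the difference between arrival and departure rates.
    return buffer_bytes * 8 / (ingress_bps - egress_bps)


# Hypothetical figures: a 100 Mbit/s burst from the server room,
# a 1.5 Mbit/s branch access link, and 64 KB of buffering in the cloud.
t = seconds_until_overflow(100e6, 1.5e6, 64 * 1024)
print(f"buffer fills in about {t * 1000:.1f} ms")
```

With numbers like these, the buffer is full after only a few milliseconds of sustained burst, and everything beyond that is discarded.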
True, a single TCP flow will self-adjust (by "self-clocking") to a bandwidth-restricted exit from the cloud because its acknowledgements will be delayed, but if there are a lot of parallel flows, or there are non-flow-controlled UDP flows, you'll be in trouble. TCP flows will get the "global synchronization problem," in which all TCP flows ramp up in bandwidth concurrently, lose packets concurrently, reset their flow rates downwards concurrently, and then repeat. RED or WRED or FRED mechanisms can help avoid global synchronization, but you'll still lose packets.
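For readers who haven't looked inside RED, the core idea can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: it leaves out refinements such as the count-based probability adjustment, and the thresholds, the averaging weight, and the two class names are hypothetical values chosen for the example.

```python
def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue depth (in packets)."""
    return (1 - weight) * avg_queue + weight * current_queue


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Simplified RED: no drops below min_th, a linear ramp up to max_p
    between the thresholds, and every packet dropped at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# WRED applies a different profile per traffic class.
# Hypothetical profiles: (min_th, max_th, max_p), thresholds in packets.
WRED_PROFILES = {
    "low_importance": (20, 40, 0.10),
    "high_importance": (35, 40, 0.10),
}

for avg in (10, 30, 38, 45):
    probs = {name: round(red_drop_probability(avg, *profile), 3)
             for name, profile in WRED_PROFILES.items()}
    print(avg, probs)
```

Because packets are dropped at random and early, different flows back off at different moments instead of all at once, which is what breaks up the synchronization. In this sketch the low-importance class starts seeing drops at a much shorter average queue than the high-importance class does.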
So, how should this be handled? If you own the cloud or can signal to the cloud's administrator (e.g., by setting DSCP fields or EXP tags on packets going into an MPLS cloud), then the cloud can pick which packets to discard if the exit bandwidth isn't sufficient at the branch office. And the tags will, we hope, tell the cloud to discard packets from low-importance flows. RED / WRED / FRED on those flows will then get those low-importance flows to cut their bandwidth use. Those low-importance flows will be losing packets and will be miserable, but the higher-importance flows will get through.
A better idea, we think, is to avoid letting the cloud do your packet discards -- especially if you don't actually own/administer the cloud directly. Don't give it more traffic than it can handle; use traffic shaping on subinterface definitions to ensure that you never feed more data into the cloud than can exit at the destination. You will probably do a better job of discriminating among flow types and of smoothing brief traffic spikes (e.g., your routers probably have larger buffers than the cloud's routers have).
For example, see the following Cisco documents:
Regulating Packet Flow Using Traffic Shaping: http://www.cisco.com/en/US/docs/ios/qos/configuration/guide/reg_pkt_flow_shaping.html
Applying QoS Features to Ethernet Subinterfaces: http://www.cisco.com/en/US/tech/tk543/tk545/technologies_tech_note09186a0080114326.shtml
Or you could install a Blue Coat PacketShaper in the server room's exit path to do the traffic shaping for both TCP and UDP. And the new PacketShapers can also do compression.
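To make the shaping idea a bit more concrete, here is a minimal Python sketch of a token-bucket shaper. It is a toy model under my own assumptions (a single FIFO queue, a bucket that starts full, no packet larger than the burst size), not how any particular router or appliance implements shaping. The point it illustrates is that a shaper delays excess packets in its own buffer instead of letting the cloud drop them.

```python
def shape(packets, rate_bps, burst_bytes):
    """Return departure times for packets passed through a token-bucket shaper.

    packets: list of (arrival_time_s, size_bytes), sorted by arrival time.
    Assumes a FIFO queue, a bucket that starts full, and size <= burst_bytes.
    """
    rate_bytes_per_s = rate_bps / 8
    tokens = burst_bytes  # bytes that may be sent immediately
    clock = 0.0           # time at which `tokens` was last updated
    departures = []
    for arrival, size in packets:
        # FIFO: a packet cannot leave before the one ahead of it.
        now = max(arrival, clock)
        # Refill the bucket for the elapsed time, capped at the burst size.
        tokens = min(burst_bytes, tokens + (now - clock) * rate_bytes_per_s)
        if tokens < size:
            # Not enough credit: hold the packet until the bucket refills.
            now += (size - tokens) / rate_bytes_per_s
            tokens = size
        tokens -= size
        clock = now
        departures.append(now)
    return departures


# Hypothetical example: three packets arrive back to back at t=0 on a
# shaper set to 8000 bit/s (1000 bytes/s) with a 1000-byte burst allowance.
print(shape([(0.0, 1000), (0.0, 1000), (0.0, 500)], 8000, 1000))
# prints [0.0, 1.0, 1.5]
```

In practice you would set the shaping rate at or slightly below the branch's access-link speed, one shaper per branch, so that the burst never reaches the cloud in the first place.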
