
Troubleshooting Riverbed Steelhead WAN Optimizers
Edwin Groothuis

Public Document

Publication date Mon 19 May 2014 10:59:17
Copyright 2011, 2012, 2013, 2014 Riverbed Technology, Inc

Abstract

General:
All rights reserved.
This book may be distributed outside Riverbed to Riverbed customers via the Riverbed Support website. Except for the foregoing, no part of this book may be further redistributed, reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from Riverbed Technology, Inc.

Warning and disclaimer:
This book provides foundational information about the troubleshooting of the Riverbed WAN optimization implementation and appliances. Every effort has been made to make this book as complete and accurate as possible, but no warranty or fitness is implied.
The information is provided on an "as is" basis. The author and Riverbed Technology shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book.
The opinions expressed in this book belong to the author and are not necessarily those of Riverbed Technology.
Riverbed and any Riverbed product or service name or logo used herein are trademarks of Riverbed Technology, Inc. All other trademarks used herein belong to their respective owners.

Feedback Information:
If you have any comments regarding how the quality of this book could be improved, please contact the author. He can be reached via email at [email protected] or at [email protected]; his personal website can be found at http://www.mavetju.org/.

Table of Contents

Preface
1. Introduction to WAN Optimization
    1.1. Index
    1.2. Why WAN optimization?
    1.3. Three different optimization methods
    1.4. Effects of WAN optimization on other equipment
2. The Riverbed Approach to WAN Optimization
    2.1. Index
    2.2. Appliance Setup
    2.3. Basic configuration of a Steelhead appliance
    2.4. The layout of an optimized TCP session
    2.5. The setup of an optimized TCP session.
    2.6. WAN Visibility
    2.7. The optimization protocol
    2.8. Hardware models
3. The Command Line and Tools
    3.1. Index
    3.2. Dealing with the Command Line Interface
    3.3. Tcpdump, the network packet capture tool
    3.4. Tcpdump-x
    3.5. Ping
    3.6. Traceroute
    3.7. Telnet
    3.8. Tproxytrace
    3.9. Nettest
    3.10. SSL connect
4. Installation Related Issues
    4.1. Index
    4.2. Installation steps for fail-to-wire scenarios
    4.3. Fail-to-Wire
    4.4. Cables
    4.5. Network Interface Card speed issues
    4.6. IP Subnet configuration related issues
    4.7. Wrong location
    4.8. In-path support on interface not enabled
    4.9. Port security
    4.10. Link State Propagation (LSP)
    4.11. VPN Concentrators
    4.12. Licenses
    4.13. LAN and WAN cable switched
    4.14. After an RMA
    4.15. Best cabling for remote management
5. Operation Related Issues
    5.1. Index
    5.2. Hardware related issues
    5.3. Admission Control
    5.4. Auto-negotiation and duplex issues
    5.5. LAN speed maxed out
    5.6. Watchdogs
    5.7. Unexpected restarts of the optimization service
    5.8. Unexpected reboots
    5.9. Downgrade and upgrade issues
    5.10. Time related issues - The NTP service.
    5.11. Firewalls in the path
    5.12. Data Store Synchronization
    5.13. Data store related errors
    5.14. WCCP issues
    5.15. Asymmetric Routing
    5.16. IP Routing Related Issues
    5.17. Alarms and health status
    5.18. Long lived TCP session treatment
    5.19. Order of the in-path rules
    5.20. Network monitoring and network management systems
    5.21. Web proxies and web caches
    5.22. Security
    5.23. The Riverbed Services Platform
    5.24. Access related problems
    5.25. Partitions running out of space
    5.26. Memory usage on the Steelhead appliance
    5.27. LAN side traffic is seen on the Steelhead appliance
    5.28. Service Error
    5.29. Can this application be optimized?
    5.30. Interceptor cluster related issues
    5.31. Expiring SSL certificates
    5.32. High CPU related issues
    5.33. Central Management Console related issues
6. System Dump
    6.1. Index
    6.2. Creation of a system dump
    6.3. Configuration of the system
    6.4. Log files
    6.5. Hardware Information
    6.6. System Information
    6.7. Simplified Routing related files
    6.8. ATA controller SMART data
    6.9. Asymmetric routing table
    6.10. The Image History file
    6.11. The file "lsof"
    6.12. Memory dump of the process sport
    6.13. Out-of-memory profiling
    6.14. CIFS pre-population related data
    6.15. RSP related data
    6.16. SAR statistics
    6.17. Active Directory Integration
    6.18. Process dumps
7. Different Network Troubleshooting Scenarios
    7.1. Index
    7.2. Traffic is blocked when the Steelhead appliance goes in by-pass.
    7.3. Traffic is not optimized between two sites.
    7.4. An optimized TCP Session gets reset or hangs
    7.5. Slow network
    7.6. Connection-based Admission Control
8. Latency Optimization
    8.1. Index
    8.2. Introduction to Latency Optimization
    8.3. CIFS Latency Optimization
    8.4. CIFS Pre-population
    8.5. NFS Latency Optimization
    8.6. MAPI Latency Optimization
    8.7. Windows Active Directory integration
    8.8. MS-SQL Latency Optimization
    8.9. FTP Latency Optimization
    8.10. HTTP Latency Optimization
    8.11. SSL Pre-optimization
    8.12. SCA Latency Optimization
9. Logging
    9.1. Index
    9.2. Logging format
    9.3. TCP Optimization
    9.4. OOB Splice and Connection Pool
10. Further help and support
    10.1. Riverbed websites
    10.2. Documentation
    10.3. Riverbed TAC
A. Jargon
B. Troubleshooting workflow for network performance related issues
    B.1. Step One: Does the TCP session get optimized?
    B.2. Step Two: Does TCP optimization and data reduction work?
    B.3. Step Three: Does latency optimization work

Preface

Introduction

Welcome! This book is about troubleshooting Riverbed Steelhead appliances and networks optimized by the Steelhead appliances. It contains the experiences with cases handled by the Riverbed TAC on how to detect the source of problems in networks with Steelhead appliances and how to troubleshoot problems with Steelhead appliances.

For many years, networking has been fun but it has become relatively boring: Moving packets around, processing routing updates for reconnected networks and linking up different transport layers. Everybody knows and understands IP routing, and troubleshooting it is a pretty straightforward process. WAN optimization is a new field. A Steelhead appliance is not just a router or switch where packets go through, it is a hop in the network where packets get mangled, contents get changed and some magic happens: Shazam! Knowing how this magic works will help you understand what is happening and resolve issues with the Steelhead appliances faster and more efficiently.

Edwin.

Intended Audience

This is not a book on how to manage and deploy Steelhead appliances; the Riverbed Deployment Guide and the Steelhead Management Console User's Guide (both available from the Riverbed Support website) are the right books for that. This is a book about what can go wrong with Steelhead appliances in a network. This book is for everybody who has two or more Steelhead appliances in their network and wants to understand how the Steelhead appliances interact with other devices in the network, what can go wrong and how to use the features on the Steelhead appliance to troubleshoot it.

The easiest way to troubleshoot Steelhead appliances is to know what is expected to happen when you poke and prod them. Depending on the behaviour seen, the next steps can be easily determined. Very simple steps, very logical steps; it sounds nearly like a normal network troubleshooting approach. And that is often all that is required.

This book describes experiences, possible issues and troubleshooting with the xx20, xx50, CX and EX series Steelhead appliances and software versions up to, and including, RiOS 8.5.

Organization of this book

This book is split into several chapters: The first chapters give some background on WAN optimization, the next set of chapters covers the tools available on Steelhead appliances, and these are followed by the troubleshooting parts.

Chapter 1. Introduction to WAN Optimization

1.1. Index

This chapter describes the background of WAN optimization in generic terms, from a vendor neutral point of view.

- Why WAN optimization is needed: Where are the delays?
- The three different optimization methods: The network transport layer, the packet payload and the application protocol.
- Effects of WAN optimization on the networks, servers and clients.

1.2. Why WAN optimization?

WAN optimization is a set of techniques that address various issues experienced in a Wide Area Network (WAN):

- Link related issues, caused by physical limitations.
- Protocol related issues.

1.2.1. Link related issues

A network link has two characteristics:

- The link speed, which defines the speed of the data through the physical layer.
- The link delay, which defines how long it takes for a bit to reach the other side of the link.

1.2.1.1. Historical changes in link speed

The link speed is the number of bits per second which can be transferred over a link. In general, the speed of the LAN is much faster than the speed of the WAN.

Around 1995, the speed of a WAN link was measured in multiples of 64 kbps and a LAN link was still working on a 10 Mbps shared coax cable. It was not until five years later, in 2000, that an E1 link at 2 Mbps became the standard speed for WAN links and that 100 Mbps Ethernet cabling towards a LAN switch became available for the desktop. Five years later, in 2005, the general speed for a WAN link was 10 Mbps and the network switch was connected to the desktop at gigabit Ethernet speed (1000 Mbps).

In the past, this LAN/WAN speed difference made it necessary to have services such as file servers, mail servers and web proxy servers on the local LAN at the remote location (the branch). By 2005, however, the speed of the WAN link had become fast enough to consolidate them into a data center.

Consolidation of these remote servers into data centers made the management of the machines easier: Reduction of the number of machines by consolidating them, better control over the environment the machines are operating in, easier implementation of fail-over scenarios because of the availability of the high bandwidth needed for replication, and simplified management of the services running. However, the bandwidth towards the client was reduced from gigabit speeds back to pre-2000 speeds of 10 Mbps.

1.2.1.1.1. Serialization delay

The link speed defines the serialization delay, the time it takes for a packet to be converted from a set of bytes in the memory of a host (computer, router, or switch) into a string of bits on the wire.

For example, to forward one packet of 1500 bytes through a router with a WAN interface of 1 Mbps, it will take 12 milliseconds (1500 bytes * 8 bits per byte / 1 000 000 bits per second = 12 ms).

If WAN optimization reduces that packet of 1500 bytes to a packet of 100 bytes, this serialization delay will be reduced to 0.8 milliseconds.

Figure 1.1. Serialization delay
[Diagram: hop delay plotted against network hop distance for the path client, WAN router, WAN router, server.]

1.2.1.2. Link delay

On longer distances, the speed at which the data travels through the medium comes into play. For example, light in a fiber cable takes about 5 microseconds per kilometer (1.5 / 300 000 km/s, where 1.5 is the refractive index for fiber{SOURCE:Wikipedia Latency (engineering)}). For a fiber cable with a length of 1000 kilometers this would add 5 ms for the signal to reach the other side.

1.2.2. Network protocols

The TCP network protocol has had many improvements in recent years, but not all TCP stacks support all features. By terminating the TCP session locally on the LAN and setting up a new one which does support all the new features, the WAN optimizer can make these TCP sessions faster over the WAN.

Some protocols are smart, like the connection-oriented TCP protocol, which makes sure that all the data sent by the sending application is presented to the receiving application in such a way that nothing is missing and that it is in the right order.

1.2.3. Application protocol related issues

Application protocols are the layer in which the client and the server talk to each other.

Application protocols can be unintentionally implemented with a specific environment in mind. For example, results from database queries can be requested either in bulk or one by one. The first implementation works great over a WAN but needs more memory on the client to store all the answers, while the second implementation works badly over a WAN but doesn't need as much memory. Here the delay of every round trip over the WAN link is the cause of the problem for a WAN deployment.

1.3. Three different optimization methods

WAN optimization attacks the issues mentioned previously in three different ways:

- Transport layers optimization.
- Data reduction.
- Latency optimization.
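The delay figures quoted in section 1.2 can be checked with a short Python sketch. The function names are illustrative only and are not taken from any Riverbed tool; the 200-result query and the 10 ms round-trip time (twice the 5 ms one-way delay of the 1000 km fiber example) are assumed numbers, chosen only to contrast the bulk and the one-by-one style of requesting results.

```python
# Illustrative sketch: names and the 200-result / 10 ms figures are assumptions.

def serialization_delay_ms(packet_bytes, link_bps):
    """Time to put a packet on the wire as bits, in milliseconds."""
    return packet_bytes * 8 * 1000 / link_bps


def fiber_delay_ms(distance_km, refractive_index=1.5, c_km_per_s=300_000):
    """One-way propagation delay through a fiber cable, in milliseconds."""
    return distance_km * refractive_index * 1000 / c_km_per_s


def round_trip_wait_ms(round_trips, rtt_ms):
    """Time spent waiting on request/response round trips only."""
    return round_trips * rtt_ms


# Figures from the text: 1500 and 100 byte packets on a 1 Mbps link.
print(serialization_delay_ms(1500, 1_000_000))  # 12.0
print(serialization_delay_ms(100, 1_000_000))   # 0.8

# Figure from the text: 1000 km of fiber.
print(fiber_delay_ms(1000))                     # 5.0

# Assumed example: 200 query results over a link with a 10 ms round trip.
print(round_trip_wait_ms(1, 10))    # bulk request: 10 ms of waiting
print(round_trip_wait_ms(200, 10))  # one-by-one requests: 2000 ms of waiting
```

The last two lines show why the one-by-one style suffers on a WAN: the waiting time grows with the number of round trips, independent of how little data each answer carries.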

1.3.1. Transport layers optimization

These days, the most popular transport layers are Ethernet, IP and TCP.

Figure 1.2. A typical Ethernet frame
[Diagram: a frame consisting of the Ethernet header, IP header, TCP header and payload. The TCP payload is 1460 bytes, the IP payload is 1480 bytes and the Ethernet payload is 1500 bytes.]

- Ethernet is the local network layer to the next hop in the network. The default maximum payload of an Ethernet frame is 1500 bytes.
- IP is the layer which takes care of the routing of the packet in the network.
- TCP is the layer which takes care of the data exchanged between the client and server and which makes sure that the data is fed to the receiving application in the same order as the sending application has delivered it. It takes care of the retransmission of lost packets and of time-outs. If there is a problem, it backs off a little bit and allows the network to recover.

Standard TCP has the following features and limitations:

- The TCP Sliding Window method, which makes it possible for the sender to keep sending data until the limit for the maximum number of bytes in flight, as specified by the receiver, is reached. In the original TCP design this was limited to 64 kilobytes{SOURCE:RFC 793}.

  Figure 1.3. Bytes in-flight over time.
  [Diagram: bytes in flight plotted against time, annotated with "First acknowledgement received".]

- The original TCP Acknowledge method specifies that the receiver can only acknowledge packets it has received. It cannot tell the sender that it has missed packets; instead the sender has to wait for a timeout in the acknowledgement and then has to retransmit all the unacknowledged packets.

  Figure 1.4. Retransmission due to a time out in the acknowledgement
  [Diagram: numbered packets sent over time, annotated with "Lost packet", "TCP Window full" and "ACK timeout".]

- The TCP Window Size uses a Slow Start mechanism which starts with a Window Size of 2 * Segment Size{SOURCE:RFC 793} and increases it by one Segment per received acknowledgment packet.
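The sliding window limit and the Slow Start rule described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the behaviour of any particular TCP stack: the function names are hypothetical, the 100 ms round-trip time is an assumed value, and packet loss is not modelled.

```python
# Simplified model: names and the 100 ms round-trip time are assumptions.

def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when at most window_bytes may be
    in flight (unacknowledged) during one round-trip time."""
    return window_bytes * 8 * 1000 / rtt_ms


def slow_start_windows(segment_bytes, max_window_bytes, acks):
    """Window size after each received acknowledgment: start at two
    segments, add one segment per ACK, never exceed the receiver's limit."""
    window = 2 * segment_bytes
    sizes = [window]
    for _ in range(acks):
        window = min(window + segment_bytes, max_window_bytes)
        sizes.append(window)
    return sizes


# A 64 kilobyte window (65535 bytes) over an assumed 100 ms round trip.
print(max_throughput_bps(65535, 100))       # 5242800.0 bits per second

# Window growth for 1460 byte segments over the first three ACKs.
print(slow_start_windows(1460, 65535, 3))   # [2920, 4380, 5840, 7300]
```

With these assumed numbers a single TCP session cannot exceed roughly 5.2 Mbps, whatever the speed of the WAN link is, because the sender has to stop and wait for acknowledgments once the window is full.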

  In case of packet loss, Slow Start will halve the TCP Window Size and then slowly increase the Window Size again per received acknowledgment packet.

  Figure 1.5. Retransmission due to time out
  [Diagram: Window Size plotted against time, annotated with "First ACK", "TCP Window maximum size", "Lost packet", "Retransmission" and "First ACK".]

Transport layer optimization can add the following features for the traffic between two Steelhead appliances:

- For Ethernet there is not much which can be improved here, unless there is full control of all the WAN devices between the clients and servers, in which case Jumbo frames can be used.

  The payload of Ethernet Jumbo frames can be increased up to 9000 bytes {SOURCE: Wikipedia Jumbo Frame}. This means the IP payload, instead of having to split a stream in pieces of 1480 bytes, can be split in pieces of 8980 bytes. For the routers in the path this means that they have to process only one sixth of the number of packets. However on the TCP level the improvement is not comparable since the maximum number
