Some time ago I was running through a SteelCentral demo with a partner in the region when they said the root cause of a problem they were having was excessive TCP retransmissions. They seemed unhappy with that answer and were unsure of what to do next. This is a prime example of where knowing the basics pays off, and where lacking them will keep you from figuring out how to solve the problem.

Figuring out what is wrong boils down to two questions: how does TCP work, and when will TCP retransmit? TCP is a reliable protocol, which means the receiver must acknowledge the packets it has received before the sender's sliding window can move on. If an acknowledgement does not arrive in time, the sender resends the same packet until it is acknowledged. So TCP retransmissions boil down to two main scenarios:

  1. The traffic never made it to the receiver, or it arrived but failed a data integrity check, so the receiver never sent an acknowledgement
  2. The acknowledgement was lost on its way back to the sender, OR it took too long (arriving after the timeout), which triggers a retransmission
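
To make the two scenarios concrete, here is a toy sketch of a sender that keeps resending one packet until it is acknowledged. It is not real TCP (no sliding window, no fast retransmit, no real timers); the `Outcome` names and the `send_with_retransmit` function are made up for illustration, and each attempt's fate is simply handed in as a list.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical fate of a single transmission attempt."""
    DELIVERED = "data arrived intact and the ACK came back in time"
    DATA_LOST = "data never reached the receiver"             # scenario 1
    DATA_CORRUPT = "data failed the receiver's integrity check"  # scenario 1
    ACK_LOST = "ACK was lost on the way back"                 # scenario 2
    ACK_LATE = "ACK arrived after the sender's timeout"       # scenario 2


# Outcomes in which the receiver did get a good copy of the data.
RECEIVER_GOT_DATA = {Outcome.DELIVERED, Outcome.ACK_LOST, Outcome.ACK_LATE}


def send_with_retransmit(outcomes, max_attempts=5):
    """Replay a list of per-attempt outcomes for one packet.

    Returns (attempts, causes, copies_received): how many times the packet
    was sent, why each failed attempt had to be repeated, and how many good
    copies the receiver ended up seeing.
    """
    attempts = 0
    causes = []
    copies_received = 0
    for outcome in outcomes[:max_attempts]:
        attempts += 1
        if outcome in RECEIVER_GOT_DATA:
            copies_received += 1
        if outcome is Outcome.DELIVERED:
            break  # acknowledged in time, the sender can move on
        causes.append(outcome)  # anything else forces a resend
    return attempts, causes, copies_received


result = send_with_retransmit(
    [Outcome.DATA_LOST, Outcome.ACK_LOST, Outcome.DELIVERED]
)
```

In this run the packet is sent three times and the receiver sees two good copies. That difference is a useful clue: in scenario 2 the receiver gets duplicates, while in scenario 1 it never sees the original at all.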

At this point more knowledge of the network is needed, and both sides need to be examined and either confirmed or ruled out as the cause. If there are packet drops, we then need to look at what is causing packet loss in the network. Some prime examples include:

  • Incorrectly configured QoS/rate limiters/queue drops
  • Overtaxed network equipment, which causes packet loss/delay
  • A security device in the network (or client side) that drops traffic (IPS, stateful firewall)
  • Asymmetric routing or other routing issue (black holes)
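
Before chasing any of these, it helps to put a number on "excessive". As a rough sketch, assuming a Linux host is one end of the conversation, the kernel's cumulative TCP counters in `/proc/net/snmp` include `OutSegs` and `RetransSegs`. The function name `tcp_retransmit_ratio` and the sample numbers below are mine, purely for illustration.

```python
def tcp_retransmit_ratio(snmp_text):
    """Return RetransSegs / OutSegs from text in /proc/net/snmp format.

    The file holds pairs of lines per protocol: a header line with the
    counter names, then a line with the values, both prefixed with "Tcp:".
    """
    tcp_lines = [
        line.split() for line in snmp_text.splitlines()
        if line.startswith("Tcp:")
    ]
    if len(tcp_lines) < 2:
        raise ValueError("no 'Tcp:' header/value pair found")
    names, values = tcp_lines[0][1:], tcp_lines[1][1:]
    counters = dict(zip(names, (int(v) for v in values)))
    out_segs = counters["OutSegs"]
    return counters["RetransSegs"] / out_segs if out_segs else 0.0


# Made-up sample in the same layout as the real file.
SAMPLE = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs "
    "InErrs OutRsts InCsumErrors\n"
    "Tcp: 1 200 120000 -1 50 10 2 1 5 9000 10000 250 0 3 0\n"
)

ratio = tcp_retransmit_ratio(SAMPLE)  # 250 / 10000
```

On a real host you would pass in `open("/proc/net/snmp").read()`. Keep in mind the counters are totals since boot, so taking two readings and comparing the difference says more about what is happening right now than a single reading does.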

We also need to look at what could be causing data corruption or excessive latency, since either one can trigger retransmissions:

  • Dirty/incorrect fiber types
  • SMF over very short distances (yes, using Single Mode Fiber at a short distance will cause reflections along the fiber and “echoes” that can cause incorrectly received bits at the receiving end)
  • Asymmetric routing/routing issues which can cause latency
  • Other environmental issues (I have seen things like rats gnawing on fiber and even fiber that had been thrown over a hot water pipe and was partially melted)
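
Latency matters here because the sender's timeout is derived from the round-trip times it has measured, so a sudden jump in delay can make a perfectly healthy acknowledgement arrive "too late". Below is a minimal sketch of the standard estimator described in RFC 6298, leaving out the clock granularity term and the upper bound. The class name `RtoEstimator` and the `min_rto` parameter are my own; the RFC recommends a floor of 1 second, and real stacks may use a different value.

```python
class RtoEstimator:
    """Simplified retransmission timeout (RTO) estimator after RFC 6298."""

    ALPHA = 1 / 8       # weight of a new sample in the smoothed RTT
    BETA = 1 / 4        # weight of a new sample in the RTT variance
    INITIAL_RTO = 1.0   # seconds, used before any RTT has been measured

    def __init__(self, min_rto=1.0):
        self.min_rto = min_rto
        self.srtt = None    # smoothed round-trip time
        self.rttvar = None  # round-trip time variance

    @property
    def rto(self):
        if self.srtt is None:
            return max(self.min_rto, self.INITIAL_RTO)
        return max(self.min_rto, self.srtt + 4 * self.rttvar)

    def update(self, rtt):
        """Feed in one measured round-trip time (seconds); return the new RTO."""
        if self.srtt is None:
            # First measurement seeds both values.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # Variance is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        return self.rto


est = RtoEstimator(min_rto=0.0)  # floor disabled to make the maths visible
first_rto = est.update(0.100)    # 0.100 + 4 * 0.050
second_rto = est.update(0.100)   # variance shrinks as samples agree
```

With a steady 100 ms round trip the timeout settles down towards the measured RTT. If a routing change then adds a few hundred milliseconds, acknowledgements can arrive after the timeout has already fired, and the sender retransmits data that was never actually lost.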

Please keep in mind that these are not comprehensive lists, just what came off the top of my head in a few minutes. But again, the basics are what lead me to my next steps and to how to address the issue.