I am interested in your ways to identify a bottleneck within a network.
In my case, I've got 2 locations, one in the UK, one in Germany. Hardware is Fortigates for FW/routing, and the switches are Cisco/HPE. The locations are connected through an IPsec VPN over the internet, and all internet connections have a bandwidth of at least 100 Mbps.
The problem occurs as soon as one client in UK tries to download data via SSH from a server in Germany. The max download speed is 10 Mbps and for the duration of the download the whole location in UK has problems accessing resources through the VPN in Germany (Citrix, Exchange, Sharepoint, etc).
I've changed some information for privacy reasons but I'd be interested in your first steps on how to tackle such a problem. Do you have some kind of runbook that you follow? What are common errors that you encounter?
(independently from my case too, just in general)
EDIT: Current list
packet capture on client and server to check for packet loss, latency, etc. - if packets dropped, check intermediate devices
check utilization of intermediate devices (CPU, RAM, etc)
check throughput with different tools (iperf3, nc, etc) and protocols (TCP, UDP, etc) and compare
check if traffic shaper/ QoS are in place
check ports on intermediate devices for port speed mismatch
MTU/MSS mismatch
is the internet connection affected too, or just traffic through the VPN
IPsec configuration
temporarily turn off the security functions of the FW and check if it is still reproducible
traceroute from A to B, any latency spikes?
check RTT, RWND, MSS/MTU, TTL via pcap, on the transferring client itself and on a reference client, both with and without an active data transfer
Prob not related but noteworthy:
check I/O of server and client
I'll keep this list updated and appreciate further tips.
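For the MTU/MSS item in the list, a minimal sketch of the arithmetic behind it. The 20-byte IPv4 and TCP header sizes are the standard values without options; the 73 bytes of tunnel overhead is only an assumed example, since the real IPsec overhead depends on the cipher, the mode and whether NAT-T is in use.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def tcp_mss(path_mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits into one packet without fragmentation."""
    return path_mtu - tunnel_overhead - IPV4_HEADER - TCP_HEADER


# 1500 is the usual Ethernet MTU.
# 73 is an ASSUMED example overhead, not a value measured on this tunnel.
print(tcp_mss(1500))      # plain path
print(tcp_mss(1500, 73))  # same path through the (hypothetical) tunnel
```

If the MSS seen in the pcap SYN packets is larger than what the tunnel can carry, fragmentation or silently dropped packets are worth checking.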
Update
I had to postpone the session and will do the stress test on Monday or Tuesday evening. I'll update you as soon as I have the results.
Update2
So, I'll try to keep it short.
First iperf3 over TCP run (UK < DE) with the same FW rules let me reproduce the problem. Max speed 10 Mbps, and DE < UK even slower, down to 1-2 Mbps. The pattern of the test implies an unreliable connection (briefly up to 30 Mbps, then 0, and so on). Traceroute shows the same hops in both directions, no latency spikes, all good.
BUT ICMP and iperf3 over UDP runs show a packet loss of min 10% and up to 30% in both directions!
Multiple speed tests to endpoints over the internet (UK>Internet) showed a download of 80 Mbps and an upload of around 30 Mbps, which indicates a problem with the IPsec tunnel.
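To put the 10-30% packet loss into perspective, here is a rough sketch using the Mathis approximation for loss-limited TCP throughput, rate <= (MSS / RTT) * C / sqrt(p) with C = sqrt(3/2). The model is meant for Reno-style TCP and small loss rates, so at 10% loss it is only an order-of-magnitude indication. The 30 ms RTT and the 1460-byte MSS are assumptions, not values measured on this link.

```python
import math


def mathis_limit_mbps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate upper bound on TCP throughput (Mbit/s) under random loss."""
    c = math.sqrt(3 / 2)  # constant from the Mathis et al. approximation
    bits_per_s = (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss)
    return bits_per_s / 1e6


ASSUMED_MSS = 1460    # bytes, assumption (1500-byte MTU, no tunnel overhead)
ASSUMED_RTT = 0.030   # seconds, assumption (not measured in this thread)

for loss in (0.001, 0.01, 0.10, 0.30):
    limit = mathis_limit_mbps(ASSUMED_MSS, ASSUMED_RTT, loss)
    print(f"loss {loss:6.1%}: about {limit:.1f} Mbit/s")
```

Under these assumptions, loss of this magnitude alone would hold a single TCP stream to a few Mbit/s at most, regardless of the 100 Mbps links.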
Some smaller things we've tried without any positive effect:
routing changes
disabling all security features for affected rule set
removed traffic shaper
Port speed/duplex negotiations are looking good
and some other things that I already forgot
Things we prepared:
We have opened some tickets with our ISPs to let them check it on their side > waiting for response
Set up smokeping to ping all provider/public/gw/IPsec endpoint/host IPs and see where packets could be dropped (server located in DE)
Planned a new session with a Fortigate expert to look in-depth into the IPsec configuration.
Need to do:
look through all packet captures (takes some time)
Are you sure that the download speed is 10Mbit/s and not 10Mbyte/s which would be close to saturating the 100Mbit/s link and would explain the other symptoms you are seeing?
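The difference between the two units is a factor of 8, which a quick sketch makes concrete. The 100 Mbit/s link speed is taken from the original post; the helper name is just illustrative.

```python
def mbyte_to_mbit(mbytes_per_s: float) -> float:
    """Convert MByte/s to Mbit/s (1 byte = 8 bits)."""
    return mbytes_per_s * 8


LINK_MBIT = 100  # link bandwidth stated in the original post

for label, rate_mbit in (("10 Mbit/s", 10), ("10 MByte/s", mbyte_to_mbit(10))):
    share = rate_mbit / LINK_MBIT
    print(f"{label}: {share:.0%} of a {LINK_MBIT} Mbit/s link")
```

So 10 MByte/s would already be 80 Mbit/s before any protocol or tunnel overhead is added.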
Have you checked for retransmitted packets, connection resets or similar things that might use up more bandwidth than the successfully received packets? I would probably use Wireshark for that.