leoz

High Latency Between Clients and Ceph OSDs via CHR Despite Good Ping and Bandwidth

Wed Nov 06, 2024 6:57 pm

Hello everyone,
this is my first post, I hope I'm doing everything right :)

This is my current network setup:
- Ceph hosts with 40Gbit LACP bond each
- Clients with 20Gbit LACP bond each
- CHR with 200Gbit LACP bond (connected to 2 stacked switches where all other hosts are connected)

CHR is virtualized on a Proxmox server where nothing else is running, and has the following resources:
- 40 cores (CPU type: host)
- 32GB RAM
- 1GB NVMe disk
- 2x 100Gbit NICs passed through as PCI devices (I noticed better performance compared to a bridge), LACP-bonded on the CHR side

What I'm experiencing is high latency (ranging from 10ms to 50ms) between clients and Ceph OSDs routed through CHR, even though ping times and transfer tests look good:
- pings from the client network to the Ceph network take about 0.25ms
- running iperf3, I'm able to saturate the NICs (40Gbit LACP bond) and CHR CPU usage stays within 15%
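
A ping average and an application-level latency figure are hard to compare when the slow responses are rare. One way to put both on the same footing is to summarise raw round-trip samples (for example exported from ping or Flent) as percentiles. A minimal sketch using the nearest-rank method; `rtt_percentiles` is a hypothetical helper and the sample values are synthetic, not data from this setup:

```python
import math


def rtt_percentiles(samples_ms, percentiles=(50, 90, 99)):
    """Summarise RTT samples (in ms) with the nearest-rank percentile method."""
    if not samples_ms:
        raise ValueError("need at least one RTT sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    summary = {}
    for p in percentiles:
        # Nearest rank: the smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p * n / 100))
        summary[p] = ordered[rank - 1]
    return summary


if __name__ == "__main__":
    # Synthetic values for illustration only, not measurements from this network.
    synthetic = [0.25] * 95 + [12.0, 20.0, 35.0, 48.0, 50.0]
    print("percentiles:", rtt_percentiles(synthetic))
```

With the synthetic list above, the median is 0.25 ms while the 99th percentile is 48 ms.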

Unfortunately, during a Flent test I observed significant UDP packet loss (see attached image); with TCP, the same loss would of course cause even more issues.
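
A UDP loss figure like this can be derived by numbering each probe packet and checking which sequence numbers arrive. The sketch below is a minimal illustration under that assumption, not Flent's actual implementation; `udp_loss` and its input format (a list of received sequence numbers plus the number sent) are hypothetical:

```python
def udp_loss(received_seqs, sent_count):
    """Count lost, duplicate and reordered packets for a test that sent
    sequence numbers 0..sent_count-1 (hypothetical input format)."""
    seen = set()
    duplicates = 0
    reordered = 0
    highest = -1
    for seq in received_seqs:
        if seq in seen:
            duplicates += 1
            continue
        seen.add(seq)
        if seq < highest:
            # Arrived after a packet with a higher sequence number.
            reordered += 1
        else:
            highest = seq
    # Only sequence numbers that were actually sent count as delivered.
    delivered = sum(1 for s in seen if 0 <= s < sent_count)
    lost = sent_count - delivered
    loss_pct = 100.0 * lost / sent_count if sent_count else 0.0
    return {"lost": lost, "loss_pct": loss_pct,
            "duplicates": duplicates, "reordered": reordered}
```

For example, `udp_loss([0, 1, 3, 2, 2, 5], 6)` reports one lost packet (sequence 4), one duplicate and one reordered arrival.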

Something I noticed is that when traffic between clients and Ceph hosts increases, latency decreases. This made me think that the latency could be caused by CPU frequency scaling, and that things improved once higher frequencies were reached. So I restricted the frequency to a narrow, higher range (3000MHz to 3500MHz), but this changed nothing.
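
Assuming the frequency range was set on the Proxmox host, one way to confirm it actually took effect (including under load) is to read the Linux cpufreq sysfs files directly on the host. A minimal sketch, assuming the host's cpufreq driver exposes `scaling_cur_freq`, `scaling_min_freq`, `scaling_max_freq` and `scaling_governor` under `/sys/devices/system/cpu/cpuN/cpufreq/` (values in kHz); `read_cpu_freqs` is a hypothetical helper, and the 3000-3500 MHz bounds in the `__main__` part are the range mentioned above:

```python
import glob
import os


def _read_text(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_cpu_freqs(base="/sys/devices/system/cpu"):
    """Map each CPU (e.g. "cpu0") to its cpufreq settings, converted to MHz.

    cpufreq reports frequencies in kHz. CPUs without a cpufreq directory
    are skipped; files that cannot be read yield None.
    """
    result = {}
    for freq_dir in sorted(glob.glob(os.path.join(base, "cpu[0-9]*", "cpufreq"))):
        cpu = os.path.basename(os.path.dirname(freq_dir))
        entry = {}
        for key, fname in (("cur_mhz", "scaling_cur_freq"),
                           ("min_mhz", "scaling_min_freq"),
                           ("max_mhz", "scaling_max_freq")):
            raw = _read_text(os.path.join(freq_dir, fname))
            entry[key] = int(raw) / 1000 if raw else None
        entry["governor"] = _read_text(os.path.join(freq_dir, "scaling_governor"))
        result[cpu] = entry
    return result


if __name__ == "__main__":
    freqs = read_cpu_freqs()
    outside = {cpu: e for cpu, e in freqs.items()
               if e["cur_mhz"] is not None and not 3000 <= e["cur_mhz"] <= 3500}
    print(f"{len(freqs)} CPUs read, {len(outside)} currently outside 3000-3500 MHz")
```

Running it a few times while idle and again during a Ceph-heavy period would show whether the limits hold in both cases.
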
I also noticed that the Proxmox host spends a lot of time in the C6 C-state, which could be affecting CHR routing speed, but I have no evidence either way because I haven't tried limiting C-states to C1 at most. That would require scheduled maintenance, as CHR is my primary router.
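
Before forcing C-states in a maintenance window, idle-state residency can be inspected read-only through the Linux cpuidle sysfs interface on the Proxmox host. A minimal sketch, assuming the host exposes `/sys/devices/system/cpu/cpuN/cpuidle/stateK/` with the `name`, `time` (microseconds), `usage` and `latency` (exit latency, microseconds) files; `cstate_summary` is a hypothetical helper. The counters are cumulative since boot, so comparing two snapshots taken a few seconds apart under load gives a more current picture:

```python
import glob
import os
from collections import defaultdict


def _read_int(path):
    with open(path) as f:
        return int(f.read().strip())


def cstate_summary(base="/sys/devices/system/cpu"):
    """Aggregate Linux cpuidle counters across all CPUs, keyed by C-state name.

    Per state directory, sysfs exposes:
      name    - state name such as "C1" or "C6"
      time    - cumulative residency in microseconds since boot
      usage   - number of times the state was entered
      latency - advertised exit latency in microseconds
    """
    totals = defaultdict(lambda: {"time_us": 0, "entries": 0, "exit_latency_us": 0})
    pattern = os.path.join(base, "cpu[0-9]*", "cpuidle", "state[0-9]*")
    for state_dir in glob.glob(pattern):
        with open(os.path.join(state_dir, "name")) as f:
            name = f.read().strip()
        t = totals[name]
        t["time_us"] += _read_int(os.path.join(state_dir, "time"))
        t["entries"] += _read_int(os.path.join(state_dir, "usage"))
        t["exit_latency_us"] = max(t["exit_latency_us"],
                                   _read_int(os.path.join(state_dir, "latency")))
    grand_total = sum(t["time_us"] for t in totals.values())
    for t in totals.values():
        # Share of all idle residency spent in this state (not share of wall time).
        t["share_pct"] = 100.0 * t["time_us"] / grand_total if grand_total else 0.0
    return dict(totals)


if __name__ == "__main__":
    for name, t in sorted(cstate_summary().items()):
        print(f"{name:>8}: {t['share_pct']:5.1f}% of idle time, "
              f"exit latency {t['exit_latency_us']} us, {t['entries']} entries")
```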

I use CheckMK to monitor CHR via SNMP and I don't see anything strange there: all NICs are up and running and no errors are reported.
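
SNMP interface counters are cumulative, so any rate a poller such as CheckMK shows is an average over its polling interval, computed roughly as below. A minimal sketch, assuming 64-bit `ifHCInOctets`/`ifHCOutOctets` counters from IF-MIB (pass `2**32` for 32-bit counters such as `ifInErrors`); the helper names are hypothetical and this is not CheckMK's own code:

```python
COUNTER32_MODULUS = 2**32
COUNTER64_MODULUS = 2**64


def counter_delta(old, new, modulus=COUNTER64_MODULUS):
    """Difference between two samples of an SNMP CounterN, allowing for one wrap."""
    return (new - old) % modulus


def average_bps(old_octets, new_octets, interval_s, modulus=COUNTER64_MODULUS):
    """Average bit rate over one polling interval, e.g. from two ifHCInOctets samples."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(old_octets, new_octets, modulus) * 8 / interval_s
```

For instance, 3 GB counted in a 60-second window averages to 400 Mbit/s, the same figure whether the traffic was spread evenly or arrived in a one-second burst at 24 Gbit/s.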

Does anyone have experience with a setup like this one (Proxmox + CHR + Ceph)?
I need to understand whether I'm doing something wrong in CHR or whether the problem is somewhere else.