btest 127.0.0.1: CHR versus CRS326-24G-2S+ (which is faster?)
Just for my own information, I thought I would compare CPU performance between a physical Mikrotik CRS326-24G-2S+ and a virtual CHR hosted on VMware ESXi.
((( Yes, I know this may not be a fair test, but it is one of several possible tests to consider when deciding which Mikrotik solution is needed as a router. )))
Using this CLI command:
tool bandwidth-test direction=transmit interval=00:00:05 protocol=udp user=admin password=MyPassword address=127.0.0.1 duration=20
I got the following speed test results:
CRS326-24G-2S+
1621.9 Mbps (aka 1.6 Gbps)
CHR
23.6 Gbps (more than 14 times faster than the CRS326-24G-2S+)
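For reference, the ratio between the two results can be checked with a couple of lines of Python. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the only inputs are the two btest figures quoted above.

```python
# Loopback btest results quoted in the post, both expressed in Mbps.
crs326_mbps = 1621.9        # CRS326-24G-2S+ result
chr_mbps = 23.6 * 1000      # CHR result: 23.6 Gbps converted to Mbps

# How many times faster the CHR result is than the CRS326 result.
ratio = chr_mbps / crs326_mbps
print(f"CHR / CRS326 = {ratio:.2f}x")
```

The result comes out between 14 and 15, which matches the "more than 14 times" figure in the post.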
And the VMware ESXi box hosting the CHR guest was a 2-year-old Intel Xeon box that was not even all that fast.
I sure hope Mikrotik has some near-future plans or designs in the works for hardware platforms with much faster CPUs, which can carry small-carrier-grade routing loads in the 10-Gig range and faster.
The CRS is a switch with a weak CPU not meant for routing, so if you want to test, you should test against a CCR instead.
Also, pushing against loopback, I wonder what that really shows other than the bus speed of the server.
I have an old HP ProLiant DL380 G6 at home, and I can get similar numbers with it when testing bandwidth between two VMs whose traffic is switched entirely within the one hardware server.
Re: …The CRS is a switch with a weak CPU not meant for routing…
The “R” in CRS326-24G-2S+ stands for “Router”.
And yes, I know it is not a high-end router, but it does have some 10-Gig ports. So, as a router, I question how well and how fast it can handle routing between two 10-Gig ports.
Re: …Also pushing against loopback I wonder what that really shows other than bus speed of the server…
Because 127.0.0.1 is an internal software loopback interface, the btest actually gives you a decent baseline, on a single CPU (btest only uses 1 CPU), for:
Maximum CPU processing ability
Maximum CPU throughput to/from RAM
CPU clock speed is only part of what makes a computer run fast. Throughput also needs fast memory, possibly a high-speed cache built into the CPU, few or no CPU clock and I/O wait states, and large hardware-based buffers.
As a pure switch, it may run near line speed (depending on the chipset and its internal and external buses). However, things like software-based filters, bridges, firewalls, access lists, routing, rate-limit queues and system services all need CPU processing resources.
Badly. It’s a SWITCH with a bit of routing capability. The 10G ports are designed for SWITCHing not routing. Routing with a weak CPU will ALWAYS be slow. Testing it is essentially pointless, as you seem to have found out.
btest to/from a device with a weak CPU is also pointless. It will be slow. What does that prove? Nothing.
As has been said many times before, you need to test THROUGH a device to/from sources/destinations with enough capability.
Take a metro network, with a switch located in a building serving many customers.
Let's say you have a 10-gig fiber in and a 10-gig fiber out, and both 10-gig fibers are switched/bridged/connected.
Now you have 20 Ethernet ports to 20 different customers in the building.
Cust 1-of-20 is 100 meg
Cust 2-of-20 is 200 meg
-repeat 3 through 7-
Cust 8-of-20 is 800 meg
Cust 9-of-20 is 900 meg
Customers 10-through-17 are 750 meg
Customers 18 and 19 are only using you as a layer 2 bridge between different points in the building and are paying for a 250 meg link.
Customer 20 wants a full 1-gig internet account.
Now you need to manage bandwidth to ensure your customers only get what they are paying for.
So at what point does this not require a router, or at least serious CPU processing ability somewhere in the bandwidth management configuration?
Keep in mind: if the CPU has to touch everything, then you might be limited to 1621.9 Mbps total system throughput for all interfaces combined.
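To put rough numbers on the building example, here is a small Python sketch that adds up the 20 customer plans and compares the total with the 10-gig uplink and with the 1621.9 Mbps loopback figure. It rests on a few assumptions that are not spelled out in the post: customers 3 through 7 follow the same 100-meg-per-step pattern, customers 18 and 19 each have their own 250 meg link, and the loopback btest number is treated as a stand-in for a CPU ceiling (it is not a measured routing limit). All names are illustrative.

```python
# Sold bandwidth per customer, in Mbps.
plans_mbps = (
    [100 * n for n in range(1, 10)]  # customers 1-9: 100, 200, ... 900 (3-7 assumed to follow the pattern)
    + [750] * 8                      # customers 10-17: 750 each
    + [250] * 2                      # customers 18-19: layer 2 bridge, assumed 250 each
    + [1000]                         # customer 20: full 1 gig
)

uplink_mbps = 10_000         # one 10-gig fiber
cpu_ceiling_mbps = 1621.9    # CRS326 loopback btest result, used as a rough CPU ceiling

total_mbps = sum(plans_mbps)
oversubscription = total_mbps / uplink_mbps   # sold bandwidth vs. uplink capacity
cpu_ratio = total_mbps / cpu_ceiling_mbps     # sold bandwidth vs. assumed CPU ceiling

print(f"customers:            {len(plans_mbps)}")
print(f"total sold:           {total_mbps} Mbps")
print(f"vs 10-gig uplink:     {oversubscription:.1f}:1")
print(f"vs CPU ceiling:       {cpu_ratio:.1f}x")
```

Under those assumptions the sold bandwidth adds up to 12000 Mbps, slightly more than the uplink and several times the loopback figure, which is the gap the question above is pointing at.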
That is why you need to design your network so that bandwidth management is done on hardware with a fast CPU before the switch, not on a switch with a weak CPU.
mrz: Yup, I agree 100 percent.
Large & high-throughput L2/L3 networks deserve serious consideration and planning on the capabilities of all devices and configurations.
20 years ago, 10 meg was considered a fast network. 10 years ago, 100 meg and 1 gig were considered fast. Today 10-gig is the norm for a business server network. In the very near future, 20+/40+ gig networks will be the norm.
L2 switching is pretty straightforward at the hardware level, but when you need L3 devices to route between multiple 10+ gig networks on many different interfaces, I have some concerns about where and how Mikrotik is going to handle this.
Question: what is the fastest Mikrotik hardware solution for multiple L3 networks in the 5 to 50 gig range? ((( This is what I would like to see developed at Mikrotik so that Mikrotik can move into the mid-to-high-end carrier-grade market. ))) The reason I ask is that, as an ISP with fiber to customers and businesses, I am already L3 routing on many 10-gig interfaces to 10-gig connected customers with greater than 1-gig Internet connections per customer.
FYI - I am a huge Mikrotik fan. I have several thousand Mikrotik devices in my networks everywhere. If I had it all to do over again, I would have even more Mikrotiks.
Tom, I think that’s not entirely correct and is out of context: CRS stands for Cloud Router Switch, meaning a switch for the “cloud router”. It is not “cloud router and switch”. The routing capabilities are just a side effect of ROS as a unified OS on these devices. At least this is how I see it (after, of course, hitting the ceiling with one of them, the CRS125 in my case).
A somewhat fairer comparison would be a CCR1036-8G-2S+, which at least has the interfaces needed for that speed.
I don’t understand what you thought you were achieving? You’ve tested a good CPU against a bad one and the good one was better.
If you take a minute to look at the spec sheets, the switching and routing performance of the CRS series hits you in the face. They clearly switch very well but bog down dramatically on throughput (take note of that word: THROUGHPUT) once you start routing.
Throw into the mix that the bandwidth test (BT) function of RouterOS really is a meh of a meh and isn’t meant to showcase your shiny new switch doing 10G routing. If you REALLY care about throughput, then you would test correctly, using iPerf on 2 high-performance machines either side of the test subject. Also, the figures show throughput, not traffic generated by the switch.
Yes, there’s an R in CRS, but ultimately it ends in S for Switch. All of the blurbs say they are switches with some L3 functionality. Why would MT make a killer device at the 399 mark when they have CCRs running into double and triple that? Surely anybody clued up enough to be thinking about 10G routing has the knowledge to look at a spec sheet and use the information correctly?
And yes, I know this is not a true representation of the CPU in this CCR1016-12S-1S+ (Tile) Mikrotik router. However, it does provide some useful information that helps indicate the CPU's processing throughput. It raises some interesting questions about how fast the router can route traffic between multiple L3 interfaces, from/to twelve 1-Gig interfaces while also routing from/to the single 10-Gig interface, when the total routed throughput of all interfaces combined needs to sustain anywhere from 1500 Mbps (1.5 Gig) up to full L3 port speed on all ports at the same time.
When I can locate a 10-gig cable, I will perform a btest between two different 10-gig connected x86 virtual routers which route through this Mikrotik CCR and post my results.
North Idaho Tom Jones
FYI - I really do like all of Mikrotik devices -and- I want more Mikrotiks in my networks
That is the near-future plan.
Use the 10-Gig interface and connect up about a dozen 1-gig routed customers.
So, with a 10-gig feed and all the 1-Gig routed ports running at full speed to different customers, will a CCR do the job?
I can’t see why a CCR wouldn’t. I think your only limiting factor would be the number of ports on the CCR you choose.
I guess it depends on how your back-end network is built. If you are actually routing to the customer endpoints, then you need additional grunt for the routing. If you aren’t routing, and are connecting the customers to your back-end network with the routing done “upstream”, such as in a PPPoE setup where you only need to connect the customer to your L2, then you may get away with using a CRS to switch into it.
Either way it all sounds very exciting once you mention the magical 10G!
Tom, It’s not ok at all. You are hitting the limits of the bandwidth test tool running on the router, not the routing capability of the device.
For real results, run the bandwidth test between two high-performance machines with the device in the middle, so that you really evaluate the routing performance, not the speed of a single core running some tool.
First run the btest directly between them to see whether they can max out a 10G link, and insert the router into the test loop afterwards.
Re: …Either way it all sounds very exciting once you mention the magical 10G! …
Yeah, kinda like hitting 8th gear, floored, on a very, very long straightaway: no curves, no stop lights, no hills, no bumps, no pot holes, no speed limits.
((( No NAT, no OSPF, no BGP, no filters, no firewalls, no rate limiting, just pure static IP routing fed from a full 10-gig pipe to a dozen 1-gig customers. )))
One route & one WAN from my head-end 10-gig connected primary Internet router to this CCR.
The CCR then subnets the routed IPs to a dozen customer-WANs and routes some subnets to the dozen customers also.
On the CCR, the sum of the 1-gig customers can potentially outrun the 10-gig WAN; however, I only expect about 30 percent of them to be running at 1-gig at any one time.
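As a rough sanity check on that plan, the sketch below works out the worst-case and expected load on the 10-gig WAN in Python. It assumes exactly 12 customers (the post says "a dozen"), 1000 Mbps per customer port, and that "30 percent" means 30 percent of the customers running at full rate at once while the rest are idle. All names are illustrative and nothing here is a measurement.

```python
# Assumed inputs, taken from the plan described above.
customer_count = 12            # "a dozen" 1-gig routed customers (assumed exactly 12)
port_rate_mbps = 1000          # each customer port is 1-gig
active_fraction = 0.30         # share of customers expected at full rate at once
wan_mbps = 10_000              # single 10-gig WAN feed

# Worst case: every customer port saturated at the same time.
peak_demand_mbps = customer_count * port_rate_mbps

# Expected case: only the active share of customers at full rate.
expected_demand_mbps = peak_demand_mbps * active_fraction

print(f"peak demand:     {peak_demand_mbps} Mbps "
      f"({peak_demand_mbps / wan_mbps:.1f}x the WAN)")
print(f"expected demand: {expected_demand_mbps:.0f} Mbps "
      f"({expected_demand_mbps / wan_mbps:.0%} of the WAN)")
```

With these assumptions the worst case is 12000 Mbps against a 10000 Mbps feed, while the expected load is about 3600 Mbps, so it is the routing capacity of the CCR rather than the WAN that would be the open question.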
You can’t test a CCR this way. The bandwidth test tool is single threaded: it will use only one of the 16 available cores. You need to use two powerful computers to generate the traffic going through the CCRs, one at each end. Firewall, routing, queues… all of this is multi threaded, so they can (and will) use all the cores. The btest… not so much.
Re: … The bandwidth test tool is single threaded … Well - I am not quite sure that is totally true
I discovered I can get all CPUs running at 100 percent with btest.
To do so, I did this:
16 winbox MAC connections
then in each winbox, I typed this:
tool bandwidth-test direction=transmit interval=00:00:05 protocol=tcp user=admin address=127.0.0.1
-and/or this
tool bandwidth-test direction=receive interval=00:00:05 protocol=tcp user=admin address=127.0.0.1
-and some of these:
tool bandwidth-test direction=transmit interval=00:00:05 protocol=udp user=admin address=127.0.0.1
I am guessing that btest using tcp just might use a different CPU
Try it
So now I am looking for a combination of tests/tools/commands that measures CPU throughput with all CPUs at 100 percent and produces a simple baseline number that can be compared against other Mikrotik devices.
This way you will have 16 instances of btest running, each one of them single threaded. The problem is that each one of them will use CPU resources to generate/receive traffic. You don’t have 100% of the 16 cores available to process the traffic, so your result is not comparable to real-life usage. Take a look at the profile, and see how much CPU is used by the btest processes per se.