First question: Are you changing the hardware queue type on the MikroTiks? What are you using and what settings?
Second question: Are you using a common template for QoS settings and would you care to share it?
Answer to First question: No, we aren't. One thing you need to realize is that, at least on MikroTik routers, ethernet interfaces do not consider the packet priority at all when deciding which packets to drop - only NV2 and WMM wireless interfaces do this. Changing the hardware queue type does not change this behavior. I generally do not bother doing QoS between two routers cabled together: if a 1G cable link is maxed out, you can always bring it up to 10G by moving to an SFP+ port, or add capacity by combining two links in a LAG pair, and so remove the possibility of congestion.
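For reference, a LAG between two MikroTik routers is configured as a bonding interface. This is only a minimal sketch: the bond name, port names and hash policy are placeholders I picked for illustration, not taken from a production config.

```
/interface bonding
# Hypothetical names: bond-to-router2, ether1, ether2.
# The router on the other end needs a matching 802.3ad (LACP) bond.
add name=bond-to-router2 mode=802.3ad slaves=ether1,ether2 transmit-hash-policy=layer-2-and-3
```

Keep in mind that a LAG spreads traffic per flow, so a single flow is still limited to the speed of one member link.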
Where we do need the MikroTik routers to queue things themselves is when we go over a rate-limited link offered by another provider - e.g. we may buy 100Mbps or 200Mbps from someone else to connect to one of our towers. In this case we do need QoS, and this is accomplished by setting up a queue tree structure on the interface facing the provider, with a parent queue and 8 children, one for each priority 0-7. We set up this queue tree on both ends of the link to do QoS over that link. On these devices, after the "set priority" action, we have seven different packet mark rules: mark the packet "scavenger" if priority=1, mark it "background" if priority=2, and so on, with a mark for every priority except priority 0, since no-mark works nicely for that. Doing this results in the router dropping the low priority packets before the high priority packets if the queue tree is maxed out.
Here is an example of the type of queue tree setup that we would have on both sides of a 40M link through another provider:
/queue tree
add max-limit=40M name=To-Tower1 parent=ether3
add name=0_best_effort_tower1 packet-mark=no-mark parent=To-Tower1 priority=6
add limit-at=2M max-limit=2M name=7_monitoring_and_routing_tower1 packet-mark=monitoring-and-routing parent=To-Tower1 priority=1
add limit-at=2M max-limit=2M name=6_mgmt_traffic_tower1 packet-mark=mgmt parent=To-Tower1 priority=2
add name=5_ent_priority_tower1 packet-mark=ent-priority parent=To-Tower1 priority=3
add name=4_ent_tower1 packet-mark=ent parent=To-Tower1 priority=4
add name=3_retail_priority_tower1 packet-mark=retail-priority parent=To-Tower1 priority=5
add name=2_background_tower1 packet-mark=background parent=To-Tower1 priority=7
add name=1_scavenger_tower1 packet-mark=scavenger parent=To-Tower1
Note: the priority values above are the priorities of the queues, not of the packets. The mapping goes like this: packet priority 7 is queue priority 1, packet priority 6 is queue priority 2, packet priority 5 is queue priority 3, packet priority 4 is queue priority 4, packet priority 3 is queue priority 5, packet priority 2 is queue priority 7, packet priority 1 is queue priority 8 (the default), and packet priority 0 is queue priority 6.
The reason for this is that, while there are 8 queue priorities and 8 packet priorities, the scale is different: for queue priorities (allowed values 1-8), 1 is the highest priority and 8 is the lowest priority. Packet priorities (VLAN priority, NV2 priority, etc., allowed values 0-7) run from highest to lowest as 7,6,5,4,3,0,2,1 (*not* 7,6,5,4,3,2,1,0).
Then you apply the packet marks depending on the packet priority (e.g. if priority is 1, apply packet mark "scavenger"; if priority is 2, apply packet mark "background").
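As a rough sketch, the seven mark rules could look something like the following. The mark names match the queue tree above, but the chain, the out-interface match and passthrough=no are assumptions made for illustration, so adjust them to your own mangle layout.

```
/ip firewall mangle
# Assumption: these rules come after the rule that does action=set-priority,
# and only need to match traffic leaving the provider-facing port (ether3).
add chain=postrouting out-interface=ether3 priority=1 action=mark-packet new-packet-mark=scavenger passthrough=no
add chain=postrouting out-interface=ether3 priority=2 action=mark-packet new-packet-mark=background passthrough=no
add chain=postrouting out-interface=ether3 priority=3 action=mark-packet new-packet-mark=retail-priority passthrough=no
add chain=postrouting out-interface=ether3 priority=4 action=mark-packet new-packet-mark=ent passthrough=no
add chain=postrouting out-interface=ether3 priority=5 action=mark-packet new-packet-mark=ent-priority passthrough=no
add chain=postrouting out-interface=ether3 priority=6 action=mark-packet new-packet-mark=mgmt passthrough=no
add chain=postrouting out-interface=ether3 priority=7 action=mark-packet new-packet-mark=monitoring-and-routing passthrough=no
# No rule for priority 0: unmarked packets land in the packet-mark=no-mark queue.
```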
We only do this queue tree setup on links from third-party connectivity vendors where they guarantee us a certain bandwidth amount and we are at risk of actually maxing out that amount. It doesn't make sense to set up these queue trees and packet marks if the router is only connected to radio links and perhaps direct cables to other routers: the radios will automatically do the QoS based on CoS (and their maximum rate can vary depending on modulation), and the cables to other routers generally won't be maxed out, so the queueing becomes an unnecessary complexity on those links.
Answer to Second question: We did have a common template, but some of our routers might be missing a few things, so I would have to check which is the best before sharing an example. However, our setup looks similar to your lab setup - the biggest difference is that we use an interface list on each router called Trust-DSCP. If a packet is coming from an interface in this interface list, we trust the DSCP value. If it is not, we reset the DSCP to zero (if we want it to be best effort) or change it to what we want (if we don't want it to be best effort). That way the customer cannot set the DSCP. If the packet is coming from another router that has already set the DSCP and arrives via an interface in the Trust-DSCP interface list, we assume the DSCP on that packet is correct and simply set the priority based on that. This addresses pe1chl's point that you don't really need to keep changing the DSCP over and over again on each router.
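Until I can dig out the actual template, here is a minimal sketch of the Trust-DSCP idea. The interface name, the chains and the from-dscp-high-3-bits mapping are assumptions for illustration only, and rules that set a non-zero DSCP for specific traffic are left out.

```
/interface list
add name=Trust-DSCP
/interface list member
# Hypothetical: ether1 is a link to another router that has already set the DSCP.
add list=Trust-DSCP interface=ether1

/ip firewall mangle
# Anything arriving on an interface NOT in Trust-DSCP gets its DSCP reset to 0
# (best effort), so the customer cannot choose their own DSCP.
add chain=prerouting in-interface-list=!Trust-DSCP action=change-dscp new-dscp=0 passthrough=yes
# The priority is then derived from whatever DSCP the packet carries at this point.
add chain=postrouting action=set-priority new-priority=from-dscp-high-3-bits passthrough=yes
```

With this layout only the edge router touches the DSCP; every router behind it just reads it.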