Currently, MikroTik's VXLAN implementation lacks two critical features that are standard in enterprise networking equipment:
- Explicit VTEP (VXLAN Tunnel Endpoint) configuration: VTEPs are implicitly created and cannot be managed as independent objects
- VTEP session state management: there is no visibility into VTEP health, reachability, or aggregated statistics

These limitations make large-scale VXLAN deployments difficult to manage, troubleshoot, and monitor effectively.
## Current Implementation Problems
### Problem 1: Implicit VTEP Creation
Currently, VTEPs are created implicitly based on the `local-address` parameter of each VXLAN interface:

    /interface vxlan
    add name=vxlan100 vni=100 local-address=10.0.0.1
    add name=vxlan200 vni=200 local-address=10.0.0.1
    add name=vxlan300 vni=300 local-address=10.0.0.1
    /interface vxlan vteps
    add interface=vxlan100 remote-ip=10.0.0.2
    add interface=vxlan200 remote-ip=10.0.0.2
    add interface=vxlan300 remote-ip=10.0.0.2

**Issues with this approach:**
1. **Configuration Redundancy**: The `local-address` must be repeated for every VXLAN interface (30 times for 30 VNIs)
2. **No VTEP Object**: VTEPs exist only as implicit side effects, not as manageable first-class objects
3. **Error-Prone**: Easy to accidentally specify different `local-address` values, creating unintended multiple VTEPs
4. **Unclear Reuse**: Whether VTEPs are shared depends on "accidental" matching of IP addresses
5. **Difficult Bulk Operations**: Changing VTEP properties requires modifying all associated VXLAN interfaces
6. **No Validation**: System cannot verify if a VTEP configuration is correct until runtime
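
To make issues 3 and 4 concrete, here is a minimal Python sketch of how implicit VTEPs fall out of address matching. It only models the behaviour described above and is not RouterOS internals; the dictionary layout and the function name `implicit_vteps` are hypothetical, and the mistyped address on `vxlan300` is invented for illustration.

```python
from collections import defaultdict


def implicit_vteps(interfaces, vteps):
    """Group VXLAN interfaces by the (local, remote) address pair they imply."""
    local_by_iface = {i["name"]: i["local-address"] for i in interfaces}
    groups = defaultdict(list)
    for entry in vteps:
        key = (local_by_iface[entry["interface"]], entry["remote-ip"])
        groups[key].append(entry["interface"])
    return dict(groups)


# Hypothetical config: vxlan300 carries a mistyped local-address.
interfaces = [
    {"name": "vxlan100", "local-address": "10.0.0.1"},
    {"name": "vxlan200", "local-address": "10.0.0.1"},
    {"name": "vxlan300", "local-address": "10.0.0.11"},  # typo for 10.0.0.1
]
vteps = [
    {"interface": "vxlan100", "remote-ip": "10.0.0.2"},
    {"interface": "vxlan200", "remote-ip": "10.0.0.2"},
    {"interface": "vxlan300", "remote-ip": "10.0.0.2"},
]

result = implicit_vteps(interfaces, vteps)
for (local, remote), names in sorted(result.items()):
    print(f"{local} -> {remote}: {', '.join(names)}")
```

One mistyped address silently yields two tunnel endpoints toward the same site, and nothing in the configuration flags it.
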
### Problem 2: No VTEP Session State
There is currently no way to view VTEP-level state information:
What we can see:

    /interface vxlan print detail
    # Shows: interface running/down
    /interface vxlan vteps print
    # Shows: configured remote-ip only

What we CANNOT see:

- Is the remote VTEP actually reachable?
- Is the VTEP session established?
- How long has the VTEP been up/down?
- What is the aggregate traffic for all VNIs using this VTEP?
- Are there any VTEP-level errors or packet drops?
- Which VNIs are active on this VTEP?
- How many MACs have been learned through this VTEP?

**Impact on operations:**
1. **No Health Monitoring**: Cannot determine if a remote VTEP is alive without checking each VXLAN interface individually
2. **Troubleshooting Difficulty**: Must check N interfaces to diagnose connectivity to one site
3. **No Aggregated Statistics**: Cannot see total traffic to a remote site (must manually sum all VNI statistics)
4. **No Alerting**: Cannot create alerts for VTEP state changes
5. **No Historical Data**: No uptime tracking or state change history
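
The manual summing mentioned in point 3 looks roughly like the following Python sketch, which an operator would have to run off-box today. The counter values and the `aggregate_by_remote` helper are hypothetical; only the idea of grouping per-interface counters by `remote-ip` comes from the description above.

```python
def aggregate_by_remote(vteps, counters):
    """Sum per-interface counters for every remote-ip found in the vteps table."""
    totals = {}
    for entry in vteps:
        stats = counters[entry["interface"]]
        site = totals.setdefault(
            entry["remote-ip"], {"rx-bytes": 0, "tx-bytes": 0, "interfaces": 0}
        )
        site["rx-bytes"] += stats["rx-bytes"]
        site["tx-bytes"] += stats["tx-bytes"]
        site["interfaces"] += 1
    return totals


# Hypothetical data, as if collected by polling each VXLAN interface.
vteps = [
    {"interface": "vxlan100", "remote-ip": "10.0.0.2"},
    {"interface": "vxlan200", "remote-ip": "10.0.0.2"},
    {"interface": "vxlan400", "remote-ip": "10.0.0.3"},
]
counters = {
    "vxlan100": {"rx-bytes": 1000, "tx-bytes": 900},
    "vxlan200": {"rx-bytes": 2000, "tx-bytes": 1800},
    "vxlan400": {"rx-bytes": 500, "tx-bytes": 400},
}

totals = aggregate_by_remote(vteps, counters)
for remote, site in sorted(totals.items()):
    print(remote, site)
```

This per-site roll-up is what a VTEP-level statistics view would provide directly.
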
## Proposed Solution
### Part 1: Explicit VTEP Configuration
Introduce VTEP as a first-class configuration object, similar to Cisco's NVE (Network Virtualization Edge) interfaces:

    # Step 1: Create VTEP objects independently
    /interface vxlan vtep
    add name=vtep-to-siteB \
        local-address=10.0.0.1 \
        remote-address=10.0.0.2 \
        port=4789 \
        comment="VTEP to Site B"
    add name=vtep-to-siteC \
        local-address=10.0.0.1 \
        remote-address=10.0.0.3 \
        port=4789 \
        comment="VTEP to Site C"

    # Step 2: VXLAN interfaces reference VTEPs
    /interface vxlan
    add name=vxlan100 vni=100 vtep=vtep-to-siteB
    add name=vxlan200 vni=200 vtep=vtep-to-siteB
    add name=vxlan300 vni=300 vtep=vtep-to-siteB
    add name=vxlan400 vni=400 vtep=vtep-to-siteC
    add name=vxlan500 vni=500 vtep=vtep-to-siteC

**Benefits:**
1. ✅ **Configuration Clarity**: `local-address` specified only once per VTEP
2. ✅ **Explicit Relationships**: Clear which VXLANs use which VTEP
3. ✅ **Error Prevention**: Cannot reference non-existent VTEP (validation at config time)
4. ✅ **Bulk Operations**: Change VTEP properties once, affects all associated VXLANs
5. ✅ **Easier Management**: Named VTEPs are self-documenting
6. ✅ **Deletion Protection**: Cannot delete VTEP while in use
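
The validation and deletion-protection semantics in benefits 3, 4, and 6 could behave roughly as in this Python sketch. It is an illustration of the requested behaviour under assumed names (`VtepRegistry`, `ConfigError`, and all method names are hypothetical), not a description of how RouterOS would implement it.

```python
class ConfigError(Exception):
    """Raised when a configuration change would leave a dangling reference."""


class VtepRegistry:
    def __init__(self):
        self.vteps = {}   # VTEP name -> {"local-address", "remote-address"}
        self.vxlans = {}  # interface name -> {"vni", "vtep"}

    def add_vtep(self, name, local_address, remote_address):
        self.vteps[name] = {
            "local-address": local_address,
            "remote-address": remote_address,
        }

    def add_vxlan(self, name, vni, vtep):
        # Benefit 3: reject references to a VTEP that does not exist.
        if vtep not in self.vteps:
            raise ConfigError(f"no such vtep: {vtep}")
        self.vxlans[name] = {"vni": vni, "vtep": vtep}

    def set_local_address(self, vtep, address):
        # Benefit 4: one change applies to every interface using the VTEP.
        if vtep not in self.vteps:
            raise ConfigError(f"no such vtep: {vtep}")
        self.vteps[vtep]["local-address"] = address

    def remove_vtep(self, name):
        # Benefit 6: refuse to delete a VTEP that is still referenced.
        if name not in self.vteps:
            raise ConfigError(f"no such vtep: {name}")
        users = [n for n, v in self.vxlans.items() if v["vtep"] == name]
        if users:
            raise ConfigError(f"vtep {name} in use by: {', '.join(users)}")
        del self.vteps[name]


registry = VtepRegistry()
registry.add_vtep("vtep-to-siteB", "10.0.0.1", "10.0.0.2")
registry.add_vxlan("vxlan100", 100, "vtep-to-siteB")

for action in (
    lambda: registry.add_vxlan("vxlan400", 400, "vtep-to-siteC"),
    lambda: registry.remove_vtep("vtep-to-siteB"),
):
    try:
        action()
    except ConfigError as err:
        print("rejected:", err)
```

Both rejected operations fail at configuration time, before any tunnel state is touched.
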
**Example of bulk modification:**

    # Current method: must modify all interfaces
    /interface vxlan set [find local-address=10.0.0.1] local-address=10.0.0.10

    # Proposed method: modify once
    /interface vxlan vtep set vtep-to-siteB local-address=10.0.0.10

### Part 2: VTEP Session State Management
Add comprehensive state tracking for VTEP objects:

    # View all VTEPs with state
    /interface vxlan vtep print
    Flags: X - disabled, R - running, D - down
     #   NAME           LOCAL-ADDRESS  REMOTE-ADDRESS  STATE  UPTIME    INTERFACES
     0 R vtep-to-siteB  10.0.0.1       10.0.0.2        up     2d3h45m   10
     1 R vtep-to-siteC  10.0.0.1       10.0.0.3        up     1d12h30m  5
     2 D vtep-to-siteD  10.0.0.1       10.0.0.4        down   --        10

    # View detailed VTEP state
    /interface vxlan vtep print detail where name=vtep-to-siteB
     0 R name="vtep-to-siteB"
         local-address=10.0.0.1
         remote-address=10.0.0.2
         port=4789
         state=up
         uptime=2d3h45m12s
         last-state-change="Jan/15/2025 10:23:45"
         rtt=2.3ms
         interfaces=10
         active-vnis="100,200,300,400,500,600,700,800,900,1000"
         learned-macs=1234

    # View aggregated statistics
    /interface vxlan vtep print stats
     #  NAME           STATE  RX-PACKETS  TX-PACKETS  RX-BYTES  TX-BYTES  ERRORS
     0  vtep-to-siteB  up     12,345,678  11,234,567  15.2 GB   14.8 GB   0
     1  vtep-to-siteC  up     8,765,432   8,123,456   10.5 GB   9.8 GB    0
     2  vtep-to-siteD  down   0           0           0         0         0

    # Monitor in real-time
    /interface vxlan vtep monitor vtep-to-siteB
    VTEP: vtep-to-siteB
    State: up
    Uptime: 2d3h45m18s
    Current Rates:
      RX: 125.3 Mbps (45,234 pps)
      TX: 120.8 Mbps (43,567 pps)
    Active VNIs: 10/10

**Proposed state attributes:**
**Basic State:**
- `state` - up/down/unknown
- `admin-state` - enabled/disabled
- `uptime` - duration since last state change
- `last-state-change` - timestamp
**Reachability:**
- `reachable` - yes/no (based on keepalive or data plane activity)
- `last-packet-received` - timestamp
- `rtt` - round-trip time (if keepalive enabled)
**Statistics (aggregated across all VNIs):**
- `rx-packets` / `tx-packets`
- `rx-bytes` / `tx-bytes`
- `rx-rate` / `tx-rate` (5-minute average)
- `rx-errors` / `tx-errors`
- `rx-drops` / `tx-drops`
**VNI Information:**
- `interfaces` - count of VXLAN interfaces using this VTEP
- `active-vnis` - list of VNIs currently active
- `learned-macs` - total MAC addresses learned through this VTEP
**Events:**
- State change events logged to `/log`
- Optional script hooks for automation (`on-up-script`, `on-down-script`)
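
One way the `state`, `uptime`, and `last-state-change` attributes could relate to data-plane activity is sketched below in Python. This is only an assumption about possible semantics: the 30-second timeout, the `VtepState` class, and its method names are all hypothetical, and a real implementation might rely on keepalives instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VtepState:
    state: str = "unknown"                        # up / down / unknown
    last_state_change: Optional[float] = None     # seconds, caller's clock
    last_packet_received: Optional[float] = None

    def _set_state(self, new_state, now):
        # Record a timestamp only when the state actually changes.
        if new_state != self.state:
            self.state = new_state
            self.last_state_change = now

    def packet_received(self, now):
        """Any packet from the remote VTEP marks it as reachable."""
        self.last_packet_received = now
        self._set_state("up", now)

    def tick(self, now, timeout=30.0):
        """Declare the VTEP down after `timeout` seconds of silence (assumed value)."""
        if self.state == "up" and now - self.last_packet_received > timeout:
            self._set_state("down", now)

    def uptime(self, now):
        """Duration since the last state change, matching the `uptime` attribute."""
        if self.last_state_change is None:
            return None
        return now - self.last_state_change


vtep = VtepState()
vtep.packet_received(now=100.0)   # first packet: unknown -> up
vtep.packet_received(now=110.0)   # still up, no state change recorded
vtep.tick(now=141.0)              # 31 s of silence: up -> down
print(vtep.state, vtep.last_state_change, vtep.uptime(now=150.0))
```

Each transition recorded by `_set_state` is also the natural point to emit the `/log` entry and run `on-up-script` or `on-down-script`.
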
## Comparison with Industry Standards
This design aligns with how other vendors implement VXLAN:
**Cisco NX-OS:**

    interface nve1
      source-interface loopback0   # = VTEP config
      member vni 100               # = VNI references VTEP
      member vni 200

    show nve interface             # = VTEP state
    show nve peers                 # = remote VTEP state

**Arista EOS:**

    interface Vxlan1
      vxlan source-interface Loopback0
      vxlan vlan 100 vni 10100

    show interfaces vxlan1
    show vxlan vtep

**Juniper:**

    set protocols evpn encapsulation vxlan
    set vlans vlan100 vxlan vni 10100

    show evpn instance
    show evpn database

All major vendors provide:
1. ✅ Explicit VTEP/NVE configuration objects
2. ✅ VTEP state visibility
3. ✅ Aggregated statistics
4. ✅ Hierarchical configuration (VTEP → VNI)
## Use Cases
### Use Case 1: Multi-Site Deployment
**Scenario**: Data center with 3 remote sites, 10 VNIs per site
**Current configuration**: 30 VXLAN interfaces + 30 vteps entries = 60 configuration lines

    /interface vxlan
    add name=vxlan100-B vni=100 local-address=10.0.0.1   # repeated 30 times
    add name=vxlan200-B vni=200 local-address=10.0.0.1
    ...
    /interface vxlan vteps
    add interface=vxlan100-B remote-ip=10.0.0.2          # repeated 30 times
    add interface=vxlan200-B remote-ip=10.0.0.2
    ...

**Proposed configuration**: 3 VTEPs + 30 VXLAN interfaces = 33 configuration lines

    /interface vxlan vtep
    add name=vtep-siteB local-address=10.0.0.1 remote-address=10.0.0.2
    add name=vtep-siteC local-address=10.0.0.1 remote-address=10.0.0.3
    add name=vtep-siteD local-address=10.0.0.1 remote-address=10.0.0.4
    /interface vxlan
    add name=vxlan100-B vni=100 vtep=vtep-siteB
    add name=vxlan200-B vni=200 vtep=vtep-siteB
    ...

**Benefit**: 45% reduction in configuration lines, much clearer structure
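
The 45% figure follows from the line counts above, and the same arithmetic generalises to other deployment sizes. The short Python sketch below reproduces it; the function name `config_lines` is hypothetical, and it assumes one `add` line per object as in the examples.

```python
def config_lines(sites, vnis_per_site):
    """Return (current, proposed, reduction) in configuration lines."""
    interfaces = sites * vnis_per_site
    current = interfaces * 2        # one interface line + one vteps line per VNI
    proposed = sites + interfaces   # one VTEP line per site + one interface per VNI
    reduction = (current - proposed) / current
    return current, proposed, reduction


current, proposed, reduction = config_lines(sites=3, vnis_per_site=10)
print(f"{current} -> {proposed} lines ({reduction:.0%} reduction)")
```

Under the proposal, each additional VNI costs one configuration line instead of two.
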
### Use Case 2: Troubleshooting Connectivity
**Current process** (to check if Site B is reachable):
1. Find all VXLAN interfaces to Site B: no built-in method
2. Check each interface individually: 10 separate checks
3. Aggregate statistics: manual calculation
4. No way to see historical uptime
**Proposed process**:
1. Check VTEP state: `/interface vxlan vtep print where name=vtep-siteB`
2. View aggregated stats: `/interface vxlan vtep print stats where name=vtep-siteB`
3. View state history: `/log print where topics~"vtep" and message~"vtep-siteB"`
4. Monitor real-time: `/interface vxlan vtep monitor vtep-siteB`
**Benefit**: Single point of visibility, faster troubleshooting
### Use Case 3: Monitoring and Alerting
**Current**: No built-in way to monitor VTEP health
- Must check each VXLAN interface
- No aggregated metrics
- Cannot create meaningful alerts
**Proposed**:

    # Create alert when VTEP goes down
    /interface vxlan vtep
    set vtep-siteB on-down-script=alert-admin
    /system script
    add name=alert-admin source={
        :log error "VTEP $vtepName is down!";
        /tool e-mail send to="admin@example.com" \
            subject="VTEP Down Alert" \
            body="VTEP $vtepName went down at $[/system clock get time]";
    }

    # SNMP monitoring
    snmpwalk ... vxlanVtepTable

**Benefit**: Proactive monitoring, automated alerting