# Explicit VTEP Configuration and Session State Management for VXLAN

Currently, MikroTik's VXLAN implementation lacks two critical features that are standard in enterprise networking equipment:

  1. Explicit VTEP (VXLAN Tunnel Endpoint) configuration - VTEPs are implicitly created and cannot be managed as independent objects

  2. VTEP session state management - No visibility into VTEP health, reachability, or aggregated statistics

These limitations make large-scale VXLAN deployments difficult to manage, troubleshoot, and monitor effectively.

## Current Implementation Problems

### Problem 1: Implicit VTEP Creation

Currently, VTEPs are created implicitly based on the local-address parameter of each VXLAN interface:

```
/interface vxlan
add name=vxlan100 vni=100 local-address=10.0.0.1
add name=vxlan200 vni=200 local-address=10.0.0.1
add name=vxlan300 vni=300 local-address=10.0.0.1

/interface vxlan vteps
add interface=vxlan100 remote-ip=10.0.0.2
add interface=vxlan200 remote-ip=10.0.0.2
add interface=vxlan300 remote-ip=10.0.0.2
```


**Issues with this approach:**

1. **Configuration Redundancy**: The `local-address` must be repeated for every VXLAN interface (30 times for 30 VNIs)
2. **No VTEP Object**: VTEPs exist only as implicit side effects, not as manageable first-class objects
3. **Error-Prone**: Easy to accidentally specify different `local-address` values, creating unintended multiple VTEPs
4. **Unclear Reuse**: Whether VTEPs are shared depends on "accidental" matching of IP addresses
5. **Difficult Bulk Operations**: Changing VTEP properties requires modifying all associated VXLAN interfaces
6. **No Validation**: System cannot verify if a VTEP configuration is correct until runtime
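
To make the "accidental matching" point concrete, here is a minimal Python sketch of how implicit VTEPs fall out of the (local-address, remote-ip) pairs. It models the behaviour as described in this post, not RouterOS internals; the function name `implicit_vteps` and the tuple layout are made up for the example.

```python
from collections import defaultdict


def implicit_vteps(interfaces):
    """Group VXLAN interfaces by the (local, remote) pair that implicitly defines a VTEP."""
    groups = defaultdict(list)
    for name, local, remote in interfaces:
        groups[(local, remote)].append(name)
    return dict(groups)


# Hypothetical config: the third entry has a typo in local-address.
config = [
    ("vxlan100", "10.0.0.1", "10.0.0.2"),
    ("vxlan200", "10.0.0.1", "10.0.0.2"),
    ("vxlan300", "10.0.0.11", "10.0.0.2"),
]

vteps = implicit_vteps(config)
for (local, remote), names in vteps.items():
    print(f"{local} -> {remote}: {', '.join(names)}")
```

A single mistyped address is enough to end up with two VTEPs towards the same site, and nothing in the configuration flags it.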

### Problem 2: No VTEP Session State

There is currently no way to view VTEP-level state information:

```
# What we can see:

/interface vxlan print detail
# Shows: interface running/down

/interface vxlan vteps print
# Shows: configured remote-ip only

# What we CANNOT see:

# ❌ Is the remote VTEP actually reachable?
# ❌ Is the VTEP session established?
# ❌ How long has the VTEP been up/down?
# ❌ What is the aggregate traffic for all VNIs using this VTEP?
# ❌ Are there any VTEP-level errors or packet drops?
# ❌ Which VNIs are active on this VTEP?
# ❌ How many MACs have been learned through this VTEP?
```


**Impact on operations:**

1. **No Health Monitoring**: Cannot determine if a remote VTEP is alive without checking each VXLAN interface individually
2. **Troubleshooting Difficulty**: Must check N interfaces to diagnose connectivity to one site
3. **No Aggregated Statistics**: Cannot see total traffic to a remote site (must manually sum all VNI statistics)
4. **No Alerting**: Cannot create alerts for VTEP state changes
5. **No Historical Data**: No uptime tracking or state change history
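
For reference, this is roughly the bookkeeping an operator has to do by hand (or in an external tool) today to get a per-site total. It is a sketch in Python with made-up sample data; the function name `aggregate_by_remote` and the counter keys are illustrative, not RouterOS property names.

```python
def aggregate_by_remote(vtep_entries, counters):
    """Sum per-interface counters for all interfaces pointing at the same remote IP."""
    totals = {}
    for ifname, remote_ip in vtep_entries:
        site = totals.setdefault(
            remote_ip, {"rx_bytes": 0, "tx_bytes": 0, "interfaces": 0}
        )
        site["rx_bytes"] += counters[ifname]["rx_bytes"]
        site["tx_bytes"] += counters[ifname]["tx_bytes"]
        site["interfaces"] += 1
    return totals


# Made-up sample data: (interface, remote-ip) pairs and per-interface counters.
vtep_entries = [
    ("vxlan100", "10.0.0.2"),
    ("vxlan200", "10.0.0.2"),
    ("vxlan400", "10.0.0.3"),
]
counters = {
    "vxlan100": {"rx_bytes": 1000, "tx_bytes": 900},
    "vxlan200": {"rx_bytes": 500, "tx_bytes": 400},
    "vxlan400": {"rx_bytes": 70, "tx_bytes": 60},
}

totals = aggregate_by_remote(vtep_entries, counters)
print(totals["10.0.0.2"])
```

With a VTEP-level object, this sum would simply be a property of the VTEP instead of something every operator has to rebuild.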

## Proposed Solution

### Part 1: Explicit VTEP Configuration

Introduce VTEP as a first-class configuration object, similar to Cisco's NVE (Network Virtualization Edge) interfaces:

```
# Step 1: Create VTEP objects independently
/interface vxlan vtep
add name=vtep-to-siteB \
    local-address=10.0.0.1 \
    remote-address=10.0.0.2 \
    port=4789 \
    comment="VTEP to Site B"

add name=vtep-to-siteC \
    local-address=10.0.0.1 \
    remote-address=10.0.0.3 \
    port=4789 \
    comment="VTEP to Site C"

# Step 2: VXLAN interfaces reference VTEPs
/interface vxlan
add name=vxlan100 vni=100 vtep=vtep-to-siteB
add name=vxlan200 vni=200 vtep=vtep-to-siteB
add name=vxlan300 vni=300 vtep=vtep-to-siteB

add name=vxlan400 vni=400 vtep=vtep-to-siteC
add name=vxlan500 vni=500 vtep=vtep-to-siteC
```


**Benefits:**

1. ✅ **Configuration Clarity**: `local-address` specified only once per VTEP
2. ✅ **Explicit Relationships**: Clear which VXLANs use which VTEP
3. ✅ **Error Prevention**: Cannot reference non-existent VTEP (validation at config time)
4. ✅ **Bulk Operations**: Change VTEP properties once, affects all associated VXLANs
5. ✅ **Easier Management**: Named VTEPs are self-documenting
6. ✅ **Deletion Protection**: Cannot delete VTEP while in use
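
The validation and deletion-protection behaviour could look something like the following toy model. This is a Python sketch of the intended semantics only; the class `VtepRegistry` and its methods are hypothetical and say nothing about how RouterOS would implement it.

```python
class VtepRegistry:
    """Toy model of explicit VTEP objects referenced by VXLAN interfaces."""

    def __init__(self):
        self.vteps = {}   # VTEP name -> (local-address, remote-address)
        self.vxlans = {}  # VXLAN interface name -> VTEP name

    def add_vtep(self, name, local, remote):
        self.vteps[name] = (local, remote)

    def add_vxlan(self, name, vtep):
        # Config-time validation: the referenced VTEP must already exist.
        if vtep not in self.vteps:
            raise ValueError(f"no such VTEP: {vtep}")
        self.vxlans[name] = vtep

    def remove_vtep(self, name):
        # Deletion protection: refuse while any VXLAN interface still uses it.
        users = sorted(i for i, v in self.vxlans.items() if v == name)
        if users:
            raise ValueError(f"VTEP {name} is in use by: {', '.join(users)}")
        del self.vteps[name]

    def set_local_address(self, name, local):
        # Bulk operation: one change applies to every interface using the VTEP.
        _, remote = self.vteps[name]
        self.vteps[name] = (local, remote)


reg = VtepRegistry()
reg.add_vtep("vtep-to-siteB", "10.0.0.1", "10.0.0.2")
reg.add_vxlan("vxlan100", "vtep-to-siteB")

for action in (
    lambda: reg.add_vxlan("vxlan400", "vtep-to-siteX"),  # unknown VTEP
    lambda: reg.remove_vtep("vtep-to-siteB"),            # still referenced
):
    try:
        action()
    except ValueError as err:
        print("rejected:", err)
```

Both bad operations are rejected at configuration time, which is the point of benefits 3 and 6.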

**Example of bulk modification:**

```
# Current method: must modify all interfaces
/interface vxlan set [find local-address=10.0.0.1] local-address=10.0.0.10

# Proposed method: modify once
/interface vxlan vtep set vtep-to-siteB local-address=10.0.0.10
```


### Part 2: VTEP Session State Management

Add comprehensive state tracking for VTEP objects:

```
# View all VTEPs with state
/interface vxlan vtep print

Flags: X - disabled, R - running, D - down
 #   NAME           LOCAL-ADDRESS  REMOTE-ADDRESS  STATE  UPTIME    INTERFACES
 0 R vtep-to-siteB  10.0.0.1       10.0.0.2        up     2d3h45m   10
 1 R vtep-to-siteC  10.0.0.1       10.0.0.3        up     1d12h30m  5
 2 D vtep-to-siteD  10.0.0.1       10.0.0.4        down   --        10

# View detailed VTEP state
/interface vxlan vtep print detail where name=vtep-to-siteB

 0 R name="vtep-to-siteB"
     local-address=10.0.0.1
     remote-address=10.0.0.2
     port=4789
     state=up
     uptime=2d3h45m12s
     last-state-change="Jan/15/2025 10:23:45"
     rtt=2.3ms
     interfaces=10
     active-vnis="100,200,300,400,500,600,700,800,900,1000"
     learned-macs=1234

# View aggregated statistics
/interface vxlan vtep print stats

 #   NAME           STATE  RX-PACKETS  TX-PACKETS  RX-BYTES  TX-BYTES  ERRORS
 0   vtep-to-siteB  up     12,345,678  11,234,567  15.2 GB   14.8 GB   0
 1   vtep-to-siteC  up     8,765,432   8,123,456   10.5 GB   9.8 GB    0
 2   vtep-to-siteD  down   0           0           0         0         0

# Monitor in real-time
/interface vxlan vtep monitor vtep-to-siteB

VTEP: vtep-to-siteB
State: up
Uptime: 2d3h45m18s

Current Rates:
  RX: 125.3 Mbps (45,234 pps)
  TX: 120.8 Mbps (43,567 pps)

Active VNIs: 10/10
```


**Proposed state attributes:**

**Basic State:**
- `state` - up/down/unknown
- `admin-state` - enabled/disabled
- `uptime` - duration since last state change
- `last-state-change` - timestamp

**Reachability:**
- `reachable` - yes/no (based on keepalive or data plane activity)
- `last-packet-received` - timestamp
- `rtt` - round-trip time (if keepalive enabled)

**Statistics (aggregated across all VNIs):**
- `rx-packets` / `tx-packets`
- `rx-bytes` / `tx-bytes`
- `rx-rate` / `tx-rate` (5-minute average)
- `rx-errors` / `tx-errors`
- `rx-drops` / `tx-drops`

**VNI Information:**
- `interfaces` - count of VXLAN interfaces using this VTEP
- `active-vnis` - list of VNIs currently active
- `learned-macs` - total MAC addresses learned through this VTEP

**Events:**
- State change events logged to `/log`
- Optional script hooks for automation (`on-up-script`, `on-down-script`)
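
To show how `state`, `uptime`, `last-state-change` and the script hooks fit together, here is a small Python sketch of the state tracking. It is an assumption about the semantics proposed above (uptime counted from the last state change, hooks fired only on transitions); the class `VtepSession` and the way reachability is fed in are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VtepSession:
    name: str
    state: str = "unknown"            # up / down / unknown
    last_state_change: float = 0.0    # timestamp in seconds
    on_up: Optional[Callable[[str], None]] = None
    on_down: Optional[Callable[[str], None]] = None

    def update(self, reachable: bool, now: float) -> bool:
        """Record a reachability result; return True if the state changed."""
        new_state = "up" if reachable else "down"
        if new_state == self.state:
            return False
        self.state = new_state
        self.last_state_change = now
        hook = self.on_up if reachable else self.on_down
        if hook is not None:
            hook(self.name)
        return True

    def uptime(self, now: float) -> float:
        """Seconds spent in the current state."""
        return now - self.last_state_change


events = []
vtep = VtepSession(
    "vtep-to-siteB",
    on_up=lambda n: events.append(f"{n} up"),
    on_down=lambda n: events.append(f"{n} down"),
)
vtep.update(reachable=True, now=100.0)   # unknown -> up, fires on_up
vtep.update(reachable=True, now=160.0)   # no transition, no hook
vtep.update(reachable=False, now=200.0)  # up -> down, fires on_down
print(events, vtep.uptime(now=230.0))
```

Whether reachability comes from a keepalive or from data plane activity is left open here, as it is in the attribute list.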

## Comparison with Industry Standards

This design aligns with how other vendors implement VXLAN:

**Cisco NX-OS:**

```
interface nve1
  source-interface loopback0    # = VTEP config
  member vni 100                # = VNI references VTEP
  member vni 200

show nve interface              # = VTEP state
show nve peers                  # = remote VTEP state
```


**Arista EOS:**

```
interface Vxlan1
   vxlan source-interface Loopback0
   vxlan vlan 100 vni 10100

show interfaces vxlan1
show vxlan vtep
```


**Juniper:**

```
set protocols evpn encapsulation vxlan
set vlans vlan100 vxlan vni 10100

show evpn instance
show evpn database
```


All major vendors provide:
1. ✅ Explicit VTEP/NVE configuration objects
2. ✅ VTEP state visibility
3. ✅ Aggregated statistics
4. ✅ Hierarchical configuration (VTEP → VNI)

## Use Cases

### Use Case 1: Multi-Site Deployment

**Scenario**: Data center with 3 remote sites, 10 VNIs per site

**Current configuration**: 30 VXLAN interfaces + 30 vteps entries = 60 configuration lines

```
/interface vxlan
# one "add" line per VNI per site, 30 in total
add name=vxlan100-B vni=100 local-address=10.0.0.1
add name=vxlan200-B vni=200 local-address=10.0.0.1
...

/interface vxlan vteps
# one "add" line per VXLAN interface, 30 in total
add interface=vxlan100-B remote-ip=10.0.0.2
add interface=vxlan200-B remote-ip=10.0.0.2
...
```


**Proposed configuration**: 3 VTEPs + 30 VXLAN interfaces = 33 configuration lines

```
/interface vxlan vtep
add name=vtep-siteB local-address=10.0.0.1 remote-address=10.0.0.2
add name=vtep-siteC local-address=10.0.0.1 remote-address=10.0.0.3
add name=vtep-siteD local-address=10.0.0.1 remote-address=10.0.0.4

/interface vxlan
add name=vxlan100-B vni=100 vtep=vtep-siteB
add name=vxlan200-B vni=200 vtep=vtep-siteB
...
```


**Benefit**: 45% reduction in configuration lines, much clearer structure
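
The line counts generalise to any number of sites and VNIs. A quick Python check of the arithmetic, assuming one `add` line per object as in the examples above (the function name `config_lines` is just for illustration):

```python
def config_lines(sites, vnis_per_site):
    """Return (current, proposed, fractional reduction) in configuration lines."""
    interfaces = sites * vnis_per_site
    current = 2 * interfaces        # one vxlan line + one vteps line per interface
    proposed = sites + interfaces   # one vtep line per site + one vxlan line each
    reduction = (current - proposed) / current
    return current, proposed, reduction


current, proposed, reduction = config_lines(sites=3, vnis_per_site=10)
print(f"{current} -> {proposed} lines ({reduction:.0%} fewer)")
# prints: 60 -> 33 lines (45% fewer)
```

The saving approaches 50% as the number of VNIs per site grows, since the per-site VTEP line becomes negligible.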

### Use Case 2: Troubleshooting Connectivity

**Current process** (to check if Site B is reachable):
1. Find all VXLAN interfaces to Site B: no obvious method
2. Check each interface individually: 10 separate checks
3. Aggregate statistics: manual calculation across all 10 interfaces
4. No way to see historical uptime

**Proposed process**:
1. Check VTEP state: `/interface vxlan vtep print where name=vtep-siteB`
2. View aggregated stats: `/interface vxlan vtep print stats where name=vtep-siteB`
3. View state history: `/log print where topics~"vtep" and message~"vtep-siteB"`
4. Monitor real-time: `/interface vxlan vtep monitor vtep-siteB`

**Benefit**: Single point of visibility, faster troubleshooting

### Use Case 3: Monitoring and Alerting

**Current**: No built-in way to monitor VTEP health
- Must check each VXLAN interface
- No aggregated metrics
- Cannot create meaningful alerts

**Proposed**:

```
# Create alert when VTEP goes down
/interface vxlan vtep
set vtep-siteB on-down-script=alert-admin

# $vtepName would be supplied to the script by the proposed hook
/system script
add name=alert-admin source={
    :log error "VTEP $vtepName is down!";
    /tool e-mail send to="admin@example.com" \
        subject="VTEP Down Alert" \
        body="VTEP $vtepName went down at $[/system clock get date] $[/system clock get time]";
}

# SNMP monitoring
snmpwalk ... vxlanVtepTable
```

**Benefit**: Proactive monitoring, automated alerting