Issues with BGP peering and TCP MD5 Keys

Howdy,

Has anyone else had issues with BGP peering when it involves the use of a TCP MD5 key?

I’m trying to set up BGP peering with Cymru for bogons and I’m having trouble. My router never forms a complete connection with the Cymru peers. To give you some background, I attempted this once before with them, and we finally got it working, but only by disabling the use of a key. This time, I’d like to use it.

My network looks kinda like this:

[Internet] => Comcast Business modem[NAT, DHCP for border only] => RB850Gx2[NAT, Internal DHCP, etc] => Internal network

I have all of the blocking firewall functions on the Comcast gateway disabled, so it shouldn’t be interfering. There’s a chance NAT is corrupting the signature in the packets, but since I can’t capture packets on the outside of the Comcast gateway, I can’t say for sure.

So far, when I capture on the WAN side of my RB850Gx2, I see the packets inbound from Cymru with the MD5 signature and all. My device is trying to connect as well, sending out TCP/179 packets destined for the Cymru peer IPs. Neither end ACKs the other. Everything seems to be arriving intact, but they’re ignoring each other and I can’t figure out why.

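One thing I noticed is that packets inbound from Cymru have 24 bytes of TCP options, consisting of the following:

  • Max segment size (set to 1460) (4 bytes)
  • TCP MD5 signature (18 bytes)
  • End of Option List (1 byte, followed by 1 byte of padding)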

RouterOS, on the other hand, is sending out its BGP SYN packets with 32 bytes of TCP options, consisting of the following:

  • No-operation (1 byte)
  • No-operation (1 byte)
  • TCP MD5 signature (18 bytes)
  • Max segment size (set to 1460) (4 bytes)
  • No-operation (1 byte)
  • No-operation (1 byte)
  • SACK permitted (2 bytes)
  • No-operation (1 byte)
  • Window scale (shift count 6) (3 bytes)
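
To compare the two option layouts byte by byte, here’s a rough Python sketch that walks a TCP options field and reports each option and its size. The byte strings are rebuilt by hand from the two Wireshark exports in this post, so treat them as a reconstruction, not a raw capture. EOL (kind 0) and NOP (kind 1) are the single-byte options from RFC 793; NOP is only alignment padding, so its presence on its own shouldn’t make a segment invalid.

```python
from __future__ import annotations

# Option kinds: EOL/NOP/MSS (RFC 793), window scale (RFC 7323),
# SACK permitted (RFC 2018), MD5 signature (RFC 2385)
OPTION_NAMES = {0: "EOL", 1: "NOP", 2: "MSS", 3: "Window scale",
                4: "SACK permitted", 19: "TCP MD5 signature"}


def parse_tcp_options(data: bytes) -> list[tuple[str, int]]:
    """Return (option name, size in bytes) for each option in a TCP options field."""
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:
            # EOL: the option list ends here; the rest is zero padding
            options.append(("EOL + padding", len(data) - i))
            break
        if kind == 1:
            # NOP: a single byte with no length field, used only for alignment
            options.append(("NOP", 1))
            i += 1
            continue
        if i + 1 >= len(data) or data[i + 1] < 2:
            raise ValueError(f"malformed option at offset {i}")
        length = data[i + 1]  # every other option is kind, length, value
        options.append((OPTION_NAMES.get(kind, f"kind {kind}"), length))
        i += length
    return options


# Rebuilt by hand from the two SYN captures (digest bytes copied from Wireshark)
CYMRU_SYN = bytes.fromhex(
    "020405b4"                                  # MSS 1460
    "1312" "4964b483877a2e26fc8326cb50b1b988"   # MD5 signature
    "0000")                                     # EOL + 1 byte padding
ROUTEROS_SYN = bytes.fromhex(
    "0101"                                      # NOP, NOP
    "1312" "a7decaf68cb137b86890b53c5bde8d5c"   # MD5 signature
    "020405b4"                                  # MSS 1460
    "0101" "0402"                               # NOP, NOP, SACK permitted
    "01" "030306")                              # NOP, window scale (shift 6)

for name, opts in [("Cymru", CYMRU_SYN), ("RouterOS", ROUTEROS_SYN)]:
    print(name, len(opts), parse_tcp_options(opts))
```

Both layouts parse cleanly; the only real differences are the NOP alignment bytes and the extra SACK/window-scale options on the RouterOS side.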


    According to RFC 2385 (https://tools.ietf.org/html/rfc2385):
    “The total header size is also an issue. The TCP header specifies where segment data starts with a 4-bit field which gives the size of the header (including options) in 32-byte [sic; 32-bit] words. This means that the total size of the header plus option must be less than or equal to 60 bytes – this leaves 40 bytes for options.”

Looking at my packet captures, the header for the Cymru packets is 44 bytes with 24 bytes for the options, and the header for the RouterOS packets is 52 bytes with 32 bytes for the options. While they’re different, both are within the 40-byte option limit. One thing I don’t see in the RFC is any reference to the use of “No-operation” options.
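
The data offset arithmetic can be checked against the captures with a few lines of Python. This is just a sketch; the nibble values are the ones Wireshark printed (1011 and 1101), and the field counts 32-bit (4-byte) words, which is the only reading that gives the 60-byte maximum.

```python
from __future__ import annotations

TCP_FIXED_HEADER = 20  # bytes in the TCP header before any options


def header_sizes(data_offset: int) -> tuple[int, int]:
    """Data offset counts 32-bit (4-byte) words; return (header bytes, option bytes)."""
    header_len = data_offset * 4
    return header_len, header_len - TCP_FIXED_HEADER


# Nibbles as printed by Wireshark ("1011" and "1101"), plus the 4-bit maximum
for label, nibble in [("Cymru", 0b1011), ("RouterOS", 0b1101), ("max", 0b1111)]:
    header_len, option_len = header_sizes(nibble)
    print(f"{label}: offset {nibble} -> {header_len}-byte header, {option_len} option bytes")
```

This prints 44/24 for Cymru, 52/32 for RouterOS, and 60/40 for the maximum, so both SYNs fit inside the limit the RFC describes.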


Right now my running theory is that one of two things is happening:

  1. Something on the Comcast gateway is corrupting the signature in the packet before it reaches my internal router
  2. Something isn’t being handled correctly on the RouterOS side (e.g. since it’s generating packets very different from the Cymru packets, perhaps it’s also looking for something different)

I’ve dropped in some packet exports and config exports below. I promise my firewall config is correct. I’ve gone to great lengths to ensure the firewall on my RB850Gx2 is not altering the inbound packets. I will not post my firewall config at this time.

Does anyone have any suggestions?



Here’s a sanitized version of a packet export from Cymru:

Ethernet II, Src: (comcast_gateway), Dst: (rb850Gx2)
Internet Protocol Version 4, Src: (cymru_ipv4), Dst: (internal_ip)
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x20 (DSCP: CS1, ECN: Not-ECT)
        0010 00.. = Differentiated Services Codepoint: Class Selector 1 (8)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 64
    Identification: 0xda99 (55961)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 242
    Protocol: TCP (6)
    Header checksum: 0x3996 [correct]
    [Header checksum status: Good]
    [Calculated Checksum: 0x3996]
    Source: (cymru_ipv4)
    Source or Destination Address: (cymru_ipv4)
    [Source Host: (cymru_ipv4)]
    [Source or Destination Host: (cymru_ipv4)]
    Destination: (internal_ip)
    Source or Destination Address: (internal_ip)
    [Destination Host: (internal_ip)]
    [Source or Destination Host: (internal_ip)]
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 58661 (58661), Dst Port: bgp (179), Seq: 0, Len: 0
    Source Port: 58661 (58661)
    Destination Port: bgp (179)
    Source or Destination Port: 58661 (58661)
    Source or Destination Port: bgp (179)
    [Stream index: 51]
    [TCP Segment Len: 0]
    Sequence number: 0    (relative sequence number)
    Acknowledgment number: 0
    1011 .... = Header Length: 44 bytes (11)
    Flags: 0x002 (SYN)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...0 .... = Acknowledgment: Not set
        .... .... 0... = Push: Not set
        .... .... .0.. = Reset: Not set
        .... .... ..1. = Syn: Set
        .... .... ...0 = Fin: Not set
        [TCP Flags: ··········S·]
    Window size value: 16384
    [Calculated window size: 16384]
    Checksum: 0xc657 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    Options: (24 bytes), Maximum segment size, TCP MD5 signature, End of Option List (EOL)
        TCP Option - Maximum segment size: 1460 bytes
            Kind: Maximum Segment Size (2)
            Length: 4
            MSS Value: 1460
        TCP Option - TCP MD5 signature
            Kind: MD5 Signature Option (19)
            Length: 18
            MD5 digest: 49 64 b4 83 87 7a 2e 26 fc 83 26 cb 50 b1 b9 88
        TCP Option - End of Option List (EOL)
            Kind: End of Option List (0)
    [Timestamps]

And here’s a sanitized version of a packet export from RouterOS:

Ethernet II, Src: (rb850Gx2), Dst: (comcast_gateway)
Internet Protocol Version 4, Src: (internal_ip), Dst: (cymru_ipv4)
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0xc0 (DSCP: CS6, ECN: Not-ECT)
        1100 00.. = Differentiated Services Codepoint: Class Selector 6 (48)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 72
    Identification: 0x72a5 (29349)
    Flags: 0x02 (Don't Fragment)
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    Fragment offset: 0
    Time to live: 255
    Protocol: TCP (6)
    Header checksum: 0x93e2 [correct]
    [Header checksum status: Good]
    [Calculated Checksum: 0x93e2]
    Source: (internal_ip)
    Source or Destination Address: (internal_ip)
    [Source Host: (internal_ip)]
    [Source or Destination Host: (internal_ip)]
    Destination: (cymru_ipv4)
    Source or Destination Address: (cymru_ipv4)
    [Destination Host: (cymru_ipv4)]
    [Source or Destination Host: (cymru_ipv4)]
    [Source GeoIP: Unknown]
    [Destination GeoIP: Unknown]
Transmission Control Protocol, Src Port: 34555 (34555), Dst Port: bgp (179), Seq: 0, Len: 0
    Source Port: 34555 (34555)
    Destination Port: bgp (179)
    Source or Destination Port: 34555 (34555)
    Source or Destination Port: bgp (179)
    [Stream index: 50]
    [TCP Segment Len: 0]
    Sequence number: 0    (relative sequence number)
    Acknowledgment number: 0
    1101 .... = Header Length: 52 bytes (13)
    Flags: 0x002 (SYN)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Nonce: Not set
        .... 0... .... = Congestion Window Reduced (CWR): Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...0 .... = Acknowledgment: Not set
        .... .... 0... = Push: Not set
        .... .... .0.. = Reset: Not set
        .... .... ..1. = Syn: Set
        .... .... ...0 = Fin: Not set
        [TCP Flags: ··········S·]
    Window size value: 14600
    [Calculated window size: 14600]
    Checksum: 0x74a2 [unverified]
    [Checksum Status: Unverified]
    Urgent pointer: 0
    Options: (32 bytes), No-Operation (NOP), No-Operation (NOP), TCP MD5 signature, Maximum segment size, No-Operation (NOP), No-Operation (NOP), SACK permitted, No-Operation (NOP), Window scale
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - TCP MD5 signature
            Kind: MD5 Signature Option (19)
            Length: 18
            MD5 digest: a7 de ca f6 8c b1 37 b8 68 90 b5 3c 5b de 8d 5c
        TCP Option - Maximum segment size: 1460 bytes
            Kind: Maximum Segment Size (2)
            Length: 4
            MSS Value: 1460
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - SACK permitted
            Kind: SACK Permitted (4)
            Length: 2
        TCP Option - No-Operation (NOP)
            Kind: No-Operation (1)
        TCP Option - Window scale: 6 (multiply by 64)
            Kind: Window Scale (3)
            Length: 3
            Shift count: 6
            [Multiplier: 64]
    [Timestamps]

My routing instance config:

/routing bgp instance
set default as=<my_private_AS> client-to-client-reflection=no !cluster-id !confederation disabled=no ignore-as-path-len=no name=default \
    out-filter=BGP-DROP redistribute-connected=no redistribute-ospf=no redistribute-other-bgp=no redistribute-rip=no \
    redistribute-static=no router-id=<my_external_ip> routing-table=""

My peering config looks like:

/routing bgp peer
add address-families=ip,ipv6 !allow-as-in as-override=no cisco-vpls-nlri-len-fmt=auto-bytes default-originate=never disabled=no hold-time=6m in-filter=BOGON-SERVER-IN instance=default !keepalive-time max-prefix-limit=200000 !max-prefix-restart-time multihop=yes name=FULLBOGONS-CYMRU-1 nexthop-choice=default out-filter=BGP-DROP passive=no remote-address=<cymru_ipv4> remote-as=<cymru_AS> remove-private-as=no route-reflect=no ttl=255 use-bfd=no

add address-families=ip,ipv6 !allow-as-in as-override=no default-originate=never disabled=yes hold-time=6m in-filter=BOGON-SERVER-IN instance=default !keepalive-time max-prefix-limit=200000 !max-prefix-restart-time multihop=yes name=FULLBOGONS-CYMRU-2 nexthop-choice=default out-filter=BGP-DROP passive=no remote-address=<cymru_ipv6> remote-as=<cymru_AS> remove-private-as=no route-reflect=no ttl=255 use-bfd=n

I’m not sure exactly what’s going on, but off the cuff, I’d say that NAT is the problem.

That’s because if there’s a hash being generated at the TCP header level, then it almost certainly covers the IP addresses / port numbers, and unlike the regular TCP checksum (which NAT simply recalculates), a keyed hash can’t be regenerated by the NAT device for the new source IP without knowing the key. (I’ve never looked too deeply into the MD5 hashing behavior of BGP, so I’m a bit surprised to see it in the TCP header and not part of the actual BGP payload.)
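
To make the NAT argument concrete, here’s a minimal Python sketch of the digest as RFC 2385 describes it: MD5 over the IPv4 pseudo-header (source IP, destination IP, zero-padded protocol number, segment length), then the TCP header with options excluded and the checksum set to zero, then the segment data, then the key. The addresses, port, sequence number, and key are hypothetical placeholders, and I’m assuming the segment length is the full TCP length (header plus options plus data), as in the normal TCP pseudo-header.

```python
import hashlib
import ipaddress
import struct

IPPROTO_TCP = 6
SYN = 0x002


def syn_md5_digest(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                   seq: int, options_len: int, key: bytes,
                   window: int = 14600, payload: bytes = b"") -> bytes:
    """RFC 2385-style MD5 digest for a bare SYN (sketch, IPv4 only)."""
    tcp_len = 20 + options_len + len(payload)
    # 1. IPv4 pseudo-header: source, destination, zero byte, protocol, TCP length
    pseudo = (ipaddress.IPv4Address(src_ip).packed
              + ipaddress.IPv4Address(dst_ip).packed
              + struct.pack("!BBH", 0, IPPROTO_TCP, tcp_len))
    # 2. Fixed 20-byte TCP header: options excluded, checksum field zeroed
    data_offset = (20 + options_len) // 4
    header = struct.pack("!HHIIHHHH", src_port, dst_port, seq, 0,
                         (data_offset << 12) | SYN, window, 0, 0)
    # 3. segment data, then 4. the shared key
    return hashlib.md5(pseudo + header + payload + key).digest()


KEY = b"example-key"      # hypothetical shared key
PEER = "198.51.100.1"     # hypothetical Cymru peer address
INSIDE = "192.168.0.2"    # hypothetical RB850Gx2 WAN address behind the Comcast NAT
PUBLIC = "203.0.113.5"    # hypothetical public address after NAT

# Digest the RB850Gx2 puts in its SYN (32 bytes of options, as in the capture)
sent = syn_md5_digest(INSIDE, 34555, PEER, 179, 123456789, 32, KEY)
# Digest the peer recomputes from the packet it receives after NAT
expected = syn_md5_digest(PUBLIC, 34555, PEER, 179, 123456789, 32, KEY)
print(sent == expected)  # False
```

Because the source address is part of the hashed data, the digest no longer matches once NAT rewrites it, and RFC 2385 says a segment that fails the check is dropped without any response, which would fit the “neither end acks the other” symptom. Options are left out of the hash, so the NOP/SACK/window-scale differences shouldn’t matter by themselves.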

Based on a doc I found (http://costiser.ro/2013/03/31/bgp-md5-authentication/#.Wi2TjEqnFHb) describing the BGP MD5 auth mechanism and its interaction with NAT, it looks like NAT is my problem. I tried a few more times, but I can’t determine exactly what is being altered, and apparently I can’t do anything about it with my current topology (specifically, the Comcast Business modem).

Honestly, I’d just opt to get public IP space from them and not have my router be behind NAT if I were in your situation.
Comcast Business uses a special router for their connections (from what I understand it’s actually a Cisco), and as such they don’t offer putting their modem into bridge mode like you can with a home cable modem. Instead, they will route a /29 to their router’s LAN interface. Not sure what it costs, but being behind NAT is annoying if you are a router jockey and like the ability to do stuff on your own.
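
For reference, here’s a quick sketch with Python’s `ipaddress` module of what a /29 gives you. The prefix is from the 203.0.113.0/24 documentation range, not a real Comcast assignment, and treating the first usable address as the Comcast router’s LAN interface is an assumption for illustration.

```python
import ipaddress

# Hypothetical /29 from the RFC 5737 documentation range
block = ipaddress.ip_network("203.0.113.8/29")
hosts = list(block.hosts())          # excludes network and broadcast addresses

# Assumption: the provider's router takes the first usable address
gateway, *customer = hosts

print(block.num_addresses)           # 8
print(len(hosts))                    # 6
print(gateway)                       # 203.0.113.9
print(customer[0], "-", customer[-1], f"({len(customer)} addresses)")
```

Under that assumption, one of the remaining addresses could go on the RB850Gx2’s WAN interface, so the BGP session would be sourced from a public IP with no NAT in the path.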