Resurrecting this old thread that pertains to an ongoing issue.
I hope this is helpful to others.
Ecobee WiFi Disconnect Research
Environment
16 ecobee smart thermostats are deployed across 6 locations.
The locations use a mix of MikroTik routers and access points (hAP ax3, hEX, RB5009, CRS326, Cubes, wAPs, NetMetal, cAP ax) as well as Ubiquiti UniFi equipment (UDM, LR6, AC Mesh) for networking.
All ecobees connect via 2.4GHz WiFi and maintain persistent TLS connections to the ecobee.com cloud (161.38.184.18:8190) for remote control and monitoring.
Home Assistant integrates all thermostats via the ecobee cloud API.
For years, ecobees across multiple properties have been randomly losing their WiFi connections and failing to reconnect without a physical power cycle. The disconnects appeared random -- different locations, different times, no obvious pattern. Extensive troubleshooting had ruled out WiFi configuration (after what feels like every possible combination of every single setting), signal strength, and AP compatibility.
I continued to investigate because the burden of physically power cycling the thermostats, as well as randomly losing connectivity, is substantial.
In the ongoing attempt to understand what is causing this, I built a continuous packet-capture pipeline that catches and saves the 60 minutes of network traffic preceding each disconnect.
The discovery is fascinating.
Root Cause: TCP SYN Scan Crashes Ecobee WiFi SoC
Ecobee thermostats use an embedded WiFi System-in-Package for 2.4GHz 802.11 b/g/n connectivity. The ecobee4 generation is confirmed to use a Qualcomm Atheros-based WiFi radio; the specific chip is unverified but likely an AR4100 or QCA4002 based on the feature profile (2.4GHz only, SPI interface, no BLE, no WPA3). Newer ecobee models (Smart Thermostat Premium, ECB601) use a MediaTek WiFi/Bluetooth IC instead.
The exact chip matters less than the class of device: all are low-power embedded IoT WiFi SoCs with minimal RAM (typically 128-512KB), lightweight TCP stacks, and designs optimized for a handful of concurrent connections.
A full-port SYN scan sending 65,535 packets at 1,500/second is orders of magnitude beyond the design envelope of any such chip, regardless of manufacturer.
When this WiFi SoC receives a high-rate TCP SYN scan (port scan), its IP stack crashes. The device enters a state we can call "wedged" wherein:
- The WiFi radio stays associated to the AP (still transmitting LLC/XID frames)
- The IP stack is completely dead (no ARP replies, no TCP, no UDP)
- The ecobee.com cloud continues to report "connected" using stale cached data
- HA may continue showing temperature readings from cached cloud data
- The device will NOT recover on its own -- requires a physical power cycle
The Incident
On 2026-05-03, my full nmap scan of the LAN from the inside (scheduled monthly, also run on demand) ran with: nmap -sS -p- -T4 --min-rate 1500
This scans all 65,535 TCP ports per host at 1,500+ SYN packets/second. Four ecobees crashed within minutes of being scanned.
The pcap archives show the exact SYN flood hitting each ecobee immediately before it went silent. Ecobee-1 capture shows the transition in detail: normal TLS traffic to peer ecobees on port 1201, then XID LLC frames begin appearing interspersed with IP traffic, then a burst of mDNS re-discovery queries for all peer ecobees, then pure LLC XID frames only.
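For a sense of scale, the numbers quoted in this post can be put side by side. This is only back-of-the-envelope arithmetic, and it assumes the full 1,500 packets/second lands on a single host, which nmap does not guarantee when it scans several hosts in parallel.

```python
# Back-of-the-envelope numbers; the rates are the ones quoted in this post.
TCP_PORTS = 65_535            # ports probed per host by -p-
SCAN_RATE = 1_500             # SYN packets per second (--min-rate 1500)
NORMAL_CONN_PER_MIN = 2       # upper end of "1-2 new TCP connections per minute"

# Assumption: the whole scan rate is aimed at one host at a time.
seconds_per_host = TCP_PORTS / SCAN_RATE
rate_ratio = SCAN_RATE * 60 / NORMAL_CONN_PER_MIN

print(f"One host takes about {seconds_per_host:.1f} s to scan")
print(f"Scan SYN rate is about {rate_ratio:,.0f}x the normal connection rate")
```

Under that assumption, an ecobee sees in well under a minute more connection attempts than it would normally handle in weeks.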
Ecobee Failure Signature in Packet Captures
A healthy ecobee produces:
- TCP traffic to 161.38.184.18:8190 (ecobee.com cloud, persistent TLS session)
- TCP traffic to peer ecobees on port 1201 (local mesh, TLS encrypted)
- Periodic mDNS announcements (_ecobee._tcp.local)
- ARP replies
A wedged ecobee produces:
- LLC XID frames only (Basic Format, Type 1 LLC, Class I, Window Size 1)
- Every 6-30 seconds, irregular intervals
- No IP traffic of any kind
- No ARP replies
- WiFi association maintained (AP still sees the client)
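The two signatures above can be turned into a rough classifier. This is a minimal sketch, not the pipeline's actual code: it assumes each captured frame has already been reduced to one of four hypothetical labels ("ip", "arp", "mdns", "llc_xid"), and the function name and return values are made up for illustration.

```python
from collections import Counter

def classify_capture(frame_kinds):
    """Classify one ecobee's recent frames.

    frame_kinds is a list of labels, one per captured frame, using the
    hypothetical vocabulary "ip", "arp", "mdns", and "llc_xid".
    """
    counts = Counter(frame_kinds)
    if not counts:
        return "silent"        # nothing captured at all
    if counts["ip"] or counts["arp"]:
        return "healthy"       # unicast IP or ARP means the IP stack responds
    if counts["mdns"]:
        return "mdns_only"     # discovery traffic only; not the wedged signature
    if counts["llc_xid"]:
        return "wedged"        # RF-active, but no IP traffic of any kind
    return "unknown"

print(classify_capture(["llc_xid", "llc_xid", "llc_xid"]))
```

Keeping "mdns_only" separate from "wedged" matters because a sniffer behind a hardware switch chip, or an ecobee that has just rebooted, can legitimately show nothing but mDNS.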
Ecobee Hardware Details
- Local mesh: TCP port 1201 between peer ecobees on same subnet
- mDNS service: _ecobee._tcp.local
How to Determine if an Ecobee is Connected
Do NOT rely on:
Home Assistant state: HA gets ecobee status from the ecobee.com cloud API. The cloud caches connection state and is unreliable in BOTH directions:
- May report "connected" with temperature readings for hours after an ecobee has actually crashed. On 2026-05-03, HA reported two different ecobees as connected with live temperatures when both had been in the wedged LLC-only state.
- May briefly report "unavailable" for ecobees that are actually connected. On 2026-05-04, HA triggered ha_unavailable alerts for two ecobees while both were confirmed connected via conntrack. These are cloud API glitches.
The HA listener is still useful as ONE trigger for archiving pcap data, but its state should never be used as the sole determination of ecobee connectivity.
ICMP ping: Ecobees may not respond to ICMP ping even when fully connected. This is normal behavior for the ecobee WiFi SoC, not an indication of a problem.
Reliable methods:
MikroTik ARP table (most reliable across all sites):
/ip/arp/print where mac-address~"44:61:32"
Interpretation:
- Status "reachable" or "stale" WITH a MAC address = ecobee has working IP stack
- Status "failed" or entry missing/no MAC = ecobee IP stack is dead, needs power cycle
- ARP entries expire naturally, so absence alone is not proof of failure -- but "failed" status IS proof (the MT actively tried to reach the ecobee and got no response)
MikroTik connection tracking (definitive when visible):
/ip/firewall/connection/print where dst-address~"161.38.184.18"
Interpretation:
- An "established" TCP connection to 161.38.184.18:8190 = ecobee is definitely connected to the cloud right now. The flags "SAC" (Seen-reply, Assured, Confirmed) prove bidirectional traffic.
- HOWEVER: on MTs with fasttrack/hw-offload enabled, established connections are offloaded to hardware and do NOT appear in the connection table. Empty conntrack on these devices is inconclusive.
- On some devices, the hardware switch chip setup means traffic is routed through the CPU, so conntrack entries ARE visible (confirmed working).
CRITICAL: Check conntrack on the ROUTING device, not the AP.
Some devices operate as a bridge/AP only and do not route or NAT ecobee traffic, so conntrack must be checked on the actual gateway/NAT device.
/ip/firewall/connection/print where dst-address~"161.38.184.18" and src-address~"<ecobee-address>"
The same hardware-switch-bypass principle applies to the TZSP sniffer on such devices: it sees only mDNS (multicast) from the ecobee, not unicast cloud traffic, because the switch chip forwards unicast frames to the upstream port without involving the CPU. A pcap capture from such a device's sniffer therefore shows only mDNS even when the ecobee is fully connected -- do not interpret mDNS-only pcap traffic as a sign of failure on these devices.
Summary decision tree:
- Check ARP -- if "failed" or missing with no MAC -> DEAD, power cycle
- Check conntrack -- if "established" to 161.38.184.18 -> ALIVE
- If conntrack empty and fasttrack enabled -> check ARP only
- HA status -> use as supplementary info only, never as sole determination
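The decision tree above could be expressed roughly as follows. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the inputs are assumed to have been read from the router already, and a missing ARP entry is treated as inconclusive rather than dead, following the caveat that ARP entries expire naturally.

```python
def ecobee_state(arp_status, has_mac, conntrack_established):
    """Return "ALIVE", "DEAD", or "INCONCLUSIVE" for one ecobee.

    arp_status: "reachable", "stale", "failed", or None if no ARP entry exists.
    has_mac: True if the ARP entry carries a MAC address.
    conntrack_established: True if an established cloud connection is visible.
    """
    if arp_status == "failed":
        return "DEAD"              # the router probed and got no answer
    if conntrack_established:
        return "ALIVE"             # live TCP session to the cloud
    if arp_status in ("reachable", "stale") and has_mac:
        return "ALIVE"             # working IP stack
    # An empty conntrack table is never counted as evidence, because
    # fasttrack/hw-offload can hide established connections.
    return "INCONCLUSIVE"

print(ecobee_state("failed", False, False))
print(ecobee_state("stale", True, False))
```

HA state is deliberately not an input here; it stays supplementary.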
Protections Against Future SYN Scan Crashes
Layer 1: MikroTik Firewall (defense in depth)
Firewall rules on each ecobee-serving MT rate-limit inbound SYN packets to 5/second with a burst of 10. This is 150x more than an ecobee needs in normal operation (1-2 new TCP connections per minute) but kills any port scan dead.
Each MT has:
- Address list "ecobees" containing the local ecobee IP(s)
- Forward chain rule: accept SYN to ecobees at 5/sec burst 10
- Forward chain rule: drop excess SYNs to ecobees
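To illustrate why 5/sec with a burst of 10 leaves normal traffic alone while starving a scan, here is a small simulation. It approximates the rate limit as a simple token bucket, which is an assumption for illustration and not a description of how RouterOS implements its limiter; the function name and the sample traffic are made up.

```python
def accepted_syns(arrival_times, rate=5.0, burst=10):
    """Count SYNs that a token-bucket limiter would accept.

    arrival_times: ascending timestamps in seconds.
    rate: tokens added per second; burst: bucket capacity.
    """
    tokens = float(burst)          # bucket starts full
    last = None
    accepted = 0
    for t in arrival_times:
        if last is not None:
            # Refill for the elapsed time, never beyond the burst size.
            tokens = min(float(burst), tokens + (t - last) * rate)
        last = t
        if tokens >= 1.0:
            tokens -= 1.0
            accepted += 1          # packet passes; otherwise it is dropped
    return accepted

scan = [i / 1500 for i in range(1500)]   # one second of a 1,500 SYN/s scan
normal = [0.0, 30.0, 60.0, 90.0]         # roughly two new connections per minute

print(accepted_syns(scan), "of", len(scan), "scan SYNs accepted")
print(accepted_syns(normal), "of", len(normal), "normal SYNs accepted")
```

In this model only the initial burst plus a few refilled tokens get through during the scan, while every normal connection attempt is accepted.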
Layer 2: Scan Script Exclusion
The scheduled scan should explicitly exclude all ecobee devices by IP.
Layer 3: Reduced Scan Aggressiveness
The monthly scan was changed from:
- T4, --min-rate 1500 (extremely aggressive, ~4 hours)
To:
- T3, --max-rate 300, --scan-delay 500ms (gentle, ~8-12 hours)
- Scheduled 10pm UTC on the 1st of each month (was 3am)
- Cron on dos-checker: 0 22 1 * * ...
This protects ALL devices on the network, not just ecobees. The original rate was aggressive enough to destabilize MikroTik SSH services across multiple sites.
Packet Capture Pipeline
Architecture
The pipeline runs on a Linux LXC container in Proxmox (named "dos-checker") and captures ecobee WiFi traffic from two types of sources:
TZSP (MikroTik /tool/sniffer): The MT filters traffic by ecobee MAC and streams raw Ethernet frames via TZSP (UDP/37008) to dos-checker. Used for sites where the MT device is the WiFi AP. (Except where the MT device is a simple bridge.)
UniFi SSH tcpdump: An app on dos-checker SSHs into the UniFi APs and pipes tcpdump output back.
In a mixed environment (UniFi APs and an MT router), the MT router might not be able to sniff Ethernet traffic because its hardware switch chip forwards frames in silicon, bypassing the CPU entirely.
Rolling Buffer
Each source writes 60-second pcap files, keeping 60 files = 60 minutes of history. On disconnect, the archive script copies the entire buffer plus creates a MAC-filtered merged pcap for quick analysis.
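Keeping the buffer at 60 files could be done with a pruning step like the one below. This is a minimal sketch, not the actual receiver code: the function name is hypothetical, and it assumes the pcap file names sort chronologically (for example because they embed a timestamp), which the post does not state.

```python
from pathlib import Path

def prune_rolling_buffer(directory, keep=60):
    """Delete the oldest pcap files so that at most `keep` remain.

    Assumes file names sort chronologically; that naming scheme is an
    assumption. Returns the list of deleted paths.
    """
    files = sorted(Path(directory).glob("*.pcap"))
    # Everything before the newest `keep` files is stale.
    stale = files[:max(len(files) - keep, 0)]
    for path in stale:
        path.unlink()
    return stale
```

Calling this after each 60-second rotation would hold every source at roughly 60 minutes of history.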
Disconnect Detection (two independent triggers + SYN flood alert)
- HA WebSocket listener: Subscribes to state_changed events. When any ecobee entity transitions to "unavailable", triggers an archive.
- Activity monitor: Scans pcap files every 60 seconds, counting IP frames per ecobee MAC. If a previously-active MAC goes silent for >15 minutes, triggers an archive. This catches the "wedged but RF-active" state that HA and the cloud miss.
- SYN flood detector: Integrated into the activity monitor. Each scan cycle checks for TCP SYN packets destined to ecobee MACs with more than 10 unique destination ports. Normal ecobee traffic hits only 2-3 ports (8190 cloud, 1201 mesh). A port scan hits hundreds. If detected, fires an immediate email alert BEFORE the ecobee crashes. This catches scans that bypass the MT firewall (e.g., from the same L2 subnet as the ecobee).
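The silence check and the SYN flood check described above boil down to two small pieces of logic. The sketch below is only an illustration: it assumes the pcap parsing has already happened and works on plain Python data, and the function names and input shapes are hypothetical rather than taken from activity_monitor.py.

```python
from collections import defaultdict

SILENCE_SECONDS = 15 * 60      # ">15 minutes" of IP silence
PORT_THRESHOLD = 10            # "more than 10 unique destination ports"

def silent_macs(last_ip_seen, now):
    """MACs whose newest IP frame is older than the silence window.

    last_ip_seen maps MAC -> timestamp (seconds) of the newest IP frame.
    """
    return sorted(mac for mac, ts in last_ip_seen.items()
                  if now - ts > SILENCE_SECONDS)

def scanned_macs(syn_packets):
    """MACs that received SYNs on more than PORT_THRESHOLD distinct ports.

    syn_packets is an iterable of (dst_mac, dst_port) pairs, one per TCP SYN.
    """
    ports = defaultdict(set)
    for dst_mac, dst_port in syn_packets:
        ports[dst_mac].add(dst_port)
    return sorted(mac for mac, seen in ports.items()
                  if len(seen) > PORT_THRESHOLD)
```

Counting distinct ports rather than raw SYN volume keeps a chatty but legitimate peer from triggering the scan alert.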
Archive Output
Each archive contains:
- metadata.txt (entity, MAC, trigger reason, timestamp)
- source_/ecobee_.pcap (raw per-source pcap files, full 60-min buffer)
- filtered_MACADDR.pcap (merged, MAC-filtered pcap for quick review)
Archives are saved under /root/drop/ecobee_telemetry/archive/.
Key Files on dos-checker
| File | Purpose |
| --- | --- |
| /root/ecobee_telemetry/tzsp_receiver.py | TZSP UDP listener, writes rotating pcaps |
| /root/ecobee_telemetry/ha_disconnect_listener.py | HA WebSocket, triggers archive |
| /root/ecobee_telemetry/activity_monitor.py | IP silence detector, triggers archive |
| /root/ecobee_telemetry/archive_pcap.py | Copies buffer + builds filtered pcap |
| /root/ecobee_telemetry/unifi_pcap_streamer.sh | SSH tcpdump to UniFi APs |
| /root/ecobee_telemetry/health_monitor.sh | 15-min cron, checks everything |
| /root/ecobee_telemetry/apply_ecobee_fw.sh | Applies MT firewall rules |
| /root/ecobee_telemetry/setup_mt_sniffer.sh | Configures MT TZSP sniffer |
| /root/drop/ecobee_telemetry/rolling/ | Live pcap buffers (8 subdirs) |
| /root/drop/ecobee_telemetry/archive/ | Disconnect archives |
| /root/drop/ecobee_telemetry/disconnect_log/disconnects.jsonl | Event log |
| /root/drop/ecobee_telemetry/health_monitor.log | Health check log |
Services (all enabled, Restart=always)
ecobee-tzsp-receiver
ecobee-ha-listener
ecobee-activity-monitor
ecobee-unifi-371-90
ecobee-unifi-371-93
ecobee-unifi-355-shop
ecobee-unifi-355-white
Monitoring Commands
Definitive per-ecobee status:
bash /root/ecobee_status.sh
This script checks each ecobee via the most reliable method available for its site: conntrack, ARP (where fasttrack hides conntrack), or pcap IP frame analysis.
Check for new disconnects:
tail -f /root/drop/ecobee_telemetry/disconnect_log/disconnects.jsonl
Check pipeline health:
tail -50 /root/drop/ecobee_telemetry/health_monitor.log
Check all service status (on dos-checker):
systemctl is-active ecobee-{tzsp-receiver,ha-listener,activity-monitor,unifi-371-90,unifi-371-93,unifi-355-shop,unifi-355-white}
Hard-Won Lessons
- mergecap snaplen mismatch: TZSP pcaps have snaplen 65535, UniFi have 2048. mergecap produces pcapng with mixed snapshot lengths. tcpdump 4.99.3 silently fails to read these (0 packets). Fix: filter each source file individually with tcpdump, then merge only the filtered results.
- MT hardware switch chip: On certain MT devices (RB750Gr3), Ethernet is forwarded in silicon, so /tool/sniffer cannot capture local traffic there. Capture at the UniFi APs instead.
- UniFi SSH rate limiting: BusyBox sshd on UniFi APs blocks source IPs that reconnect too rapidly. Fix: exponential backoff in the streamer (10s->60s cap).
- tcpdump -Z privilege drop: Local tcpdump drops to user 'tcpdump' after first file rotation, losing write access to /root paths. Fix: -Z root flag.
- Activity monitor CPU: Original per-MAC scan ran ~1000 tcpdump invocations per 60s cycle (15 MACs x ~80 files). Fix: single-pass scan per file, count all MACs in output. Reduced to ~80 invocations, 1.4s scan time.
- Duplicate logging: systemd StandardOutput=append and Python manual file writes to the same log file caused every line to appear twice. Fix: removed manual file writes from Python, let systemd handle it.
- RouterOS password-authentication=yes-if-no-key: If an SSH key is imported for a user, password auth is completely disabled for that user. If key auth then fails (network issues, key mismatch), you're locked out entirely. Fix: set password-authentication=yes on all MTs so password always works as fallback.
- Bridge, not a router: MT devices configured as bridges (not routers) do not route or NAT; ecobee traffic to the cloud is handled at the router/gateway, so check cloud connectivity there. The TZSP sniffer on a bridge-only device sees only mDNS, not cloud traffic.
- Ecobee post-reboot mDNS-only state: After a power cycle, an ecobee may spend time in mDNS discovery mode (_ecobee._tcp.local queries) before establishing its cloud connection. This is normal boot behavior, not a failure. Do not confuse mDNS-only traffic with a wedged state -- wedged ecobees produce LLC/XID frames, not mDNS.
- HA unavailable false alarms: The HA WebSocket listener fires on every "unavailable" transition, but the ecobee cloud API occasionally reports brief unavailable states for ecobees that are actually connected. These generate disconnect log entries and email alerts that are false positives. Always verify via conntrack/ARP before concluding an ecobee is actually down.
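As an illustration of the UniFi SSH backoff lesson in the list above, a capped exponential backoff can be written as a small generator. The streamer itself is a shell script, so this Python version only sketches the idea; the doubling factor is an assumption, since the post gives just the 10s start and the 60s cap.

```python
import itertools

def backoff_delays(initial=10, cap=60, factor=2):
    """Yield reconnect delays in seconds: initial, then growing, capped."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)   # assumed doubling between attempts

print(list(itertools.islice(backoff_delays(), 5)))   # [10, 20, 40, 60, 60]
```

Resetting to the initial delay after a connection has stayed up for a while would be a natural companion to this, so that one bad night does not leave every later reconnect at the cap.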