Auto reboot WiFi on HeaterMeter


 
Chris,

I've updated my script to restart the networking stack instead of reloading it.

You can get the updated script by repeating steps 1 to 3 in my first reply in this thread.
 
I've tried the solution (different firmware plus a patch to the brcmfmac driver) and the good news is that the HeaterMeter no longer gives the errors on disconnect. It reconnects to wifi, gets a new IP, and can connect out and update heatermeter.com/devices. The bad news is that I can't connect to it.

I can see that when I try to connect (by IP), ARP resolution fails. If I try to ping it from my router, which already has the HeaterMeter MAC cached from doing DHCP, the ping works. On my Windows machine, I can add a static ARP cache entry with
Code:
netsh interface ip add neighbors "Ethernet 3" "192.168.2.44" "b8-27-eb-cf-f2-fd"
And I can then ping the HeaterMeter and also connect to the webui and everything is fine. If I delete the static ARP entry, it stops working. I tried a `/etc/init.d/network reload` on the HeaterMeter itself and it does nothing (as none of the network config has changed). Completely stopping the network and restarting it appears to fix whatever's in a weird state on the Pi but that's not very helpful. Rebooting also fixes the issue.
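
For anyone testing from a Linux box instead of Windows, a rough equivalent of the static ARP entry above might look like the following. This is only a sketch: the interface name eth0 is an assumption, and static_arp_cmd is a made-up helper that just prints the iproute2 command so it can be checked before being run as root.

```shell
#!/bin/sh
# Hypothetical helper (not part of HeaterMeter): print the iproute2 command
# that pins an IP to a MAC, accepting the dashed MAC format used by netsh.
static_arp_cmd() {
    ip_addr=$1
    mac=$(printf '%s' "$2" | tr '-' ':')   # b8-27-eb-.. becomes b8:27:eb:..
    dev=${3:-eth0}                         # assumed interface name
    printf 'ip neigh replace %s lladdr %s dev %s nud permanent\n' \
        "$ip_addr" "$mac" "$dev"
}

# Usage: static_arp_cmd 192.168.2.44 b8-27-eb-cf-f2-fd eth0
```

The printed command can be undone with `ip neigh del 192.168.2.44 dev eth0`, which should put the machine back in the failing state for further testing.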

I'm not sure what to do next to fix this. I will not add a service that watches the wifi IP and, if it changes, restarts all the networking; that's an awful hack. Linux devices change IPs all the time and continue to work, so there's something unique here that needs to be fixed, but I'm not sure where to even look.

Sounds like the Pi is not responding to ARP requests. Does an ifconfig show a NOARP flag on the offending interface?
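
If ifconfig isn't handy, the same flag can probably be read from sysfs. A minimal sketch, assuming the interface is called wlan0 (not confirmed in this thread); noarp_state is a hypothetical helper name, and it only decodes the IFF_NOARP bit (0x80) from the hex flags value.

```shell
#!/bin/sh
# IFF_NOARP is bit 0x80 in the Linux interface flags word (linux/if.h).
# noarp_state takes the hex value read from /sys/class/net/<iface>/flags.
noarp_state() {
    if [ $(( $1 & 0x80 )) -ne 0 ]; then
        echo "NOARP set"
    else
        echo "NOARP clear"
    fi
}

# Usage on the device (wlan0 is an assumed interface name):
#   noarp_state "$(cat /sys/class/net/wlan0/flags)"
```

A flags value made up of only UP, BROADCAST, RUNNING and MULTICAST works out to 0x1043, which this reports as "NOARP clear".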
 
Yeah I can't tell if the ARP isn't getting to the Pi or the wifi driver isn't receiving them properly or the router itself isn't properly associated or something and therefore not forwarding the ARP to the wifi device. It is very odd that the Pi gets a different IP address every time the router reboots, and all the other devices on my network keep their same IPs. It makes me think the driver is doing something wrong on the networking stack. The 3B+ works fine and keeps the same IP, but it uses a different chip/firmware albeit with the same driver (brcmfmac).

I did just check the interface and it shows UP BROADCAST RUNNING MULTICAST. It must be doing something right because its arp cache shows a growing list of addresses over time. I've also tried with the adapter in promiscuous mode and with/without power saving enabled which made no difference.

I also need to set up another wifi router to test with because having my desktop go off the network for several minutes for each reboot is annoying, especially because it breaks the connection to my development VM.
 
<chuckle> Bryan, I'm almost surprised you don't have a dedicated WiFi network just for HM development like this. Not necessarily for isolation (although that can be handy,) just so you can drop the WiFi network at will and not lose any other connections. Are you running consumer grade gear or enterprise?
 
It is just consumer hardware. Having to work on wifi connectivity isn't something I ever do so it hasn't been an issue needing a second AP around, also all my important devices are hard wired. My neighborhood is pretty saturated on 2.4GHz too so I didn't want to make it worse by adding another network. I did find that it fails just as well if I just disable the wifi radio on the router and then turn it back on, so it is a lot less of a pain to test.

It isn't just ARPs that have a problem getting to it. I can do an SSDP M-SEARCH request and there's no response from the device because it doesn't get the multicast packet, yet *during* the search I can see a NOTIFY coming from the device (because it sends one every so often). It is pretty crazy. There's definitely something wonky in the driver/Broadcom firmware blob, because I can run tcpdump on my router and on the HeaterMeter and I see the ARP broadcasts from my computer on the router but nothing on the HeaterMeter. For what it is worth, I have a Pi 3 that does a buncha home automation stuff, and testing it I see it exhibits the same behavior. If the IP address is still in the ARP cache, it works perfectly and never has an issue.

I've just pushed a new snapshot that uses the updated brcm43430 firmware but I think it causes more problems than it solves. Even the RPi people seem to be going back and forth on using it versus the old version. I also can't tell if all the mucking about I've done on my home network is causing problems. I suppose the good news is that the new firmware does reconnect, although whether the device is actually accessible could be a different story.
 
That sounds like a pretty significant bug. I use a DHCP reservation and my router is pretty stable so I have not run into THIS problem, but my Edimax interfaces don't like to stay connected, hence my interest in this thread and my plan to replace my old pi with a Zero W. Hmmmm.
 
I just installed the latest snapshot release on my HMv4.2.4 with rPi zero-w and both the onboard wifi and the RT5370 reconnected after I reset my wireless router. Not sure if the driver change will cause other downstream issues, but the failure to reconnect after connection drop issue seems to be fixed.
 
It is just consumer hardware. Having to work on wifi connectivity isn't something I ever do so it hasn't been an issue needing a second AP around, also all my important devices are hard wired. My neighborhood is pretty saturated on 2.4GHz too so I didn't want to make it worse by adding another network.

I put in a Ubiquiti Unifi access point a little over a year ago... and there's no way that I will ever go back to consumer hardware, just from a local throughput aspect. However, yeah.... it can be a problem in congested areas; it'll run up to 4 WiFi networks on both the 2.4 and 5 GHz bands (same SSIDs on both, actually.)
 
Thanks for clarifying. I’m having the same (I guess) problems, too. So, I’m following this thread closely. Anything I can do to help, let me know.
 
Mine seems to be working great. It’s just sitting there right now. Probably smoke something this weekend to make sure all good. But works great when WiFi reboots.
 
I've had the snapshot running overnight and rebooted my wifi router twice; each time the onboard zero-w reconnected and stayed connected. Like Chris, mine is just sitting there doing nothing since this is my spare unit, so I'll have to swap it out with my main unit to give it a better test...
Meanwhile I flashed my main unit with the latest snapshot to test, but it's running a model B with an RT5370 wifi. Flashing this one went fine as well...
 
Thanks for reporting back guys! Maybe I was running into some other problem with my testing that was causing some other weirdness. I would delete the ARP entry from my local machine (with either arp -d [heatermeterip] or just arp -d) and then I couldn't connect and saw all those problems with ARPs not being responded to. If I let it just happen naturally by like rebooting the PC or letting it sit until it drops off it seems to work fine. I wonder if the router has some sort of anti-ARP spoof/bomb/poison security that was preventing the ARP requests from being sent? I hate these sort of solutions where after 16 hours of looking at it, swapping one mystery binary blob for a different one sort of makes it work better?
 
I didn't try the arp delete prior to using the latest snapshot, but I can arp -d the HM IP from another pi and I can see it re-arps just fine after when attempting to connect via http.

Code:
$ sudo tcpdump -n host 192.168.1.89
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:06:05.577224 ARP, Request who-has 192.168.1.89 tell 192.168.1.85, length 28
10:06:05.579641 ARP, Reply 192.168.1.89 is-at b8:27:eb:0b:0a:96, length 46
10:06:05.580005 IP 192.168.1.85.35928 > 192.168.1.89.80: Flags [S], seq 3685867767, win 29200, options [mss 1460,sackOK,TS val 1363125897 ecr 0,nop,wscale 7], length 0
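
For longer captures, something like the following could flag requests that never got an answer. It is a rough sketch that only parses the text format shown above; unanswered_arp is a hypothetical helper name, and it has to be fed a capture taken somewhere that sees both the request and the reply (for example, the requesting machine).

```shell
#!/bin/sh
# Reads `tcpdump -n` text on stdin and prints each IP that was asked for
# (who-has) but never answered (Reply ... is-at) within the capture.
unanswered_arp() {
    awk '
        / ARP, Request who-has / { asked[$5] = 1 }
        / ARP, Reply /           { answered[$4] = 1 }
        END {
            for (ip in asked)
                if (!(ip in answered)) print ip
        }
    ' | sort
}

# Usage (stops after 50 packets so awk reaches its END block):
#   sudo tcpdump -n -c 50 arp | unanswered_arp
```

Run against the capture above it prints nothing, since the who-has for 192.168.1.89 has a matching reply.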
 
I can confirm as well. I haven’t turned it off since installing the snapshot. My router reboots at 5:00 every morning. Still running strong!! Thanks Bryan
 
You guys are great! Thanks for the updates and extra testing. I'll push the changes back up to github next time I'm on the VM.
 

 
