Hello folks.
Yesterday I started getting yelled at by my loving family because their internet was going up and down (mostly down). The web UI never showed more than a minute of uptime before it went away and I had to log in again.
Using ssh, I got into the box and looked around. The logs ran up to about the 43-second mark, then it worked for a while, then came a silent reboot.
I ran top with a one-second delay for a few reboot cycles. In every case, when it died the process on top was bwmon_gargoyle. After some googling I tried /etc/init.d/bwmon_gargoyle stop (and disable).
Stability returned.
Archer C7 v2 running 1.9.1. QoS enabled for download and upload - although upload may have enabled itself as I don't remember doing it.
The router gets a daily 4am reboot. Nothing had changed in my config in weeks. The only thing I can think of is that the qos/bandwidth logs had grown to some critical level that instantly killed the box when bwmon_gargoyle was started.
I took the opportunity to update to 1.9.2 with a fresh config. However, I did a backup of my config, so I have a copy of everything in /usr/data/bwmon in the crash scenario.
Has anyone seen this before? Any data I can share to help track this down?
And thanks for everything you do!
Time Warp: Crash 1 minute after boot from bwmon_gargoyle
Last edited by smiller on Wed Mar 29, 2017 12:39 am, edited 1 time in total.
Re: Crash 1 minute after boot from bwmon_gargoyle
You fell victim to the "time warp bug" (that's what I'm going to call it from now on so people can search for it).
BW Usage data got out of sync with real time, and when the module attempted to reload it, it caused a kernel panic.
The cause is unknown and hard to track down.
Purging /usr/data/bwmon causes the reboot loop to stop.
Solution:
Lantis wrote: Disconnect the WAN. Allow it to boot successfully. Purge /usr/data/bwmon/*. Reconnect WAN and reboot.
If disconnecting the WAN doesn't stop the boot loop, drop it into failsafe and do the same thing.
https://lantisproject.com/downloads/gargoylebuilds for the latest releases
Please be respectful when posting. I do this in my free time on a volunteer basis.
https://lantisproject.com/blog
Re: Time Warp: Crash 1 minute after boot from bwmon_gargoyle
Lantis,
Thanks.
I updated the thread title to put "Time Warp" in it.
Now I'm humming to myself "Let's do the time warp again!"
Re: Time Warp: Crash 1 minute after boot from bwmon_gargoyle
Thank goodness for this forum!
Woke up today (the day of the daylight savings change) to my router rebooting every minute.
I was able to get into the web interface (basically copied the "bandwidth.sh" link into my browser and hit refresh nonstop until I caught it right after the router rebooted) and hit "Delete Data" under the "Bandwidth Usage Table".
It's been 15 minutes and no reboots yet. Did I pretty much solve the issue of needing to clean the data from /usr/data/bwmon (without having to SSH in)?
Re: Time Warp: Crash 1 minute after boot from bwmon_gargoyle
michaelmantis wrote: Did I pretty much solve the issue of needing to clean the data from /usr/data/bwmon (without having to SSH in?)
If you do this, click Status - Bandwidth Usage - Delete Data in the GUI to solve the problem.
Turris Omnia with OpenWrt 21.02 - Tested
Linksys WRT3200ACM with Gargoyle 1.13.x
TL-WR1043ND v2 with Gargoyle 1.10.0
http://gargoyle.romanhk.cz custom builds by gargoyle users
Re: Time Warp: Crash 1 minute after boot from bwmon_gargoyle
Quickly hitting "Status - Bandwidth Usage - Delete Data in the GUI" (within a minute of boot up) solved the problem.
Been up and running for 10 hours with no issue.
Re: Time Warp: Crash 1 minute after boot from bwmon_gargoyle
I'm cautiously optimistic... I might have squashed the Time Warp bug.
It finally got me (daylight savings kicked in over the weekend here) and I was able to dump all of the data for analysis and reliably reproduce the bug.
https://github.com/ericpaulbishop/gargo ... th.c#L2364
The break here doesn't increment the index, so the while loop underneath it skips ahead another 8 bytes. This very quickly gets out of hand and we end up with an out of bounds memory read.
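For anyone trying to picture this class of bug, here is a minimal sketch. It is not the actual bandwidth.c code: `walk_entry`, `walk_all`, `RECORD_SIZE`, the `limit` condition and the `buggy` switch are all made up for illustration, and the sketch adds bounds checks so the overrun is reported instead of reading bad memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_SIZE 8 /* assumed: one 64-bit value per record */

/*
 * Hypothetical walker, not the real bandwidth.c code.
 * Reads up to `count` records for one entry starting at *offset, stops
 * reading at the first value above `limit`, then skips the unread records.
 * Returns 0 on success, -1 if the walk would run past the end of `buf`.
 */
static int walk_entry(const uint8_t *buf, size_t len, size_t *offset,
                      uint32_t count, uint64_t limit, int buggy)
{
    uint32_t i;

    for (i = 0; i < count; i++) {
        uint64_t value;

        if (*offset + RECORD_SIZE > len)
            return -1;
        memcpy(&value, buf + *offset, RECORD_SIZE);
        *offset += RECORD_SIZE;

        if (value > limit) {
            /* Record i is already consumed. Leaving via break skips the
             * loop's own i++, so the fix is to count it by hand. */
            if (!buggy)
                i++;
            break;
        }
    }

    /* Skip whatever was not read. If i is one short, this runs one time
     * too many and the offset drifts 8 bytes into the next entry. */
    while (i < count) {
        if (*offset + RECORD_SIZE > len)
            return -1;
        *offset += RECORD_SIZE;
        i++;
    }
    return 0;
}

/* Walks `entries` entries of `count` records each. Returns the final
 * offset, or -1 if any entry ran out of bounds. */
static long walk_all(const uint8_t *buf, size_t len, uint32_t entries,
                     uint32_t count, uint64_t limit, int buggy)
{
    size_t offset = 0;
    uint32_t e;

    for (e = 0; e < entries; e++) {
        if (walk_entry(buf, len, &offset, count, limit, buggy) != 0)
            return -1;
    }
    return (long)offset;
}
```

With two entries of three records each holding 1, 100, 2 and a limit of 50, the fixed path ends exactly at byte 48. The buggy path is already 8 bytes off after the first entry and runs past the end of the buffer on the second.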
https://github.com/ericpaulbishop/gargo ... th.c#L2557
This one is not strictly related to the crash, but it was stopping the data from reloading correctly.
If you had a large bandwidth history file, it may need to be loaded in 2 (or more) parts. The first part would load fine. The second part would indicate it was up to, say, index 80 out of 100, with 20 in the file (the residual). But 80 > 20, so it never loaded the remaining 20.
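The 80-versus-20 comparison can be sketched like this. Again, this is only an illustration under my own assumptions, not the real loader: `load_chunk`, its parameters and the `buggy` switch are hypothetical names.

```c
#include <stdint.h>

/*
 * Hypothetical chunk loader, not the real bandwidth.c code.
 * Copies one chunk of a history with `total` records into `dest`.
 * `start_index` is the position of the chunk's first record in the whole
 * history; `in_chunk` is how many records this chunk carries.
 * Returns the number of records copied.
 */
static uint32_t load_chunk(uint64_t *dest, uint32_t total, uint32_t start_index,
                           const uint64_t *chunk, uint32_t in_chunk, int buggy)
{
    uint32_t loaded = 0;
    uint32_t k;

    if (buggy) {
        /* Compares the global position against the chunk-local count.
         * For a second chunk (start 80, 20 in the chunk) 80 < 20 is false
         * straight away, so nothing is copied. */
        for (k = start_index; k < in_chunk; k++) {
            dest[k] = chunk[k];
            loaded++;
        }
        return loaded;
    }

    /* Chunk-local index k, global position start_index + k. */
    for (k = 0; k < in_chunk && start_index + k < total; k++) {
        dest[start_index + k] = chunk[k];
        loaded++;
    }
    return loaded;
}
```

The point of the fixed branch is to keep two separate counters: one that walks the chunk from zero, and one that says where the record belongs in the full history.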
I'm testing these fixes locally before shipping them out in the next 1.15 release.