QoS stops working after a few days on 1.5.0


draga
Posts: 13
Joined: Mon Nov 14, 2011 1:49 pm

QoS stops working after a few days on 1.5.0

Post by draga »

Hello,
I've been running Gargoyle for a week now and it's amazing. Its QoS saved my (Internet) life. Really impressive work.
I have noticed one problem, though. I'm using 1.5.0 on my WNDR3800, and from time to time QoS stops working. When I ssh into the router, I see that qosmon isn't running anymore. There is nothing in the logs or in dmesg, and everything else runs perfectly. If I disable and re-enable QoS from the web interface, the /etc/config/qos_gargoyle file ends up corrupted: it usually loses all the inbound QoS rules.
So the best way to get it back is to reboot (or maybe restart bwmon_gargoyle?). Has anyone else noticed something like this? It happened on the very first day I used Gargoyle, then everything was fine for 4 days, then it happened again, and then again 8 hours later. There were no particular load peaks at those moments (I only have a 2 Mbit connection and the router has plenty of hardware headroom). If I see it again, I will set up a cron job that checks whether qosmon is running and, if not, restarts the service. Still, I'm curious whether I'm the only one experiencing this, and maybe I can help debug it.
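
For anyone wanting to try the cron watchdog idea mentioned above, here is a minimal sketch. It assumes that the QoS service is controlled by an init script at /etc/init.d/qos_gargoyle (an assumption, check the name on your build) and that `pidof` is available. The function names are made up for illustration.

```shell
#!/bin/sh
# Watchdog sketch: restart QoS when the qosmon daemon is missing.
# QOS_INIT is an assumption; point it at the init script your firmware uses.
QOS_INIT="${QOS_INIT:-/etc/init.d/qos_gargoyle}"

qosmon_running() {
    # pidof exits non-zero when no process with that name exists
    pidof qosmon >/dev/null 2>&1
}

check_qosmon() {
    if qosmon_running; then
        echo "qosmon running"
    else
        echo "qosmon missing, restarting QoS"
        # Leave a trace in the system log so the restart shows up in logread
        logger -t qos_watchdog "qosmon not running, restarting QoS" 2>/dev/null || true
        "$QOS_INIT" restart
    fi
}

# Only act when called with "run", so the functions can be loaded without side effects
case "${1:-}" in
    run) check_qosmon ;;
esac
```

Saved under a name of your choice and called with `run` from a crontab entry (for example every five minutes), this would only paper over the problem. It does not explain why qosmon exits, so the log is still worth checking.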

pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Re: QoS stops working after a few days on 1.5.0

Post by pbix »

I have never heard such a report before. The file /etc/config/gargoyle is only written by a web page save so it is hard to understand how it could become corrupted on its own.

The next time it happens check for the file /tmp/qosmon.status. It will contain the last thing written before qosmon exited. There might be a clue there.
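
A small sketch for preserving that file before restarting anything, on the assumption that a restarted qosmon would overwrite it. The snapshot directory and function name are hypothetical, and /tmp does not survive a reboot, so copy the result elsewhere if you need to keep it.

```shell
#!/bin/sh
# Sketch: keep a timestamped copy of qosmon's last status file.
STATUS_FILE="${STATUS_FILE:-/tmp/qosmon.status}"
SNAP_DIR="${SNAP_DIR:-/tmp/qosmon_snapshots}"   # hypothetical location

snapshot_qosmon() {
    if [ ! -f "$STATUS_FILE" ]; then
        echo "no status file at $STATUS_FILE" >&2
        return 1
    fi
    mkdir -p "$SNAP_DIR" || return 1
    dest="$SNAP_DIR/qosmon.status.$(date +%Y%m%d-%H%M%S)"
    # Print the path of the copy so it can be attached to a report
    cp "$STATUS_FILE" "$dest" && echo "$dest"
}
```

Calling `snapshot_qosmon` right after noticing that QoS has stopped, and before touching the web interface, keeps the evidence intact.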

I would also suggest that you roll back to v1.4.3 and see if the problem also occurs there. There are no significant QoS differences between 1.4 and 1.4.3
Linksys WRT1900ACv2
Netgear WNDR3700v2
TP Link 1043ND v3
TP-Link TL-WDR3600 v1
Buffalo WZR-HP-G300NH2
WRT54G-TM

draga
Posts: 13
Joined: Mon Nov 14, 2011 1:49 pm

Re: QoS stops working after a few days on 1.5.0

Post by draga »

pbix wrote: I have never heard such a report before. The file /etc/config/gargoyle is only written by a web page save so it is hard to understand how it could become corrupted on its own.
Thinking about it again, I believe it happened AFTER I accessed the router's QoS page from my Android phone. Maybe, for some strange reason, the phone issued a "save" that didn't complete properly. Anyway, I'll take a closer look at it.
pbix wrote: The next time it happens check for the file /tmp/qosmon.status. It will contain the last thing written before qosmon exited. There might be a clue there.
Thank you for the hint. I will check it and report back if it happens again.
pbix wrote: I would also suggest that you roll back to v1.4.3 and see if the problem also occurs there. There are no significant QoS differences between 1.4 and 1.4.3

I was thinking about that, but it would make me lose the 5 GHz capability of my router, and I'd like to keep it.
I'll continue to monitor the situation and report back if the problem occurs again.

Thank you

draga
Posts: 13
Joined: Mon Nov 14, 2011 1:49 pm

Re: QoS stops working after a few days on 1.5.0

Post by draga »

It just happened again. Here's the /tmp/qosmon.status:

Code:

root@dragasNG:/tmp# cat qosmon.status 
State: DISABLED
Link limit: 1800 (kbps)
Fair Link limit: 1301 (kbps)
Link load: 555 (kbps)
Ping: off
Filtered ping: 22 (ms)
Ping time limit: 85 (ms)
Classes Active: 1
Errors: (mismatch,errors,last err,selerr): 0,0,0,-1
ID FFFF, Active 0, Backlog 0, BW bps (filtered): 0
ID 8019, Active 0, Backlog 0, BW bps (filtered): 0
ID 801A, Active 0, Backlog 0, BW bps (filtered): 9
ID 801B, Active 1, Backlog 0, BW bps (filtered): 554584
ID 801C, Active 0, Backlog 0, BW bps (filtered): 940
ID 801D, Active 0, Backlog 0, BW bps (filtered): 0
Last edited by draga on Mon Nov 21, 2011 4:32 pm, edited 1 time in total.
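
To compare several such dumps quickly, the two fields that stand out can be pulled out with `sed`. This is only a sketch that assumes the file layout shown above (a `State:` line and `ID ..., Active N, ...` lines). The function names are made up, and it says nothing about what qosmon means by each state.

```shell
#!/bin/sh
# Sketch: summarise a qosmon status file laid out like the dump above.

qos_state() {
    # Print the value of the "State:" line, e.g. DISABLED
    sed -n 's/^State:[[:space:]]*//p' "$1"
}

active_classes() {
    # Print the IDs of the classes whose "Active" field is 1
    sed -n 's/^ID \([0-9A-Fa-f]*\), Active 1,.*/\1/p' "$1"
}
```

Run against the dump above, `qos_state /tmp/qosmon.status` should print the state and `active_classes /tmp/qosmon.status` the one class that was carrying traffic.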

draga
Posts: 13
Joined: Mon Nov 14, 2011 1:49 pm

Re: QoS stops working after a few days on 1.5.0

Post by draga »

And here's the logread:

Code:

Nov 21 21:12:51 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: disassociated
Nov 21 21:12:52 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: deauthenticated due to inactivity
Nov 21 21:16:19 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: authenticated
Nov 21 21:16:19 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: associated (aid 1)
Nov 21 21:16:19 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a WPA: pairwise key handshake completed (RSN)
Nov 21 21:16:19 dragasNG daemon.info dnsmasq-dhcp[31108]: DHCPREQUEST(br-lan) 192.168.0.105 00:26:08:e1:aa:6a 
Nov 21 21:16:19 dragasNG daemon.info dnsmasq-dhcp[31108]: DHCPACK(br-lan) 192.168.0.105 00:26:08:e1:aa:6a dragasMBP
Nov 21 21:19:31 dragasNG daemon.info hostapd: wlan0: STA 00:90:4c:c5:00:34 WPA: group key handshake completed (RSN)
Nov 21 21:19:31 dragasNG daemon.info hostapd: wlan0: STA 00:26:37:3f:4a:cf WPA: group key handshake completed (RSN)
Nov 21 21:19:36 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a WPA: group key handshake completed (RSN)
Nov 21 21:21:01 dragasNG cron.err crond[32201]: USER root pid 6041 cmd /usr/bin/set_kernel_timezone >/dev/null 2>&1
Nov 21 21:25:58 dragasNG daemon.info dnsmasq[31108]: exiting on receipt of SIGTERM
Nov 21 21:25:58 dragasNG user.notice dnsmasq: DNS rebinding protection is active, will discard upstream RFC1918 responses!
Nov 21 21:25:58 dragasNG user.notice dnsmasq: Allowing 127.0.0.0/8 responses
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: started, version 2.55 cachesize 150
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: compile time options: IPv6 GNU-getopt no-DBus no-I18N DHCP TFTP
Nov 21 21:26:02 dragasNG daemon.info dnsmasq-dhcp[7464]: DHCP, IP range 192.168.0.101 -- 192.168.0.250, lease time 1d
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: using local addresses only for domain lan
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: reading /tmp/resolv.conf.auto
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: using nameserver 62.94.0.2#53
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: using nameserver 62.94.0.1#53
Nov 21 21:26:02 dragasNG daemon.warn dnsmasq[7464]: ignoring nameserver 192.168.0.1 - local interface
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: using local addresses only for domain lan
Nov 21 21:26:02 dragasNG daemon.info dnsmasq[7464]: read /etc/hosts - 3 addresses
Nov 21 21:26:02 dragasNG daemon.info dnsmasq-dhcp[7464]: read /etc/ethers - 1 addresses
Nov 21 21:26:02 dragasNG daemon.warn dnsmasq[7464]: not giving name dragasMBP.lan to the DHCP lease of 192.168.0.105 because the name exists in /etc/hosts with address 192.168.0.100
Nov 21 21:26:02 dragasNG daemon.warn dnsmasq[7464]: not giving name dragasMBP to the DHCP lease of 192.168.0.105 because the name exists in /etc/hosts with address 192.168.0.100
Nov 21 21:26:02 dragasNG cron.err crond[7488]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:26:02 dragasNG cron.err crond[7584]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:26:03 dragasNG daemon.notice miniupnpd[627]: received signal 15, good-bye
Nov 21 21:26:03 dragasNG user.notice miniupnpd: removing firewall rules for eth1 from zone wan
Nov 21 21:26:03 dragasNG local5.notice qosmon[483]: terminated sigterm=15, sel_err=-1
Nov 21 21:26:06 dragasNG user.info firewall: adding lan (br-lan) to zone lan
Nov 21 21:26:06 dragasNG user.info firewall: adding wan (eth1) to zone wan
Nov 21 21:26:08 dragasNG cron.err crond[8195]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:26:09 dragasNG user.notice miniupnpd: adding firewall rules for eth1 to zone wan
Nov 21 21:26:09 dragasNG user.info syslog: SNet version started
Nov 21 21:26:09 dragasNG daemon.notice miniupnpd[8353]: HTTP listening on port 5000
Nov 21 21:26:09 dragasNG daemon.notice miniupnpd[8353]: Listening for NAT-PMP traffic on port 5351
Nov 21 21:26:12 dragasNG cron.err crond[9336]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:26:33 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: disassociated
Nov 21 21:26:34 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: deauthenticated due to inactivity
Nov 21 21:26:37 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: authenticated
Nov 21 21:26:37 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a IEEE 802.11: associated (aid 1)
Nov 21 21:26:37 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a WPA: pairwise key handshake completed (RSN)
Nov 21 21:26:37 dragasNG daemon.info dnsmasq-dhcp[7464]: DHCPREQUEST(br-lan) 192.168.0.105 00:26:08:e1:aa:6a 
Nov 21 21:26:37 dragasNG daemon.info dnsmasq-dhcp[7464]: DHCPNAK(br-lan) 192.168.0.105 00:26:08:e1:aa:6a static lease available
Nov 21 21:26:37 dragasNG daemon.info dnsmasq-dhcp[7464]: DHCPDISCOVER(br-lan) 00:26:08:e1:aa:6a 
Nov 21 21:26:37 dragasNG daemon.info dnsmasq-dhcp[7464]: DHCPOFFER(br-lan) 192.168.0.100 00:26:08:e1:aa:6a 
Nov 21 21:26:38 dragasNG daemon.info dnsmasq-dhcp[7464]: DHCPREQUEST(br-lan) 192.168.0.100 00:26:08:e1:aa:6a 
Nov 21 21:26:38 dragasNG daemon.info dnsmasq-dhcp[7464]: DHCPACK(br-lan) 192.168.0.100 00:26:08:e1:aa:6a dragasMBP
Nov 21 21:27:33 dragasNG authpriv.info dropbear[10685]: Child connection from 192.168.0.100:62299
Nov 21 21:27:37 dragasNG authpriv.notice dropbear[10685]: Password auth succeeded for 'root' from 192.168.0.100:62299
Nov 21 21:29:24 dragasNG cron.err crond[12026]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:29:27 dragasNG cron.err crond[13052]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:29:31 dragasNG daemon.info hostapd: wlan0: STA 00:90:4c:c5:00:34 WPA: group key handshake completed (RSN)
Nov 21 21:29:31 dragasNG daemon.info hostapd: wlan0: STA 00:26:37:3f:4a:cf WPA: group key handshake completed (RSN)
Nov 21 21:29:36 dragasNG daemon.info hostapd: wlan1: STA 00:26:08:e1:aa:6a WPA: group key handshake completed (RSN)


pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Re: QoS stops working after a few days on 1.5.0

Post by pbix »

Your logread output shows the following:
Nov 21 21:26:02 dragasNG cron.err crond[7488]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:26:02 dragasNG cron.err crond[7584]: crond (busybox 1.15.3) started, log level 5
Nov 21 21:26:03 dragasNG daemon.notice miniupnpd[627]: received signal 15, good-bye
Nov 21 21:26:03 dragasNG user.notice miniupnpd: removing firewall rules for eth1 from zone wan
Nov 21 21:26:03 dragasNG local5.notice qosmon[483]: terminated sigterm=15, sel_err=-1
crond starts and then several processes, including qosmon, receive the signal to terminate. What is in your /etc/crontabs/root file?
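
To see every termination in one place, the log can be filtered for the wordings the daemons use. A minimal sketch, assuming only the three phrasings that appear in the log posted in this thread (dnsmasq's "SIGTERM", miniupnpd's "signal 15" and qosmon's "sigterm=15"). The function name is made up.

```shell
#!/bin/sh
# Sketch: list syslog lines in which a daemon reports being terminated.

term_events() {
    # Reads log text on stdin; -i lets "SIGTERM" also match qosmon's "sigterm=15"
    grep -i -E 'SIGTERM|signal 15'
}
```

Used as `logread | term_events`, this would also surface the dnsmasq SIGTERM at 21:25:58 in the posted log, a few seconds before the crond lines, which suggests several services were restarted together rather than qosmon alone.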

draga
Posts: 13
Joined: Mon Nov 14, 2011 1:49 pm

Re: QoS stops working after a few days on 1.5.0

Post by draga »

Here's my crontab:

Code:

0 3     * * *   wifi down
45 6    * * *   wifi up
0,1,11,21,31,41,51 * * * * /usr/bin/set_kernel_timezone >/dev/null 2>&1
0 0,4,8,12,16,20 * * * date -u  +"%Y.%m.%d-%H:%M:%S" >/usr/data/time_backup
0 0,4,8,12,16,20 * * * /tmp/do_bw_backup.sh

pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Re: QoS stops working after a few days on 1.5.0

Post by pbix »

Remove the wifi up/down commands.

draga
Posts: 13
Joined: Mon Nov 14, 2011 1:49 pm

Re: QoS stops working after a few days on 1.5.0

Post by draga »

I can try that, but the problem seems to occur at different times (as in the log I posted), when the wifi up/down commands weren't being executed at all.
Do you think that there might be a problem with the cronjob?
Thank you very much for your help.

pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Re: QoS stops working after a few days on 1.5.0

Post by pbix »

You can remove them all and see what happens. Then add them back one at a time until you find the offender.
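
One way to do that without retyping the entries is to comment them out and re-enable them one by one. This is a sketch only: it assumes the root crontab is at /etc/crontabs/root (the usual place on OpenWrt-based firmware), and the `#off# ` marker and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: disable all cron entries, then bring them back one at a time.
# CRONTAB is an assumption; adjust it if your crontab lives elsewhere.
CRONTAB="${CRONTAB:-/etc/crontabs/root}"

disable_all() {
    # Keep a backup, then prefix every non-comment, non-empty line with "#off# "
    cp "$CRONTAB" "$CRONTAB.bak" || return 1
    sed -i -e '/^#/!s/^./#off# &/' "$CRONTAB"
}

enable_entry() {
    # Re-enable the Nth disabled entry (1-based), leaving the others untouched
    awk -v n="$1" '
        /^#off# / { c++; if (c == n) sub(/^#off# /, "") }
        { print }
    ' "$CRONTAB" > "$CRONTAB.tmp" && mv "$CRONTAB.tmp" "$CRONTAB"
}
```

After each change crond has to reread the file (on OpenWrt-based builds, restarting it through /etc/init.d/cron should do). Then wait long enough for the problem to show up before enabling the next entry with `enable_entry 1`, `enable_entry 1` again, and so on, since each call shifts the remaining disabled entries up by one.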
