WRT54GL OOM, dnsmasq and crond killed

jdmulloy
Posts: 12
Joined: Thu May 27, 2010 5:36 pm

Post by jdmulloy »

I just installed Gargoyle 1.3.5 on my WRT54GL and set it up as the router for my apartment. The main reason I did this was so I could use the QoS feature so that BitTorrent wouldn't kill the connection for everyone. So far it seems to be working well, but I experienced a crash, sort of. The OS was still up and I could get to the router through the web config and SSH just fine, but I couldn't get out to the internet. I took a look at the output of dmesg and found that dnsmasq had been killed, disabling DNS and DHCP. I think it ran out of memory. ip_conntrack also seemed to have some trouble. Are there any parameters I should adjust, like "Max Connections" on the "firewall->connection limits" page? Do I simply not have enough RAM in my router for what I'm trying to do?

I saved the entire dmesg output but I'll just post the interesting bits here. Let me know if you want me to post the full output or upload it somewhere.

ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
ip_conntrack: table full, dropping packet.
NET: 24 messages suppressed.
ip_conntrack: table full, dropping packet.
NET: 30 messages suppressed.
ip_conntrack: table full, dropping packet.

__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process crond
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process dnsmasq

jdmulloy

Post by jdmulloy »

root@gateway:~# free   
              total         used         free       shared      buffers
  Mem:        14336        13684          652            0         1536
 Swap:            0            0            0
Total:        14336        13684          652

I just rebooted the router and it's already low on memory. Maybe I just need a router with more RAM. I just bought a couple of the Gargoyle pocket routers, so I could rig something up, but I'd hate to put all my internet access through Wi-Fi, although most of the traffic is already over Wi-Fi.
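BusyBox `free` reports kilobytes, and part of the "used" column is buffers that the kernel can reclaim under pressure, so the headline figure somewhat overstates how close the router is to the edge. A minimal Python sketch, assuming the column layout shown in the output above; `parse_free` and `memory_headroom` are hypothetical helper names.

```python
def parse_free(text: str) -> dict:
    """Parse BusyBox `free` output into {row_label: {column: kB}}."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]  # e.g. total, used, free, shared, buffers
    table = {}
    for parts in rows[1:]:
        label = parts[0].rstrip(":")
        # zip() stops at the shorter sequence, so the Swap/Total rows,
        # which lack shared/buffers columns, simply omit those keys.
        table[label] = dict(zip(header, (int(v) for v in parts[1:])))
    return table


def memory_headroom(table: dict) -> tuple:
    """Return (percent of Mem reported as used, free + buffers in kB)."""
    mem = table["Mem"]
    used_pct = 100.0 * mem["used"] / mem["total"]
    reclaimable_kb = mem["free"] + mem.get("buffers", 0)
    return used_pct, reclaimable_kb
```

For the output above this gives roughly 95.5% used, with only about 2188 kB of free memory plus buffers out of the 14336 kB the kernel sees, which is very little room either way.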

pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Post by pbix »

Yeah, there appear to be issues running Gargoyle on the WRT54GL. I have one and am investigating myself. In the short term, make sure you disable the active congestion controller and all your Quota rules. Then I think the rest of QoS should work OK.

Now, if you could back up your router settings and send them to me, perhaps I could reproduce your results on my GL. In the testing I have done, it's been difficult to reproduce the problem consistently. You can also send your dmesg output and logread output. My email is pbix at bigfoot dot com.

My main router is a WRT54G-TM, which has twice the memory and no problems that I have found.

How many users and connections do you have when you get the OOM errors?
Linksys WRT1900ACv2
Netgear WNDR3700v2
TP Link 1043ND v3
TP-Link TL-WDR3600 v1
Buffalo WZR-HP-G300NH2
WRT54G-TM

jdmulloy

Post by jdmulloy »

pbix wrote:
How many users and connections do you have when you get the OOM errors?

This is the first time it's happened. One of my roommates told me the internet wasn't working and this is what I found. It's me and 2 other guys; one of them is a heavy BitTorrent user, and while he usually throttles and schedules his torrents to be courteous, they will still occasionally kill the bandwidth. The last straw was the WoW downloader, which killed our connection for hours during the day. I don't think I have active congestion control enabled; is that the QoS feature that was just released with 1.3.5? I also don't have any quotas set up. I'm pretty much just using QoS. I'll email the config and dmesg output.

Thanks for the quick reply.

jdmulloy

Post by jdmulloy »

I emailed you the router config backup and dmesg output. Unfortunately I did not know about logread so I didn't grab the log before the reboot.

pbix

Post by pbix »

From the looks of things in your dmesg file it appears that someone was trying to open a boatload of connections while the router was still booting. Maybe the router was rebooted while the torrents were active.

I am not sure the booting part really matters, though, since there will be a limit on connections in any case. If there is a way to configure that torrent client to limit connections, that might help you. Other than that, buy a router with more memory so you can support more connections simultaneously.
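To find out which machine is opening all those connections, the router's connection-tracking table can be grouped by originating address. A minimal Python sketch, assuming the table was copied off the router as text (for example `ssh root@gateway cat /proc/net/ip_conntrack > conntrack.txt`; some kernels expose it as `/proc/net/nf_conntrack` instead) and that the first `src=` field on each line is the host that opened the connection, as in the usual format. `connections_per_host` is a hypothetical helper name.

```python
import re
from collections import Counter

# The first src= on a conntrack line belongs to the original direction,
# i.e. the host that initiated the connection; the second is the reply.
SRC = re.compile(r"\bsrc=(\S+)")


def connections_per_host(conntrack_text: str) -> Counter:
    """Count tracked connections per originating IP address."""
    counts = Counter()
    for line in conntrack_text.splitlines():
        match = SRC.search(line)  # search() returns the first match only
        if match:
            counts[match.group(1)] += 1
    return counts
```

Usage: `print(connections_per_host(open("conntrack.txt").read()).most_common(5))` lists the five busiest LAN hosts, which should make a torrent client stand out immediately.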

It would be interesting to know how many connections are common and how much memory remains in your router under heavy load.
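One way to answer the memory half of that question is to capture `/proc/meminfo` while the torrents are running (for example `ssh root@gateway cat /proc/meminfo > meminfo.txt`). A rough Python sketch, assuming a saved copy of that file; treating `Buffers` and `Cached` as available is an approximation, since not every cached page can be dropped, and the helper names are hypothetical.

```python
import re

# Matches "Key:   12345 kB" lines; other lines (such as the tabular
# header some older kernels print first) are skipped.
LINE = re.compile(r"^([^:\s]+):\s+(\d+)\s*kB$")


def parse_meminfo(text: str) -> dict:
    """Return {field: kB} for every 'Field: N kB' line in /proc/meminfo."""
    info = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            info[match.group(1)] = int(match.group(2))
    return info


def approx_available_kb(info: dict) -> int:
    """Rough headroom: free pages plus buffers and page cache."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
```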

How is QoS working for you other than this one incident?

jdmulloy

Post by jdmulloy »

I actually swapped in one of the routers I bought from this website. Tonight I think I'm having the same problem. No way to know for sure as I can't connect to it when it starts crashing. I did notice on the status page almost all of my memory was used. Any chance there's some sort of memory leak in the QoS code or does it just need a lot of memory?

For now I've turned off QoS since my roommate was getting pissed that the connection kept going down. I'll try to get you logs if it happens again.
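A leak and a heavy but steady load tend to look different over time: with a leak, free memory keeps falling even when traffic does not grow. A minimal sketch, assuming you already have `(seconds, free_kB)` samples (for example `MemFree` read once a minute); it fits a least-squares line and reports the slope in kB per hour. A negative slope alone does not prove a leak, because a growing connection table also consumes memory, so it is best compared with the connection count over the same period. `slope_kb_per_hour` is a hypothetical name.

```python
def slope_kb_per_hour(samples):
    """Least-squares slope of (t_seconds, kB) samples, in kB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    # Covariance of (t, y) over variance of t gives the fitted slope in kB/s.
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct timestamps")
    return num / den * 3600  # convert kB/s to kB/hour
```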

EDIT: Just noticed that 1.3.6 has some memory usage fixes for QoS

EDIT2: I'll try upgrading the firmware later when the connection is not busy. I managed to pick the right image when I first flashed the router, but now I'm not sure which one is correct. It would be nice if the instructions on the site told people how to upgrade the hardware sold on the site. I tried searching the forums for the answer, but it's impossible because my search terms are just "gargoyle router".

Is gargoyle_1.3.7-atheros-combined.squashfs.img the right image?

pbix

Post by pbix »

I am not aware of any "memory leaks" at all in the QoS code. Also, in my view, QoS does not use much memory, but I am aware that WRT54GL routers struggle to run Gargoyle since they have only 16MB of memory and memory is tight in general. Other routers with 16MB may also have issues. Over the past few versions I have tried to reduce the amount of memory that Gargoyle needs to alleviate this problem. If you have multiple users running BitTorrent or a WAN speed higher than 10 Mbps, then you need to upgrade to a router with more memory and processing speed.

The router you bought on the website should not suffer from any such problems since it has 32MB of memory. It's unfortunate that the specifications listed for the router do not say what its CPU speed is or suggest which firmware image should be used, but I am sure Eric will fix that as soon as he reads this post, so please wait a few days.

Gargoyle v1.3.7 QoS has a few issues with Class control editing on the UI but functionally there are no known problems with the code. There should be a new release soon which will fix these minor issues.

In your case it's hard to believe that you are running out of memory since you have 32MB. Are you saying that when QoS is disabled the memory never gets close to 100%? I am interested to know whether you are using the active congestion controller. If you are, try disabling it but leaving QoS enabled. Are you saying that you cannot get command-line access to the router using SSH when it is out to lunch?

If you can, then logread and dmesg output would be interesting to post. Also, top -n 1 >/tmp/topout would create a file which would also be interesting.

jdmulloy

Post by jdmulloy »

I am not using active congestion management. I probably could have gotten to it with SSH, but I didn't have much time to debug because my roommate was getting upset. I'm going to try to set up a script on my computer to automatically download the log once a minute.
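A script like that can be a small loop around `ssh`. A minimal Python sketch, assuming key-based SSH login to the router is already set up and that it answers as `root@gateway` (substitute its IP address if that name does not resolve); `poll` and `fetch_over_ssh` are hypothetical names. `logread` prints the router's whole log buffer each time, so consecutive snapshots overlap; the point is that the last snapshot before a hang survives on the PC.

```python
import subprocess
import time
from datetime import datetime


def fetch_over_ssh(host="root@gateway", command="logread"):
    """Run a command on the router over SSH and return its stdout."""
    # Assumes passwordless (key-based) SSH; a hung router trips the timeout.
    result = subprocess.run(["ssh", host, command],
                            capture_output=True, text=True, timeout=30)
    return result.stdout


def poll(fetch, path, interval=60, iterations=None, sleep=time.sleep):
    """Append a timestamped snapshot from fetch() to path every interval s.

    Runs forever when iterations is None; fetch and sleep are parameters
    so the loop can be exercised without a real router.
    """
    count = 0
    while iterations is None or count < iterations:
        try:
            output = fetch()
        except Exception as exc:  # unreachable router, SSH timeout, etc.
            output = f"[fetch failed: {exc}]\n"
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(path, "a") as f:
            f.write(f"===== {stamp} =====\n{output}")
        count += 1
        if iterations is None or count < iterations:
            sleep(interval)
```

Usage: `poll(fetch_over_ssh, "router-log.txt")` runs until interrupted; `poll(lambda: fetch_over_ssh(command="logread; dmesg"), "router-log.txt")` captures the kernel messages as well. Failed fetches are logged rather than raised, so a gap in the file marks roughly when the router stopped answering.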

Are there any routers that work well with Gargoyle that have at least 32MB of memory?

I know this has probably already been requested, but are there any plans to allow non-destructive upgrades for minor version bumps? It's annoying to have to change wires around to fix the router settings after an upgrade.

I upgraded my router to 1.3.7, and when I tried to turn on QoS for upload I got a kernel oops.

CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 00000000, ra == 801c7558
Oops[#1]:
Cpu 0
$ 0   : 00000000 10009c00 80240a70 00000000
$ 4   : 818de780 81a9bb10 00000000 00000000
$ 8   : 8027fda4 800450d0 00000000 00000000
$12   : 00000000 00000000 00000014 00424380
$16   : 81a9bb80 80da5700 80da5700 8194ba10
$20   : 81900460 81a9bb80 802b0000 81a9bc80
$24   : 00000000 8194ba24                  
$28   : 81a9a000 81a9bb00 00000000 801c7558
Hi    : 00000265
Lo    : 00026cee
epc   : 00000000 (null)
    Tainted: P          
ra    : 801c7558 0x801c7558
Status: 10009c03    KERNEL EXL IE 
Cause : 10800008
BadVA : 00000000
PrId  : 00019064 (MIPS 4KEc)
Modules linked in: ath_ahb ath_hal(P) ebt_arpnat ebt_redirect ebt_mark ebt_vlan ebt_stp ebt_pkttype ebt_mark_m ebt_limit ebt_among ebt_802_3 ebtable_nat ebtable_filter ebtable_broute 
ebtables xt_IMQ imq ipt_weburl ipt_webmon ipt_timerange nf_nat_tftp nf_conntrack_tftp nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp xt_iprange xt_HL xt_hl xt_MARK ipt_ECN 
xt_CLASSIFY xt_time xt_tcpmss xt_statistic xt_mark xt_length ipt_ecn xt_DSCP xt_dscp xt_string xt_layer7 ipt_bandwidth ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE iptable_nat nf_nat xt_CONNMARK 
xt_recent xt_helper xt_conntrack xt_connmark xt_connbytes xt_NOTRACK iptable_raw xt_state nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack pppoe pppox ipt_REJECT xt_TCPMSS ipt_LOG xt_comment 
xt_multiport xt_mac xt_limit iptable_mangle iptable_filter ip_tables xt_tcpudp x_tables ppp_async ppp_generic slhc ts_fsm ts_bm ts_kmp crc_ccitt
Process tc (pid: 1928, threadinfo=81a9a000, task=81842508, tls=00000000)
Stack : 81a9bbd0 81a5cb54 81943660 80055adc 00000000 00000000 00000000 801c7a88
        81900460 80da5700 00000000 818e5c00 818de780 801c75dc 81a9bb88 80d70100
        818f4000 81a4cec0 81a9bb80 00000000 818e5c00 8194ba10 80da5700 81900460
        00000000 801c76d4 00000001 00000002 8194b600 8194b600 81a9bb80 00000000
        00000000 801a42b8 00000020 81900460 80da5700 8194b600 8194ba00 81835400
        ...
Call Trace:[<80055adc>] 0x80055adc
[<801c7a88>] 0x801c7a88
[<801c75dc>] 0x801c75dc
[<801c76d4>] 0x801c76d4
[<801a42b8>] 0x801a42b8
[<801cf3fc>] 0x801cf3fc
[<801d05a8>] 0x801d05a8
[<800674d0>] 0x800674d0
[<801c7654>] 0x801c7654
[<801bff60>] 0x801bff60
[<801bfe5c>] 0x801bfe5c
[<801d2604>] 0x801d2604
[<801bfe4c>] 0x801bfe4c
[<801d1db4>] 0x801d1db4
[<801a976c>] 0x801a976c
[<801d21e0>] 0x801d21e0
[<801cf8b8>] 0x801cf8b8
[<801a976c>] 0x801a976c
[<801a1838>] 0x801a1838
[<801a142c>] 0x801a142c
[<8009aee8>] 0x8009aee8
[<80079728>] 0x80079728
[<8009b364>] 0x8009b364
[<80079728>] 0x80079728
[<801ac7b8>] 0x801ac7b8
[<801a1a10>] 0x801a1a10
[<80055adc>] 0x80055adc
[<801a0d8c>] 0x801a0d8c
[<801a0f5c>] 0x801a0f5c
[<801a677c>] 0x801a677c
[<800431d0>] 0x800431d0


Code: (Bad address in epc)

pbix

Post by pbix »

I have the WRT54G-TM.

It has 32 Megs of memory and works well for me.