Positive user experience with Gargoyle

Report issues relating to bandwidth monitoring, bandwidth quotas or QoS in this forum.

Moderator: Moderators

daleq
Posts: 10
Joined: Wed Jul 18, 2012 5:25 pm

Positive user experience with Gargoyle

Post by daleq »

Summary
Using Gargoyle's QoS and Active Congestion Control,
one can go from 18% packet loss to less than 3% on VoIP calls,
even on saturated satellite ISP connections.

Overview
I first heard of Gargoyle via a LWN.net article
and was immediately intrigued by the Active Congestion Controller.
I volunteer as a network administrator for a camp in a remote location
that has a satellite ISP. The network there is generally "open" to
many users.
Concerns about VoIP call quality varying wildly from "OK" to "unusable"
caused me to think that Gargoyle could possibly save the day.

Experiments

First step
Read the excellent Quality of Service wiki page.
Go ahead and read it twice, it's that good.

Second step
Determine download and upload capacity of internet connection without
QoS or ACC. This is somewhat tricky, but using the Bandwidth
Usage page while downloading a large file like Ubuntu and simultaneously
uploading large files seems best to me. I don't recommend "speed test"
sites since their tests are too short to give an accurate, long-term
representation of bandwidth.

First experiment
Without QoS or ACC, measure VoIP packet loss on call when
simultaneously uploading and downloading.

- Configure Wireshark to capture packets to/from Vonage VoIP device
- Start upload and download
- Start wireshark capture, make call including upstream and downstream
content (talking and listening), end call, stop capture
- In wireshark capture, scroll midway down capture, find an RTP packet
from local IP to remote IP, and select Telephony | RTP -> Stream Analysis...
- Observations show 12% to 18% packet loss on the call in both directions,
which is "terrible"

Second experiment
Enable QoS and ACC

- For QoS enter 95% - 98% of recorded bandwidth
-- we actually used about 90% in an attempt to account for "rain fade"
during winter
- Ensure traffic to/from VoIP device is put into "VoIP" QoS class
-- assign VoIP device MAC address to an IP,
-- assign traffic to/from that IP to "VoIP" class
- For ACC, leave ping limit on "Auto"
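
A rough sketch of the arithmetic in the first bullet. The function name and the 1000 kbps measurement are made up for illustration; they are not figures measured on this link.

```python
def qos_limit_kbps(measured_kbps, percent):
    """Return the QoS limit to enter, as a whole-number percentage
    of the measured bandwidth (integer math avoids float rounding)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return measured_kbps * percent // 100


# Hypothetical measurement of 1000 kbps:
print(qos_limit_kbps(1000, 95))  # 950
print(qos_limit_kbps(1000, 90))  # 900, the more conservative "rain fade" setting
```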

While looking at the bandwidth graphs during a call with QoS and ACC on
auto, I could see that ACC was "clamping" non-VoIP traffic to try to meet
ping times. It appears that "Auto" has a maximum of 500ms.
Since our satellite connection normally has ping times between
600-700ms, the ACC was "starving" non-VoIP traffic.
[Chart: non-VoIP traffic being "starved" when ACC is on "Auto" ping time (500ms) and the best ping time of the connection is 600-700ms (satellite). File: gargoyle_qos_active_congestion_control_at_500ms.png]


When ping limit was set to 700ms, bandwidth shaping looked more
appropriate.
[Chart: ACC behavior on a normally 600ms ping connection (satellite ISP) with the ping limit set to 700ms. File: gargoyle_qos_ping_limit_700.png]
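
The difference between the two charts can be reproduced with a toy feedback loop. This is only a sketch under stated assumptions, not Gargoyle's ACC code: the link model, the 0.95 and 1.02 step factors, the 2 ms-per-kbps queueing penalty and the function name `simulate_acc` are all invented for illustration. It shows that when the ping limit sits below the link's idle ping time, no amount of throttling can satisfy it, so the download limit keeps falling until it hits the floor.

```python
def simulate_acc(base_rtt_ms, ping_limit_ms, capacity_kbps=1000,
                 start_kbps=1000, floor_kbps=50, steps=200):
    """Toy controller: shrink the download limit while ping exceeds the
    target, grow it otherwise. NOT Gargoyle's actual algorithm."""
    limit = float(start_kbps)
    for _ in range(steps):
        # Assumed link model: ping is the idle RTT plus a queueing penalty
        # once the shaper admits more traffic than the link can carry.
        excess = max(0.0, limit - capacity_kbps)
        ping = base_rtt_ms + 2.0 * excess  # 2 ms per excess kbps (arbitrary)
        if ping > ping_limit_ms:
            limit = max(floor_kbps, limit * 0.95)  # back off
        else:
            limit = limit * 1.02                   # probe upward
    return limit


# Idle ping of 650 ms with a 500 ms target: the target is unreachable.
print(simulate_acc(650, 500))  # 50 (pinned at the floor)

# Same link with a 700 ms target: the limit hovers near link capacity.
print(simulate_acc(650, 700))
```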

Third experiment
Enable QoS and ACC, set ping limit to 700ms. Also, vary total bandwidth
to see the effect on VoIP performance.

On the QoS upload page, we set the following:
Name     % BW   Min BW (kbps)   Max BW
Voip     40%    165             nolimit
Fast     41%    zero            nolimit
Slow     1%     zero            nolimit
Normal   18%    zero            nolimit

For VoIP
- we set minimum bandwidth for VoIP to a little more than used by
2 simultaneous phone calls (80 kbps each)
- we also set VoIP % BW to 40% in an effort to minimize other
upload traffic while calls were in progress
- call quality is higher priority than general internet upload
capacity for us
- we assigned all traffic to/from our Vonage VoIP device's IP
address to the Voip class
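
The numbers above can be checked with a few lines. This is a minimal sketch using only figures from this post (80 kbps per call, 2 calls, a 165 kbps minimum, and the four class percentages); the names `headroom_kbps` and `upload_classes_pct` are hypothetical.

```python
# Upload class percentages as entered on the QoS upload page.
upload_classes_pct = {"Voip": 40, "Fast": 41, "Slow": 1, "Normal": 18}


def headroom_kbps(min_bw_kbps, calls, per_call_kbps):
    """Spare bandwidth left in the class minimum after all calls are served."""
    return min_bw_kbps - calls * per_call_kbps


# The percentages should account for the whole link.
print(sum(upload_classes_pct.values()))  # 100

# Two 80 kbps calls need 160 kbps, so a 165 kbps minimum leaves 5 kbps spare.
print(headroom_kbps(165, calls=2, per_call_kbps=80))  # 5
```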

BTW: My jaw dropped with amazement when I saw Load (kbps)
updating in real time for all the classes.
Wow!!! Nice work!

On the QoS download page, we set the following:
- same values for Voip as in upload
- Enabled Active Congestion Control (ACC)
- Use custom ping limit: 700 ms

With this setup, we got less than 3% packet loss even while downloads
and uploads were "hogging" the internet connection.
Excellent!

Enhancement requests

- Increase maximum ping time in "Auto" mode from 500ms to 700ms
I don't know if the 600-700ms ping time is common to all satellite ISPs,
but that has been pretty consistent for the camp

- Limit upload bandwidth in same manner as download with ACC.
I'm actually kind of surprised that upload bandwidth is not being limited.
My understanding is that limiting upload is "easy" while limiting
download is "hard". Of course, I don't really know what is involved.

Conclusion
Thank you Eric, Paul, and all. Gargoyle is a very nice tool and works very well.

pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Re: Positive user experience with Gargoyle

Post by pbix »

I appreciate your report on your experience with Gargoyle QoS over a satellite link. I am very interested in this type of link because I do have some experience with them and designed Gargoyle QoS with them in mind. However, I have no means of testing with such a link so it would not surprise me if some problems occur.

Regarding your measurement of bandwidth. Measuring download bandwidth as you did is OK but remember that when ACC is used it continuously adjusts the download limit based on the ping response. As a result you can actually enter a number which is a bit higher than you measure and let the ACC seek the correct number. There is no need to use 90% as you did. This is ideal for changing conditions like weather. You can monitor the status section of the ACC to see what download limit it is using at any time.

For measuring upload bandwidth you should upload a large file while the downlink is NOT active. One way to do this would be to upload a massive file to www.dropsend.com. This gives you a good measure of what this speed truly is.

Regarding the auto measurement function. You did not mention what version of Gargoyle you are using or what download/upload speeds you entered into your QoS screens. ACC switches between two limits, based on if a MINRTT class is active or not. So in auto mode it should for example switch between 500ms and 700ms automatically. Please comment on how that was working. I would like to see that the auto function is made to work well with satellite links.

Upload QoS works well in my experience. Since we have direct control of the upload bandwidth I do not feel using an indirect control like ACC would have much benefit. On the upload side entering 90% of the measured value should work well I would think. What problems have you had that led you to request that the uplink control be the same as the downlink control?
Linksys WRT1900ACv2
Netgear WNDR3700v2
TP Link 1043ND v3
TP-Link TL-WDR3600 v1
Buffalo WZR-HP-G300NH2
WRT54G-TM

daleq
Posts: 10
Joined: Wed Jul 18, 2012 5:25 pm

Re: Positive user experience with Gargoyle

Post by daleq »

Thanks for your reply. I'll try my best to answer.
pbix wrote:...but remember that when ACC is used it continuously adjusts the download limit based on the ping response. As a result you can actually enter a number which is a bit higher than you measure and let the ACC seek the correct number. There is no need to use 90% as you did. This is ideal for changing conditions like weather.
OK. This makes sense and matches what I was hoping for from Gargoyle.

However, I think what prompted us to set the lower than capacity values is that when we did testing with higher down/up-load limits, the packet loss on calls was higher (and voice quality lower). Thinking about it now, I realize that most of our test calls were approximately 40-60 seconds long and this probably did not allow enough time for ACC to "react". I see from the first chart in my original post that it appears ACC takes about 4 minutes to completely "react" to conditions. This reaction time does not seem unreasonable, but I didn't consider it at the time of my experiments.

In my experiments, I got the following results;

QoS download - VoIP packet loss
1000 kbits - 12.8%
750 kbits - 9.6%
650 kbits - 9.4%
550 kbits - 2.2%
(all while simultaneously downloading Ubuntu .iso)
pbix wrote:For measuring upload bandwidth you should upload a large file while the downlink is NOT active.
I'll remember this next time.
pbix wrote:ACC switches between two limits, based on if a MINRTT class is active or not. So in auto mode it should for example switch between 500ms and 700ms automatically.
This does not seem unreasonable for most users, but the satellite ISP at the camp seems to always run in the 600-700ms range. Therefore, the 500ms default maximum with MINRTT active will "starve" the other classes. For us, a better default maximum for MINRTT would be 700 and 900 ms. But, I don't know if that is way too much for most Gargoyle users. Allowing the value to be set manually was smart and worked fine for us. It just took a while for me to figure out what was happening.

FYI - We were running 1.5.5
pbix wrote:Upload QoS works well in my experience. Since we have direct control of the upload bandwidth I do not feel using an indirect control like ACC would have much benefit. On the upload side entering 90% of the measured value should work well I would think. What problems have you had that led you to request that the uplink control be the same as the downlink control?
Reviewing my notes, I'm a little fuzzy on the details, so take these comments with a grain of salt. It seemed that uploads were disrupting the VoIP calls, but I'm not sure we had specified minimum bandwidth of 165 kbits/s on uploads for the VoIP class when the experiment was run.

The last experiment we ran, we had minimum bandwidth for VoIP set to 165 kbits/s (both up and down), ran two VoIP calls while simultaneously downloading and uploading. Here are the results;

Call #1
forward RTP: 1.3% packet loss
reverse RTP: 3.5%
Call #2
forward RTP: 2.7% packet loss
reverse RTP: 2.0%
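
For readers unfamiliar with what such a per-direction loss figure measures, here is a minimal sketch of how a loss percentage can be derived from RTP sequence numbers. It assumes the sequence numbers of one stream have already been exported from a capture as a list of integers; the function name `rtp_loss_percent` is hypothetical, and this simple model ignores reordering and duplicates, so it will not necessarily match Wireshark's RTP stream analysis exactly.

```python
def rtp_loss_percent(seq_numbers):
    """Estimate packet loss (%) from received RTP sequence numbers.

    RTP sequence numbers are 16-bit and wrap around at 65536.
    Assumes packets arrive in order with no duplicates.
    """
    if len(seq_numbers) < 2:
        return 0.0
    expected = 1  # count the first packet
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        # A gap of 1 means nothing was lost; larger gaps mean missing packets.
        expected += (cur - prev) % 65536
    lost = expected - len(seq_numbers)
    return 100.0 * lost / expected


# Packet 12 is missing: 5 received out of 6 expected.
print(round(rtp_loss_percent([10, 11, 13, 14, 15]), 1))  # 16.7
```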

So, this was great and we were really happy with this.

I guess my point about ACC working on upload is that, that way, you don't have to be quite as worried about measuring upload bandwidth and then setting your max to 90% of that value.

Thanks again.

pbix
Developer
Posts: 1373
Joined: Fri Aug 21, 2009 5:09 pm

Re: Positive user experience with Gargoyle

Post by pbix »

daleq wrote:Thinking about it now, I realize that most of our test calls were approximately 40-60 seconds long and this probably did not allow enough time for ACC to "react".
My testing shows that ACC should react much faster than 4 minutes. I would think 30 seconds would be plenty of time. You can watch the ping times come into line in real time on the ACC status section. Do you really see it take 4 minutes for the ping times to get controlled under the ping limit?
daleq wrote:This does not seem unreasonable for most users, but the satellite ISP at the camp seems to always run in the 600-700ms range. Therefore, the 500ms default maximum with MINRTT active will "starve" the other classes. For us, a better default maximum for MINRTT would be 700 and 900 ms.
After your report I looked at the ACC source code. Currently the ACC clamps all measurements at no more than 500ms. This was an arbitrary limit I put in worrying that the measurement algorithm would arrive at an invalid result. This clamp needs to be moved up in your case. It's a pretty simple fix but I am out of the country right now and cannot compile anything to send you. If you can download Gargoyle and compile I can tell you what line to change if you want to try and help make Gargoyle work better on satellite links. Moving the limit up will result in something like 650ms and 1500ms as the limit I would guess.

If you set the upload link at 90% of the measured value you should get a good result for all your traffic. Even 95% would probably work. I do not think the satellite is saturated on the uplink the way it can get on the downlink. Using the min bandwidth as you are doing is perfect for VoIP traffic.

bawjkt
Posts: 32
Joined: Tue Jul 03, 2012 1:11 pm

Re: Positive user experience with Gargoyle

Post by bawjkt »

Great post and subsequent discussion!

daleq
Posts: 10
Joined: Wed Jul 18, 2012 5:25 pm

Re: Positive user experience with Gargoyle

Post by daleq »

pbix wrote:My testing shows that ACC should react much faster than 4 minutes. I would think 30 seconds would be plenty of time. You can watch the ping times come into line in real time on the ACC status section. Do you really see it take 4 minutes for the ping times to get controlled under the ping limit?
I guesstimated a 4 minute reaction time for ACC from the first graph in my original post. This seemed to be the time it took for all non-VoIP traffic to be minimized. Your test results are likely more accurate. Assuming the ACC does react in 30 seconds, I think that still could have been enough to cause the packet loss we were seeing on the VoIP calls since our tests were only 40-60 seconds.
pbix wrote:After your report I looked at the ACC source code. Currently the ACC clamps all measurements at no more than 500ms. This was an arbitrary limit I put in worrying that the measurement algorithm would arrive at an invalid result. This clamp needs to be moved up in your case. ... Moving the limit up will result in something like 650ms and 1500ms as the limit I would guess.
I recommend 700ms for the maximum default MINRTT time. This number is based on ping times observed during normal, but not overloaded, usage:
816 ms
612
631
657
644
732
679
621
716
592
579

I believe this is a fairly representative sample of normal ping times from our satellite ISP.
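
A quick summary of the sample above, as a sketch using Python's standard `statistics` module. The values are the eleven ping times listed; nothing else is assumed.

```python
import statistics

# Ping times (ms) from the sample above.
pings_ms = [816, 612, 631, 657, 644, 732, 679, 621, 716, 592, 579]

summary = {
    "min": min(pings_ms),
    "median": statistics.median(pings_ms),
    "mean": round(statistics.mean(pings_ms)),
    "max": max(pings_ms),
    "over_700": sum(1 for p in pings_ms if p > 700),
}
print(summary)
# {'min': 579, 'median': 644, 'mean': 662, 'max': 816, 'over_700': 3}
```

Three of the eleven samples exceed 700 ms.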


pbix wrote:It's a pretty simple fix but I am out of the country right now and cannot compile anything to send you. If you can download Gargoyle and compile I can tell you what line to change if you want to try and help make Gargoyle work better on satellite links.
Thanks, but inserting the manual value seems to have resolved the problem for now. I'm normally only at the camp once a year, so if you could make the change when you get back, that will be plenty soon.
pbix wrote:If you set the upload link at 90% of the measured value you should get a good result for all your traffic. Even 95% would probably work. I do not think the satellite is saturated on the uplink the way it can get on the downlink. Using the min bandwidth as you are doing is perfect for VoIP traffic.
Yes, min bandwidth for VoIP seemed to be the key setting to ensure that other uploads didn't "trash" the connection.

Thanks!

daleq
Posts: 10
Joined: Wed Jul 18, 2012 5:25 pm

Re: Positive user experience with Gargoyle

Post by daleq »

Here's some background for where I'm going in the conversation now.
daleq said
Enhancement requests
- Limit upload bandwidth in same manner as download with ACC.
pbix said
Since we have direct control of the upload bandwidth I do not feel using an indirect control like ACC would have much benefit.
daleq said
Reviewing my notes, I'm a little fuzzy on the details, so take these comments with a grain of salt. It seemed that uploads were disrupting the VoIP calls, but I'm not sure we had specified minimum bandwidth of 165 kbits/s on uploads for the VoIP class when the experiment was run.
pbix said
Using the min bandwidth as you are doing is perfect for VoIP traffic.
daleq said
Yes, min bandwidth for VoIP seemed to be the key setting to ensure that other uploads didn't "trash" the connection.
I'm so impressed with ACC that it seems to me you may want to consider using it to control uploads also. The reason I say this is because before I had properly set the minimum bandwidth for VoIP at 165 kbps (and possibly before I enabled QoS on upload at all -- perhaps making this whole post irrelevant) I saw the following;

A VoIP call was in progress, I started a download (which ACC seemed to keep in check) and then started an upload. The upload seemed to cause the ping times to greatly increase (800-2500 ms!) and I believe this effectively "killed" the VoIP call in progress.

Here's a picture of what I saw (keeping in mind that I'm not sure I had upload QoS enabled...)
[Chart: gargoyle_qos_upload_can_still_cause_latency.png]
I'm sorry that I'm not able to reproduce the conditions or give more precise feedback, but I _think_ that upload QoS was enabled when this happened. I'm almost certain that the minimum bandwidth of 165kbps was not enabled.

Anyway, if ACC worked on uploads, then the need to have class-based minimum bandwidth and correct upload bandwidth settings is reduced. I see this as a "good thing". ACC is so cool (and unique) that I'd say "If a little is good, a lot must be better". :D Also, I just saw some VoIP traffic that was using 40 kbps bandwidth, so I assume that l7-filter had done its work to identify some Skype or other non-Vonage VoIP traffic. Again, wow! It is great that Gargoyle handled this "automagically". But I was thinking that since I set minimum bandwidth to 165 kbps for VoIP (both up and down), I was leaving some bandwidth "on the table" with the single Skype call that perhaps could be used more effectively if ACC worked both up and down.

If ACC worked up and down, then it seems to me that configuration could be greatly simplified. The user will give a "best guess" for available bandwidth (and now it's not so critical if the given value is off by a bit), assign % bandwidth values for the classes, and then set the order/priority of the classes. No need for minimum/maximum bandwidth per class since the class priorities take precedence and if a higher priority class is not using all of its assigned allotment, then other classes can "borrow".
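
The "borrow" idea can be sketched as a weighted share-out: each busy class is offered bandwidth in proportion to its percentage, and whatever a class does not need is re-offered to the rest. This illustrates the proposal only and is not how Gargoyle's QoS is implemented; class priority ordering is not modeled, and the function name `share_out`, the 1000 kbps total and the demand figures are hypothetical (the 40 kbps VoIP demand echoes the single call mentioned above).

```python
def share_out(total_kbps, shares_pct, demand_kbps):
    """Split total_kbps among classes by percentage share, letting busy
    classes borrow whatever the others leave unused."""
    alloc = {name: 0.0 for name in shares_pct}
    remaining = float(total_kbps)
    active = {n for n in shares_pct if demand_kbps.get(n, 0) > 0}
    while remaining > 1e-9 and active:
        weight = sum(shares_pct[n] for n in active)
        if weight == 0:
            break
        satisfied = set()
        granted = 0.0
        for n in active:
            offer = remaining * shares_pct[n] / weight
            need = demand_kbps.get(n, 0) - alloc[n]
            grant = min(offer, need)
            alloc[n] += grant
            granted += grant
            if alloc[n] >= demand_kbps.get(n, 0) - 1e-9:
                satisfied.add(n)
        remaining -= granted
        if not satisfied:
            break  # every active class took its full offer
        active -= satisfied  # re-offer the leftover to the others
    return alloc


shares = {"Voip": 40, "Fast": 41, "Slow": 1, "Normal": 18}
demand = {"Voip": 40, "Fast": 2000, "Slow": 0, "Normal": 2000}
for name, kbps in share_out(1000, shares, demand).items():
    print(name, round(kbps))
```

In this example the VoIP class takes only the 40 kbps it needs, and the rest of its share is split between Fast and Normal in proportion to their percentages.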

I'm likely missing something that makes this impossible/impractical, but if it could work, it seems like it would be another huge win/feature of Gargoyle.

Thanks!

bawjkt
Posts: 32
Joined: Tue Jul 03, 2012 1:11 pm

Re: Positive user experience with Gargoyle

Post by bawjkt »

Re daleq "If ACC worked up and down, then it seems to me that configuration could be greatly simplified. The user will give a "best guess" for available bandwidth (and now it's not so critical if the given value is off by a bit), assign % bandwidth values for the classes, and then set the order/priority of the classes."

I agree with daleq on the concept of simplifying configuration.

Many people will not be sufficiently diligent to properly configure. If it were automatic, it would mean far more systems running properly configured.

I agree that we have total control of upload bandwidth unlike download bandwidth. But still curious to see pbix's response.

One other note is that on shared networks, often the bandwidth is not reliably static, so fixed amounts may not serve as well as auto-regulators.

Dynamic rather than static control is one flagship feature that makes Gargoyle stand out to begin with.

bawjkt
Posts: 32
Joined: Tue Jul 03, 2012 1:11 pm

Re: Positive user experience with Gargoyle

Post by bawjkt »

One last note. Autoconfiguration could also eliminate misconfiguration by people who don't know how to measure bandwidth or who simply make mistakes.

There is a certain percentage of Gargoyle installations out there where people entered the wrong numbers for the bandwidth and as a result their networks are running at a fraction of the speed they should.

dumas777
Posts: 14
Joined: Wed Oct 31, 2012 4:41 pm

Re: Positive user experience with Gargoyle

Post by dumas777 »

Wow :shock: those ping times to 4.2.2.1 remind me why camp sucks. After playing with QoS to keep mine consistently under 30ms (21 average), those pings remind me of dialup in the mid 1990s or so. Glad to see the excellent Gargoyle has saved the day, because on a connection like that, being able to use it for anything means Gargoyle has worked miracles.
