Re: [nznog] [Cerowrt-devel] ping loss "considered harmful"

5 Mar 2015

I spoke to someone at nznog who promised to combine mrtg +
smokeping, or cacti + smokeping, so as to get long-term
latency and bandwidth numbers on one graph. cc added.

On Thu, Mar 5, 2015 at 12:38 PM, Matt Taggart <matt(a)lackof.org> wrote:
...
Dave Taht writes:
...
Wow. It never registered with me that users might make a value judgement
based on the amount of ping *loss*, rather than latency. Looking back,
I can think of multiple people who have said things based on their
perception that losing pings was bad, and that sqm-scripts was "worse
than something else because of it."

This thread makes me realize that my standard method of measuring latency
over time might have issues. I use smokeping
http://oss.oetiker.ch/smokeping/

In sqm-scripts's case, possibly, all you have been collecting is
largely worst case behavior, which I don't mind collecting as it tends
to be pretty good. :)

However, I have been unclear. In the main (modern - I don't know what
version you have) sqm code, IF you enable dscp squashing on inbound
(the default), you end up with a single fq_codel queue, not 3, with no
classification or ping prioritization. (It is the default because of
all the re-marking I have seen from Comcast.)

So if you are, as I am, monitoring your boxes from the outside, there
is no classification and prioritization present for ping.

Run tc -s qdisc show dev ifbwhatever (the device name varies by
platform) to see how many queues you have. Here is an example of a
single-queue inbound rate limiter + fq_codel (yea! packet drop AND ecn
working great!):

root@lorna-gw:~# tc -s qdisc show dev ifb4ge00
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
  new_flows_len 0 old_flows_len 1

root@lorna-gw:~# uptime
 12:45:35 up 54 days, 22:33,  load average: 0.05, 0.05, 0.04
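
For anyone who wants to check this across several boxes, here is a minimal sketch that counts the qdiscs in saved `tc -s qdisc show dev ...` output and pulls out the packet, drop and ECN mark counters. It assumes the text layout shown in the output above. The names `summarize_qdiscs` and `count_kind` are made up for illustration and are not part of sqm-scripts or iproute2, and other tc versions may format these lines differently.

```python
import re
from typing import Dict, List

# Sample text copied from the tc output in this thread.
SAMPLE = """\
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
  new_flows_len 0 old_flows_len 1
"""

HEADER = re.compile(r"^qdisc (\S+) ([0-9a-fA-F]+):")
SENT = re.compile(r"Sent \d+ bytes (\d+) pkt \(dropped (\d+),")
ECN = re.compile(r"ecn_mark (\d+)")


def summarize_qdiscs(tc_output: str) -> List[Dict]:
    """Return one dict per qdisc found in `tc -s qdisc show` text."""
    qdiscs: List[Dict] = []
    for raw in tc_output.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A new "qdisc <kind> <handle>:" line starts a new record.
            qdiscs.append({"kind": header.group(1),
                           "handle": header.group(2) + ":",
                           "pkts": 0, "dropped": 0, "ecn_mark": 0})
            continue
        if not qdiscs:
            continue  # ignore anything before the first qdisc line
        current = qdiscs[-1]
        sent = SENT.search(line)
        if sent:
            current["pkts"] = int(sent.group(1))
            current["dropped"] = int(sent.group(2))
        ecn = ECN.search(line)
        if ecn:
            current["ecn_mark"] = int(ecn.group(1))
    return qdiscs


def count_kind(qdiscs: List[Dict], kind: str) -> int:
    """Count how many qdiscs of a given kind (e.g. 'fq_codel') exist."""
    return sum(1 for q in qdiscs if q["kind"] == kind)


for q in summarize_qdiscs(SAMPLE):
    print(q["kind"], q["handle"], q["pkts"], q["dropped"], q["ecn_mark"])
```

A single fq_codel entry in the result corresponds to the single-queue case described above. More than one would suggest the multi-tier setup is active on that interface.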

DSCP classification, in general, is only useful for traffic that
originates within your own network and is going outside.
...
which is a really nice way of measuring and visualizing packet loss and
variations in latency. I am using the default probe type which uses fping
(ICMP http://www.fping.org/ ).

I LOVE smokeping and wish very much we had a way to combine it with
mrtg data to see latency AND bandwidth at the same time.
...
It has been working well, I set it up for a site in advance of setting up
SQM and then afterwards I can see the changes and determine if more tuning
is needed.  But if ICMP is having its priority adjusted (up or down), then
the results might not reflect the latency of other services.
Fortunately the nice thing is that many other probe types exist
http://oss.oetiker.ch/smokeping/probe/index.en.html
So which probe types would be good to use for bufferbloat measurement? I
guess the answer is "whatever is important to you", but I also suspect
there is a set of things that ISPs are known to mess with.
HTTP? But also maybe HTTPS in case they are doing some sort of transparent
proxy?
DNS?
SIP?
I suppose you could even do explicit checks for things like Netflix (but
then it's easy to go off on a tangent of building a net neutrality
observatory).
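
As one illustration of a non-ICMP measurement, here is a minimal standalone sketch that times TCP handshakes to a host and port, so the samples travel in the same class as ordinary TCP traffic rather than ping. This is not how smokeping's own probes are implemented. The function names `tcp_connect_ms` and `probe_series` and the default timeout and interval are assumptions made up for this example.

```python
import socket
import statistics
import time
from typing import Dict, Optional


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Time one TCP connect in milliseconds; None if it fails or times out."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        # Refused, unreachable, DNS failure and timeout are all counted as loss.
        return None


def probe_series(host: str, port: int, count: int = 5,
                 interval: float = 0.2) -> Dict:
    """Send `count` connect probes and report losses and median latency."""
    samples = []
    lost = 0
    for i in range(count):
        ms = tcp_connect_ms(host, port)
        if ms is None:
            lost += 1
        else:
            samples.append(ms)
        if i < count - 1:
            time.sleep(interval)
    median = statistics.median(samples) if samples else None
    return {"sent": count, "lost": lost, "median_ms": median}
```

Calling `probe_series("www.example.com", 443)` (a placeholder target) returns the sent count, the lost count and the median connect time. Note that this measures only the handshake, so it says nothing about what happens to a flow once it is carrying data.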
On a somewhat related note, I was once using smokeping to measure a fiber
link to a bandwidth provider and had it configured to ping the router IP on
the other side of the link. In talking to one of their engineers, I learned
that they deprioritize ICMP when talking _with_ their routers, so my
measurements weren't valid. (I don't know if they deprioritize ICMP traffic
going _through_ their routers)
I do strongly recommend deprioritizing ping slightly, and as I noted, I
have seen many a borken script that actually prioritized it, which is
foolish, at best.

I keep hoping multiple (many!) someones here will go have lunch with
their company's oft lonely, oft starving sysadmin(s), to ask them what
they are doing as to firewalling, QoS and traffic shaping. Most of the
ones I have talked to are quite eager to show off their work, which is
unfortunately often of wildly varying quality and complexity.

I find that an offer of sake and sushi is most conducive to getting
that conversation started.

I certainly would like to see more default corporate
firewall/QoS/shaping rules than I have personally, for various
platforms. Someone's got to have some good ideas in them... and it
would be nice to know how far the bad ones have propagated.
...
--
Matt Taggart
matt(a)lackof.org
-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

Dave Taht
