Re: [nznog] [Cerowrt-devel] ping loss "considered harmful"
I had spoken to someone at nznog who promised to combine mrtg +
smokeping, or cacti + smokeping, so as to be able to get long-term
latency and bandwidth numbers on one graph. cc added.
On Thu, Mar 5, 2015 at 12:38 PM, Matt Taggart
Dave Taht writes:
Wow. It never registered to me that users might make a value judgement based on the amount of ping *loss*, rather than latency. Looking back in time, I can think of multiple people who have said things based on their perception that losing pings was bad, and that sqm-scripts was "worse than something else because of it."
This thread makes me realize that my standard method of measuring latency over time might have issues. I use smokeping
In sqm-scripts's case, possibly all you have been collecting is largely worst-case behavior, which I don't mind collecting as it tends to be pretty good. :)

However, I have been unclear. In the main sqm code (modern; I don't know what version you have), IF you enable dscp squashing on inbound (the default), you do end up with a single fq_codel queue, not 3, with no classification or ping prioritization. (It is the default because of all the re-marking I have seen from Comcast.)

So if you are, as I am, monitoring your boxes from the outside, there is no classification or prioritization present for ping. Do a

    tc -s qdisc show dev ifbwhatever

(the interface name varies by platform) to see how many queues you have.

Example of a single-queued inbound rate limiter + fq_codel (yea! packet drop AND ecn working great!):

    root@lorna-gw:~# tc -s qdisc show dev ifb4ge00
    qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
     Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
     backlog 0b 0p requeues 0
    qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
     Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
     backlog 0b 0p requeues 0
      maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
      new_flows_len 0 old_flows_len 1
    root@lorna-gw:~# uptime
     12:45:35 up 54 days, 22:33, load average: 0.05, 0.05, 0.04

dscp classification, in general, is only useful from within your own network, going outside.
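If you would rather not count queues by eye, something like the following could do it. This is only a rough sketch in Python that parses the text form of the tc output shown above; the function names (summarize_qdiscs, count_kind) are made up for illustration, and the regular expressions assume the stats lines look exactly like the ones in this mail, which may not hold for every iproute2 version.

```python
import re

# Sample text copied from the tc output quoted in this mail.
SAMPLE = """\
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
  new_flows_len 0 old_flows_len 1
"""


def summarize_qdiscs(text):
    """Return one dict per qdisc: kind, handle, packets sent, drops, ECN marks."""
    qdiscs = []
    for line in text.splitlines():
        # A new qdisc starts on an unindented "qdisc <kind> <handle>" line.
        header = re.match(r"qdisc (\S+) (\S+)", line)
        if header:
            qdiscs.append({"kind": header.group(1), "handle": header.group(2),
                           "pkts": 0, "dropped": 0, "ecn_mark": 0})
            continue
        if not qdiscs:
            continue  # ignore anything before the first qdisc header
        current = qdiscs[-1]
        sent = re.search(r"Sent \d+ bytes (\d+) pkt \(dropped (\d+),", line)
        if sent:
            current["pkts"] = int(sent.group(1))
            current["dropped"] = int(sent.group(2))
        marks = re.search(r"ecn_mark (\d+)", line)
        if marks:
            current["ecn_mark"] = int(marks.group(1))
    return qdiscs


def count_kind(qdiscs, kind):
    """Count how many qdiscs of a given kind (e.g. 'fq_codel') are present."""
    return sum(1 for q in qdiscs if q["kind"] == kind)


if __name__ == "__main__":
    parsed = summarize_qdiscs(SAMPLE)
    print("fq_codel queues:", count_kind(parsed, "fq_codel"))
    for q in parsed:
        if q["pkts"]:
            print(q["kind"], q["handle"],
                  "drop fraction: %.6f" % (q["dropped"] / q["pkts"]))
```

A single fq_codel entry means the single-queue case described above; three would suggest the classifying setup is active on that interface.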
which is a really nice way of measuring and visualizing packet loss and variations in latency. I am using the default probe type, which uses fping (ICMP, http://www.fping.org/).
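To make the loss-versus-latency distinction concrete, here is a small Python sketch that summarizes one line of per-probe round-trip times. It assumes the line format that, as far as I know, fping prints in its -C mode: the target, then " : ", then one RTT in milliseconds per probe with "-" standing in for a lost probe. The sample line, the hostname and the function names are invented for illustration.

```python
import statistics


def parse_fping_line(line):
    """Split one 'host : rtt rtt - rtt' line into (host, rtts_ms, lost_count)."""
    host, _, samples = line.partition(" : ")
    rtts, lost = [], 0
    for token in samples.split():
        if token == "-":
            lost += 1          # a dash marks a probe that got no reply
        else:
            rtts.append(float(token))
    return host.strip(), rtts, lost


def summarize(line):
    """Report loss and latency separately, since they answer different questions."""
    host, rtts, lost = parse_fping_line(line)
    total = len(rtts) + lost
    return {
        "host": host,
        "loss_pct": 100.0 * lost / total if total else 0.0,
        "median_ms": statistics.median(rtts) if rtts else None,
        "max_ms": max(rtts) if rtts else None,
    }


if __name__ == "__main__":
    # Invented sample: five probes, one of them lost.
    print(summarize("gw.example.net : 12.1 11.9 - 12.4 80.3"))
```

A link can show some loss here and still have a low median, or show no loss at all and a large maximum, which is why judging a shaper on ping loss alone is misleading.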
I LOVE smokeping and wish very much we had a way to combine it with mrtg data to see latency AND bandwidth at the same time.
It has been working well: I set it up for a site in advance of setting up SQM, and then afterwards I can see the changes and determine if more tuning is needed. But if ICMP is having its priority adjusted (up or down), then the results might not reflect the latency of other services.
Fortunately, many other probe types exist:
http://oss.oetiker.ch/smokeping/probe/index.en.html
So which probe types would be good to use for bufferbloat measurement? I guess the answer is "whatever is important to you", but I also suspect there is a set of things that ISPs are known to mess with. HTTP? But also maybe HTTPS in case they are doing some sort of transparent proxy? DNS? SIP? I suppose you could even do explicit checks for things like Netflix (but then it's easy to go off on a tangent of building a net neutrality observatory).
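As one illustration of a non-ICMP measurement, the Python sketch below times the TCP handshake to a port such as 80 or 443. It is not a smokeping probe and does not use smokeping's API; the function names, the example.com target and the default timeout and interval are assumptions chosen for illustration only.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None if it failed."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and completes the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None  # refused, timed out, unreachable, or name lookup failed


def probe(host, port, count=5, interval=0.2):
    """Take `count` samples; None entries are failed attempts (the 'loss')."""
    samples = []
    for _ in range(count):
        samples.append(tcp_connect_ms(host, port))
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Placeholder target; substitute a host you are allowed to probe.
    for port in (80, 443):
        print(port, probe("example.com", port))
```

When a hostname is given, the first sample also includes the DNS lookup, so passing an IP address gives a cleaner handshake time. Because these packets are ordinary TCP, they should be queued like the traffic you actually care about rather than like ICMP.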
On a somewhat related note, I was once using smokeping to measure a fiber link to a bandwidth provider and had it configured to ping the router IP on the other side of the link. In talking to one of their engineers, I learned that they deprioritize ICMP when talking _with_ their routers, so my measurements weren't valid. (I don't know if they deprioritize ICMP traffic going _through_ their routers.)
I do strongly recommend deprioritizing ping slightly, and as I noted, I have seen many a borken script that actually prioritized it, which is foolish, at best.

I keep hoping multiple (many!) someones here will go have lunch with their company's oft lonely, oft starving sysadmin(s), to ask them what they are doing as to firewalling, QoS and traffic shaping. Most of the ones I have talked to are quite eager to show off their work, which is unfortunately often of wildly varying quality and complexity. I find that an offer of sake and sushi is most conducive to getting that conversation started.

I certainly would like to see more default corporate firewall/QoS/shaping rules than I have personally, for various platforms. Someone's got to have some good ideas in them... and it would be nice to know how far the bad ones have propagated.
-- Matt Taggart matt(a)lackof.org
-- Dave Täht Let's make wifi fast, less jittery and reliable again! https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb