While I was out chasing computer history last week, the Linux 3.3 kernel was released. And a very interesting release it is, though not for its vaunted re-inclusion of certain Android kernel hacks. I think that modest move is being overblown in the press. No, Linux 3.3 appears to be the first OS to really take a shot at reducing the problem of bufferbloat. It’s not the answer to this scourge, but it will help some, especially since Linux is so popular for high volume servers.
Bufferbloat, as you’ll recall from my 2011 predictions column, is the result of our misguided attempt to protect streaming applications (now 80 percent of Internet packets) by putting large memory buffers in modems, routers, network cards, and applications. These cascading buffers interfere with each other and with the flow control built into TCP from the very beginning, ultimately breaking that flow control, making things far worse than they’d be if all those buffers simply didn’t exist.
Bufferbloat was named by Jim Gettys of Bell Labs, who has become our chief defender against the scourge, attempting to coordinate what’s become a global response to the problem.
Linux 3.3 isn’t the total solution to bufferbloat but it’s a big step, particularly for servers.
Prepare for technospeak.
One issue is the very large ring buffers described above. A typical device driver has these buffers set at 200-300 packets, a figure derived a decade ago as a worst case to allow devices to drive Gig-Ethernet flat-out using small packets. But not all packets are small, and there’s the rub.
Because these rings are sized in packets rather than in bytes, the time needed to drain them varies radically with packet size, which means these arbitrary buffers can hold up to 20 times more data than they need to when the packets are big. These rings are often constrained to be powers of two in size, and the size can’t easily be changed at runtime without dropping packets.
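To put a rough number on that claim, here’s a back-of-the-envelope check using an illustrative 256-slot ring (the 256 figure is my example, not a specific driver’s setting):

```python
# Back-of-the-envelope check on the "up to 20 times" figure.
# 64 bytes is roughly a minimum-size Ethernet frame; 1500 bytes
# is a full frame at the standard MTU.
RING_PACKETS = 256               # an illustrative power-of-two ring size
SMALL, LARGE = 64, 1500          # bytes per packet

small_total = RING_PACKETS * SMALL   # 16,384 bytes queued
large_total = RING_PACKETS * LARGE   # 384,000 bytes queued

print(large_total // small_total)    # -> 23: over 20x more data in flight
```

The same ring that holds 16 KB of small packets holds nearly 400 KB of full-size frames — all of it sitting below anything the kernel can manage.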
So the Linux 3.3 kernel now implements Byte Queue Limits (BQL), which control how many bytes, rather than how many packets, go to the ring buffers at once. The buffer size can now depend on the size of the packets — many small packets, or only a few large ones. Buffers get smaller and life gets better as a result.
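A toy sketch of the idea — this is not the kernel’s actual BQL code, and the fixed byte limit is invented (real BQL adapts its limit dynamically based on transmit completions) — admitting packets to the transmit ring by byte budget rather than packet count:

```python
# Illustrative byte-budget queue: a few big packets consume as much
# budget as many small ones. Real BQL always admits at least one
# packet so the link never stalls -- mirrored here by "and self.ring".
class ByteQueueLimit:
    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes   # hypothetical fixed limit
        self.inflight = 0
        self.ring = []

    def try_enqueue(self, packet_bytes):
        """Admit a packet only if it fits within the byte budget."""
        if self.inflight + packet_bytes > self.limit_bytes and self.ring:
            return False    # hold it back upstream, where it can be managed
        self.ring.append(packet_bytes)
        self.inflight += packet_bytes
        return True

    def tx_complete(self):
        """Hardware finished sending the oldest packet; free its budget."""
        self.inflight -= self.ring.pop(0)

bql = ByteQueueLimit(limit_bytes=3000)
print(bql.try_enqueue(1500))   # True  -- fits
print(bql.try_enqueue(1500))   # True  -- budget now exhausted
print(bql.try_enqueue(64))     # False -- even a tiny packet must wait
```

The point of holding packets back is that they stay in the qdisc layer, where smarter queue disciplines (SFQ, RED, and friends) can see and manage them, instead of disappearing into an unmanageable hardware ring.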
BQL is currently implemented for Ethernet, where the buffers can be sized to take advantage of smart hardware at high bandwidth. But BQL is not available for wireless networks, nor is it likely to be, simply because wireless bandwidth varies too much to express limits in terms of bytes.
According to Jim Gettys, the ultimate answer to bufferbloat is Active Queue Management (AQM), which isn’t yet ready for prime time but may be soon. Here’s Jim’s explanation from a message he sent me last week:
The purpose of buffers is to absorb bursts of traffic, which often occur in a network. You’d prefer to keep buffers almost empty even when running the link at its full speed to minimize latency; in fact, TCP running ideally can deliver packets close to “just in time” for the next transmit opportunity, so if TCP is running at the link speed, even though there is a buffer, the buffer could conceivably be kept (nearly) empty, and impose little delay.
But any size of unmanaged buffer can and will fill, and stay full. After all, TCP is designed to run “as fast as possible”. So no matter what size “dumb” buffer you have, it can/will add latency; how much depends on its size.
In the face of variable bandwidth, and today’s Internet with CDNs, the traditional 100 ms rule-of-thumb for sizing buffers (already excessive for good telephony) is nonsense. You don’t know the delay, you don’t know the bandwidth, so you really, really don’t know the bandwidth-delay product to size the buffers with…
What AQM does is monitor the buffer, and signal the end points to slow down any time the buffer starts to fill, either due to that one transfer or competing transfers, by dropping or marking packets. So the buffer is kept (almost) empty, except when it is handling a burst of traffic. So the steady state latency of the buffer, rather than being the size of the buffer, is set by the size of the bursts in traffic. The size of the buffer becomes almost irrelevant.
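The effect Jim describes shows up even in a deliberately toy simulation (all numbers here are invented for illustration): a sender that backs off whenever it’s signaled keeps the queue near empty, while an unmanaged sender lets it fill and stay full:

```python
# Toy model: packets arrive and drain each time step. With AQM, a
# growing queue "signals" the sender to slow down; without it, the
# sender just keeps pushing and the queue fills.
def simulate(aqm, steps=200, capacity=100, max_rate=5, drain=4):
    queue, rate = 0, max_rate
    for _ in range(steps):
        queue = min(capacity, queue + rate)   # packets arrive
        queue = max(0, queue - drain)         # link drains
        if aqm and queue > 5:
            rate = max(1, rate - 1)           # congestion signal: back off
        elif rate < max_rate:
            rate += 1                         # no signal: probe back up
    return queue

print(simulate(aqm=False))   # unmanaged: queue sits near capacity
print(simulate(aqm=True))    # managed: queue stays within a few packets
```

In both cases the link runs at full speed; only the standing queue — and therefore the latency — differs.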
Any link without AQM at a bottleneck is really defective; we must use an AQM algorithm everywhere….
The classic AQM algorithm is known as RED. It, however, is defective, requires manual tuning, and can hurt you if it is mis-tuned. As a result, it’s not present in most edge devices, and not even turned on in many ISP networks where it should be.
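For the record, classic RED computes a drop probability that rises linearly between two hand-tuned queue thresholds — and that hand tuning is exactly the problem. A minimal sketch (the threshold and probability values below are arbitrary illustrations, not recommended settings):

```python
import random

# Illustrative RED (Random Early Detection) sketch, not a tuned or
# production implementation. min_th/max_th/max_p are the knobs an
# operator must get right -- mis-set them and RED hurts more than helps.
def red_drop_probability(avg_queue, min_th=5, max_th=15, max_p=0.1):
    """Drop probability rises linearly between the two thresholds."""
    if avg_queue < min_th:
        return 0.0          # queue short: never drop
    if avg_queue >= max_th:
        return 1.0          # queue long: always drop
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def should_drop(avg_queue):
    return random.random() < red_drop_probability(avg_queue)

print(red_drop_probability(3))    # 0.0
print(red_drop_probability(10))   # 0.05
print(red_drop_probability(20))   # 1.0
```

Dropping *early and randomly*, before the queue is full, is what signals TCP senders to slow down — but picking those thresholds requires knowing the bandwidth-delay product, which, as Jim notes, you don’t.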
What we need is an AQM algorithm that does the right thing without manual tuning, capable of dealing with varying bandwidth. I’ve seen simulations of an algorithm that apparently works, but it’s not yet available and has not been tested or fully simulated.
This is Bob again. So Linux 3.3 with BQL is good but not good enough. AQM is required but that’s still two years away from shipping in devices while the mystery algorithm is tested and the network diplomacy begins.
Hopefully adoption will appear in devices sooner rather than later. I wonder which carriers and end devices are the biggest offenders? Most home routers support firmware upgrades, but rolling that out doesn’t sound very practical.
In our PAM paper using Netalyzr and HomeNet Profiler data, we could build a small list of “likely big buffer” home gateways. We hoped to have more names but UPnP identifiers are not accurate enough. Still, the results are interesting.
http://pam2012.ftw.at/papers/PAM2012paper9.pdf
2nd … Best I could muster in all the years
The picture (intended to convey bloat?) looks suspiciously like the OpenBSD logo. 🙂
http://pixar.wikia.com/Bloat – not sure of licensing….
Stevorevo: If you buy the right device, you can adopt countermeasures now: https://www.bufferbloat.net/projects/cerowrt
Cheap home routers turn over every few years… It’s probably easier just to wait for people to buy new ones than to try to change working setups in the field.
Why is it that the ONLY people who talk about this stuff are Linux people? Where are Microsoft, Apple, IBM, Cisco, HP, etc? Why are they completely silent on this issue? I find it interesting that Linux, the platform that everyone loves to hate, is the only one that takes network performance seriously.
Wayne said: “I find it interesting that Linux, the platform that everyone loves to hate… ”
Interesting notion.
I have friends who love to hate MS and Windows.
Likewise, I have friends who love to hate Apple and MacOS/iOS.
I personally love to hate android.
But I’ve yet to encounter someone who’d love to hate Linux. Sure, they like to label it as suitable for geeks, belittle its economic impact, while talking out of experiences on the level of “I once installed Debian, it was a pain…” or “My nephew’s using Linux and MS Office does not run on it…”
Then again, maybe that’s just because openly hating Linux over here (Finland) would be somewhat sacrilegious.
On a totally different topic, I know some people who hate to love Linux (as well as some who just love it): people who’d do nothing more gladly than support the OSS movement in its every form, but who, not being real hackers/geeks/nerds and having been brought up using Windows or Mac OS (but mainly Windows), just can’t make the paradigm shift.
Cheers.
P.S. Bob, I won’t try to badger you into not retiring. You’ve had a long career, you’ve done more than your share and deserve the time to be with your kids and teach them fly-fishing (or whatever). I’ll respect your decision, whatever it will be (but I’ll admit to feeling a loss if you do retire).
’nuff said.
Your argument is invalid. Android is Linux with a Java VM.
Microsoft and Apple are bullshit brands.
“Why is it that the ONLY people who talk about this stuff are Linux people? Where are Microsoft, Apple, IBM, Cisco, HP, etc?”
I was surprised to see a section in the Staples flyer for Cisco’s Linksys routers with the caption “Win the War on Buffering” !
Cisco’s blog doesn’t actually reference bufferbloat although the single comment does directly link the two:
http://blogs.cisco.com/consumer/win-the-war-on-buffering/
The comment therein points to a project with linux-based firmware for Linksys routers to deal with bufferbloat: https://www.bufferbloat.net/
so, yeah… it comes back to Linux again 😉
I can think of a few reasons why it’s the Linux community that seems to have responded the most, thus far.
But first: academia responded, too, at the pace it works at. The FCC has funded a few studies. Much of the recent CAIDA workshop was about bloat-related issues and end-to-end network studies of other problems we are seeing on the edge.
( https://www.caida.org/workshops/isma/1202/ )
As for the vendors…
While we have done our best to alert other communities such as BSD, and have talked with many vendors, there are long lead times involved in developing embedded products, which those companies don’t talk about until fully baked.
With multiple, unpredictable theoretical breakthroughs needed on various fronts, as well, to add uncertainty to their schedules… core confusion as to the theory… no obvious customer demand (as yet), etc, it’s no wonder there’s been little noise from vendors.
I like to think that Linux 3.3 is a start towards seeing a dramatic reduction of latencies net-wide, and that products using it will appear soon.
BQL is a breakthrough, no doubt about it. I hope that multiple vendors adopt it quickly, and that the core concept in it is made available on multiple operating systems. SFQ, SFQRED, and QFQ are modernized enhancements to older ideas (with a few new twists!) that do wonders on hard-rate-limited devices (e.g., Ethernet).
Despite all the work that’s taken place over the past year, another theoretical breakthrough seems required to make a big dent on the wildly variable wireless problem, and a similar one is needed to handle the soft-limited networks most end-users are connected to, as well. Bob and Jim are alluding to one, but I am not convinced we have it yet.
Anyway, moving back to ‘why Linux?’
Because (in addition to bob) Jim convinced a lot of key network giants, and then a bunch of Linux people that we had a real problem here. The entire core networking team (netdev) now ‘gets it’. The work that they do is very public, so people hear about it from them first, unfiltered by marketing-speak.
Also, Linux is pushing into 10GigE and 40GigE networking, which, when handled efficiently in the server, induces bloat-related problems downstream on the network (TSO and GSO offloads, notably). So the people working on that sort of stuff were noticing bufferbloat-related weirdness and not understanding the cause.
And, on the low end, Linux is also used on a lot of home routers, where the introduction of soft bandwidth limits, wireless-N, and faster rates to the home, with no AQM at all, has made the pain noticeable to multitudes of home users.
As for me, well… after helping invent the future we now live in, I feel compelled to lend a hand in holding up the sky. ( http://esr.ibiblio.org/?p=4196 ) I think a lot of us old-timers feel that way, actually.
“no obvious customer demand (as yet)”… One should recognize that everyone who has ever used the Internet for video or audio has seen how bad it is at times. So it would not occur to them to switch from cable (or the limited over the air content) to IP based video or even audio for that matter. The expensive alternatives like wired AT&T long distance and cable TV are still the only high quality reliable solutions. When the Internet becomes as good for a little less cost there will be a massive switch over.
Cisco’s “war on buffering” is not the same thing. Cisco is talking about throughput, where Bufferbloat is all about latency.
Yes, the Cisco “advice” is just an ad to get people to buy their currently-on-the-market products and divert attention from the WPS security debacle.
youtube mobile is one of the worst casualties of bufferbloat. youtube mobile is virtually unwatchable on my otherwise fast home wifi network, which is able to stream hd video to my PS3 yet youtube is unable to play a 1 minute long sd clip without insane buffer times.
anyone else have this issue?
Isn’t YouTube mobile only installable on a smartphone? If so, perhaps it’s bad because of the phone itself instead of the Internet and the buffering. The PS3 or any computer is much more powerful than a cell phone for processing video.
True, although other video apps such as Vevo have no trouble buffering on my 4s.
… interesting because YouTube on my Android phone suffers no such problems, and I have a slow broadband connection…
Could it be that Youtube is able to use Flash on Android but on iOS they must resort to some other method?
There is a widely deployed Active Queue Management solution out there in µTP, the protocol that BitTorrent now uses by default in preference to TCP.
Have you noticed that all those ‘BitTorrent is destroying the internet’ articles went away recently?
See http://tools.ietf.org/html/draft-ietf-ledbat-congestion-09 or http://bittorrent.org/beps/bep_0029.html
LEDBAT appears to be a decent scavenging protocol against drop-tail queue systems. It can still induce bloat. Secondly, since the most visceral feeling you get from bloat is during big uploads, usually only long-term torrenting causes problems there; most people feel it on downloads, on interactive stuff, or during streaming video.
Thirdly… researching how LEDBAT works in non-drop-tail, actively queue-managed systems is a hot topic right now, at places like the lincs.fr lab. Some aspects of LEDBAT may need to be rethought as a result. We’ll see…
This article made me think about cache technology as well. The difference being — I guess — that bufferbloat as described is a software issue whereas a cache is generally a hardware issue. (?)
It was recently recommended I use the mSATA slot on my new motherboard to fit an SSD to act as a super-duper cache. Reading up on the subject it seems to work just like any other hardware cache and it gradually fills with commonly accessed files until… Well then I guess you’re back to square one unless you have some form of dynamic buffer control – which I guess (again!) is the subject of the article.
Those little 16 MB / 8 MB caches on our internal hard drives certainly speed things up. Does a form of bufferbloat occur there, and what’s the outcome?
No, my understanding is that your typical hard-drive cache is addressable. As long as that remains the case you will never have a bufferbloat issue on hard drives… unless of course the disk is failing and is experiencing a large number of write errors.
Ok, interesting, thanks. So could an HD cache actually be more useful for trapping miscellaneous read/write errors and thus speeding processes up, rather than actually aiding data throughput?
I know this isn’t network tech but I’m surmising both are facets of the same problem. If a bottleneck on the Internet slows down the movie I’m streaming, that’s way more unpredictable than a local one where I can perhaps see the cause more easily. If a juddering video is down to millions of viewers grabbing it at once or corrupt data being resent packet by packet, I would imagine that’s harder to cater for and correct…
You seem to almost be there…
What you’ve got there are a number of issues. If you have too many users getting a streaming movie from a server, then you have a throughput issue and you need more bandwidth.
If between you and the server there is another router that is being overloaded every other minute, then YOU need buffering on your end to mitigate the effects of packets from the movie server taking too long or never arriving.
Now if a packet gets corrupted between the server and you, then the TCP protocol will recognize this and get the other end to resend the data.
Now where the bufferbloat issue really comes into play is when, say, all the buffers on all three devices are full and you get an error. So you’d have: server out (10 secs) -> router in/router out (20 secs) -> you (10 secs); OH NOES, THE PACKET IS BAD, PLEASE RE-SEND: you +10 -> router +20 -> server +20 -> router +20 -> you +10.
These numbers are waaay fictional but I’m sure you get the idea. A single retransmit of a packet with all devices’ buffers full can take MUCH longer than if they were empty and taking 3 ms at each interface.
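Plugging deliberately fictional per-hop delays like those into a quick calculation shows the scale of the difference (assuming one full traversal to detect the loss, one for the re-send request going back, and one for the re-send itself):

```python
# Fictional per-hop delays: one lost packet means the data effectively
# crosses the path three times (original send, re-send request back,
# and the re-send itself).
def retransmit_delay(hop_delays):
    one_way = sum(hop_delays)
    return 3 * one_way

bloated = [10, 20, 10]            # seconds per hop with stuffed buffers
empty = [0.003, 0.003, 0.003]     # ~3 ms per hop with empty buffers

print(retransmit_delay(bloated))  # 120 seconds to recover one packet
print(retransmit_delay(empty))    # about 0.027 seconds
```

Minutes versus milliseconds for the same single lost packet — that’s the latency cost of standing queues.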
The concept of keeping queues (or buffers) low or at zero is not new. Kanban and Just-in-time manufacturing techniques operate in a similar way. In the manufacturing world when one built up huge queues, it often masked quality problems. When they reduced the queues the problems were spotted faster and could be fixed quicker. The result was higher quality products, less rework, less waste, and lower cost.
The same principles apply in networking. Behind every big buffer is a problem that needs to be fixed. One of the problems in network is pacing data flows to the available bandwidth, AQM can help a lot there. There ARE other problems though. When there is a disciplined effort to look at the statistics and start fixing the network problems, then the Internet will run a lot better.
If someone really wants to simulate this stuff, contact me. I’ve done this stuff before. With the right tools it is not that hard.
Bufferbloat: https://www.youtube.com/watch?v=0YGF5R9i53A
Ha!! Excellent!!! Suddenly it’s much clearer – and more fun – than I expected! 😀
In addition to BQL, there is a seriously improved implementation of “SFQ”, with a variant of RED, in it. Add SFQ or QFQ on top of BQL, and…
https://www.teklibre.com/~d/bloat/pfifo_fast_vs_sfq_qfq_linear.png
log scale: https://www.teklibre.com/~d/bloat/pfifo_fast_vs_sfq_qfq_log.png
Add SFQRED, and queue management improves quite a bit (it’s hard to show coherently on a graph like the above, but it’s better than SFQ or QFQ alone for many bandwidths).
I look forward to people trying and tuning BQL and trying qfq, sfq, and sfqred in the Linux 3.3 release.
I look forward to the successor algorithms very much also. Fixing wireless in general, and home routers in particular, is going to take a lot more work. But until then, BQL, SFQ, QFQ and SFQRED are a hint of things to come that’s worth implementing now that 3.3 is released.
Correction for your article:
“this meant the arbitrary buffers can be up to 20 times larger than they need to be when sending big packets”
It’s far worse than that. With TSO or GSO enabled, each TX descriptor can have 64 KB on it — superpacket streams — 1,000 times larger than the smallest packet. I’m pretty convinced at this point that A) modern CPU hardware handles normal GigE and lower just fine without TSO/GSO and B) the net effect of TSO/GSO on downstream buffering is very bad.
BQL makes a dent in this too. So does SFQRED, but even the combination results in overall buffering 4-12 times greater than what BQL estimates as correct with TSO/GSO off. TSO/GSO are on by default because they made sense a decade ago, when GigE was only in data centers and not in every laptop.
Sadly, at present, in-hardware solutions are needed with TSO on to solve the AQM problem at 10GigE — you simply can’t pump the data fast enough otherwise. But on everything running at 1GigE or below, turning off TSO/GSO is a win on latency and overall network performance, not just for lowered buffering, but because the quality of the native TCP/IP stack is generally far greater than what’s on the Ethernet card.
Yes, I wonder if our multi-tier JEE applications don’t suffer the same: there is the ‘legacy’ RDBMS doing its own caching of pages to memory, then we put in an ORM layer to do persistence that does its own one or two levels of caching…
A potential alternative solution :
http://arxiv.org/abs/1103.2303
Are there any more details available on the algorithm that apparently works but isn’t yet available and hasn’t been tested or fully simulated?