Monday, January 23, 2012
Let's make TCP faster
By Yuchung Cheng, Make The Web Faster Team
Transmission Control Protocol (TCP), the workhorse of the Internet, is designed to deliver all the Web’s content and operate over a huge range of network types. To deliver content effectively, Web browsers typically open several dozen parallel TCP connections ahead of making actual requests. This strategy overcomes inherent TCP limitations but results in high latency in many situations and is not scalable.
Our research shows that the key to reducing latency is saving round trips. We’re experimenting with several improvements to TCP. Here’s a summary of some of our recommendations to make TCP faster:
1. Increase the TCP initial congestion window to 10 (IW10). The amount of data sent at the beginning of a TCP connection is currently limited to 3 packets, implying 3 round trips (RTTs) to deliver even a tiny 15KB response. Our experiments indicate that IW10 reduces the network latency of Web transfers by over 10%.
2. Reduce the initial timeout from 3 seconds to 1 second. An initial retransmission timeout of 3 seconds was appropriate a couple of decades ago, but today’s Internet calls for a much smaller value. Our rationale for this change is well documented here.
3. Use TCP Fast Open (TFO). For 33% of all HTTP requests, the browser must first spend one RTT establishing a TCP connection with the remote peer. Since most HTTP responses fit within the initial congestion window of 10 packets, that handshake round trip can double the response time. TFO removes this overhead by carrying the HTTP request in the initial TCP SYN packet. We’ve demonstrated TFO reducing page load time by 10% on average, and by over 40% in many situations. Our research paper and Internet-Draft address concerns such as dropped packets and DoS attacks when using TFO.
4. Use Proportional Rate Reduction for TCP (PRR). Packet losses indicate that the network is congested or misbehaving. PRR, a new loss recovery algorithm, retransmits smoothly to recover losses during network congestion; it recovers faster than the current mechanism by adjusting the transmission rate according to the degree of losses. PRR is now part of the Linux kernel and is in the process of becoming part of the TCP standard.
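The arithmetic behind recommendation 1 can be sketched with a toy slow-start model. This is our own back-of-the-envelope illustration, not code from the project: it assumes ~1460-byte segments, no loss, and a congestion window that doubles every round trip.

```python
MSS = 1460  # assumed maximum segment size, in bytes

def rtts_to_deliver(size_bytes, iw):
    """Round trips needed to deliver size_bytes under idealized slow start."""
    segments = -(-size_bytes // MSS)  # ceiling division
    cwnd, rtts = iw, 0
    while segments > 0:
        segments -= cwnd  # send one full window per round trip
        cwnd *= 2         # slow start: the window doubles each RTT
        rtts += 1
    return rtts

print(rtts_to_deliver(15 * 1024, 3))   # IW3  -> 3
print(rtts_to_deliver(15 * 1024, 10))  # IW10 -> 2
```

A 15KB response needs 11 segments, so it takes 3 round trips with IW3 but only 2 with IW10: one RTT saved on a very common transfer size.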
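To see what recommendation 2 buys, consider a lost SYN (again our own sketch, not from the post): the sender waits the initial RTO before retransmitting, and doubles the wait on each further loss. Lowering the initial value from 3 seconds to 1 second shrinks every step of that schedule.

```python
def backoff_schedule(initial_rto, retries):
    """Successive waits (seconds) before each retransmission attempt."""
    waits, rto = [], initial_rto
    for _ in range(retries):
        waits.append(rto)
        rto *= 2  # classic exponential backoff
    return waits

print(backoff_schedule(3.0, 3))  # -> [3.0, 6.0, 12.0]
print(backoff_schedule(1.0, 3))  # -> [1.0, 2.0, 4.0]
```

With a 3-second initial RTO, a single lost SYN already costs the user 3 seconds before the first retry; with 1 second, the same loss costs one second.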
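For recommendation 3, a Linux client can hand the first data segment to the kernel together with the connection attempt. The sketch below is illustrative only: the helper name is ours, the constant is Linux-specific, and actually calling the function requires a TFO-enabled kernel and server.

```python
import socket

# MSG_FASTOPEN is exposed by Python on Linux only; fall back to the
# kernel's value so the sketch at least imports on other platforms.
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)

def fastopen_request(host, port, request):
    """Hypothetical helper: send `request` in the SYN via TCP Fast Open.

    The kernel falls back to a regular three-way handshake if it has
    no cached TFO cookie for this server.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # sendto() with MSG_FASTOPEN both connects and queues data,
        # replacing the separate connect() + send() round trip.
        s.sendto(request, MSG_FASTOPEN, (host, port))
        return s.recv(65536)
    finally:
        s.close()

print(hex(MSG_FASTOPEN))  # -> 0x20000000
```

The point is that the request rides in the SYN, so the server can start generating the response one full RTT earlier than with connect() followed by send().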
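The pacing idea behind recommendation 4 can be illustrated with a toy model in packet units. This is our simplification of PRR's proportional branch; the real algorithm (standardized later as RFC 6937) also has a slow-start-like branch for when fewer packets remain in flight than ssthresh.

```python
import math

def prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs):
    """Packets we may send on this ACK (proportional branch only)."""
    return max(0, math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out)

# Example: 20 packets in flight when loss is detected; the congestion
# control algorithm asks for ssthresh = 10 (a Reno-style halving).
recover_fs, ssthresh = 20, 10
prr_delivered = prr_out = 0
for _ in range(recover_fs):  # one ACK arrives per delivered packet
    prr_delivered += 1
    prr_out += prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs)
print(prr_out)  # -> 10
```

Instead of abruptly halting transmission, the sender sends roughly one packet for every two delivered, so by the end of recovery exactly ssthresh packets have gone out: the rate is reduced in proportion to how much data the network is actually delivering.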
In addition, we are developing algorithms to recover faster on noisy mobile networks, as well as a guaranteed 2-RTT delivery during startup. All our work on TCP is open-source and publicly available. We disseminate our innovations through the Linux kernel, IETF standards proposals, and research publications. Our goal is to partner with industry and academia to improve TCP for the whole Internet. Please watch this blog and http://code.google.com/speed/ for further information.
Yuchung Cheng works on the transport layer to make the Web faster. He believes the current transport layer badly needs an overhaul to catch up with other (networking) technologies. He can be reached at ycheng@google.com.
Posted by Scott Knaster, Editor
This is awesome work. I'm glad that Google is putting their vast engineering resources to such good use.
Rich, Gun.io
It is impressive to see how simple changes (justified by extensive research) can lead to a great improvement of the protocol. Well done!
It would be nifty if someone with the know-how made an open-source exe or MSI installer that applied all of the settings that can be set client-side.
ReplyDeleteI wrote a program that you're now requesting a few months back: for the Microsoft stuff anyway. The Google stuff would have to be developed into a driver for Windows to work properly.
http://unquietwiki.com/programming/ (look for NetToggle)
I'd be interested in hearing if these experiments have been reproduced in less-privileged connectivity situations, such as 3G, satellite, or even dial-up. As one who still, of necessity, lives in that environment most of the time, it's my impression that researchers often operate under the assumption that these constraints are rapidly going away, but they remain the only options in many rural areas even in the US today. It's all too easy to imagine that what optimizes the high-bandwidth experience, such as shortening the initial timeout from 3 seconds to 1, might actually make things worse for those of us who already have it worst.
That's thinking outside the box... (I concur)
Great question. We do take into account the impact of our TCP changes on users with low access bandwidth, such as dial-up. A couple of examples: in the IW10 work we explicitly analyzed the large-scale experiment results for dial-up and mobile users, in addition to running more controlled testbed experiments to reproduce what we were observing in our cluster-wide experiments. A second, more recent example is the experiments with Proportional Rate Reduction (PRR) in India datacenters, where bandwidth is typically on the lower end.
I don't think these changes were meant for widespread consumer use on low-bandwidth connections, but rather to reduce the latency of network traffic within their datacenters on high-bandwidth connections. TCP's congestion control mechanisms currently don't scale well to high-bandwidth connections; recovery takes too long after triple duplicate ACKs.
What Nathaniel said. We need default algorithms in stations that respond to the conditions they experience, and remember them system-wide for a period of time. (Happy Eyeballs is an example of this, but it is browser-specific, not system-wide.) TCP Fast Open (what happened to T/TCP?) likely fails HORRIBLY on banks' Cisco PIX firewalls.
And it's not just connectivity that is an issue; TCP now also runs on very small devices, where these things are not relevant, and things have to work. How does TCP Fast Open fail when the responder does not support it?
These are not new items. What is needed is a tool that will automatically discover the best settings given the user's current context. As Nathaniel indicates, not everyone is on a high-bandwidth link. This likely amounts to code that looks at current connections to tweak the settings of future connections. Generally each connection starts from some configured, static default value; I suggest the defaults should track a current consensus instead.
Any thoughts on putting payload into the SYN-ACKs? (server and client both) This would be the ultimate speed-up...
The buffer bloat issue needs to be resolved first; otherwise you will be wasting a lot of time.
http://en.wikipedia.org/wiki/Bufferbloat
David, either participant still could not pass the data up to the socket client before the handshake was complete. They would have to buffer the data, and that would just amplify the severity of SYN flood attacks.
ReplyDeleteI can agree that those initiatives are positive things, but ISP's buffer bloat, which is defeating the TCP's existing congestion algorithms and introducing latency, make them moot. One may consider buffer bloat orthogonal to these issues, but ISPs do it for a reason. Hearing these initiatives phrased in a manner so that ISPs see solutions to their problem (that is poorly solved by buffering) would be useful.
Have you looked at UDT? There must be some lessons that can be taken from it...
Thanks for the questions and comments.
@Nathaniel/mcr: compatibility is a key part of the Fast Open design. Our draft has more details on dealing with firewalls and SYN-data drops: http://www.ietf.org/id/draft-cheng-tcpm-fastopen-02.txt
@John: tuning the initial parameters based on history is certainly helpful. We are working on this now.
@DavidBond: Fast Open allows data in SYN-ACK packet as well.
@harjuo: Both our paper and draft discuss a new socket interface and the syn-flood issue extensively:
http://www.ietf.org/id/draft-cheng-tcpm-fastopen-02.txt
http://research.google.com/pubs/pub37517.html
@stealth/haapl: we are experimenting with new algorithms to lower the queuing delay of TCP connections. Please stay tuned for more updates.
Pardon my ignorance, but how will an AES block cipher help mitigate man-in-the-middle interception of the TFO cookie? Wouldn't a stream cipher make better sense, with a short-lived cookie exchanged after the second phase?
Interesting article on bufferbloat. There has been a trend toward Layer 3 switches with reduced buffers, so I am surprised that this issue has been getting worse. I hope we can use these enhancements on load balancers quickly once the tech is finalized. Using generic Linux servers as load balancers works up to a point. :-)
ReplyDeleteAre there instructions or a patch I can apply to my Ubuntu and my servers to adjust these settings? If so, I can apply it to my servers.
ReplyDeleteThis comment has been removed by the author.
Are there kernel patches available for the items listed above?
ReplyDeleteI've been able to find/backport for #1 and #4 but have not seen any code anywhere for #2 or #3.
#2 could be a simple one-liner change but IIRC that leads to other issues.
https://github.com/vrv/linux-microsecondrto
http://www.pdl.cmu.edu/PDL-FTP/Storage/sigcomm147-vasudevan.pdf
For #3, code was promised but I have not seen it publicly posted anywhere..
Linux kernel versions where the changes were applied:
1. Initial Congestion Window of 10 packets: 2.6.39-rc1
2. Initial Retransmission Timeout of 1sec: 3.1-rc1
3. Proportional Rate Reduction: 3.2-rc1
4. Fast Open: not yet. It's the most complicated change. We are still testing the patches internally and will upstream when it's ready.
Page 14 of this slide deck says Fast Open is in kernel 2.6.34: http://conferences.sigcomm.org/co-next/2011/slides/Radhakrishnan-TCP_Fast_Open.pdf
Is it the same "Fast Open"? ... I think it is...
Do you have any statistics yet for what the combined effect of implementing all of these changes might be for the "average" internet user?
ReplyDeleteYou would probably get a lot more attention and be more likely to see your changes adopted in the TCP standard if you published a post that said "Google Researchers Discover Way To Speed Up Internet By 30%".
Such a headline would lose some specificity, but since most people don't know what TCP is, it would be an effective way to spread the news and promote change.
Another effort to improve TCP is Multipath TCP, which is being finalized within the IETF. See http://www.ietf.org/id/draft-ietf-mptcp-multiaddressed-05.txt for the latest draft. The Linux kernel patch developed at UCLouvain is fully functional and performs well on servers and in lab environments. We'd love to run more tests in real wireless networks, where Multipath TCP would provide many benefits as well. See http://mptcp.info.ucl.ac.be/
ReplyDeleteMultipath TCP is on our radar. Google has in fact sponsored some of the projects through Google Research grant.
DeleteIs it really time to attack TCP? Especially since the way the article is worded it seems like HTTP is a driving factor for establishing results.
ReplyDeleteThe reason I ask is maybe focus should continue to be on ideas like SPDY, which sort of turns HTTP into a reusable persistent connection type protocol instead of the current stateless beast it is now. There's also still the idea of turning the entire web to ssl that's been floated about. That suddenly makes securing sessions easier, especially when combined with SPDY if developed with those technologies in place.
Then, once the more optimized and secure higher-level protocols are in place, tune TCP for them. What's the risk of decisions made now?
Also, on item 2, the timeout: I think you might be generalizing a bit. I work for an organization where we literally have to support an office on an island with a satellite uplink that only works 12 hours a day. It has high latency and low bandwidth, and I bet that cutting timeouts by two-thirds will degrade their experience. Sure, the way TCP is currently designed probably isn't optimal for 99.9% of the Internet, but that doesn't mean the world is ready to lose that 0.1%.
Actually, the core developer of SPDY, Mike Belshe, is a major driver of our TCP changes. See his comments about TCP and SPDY at:
1. http://www.ietf.org/proceedings/80/slides/tsvarea-0.pdf
2. http://www.ietf.org/mail-archive/web/tcpm/current/msg06121.html
Are there any guidelines for working with WebSockets? What should I use with Google Protocol Buffers: HTTP, SPDY, HTTP+WebSocket, or raw TCP? I'm talking about a massively multiplayer online game.
DeleteWhat about offloading the transmission control to software? I've used software that does this and I've seen 100% increases in throughput.
ReplyDeleteDid you guys look at losses in the last 3 packets that can't be recovered from using SACK? IMO that is a big reason why HTTP sessions hang.
ReplyDeleteEarly Retransmit did not help as much as we expected. See the PRR paper for details. Given the relative complexity of the changes and the limited gain, other TCP improvements were higher priority.
If the last three packets get lost, the only way to recover is via an RTO. We do find that a significant portion of the retransmissions from short HTTP responses are due to RTOs (as opposed to fast recovery). We are working on solutions to address the issue of long latency due to timeouts in short flows.
And how can I change the values for "Increase TCP initial congestion window to 10 (IW10)" and "Reduce the initial timeout from 3 seconds to 1 second" on Windows 7?
ReplyDeleteyou can't do that in windows 7. the article is about "Server Configuration" which where administrators can reduce the execution time or even add database configuration settings, read about web.conig and asp.net :)
DeleteWhy don't you actually work on SCTP? With its ability to carry multiple independent substreams, multiple HTTP could be issued with a single connection.
ReplyDeleteYep, all very true. Google also sponsors SCTP research. All of our changes (or similar algorithms) can be ported to SCTP when the need arises.
Well, I would love to add that using no-postback strategies can indeed reduce the time a page takes to be served to the client. For example, server-side code will hold the page until it has fully executed, which may cause a timeout, or even a deadlock in the database! In that case you have no business reducing the page execution timeout to 1 second...
ReplyDeleteinstead...
1-Enhance your database performance and always rebuild and reorganize your indexes.
2-Always Maintain your code so the execution time is shorter/faster.
3-Cache your pages.
4-Use AJAX :)
that will indeed maximize your webpage performance and will keep your users happy.... that's how I do it anyway.
Wouldn't it be wiser to implement this as a separate protocol with a distinct IP Protocol Number value? You could call it Google Transport Protocol rather than TCP.
ReplyDeleteThere are a lot of TCP stacks out there, some of them in embedded systems. If even one of them is badly written all kinds of unpleasant havoc could occur if that stack interacts with your new version of TCP.
The main reason I suggest this is that you can't just update the stack in an embedded application. It's frequently burned into ROM.
LOL.. you guys should've seen UDT's performance.
ReplyDeleteGo chk it out; its on sourceforge :)
It is better on long connections, but not on short ones.
Delete"An RTT of 3 seconds was appropriate a couple of decades ago, but today’s Internet requires a much smaller timeout"
Bad advice. What if my clients are on slow Internet connections? Perhaps from a third-world country through a mobile operator; even an excellent ISP can have problems sometimes, or maybe the user is downloading something heavy, etc. What will they think of my service? Choosing 3 seconds instead of 1 is not so critical, but it makes for a better service.
Yuchung, I've seen similar behavior in wireless ad-hoc networks, where bigger window sizes yield an even larger improvement than reported here.
ReplyDeleteAs for the timeout reduction, I would argue that the 21st century is not about big centralized systems but decentralization. And I'm not so sure that increasing the timeout in mesh networks is in fact such a good idea.
Before this, you should work on a new, independent network structure; we refuse another SOPA or PIPA, or other censorship like the Megaupload takedown.
ReplyDeleteWhere's the RFC? :)
ReplyDeletethere are reason ms hardcode now!cause in the past silly people like google and a lot of other messed with standard without the whole picture!the web sadly isnt just google server or microsoft if it was trust me microsoft would have adopted way better solution but compromise had to be made so overall web was smooth,instead google should ask for variable.
ReplyDeletei agree that most of the items listed in this article are the pains in the neck for todays TCP implementations. that said, TCP optimization is a bigger problem than merely tweaking the parameters. we, www.appexnetworks.com, have been working for years on that. besides the points listed here, there are issues such as how to improve the accuracy in detecting a packet loss from, say, reordering; how to filter out bogus signals from crappy real-world TCP implementations (you'd be surprised to see how some TCP reacts, esp thru the filter of some 'intelligent' firewalls). and most importantly, you dont want to contribute to the chances of congestion, on both directions, even if your traffic is mostly unidirectional. and there are more.
ReplyDeletewe've done most of it, though not all of it. for linux and windows, if you'd like to try, you are welcome to get it from our download site.