12th May 2020
HANS PETTER HOLEN: Good morning everybody. So, let's see if I got the technology to work now.
So, welcome to RIPE 80. It's very unusual meeting, not only is it the 80th RIPE meeting that we have, but it's also the first virtual‑only meeting. So, the world is a strange place right now, with the Covid‑19 virus and the shutdown of societies in many countries, and we really don't know how long this will go on, and what the implications will be on the world in the long run.
So, I'm pleased to see that we're actually able to use the Internet to hold meetings like this and continue our business in a very difficult time. And it's also quite clear that the changes that we have had during the last month in society in the way we are working is probably, some of them are coming to stay, so the Internet is just going to be much more important than it has been in the past.
So, first virtual RIPE meeting, and we have 1,745 registrations, and that is a record high. I think we have been in the 700 to 800‑and‑something range in the past, so we have now more than doubled the number of registrations. Of course, lowering the barrier to participate, with no meeting fee and no need to travel, is clearly really interesting. I just held a newcomers' session, and there was record‑high participation in that session. So, this is going to be really interesting to see how this meeting plays out.
So, I am Hans Petter Holen, and I am the interim Chair. Normally it's the RIPE Chair that opens this meeting, but since the RIPE NCC Executive Board has appointed me RIPE NCC Managing Director as of 1st May, I stepped down as RIPE Chair, and the Working Group Chairs then appointed me as interim Chair until the nomination committee has selected the next RIPE Chair. I'm Norwegian. I am planning to relocate to Amsterdam, but under the circumstances I am staying here in a summer house outside Oslo in Norway. And I'm not quite alone in this cabin, so this is going to be very exciting to see; my daughter's dog is here, so I hope she behaves and lets us get on with the meeting.
So, the RIPE meeting principles:
We want this meeting to be open to anyone. We want to bring people together from different backgrounds, cultures, nationalities, beliefs and genders, and we want the RIPE community to be safe, supportive and respectful. These are principles that have been here for a long time, and we have really strived to live by them. It's really important for me to convey to all of you participating in this community that this is a shared responsibility. This is not something that I, as RIPE Chair, or the RIPE NCC can facilitate alone. You need to contribute to this.
And in these times, some of you may have seen some of the discussions that we have had on the mailing lists lately. And my response to that is that we need to remember that the RIPE community is built on trust. Trust can be misused. But it's important that we stay true to our values and we do want to be open, transparent and inclusive, but that's a shared responsibility on everyone, to make sure that we can stay that way.
Now, following this meeting, we may have to look into some of the ways that we are doing things on mailing lists and in other places to make sure that we can stay open, transparent and inclusive, and stay focused and get our job done.
So, the RIPE meetings have a meeting Code of Conduct. We have met for over a quarter of a century, and our strength comes from the breadth of experience, the diversity of views and the open exchange of ideas ‑ values that we want all of our RIPE meeting attendees to uphold. Please treat each other with tolerance and respect. Free speech and open exchange of ideas are encouraged and celebrated.
Free speech means that you can say whatever you want, within some limits. You need to take particular care not to violate the basic principles of interaction: you don't want to be demeaning, intimidating or harmful to anyone, and it is wrong if you are. And we are especially sensitive to behaviour that offends based on gender, sexual orientation, religion, race or ethnic origin, or other perceived social or cultural differences.
Living by this Code of Conduct may seem easy, but when discussions get heated it may become harder, so please, all of you, be aware.
Now, this Code of Conduct has been here for a while, and there has been work in progress by the Diversity Task Force, which has had a broad mandate to increase diversity at RIPE meetings and has been working on a more detailed Code of Conduct. In order to focus this effort and make sure that we make progress, I'm proposing in the Community Plenary on Thursday to create a dedicated Code of Conduct Task Force with two co‑chairs, so that we can bring this forward, because this is not the work of one person, and it may be difficult to shepherd the whole community. So please join the Community Plenary on Thursday, contribute to that and share your ideas.
We do have RIPE meeting trusted contacts. So if you experience any violations of the Code of Conduct, you can contact Mirjam, Rob or Vesna; they have all been trained in conflict resolutions and they will handle all your requests confidentially, and, if requested, they can schedule a Zoom meeting with you.
Now, the content of this meeting: if you look at the meeting plan, you will see that we have Working Groups on both Wednesday and Thursday, and the content of these groups is thanks to the Working Group Chairs, whom you can all see here. Some of the Working Groups now have selections of new chairs, so join in the Working Groups and take part in that.
For today's programme, we have the amazing RIPE Programme Committee chaired by Franziska and Maria and Brian, with representatives that are elected by you, so there is a PC election going on, so, please pay attention to that and take part in that.
The PC also has representatives from MENOG, SEE and ENOG, and Brian, as well as being a RIPE Vice‑Chair, also acts as liaison between the PC and the Working Group Chairs.
Looking at the meeting plan. We had the newcomers' introduction this morning, we are now in the Opening Plenary and we have Plenary sessions later today, and you can see here when the different Working Groups are and we round it all off with the Community Plenary on Thursday.
So on Wednesday evening you have the RIPE Services Working Groups and the RIPE General Meeting, which is just for members. All the presentations normally in general meetings are done in the Services Working Group, so you can follow what's going on in the membership, even if you are not a member, by attending the RIPE Services Working Group.
Later today, at the end of the day, there will be a virtual 'Meet the RIPE Chair and Managing Director' ‑‑ that's me ‑‑ and the RIPE NCC Executive Board, and on Thursday we are going to have a RIPE dinner, a virtual dinner, and I'm really excited to see what that will be.
How to participate: You are now all connected to the Zoom webinar or the live streaming page. In Zoom there is a Q&A function and also a chat room, and there is an IRC chat on the web stream that you can also use. There is also a live transcription window on the RIPE 80 meeting website, so you need a separate browser window to open that, and you can see the transcript of what I'm saying.
Normally this transcript is behind me so I can't see what the stenographers are writing. Now, this time, I have a screen in front of me so I can see what they are writing, so hopefully there won't be any big surprises of what I'm saying.
Housekeeping. Please state your affiliation.
Zoom: use the Q&A window if you want a question asked, and use the IRC channel on the live streaming page. In the Zoom chat, always select 'panelists and attendees', otherwise only the panelists will see your message. These sessions are recorded and will be published in the RIPE 80 website archives.
How to network?
No corridors, no coffee machines this time, but there is a social room, and there is the networking app, where you can list yourself and see who has registered their contact details there.
I have already mentioned the socials, so, today, chat with me or the RIPE Board, and, on Thursday, the RIPE 80 virtual dinner.
So, the nominating committee has asked me to share a bit about their status. They are here to select the RIPE Chair and Vice‑Chair in June. The nominees so far are Filiz, Niall O'Reilly, Nigel and Mirjam, and you can read more about who they are and their background on blog.ripenomcom.org. While the NomCom is tasked with making a selection for final approval by the RIPE NCC Board, they do want your input. They will give a report in the Community Plenary on Thursday, they have office hours on all meeting days, and they are also available on Friday by appointment. So if you want to share your thoughts about who should be the next RIPE Chair and Vice‑Chair, please get in touch with them; you can do that by looking up their details.
Here is the actual process slide, so you can find that in the archives and the Chair of the NomCom, Daniel Karrenberg, was appointed by the RIPE NCC Executive Board, so he is guiding them through these policy and procedures.
And that was all from me. And, of course, as at any RIPE meeting, we do have our sponsors, and I am really grateful to Twitch, NTT, Brenac and Carrier Co‑Lo for sponsoring this virtual meeting. It's something that hasn't been done before, so it's really great to see that you have the trust in us to sponsor this meeting and make it possible.
So, by that, I hand over the floor to the session Chair.
FRANZISKA LICHTBLAU (CHAIR): Thank you. Also from me, welcome to this first virtual RIPE meeting. My name is Franziska Lichtblau and I am the Chair of the RIPE PC, and yes, it has been interesting for us too. Hans Petter already said a little bit about our work. You see now a slide of all the great people who helped make this programme come to pass ‑‑ my video got disabled ‑‑ now I can actually talk to you again. As I was saying, it has been really interesting for all of us. In terms of workload, it was very interesting because, of course, we have a very limited programme this time compared to what you are usually used to, so we had way fewer talks to choose, but we also got way fewer submissions, so we really tried to motivate the whole community to actually participate. We are aware that you were all not really sure how this would come to pass, but it was actually good work, the whole team was supportive, the work with the NCC was great, and we ended up with what I think is quite a good Plenary.
We have four 45‑minute slots plus the Opening Plenary which are chaired by the PC, and I want to remind you of one thing, please:
We are quite tight on timings. The sessions are shorter than we are used to, so we also needed to change the talk lengths from what you are used to. We mix full presentations with short presentations, so please keep your questions to the point. State your name and affiliation, and, for the Q&A, please only use the Q&A window to ask questions that are related to the talk, because our task as session Chairs is to read out these questions and make sure that all the questions asked actually get some floor time, so please be precise on that.
And, as usual, rate the talk. We try to keep this as regular a RIPE meeting as possible so we also want feedback on the talks we selected, so please click and do the rating.
Here is, again, a summary for the PC elections. You can nominate yourself until 3:30 today, Amsterdam time of course. Just send a mail, together with a short bio, your name and a picture and we will consider you.
The candidate biographies can be found on the website, and I have included a link here. The voting starts today at 4:00 and goes on until Thursday at 5 o'clock.
So, that was it from me, and now we will kick off the session and start off with Geoff Huston.
JAN ZORZ: Hello everybody. Welcome from Slovenia, we all wish we would be in a meeting like this in real life, but what can we do? Maybe for the next RIPE meeting.
And while Franziska is sharing her presentation, I would like to welcome Geoff Huston and he will be talking about buffers and protocols. Please, Geoff, the floor is yours.
GEOFF HUSTON: Thank you, and I hope you can all hear me, right? I hope so.
Zoom just disappears off the screen here.
I don't know about you, but I've been in deep geek starvation mode for the last month or two and I am seriously missing out on heavy‑duty geek stuff, so this talk is unashamedly right down there in geek land. My name is Geoff Huston, I am the Chief Scientist at APNIC, and this talk actually arises from a workshop I was fortunate enough to attend at the end of last year at Stanford University; remember the time when we could travel? So nice. It was a really stimulating workshop: two days entirely about network buffers. There is a lot to learn and understand about the way we provision networks, because I think there are some hard lessons coming along that we perhaps don't quite fully appreciate.
The Internet is, as usual, prodigious, but just how much? Oh my God, in the 1980s, if any of you remember that far back, a kilobit per second was really fast. In the 1990s, we upped the speed by three orders of magnitude, from kilobits to megabits, a 1,000‑fold increase in speed. Over the next ten years we went further and upped the speed we were able to achieve out of our networks into gigabits per second. But something went wrong in the last ten years, because the achievable speed we're getting out of our networks as a protocol speed is actually much the same: it's still just gigabits per second.
But that's not really the limiting issue. The optical fabric is just going gangbusters and we're edging into terabit capacity. But our silicon and, more to the point, TCP itself isn't keeping up. Now, why?
Now, when you search around this, it's not really the engines at either side. It looks to be that the buffers inside the network are playing an important limiting role. And maybe that's the real issue here: that we're building our networks kind of all wrong, and if we really want speed, maybe we should think about protocols and capacity and buffering inside one big package.
To put this into context, I am going to bore you a little bit with TCP. Some of you know this, some don't; let's look at what TCP, the work horse of the Internet, is actually trying to do.
Underneath, the Internet is just raw packets; every packet is an adventure, it may or may not get there, it may be re‑ordered, all kinds of things befall single packets. TCP's job is to tame that beast: take these packets that are just all over the place and turn them into a reliable stream protocol.
But it's not just that, it's actually going to achieve two more really important goals. It's got to be efficient and it's got to be fair.
But for TCP, idle capacity is a dastardly sin. Never ever leave unused bandwidth on the table, because then you are not going fast enough. What you are trying to do is maximise the goodput of your sessions. What you want is speed. What you also want is fairness. You don't want to crowd out all the other sessions, and you don't need the network to enforce that. You have got to self‑impose it. So, over time, you have got to regulate yourself to mostly take your fair share of the network.
Think of it as fluid design, where what you have got is a whole bunch of individual fluid flows all sitting inside the one pipe. Each of them has got to gently elbow the others to exert a fair share, and when someone else elbows you, you have got to resist a little bit but respond to it, so that it becomes a dynamic equilibrium where fair sharing happens.
Now, what TCP does is actually ACK pacing. Packets go off from the sender to the receiver, and the receiver sends back ACKs. Oddly enough, the timing of the ACKs is actually the same as the timing at which packets leave the network at the receiver. So, if the sender sends a new packet for every ACK, it's maintaining a constant rate: as packets leave the network at the receiver, the ACKs go back, and new packets go in at the same rate. If you know what the right rate is, ACK pacing can make it stick there.
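This self‑clocking idea can be sketched in a few lines of Python. This is a toy model, not real TCP, and the 10 ms bottleneck interval is just an invented example figure:

```python
# Toy model of ACK pacing ("self-clocking"): the sender transmits one new
# packet for every ACK it receives, so its sending rate automatically
# matches the rate at which packets drain through the bottleneck.

def ack_clocked_send_times(bottleneck_interval_s, n_packets):
    """Times at which the sender injects new packets, one per ACK.

    ACKs arrive spaced by the bottleneck's per-packet service time, so
    new packets enter the network at exactly the bottleneck rate.
    """
    return [i * bottleneck_interval_s for i in range(n_packets)]

sends = ack_clocked_send_times(bottleneck_interval_s=0.01, n_packets=5)
# Packets go out every 10 ms, the bottleneck rate, with no explicit
# rate configuration at the sender.
```

The point of the sketch is that the sender never needs to be told the bottleneck rate; the ACK stream carries it implicitly.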
But, of course, the question is: What's the right rate? Now, you don't know. We have no idea, you know, you have a car, there is a road, I wonder what, you know, the available speed is?
It's not predetermined. TCP has no idea what available network capacity means. So the way TCP does this is actually like, you know, a teenager in a car: rate adaptation. You keep pressing on the accelerator until you crash; when you crash, that was too fast, and then you back off, because obviously that doesn't work.
So, the way it works in TCP, oddly enough, is almost analogous, that you probe into the network space, increasing the sending rate again continually, until you get packet drop. And that crash is the signal, oops, you pushed the network into congestion, you need to back off. And so that's the basis of the way TCP works.
There is one more little twist to this. The ACK signal is actually back in time. The receiver generates the ACK when it receives the packet, but the sender doesn't get it until half a round‑trip time later. So the signal the sender sees is showing what happened in the past, not now.
So if there is data loss, that was a little while ago, and it's probably getting worse. So, bad news: packet loss actually means disaster is happening, better back off quickly. TCP should react really quickly to what we call bad news.
But also, the fact that you are okay doesn't mean you're okay now. It means you're okay half a round trip time ago. So don't assume that life is still fine. You should be a little more conservative.
So, this creates the sort of classic TCP that we all know about: additive increase, multiplicative decrease. While there is no packet loss, increase the sending rate by one segment every round‑trip interval; just gently increase the pressure. If there is packet loss, halve the sending window; dramatically reduce the rate. And that halving is an important number: decrease the sending rate by 50%, half the sender's window. And, by the way, the starter is called slow start because it isn't. Every round‑trip time interval you just double the sending rate until you get into collapse, and because doubling is exponential growth, that will happen pretty quickly. Slow start, because it isn't.
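The behaviour just described can be modelled as a toy simulation. The 64‑segment "capacity" here is an arbitrary illustrative number, not a real network:

```python
# A toy model of classic (Reno-style) TCP congestion control: slow start
# doubles the window every RTT; congestion avoidance adds one segment per
# RTT; on loss, the window is halved (multiplicative decrease).

def aimd_trace(capacity_segments=64, rtts=40):
    window = 1.0
    slow_start = True
    trace = []
    for _ in range(rtts):
        trace.append(window)
        if window > capacity_segments:   # "crash": queue overflows, loss
            window = window / 2          # multiplicative decrease (50%)
            slow_start = False
        elif slow_start:
            window = window * 2          # slow start: double per RTT
        else:
            window = window + 1          # additive increase: +1 MSS per RTT
    return trace

trace = aimd_trace()
# The trace shows the familiar sawtooth: exponential rise, a halving on
# loss, then the slow linear climb back up.
```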
So here is the typical thing that we're used to with the way TCP looks. In this mode, what they call congestion avoidance ‑‑ which is actually congestion causing, if you really want to put the right name on it ‑‑ you gently increase the sending rate by one additional packet in the window every single round‑trip time. Eventually the queues will not only form but fill; you will fill up the queue, and packet loss will ensue.
When you rate‑halve, though, the sender has to wait for the window to drain. And so the sender is not going to send anything for one round‑trip time. And during that one idle round‑trip time, the queue should drain completely, so by the time you start sending again you have reset the condition: the queue is empty.
So, when a sender receives a loss signal, it repairs the loss and halves the sending window, the sender waits, the queue drains, and off you go.
How long should it wait? Well, one round‑trip time. That means the queue has to drain in one round‑trip time. How big should the queue be? The delay bandwidth product of the link. So this leads to: when you have got a single queue and a single flow, you have to provision your buffers to the size of the delay bandwidth product. So, if you have got a gigabit of capacity and you are sending over half a second of round‑trip time, the delay bandwidth product is half a gig; that's the size of the buffer that you need.
So that's the standard rule that you all buy routers by. The standard size is bandwidth times round‑trip time. It dates back to 1994, and it's the rule of thumb. It worked for megabits, and when you buy a router today for gigabits, that's what they'd like to sell you: heaps and heaps of memory.
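The rule of thumb is easy to put into numbers, using the same example figures as above (1 Gbit/s of capacity and a 500 ms round‑trip time):

```python
# The classic single-flow buffer sizing rule from the talk:
# buffer = bandwidth x round-trip time (the bandwidth-delay product).

def bdp_bits(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product in bits."""
    return bandwidth_bps * rtt_seconds

buffer_bits = bdp_bits(1e9, 0.5)          # 1 Gbit/s, 500 ms RTT
buffer_megabytes = buffer_bits / 8 / 1e6  # convert bits to megabytes
# Half a gigabit of buffer, i.e. 62.5 MB of fast memory per such link --
# "heaps and heaps of memory".
```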
Of course, you can overkill it. You can make the buffer too big. And that's bad too. Because when you stop and wait for one round‑trip time, if the queue is too big, it will never drain, and by the time you start sending, there is still stuff in the queue. The queue is always occupied if your buffer is too big. And that additional amount of queue that never drains is just delay.
So, if you over‑provision memory in your routers, all you're doing is making life crap for your users and making your network slower, oddly enough.
So, if the queue is too big, the buffers are too big, users get more delay. Bad idea.
What about too small? We mentioned efficiency. If the queue is too small, rate halving takes you not only to a drained queue, it takes your sending rate below the actual link capacity, and because the rate increase is incredibly slow ‑‑ one MSS per RTT is quite a slow rate of increase ‑‑ it will take some time to get back up to link capacity. If the queue is too small, the network is under‑utilised and you get idle capacity.
So getting that just right was always important for Reno.
Now, the other way of doing this is not to get it just right, because that's just really hard; you change the protocol. There have been lots and lots and lots of variations around this theme. Almost every Ph.D. student studying flow control came up with their own favourite one. I always liked MulTCP: TCP with its fairness taken to unfair proportions. If I act as if I were running N simultaneous TCP sessions, say ten, I won't just get my one Nth of the link, I'll get ten times that. So MulTCP tries to be unfair by increasing by N segments and doing rate dropping by one Nth. Lots of other variants.
What about additive increase? We could change that too, and one of the most commonly used flow control algorithms ‑‑ because I think most of the Linux distributions went this way ‑‑ is CUBIC. Rather than doing a straight linear increase, it uses an order‑three equation to bring the sending rate up to where it thinks the link capacity is going to be, then slows down and gently eases into where it thinks the sweet spot is.
If you analyse it, what you actually find is that CUBIC doesn't work the way you think. Although it can react quickly to available capacity, it tends to sit for extended periods of time in queue formation, so it works quite well on long fat pipes. If your buffers are big, CUBIC will tend to occupy that queue. It will exacerbate a whole bunch of buffer bloat problems, and it will be slower, oddly enough, than you want, because the delay increases.
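For reference, the published CUBIC window function (RFC 8312) captures the behaviour described here: an order‑three curve that plateaus near the window size at the last loss and then accelerates again to probe beyond it. A small sketch, using the RFC's standard constants:

```python
# The CUBIC window growth function (per RFC 8312): after a loss at window
# w_max, the window follows a cubic curve that flattens out near w_max
# (the presumed sweet spot) and then speeds up again to probe for more.

def cubic_window(t, w_max, c=0.4, beta=0.7):
    """Window (in segments) t seconds after the last loss event."""
    k = (w_max * (1 - beta) / c) ** (1 / 3)  # time to climb back to w_max
    return c * (t - k) ** 3 + w_max

w_max = 100.0
# Right after the loss the window is beta * w_max (70 segments here);
# at t = K it is back at w_max; beyond K it grows again.
```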
So CUBIC isn't really the right answer. But maybe that's the wrong question.
Because the world is not just one flow. There are millions of flows, all the time. And when you get buffers with more flows, lots and lots of flows, you get an entirely different property.
If two flows absolutely synchronise, work on the same round‑trip time and operate in exactly the same way, then you are going to need effectively twice the buffer size, because the buffer still needs to be delay bandwidth sized, right? But what if they are out of phase? What if they have exactly the opposite phase from each other?
Oddly enough, you only need, if you will, the same buffer space to get double the capacity; in other words, the common buffer requirement is halved.
Now, you can generalise this, and this work was done in 2004 by the team at Stanford, who, using basically a little bit of statistics, worked out that you can get away with smaller buffers, because with N flows you can divide your memory by the square root of N. So if you are taking 1 million flows, you can actually have your buffer be one thousandth of the bandwidth delay product.
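Putting the Stanford result into numbers, with the same 1 Gbit/s, 500 ms example link as before:

```python
import math

# The "small buffers" result: with N desynchronised flows sharing a link,
# the shared buffer can shrink from the full bandwidth-delay product (BDP)
# to BDP / sqrt(N).

def small_buffer_bits(bandwidth_bps, rtt_seconds, n_flows):
    bdp = bandwidth_bps * rtt_seconds
    return bdp / math.sqrt(n_flows)

full = small_buffer_bits(1e9, 0.5, 1)          # single flow: full BDP
many = small_buffer_bits(1e9, 0.5, 1_000_000)  # a million flows
# With 10^6 flows, sqrt(N) = 1000, so the buffer is one thousandth of the
# bandwidth-delay product -- exactly the figure quoted in the talk.
```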
So, this is kind of interesting. And it sort of gets down to why do we have buffers anyway?
And there are two reasons for this. One is, senders are very naughty people. They are uncontrolled; they tend to burst at the local line rate. They are not pacing at the network speed; they are pushing the packets out, draining the window as fast as they can. That is remarkably impolite, and if we could smooth that burstiness and, instead of peaking and then being idle, spread the packets over time, you actually wouldn't need as many buffers.
And, of course, the other role of buffers is inescapable. If you have ten packets all arriving to go to one output line, you need a certain amount of buffer because you can only put one packet a time on the output. So multiplexing and smoothing.
What if you didn't need to do smoothing? What if you just did sender pacing and distributed the data that you had in the congestion window across the entire round‑trip window? Oddly enough, the buffer requirement drops dramatically, because the most recent work in this space shows that if we all did pacing, you'd hardly need any buffer at all. The buffer size is around the order of the log of the average flow window size, which is tiny.
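Sender pacing itself is a one‑line calculation. The window and RTT values below are illustrative, not from the talk:

```python
# Sender pacing: instead of bursting the whole congestion window at line
# rate and then sitting idle, spread the window evenly across one RTT.

def pacing_interval_seconds(cwnd_packets, rtt_seconds):
    """Gap between transmissions that spreads cwnd over one RTT."""
    return rtt_seconds / cwnd_packets

gap = pacing_interval_seconds(cwnd_packets=100, rtt_seconds=0.1)
# 100 packets over a 100 ms RTT: one packet every 1 ms, instead of a
# burst of 100 back-to-back packets followed by idle time.
```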
Now, why am I talking about this? Because I want to get back to the original issue: that we can't go as fast as optics would let us. We can't build terabit systems using the existing flow control algorithms. Why? Because we can't make memory go any faster. Memory speeds haven't increased for the last 20‑odd years now; the fastest memory you can get isn't that quick. And no matter how you gang it up in parallel, DDR4 or whatever, you still don't really lift that memory speed.
So, if we are going to make things go faster, we can't do it with more memory; we have got to do it with less. So let's go all the way down to the other side of this and let's get into chip design.
On‑chip memory, right inside the switching chip, works as fast as the memory clock; it's really fast. But there is not a lot of it. Most switching chips will only give you between 16 and 100 meg of capacity inside the chip. You can say, well, that's okay, I'll put in an external interface, external memory banks. But that's not quick; that's slow.
So, between 20% and 60% of the chip's real estate is actually devoted to memory, and if you really want faster and higher capacity switches on a chip, you need less memory. Here is a typical example of the state of the art today. This is fast: a 1.8 terabit, single‑chip Ethernet switch. I'm like, wow, that is so quick. How much memory on this 1.8 terabit switch? 16 meg. And the challenge is to make our protocols work across this kind of chip ‑‑ to make our protocols work with effectively almost no memory, because if you add memory to this, no matter how you do it, you're not going to get the same switching capacity.
So now let's look at protocols and what we're doing with them. There are kind of three states when you look at the interaction between a network and the flow control protocol. When I'm sending at less than link capacity: no queues, easy.
When I am going just greater than the link capacity, I'm going to have to put that excess sending rate somewhere, and that's going to sit in a queue; and when I go way too far, the queue fills up, I am saturated, and I get loss.
Where do I want to sit? Well, these two curves show two properties of the flow and the network. The first one shows the round‑trip time and, as we get to the onset of queuing ‑‑ that optimal operating point ‑‑ if we push more packets into the network, they just sit in the queue and the round‑trip time starts to rise, while the actual throughput remains the same. So the optimal operating point is just at the start of where the round‑trip time begins to increase.
If we also look at the delivery rate over at the receiver, you get to link capacity, and, no matter how many additional packets are put into the network by the sender, the throughput is still the same; link capacity hasn't changed, so the bandwidth is constant.
Loss‑based systems actually only know when they have gone too far; loss‑based systems only work at loss. But you don't actually want to know about loss. What you really want to know is the onset of queuing, not when the queue is exhausted.
How do you do this? How do you detect the onset of queuing? Does everyone remember ICMP source quench? It was bad then and it's bad now. What it says is: help, you are pushing me too hard. That's an interesting signal, but ICMP should never be used for this. It's a DoS attack vector, it gets filtered; don't go there. But we have experimented with slightly different mechanisms that actually show more promise. One is Explicit Congestion Notification, ECN, where you mark the packet and send it on, and the ACKs carry the mark back. A single bit. Not bad. But everyone needs to be ECN‑aware. It's like the v6 problem all over again: everybody needs to do it, and the problem is, getting everyone to do anything in today's Internet is kind of a Herculean task, so difficult.
How about adding to this and not just sending the lot? No. Not for real networks.
There is another way: we just measure delay. And the other way of doing this is, rather than making small adjustments continuously, you make periodic adjustments at regular intervals and do estimates off that, and this leads us to a protocol we call BBR. Now, BBR, developed originally by Google, actually works on the interesting property that you only probe every 8th round‑trip time interval, and you don't double, you don't increase by only one segment; you increase the rate by 25%, which is actually a relatively hefty increase. But the next interval you drop it by 25%. So what you are doing is a relatively significant probe into available capacity and then backing off the pressure. What does it look like?
That's what you see: the sending rate in blue, you probe in. Interestingly, in the network queue you only see small spikes; they are not big spikes, so BBR exerts only very gentle pressure on network buffering.
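The probe cycle can be sketched numerically. The 1.25/0.75 pacing gains and the eight‑phase cycle follow the published BBR design; the bottleneck‑bandwidth estimate below is just an example number, and everything else is heavily simplified:

```python
# A sketch of BBR's pacing-gain cycle as described in the talk: in an
# eight-RTT cycle, one interval probes at 1.25x the estimated bottleneck
# bandwidth, the next drains at 0.75x, and the remaining six cruise at 1x.

BBR_GAIN_CYCLE = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

def pacing_rate_bps(estimated_btlbw_bps, rtt_index):
    """Pacing rate for a given RTT interval within the 8-phase cycle."""
    gain = BBR_GAIN_CYCLE[rtt_index % len(BBR_GAIN_CYCLE)]
    return estimated_btlbw_bps * gain

rates = [pacing_rate_bps(1e9, i) for i in range(8)]
# One probe above the bottleneck rate, one drain below it, six steady.
# The gains average out to 1.0 over the cycle, so the mean sending rate
# equals the bandwidth estimate: a gentle, periodic push on the queue.
```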
Now, efficiency and fairness: BBR is efficient. Is it fair? Well, I don't know. If you are running CUBIC and BBR comes along ‑‑ CUBIC in red, BBR in green ‑‑ you're going to die, because BBR just exerts an extraordinary amount of pressure on the buffers and CUBIC can't hack it. In some ways, BBR isn't that fair.
But in other respects, it's a much better model. Why? Today's network is unsustainably diverse. We have CUBIC, we have NewReno, we have BBR, we have FAST, and they are all competing for the same underlying network resource in different ways. The results are chaotic and variable. We have a mix of traffic models, a mix of media, all of this stuff. So, what's the process? I actually suspect that this is Darwinian. This is actually survival of some kind of fittest.
But what is fitness? What wins in this Darwinian struggle of protocol evolution? Fairness? No. I actually think efficiency. Protocols that assume less about the buffering in the network, that make minimal assumptions about how you build your networks, and that actually work on the basis of "I don't need this stinking memory, it's too bloody slow, I am just going to use a light touch on the memory" ‑‑ it's that kind of protocol that's going to win in this, if you will, Darwinian struggle. And equally, rate halving is a drastic fix; it's probably not the right way of doing it when we want to go really fast.
Protocols that operate with these regular feedback mechanisms, like BBR, that constantly probe and constantly adjust, are much better than the protocols that probe upward and then just lop off half of the sending rate at regular intervals. That's just too violent if you want to go quick.
What's all this telling us?
One way of saying it is that we still don't know what we're doing. The nicer way of saying it is that there are a lot of unsolved problems out there and they are very important. As the fibre optic systems push into higher and higher speeds, I think we need to drop some of our assumptions about equipping our routers with loads of memory and think more about how to do switching well and how to do flow control systems that work with sparse memory and high density switching. We need to do things differently. BBR is maybe not the answer, but it's a step in a different direction, particularly for very, very high speed networking.
And maybe that's an interesting way of going about this. What we don't know about the fine grain behaviour of large scale networks is way, way more than the tiny bit we do.
So, yes, more research, more testing, more looking at this problem, seems like a lot of fun and very interesting too.
Thank you very much. I'll hand it back for anyone who has questions.
CHAIR: Thank you, Geoff, very much. An interesting presentation. There is no applause in this format, but just imagine the applause.
JAN ZORZ: Okay, there are a couple of questions, and since we started five minutes late, I think we can have a couple of questions now. The first one is coming from Ayane Satomi from Batangas State University, who is saying: "If we ever want to hit terabit in TCP, we have to reinvent the very protocol from the ground up."
GEOFF HUSTON: Memory clock speeds haven't increased. There is a terrible graph, I think Intel put it out, on the speed of memory, and literally, the faster we want to go, the more we have got to start ganging memory up in parallel, and that means it gets really expensive. But the chip fabric is still only the same size, and so, in some ways, we can't build really fast high‑memory protocols. If we want really fast protocols, we need to change the way we think about memory and buffering and the way we do feedback control. So yes, the answer is we have got to change TCP totally.
JAN ZORZ: Thank you. And I see one ‑‑ we are just trying to figure out how to deal with the questions because there are too many ‑‑ I see one really interesting question from Aleksi Suhonen from TREX, who is asking: "How does BBR react to packet loss? Is it any better than Reno in that aspect?" And a related question: "Is it worth it to improve TCP when it will fail in fast networks very soon anyway, because the sequence number space is so small?"
GEOFF HUSTON: Two questions. Let's start with the first one. BBR repairs at speed. Loss is not the signal. Loss is not a constraining signal. It is actually more sensitive to variation in delay than to loss. I have used BBR at enormous speeds across our networks today with 25% packet loss. It just doesn't slow down. And the next part of your question was ‑‑ I'm trying to pick it up again on the screen. Do you remember it?
SPEAKER: The related question was: "Is it worth it to improve TCP when it will fail in fast networks very soon anyway?"
GEOFF HUSTON: I don't know about you, but I have yet to have terabit to my home. Horses for courses. Akamai, for the moment, is currently running on FAST. There are a whole bunch of protocols out there. I suspect that Darwinism is taking hold, and when you find a particular protocol tends to exert pressure on the others, the only way that you can put the pressure back is to run the same protocol yourself. And so I suspect that BBR at this particular point, or that kind of protocol, is actually going to be more and more common over the coming years, even on lower bandwidths than terabit, because it's just so much better at elbowing everyone else out of the way.
JAN ZORZ: Thank you. We are getting more and more interesting questions here. One coming from Randy Bush: as was said at the second Stanford meeting, with bandwidth doubling and use doubling, Moore's law cannot deal with 4X, so he claimed pacing is the only path forward. Comment?
GEOFF HUSTON: Okay, so in a kilobit network with 1,500‑octet packets, every time you increased your sending rate by 1 MSS you were actually exerting an extraordinary amount of pressure. In a kilobit network, Reno's sort of acceleration curve was really high. In a terabit network, increasing your sending rate by 1,000 octets every round‑trip time is kind of like adding another atom to the scale. It just doesn't cut it. The whole TCP algorithm of 1 MSS per RTT doesn't work when your MSSes are microscopically small relative to the capacity, which they are these days. So the answer is, you either run with extraordinarily high MSSes ‑‑ a gigabit packet, or something like that, which we don't know how to do because of the bit error rate ‑‑ or you have to change the algorithm. But, you know, what we are doing right now is this weird system of taking a 1980s flow control system that worked in the kilobit range and applying it to a gigabit network, thinking, well, God, it works. Well, it only works badly, and part of the reason is there is such a dramatic speed misfit that you're doomed. This is just not going to work in the long run.
So, yeah, we need to rethink this.
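The scale mismatch Geoff describes can be put in numbers with a back‑of‑the‑envelope sketch: at 1 MSS of additive increase per RTT, how many round trips does it take to climb back from a single rate‑halving to the full bandwidth‑delay product? The link speeds and the 10 ms RTT below are illustrative assumptions, not figures from the talk.

```python
# How long does Reno-style +1 MSS per RTT take to recover from
# one rate-halving, as link speed grows? (Illustrative numbers.)
MSS_BYTES = 1460  # a typical Ethernet-derived MSS

def rtts_to_recover(link_bps, rtt_s):
    """RTTs needed to regain half the bandwidth-delay product,
    growing the window by one MSS per round trip."""
    bdp_segments = (link_bps * rtt_s / 8) / MSS_BYTES  # window in MSS
    return bdp_segments / 2  # halved window must be rebuilt 1 MSS at a time

rtt = 0.010  # assume a 10 ms path
for bps, label in [(1e6, "1 Mb/s"), (1e9, "1 Gb/s"), (1e12, "1 Tb/s")]:
    n = rtts_to_recover(bps, rtt)
    # At 1 Tb/s this is hundreds of thousands of RTTs,
    # i.e. over an hour of recovery on a 10 ms path.
    print(f"{label}: {n:,.0f} RTTs ({n * rtt / 60:.1f} minutes)")
```

On a kilobit‑era link the window is a handful of segments and one MSS per RTT is a huge relative change; on a terabit link the same increment is the "another atom on the scale" of the answer above.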
JAN ZORZ: And there is a question from a geek called Daniel Karrenberg: "Can we change TCP in a backwards compatible way, e.g. use the same port number?"
GEOFF HUSTON: I actually think that the future is now QUIC. The future is now hiding all of TCP from the network. Because there's been so much middleware out there that does so much damage that, quite frankly, applications are now sick of this. Equally, applications are way more agile than operating system stacks. So what we're now finding in the current networking architecture is that the entire flow control system is now an attribute of the application: Chrome, as an app on your phone or in your operating system, does it differently from the operating system's TCP. So, what we're actually going to find these days is it's no longer "change TCP". It's: how does your application control its flow? And I think that kind of setup, where applications assume more and more responsibility for their networking pressure, is actually going to be on the increase in the coming years, and this old model, where the operating system looked after everything for the applications, is now kind of, you know, backwards, it's yesterday. We have moved out of that and we're never going to come back.
So, it's not that we're going to change TCP. I think applications are simply going to go in different directions and hide it. Because inside an encrypted QUIC session you have no idea what the application is doing. It's just sending packets.
JAN ZORZ: Since you mentioned QUIC, I would like to spend the last two minutes, because we're way over time, on two questions: Is HTTP/3 then considered the good new solution? And he is wondering how HTTP/3 will change things ‑‑ "I am quite afraid that QUIC plus TLS over UDP will change the whole behaviour."
GEOFF HUSTON: There has always been this tension between networks and applications. Twenty years ago, we were all on the same side, all doing networking and writing applications at the same time. These days, we find ourselves not only in two different camps but on two different sides of a really, really big issue. The network providers, in trying to ration the network, trying to enforce fairness, trying to control traffic ‑‑ all that middleware they deployed, particularly in mobile systems, to look inside what's going on and control the way applications use the network ‑‑ their thinking is: this is the way you add value to a network. On the application side, the answer is ha‑ha‑ha, we're not playing your game, and the way you don't play that game is you take all of that overtly visible control system that TCP had and hide it behind the veil of encryption in UDP. This is an irrevocable step. We're never coming back. The applications have said: we're not sending the network anything. You gave us a hard time. No forgiveness, nothing, we have taken the step, this is it. We're hiding everything. And it's going to be a tough few years as the 5G folk try to figure out how they are going to manage traffic when the end systems become more and more capable and, more importantly, the protocols become less and less controllable by the network and more controllable by the application dynamic itself. So it's going to be a brave new world, and, quite frankly, I'm on the side of the applications. I think they are going to win this.
JAN ZORZ: Yeah, thank you very much, Geoff. I think we need to stop now; the next session is coming in five minutes with more great content. Just a suggestion to the people who would like to ask questions in the next sessions: ask them in a very short format, otherwise we will have to read novels here, and that is not a great use of everybody's time.
Anyway, thank you very much and see you at the next session. Cheers.
GEOFF HUSTON: I wish I was there in person. Thank you very much all. Cheers.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC