Archives – RIPE 80

Plenary session
12th May 2020
4 p.m.

CHAIR: Okay. Good afternoon everybody. I hope you can see me. Good afternoon. It's four o'clock, so we're starting this session. I am Maria Isobel Gandia and I will be chairing this session with my colleague from the RIPE. Let me start with some housekeeping.

If you are in the Zoom session, please use the chat window for discussion and general comments, selecting "panelists and attendees" so that everybody can see your comments. Otherwise, only the panelists will see your message.

Use the Q&A window only to ask questions to the presenters, not for technical issues. You can also use the channel on the live streaming page on RIPE 80 if you are there, and you can write your questions before the presenter finishes the presentation if you want. But please try to keep them short, because I will have to read them out, and we don't want to read long things because we are not native English speakers.

And remember, the sessions will be recorded and published. We currently have around 500 attendees. This is what we have now in the Zoom room, and we also have the live streaming. So, we're fine. There are many people around, even if we cannot clap and hear each other.

The second thing I would like to mention before we start this session is that we have an election for the RIPE PC, with two seats available, and we have six candidates. You can find their biographies on the RIPE 80 website. The candidates are Antonio Prado, Peter Hessler, Melchior Aelmans, Radek Zajic, Stefan Wahl and Wolfgang Tremmel.

The RIPE PC online voting system will be open from now until Thursday at 5:00 in the afternoon, and the results will be announced on Thursday, too, at 5:30, Central European Summer Time.

The third thing I wanted to say is that, please, you can rate the talks. At the end of each session, you can rate them from the Plenary main page.

Okay. So, now that the housekeeping is done, please let me introduce our first speaker. Our first speaker in the session is Raffaele Sommese from the University of Twente.

RAFFAELE SOMMESE: Thank you. I am a second-year PhD student, and I am going to talk about delegation inconsistency in the DNS. This work was a collaboration between me and the people listed here on the slide.

This work was a paper accepted at the PAM conference and was part of the NWDHS project.

Let's introduce the topic. Every one of you knows what the DNS is. The DNS is one of the most critical systems in the Internet nowadays. It's a distributed database. The principal role of the DNS is to map hosts, services and applications to IP addresses, and to serve various types of records.

One of the key mechanisms that enables the DNS to be a hierarchical and distributed system is the delegation mechanism. In fact, the DNS hierarchy is organised in parent and child zones, which are typically managed by different entities, and the different zones need to share common information: the NS records that say which are the authoritative name servers.

So what we ask in this work is whether this common information is consistent between parent and child. RFC 1034 states that the NS records of both parent and child should be consistent and remain so. We ask: is this, in practice, the case?

Our contributions are two. We provide a broad characterisation of the inconsistencies in DNS delegations, so a picture of delegation inconsistency in the system, and an investigation of the consequences of these inconsistencies in terms of resolution.

Let's look at the first example, a well-configured delegation.

In this example, the parent zone is the authoritative zone for .org, and there we have the NS records for example.org, shown here. If we look at the child zone, so at the authoritative name servers of example.org, what we find are the same NS records. The records in both zones, as we can see, are consistent with each other.

So, what did we do? We studied the delegation consistency between the parent and child zones for all the active second-level domains of .com, .net and .org. We analysed more than 166 million domain names, around 50% of the total DNS name space. What we discovered is that 80% of the domains exhibit consistency between parent and child zones, while 8%, around 13 million domains, do not.

The remaining 12% are basically domains whose child name servers do not respond to our queries, so we are not able to determine if they are consistent or not.

We categorise these inconsistencies into four types. The first one is where the parent and child NS sets are completely disjoint. The second one is where the parent NS set is a subset of the child NS set. The third one is where it is a superset of the child NS set. And the fourth one is where parent and child have some elements in common and some elements different.
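The four categories can be expressed as plain set comparisons. The following is a minimal sketch, not the code used in the study; the function name and the category labels are chosen here for illustration, and it assumes that both NS sets have already been collected.

```python
def classify_delegation(parent_ns, child_ns):
    """Classify a parent/child NS set pair into the categories named in the talk."""
    # DNS names are case-insensitive and may be written with a trailing dot,
    # so normalise before comparing.
    parent = {name.lower().rstrip(".") for name in parent_ns}
    child = {name.lower().rstrip(".") for name in child_ns}

    if parent == child:
        return "consistent"
    if parent.isdisjoint(child):
        return "disjoint"                  # no name server in common
    if parent < child:
        return "parent_subset_of_child"    # child lists extra name servers
    if parent > child:
        return "parent_superset_of_child"  # parent lists extra name servers
    return "rest"                          # some in common, some different


print(classify_delegation(["ns1.example.org.", "ns2.example.org."],
                          ["NS1.example.org", "ns3.example.org"]))
```

A child that does not answer at all would have to be handled before calling this function, since an empty child set cannot be told apart from a disjoint one here.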

So let's look at the first case, where parent and child are completely disjoint. This was the most common case: in 55% of the domains with a delegation inconsistency, parent and child have disjoint NS sets.

What we discovered is that half of these domains are consistent at the IP level, so the IP addresses match, but the others are not. I report here an example of parent and child zones in the case of completely disjoint NS sets. What we have in the parent is a DNS legacy server.net and ‑‑... We found this inconsistency also in top-level domains in the root zone, so between the name servers in the parent zone and the name servers in the authoritative zone of the top-level domain. But for these domains we find that all of them are consistent at the IP level, which means that although the NS records have different names, they point to the same IP addresses.
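The IP-level check can be sketched as a comparison of the address sets behind the two NS sets. This is an illustrative sketch only: the host names are hypothetical, the addresses come from the documentation range, and the name-to-address mapping is assumed to have been resolved beforehand.

```python
def ip_level_consistent(parent_ns, child_ns, addresses):
    """Return True if both NS sets resolve to the same non-empty set of addresses.

    `addresses` maps a name server name to the set of its IP addresses.
    """
    parent_ips = set().union(*(addresses.get(name, set()) for name in parent_ns))
    child_ips = set().union(*(addresses.get(name, set()) for name in child_ns))
    return bool(parent_ips) and parent_ips == child_ips


# Hypothetical data: two different names that point to the same address.
addresses = {
    "ns1.legacy.example": {"192.0.2.1"},
    "ns1.new.example": {"192.0.2.1"},
    "ns2.new.example": {"192.0.2.2"},
}
print(ip_level_consistent(["ns1.legacy.example"], ["ns1.new.example"], addresses))
print(ip_level_consistent(["ns1.legacy.example"], ["ns2.new.example"], addresses))
```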

We checked the consequences of this. We have different servers; some of these could be lame delegations, some could be unresponsive, and so on. Even if the IP level is coherent right now, so even if the names point to the same IP addresses, this makes misconfiguration easy, because we have to keep two sets of records in sync while we could just keep a single set of names. Also, the behaviour of the resolver is not predictable. We don't know whether it will necessarily stick to what it has stored from the parent.

As I stated before, we found some cases of this in the root zone, and one was the case of the Indian .in registry. In the parent there was this, which is a historic record, while in the child, in the .in zone, the NS records were these. Both point to the same AAAA records. We notified them of this situation on 30th October 2019 and they fixed it some days later, and they also fixed some of the 15 cases that they manage.

So all the domains with this kind of inconsistency in the root zone have been fixed.

The second case is where the parent NS set is a subset of the child NS set. We found this in 30% of the cases, and 18 top-level domains in the root zone exhibit this kind of inconsistency. In this case we found, for example, these records in the parent, and in the child we have these.

What are the consequences of this case? We have a false sense of redundancy, because the name servers that are listed in the child but not in the parent give us the sense that we have more authoritative name servers that will be queried. But if they are not learned by the resolvers, they will never be queried. So they are just there and no one uses them, the load is not well balanced, and we have less resilience in total.

An important case of this misconfiguration that we found is the AT&T website. The parent had these records and the child had these. We notified them and they fixed it by adding the fourth name server to the parent.

The third case that we saw is where the parent is a superset of the child. This is just 8% of the inconsistencies, and 10 TLDs. In the parent we have what's here, while in the child we have what's here.

What are the consequences of this case? If the additional name servers provided in the parent are unreachable, for example, we have a higher resolution time and random failures in resolution.

There is also the case where the additional name servers defined in the parent were never removed from the parent, even though they point to a DNS service that we don't use any more, whose domain has expired, and so on. Then we are running the risk that someone else registers that resource.

The remaining category is basically what I call the rest category: 7% of the delegation inconsistencies, and 8 TLDs. Basically all the risks that I mentioned before are applicable in this case. In the parent we have these, and in the child we have these.

So, we have some elements in common and some elements different.

The next thing we did was to understand the implications. These inconsistencies are quite widespread in the DNS, so we want to understand what they imply in terms of resolution. We tried to emulate these kinds of NS set mismatches, and we used RIPE Atlas to measure, for each unique resolver, how it treats these kinds of inconsistencies. Our goal is to study the consequences in terms of query distribution, in an environment where we put all the authoritative name servers for these four experiments with NS set mismatches behind the same network, so that we are not affected by load balancing for network reasons.

Before presenting the results I want to introduce one concept, which is important for understanding the following slides: minimal responses. Minimal responses in DNS is a mechanism where the authoritative name server, if we enable it, will not provide the authority section and the additional section. What are the authority section and the additional section? The authority section contains the authoritative name servers related to a certain answer, so which are the authoritative name servers for getting valid information for that answer, and the additional section contains the glue records for those authoritative name servers. With minimal responses, we do not provide this information, so we do not give it to the resolvers.

This is usually done to reduce the size of the answer to a query, for example to limit amplification attacks.

So, let's look at the results. We ran the experiment first for the disjoint NS set. What we see is that with minimal responses enabled, the resolver doesn't have any way to know about the existence of the child's name servers, so all the queries go to the name servers defined by the parent. When we use normal responses, so when we disable minimal responses, some resolvers learn the child NS set and start to also query the name servers defined by the child.

In the case where the parent is a subset of the child, NS1 and NS3 are defined in both the parent and the child, while NS2 and NS4 are defined only in the child. So again, with minimal responses, all the queries go to NS1 or NS3, because the resolvers don't know about the existence of NS2 and NS4. With normal responses, some queries go to NS2 and NS4, because the resolvers learn about the existence of NS2 and NS4 through the authority section.

In the third case, where the parent is a superset of the child, NS1, NS2, NS3 and NS4 are in the parent, while only NS2 and NS4 are defined in the child. With minimal responses, the queries are load balanced between NS1 and NS3 and NS2 and NS4. With normal responses, more queries go to NS2 and NS4, because some resolvers start to prefer the information announced by the child, but not all of them do that.

Finally, the last case, the rest experiment. NS1, NS2, NS3 and NS4 are defined in the parent, while the others are defined in the child. With minimal responses, the traffic goes only to the ones defined in the parent, so only to NS1, NS2, NS3 and NS4, and it's load balanced between them. With normal responses, some resolvers learn the information from the child and send traffic to NS5 and NS6.
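The effect of minimal responses in these four experiments can be summarised with a small model. It is a deliberate simplification and an assumption of this sketch: it only says which name servers a resolver could learn about from plain address lookups, not how many queries each one receives, and real resolvers differ in whether they actually use the child's set.

```python
def candidate_servers(parent_ns, child_ns, minimal_responses):
    """Name servers a resolver can possibly learn about for a delegation."""
    parent, child = set(parent_ns), set(child_ns)
    if minimal_responses:
        # No authority section in answers: only the referral from the parent is seen.
        return parent
    # Normal responses: the child's NS set is also visible in the authority section.
    return parent | child


# The "parent is a subset of the child" experiment, with labels as used in the talk.
parent = {"NS1", "NS3"}
child = {"NS1", "NS2", "NS3", "NS4"}
print(sorted(candidate_servers(parent, child, minimal_responses=True)))
print(sorted(candidate_servers(parent, child, minimal_responses=False)))
```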

So, which conclusions can we draw from these experiments?

We learn that inconsistent NS sets in parent and child have an impact on how queries are distributed among the name servers.

For all the evaluated cases, queries will be unevenly distributed among the authoritative name servers.

Usually the servers defined in the parent are preferred and receive more queries than those defined only in the child. And we learned that minimal responses have an impact on how the resolver behaves.

Another thing we did was to look at how specific DNS resolver software behaves in case of NS inconsistency. In particular, we paid attention to how the resolver ranks the data in case of inconsistency. The RFC states that authoritative data from the child should be preferred if this information is available. And we evaluated ‑‑ that are BIND, Unbound, PowerDNS, and the Windows Server resolver.

In the first case we basically asked for an A record of a sub-domain in our test zone. In the second we asked for an NS record in the zone. In the third we asked an A query followed by an NS query. And in the fourth case we do the opposite, so we invert the order of the queries, and then we look at which information, from the parent or from the child, ends up in the cache.

What we have in the results is that, basically, Knot, Unbound and PowerDNS comply with RFC 2181.
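The ranking idea can be illustrated with a toy cache in which less trustworthy data never replaces more trustworthy data. This is a simplified sketch with only three of the trust levels that RFC 2181 describes; the class and level names are invented for this example and do not correspond to any resolver's internals.

```python
from enum import IntEnum


class Trust(IntEnum):
    """Simplified trust levels, lowest to highest (a subset of the RFC 2181 ranking)."""
    REFERRAL = 1            # NS set seen in a referral from the parent
    AUTHORITY_SECTION = 2   # NS set in the authority section of an authoritative answer
    AUTH_ANSWER = 3         # NS set in the answer section of an authoritative answer


class NsCache:
    def __init__(self):
        self._entries = {}

    def store(self, zone, ns_set, trust):
        current = self._entries.get(zone)
        # Replace only if the new data is at least as trustworthy as the cached data.
        if current is None or trust >= current[1]:
            self._entries[zone] = (frozenset(ns_set), trust)

    def lookup(self, zone):
        return self._entries.get(zone)


cache = NsCache()
cache.store("example.org", {"ns1.parent-view.example"}, Trust.REFERRAL)
cache.store("example.org", {"ns1.child-view.example"}, Trust.AUTH_ANSWER)
cache.store("example.org", {"ns1.parent-view.example"}, Trust.REFERRAL)  # ignored
print(cache.lookup("example.org"))
```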

In the case of BIND, we found that with the BIND package for this ‑‑ the cached information is not overridden with the information from the child.

In cases 1 and 3, so in the case of the A query and the A query followed by a (something) BIND from source sent to the BIND ‑‑ these are good, because they explicitly get information from the child, getting more information. And in the last case, so in the case of a query followed by an NS query, what we see is that the older versions of PowerDNS for CentOS and Xenial, and an older version of the Windows resolver, use the cached non-authoritative information obtained from the parent to answer the NS query, and this is forbidden by RFC 2181.

The versions that we used for CentOS are completely ‑‑ by PowerDNS. We would ask the maintainers of the distribution packages to update this package.

We also set up our own website some days ago, superdns.nl, where we replicated the experiment of NS ‑‑ and you can try your resolver against this website to see whether it behaves well or badly in resolving ‑‑ the DNS records. You can also find the paper on this website.

What would be a solution to this problem? One would be implementing RFC 7477. This RFC introduces a method for automatically keeping the records in sync between the parent zone and the child zone. The parent performs a periodic polling of the child using SOA records and a new type of record, the CSYNC record.
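One polling round of such a parent-side agent can be sketched as follows. This is a simplified reading of RFC 7477 and not a complete implementation: the data class and function names are hypothetical, serial numbers are compared as plain integers rather than with serial number arithmetic, and the DNS queries and DNSSEC validation themselves are assumed to have happened elsewhere.

```python
from dataclasses import dataclass

# Flag bits of the CSYNC record as defined in RFC 7477.
IMMEDIATE = 0x0001
SOAMINIMUM = 0x0002


@dataclass(frozen=True)
class ChildSnapshot:
    """What a hypothetical parental agent learned from one poll of the child."""
    soa_serial: int          # serial of the child's SOA record
    csync_serial: int        # SOA Serial field inside the CSYNC record
    csync_flags: int         # CSYNC flags field
    csync_types: frozenset   # record types listed in the CSYNC type bit map
    ns: frozenset            # NS set published by the child
    dnssec_valid: bool       # True if every answer validated with DNSSEC


def sync_ns(parent_ns, before, after):
    """Return the NS set the parent should publish after one polling round.

    `before` and `after` are snapshots taken at the start and end of the poll.
    """
    unchanged = frozenset(parent_ns)
    if not (before.dnssec_valid and after.dnssec_valid):
        return unchanged      # only validated data may be acted upon
    if before.soa_serial != after.soa_serial:
        return unchanged      # zone changed while polling: try again later
    if "NS" not in after.csync_types:
        return unchanged      # child did not ask for NS synchronisation
    if not after.csync_flags & IMMEDIATE:
        return unchanged      # needs out-of-band approval first
    if after.csync_flags & SOAMINIMUM and after.soa_serial < after.csync_serial:
        return unchanged      # zone is older than the CSYNC record demands
    return after.ns


snap = ChildSnapshot(10, 5, IMMEDIATE | SOAMINIMUM, frozenset({"NS"}),
                     frozenset({"ns1.example.", "ns2.example."}), True)
print(sorted(sync_ns({"ns1.example."}, snap, snap)))
```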

Unfortunately, RFC 7477 lacks deployment. It relies on DNSSEC, and we know there are not a lot of domains with DNSSEC enabled. We also don't find many misconfigured domains with DNSSEC enabled, which means that people who enable DNSSEC usually manage the zone a bit better. So, basically, we don't know if this will be a viable solution.

In general, to conclude, I will say that the RFCs clearly say that the information in the parent and in the child should be consistent and remain so. We discovered that this is not the case, even though the RFCs have been out for a long time, and we strongly advise those operating a zone to follow RFC 2181.

We also recommend that resolver vendors conform to the authoritative information ranking in RFC 2181.

And with this, I thank you for your attention and I ask if you have any questions?

CHAIR: Thank you. I'd like to check our questions. Okay, we have already got five. I hope it's all clear.

So the first one is Sam Miller asking: "You said 80% were consistent and 8% were inconsistent. Where is the missing 12%?"

RAFFAELE SOMMESE: As I stated, the missing 12% are domains that are not answering our queries. We are not able to determine if these domains are consistent or not because the authoritative name servers are either offline or maybe misconfigured, so we don't know which NS records are in the child and cannot provide an answer. We give the percentages over the total, so over responding and non-responding domains. The remaining 12% are not responding.

CHAIR: I mean, I am a ccTLD operator, I love your project,

So second question: Carsten Schiefner: "Did you check with them, for example, the .in registry, how it came to detect a mismatch. Sorting it out technically is good but it might be better to understand how it got to this in the first place."

RAFFAELE SOMMESE: To get the problem fixed, we sent them an e-mail, and they answered us. We think that this kind of inconsistency arises because the procedure for updating records in the parent is quite complicated in the case of the root zone. It's a separate process. In the child, you go to the zone file and you are done. For the records in the parent, you need to go to the registrar or, for a TLD, to the root zone, and ask for these records to be updated. So this process sometimes results in this kind of inconsistency. The operational process is the problem: people need to go manually to the parent zone and ask for an update.

CHAIR: Thank you. So, we have a few more.

One is that ‑‑ I am trying to sort them. Some of them are similar.

"Is there any difference of uptake from the resolvers when the domains are DNSSEC signed?"

And then Patrik Faltstrom said: "Have you looked at DNSSEC signed zones and behaviour? I guess these are kind of two of the same things?"

RAFFAELE SOMMESE: Yes, we looked ‑‑ DNSSEC signing does not apply to the NS records provided by the parent. It applies only to the NS records provided by the child. What we see ‑‑ we have done a check on the behaviour with DNSSEC, but what the standards, what the RFCs say is that the information provided by the child is way more authoritative, without the (something).

So it should be trusted more, and a DNSSEC-enabled resolver should trust it. So, yes, DNSSEC will have an impact on this and could certainly mitigate the situation, in that the information provided by the child should be accepted by the resolvers. And yes, we looked at the DNSSEC-enabled domains for this kind of inconsistency. We found some, but the percentages are lower. Usually, those who do DNSSEC manage their zones in a better way.

CHAIR: In fact, I think you can do further research if you do the DNSSEC inconsistency check but, you know, we'll wait for you next year. There is one more question I see here. Actually, there is more than one.

So one is kind of long. Tom Hill from British Telecom: "I suspect that a substantial portion of these inconsistencies go unnoticed because of disparity between the NS sets used by SLDs hosted on the same infrastructure; some may be configured properly, some may have ‑‑ leading to a smoothing of an authoritative load. Have you considered expanding the research to identify this?"

RAFFAELE SOMMESE: We didn't look explicitly at that. It's work in progress. We didn't check which operators are responsible for these NS set inconsistencies. It could be the case that there are larger operators that, for example, misconfigure their servers in parent and child and cause this type of inconsistency. But we didn't go deep into this. We are trying to extend the research anyway, so we are willing to work more on this.

CHAIR: Thank you.

I think two more and then we're done. Carsten Schiefner. I apologise to everybody whose name I have mispronounced. "RFC 7477 seems to assume that the information in the child zone is better than in the parent. Is it a fair assumption?" And by the way, we did have another piece of research presented, I think, a few RIPE meetings ago about the TTL mismatch between parent and child.

RAFFAELE SOMMESE: Yes, the information in the child should be preferred, and it is something that should be more authoritative because it's provided by the child. For the TTL, the story is a bit different, because in the past there was an attack that exploited the fact that the preferred TTL was the one of the child: people registered the child with a larger TTL and tried to keep the records alive even if the record in the parent was deleted. There are some papers about ghost domains. So what the resolver developers came up with, I think, is that for the TTL we should keep a sort of minimum TTL between parent and child. So if the TTL in the parent is less than the TTL in the child, we should keep the TTL defined in the parent. In fact, it's the opposite.

CHAIR: Great. Thank you.

And one last, I believe:

"Have you looked at differences in TTL that's done ‑‑ are the RIPE Atlas scripts used to pass the data available? If not, could you please make them available?" Sometimes the name mismatches.

RAFFAELE SOMMESE: I don't understand. You mean the RIPE Atlas that we use for extracting the data?

CHAIR: I guess so.

RAFFAELE SOMMESE: We just used, basically, the Python library. I need to check if we're planning to make the scripts publicly available, because we also correlated the data with data from OpenINTEL, and some of the data from there is closed. I will check, and we will update the website if we can publish the code.

CHAIR: All right. Thank you. And then I guess that's the last one, is, Roy Arrins from ICANN asks: "Have you looked at the inconsistencies among children?"

RAFFAELE SOMMESE: Yes, we ran a small sample test for that, and we are planning to extend this work to other top-level domains. We discovered that the child name servers are usually consistent among themselves, basically because zone transfer is enabled, so it's difficult for them to go out of sync. We found only a very small amount of cases, like 0.1%, where the children are inconsistent among themselves. So it's a really, really small amount where we found child inconsistency.

CHAIR: All right. Thank you. I think that's ‑‑ we are just on time for the next slot and let me switch to our next presenter.

MARVIN GAUBE: Hello. I would like to present you something about routing on satellites. At first, about me. I am a corporate student at Tesat‑Spacecom, and in my free time I do much of community networking which initially brought me to the web community.

We do optical links which means satellite links and direct to earth links. We are part of the Airbus group and we have huge experience, meaning we have some people already applying for a longer time.

In my last year, I had a practical thesis on how to achieve packet forwarding on a satellite constellation with a mesh topology. As a basis, you need to know that a traditional communications satellite is a layer 1 satellite, which means that it basically only relays data on layer 1. It's usually also called bent pipe. And as soon as you have inter-satellite links, you need some form of routing or switching.

And that's what I looked into.

So, what are these LEO satellite constellations? LEO means low earth orbit. So, many satellites, a few hundred kilometres above the earth. Their main purpose is to provide global connectivity, which nowadays means they provide Internet access. Some of these constellations have optical inter-satellite links. You see it in the pictures down there, which means basically that they form a mesh network.

Some examples: Telesat LEO is planning to launch 117 satellites, SpaceX's Starlink constellation a few more, and Amazon has a similar project.

But what are the use cases, and how do we implement them? The primary use case of such a constellation is upstream or transit. You have a customer somewhere where there is no terrestrial network, and you want to provide connectivity with relatively low latency and high bandwidth.

One example for this could be a transatlantic flight. You have no good choice other than satellite technology if you want to provide good connectivity above an ocean. So our potential users are these: intercontinental flights, the shipping industry, but also residents in remote areas. So you already see we have both relatively fast-moving and fixed-position clients. The satellite network could be abstracted as a mesh network cloud in a single ASN, and the data should be routed from each customer to the gateway or point of presence best suited for the destination.

You already see it in the picture. You have both choices: if you have a transatlantic flight from Frankfurt to Washington, you can use both a ground station in Washington and one in Frankfurt.

So, at first, the question was how to realise the internal routing. There is ‑‑ well, we'll probably be using a solution called temporospatial SDN; there are some papers about it. The basic thing for satellite constellations is that the ground speed of the satellites is relatively high, especially for lower-orbit satellites, which means that we are talking about 90 minutes, plus or minus, for one orbital period. That also means we have frequent link changes. For example, the links between the gateways and the satellites, and between the users and the satellites, change relatively often, in some kind of handover. So a classic reactive IGP would not be feasible at all, because it needs some time to re-establish routes and could easily become unstable.
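The figure of roughly 90 minutes per orbit can be checked with Kepler's third law for a circular orbit. The sketch below is only a sanity check under simplifying assumptions (spherical Earth, circular orbit, no perturbations), and the altitudes used are example values, not figures from the talk.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418   # standard gravitational parameter of Earth
EARTH_RADIUS_KM = 6371.0        # mean Earth radius


def orbital_period_minutes(altitude_km):
    """Period of a circular orbit at the given altitude above the surface."""
    semi_major_axis = EARTH_RADIUS_KM + altitude_km
    seconds = 2 * math.pi * math.sqrt(semi_major_axis ** 3 / MU_EARTH_KM3_S2)
    return seconds / 60


for altitude in (400, 550):
    print(altitude, "km:", round(orbital_period_minutes(altitude), 1), "minutes")
```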

So, the advantage is that we know the positions of the satellites in advance. So we know the state of the links in advance, and we can pre-compute the routes, push them in advance to the satellite, into the data plane on the satellite, and install them just at the right time. OpenFlow has a scheduled bundle feature which could be used for this.
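The idea of pushing pre-computed routes ahead of time and activating them on schedule can be modelled in a few lines. This is a plain Python model for illustration; it does not use the OpenFlow API, and the class, the node names and the times are all hypothetical.

```python
import bisect


class ScheduledForwardingTable:
    """Holds forwarding tables computed in advance, each with an activation time."""

    def __init__(self):
        self._times = []    # activation times in seconds, kept sorted
        self._tables = []   # forwarding table that becomes active at each time

    def schedule(self, activation_time, table):
        """Store a pre-computed table that takes effect at `activation_time`."""
        index = bisect.bisect_right(self._times, activation_time)
        self._times.insert(index, activation_time)
        self._tables.insert(index, dict(table))

    def active_table(self, now):
        """Return the most recently activated table at time `now`."""
        index = bisect.bisect_right(self._times, now)
        return self._tables[index - 1] if index else {}


# The controller pushes both tables before the handover happens.
fib = ScheduledForwardingTable()
fib.schedule(0, {"gateway-frankfurt": "sat-2"})
fib.schedule(600, {"gateway-frankfurt": "sat-3"})   # link handover expected at t=600 s
print(fib.active_table(599), fib.active_table(600))
```

Because the satellite only looks up what is already stored locally, the switch at the activation time does not depend on reaching the controller at that moment.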

Also, as a fallback, a reactive IGP would be helpful, because if you lose connectivity to the controller you want to get at least this part of the connection back.

But how do we realise the external routing? That's, I would say, a much more interesting question. Back to our scenario, the transatlantic flight: the outbound routing is relatively simple, as the SDN controller knows the full table from the points of presence, and if, as shown in the picture, the plane moves near Washington, it can calculate exactly which POP to use for outbound traffic.

The inbound traffic is a bigger challenge, because the plane starts in Frankfurt. So we want our inbound traffic preferably in Frankfurt, but as we move across, for example, the Atlantic Ocean, we want the inbound traffic to move with us to improve the latency, improve the utilisation, etc. So, what would be a solution?

If we ran the network we could use only IPv6. We could give each mobile customer its own prefix. That would solve this problem, because we could announce the prefixes with prepended paths or multi-exit discriminators. That would work. But that has another drawback. In 2018 we had around 25,000 airplanes around the world. So, if we assume that around one quarter of these airplanes uses some kind of satellite constellation for Internet uplink, we would, out of this number alone, initially get around 6,000 additional routes in the default-free zone.
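The route estimate is simple arithmetic and can be reproduced as follows, assuming one announced prefix per airplane. The input figures are the ones quoted in the talk; the variable names are only for this sketch.

```python
airplanes_worldwide = 25_000   # approximate 2018 figure quoted in the talk
share_using_satellite = 0.25   # assumed share of airplanes with a satellite uplink
prefixes_per_airplane = 1      # assumption: one announced prefix per moving client

additional_routes = int(airplanes_worldwide * share_using_satellite) * prefixes_per_airplane
print(additional_routes)  # 6250, which the talk rounds to "around 6,000"
```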

In summary:

With LEO constellations we could really reach remote areas and oceans with a link quality that is not possible now, meaning low latency and relatively high bandwidth. But we need intelligent routing on the satellites to use these inter-satellite links and to ‑‑ the benefits.

Internal routing. Outbound routing could also be solved through the controller knowing the routes from the DFZ. Inbound routing is a bit more challenging; maybe we need own prefixes per moving client.

That's basically it from me. Thank you for your attention. If you have any more questions, or want a more detailed discussion, you can reach me at the e-mail address shown here. And are there any questions?

CHAIR: Thank you very much. I imagine a big applause from the audience. We have several questions and only three minutes left, so we have to go quickly, or maybe some of the questions will need to go off-line or to the chat later.

First question is from Michael Davids and it's: "What is your opinion about other initiatives such as SpaceX Starlink or Google Loon Balloons?"
And then George Michaelson is also interested in similarities to Loon routing model?

MARVIN GAUBE: Temporospatial SDN initially came from Project Loon, as far as I know, but then the concept was adapted to also work for LEO constellations, because from the point of view of the routing, which has to take the position of the router into account, these are both pretty similar, I would say.

CHAIR: Thank you. Then, Alexis Suhonen from TREX would also ask: "Does all satellites cover the full table or is it more like a MPLS LSR core?"

MARVIN GAUBE: Definitely, I would say, more like an MPLS core, because you have very limited resources on the satellite regarding power and computational capacity. So you don't want to hold a full table on a satellite.

CHAIR: Okay. Understandable. Next question from George Michaelson: "LEO close to ‑‑ I believe I've read that for ‑‑ there are issues with pole abrasions?"

MARVIN GAUBE: It depends on which orbits they use. For example, Tesat has fewer satellites so it would be a little bit harder to ‑‑ you need to ‑‑ it depends on the orbits and the configuration of the constellation. There are all kinds of proposals out there.

CHAIR: Thank you. Anita Nikolich says: "For remote area network operators, satellite has been expensive. Do you imagine the costs becoming more reasonable?"

MARVIN GAUBE: Yes, because by not using geostationary satellites but moving to LEO satellites, there are probably many more use cases which could benefit from such a constellation, so probably it would gain more use.

CHAIR: Thank you, Marvin. Thanks for answering so quickly.

Next question is from Peter Hessler from Clio connect: How much computation power do you expect will be available in the satellites for calculating routing tables?

MARVIN GAUBE: That's an interesting question. I would like to know the answer, but at the moment I would say I don't really have a good idea of what will be the case.

CHAIR: Okay. Finally, last question: Wolfgang Tremmel from DE‑CIX asks: Would you consider to put some RIPE Atlas probes into the satellites?

MARVIN GAUBE: It sounds like an interesting idea. But, maybe in a cube satellite it could be a nice project. If somebody would like to do this, why not?

CHAIR: Okay. Thank you very much. That leads us to the end of our session. I hope you enjoyed it as much as I did.

Thanks for attending, and now we have a break. Remember, we start at five o'clock. Remember also to rate the talks and vote for our PC candidates. Our next session will be at 5, with the Hans Petter Holen virtual session. There you can join him and meet him and ask questions about his role and about the RIPE community. Thank you very much for your attention. See you at 5 p.m.

(Coffee break)

LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.