Open Source Working Group
MARTIN WINTER: Hello everyone, time for the next ‑‑ we have a very short Open Source Working Group session this time because, with everything being virtual, a lot of speakers didn't see much sense in giving a long talk. I hope we meet in person again next time in Milan, but otherwise we'll figure it out and make a longer session there. So we have only a very short session today.
So, before we go into the talk, let me just say welcome to you and show that we have one quick talk. Finalising the agenda: I don't know if there is anything to change, but we don't have that much time. And I also hope that you all saw the minutes from the previous Working Group.
While you think about the agenda, I want to thank the chat monitors and the scribe. So, thank you for their work.
And with that, over to Ondrej.
ONDREJ FILIP: Hello everybody. As Martin said, today's agenda is really, really short, but everybody has to make some sacrifices because of this strange disease, and we hope to have a much broader agenda during the next meeting. So let me invite to the floor Luca Deri. He is going to speak about traffic analysis with an Open Source tool called nDPI, so it's going to be about deep packet inspection in Open Source. So Luca, the floor is yours.
LUCA DERI: Good afternoon, my name is Luca Deri, and I am going to talk about an Open Source library I have developed and maintain, called nDPI. In particular today, after a short introduction, I would like to give you some hints about analysing encrypted traffic, because this is going to be the most interesting traffic for the coming years.
Let me start with a quick introduction. The software is developed by ntop, which is an Open Source company, so you can find what we have done. In essence, we are playing with Open Source and network traffic. We have developed tools for visualising network traffic, for analysing network traffic, and for doing other things such as looking for DoS attacks. I am also a software innovate err from Dell and former member of the Italian dot IT.
So let's talk about the requirements. In essence, people today need to understand what is going on in the network. They need to understand the protocol, the application protocol in particular, and what that means, because today, for instance, we are using Zoom, which is not an IETF protocol but just TLS with some special information on it. So we have to look at application protocols, particularly the most popular ones, which today are Facebook, Google, streaming, Netflix, and so on.
Because of that, it is very important to understand the nature of the traffic, especially if we are doing analysis for security. We don't want to rely on traffic decryption, as I know some people are doing, because unfortunately this is not an option: first of all, because it violates the user's privacy, and also because it is not possible to decrypt all encrypted protocols. People mainly talk about TLS, but this is just one out of many. So we really want to stay passive: we want to look at the traffic, and this is necessary for many reasons. For instance, it is important to prevent unwanted traffic from flowing in our network. If we want to keep our network safe, we have to block malicious traffic.
Also, we need to evaluate the type of traffic that is flowing in order to provide a better service. In particular, for home users, it is very important to give priority to the traffic based not on the IP flags but on the content: whether it's interactive or not interactive is very relevant.
And also, in particular now that we have been at home, it is very important to identify malware, because in the past home networks were attached to the Internet with slower links, but now there are very fast links, so something happening at home can cause big trouble for the whole network.
So it's necessary to look at the traffic to understand its behaviour. Of course today we don't have much time, but I want to give you an introduction to this and invite you to have a look at the tool.
What is deep packet inspection? I believe most of you know that it is basically the inspection of the packet payload, so we look at the ‑‑ content.
Because it is a concern in terms of privacy, since we are inspecting the payload, it is very important for the toolkit to be Open Source, because we want to show people that what we are doing is good, it's fair. Also, we don't rely on Cloud-based services as many companies do for traffic analysis, which means that whenever there is something they need to analyse, the traffic is sent to the Cloud and back. This is not what we want.
In particular, it is compulsory not to have false positives, because if you are blocking the wrong protocol, the network will be disrupted. So it's very challenging.
About eight years ago I started to develop this toolkit called nDPI. Since then, I have expanded it and integrated it with other tools such as Wireshark. So if you use Linux‑based routers, you will probably find application‑based firewalls built on netfilter and nDPI. This is the idea: to create an Open Source layer able to do packet inspection, which supports the most famous protocols today, including Zoom, Netflix, Webex, and so on.
What is a protocol in nDPI? In essence, it is identified by a major and a minor protocol. Again, protocol not in the IETF sense, but in the user sense, which is what we call the application protocol. So, for instance, when you upload you will use QUIC as the protocol, as the way to send data to your tool, but in essence what you are doing is an upload. Based on that, you can define your own protocols. And because today most protocols are based on HTTP, and even more on TLS, you will see that many times a protocol is in essence a configuration on top of an existing dissector. For instance, Netflix means traffic sent to certain domain names: we inspect the initial hello, we find whom a certain client wants to connect to and, based on that, we classify it.
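The idea of an application protocol being "a configuration on top of an existing dissector" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `HOST_RULES` table, its entries and the `classify` function are hypothetical and are not nDPI's actual data or API. The sketch assumes some dissector has already extracted the server name from the initial hello.

```python
# Hypothetical suffix table: the entries are illustrative only,
# not the list that nDPI actually ships.
HOST_RULES = {
    "netflix.com": "Netflix",
    "nflxvideo.net": "Netflix",
    "instagram.com": "Instagram",
}


def classify(master, server_name):
    """Return 'Master.Application' when the host name matches a known
    domain suffix, otherwise just the master (transport-level) protocol."""
    labels = server_name.lower().rstrip(".").split(".")
    # Try the longest suffix first: a.b.example.com, b.example.com, ...
    # Matching whole labels avoids treating "notnetflix.com" as Netflix.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in HOST_RULES:
            return f"{master}.{HOST_RULES[suffix]}"
    return master


if __name__ == "__main__":
    print(classify("TLS", "www.netflix.com"))
    print(classify("TLS", "example.org"))
```

The master protocol comes from the dissector, and the application label comes purely from configuration, which is why adding a new service in this sketch does not require writing a new dissector.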
The traffic classification lifecycle is basically very simple. In essence, we try to guess a certain protocol based on the content, starting from the most likely candidate: if you have TCP port 80, it might be that this is HTTP, so we start from the HTTP dissector. If it's not HTTP, we pass to the other dissectors. This is the idea. Again, we don't rely on ports. This is important: if malware is not flowing on the standard port, we can detect it anyway.
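A minimal Python sketch of this lifecycle, assuming two toy dissectors: the port only decides which dissector is tried first, and the payload decides the result. The names `DISSECTORS`, `PORT_HINTS` and `detect` are hypothetical, and the payload checks are deliberately simplistic compared with a real dissector.

```python
def looks_like_http(payload):
    # An HTTP request starts with a method followed by a space.
    methods = {b"GET", b"POST", b"HEAD", b"PUT", b"DELETE", b"OPTIONS"}
    return payload.split(b" ", 1)[0] in methods


def looks_like_tls(payload):
    # TLS handshake record: content type 0x16, major version 0x03.
    return len(payload) >= 3 and payload[0] == 0x16 and payload[1] == 0x03


# Hypothetical dissector table and port hints, for illustration only.
DISSECTORS = [("HTTP", looks_like_http), ("TLS", looks_like_tls)]
PORT_HINTS = {80: "HTTP", 443: "TLS"}


def detect(port, payload):
    """Try the dissector suggested by the port first, then all the others."""
    guess = PORT_HINTS.get(port)
    # sorted() is stable, so the guessed dissector moves to the front
    # and the remaining ones keep their original order.
    ordered = sorted(DISSECTORS, key=lambda d: d[0] != guess)
    for name, matches in ordered:
        if matches(payload):
            return name
    return "Unknown"


if __name__ == "__main__":
    print(detect(80, b"GET / HTTP/1.1\r\n"))          # matches the port hint
    print(detect(4444, b"GET /x HTTP/1.1\r\n"))       # found on an unusual port
    print(detect(80, b"\x16\x03\x01\x02\x00\x01"))    # port hint was wrong
```

Because the port is only a hint for ordering, traffic on a non-standard port is still checked against every dissector.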
If you are wondering how fast the software is: on the low end, as you can see, we are able to monitor real traffic at about 10 gigabit with just one core, so 1.28 million packets per second. It is efficient, particularly because nDPI analyses only the first few packets of each communication. This is important.
Now, let me say a few words about the main topic of this talk, now that it's clear what DPI is, and hopefully we will meet again for a longer session.
So when we have TLS, in essence we have two options: the first is to fingerprint the traffic, to understand what is happening and what type of communication is exchanged. The second is to understand the behaviour. Let's start from the fingerprint.
So, for the fingerprint, in essence, we need to identify the traffic from the statistical point of view. We have to dissect the initial part of the communication to understand the certificate, to realise whether this traffic is what we expect. So, for instance, we hear about IoT: how are my IoT devices talking with a certain service? We expect them always to talk with the same server, in the same way, with the same certificate, with the same communication timing. So they should behave in a certain way all the time. I'm talking about IoT.
Malware requires a totally different approach, because with malware we have to figure out whether there is something wrong with the traffic in terms of traffic distribution, in terms of fingerprinting, in terms of the certificate they use. And also we have to check the content. Of course, TLS is putting a lot of pressure on us because somehow it's hiding many of the details. For instance, I want to show you an example with this encrypted communication.
So let's do an SCP, so basically a file transfer. If we take a metric that is supported by nDPI, like the entropy, you see how it looks when you transfer a PDF or a PNG or a text file. The entropy is very different in these cases but, again, every protocol has some typical entropy, so this is an interesting value for traffic behind simple...
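For readers unfamiliar with the metric, byte entropy can be computed as Shannon entropy over the byte values of a payload, giving a number between 0 and 8 bits per byte. The Python sketch below is a generic illustration of that formula; the function name `byte_entropy` is an assumption and this is not nDPI's own code.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(byte_entropy(b"aaaaaaaa"))          # one repeated byte: no uncertainty
    print(byte_entropy(b"hello, hello, hello"))
    print(byte_entropy(bytes(range(256))))    # every byte value exactly once
```

Compressed or encrypted data tends towards the upper end of the range, while plain text sits noticeably lower, which is why the value can help characterise what is being carried even when the content itself cannot be read.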
Just to give you an example before I conclude, I want to show you this. This is an example taken from Instagram; this is Instagram traffic. This nDPI tool is part of the toolkit, and it shows you a report of what it can find in a data file, for instance.
So the first part, the one on top, is behaviour. In essence, besides the IP and port, you will see things like how bytes are distributed, what the goodput is, what the types of extensions are, ALPN, etc., and what the ratio is. In this case we are downloading traffic. Again, any protocol, especially those based on this type of system such as Instagram, has a typical behaviour. But it also has a typical fingerprint because, as you can see here, we detect that this is Instagram simply by dissecting the initial hello, the Client Hello request. Also, we can fingerprint the client side and the server side. We can find out whether we are really talking with Instagram from the certificate point of view, and also whether the client has changed from the past, if we keep a history of the client, and the same for the server; also whether the cipher is still the same or something has changed.
Of course you still have things like the entropy or how bytes are distributed.
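One way to picture client fingerprinting with a history is the Python sketch below. It is an assumption-laden illustration: the talk does not say which fingerprinting method is used, so the hash here simply follows the style of the publicly documented JA3 method (an MD5 over the Client Hello version, ciphers, extensions, groups and point formats). The function names and the in-memory `history` dictionary are hypothetical.

```python
import hashlib


def client_fingerprint(version, ciphers, extensions, groups, point_formats):
    """JA3-style fingerprint: join each list with '-', join the five
    fields with ',', then hash the resulting string with MD5."""
    fields = [str(version)]
    for values in (ciphers, extensions, groups, point_formats):
        fields.append("-".join(str(v) for v in values))
    return hashlib.md5(",".join(fields).encode("ascii")).hexdigest()


def check_client(history, client_ip, fingerprint):
    """Compare with the fingerprint last seen for this client and
    remember the new one. Returns 'new', 'same' or 'changed'."""
    previous = history.get(client_ip)
    history[client_ip] = fingerprint
    if previous is None:
        return "new"
    return "same" if previous == fingerprint else "changed"


if __name__ == "__main__":
    history = {}
    # Made-up Client Hello parameters, for illustration only.
    fp1 = client_fingerprint(771, [4865, 4866], [0, 10, 11], [29, 23], [0])
    fp2 = client_fingerprint(771, [47, 53], [0, 10, 11], [23], [0])
    print(check_client(history, "192.0.2.10", fp1))
    print(check_client(history, "192.0.2.10", fp1))
    print(check_client(history, "192.0.2.10", fp2))
```

A change in the fingerprint for a device that normally behaves identically, such as an IoT device, is the kind of signal the history is meant to surface; on its own it is a hint to investigate rather than proof of a problem.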
If you want to go further, we can analyse, for instance, malware with nDPI. As you can see, this is an analysis of TrickBot. There are many samples; you simply go and find one. As you can see, it's possible to understand whether there is something wrong from the point of view of the certificate, things like this. And of course, for traffic, it's also possible to detect immediately when, for instance, binary applications are being exchanged that are not supposed to be exchanged over HTTP.
If you want to further analyse the behaviour, let's say malware behaviour, it's possible with nDPI to generate something called traffic bins: we group traffic into bins based on packet length and time. As you see, these all come from TrickBot. Just look at the parts that are in colour. As you can see, they are all similar: they have the same entropy, this 2.4, and they have the same packet distribution in terms of length, in terms of bins, and the same goes for the time.
As you can see, nDPI is the basis for many applications that use traffic analysis that goes beyond the simple header and encryption, and that can also understand whether an accepted communication is still acceptable with respect to the expected behaviour over time.
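The traffic bins mentioned above can be illustrated with a short Python sketch: packet lengths are counted into a fixed set of bins, the counts are normalised, and two flows are compared by the distance between their histograms. The bin edges, function names and the choice of distance are assumptions made for this illustration, not nDPI's actual binning API. The same approach applies to inter-packet times with a different set of edges.

```python
# Hypothetical upper edges (bytes) for packet-length bins; the last
# bin collects everything larger than the final edge.
LENGTH_EDGES = [64, 128, 256, 512, 1024]


def bin_index(value, edges):
    """Index of the first bin whose upper edge is >= value."""
    for i, upper in enumerate(edges):
        if value <= upper:
            return i
    return len(edges)


def histogram(values, edges):
    """Normalised histogram: the fraction of values in each bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bin_index(v, edges)] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def distance(a, b):
    """Total variation distance: 0.0 means identical, 1.0 means disjoint."""
    return sum(abs(x - y) for x, y in zip(a, b)) / 2


if __name__ == "__main__":
    # Made-up packet lengths for three flows, for illustration only.
    flow_a = histogram([60, 60, 1400, 1400, 300], LENGTH_EDGES)
    flow_b = histogram([62, 58, 1380, 1420, 310], LENGTH_EDGES)
    flow_c = histogram([700, 720, 690, 710, 705], LENGTH_EDGES)
    print(distance(flow_a, flow_b))  # similar flows: small distance
    print(distance(flow_a, flow_c))  # different flows: large distance
```

Flows produced by the same software tend to land in the same bins even when the exact packet sizes vary slightly, which is what makes grouping by bins useful for spotting flows that behave alike.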
So, because we are running out of time, I invite you to have a look at the code. I would like to thank everybody. I hope to have a longer session in the future. I am here to answer any questions.
ONDREJ FILIP: Thank you very much. There are two questions in the queue so I will read them and please answer them.
Jim Reid is asking: "Any plans to extend this to DNS over HTTPS or TLS (DoH/DoT)?"
LUCA DERI: We can dissect that, yeah.
ONDREJ FILIP: The second question is from Rafal Suchecki: "How much space does it need on the server to keep data for one year?"
LUCA DERI: It depends on the amount of traffic. It's hard to answer this question. So you can contact me; I have some links that can give you ideas on this.
ONDREJ FILIP: Since you were brief, we can process the last one, from ‑ from RIPE NCC. He is asking: "Does the fingerprinting work with TLS 1.3, where the server certificate is transferred encrypted?"
LUCA DERI: Yes, it does. Of course, we are not able to decode the certificate, but we are able to decode the client part. This is an example: this is Instagram, so this is 1.3. As you can see, there is no server certificate, of course, but we have everything else. So yes, it is possible to do that.
ONDREJ FILIP: Thank you very much. Thank you for your contribution. Martin, do you want to have some final words?
MARTIN WINTER: Thank you everyone for attending the short session. We will have a longer session again in Milan or, if it's virtual again, even then, we will have a longer session, so you can present your findings at the next meeting.
ONDREJ FILIP: We both hope to see you live and we will see your favourite Working Group next year. Thank you for that.
(Coffee break)