IGF 2022 Day 2 Lightning Talk #30 What is the DNS Abuse Institute? - RAW

The following are the outputs of the captioning taken during an IGF intervention. Although it is largely accurate, in some cases it may be incomplete or inaccurate due to inaudible passages or transcription errors. It is posted as an aid, but should not be treated as an authoritative record.

***

>> ROWENA SCHOO: I'm going to kick off and hope that you can hear me. My name is Rowena Schoo. I'm from the DNS Abuse Institute. So generally, I'm going to cover what the institute is, a little bit about DNS abuse, and focus on two of our initiatives, Netbeacon and DNSAI Compass, and then take any questions we have. So the institute was created and funded by Public Interest Registry, who are the people that run .org. This is in pursuit of their non-profit mission. Abuse is a really complicated global problem that requires coordination across a whole bunch of actors. We structure our work into three pillars: education, collaboration and innovation. And we aim to identify areas of friction and complexity and move those into the institute to find solutions to some of those problems and bring people together. So firstly, what is DNS abuse?

So there's a number of different definitions on this. Some of those tend to be very narrow in terms of technical abuse: malware, botnets, pharming, phishing and spam. Others like to take a broad view on this, and they consider anything bad that happens on the internet that uses a domain name. In terms of why there are those different views, domain name registries and registrars have a limited set of tools they can use to act on abuse, so they tend to align to a narrower definition as a result of that. In terms of other views that are broader, often the DNS is looked at as the only centralized component of this global internet infrastructure, so people want to put more things into that bucket of DNS abuse.

So I think it's really important when we're thinking about the definition to not get too tied up on exactly what DNS abuse means. It can be used sometimes as a shorthand for saying that mitigation is appropriate at the DNS level. And when we look at examples of categories of harm that could fit into a DNS abuse definition, we find that even within those categories we do need to get more specific. If you take phishing as an example, it's not really enough to say there's phishing happening. You also need to look at whether there's sufficient evidence of that phishing, and whether the domain name has been registered purely for the purposes of phishing, in which case we consider it a malicious registration, or whether it's a domain name being used for another legitimate purpose but compromised, in which case we need to think about collateral damage and other types of harm that could take place if a registry or registrar were to act on that domain name. If that domain was being used to offer essential services, acting on it could disrupt access to those services, which may, in some cases, be worse than the initial harm they are trying to tackle. So at the institute, we like to think about the principles around mitigation.
So whether mitigation at the DNS level is effective, simple, quick, cheap, proportionate to the harm, and whether it's a precise tool for the issue you are looking at. And we tend to use that as a way of structuring our thoughts around this.

That said, we focus mainly on technical abuse. So when we talk about DNS abuse, we are thinking about phishing, malware, botnets, and spam when it is used as a delivery mechanism for those other types of abuse.

I put a few definitions on the screen in case people aren't familiar. Phishing is attempting to collect sensitive information or important personal information through means of deception. Malware is software that has a malicious purpose. Botnets are a connected group of devices or networks that are being controlled for malicious purposes. And spam is unsolicited email, which can be used in combination with those other types of abuse.

So I talked a little about this distinction between malicious and compromised. A malicious domain is one registered purely for a malicious purpose. A compromised domain is benign but has been compromised in some way and used for a malicious purpose. That's the point I was making around collateral damage: there is potentially a victim at the heart of that domain registration, in terms of the registrant, who may not know their domain is being used for this purpose. So roughly, around 25 to 45% of all phishing and malware is from compromised web sites. So it's a really important piece of that puzzle when we're thinking about mitigation and what action should take place and by whom.

So I am going to focus now on two of our key initiatives at the institute. The first one is Netbeacon. It's designed to address two problems. The first is that reporting abuse on the internet is really hard. It requires a certain amount of technical knowledge, there are no consistent standards, and it's not really something scalable to the size of the internet. So it's tricky for reporters. The second part of the problem is on the other side. People who receive the reports, often registrars, don't always get reports that are for them. Some of them are unevidenced. Some of them are unactionable or don't contain all the information they need to make a decision. So those are the reasons we put together Netbeacon: to design a centralized place for abuse reporting that is easy to use and automatically routes those reports in a standardized format to registrars. It's important to note that it's free, both for registrars and for reporters. Everything we do is openly accessible. It allows anybody who finds an issue of phishing, malware, botnets or spam to access the portal and put in the evidence that's needed, and that report will be sent to the correct registrar. The reporter doesn't need to look up where to send it. We do that hard work of addressing it for you. And on the registrar side, they'll receive reports automatically. They can also access more functionality around customizing how they receive those reports and where they go, and also select which enrichments they'd like. When we receive a report, we cross-reference it with some other databases that give information and context to that report, so registrars can choose which ones they want to apply. And Netbeacon was developed with CleanDNS, who provided the mark for producing this tool.
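
As an illustration of the routing idea described above, here is a minimal Python sketch. It is not Netbeacon's implementation or API. The `AbuseReport` class, the `route_report` function and the hard-coded `REGISTRAR_BY_DOMAIN` table are all hypothetical names introduced for this example, and a real service would presumably resolve the registrar from registration data rather than a dictionary. The sketch only shows the three steps mentioned in the talk: check the harm type, require evidence, and address the report to the right registrar in one standard shape.

```python
from dataclasses import dataclass, field

# Harm types named in the talk as in scope for reporting.
SUPPORTED_HARMS = {"phishing", "malware", "botnet", "spam"}

# Hypothetical lookup table for illustration only; a real service would
# resolve the sponsoring registrar from registration data.
REGISTRAR_BY_DOMAIN = {
    "example.com": "Registrar A",
    "example.org": "Registrar B",
}


@dataclass
class AbuseReport:
    domain: str
    harm: str
    evidence: list = field(default_factory=list)


def route_report(report: AbuseReport) -> dict:
    """Validate a report and return a standardized record for the registrar."""
    if report.harm not in SUPPORTED_HARMS:
        raise ValueError(f"unsupported harm type: {report.harm}")
    if not report.evidence:
        raise ValueError("a report needs at least one piece of evidence")

    # Normalize case and a trailing dot so lookups are consistent.
    domain = report.domain.strip().lower().rstrip(".")
    registrar = REGISTRAR_BY_DOMAIN.get(domain)
    if registrar is None:
        raise LookupError(f"no registrar known for {domain}")

    return {
        "registrar": registrar,
        "domain": domain,
        "harm": report.harm,
        "evidence": list(report.evidence),
    }
```

A reporter-side call would look like `route_report(AbuseReport("example.com", "phishing", ["screenshot.png"]))`. Rejecting unevidenced reports before routing reflects the complaint in the talk that registrars often receive reports they cannot act on.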

So, other features. There's an API that can be used to speed up this process. Registrars can embed this form on their web site. It's important to note what Netbeacon is not. It's not an abuse management tool, and it doesn't make determinations of what happens. We don't store reports permanently. And it doesn't provide access to customer information for registrants or customers. Some of the things we're thinking about for the future: we'd love to expand the harms that are covered. We'd love to integrate other technical operators, so moving beyond registrars and registries into hosting and other providers. We still need to integrate top-level domains. This is around TLDs. And we'd love to do more around reporter reputation. That's an area that is very interesting, and there's a lot of desire to understand which reporters are consistently submitting well-evidenced reports.

So the second initiative that we have is called DNSAI Compass. So this one is all about measuring DNS abuse. We have this mission to reduce DNS abuse at the institute and it's really hard to know if we've done that if we don't have a sufficiently robust way of measuring that over time.

So what we did here was partner with an external academic in France. He's done a lot of work in this space. And we basically briefed him to find the best possible way he could to measure DNS abuse. And we wanted that measurement to include evidence of the abuse. And we had certain principles we wanted to see through in this project. So one is around transparency. One's around credibility and independence. And the other's around accuracy and reliability. So part of the work that we did to ensure transparency was to publish a detailed methodology, which you can see a screenshot of on the screen there. That's available on our web site at that link, so you can go through and see exactly how we're getting to the numbers we're getting. Some things to note up front: we have optimized for accuracy. Evidence collection is really important. We're not trying to measure everything bad that happens on the internet. We're trying to have a robust way of measuring the things we think we can measure reliably over time. We're looking at whether mitigation has occurred, and also at how many of the domains that we find are compromised or malicious. To give you a sense of the data that comes in: this is available on our web site, and it is interactive.

What you are looking at here is the count of the unique domain names our methodology has identified as involved in phishing or malware. This project focuses on just phishing and malware. That's where we felt there was sufficient evidence we could implement. We are considering other harms in the future. And this is something we were expecting; it's consistent with other attempts. One thing to note about our reporting is that we count unique domain names. We're not trying to count harm. We're trying to count how many unique domains are identified as being involved in phishing and malware. And one of the reasons for that is that we're coming from the registrar and registry perspective of what they can do about that harm. From their perspective, there's one domain name they can take action on or not. So that's why we're not counting, and not trying to estimate, total harm.
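
To make the "count unique domain names, not harm" point concrete, here is a minimal Python sketch. It is not the project's code. The function name `count_unique_domains` is hypothetical, and the sketch assumes each feed entry has already been reduced to a registered domain name (real threat feeds often list full URLs, which would need an extra extraction step).

```python
def count_unique_domains(feeds):
    """Count distinct domain names across several threat feeds.

    A domain listed many times, or in several feeds, counts once,
    because a registrar or registry has one domain to act on.
    """
    seen = set()
    for feed in feeds:
        for entry in feed:
            # Normalize case and a trailing dot so duplicates collapse.
            seen.add(entry.strip().lower().rstrip("."))
    return len(seen)


# Illustrative feeds: example.com appears three times in total.
feeds = [
    ["example.com", "Example.com."],
    ["example.com", "example.org"],
]
unique_total = count_unique_domains(feeds)
```

Here `unique_total` is 2, even though the feeds contain four entries, which is the distinction the talk draws between counting domains and counting incidents.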

This next one focuses on mitigation. So the green bar you are seeing there is when our methodology identified that some mitigation has happened. So that means that we believe the harm has stopped. We're not distinguishing exactly why it stopped or who has taken action. It means we think the harm has ceased. It could be that the registrar took action. It could be the registry. It might even be that the person using the domain name for a malicious purpose has done what they intended to do, stopped using it and disabled the domain name. It's starting to give us a sense of how many of those unique domain names we identify are being mitigated. There's also a category for not mitigated. That means that, according to our methodology, we believe the harm has not been mitigated for the period of measurement. That period is up to 30 days after the domain name is identified. We could check this continuously, but we had to put a deadline on how long we would go back to visiting that domain name and checking whether any mitigation has occurred. And the time we chose was 30 days. So that's another reason why our reporting tends to be delayed. When a domain appears, we go back at intervals that start at a matter of minutes and extend out to 12 hours, up to 30 days. If something comes on to a list at the end of July, we still need the month after that to check. We have this category of uncategorized. It's something we included in pursuit of our transparency principle. We wanted to be clear about areas where we weren't able to determine whether we thought the domain had been mitigated or not mitigated. And there is text in our methodology that provides reasons why we think that might be. And we have an unprocessed category. That is due to things like server errors and technical access issues. So this next shot is looking at the speed of mitigation. This one is slightly different in terms of how it's structured. It's measuring the count of registrars. We don't know for sure who took action. What it indicates here is that the domain is under their management.
Here, we're counting how many registrars have a median mitigation time within each of these buckets. So 0 to 24 hours, 24 to 48 hours, 48 hours to 7 days, and more than 7 days, up to 30 days, when our mitigation measurement stops. So we're trying to give a sense of how quickly mitigation is happening. There isn't an industry standard for this. Generally, we're trying to understand what is happening, how quickly it's happening and how it varies across the industry.
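
The bucketing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published methodology. The names `bucket_for` and `registrars_per_bucket` are hypothetical, times are assumed to be in hours, and the handling of exact boundaries (a median of exactly 24 hours falls in the second bucket here) is a choice made for the example.

```python
from collections import Counter, defaultdict
from statistics import median

# Upper bounds in hours for each bucket named in the talk.
BUCKETS = [
    (24, "0 to 24 hours"),
    (48, "24 to 48 hours"),
    (7 * 24, "48 hours to 7 days"),
]
OVER_LABEL = "more than 7 days"


def bucket_for(hours):
    """Return the label of the bucket that a mitigation time falls into."""
    for upper, label in BUCKETS:
        if hours < upper:
            return label
    return OVER_LABEL


def registrars_per_bucket(observations):
    """Count registrars by the bucket of their median mitigation time.

    observations: iterable of (registrar, hours_to_mitigation) pairs,
    one pair per mitigated domain under that registrar's management.
    """
    times = defaultdict(list)
    for registrar, hours in observations:
        times[registrar].append(hours)
    return Counter(bucket_for(median(hours)) for hours in times.values())


# Illustrative data only.
observations = [
    ("A", 2), ("A", 10), ("A", 30),   # median 10 hours
    ("B", 60), ("B", 200),            # median 130 hours
    ("C", 300),                       # median 300 hours
]
counts = registrars_per_bucket(observations)
```

With this sample, each of registrars A, B and C lands in a different bucket. Using the median rather than the mean keeps one very slow case from dominating a registrar's figure.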

And finally, this is looking at the distinction between malicious and compromised. You can toggle between phishing and malware. I split them up here because they are different. With phishing, compromised is still a decent proportion, around 30%, as opposed to maliciously registered. With malware, it ranges quite a bit. It's 57 to 85% over this reporting period.

So in summary, to wrap up: granularity matters. We need to be quite specific about what we're talking about, the type of harm and the type of mitigation that might be appropriate. That's a really helpful way to get around discussions about what is and is not DNS abuse, if we get specific about what we're talking about. Reporting: we hope we've made this easier. Netbeacon is up and functioning. We have a bunch of things we want to do with that. We're hearing great things already in terms of how registrars are receiving those reports. We're keen to raise awareness about this and drive traffic so that people use it as the go-to place for reporting. Compass is also up and running. We release reports every month. You can access the data online. We also publish a PDF where we give more commentary. And get in contact. We will talk to anybody who is genuinely interested in tackling this issue. We'd love to hear from you. There's a contact form on our web site. Reach out and either myself or Graham will respond and we'll have a chat. All right. I might leave it there and see if we have any questions. Hoping there was sound throughout.

>> It's Adam. Yes, we have one question. I will give the microphone to the gentleman. Say your name and ask your question.

>> My name is Jasalin. I have questions about the mechanisms that you use to collect the DNS abuse data. That's one question. I would like you to expand on that a bit. And the second question is the quality -- the specific format the data is available. Do you publish the data for public access so people can repurpose the data?

Particularly, is it available in form. And my third question is about privacy-related matters. You collect the data, and then what mechanism do you use to make the data anonymous, so privacy preserving? Those are three questions I would like you to answer. Thank you.

>> ROWENA SCHOO: Thank you for those questions. Really great questions. So something around methodology, the format of the data, and also privacy. So to give you more context: the way that the methodology works, KOR Labs, out of the University, ingest a number of existing threat feeds from sources that provide these lists either for fairly low or no cost. There's a list of those in our methodology. Those lists are then ingested and go through, essentially, a filter where KOR Labs do certain things. They remove what we call special domains. Special domains are things like Google Docs, where it would not be appropriate to act at the DNS level if there were issues. One of the other things they do is take screenshots so there's evidence. And they also take something that is kind of like a digital fingerprint of a variety of attributes related to the domain name at that point in time. That digital fingerprint is then checked to see if mitigation has happened, and that feeds into the determination of whether the domain name is malicious or compromised. We provide those charts. The charts are interactive. We don't provide the underlying domain names to everyone. If people are interested in seeing those, they can approach the providers. We list out who they are. That's something the registrars can do if they are interested in receiving those domain names, which they can investigate and consider whether action is appropriate within their control. In terms of the privacy question, there isn't any personal data collected in this project, so we don't have to anonymize it. We don't collect personal data. I hope that answers all three. If not, let me know.
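
The filter and fingerprint steps described in this answer can be sketched roughly as follows. This is a hedged illustration, not the methodology itself: the `SPECIAL_DOMAINS` set, the attribute names in the sample data, and the functions `filter_special`, `fingerprint` and `has_changed` are all assumptions made for the example. In particular, the published methodology decides what counts as mitigation; this sketch only detects that the recorded attributes differ between two observations.

```python
import hashlib
import json

# Illustrative entry only; the real list of special domains is
# defined by the published methodology.
SPECIAL_DOMAINS = {"docs.google.com"}


def filter_special(domains):
    """Drop domains where acting at the DNS level would not be appropriate."""
    return [d for d in domains if d not in SPECIAL_DOMAINS]


def fingerprint(attributes):
    """Hash a snapshot of domain attributes into a stable digest.

    Keys are sorted so the same attributes always give the same digest.
    """
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(first_snapshot, later_snapshot):
    """Report whether the recorded attributes differ between two checks."""
    return fingerprint(first_snapshot) != fingerprint(later_snapshot)


# Hypothetical snapshots of one domain at two points in time.
at_listing = {"nameservers": ["ns1.example.net"], "a_records": ["192.0.2.1"]}
at_recheck = {"nameservers": ["ns1.example.net"], "a_records": []}

candidates = filter_special(["docs.google.com", "example.com"])
changed = has_changed(at_listing, at_recheck)
```

None of the attributes in this sketch are personal data, which is in keeping with the answer given above on privacy.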

>> We have nodding in the room. So thank you. That answers great. And, yeah, any questions online?

Thank you, Rowena. Very interesting.

>> ROWENA SCHOO: Thank you. And thank you for bearing with us through a few technical issues.

>> Okay. Thanks. We'll end the session and see you soon.

>> ROWENA SCHOO: Wonderful. Thanks, everyone.