Tuesday, June 23, 2020

Transition to practice success story: Using machine learning to aid in the fight against cyberattacks

Artificial intelligence and machine learning becoming key technologies in cybersecurity operations

S. Jay Yang, professor at the Rochester Institute of Technology, is a 2019 Trusted CI Fellow and the first 2020 Trusted CI Transition to Practice (TTP) Fellow. His research group has developed several pioneering machine learning, attack modeling, and simulation systems to provide predictive analytics and anticipatory cyber defense. His earlier works included FuSIA, VTAC, ViSAw, F-VLMM, and attack obfuscation modeling.

In 2019, the Center for Applied Cybersecurity Research (CACR) and OmniSOC, the security operations center for higher education, began working with Dr. Yang and his team at Rochester Institute of Technology to implement Dr. Yang’s ASSERT research prototype with OmniSOC. ASSERT is a machine learning system that automatically categorizes attacker behaviors, derived from alerts and other information, into descriptive models to help a SOC operator more effectively identify related attacker behavior.
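
The details of ASSERT’s models are in Dr. Yang’s publications; purely as a rough illustration of the general idea of condensing many alerts into a few behavior summaries, the sketch below aggregates hypothetical alert records per source and clusters the resulting category profiles. It is not ASSERT’s actual algorithm, and the field names and clustering choice are assumptions.

```python
# Illustrative sketch only: group IDS alerts by source, summarize each group
# as a distribution over alert categories, then cluster the summaries.
# This is NOT ASSERT's actual algorithm; field names are hypothetical.
from collections import Counter, defaultdict

import numpy as np
from sklearn.cluster import KMeans

def summarize_alert_groups(alerts, categories):
    """Build one category-frequency vector per source IP."""
    groups = defaultdict(Counter)
    for alert in alerts:                      # each alert is a dict
        groups[alert["src_ip"]][alert["category"]] += 1
    vectors = []
    for src, counts in groups.items():
        total = sum(counts.values())
        vectors.append([counts[c] / total for c in categories])
    return list(groups.keys()), np.array(vectors)

# Hypothetical alert records, roughly in the shape of IDS alert metadata.
alerts = [
    {"src_ip": "10.0.0.1", "category": "scan"},
    {"src_ip": "10.0.0.1", "category": "scan"},
    {"src_ip": "10.0.0.2", "category": "exploit"},
    {"src_ip": "10.0.0.3", "category": "scan"},
]
categories = ["scan", "exploit", "malware"]
sources, X = summarize_alert_groups(alerts, categories)

# Cluster sources with similar alert-category profiles so an analyst can
# review a handful of behavior summaries instead of every raw alert.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)
for src, label in zip(sources, labels):
    print(src, "-> behavior group", label)
```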

“SOC analysts are overwhelmed by intrusion alerts,” said Yang. “By providing a characteristic summary of different groups of alerts, ASSERT can bring SOC analysts’ attention to critical attacks quicker and help them make informed decisions.”

CACR staff are working with OmniSOC engineers and Yang’s team at Rochester Institute of Technology to validate the methodology and test the research prototype’s applicability to SOC workflows at OmniSOC, using data OmniSOC aggregates from IU as the first of these explorations of machine learning approaches.

The team is using a subset of an anonymized parallel feed of OmniSOC data from IU only. This data is pipelined to a prototype deployed on IU’s virtualization infrastructure. The results will be provided to OmniSOC engineers and analysts to determine whether the method has utility for OmniSOC’s workflows. This project aims to catalyze further applied AI research for cybersecurity by taking advantage of the size of the security data set aggregated by OmniSOC, the expertise of CACR staff, and the relationships both organizations have within the higher-ed security and research communities.

Ryan Kiser is a senior security analyst at the Indiana University Center for Applied Cybersecurity Research and one of the researchers involved in the project. We spoke with Kiser to catch up on how the project got started and where the project stands now.

Trusted CI: How did you learn about Dr. Jay Yang’s work?

Jay was a member of the Trusted CI cybersecurity cohort. The intent of the cohort was to get a group of security researchers together so that we could help make connections with the community that Trusted CI serves -- that is, the higher-ed and research communities and the facilities that are funded by NSF.

Some of Jay’s work is related to machine learning. Jay came to Bloomington to visit IU, which was a good opportunity for us to talk about his research. It seemed like the ability to generate attack models was potentially applicable to OmniSOC. One of his grad students was working on a series of visualizations and a way for people to interact with the results from ASSERT, and he was able to demonstrate it for us in person.

Trusted CI: Where does the project stand now?

The project happened in phases. We planned it that way from the start because we weren’t sure this would be something that could provide real value, since it’s still a research prototype.

We interacted with the researchers early on to find out what they needed. We then worked out how to reduce the data down, to lower the risk of using operational data while still providing the functionality needed for the research. We determined a way to anonymize the data and got approval from the security and policy offices to use it in the way we proposed. Once we had that approval, we could start.
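
The interview doesn’t spell out the anonymization scheme the team settled on. One common approach for alert data, sketched below only as an illustration, is to replace IP addresses with keyed-hash pseudonyms so related alerts still correlate while the real addresses stay hidden; the field names and key handling here are assumptions.

```python
# Minimal sketch of one common anonymization approach: replace IP addresses
# with a keyed hash so the same address always maps to the same pseudonym,
# preserving cross-alert correlation without exposing the real address.
# This is not the project's documented scheme; field names are hypothetical.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-held-by-the-data-owner"

def pseudonymize_ip(ip: str) -> str:
    digest = hmac.new(SECRET_KEY, ip.encode(), hashlib.sha256).hexdigest()
    return "ip-" + digest[:12]     # short, stable token in place of the address

def anonymize_alert(alert: dict) -> dict:
    """Strip or pseudonymize the fields that identify hosts."""
    cleaned = dict(alert)
    for field in ("src_ip", "dest_ip"):
        if field in cleaned:
            cleaned[field] = pseudonymize_ip(cleaned[field])
    cleaned.pop("payload", None)   # drop raw content entirely
    return cleaned

print(anonymize_alert({"src_ip": "192.0.2.10", "dest_ip": "198.51.100.7",
                       "signature": "ET SCAN Nmap", "payload": "..."}))
```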

The first phase was to set up a testbed, deploy the prototype into it, and then start getting the right data from OmniSOC into the prototype. That phase concluded in early January.

We were starting to get results, so we began the second round to see if we could make use of them. Part of that was developing a set of use cases for OmniSOC.

Another part of the project was having an undergraduate student here at IU develop visualizations as part of his capstone project; we set up some additional software on the testbed to enable that. That’s the phase of the project that is concluding now.

Suricata is a network monitoring and alerting tool used at IU. We wanted to take a subset of the data that Suricata generates at IU and use that as the basis for an initial analysis, an exploration. The hope is that ultimately this can be applied more broadly, to something like full network sensor data.
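
Suricata’s standard EVE output (eve.json) is one JSON object per line, so taking a subset can be as simple as filtering to alert events and keeping a handful of fields. The sketch below is illustrative only; the exact fields the project forwards to the prototype aren’t specified in this interview.

```python
# Sketch of pulling a reduced subset of fields from Suricata's EVE JSON
# output (eve.json). The specific field selection here is illustrative.
import json

KEEP = ("timestamp", "src_ip", "dest_ip", "proto")

def reduced_alerts(path="eve.json"):
    with open(path) as fh:
        for line in fh:                      # EVE output is one JSON object per line
            event = json.loads(line)
            if event.get("event_type") != "alert":
                continue                     # skip flow, dns, and other event types
            record = {k: event.get(k) for k in KEEP}
            record["signature"] = event["alert"].get("signature")
            record["category"] = event["alert"].get("category")
            yield record

for alert in reduced_alerts():
    print(alert)
```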

Another tool, Zeek, captures a lot more data than Suricata about what is flowing over the network. Our hope is that once the groundwork is laid using the small Suricata dataset, OmniSOC can start using the much larger volume of data that Zeek captures and get much more valuable results out of it.

We have learned a lot throughout this process. One of my biggest takeaways is an appreciation of the ways in which the approach is limited. You can’t just take a dataset, throw it at a neural network, and end up with a usable model for analyzing other data. You have to tailor these systems to the use case in order to solve a particular problem.

Our goal now is to work with OmniSOC and Jay to come up with a roadmap for realizing this potential. We’re going to write up what we found by the end of July and plot a path forward for Jay’s group and OmniSOC to bring it into a real production environment.