Past Awardee

Mining Outliers and Rare Events in Data Streams

Jiawei Han

College: Engineering
Award year: 2006-2007

With computer systems and networks serving as the basic and crucial infrastructure of modern society, it is increasingly challenging to develop effective and efficient methods to detect and prevent network intrusion, especially when dealing with sophisticated, highly motivated, and organized intruders. Experienced intruders may hide or remove the evidence of an intrusion by deleting logs, altering timestamps, and installing their own utilities to subvert the operating system. Given the dynamic nature of networks, it is crucial to discover intrusions quickly and to locate and preserve potential evidence before it is lost or altered, without disrupting the normal operation of the network. Such tasks call for the development of fast, scalable, and effective data stream mining methods.

The objective of this proposed research is to systematically investigate the principles and algorithms for computer network-based data stream mining, develop scalable and effective data stream mining methods, and construct a StreamOutlierMine system. The system will analyze dynamic network flow data effectively, efficiently, and as close to real time as possible, to detect current and potential attacks and, when possible, take timely action to prevent them. Building on our research results and our close collaborations with NCSA researchers on data mining, data warehousing, and data mining applications, and especially on our recent work supported by the NSF-IIS-0308215 project, Mining Dynamics of Data Streams in Multi-Dimensional Space, and by the NCASSR project on MAIDS (Mining Alarming Incidents in Data Streams), we have developed a set of effective and efficient methods for multidimensional analysis of data streams and implemented a real-time data stream mining system, MAIDS. Besides consolidating our previously developed data stream mining methods and further exploring their cyber-security applications, this project will focus on the following new data stream mining tasks closely linked to cyber-security: (1) multidimensional analysis and detection of rare events in data streams, and (2) user-guided, real-time mining of local outliers to uncover ongoing attacks. We will also seek other promising applications of mining outliers and rare events in data streams and massive data sets.