Learning to Detect New and Unknown Malicious Programs

As you may have guessed by now, the automated discovery of new threats for rapid response is one of my favorite topics. This paper is up that alley:
Malicious programs pose a major threat to security, especially in the Windows platform. The most dangerous of these malicious programs are the ones that are new or unknown because they are not detected by traditional signature based anti-virii software. In this paper we present a data-mining approach to detecting new and unknown malicious programs. We extract a set of features from these programs through static analysis and build a classifier that detects which programs are potentially malicious. This classifier can generalize to other new or other unknown programs. We verify our results by testing our methods on a set of programs not used during training (i.e., programs unknown to our classifier). In one experiment, our method detects 81.54% of previously unknown malicious programs with a 0.96% false-positive rate.
Source: Learning to Detect New and Unknown Malicious Programs, Eleazar Eskin, Matthew G. Schultz, Erez Zadok, and Salvatore J. Stolfo.

January 26, 2005 in papers
Comments

This is actually an interesting idea, though it does sound like a number of syscall fingerprinting host IDS papers I've seen.

It's possible that this could be useful as a means for detecting new malware at some sort of large email gateway (I don't really buy the idea of this being implemented on individual machines). But even with a false-positive rate of about 1%, you would still see 10,000 false positives per day if you had the thing inspecting 1M executable email attachments per day (1M is just a wild guess). That's still a lot of attachments for experts to have to examine to find 8-10 new instances of malicious code.
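The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only an illustration: the 1M attachments per day is the wild guess from the comment, the 10 new malicious programs per day is an assumed figure chosen to match the "8-10" estimate, and the function name `triage_load` is made up for this sketch. The two rates are the ones quoted in the abstract.

```python
def triage_load(attachments_per_day, false_positive_rate,
                new_malware_per_day, detection_rate):
    """Estimate the daily analyst workload for a gateway classifier.

    Returns (false_alarms, true_detections, precision), where precision
    is the fraction of flagged attachments that are actually malicious.
    """
    # For simplicity the false-positive rate is applied to the whole
    # volume, since malicious files are a negligible share of it.
    false_alarms = attachments_per_day * false_positive_rate
    true_detections = new_malware_per_day * detection_rate
    flagged = false_alarms + true_detections
    precision = true_detections / flagged if flagged else 0.0
    return false_alarms, true_detections, precision


# Assumed inputs: 1M attachments/day (a guess), 10 new malware samples/day
# (an assumption), and the 0.96% / 81.54% rates quoted in the abstract.
false_alarms, true_detections, precision = triage_load(
    attachments_per_day=1_000_000,
    false_positive_rate=0.0096,
    new_malware_per_day=10,
    detection_rate=0.8154,
)
print(f"false alarms/day:    {false_alarms:.0f}")
print(f"true detections/day: {true_detections:.1f}")
print(f"precision:           {precision:.4%}")
```

Under these assumptions, fewer than one in a thousand flagged attachments would be real malware, which is the heart of the objection.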

The cost of having to examine on the order of 1,000 candidate executables in the hope of finding one is probably too high unless something can narrow that set down more. It would be interesting to know how many unique executable binary attachments would be seen on, say, Yahoo's mail servers per day. Perhaps just 'uniq'ing them by MD5/SHA-1 would get them down to a more manageable number.
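A minimal sketch of the "uniq by hash" idea, using Python's standard `hashlib`. The function name `unique_by_digest` and the sample payloads are hypothetical, made up purely for illustration; a real gateway would stream attachments from its mail queue rather than hold them in a list.

```python
import hashlib


def unique_by_digest(payloads, algorithm="sha1"):
    """Collapse byte-identical attachments, keyed by their hex digest.

    `payloads` is any iterable of bytes objects. The first copy seen
    for each digest is kept; later duplicates are dropped.
    """
    unique = {}
    for data in payloads:
        digest = hashlib.new(algorithm, data).hexdigest()
        unique.setdefault(digest, data)
    return unique


# Made-up sample data: two identical attachments and one different one.
attachments = [
    b"MZ\x90\x00 sample-a",
    b"MZ\x90\x00 sample-a",
    b"MZ\x90\x00 sample-b",
]
unique = unique_by_digest(attachments)
print(f"{len(attachments)} attachments, {len(unique)} unique")
```

Note that an exact hash only collapses byte-for-byte identical files. Any variation between copies, even a single byte, produces a different digest, so this narrows the candidate set only to the extent that the same binary is mailed many times.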

Perhaps someone is already doing this sort of thing?

Posted by: Robert | Jan 26, 2005 8:52:23 PM
