Converting files to images has proven to be a very effective way to detect malware

Artificial intelligence, and machine learning in particular, is all around us. Facebook uses it to curate content (and target ads), Google uses it to filter millions of spam emails every day, and OpenAI's bots won two out of three matches against Dota 2 champions last year. The uses seem endless. Here is one more: Microsoft and Intel have developed a clever machine learning framework that detects malware with surprising accuracy by converting files into grayscale images.

Microsoft details the technology, which it calls static malware-as-image network analysis (STAMINA), in a blog post (via ZDNet). It consists of a three-step process. Briefly, the first step takes a binary file and converts it into a 2D image.
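The conversion step can be sketched in a few lines of Python. This is only an illustration of the general idea, not Microsoft's code: each byte of the file becomes one grayscale pixel with a value from 0 to 255, and the byte stream is wrapped into rows. The function name `bytes_to_grayscale` is hypothetical, and the roughly square width is an assumption, since the article does not say how the image dimensions are chosen.

```python
import math
from typing import List, Optional


def bytes_to_grayscale(data: bytes, width: Optional[int] = None) -> List[List[int]]:
    """Map each byte (0-255) to one grayscale pixel and wrap the stream into rows."""
    if not data:
        raise ValueError("cannot build an image from an empty file")
    if width is None:
        # Assumption: aim for a roughly square image, width = ceil(sqrt(n)).
        width = math.isqrt(len(data) - 1) + 1
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # zero-pad the last row if it is short
        rows.append(row)
    return rows


# Seven made-up bytes starting with "MZ", the marker that opens Windows executables.
sample = bytes([0x4D, 0x5A, 0x90, 0x00, 0x03, 0x00, 0x00])
image = bytes_to_grayscale(sample)
for row in image:
    print(row)
```

A real pipeline would read the bytes from disk and hand the pixel grid to an image library, but the byte-to-pixel mapping is the core of the idea.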

The image is then fed into the framework. This second step is a process called transfer learning, which essentially lets the algorithm build on existing knowledge while comparing the image against what it has already learned.

Finally, the results are analyzed to see how effective this process was in detecting malware specimens, how much it missed, and how much it misclassified as malware (known as false positives).
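The three quantities named here (what was caught, what was missed, and what was wrongly flagged) fall out of a simple count over labeled test results. The sketch below is a generic evaluation helper, not anything taken from the study; the name `detection_metrics` and the tiny label lists are made up for illustration, with 1 meaning malware and 0 meaning a clean file.

```python
def detection_metrics(y_true, y_pred):
    """Count hits, misses and false alarms for a binary malware classifier."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            tp += 1  # malware correctly flagged
        elif truth == 1 and guess == 0:
            fn += 1  # malware that slipped through
        elif truth == 0 and guess == 1:
            fp += 1  # clean file wrongly flagged (false positive)
        else:
            tn += 1  # clean file correctly passed
    return {
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "missed": fn,
        "accuracy": (tp + tn) / max(tp + fp + tn + fn, 1),
    }


# Toy data: four malware samples and four clean files.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```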

As part of the study, Microsoft and Intel sampled a dataset of 2.2 million files. Of these, 60 percent, containing known malware, were used to train the algorithm, 20 percent were used to validate it, and the remaining 20 percent were used to test the actual effectiveness of the scheme.
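A 60/20/20 split like this is a standard way to carve up a dataset. A minimal sketch follows, assuming a plain shuffled split; the article does not say how the study assigned files to each group, and `split_dataset` is a hypothetical helper name.

```python
import random


def split_dataset(items, train_pct=60, val_pct=20, seed=0):
    """Shuffle items and cut them into train / validation / test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains, 20 percent here
    return train, val, test


# Stand-in for file identifiers; the real dataset had 2.2 million entries.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

Applied to 2.2 million files, the same proportions give 1.32 million for training and 440,000 each for validation and testing.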

According to Microsoft, after applying STAMINA to files, the method accurately detected and classified 99.07 percent of malware files, with a false positive rate of 2.58 percent. This is an excellent result.

"These results certainly encourage the use of deep transfer learning for malware classification purposes. By avoiding the search for optimal hyperparameters and architectures, training time can be reduced and computational resources can be saved," Microsoft said.

STAMINA is not without its limitations. Part of the process involves resizing the image to a number of pixels that is manageable for such applications. However, for deeper analysis and larger applications, this method "becomes less effective because of the limitations of converting billions of pixels into a JPEG image and then resizing it," Microsoft says.
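To see why resizing costs information, consider a bare-bones downscaler. The sketch below uses nearest-neighbor sampling, which is an assumption chosen for simplicity; the article does not say which resizing method STAMINA uses, and `resize_nearest` is a hypothetical name.

```python
def resize_nearest(image, new_h, new_w):
    """Shrink (or grow) a grayscale pixel grid by picking the nearest source pixel."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# A 4x4 image whose pixel values are simply 0..15.
original = [[r * 4 + c for c in range(4)] for r in range(4)]
small = resize_nearest(original, 2, 2)
print(small)
```

Going from 16 pixels to 4 discards three quarters of the bytes outright. The larger the file, the more of it has to be thrown away to reach a manageable size, which is consistent with the limitation Microsoft describes.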

In other words, STAMINA is effective for testing files in the lab, but needs some tweaking before it can be deployed at a larger scale. This probably means that Windows Defender will not immediately benefit from STAMINA.
