Clustering Microsoft Windows Executables based on TF-IDF and API Information
Keywords
Clustering, Windows Executable, TF-IDF, API, K-means, Random Forest
Abstract
Illegal software use stands at 39% worldwide, and malware is common in illegally distributed
software. To protect against malware attacks, we use software filtering, which checks whether a
program under test is equivalent to an original one. This requires comparing the program against
every legitimate program on the market, so we reduce the number of comparisons by clustering the
programs on the market. Every market assigns programs to categories such as image viewer, video
player, audio player, and messenger. However, it is not clear that these categories are the best fit
for filtering malware. We propose new categories that our experiments show to be more suitable for
classification. Our categories are generated automatically by the K-means clustering algorithm using
TF-IDF and API information. Experimental results show that our clustering scheme classifies malware
better than the existing categories.
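
The general idea of clustering executables by TF-IDF over API information can be illustrated with a minimal sketch. It is not the authors' implementation: the program names, the API lists, the choice of k = 2, the particular TF-IDF weighting (term frequency times log(N/df)), and the farthest-first centroid initialization are all assumptions made for illustration. Each program is treated as a "document" whose "terms" are the API names it uses.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the paper): program -> API names it uses.
programs = {
    "viewer_a": ["CreateFileW", "ReadFile", "BitBlt", "StretchBlt"],
    "viewer_b": ["CreateFileW", "ReadFile", "BitBlt", "GetDC"],
    "chat_a": ["CreateFileW", "socket", "connect", "send"],
    "chat_b": ["CreateFileW", "socket", "recv", "send"],
}


def tfidf_vectors(docs):
    """Turn lists of API names into TF-IDF vectors over a shared vocabulary."""
    vocab = sorted({api for doc in docs for api in doc})
    df = Counter(api for doc in docs for api in set(doc))  # document frequency
    n = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed weighting: (count / length) * log(N / df).
        vectors.append(
            [counts[api] / len(doc) * math.log(n / df[api]) for api in vocab]
        )
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain K-means with a deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


names = list(programs)
vocab, vectors = tfidf_vectors([programs[name] for name in names])
labels = kmeans(vectors, k=2)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

In this toy data, an API used by every program (here CreateFileW) receives a TF-IDF weight of zero, so it does not influence the clustering, while APIs shared by only some programs pull those programs together. A real pipeline would extract the API names from the executables themselves and choose k empirically.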