Bugs are unavoidable in any software system, and open bug repositories are widely used to collect bug reports from users around the world. Among all categories of bug reports, security bug reports are of special interest because they usually require earlier fixes and should be hidden from the public to avoid potential attacks before a corresponding patch is released and distributed. Because of the large number of bug reports and the limited number of bug triagers, it is usually hard for triagers to manually identify security bugs in time. Therefore, an automatic approach that identifies security bugs and presents the candidates to triagers with higher priority will help developers fix them sooner and reduce their exposure time to the public. In this paper, we propose a fully automatic approach to identifying security bug reports in open bug repositories. Specifically, considering the imbalanced nature of the data (security bugs are typically a small portion of all bugs), we propose an approach combining term ranking and classification. We constructed a data set of security bugs from the RedHat open bug repository and evaluated our approach on it. The experimental results show that our approach achieves 86\% on both recall and F-score, outperforming basic classification and term-ranking approaches by at least 8.3 and 9.9 percentage points, respectively.
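To illustrate the general idea of combining term ranking with classification on imbalanced data, the following is a minimal sketch and not the method evaluated in the paper. It assumes a smoothed log-odds score over document frequencies as the ranking criterion and a simple thresholded sum as the classifier. The function names, the toy reports, `top_k`, and `threshold` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_terms(reports, labels, top_k=5):
    """Rank terms by smoothed log-odds of occurring in security reports.

    Assumption: document frequency with add-one smoothing, normalized per
    class so that the rare (security) class is not swamped by the majority.
    """
    sec_df, other_df = Counter(), Counter()
    n_sec = sum(labels)
    n_other = len(labels) - n_sec
    for text, is_sec in zip(reports, labels):
        target = sec_df if is_sec else other_df
        target.update(set(tokenize(text)))  # count each term once per report
    scores = {}
    for term in set(sec_df) | set(other_df):
        p_sec = (sec_df[term] + 1) / (n_sec + 2)
        p_other = (other_df[term] + 1) / (n_other + 2)
        scores[term] = math.log(p_sec / p_other)
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def security_score(text, term_weights):
    """Sum the weights of the ranked terms that occur in the report."""
    return sum(term_weights.get(t, 0.0) for t in set(tokenize(text)))


def is_security(text, term_weights, threshold=1.0):
    """Flag a report as a candidate security bug (hypothetical threshold)."""
    return security_score(text, term_weights) >= threshold


# Toy, made-up training data: 2 security reports among 6 (imbalanced).
reports = [
    "buffer overflow in parser allows remote code execution",
    "crash on startup when config file missing",
    "typo in help message",
    "installer fails on slow network",
    "ui button misaligned in dialog",
    "overflow in image decoder allows memory corruption",
]
labels = [1, 0, 0, 0, 0, 1]

weights = rank_terms(reports, labels, top_k=5)
flagged = is_security("heap overflow allows attacker to read memory", weights)
not_flagged = is_security("button label has typo", weights)
```

In a setting like the one described above, reports flagged this way would be shown to triagers first. A real system would replace the thresholded sum with a trained classifier and tune `top_k` and `threshold` on held-out data.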
Shaikh Nahid Mostafa
All Bug Reports Data
Automatic Identification of Security Bug Reports via Semi-Supervised Learning and CVE Mining