Software testing has been the major approach to software quality assurance for decades, but it typically involves intensive manual effort. To reduce this effort, researchers have proposed numerous approaches to automate test-case generation, one of the most time-consuming tasks in software testing. One of the most recent advances in the area is Dynamic Symbolic Execution (DSE), and DSE-based tools such as KLEE may generate test suites that achieve higher code coverage than the test suites used in practice. However, beyond code coverage, few studies have compared DSE-based test suites with test suites used in practice on other metrics, so it remains unclear how the two types of test suites differ in detail and whether DSE-based test suites can replace, or add value to, test suites used in practice. In this paper, we present an empirical study on the GNU CoreUtils programs that compares DSE-based test suites with test suites used in practice in terms of test sufficiency. Our empirical study also investigates how these two types of test suites differ from each other. Our results show that while DSE-based test suites achieve higher code coverage, they are relatively less effective at covering hard-to-cover code and killing mutants. Furthermore, our qualitative study reveals that the two types of test suites are good at covering different types of code and killing different types of mutants.
Qualitative Analysis Details