UNIVERSITY OF HERTFORDSHIRE
COMPUTER SCIENCE RESEARCH COLLOQUIUM

"The Misuse of the NASA Metrics Data for Software Defect Prediction"

David Gray (Computer Science, University of Hertfordshire)

Wednesday, 2 November 2011, 1-2 pm
Meeting Room C152, College Lane Campus, Hatfield

Everyone is welcome to attend. Refreshments will be available.

Abstract:
The NASA Metrics Data Program data sets have been heavily used in software defect prediction experiments. In this talk I will demonstrate and explain why these data sets require significant pre-processing before they are suitable for defect prediction. This involves a thorough data cleansing process, in which between 6 and 90 percent of the data points are removed from each set. The issues found in these data sets potentially jeopardise much prior research, and reiterate that data quality is a serious issue in software engineering.

--
http://cs-colloq.feis.herts.ac.uk/