Welcome to the Promise Data Repository!
In 2006, the repository held 23 data sets.
As of its last update in 2008, it holds 134 data sets in the following areas:
- Defect Prediction (90)
- Effort Prediction (18)
- General (9)
- Model-based SE (8)
- Text Mining (9)
Further contributions are always welcome, in these areas or any other.
Why so much data? First, there is the open source effect: public code and public logs mean more data sets.
Second, in software engineering, once a tracking system is in place, each new project (and each new release of each project) generates yet another data set.
How to reference data
G. Boetticher, T. Menzies and T. Ostrand, PROMISE Repository of empirical software engineering data, http://promisedata.org/repository, West Virginia University, Department of Computer Science, 2007.
How to donate data
The recommended format is the Attribute-Relation File Format (ARFF). For an example of such a data set, see Nasa93.
If ARFF is not an appropriate format for your data, please describe your data format in detail in the paper. The UCI Machine Learning Repository provides an on-line guideline for documenting data sets; when using ARFF, this documentation is placed before the actual data.
If you use an alternative format that does not support comments, put this documentation in a separate file with the extension .desc and submit the URL of that file.
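For formats without comment support, such as plain CSV, one possible approach is sketched below in Python. The hypothetical helper `write_csv_with_desc` writes the data with the standard `csv` module and puts the documentation in a sibling file whose name differs only in the `.desc` extension. That same-base-name convention is an assumption made for illustration; the repository only asks for a separate `.desc` file whose URL is submitted. The column names and values are made-up toy data.

```python
import csv
import tempfile
from pathlib import Path
from typing import Sequence


def write_csv_with_desc(csv_path: str,
                        header: Sequence[str],
                        rows: Sequence[Sequence[object]],
                        description: str) -> Path:
    """Write rows as CSV (no comment support) and the documentation to a sibling .desc file.

    Returns the path of the .desc file, e.g. 'toy_effort.csv' -> 'toy_effort.desc'.
    """
    data_file = Path(csv_path)
    with data_file.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    # Assumed naming convention: same base name, '.desc' extension.
    desc_file = data_file.with_suffix(".desc")
    desc_file.write_text(description.rstrip("\n") + "\n", encoding="utf-8")
    return desc_file


# Toy usage in a temporary directory: hypothetical columns and made-up values.
with tempfile.TemporaryDirectory() as tmp:
    desc = write_csv_with_desc(
        str(Path(tmp) / "toy_effort.csv"),
        ["kloc", "effort_months"],
        [(10, 24.5), (3, 6.0)],
        "Toy example only: values are made up.",
    )
    desc_name = desc.name
    desc_text = desc.read_text(encoding="utf-8")
    csv_text = (Path(tmp) / "toy_effort.csv").read_text(encoding="utf-8")

print(desc_name)
```

Once both files are hosted somewhere public, the URL of the `.desc` file is what gets submitted along with the data.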
Once the data is prepared, send it to mail[at]promisedata.org.
Share and enjoy!

