The PROMISE'07 Workshop - May 20, 2007
To be held with ICSE 2007, Minneapolis, MN, USA
Public Data Policy
PROMISE 2007 gives the highest priority to case studies,
experience reports, and presented results that are based
on publicly available datasets. To increase the chance
of acceptance, authors are urged to submit papers that
use such datasets. Data can come from anywhere including
the workshop Web site. Such papers should include the URL
of the dataset(s) used.
A copy of the public datasets used in the accepted papers
will be posted on the PROMISE Software Engineering
Repository. Therefore, if applicable, the authors should
obtain the necessary permission to donate the data prior
to submitting their paper. All donors will be acknowledged
on the PROMISE repository Web site.
Goals
- To expand the current public repository of data sets
related to software engineering in order to conduct repeatable,
refutable, or improvable experiments (the current PROMISE
repository already contains 24 data sets).
- To deliver to the software engineering community useful,
usable, and verified models or methods:
- Models predict software properties of interest to
21st century software practitioners.
- Methods are learning systems for building particular
models for particular situations.
- To compile a list of open research questions that are
deemed essential by the researchers in the field.
- To show, by example, to the next generation of software
engineering researchers that empiricism is useful,
practical, exciting, and insightful.
- To bring together researchers and practitioners with
the aim of sharing experience and expertise.
- To steer discussion and debate on various aspects and
issues related to building predictive software models.
Journal issue
Papers accepted to PROMISE 2007 (and 2006) will be
eligible for submission to a special issue of the Journal
of Empirical Software Engineering on repeatable experiments
in software engineering.
The issue will be edited by Tim Menzies.
Topics
- Applications of predictive models to software
engineering data.
- What predictive models can be learned from
software engineering data?
- Strengths and limitations of predictive models.
- Empirical model evaluation techniques.
- What are the best baseline models for different classes of
predictive software models?
- Are existing measures and techniques to evaluate
and compare model goodness (e.g., precision, recall,
error rate, ROC analysis) adequate for evaluating software
models? Or are more specific measures geared toward
the software engineering domain needed?
- Are certain measures better suited for certain
classes of models?
- What are the appropriate techniques to test the
generated models, e.g., hold-out, cross-validation, or
chronological splitting?
- Field evaluation challenges and techniques.
- What are the best practices in evaluating the generated
software models in the real world?
- What are the obstacles in the way of field testing a
model in the real world?
- How to overcome obstacles in the acceptance of
predictive models in the real world?
- How to test the generated models?
- What are the obstacles in the way of field testing
a model in the real world?
- What predictive models are more prone to
model shift (concept drift)?
- When does a model need to be replaced?
- What are the best approaches to keeping the model
in sync with software changes?
- Building models using machine learning, statistical
methods, and other methods.
- How do these techniques lend themselves to building
predictive software models?
- Are some methods better suited for certain
classes of models?
- How do these algorithms scale up when handling
very large amounts of data?
- What are the challenges posed by the nature of data
stored in software repositories that make certain
techniques less effective than others?
- Cost-benefit analysis of predictive models.
- Is cost-benefit analysis a necessary step in evaluating
all predictive models?
- What are the requirements for one to be able to perform
a cost-benefit analysis?
- What particular costs and benefits should be considered
for these models?
- Case studies on building predictive software models.
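
As an illustration of two items from the topics above (the measures
precision, recall, and error rate, and the chronological splitting
technique), the following minimal Python sketch may help. It is not part
of the workshop material: the function names, the 0.7 default training
fraction, and the (timestamp, item) record layout are assumptions made
for this example only. The false alarm rate is included alongside the
named measures because it is commonly paired with recall.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_measures(actual: Sequence[bool],
                       predicted: Sequence[bool]) -> Dict[str, float]:
    """Compute common goodness measures for a binary predictor.

    `actual` and `predicted` are parallel sequences of booleans
    (True means, for example, "module is defective").
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of failing when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),        # probability of detection
        "false_alarm": ratio(fp, fp + tn),   # probability of false alarm
        "error_rate": ratio(fp + fn, tp + fp + fn + tn),
    }


def chronological_split(records: Sequence[Tuple[float, object]],
                        train_fraction: float = 0.7
                        ) -> Tuple[List[Tuple[float, object]],
                                   List[Tuple[float, object]]]:
    """Split (timestamp, item) records so training data precedes test data.

    Unlike a random hold-out, the model is never tested on records
    older than the ones it was trained on.
    """
    ordered = sorted(records, key=lambda record: record[0])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]
```

A random hold-out or cross-validation would shuffle the records before
splitting; the chronological variant keeps time order, which mimics how a
model would be applied to future releases.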
Benchmarks
To encourage data sharing and/or publicize new and
challenging research directions, a special category of
papers will be considered for inclusion in the workshop.
Papers submitted under this category should at least
include the following information:
- The public URL to a new dataset
- Background notes on the domain
- What problem does the data represent?
- What would be gained if the problem were solved?
- A proposed measure of goodness to be used to judge the
results; for instance, a good defect detector has a
high probability of detection and a low probability
of false alarm.
- A review of current work in the field (e.g., what is
wrong with current solutions or why has no one solved
this problem before?)
- Preferably some baseline results.
- Description of data format.
The recommended format is Attribute-Relation File Format (ARFF).
For an example of such a dataset, see
Cocomo NASA/Software cost estimation
on the PROMISE Software Engineering Repository.
However, if ARFF is not an appropriate format for
your data, please provide a detailed description of
your data format in the paper. A guideline from the UCI
Machine Learning Repository for documenting datasets
can be found at
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/DOC-REQUIREMENTS.
In ARFF, this documentation is placed as comments before
the actual data. However, if you are using an
alternative format that does not support comments in
the dataset, provide this information in a separate file
with the extension .desc, and submit the URL of this file.
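
To make the data format description above more concrete, here is a small
Python sketch that writes a dataset in ARFF with its documentation as
leading comment lines. It is only an illustration: the helper name
to_arff, the relation name, the attributes, and the rows are invented for
this example and do not come from the PROMISE repository. It handles only
numeric and nominal attributes and does not quote values that contain
spaces or commas.

```python
from typing import List, Sequence, Tuple, Union

# An attribute type is either an ARFF type keyword such as "numeric",
# or a list of allowed nominal values such as ["yes", "no"].
AttributeType = Union[str, List[str]]


def to_arff(relation: str,
            attributes: Sequence[Tuple[str, AttributeType]],
            rows: Sequence[Sequence[object]],
            description_lines: Sequence[str] = ()) -> str:
    """Render a dataset as ARFF text (hypothetical helper)."""
    # ARFF comments start with '%'; the dataset documentation goes first.
    lines = [f"% {text}" for text in description_lines]
    lines.append(f"@relation {relation}")
    lines.append("")

    for name, kind in attributes:
        if isinstance(kind, str):
            lines.append(f"@attribute {name} {kind}")
        else:
            values = ",".join(kind)
            lines.append(f"@attribute {name} {{{values}}}")

    lines.append("")
    lines.append("@data")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError("each row needs one value per attribute")
        lines.append(",".join(str(value) for value in row))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    print(to_arff(
        relation="toy_defects",
        attributes=[("loc", "numeric"), ("defective", ["yes", "no"])],
        rows=[(120, "yes"), (45, "no")],
        description_lines=["Title: toy defect data (illustrative only)"],
    ))
```

For a format without comment support, the same description lines could be
written to a separate .desc file instead.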
Submission
Submissions to PROMISE 2007 are now closed. Please see
the list of accepted papers.
Contact us
2007@promisedata.org
Important dates
Submission: Jan. 20, '07
Notification: Feb. 10, '07
Camera ready: Mar. 5, '07
General Chair
- Gary Boetticher, U. of Houston - Clear Lake
Steering Committee
- Gary Boetticher, U. of Houston - Clear Lake
- Tim Menzies, West Virginia U., US
- Tom Ostrand, AT&T
Program Committee
- Vic Basili, U. Maryland, US
- Dan Berry, U. Waterloo, Canada
- Barry Boehm, U. Southern California, US
- Gary Boetticher, U. of Houston - Clear Lake, US
- Lionel Briand, Carleton U., Canada
- Bojan Cukic, West Virginia U., US
- Alex Dekhtyar, U. Kentucky, US
- Martin Feather, NASA JPL, US
- Norman Fenton, Queen Mary (U. of London), UK
- Jane Hayes, U. Kentucky, US
- Jairus Hihn, NASA JPL's Deep Space Network, US
- Gunes Koru, U. of Maryland, Balt. Cty, US
- Tim Menzies, West Virginia University, US
- Martin Neil, Queen Mary (U. of London), UK
- Allen Nikora, NASA JPL, US
- Tom Ostrand, AT&T, US
- Daniel Port, U. Hawaii, US
- Julian Richardson, NASA ARC, US
- Guenther Ruhe, U. Calgary, Canada
- Martin Shepperd, Brunel U., UK
- Forrest Shull, Fraunhofer Centre Maryland, US
- Willem Visser, NASA ARC, US
- Elaine Weyuker, AT&T, US
- Laurie Williams, North Carolina State U., US
- Marv Zelkowitz, U. of Maryland, US
- Du Zhang, Cal. State Univ., Sacramento, US