The 6th International Conference on Predictive Models in Software Engineering
Sept 12-13, 2010
Panel Session: "Is Replication a Bad Idea?"
Tim Menzies (Computer Science, West Virginia University, USA)
Playing devil's advocate, I will present the anti-replication case; i.e.
- Replication of data mining studies is stupid since data mining confuses correlation with causality;
- Trite replications of methodologically flawed studies are just time-wasting pedantry;
- What matters is not "replication" but "reproduction" of causal effects in different domains;
- Studying project data separately from its project context is crazy since what matters are "case studies" (that collect new project data), not mere "experiments" (that look for effects in current project data).
I will then flip the switch and talk about what I really believe:
- Web-based Science 2.0 with its open data emphasis has changed the nature of empirical SE;
- The data mining methods of PROMISE place it at the forefront of that new kind of science.
Ayse Bener (Information Technology Management, Ryerson University, Canada)
I think the question "is replication a bad idea?" has different answers for academia, for industry, and for training.
From an academic point of view, exact replication studies do not lend themselves to publication. Here at Softlab, we have written replication papers before, but we either took the dataset as-is, used the same methodology, and then extended that methodology, or we tried the same methodology on different datasets.
However, in industry work, it is very useful to replicate the same methodology on a company's own data and draw some conclusions. Companies would like to see their similarities with others, and if we get a different result on their data than on the rest of the crowd's, they would like to understand what makes them different. It is a benchmarking game for them.
Finally, if we forget about publishing altogether, pure replication also serves a valuable educational purpose. In my empirical software engineering class, I assign papers and ask students to replicate them. In this way, students can quickly come up to speed with the state of the art.
Mark Harman (Computer Science, King's College, United Kingdom)
The problem with replication is that it is seen as boring, unoriginal and of little value to career development. The problem with replication is that it is seen as boring, unoriginal and of little value to career development.
The last paragraph illustrates the issue. However, all is not as it seems. We have a poor set of associations with the word "replication". The dictionary definition of the word is "the action of copying or reproducing something". Understandably, therefore, we may fall into the trap of thinking of it as repetition, without original content. Who wants a replica when you can have the original? I can think of no positive connotations of the word for the person practicing it. Worse, we may think it is impossible in any case: drug trials might be replicable, but how will we ever replicate another researcher's software development environment and remove all confounding factors?
If we want replication studies, we need to consider the oft-ignored social aspects of the way we "do science"; we need to work with the grain of human nature, not against it. I believe that, with a little imagination and the necessary support from the community, valuable replication can be built into PhD programs in a way that benefits both the students undertaking the work and our discipline as a whole. In my slot on the panel, I will explain how I believe this might be achieved.