IDRC 2014 Shootout Data and Rules
Traditional Software Shootout
This year’s shootout is quite different from previous competitions: for the first time, a petrochemical dataset will be analyzed. We would like to thank Halliburton, Christopher M. Jones, and David Perkins for providing the data, and Michael Myrick for facilitating the process. However, the nature of the data imposes some constraints. Because this is a competitive field of research, steps have been taken to reduce the potential for competitors to use the data for their own commercial advantage. Specifically, the wavelength scales are unspecified (although they cover the NIR), the calibration values have been normalized, and the nature of the parameters being predicted is not specified.
Two datasets are provided. Only one parameter is available per dataset.
The challenge consists of developing the best model for the parameters and datasets provided, using the calibration data. Because only limited information is available, success in the shootout will depend on the participants’ ability to build a model using only their chemometrics skills, not their knowledge of the data. The most important task, however, will be to build a model that is robust to variability that is present in the validation set and possibly absent from the calibration set. In addition, the quality of the presentation of the results and the reasoning behind the approach taken will be used to determine the winner. Participants are to:
1) Develop the best possible model for the parameter on the calibration set
2) Test their model on a test set (we provide reference values)
3) Predict a validation set (we do NOT provide reference values)
4) Explain the reasoning behind the choice of pre-treatment methods, regression method, and number of latent variables
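The four steps above can be sketched as follows. This is only an illustration under stated assumptions, not the organizers' method: the data are synthetic stand-ins generated in the script (the real files must be loaded from the downloads), PLS1 via NIPALS is just one possible regression method, mean-centering is the only pre-treatment, and choosing the number of latent variables by the lowest test-set RMSEP is one simple criterion among several. The names `pls1_fit`, `pls1_predict`, and `rmse` are hypothetical helpers.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with the NIPALS algorithm (mean-centered data)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                 # weight vector from X-y covariance
        w /= np.linalg.norm(w)
        t = Xr @ w                    # scores
        tt = t @ t
        p = Xr.T @ t / tt             # X loadings
        c = yr @ t / tt               # y loading
        Xr = Xr - np.outer(t, p)      # deflate X
        yr = yr - c * t               # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data (assumption): 3 hidden constituents, 50 channels.
rng = np.random.default_rng(0)
conc = rng.random((100, 3))
pure = rng.random((3, 50))
X = conc @ pure + 0.01 * rng.standard_normal((100, 50))
y = conc[:, 0]                        # parameter to predict

X_cal, y_cal = X[:60], y[:60]         # step 1: calibration set
X_test, y_test = X[60:80], y[60:80]   # step 2: test set, references known
X_val = X[80:]                        # step 3: validation set, no references

# Step 4: justify the number of latent variables, here by test-set RMSEP.
rmsep = {}
for k in range(1, 9):
    model = pls1_fit(X_cal, y_cal, k)
    rmsep[k] = rmse(y_test, pls1_predict(model, X_test))
best_k = min(rmsep, key=rmsep.get)

final_model = pls1_fit(X_cal, y_cal, best_k)
val_pred = pls1_predict(final_model, X_val)
print("latent variables chosen:", best_k)
print("test RMSEP:", rmsep[best_k])
```

Because the validation set may contain variability absent from calibration, a low test-set RMSEP alone does not guarantee robustness; it is only one piece of the reasoning a participant would need to report.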
Download the shootout rules here.
Dataset 1
Matlab - all data
UNSB - all data
JDX - calibration, validation, test set
Note: in the JDX files, the first 2 columns of the spectral data correspond to temperature (degrees C) and pressure (psi).
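As a minimal sketch of that column layout, assume the JDX data block has already been read into a 2-D NumPy array with one row per sample. The array `data` below is a made-up placeholder, not values from the shootout files, and how the JDX file is parsed is left open:

```python
import numpy as np

# Hypothetical stand-in for a loaded JDX data block (rows = samples).
data = np.array([
    [25.0, 3000.0, 0.11, 0.12, 0.13],
    [60.0, 5000.0, 0.21, 0.22, 0.23],
])

temperature_c = data[:, 0]   # column 1: temperature in degrees C
pressure_psi = data[:, 1]    # column 2: pressure in psi
spectra = data[:, 2:]        # remaining columns: spectral channels
```

Keeping temperature and pressure separate from the spectral channels avoids treating them as absorbance values during pre-treatment.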
Dataset 2
Matlab - all data
UNSB - all data
We hope you enjoy this new format and we welcome your feedback.