Thursday, July 8, 2010

How to manage experimental data

The amount of data generated by our experiments is increasing significantly, and managing it has become a huge issue. In this post we propose some best practices for successfully managing and indexing the datasets.
We will follow these practices from now on.

We will store the data in a spreadsheet, using Excel or OpenOffice.
The format is as follows:
The first sheet is an overall index of the tables needed in the experiment.
Each later sheet stores only one closely related set of data. For example, when you measure the performance of an algorithm, you may want to record both its running time and its space efficiency. In that case the first sheet lists the names of the two tables, with a brief introduction to each. It is also best to list the names of the datasets on the first sheet. The second sheet is the time table, which could hold the running time of the algorithm and the running times of some strawman algorithms for comparison. It could also include the preprocessing time of the algorithm...
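As a rough illustration of the layout described above, here is a minimal Python sketch that models the index page and the two data tables in memory. Everything named here (the Sheet class, build_index, the column headers, and the dataset names dataset_A and dataset_B) is a hypothetical placeholder chosen for the example, not part of any spreadsheet tool. The data rows are left empty on purpose.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sheet:
    """One table holding a single closely related set of data."""
    name: str            # name of the sheet, e.g. "time"
    description: str     # brief intro that is repeated on the index page
    header: List[str]    # column names of the table
    rows: List[list] = field(default_factory=list)  # measurements, filled in later


def build_index(sheets, datasets):
    """Build the rows of the first page: one row per data table."""
    rows = [["sheet", "description", "datasets"]]
    for sheet in sheets:
        rows.append([sheet.name, sheet.description, ", ".join(datasets)])
    return rows


# Hypothetical tables for the running-time / space-efficiency example.
time_sheet = Sheet(
    name="time",
    description="Running and preprocessing time of each algorithm",
    header=["dataset", "algorithm", "preprocessing_time", "running_time"],
)
space_sheet = Sheet(
    name="space",
    description="Space efficiency of each algorithm",
    header=["dataset", "algorithm", "memory_used"],
)

index_rows = build_index([time_sheet, space_sheet], ["dataset_A", "dataset_B"])
for row in index_rows:
    print(row)
```

The same rows can then be typed or pasted into the first page of the Excel or OpenOffice file, with each Sheet becoming its own page.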

Another question is how to manage the raw data, that is, data that has not yet been processed.
For each raw data table, we need to record the name of the data and its original source.
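One possible way to keep this information consistent is to treat it as a small record that must be complete before a raw data table is accepted. The sketch below is only an assumption about how that could look in Python. RawDataRecord and provenance_rows are hypothetical names, and the example values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawDataRecord:
    """Provenance of one raw (not yet processed) data table."""
    name: str    # name of the data
    source: str  # original source of the data


def provenance_rows(records):
    """Return spreadsheet rows, rejecting records with a missing field."""
    rows = [["name", "source"]]
    for record in records:
        if not record.name.strip() or not record.source.strip():
            raise ValueError(f"incomplete provenance record: {record!r}")
        rows.append([record.name, record.source])
    return rows


# Placeholder values for illustration only.
records = [RawDataRecord(name="raw_table_1", source="placeholder source")]
rows = provenance_rows(records)
print(rows)
```

Refusing incomplete records up front means no raw table ends up in the file without a known origin.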
