Frauds in Science

Saturday, June 13, 2009

A month ago I wrote a post in my Spanish Intelligent Systems blog about fraud in science. In this post I summarize what I wrote there, because I think it can be a good starting point for a debate about the present and future of science.

A discussion of fraud in science could go on for a long time; there are many small problems in the current scientific process that should be corrected (fake conferences, questionable publishing practices, etc.), but here I'll focus on the big frauds.

In August 2005, PLoS Medicine published "Why Most Published Research Findings Are False", which deals with bad experimental design that leads to wrong research findings. As the paper states, the scientific process is heavily focused on novel research, and there is almost no support for studies that try to replicate previous results and corroborate earlier findings.

In the paper "Repairing research integrity", published in Nature in June 2008, Sandra Titus and her team analyze the integrity of scientific studies. Based on a survey of more than 4,000 researchers from over 600 institutions, the results showed more than 200 cases of misconduct in scientific studies, a number much higher than the one previously registered by the ORI (Office of Research Integrity). More than 60% of the cases involved data falsification, with plagiarism being the next most common type of misconduct detected. Some of these frauds are caught in time, as in the Kristin Roovers case, discovered by the editors of The Journal of Clinical Investigation when Roovers submitted a paper containing images that had been manipulated with Photoshop.

In some regions fraud is an even bigger problem. This is the case in China, where more than 60% of PhD students admit to having plagiarized some work. This represents a serious problem for China's research, and indeed for the whole scientific community.

Recently, another big fraud in science was uncovered when The Scientist reported that Elsevier, one of the biggest scientific publishers, had several agreements with companies to publish scientific journals that those companies used to promote their products. The first case detected was the Australasian Journal of Bone and Joint Medicine, in which a paper was published promoting a product from Merck, the company that paid Elsevier to produce the journal. Summer Johnson writes about this big fraud in Bioethics, a highly recommended read.

The scientific community must react to all of this if we want to preserve the image of science. But what can we do? I think there are several options that could improve the scientific process:

1.- Open Access. The Elsevier case should make us think twice about letting companies like Elsevier control scientific publishing. Open Access seems a good way to protect science from the desires and interests of big publishing companies. It is also a good way to ensure egalitarian access to scientific results.

2.- We should support initiatives devoted to negative (or less striking) results. Journals like the Journal of Interesting Negative Results in Natural Language Processing and Machine Learning, the Journal of Negative Results in Biomedicine, or the Journal of Negative Results are doing a good job publishing this kind of result.

3.- It also seems very important to improve the working conditions of researchers. For instance, in Spain many researchers earn less than they would working in a supermarket or driving a taxi, occupations with fewer responsibilities and less impact on society. Who can care about doing high-quality research if they can't give their family a decent living?

4.- We also need to take scientific ethics seriously again. As researchers, we must value what the word "science" means. Science is not about publishing papers; science is about improving global knowledge. Science is something really great.

Wikipedia Page Traffic Statistics for Data Miners


Gregory Piatetsky pointed out on the KDnuggets Twitter account the release of a data package containing seven months of hourly pageview statistics for all articles in Wikipedia. The dataset has a compressed size of over 320 GB (over 1 TB uncompressed) and covers more than 2.5 million Wikipedia articles. All text content, statistics, and link data in the dataset are licensed under the GFDL (GNU Free Documentation License).
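To give an idea of how such a dataset can be explored, here is a minimal sketch of parsing one hour of pageview data. It assumes the standard Wikimedia "pagecounts" line format (four space-separated fields: project, page title, view count, bytes transferred); the sample lines and numbers below are made up for illustration, not taken from the actual dataset.

```python
# Hedged sketch: tally per-article views from pagecount lines, assuming the
# Wikimedia format "project page_title view_count bytes_transferred".
from collections import Counter

def parse_pagecounts(lines, project="en"):
    """Return a Counter mapping page title -> view count for one project."""
    views = Counter()
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 4:
            continue  # skip malformed lines
        proj, title, count, _nbytes = parts
        if proj == project:
            views[title] += int(count)
    return views

# Toy input (the real files are gzip-compressed and far larger):
sample = [
    "en Main_Page 242332 4737756101",
    "en Data_mining 1520 31415926",
    "de Hauptseite 99875 1234567890",
]
print(parse_pagecounts(sample).most_common(1))  # [('Main_Page', 242332)]
```

In practice one would stream each hourly file with `gzip.open` instead of holding lines in memory, since a full month of data is far too large to load at once.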