Knowing that our data were no longer trustworthy was a very difficult decision to reach, but it’s critical that we can stand behind the results of all our papers. I no longer stand behind the results of these three papers.
There has been some questions of why I (and others) didn’t catch these problems in the data sooner. This is a valid question. I teach a stats course (on mixed modeling) and even I harp on my students about how so many problems can be avoided by some decent data exploration. So let me be clear: I did data exploration. I even followed Alain Zuur’s “A protocol for data exploration to avoid common statistical problems“. I looked through the raw data, looking for obvious input errors and missing values. […]
Altogether, I was left with the conclusion that there was good variation in the data, no obvious outliers or weird groupings, and an excess of 600 values which was expected due to the study design. As a scientist, I know that I have a responsibility to ensure the integrity of our papers which is something I take very seriously, leading me to be all the more embarrassed (& furious) that my standard good practices failed to detect the problematic patterns in the data. Multiple folks have since looked at these data and came to the same conclusion that until you know what to look for, the patterns are not obvious.