Dr Ringach wrote: “Statistical measures on 2x2 contingency tables are a way to assess the performance of individual diagnostic tests or treatments.” Granted, the statistics I have been discussing can be used for that, but they can also be used to evaluate almost any modality. Sensitivity, specificity, negative predictive value, and positive predictive value are part of a system for evaluating performance called the binary classification test. Wikipedia is really not bad on this:
Binary classification is the task of classifying the members of a given set of objects into two groups on the basis of whether they have some property or not. Some typical binary classification tasks are
- medical testing to determine if a patient has certain disease or not (the classification property is the disease)
- quality control in factories; i.e. deciding if a new product is good enough to be sold, or if it should be discarded (the classification property is being good enough)
- deciding whether a page or an article should be in the result set of a search or not (the classification property is the relevance of the article - typically the presence of a certain word in it)
I would add horse racing and tips sheets, airport security measures, dogs as a modality to search for drugs, dogs as a modality for searching for buried earthquake victims, almost anything that has right and wrong answers. (See my blog for the horse racing example.)
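For readers who want to see the arithmetic behind these four values, here is a minimal sketch in Python. The class name, the field names, and the counts are my own illustrative assumptions; the numbers are made up and do not come from any study. It shows only how the four statistics fall out of the four cells of a 2x2 table for any modality with right and wrong answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """Hypothetical 2x2 table: modality result versus the true state."""
    true_pos: int   # modality said yes, truth was yes
    false_pos: int  # modality said yes, truth was no
    false_neg: int  # modality said no, truth was yes
    true_neg: int   # modality said no, truth was no

    # Each method assumes its denominator is non-zero.
    def sensitivity(self) -> float:
        # Of all true "yes" cases, the fraction the modality caught.
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        # Of all true "no" cases, the fraction the modality cleared.
        return self.true_neg / (self.true_neg + self.false_pos)

    def positive_predictive_value(self) -> float:
        # Of all "yes" calls, the fraction that were correct.
        return self.true_pos / (self.true_pos + self.false_pos)

    def negative_predictive_value(self) -> float:
        # Of all "no" calls, the fraction that were correct.
        return self.true_neg / (self.true_neg + self.false_neg)


# Illustrative counts only (assumed, not data from any real evaluation).
table = ContingencyTable(true_pos=40, false_pos=30, false_neg=10, true_neg=20)

print(f"sensitivity: {table.sensitivity():.2f}")
print(f"specificity: {table.specificity():.2f}")
print(f"PPV:         {table.positive_predictive_value():.2f}")
print(f"NPV:         {table.negative_predictive_value():.2f}")
```

Nothing in the calculation depends on the modality being a medical test; the same four cells could just as well hold the results of a tip sheet or a drug-sniffing dog.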
Dr Ringach continues:
Pointing out to any individual failure does not rule out the entire biomedical research enterprise. One cannot squeeze an entire scientific field of inquiry into a 2x2 table. The notion is pure nonsense. What is the positive predictive value of physics?
These are misleading at best and fallacies at worst (or vice versa, depending on your perspective). The first criticism rests on contrasting the size of one item, a test, with the size of a second, much larger item, an entire enterprise. This criticism was used to undermine the Germ Theory of Disease, as many people could not imagine that a very, very tiny thing called a germ could kill a huge strong man. They were wrong.
One individual failure (as opposed to the series of failures needed to fill a 2x2 table) would destroy many facts of science. One perpetual motion machine would invalidate the second law of thermodynamics and call into question Newtonian physics as a whole. A rabbit in the Precambrian would call into question, and probably falsify, our standard view of evolution. That is why these activities are science, not pseudoscience: they can be falsified.
But then I am not saying that an individual failure, like the fact that animals are not predictive, falsifies or rules out the entire biomedical research enterprise. As I may have mentioned before, using animals as predictive models is a separate category of animal use in general, and both are mere divisions of the entire biomedical research enterprise. Note that it is Dr Ringach who equates animal use per se with the entire biomedical research enterprise. This comment is telling as, based on my experience, animal-based researchers really do view what they do as the entire biomedical research enterprise. This dates back to Claude Bernard, who said: “the true sanctuary of medical science is a [animal] laboratory.” And
Experiments on animals, with deleterious substances or in harmful circumstances, are very useful and entirely conclusive for the toxicity and hygiene of man. Investigations of medicinal or of toxic substances also are wholly applicable to man from the therapeutic point of view; for as I have shown, the effects of these substances are the same on man as on animals . . .
Squeezing an entire scientific enterprise into a scientific table actually can be, and is, done. Many endeavors that were once considered science are now not so categorized precisely because of that 2x2 table or similar statistics. Comparing the size of an endeavor, as measured by the number of people participating in it, with the size of the table is not appropriate either. Regardless of how many people believe something and how much money is spent on it, specific results can stop the entire thing. One study can derail the entire endeavor. There are many examples of this in medicine.
As for physics, the predictive value of physics or science as a whole has been tested many times. Newtonian physics, for example, has a predictive value of 100% when the item being studied is not at the very small or very fast ranges of the spectrum. But physics is a physical science and hence, for the most part, does not need sensitivity, positive predictive values, and so forth, as it has laws. The life sciences for the most part have theories, like evolution, and principles as opposed to laws. (This is basic philosophy of science.) As I said in a previous blog, the fact that biology has principles rather than laws is why the biological sciences need sensitivity, specificity, positive predictive value, and negative predictive value as well as other statistics. (I recommend books by Ernst Mayr for more on this.) This is not to say the physical sciences do not use statistics; they do. It is just that the makeup of the physical sciences is different from that of the life sciences, and testing varies accordingly.
Also, Dr Ringach is equating the papers I reference on toxicology with individual tests. This is not the case. Many of the papers Shanks and I reference evaluated areas of toxicology while others examined individual tests. In all, animal models were found to fail as predictive modalities.
However, in my opinion, while the above is almost universally accepted, none of the empirical evidence Shanks and I reference is actually our strongest point. The second law of thermodynamics results in the US Patent Office ignoring patent applications for perpetual motion machines. Individual tests for each application are not performed, nor are they needed. Likewise, an understanding of evolution and/or complexity is all that is required in order to come to the conclusion that animals will never be predictive for humans in disease and drug testing. This is what we cover most thoroughly in Animal Models in Light of Evolution. This is classified as science theory as opposed to empiricism. (I cringe to use the word theory as it is very misunderstood and improperly used, but I will risk it here.) A specific overarching theory, like evolution or relativity, can explain or falsify much in a single blow.
Finally, I agree with Dr Ringach that science is self-correcting. While I am not a predictive modality, I will make a prediction. Because science is self-correcting, eventually our ideas and the facts we present will lead to animal models being abandoned as predictive models for drug and disease response.
I am covering material I have blogged about many times before and about which I coauthored an entire book. The fact that I am still being asked to address these points raises the question: “Why am I being asked to cover points that have been made and discussed before?” As I have said before, if the serious reader wants more information I suggest books. I would start with Animal Models in Light of Evolution but there are many good philosophy of science texts, critical thinking texts, and statistics texts and web sites if the reader desires more in-depth knowledge on those subjects. Those with an agenda will not be satisfied with any amount of facts, details, or explanations.