<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   <meta name="GENERATOR" content="Mozilla/4.79 [en] (Windows NT 5.0; U) [Netscape]">
   <meta name="Author" content="Dr. Chris Kirtley (Kwok Kei Chi)">
   <meta name="Classification" content="Medicine">
   <meta name="Description" content="Email discussion on statistics">
   <meta name="KeyWords" content="Gait, biomechanics, walking, statistics">
   <title>What people said...</title>
</head>
<body>
<b><font size=+2>CGA FAQ, Statistics: What people said...</font></b>
<p>I just received this enquiry and was about to email my response when
it occurred to me that this might be a good topic for discussion on CGA.
Look forward to your comments!
<p><i>We are a little green when it comes to the statistical interpretation
of gait lab data. I have been attempting to look at the effects of two</i>
<br><i>&nbsp; orthotic interventions on pathological gait. We have completed
ANOVA analysis on 18 gait parameters so far. We were quite pleased</i>
<br><i>&nbsp; with ourselves, until it was pointed out that our error level
of 5%, effectively meant we had a 1 in 20 chance of interpreting a parameter
as</i>
<br><i>&nbsp; significantly different, when error was the true cause of
the difference.</i>
<p><i>&nbsp; It was suggested that it is possible to test the sensitivity
of each parameter, although he was unable to shed any light on how this</i>
<br><i>&nbsp; is accomplished.</i><i></i>
<p><i>&nbsp; Could you provide any further enlightenment??</i>
<br>&nbsp;
<p>Chris
<p>Well, I'm no stats expert either, I'm afraid. But there are basically
two types of error:
<p>Type I: when you conclude that the hypothesis is correct (that there
is a difference between two groups, e.g. control vs. treated) when it's
not. This is the commonest error - caused by chance, as you mention. You
can reduce the chance of making this error by reducing alpha from 5% to
1%, but very few people do this in rehab research;
<p>Type II: (also common but not as well recognized) when you conclude
that the hypothesis is not proven (i.e. there's no difference between the
two groups) but in fact there is a difference. This is usually caused by
having insufficient subjects (i.e. low statistical power, which is 1 minus
beta, the Type II error rate) - very common in our field! There are ways to
calculate the power needed (and therefore the number of subjects needed) if
you know the "effect size" (the size of the difference that is considered
clinically significant - usually derived from a pilot study).
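<p>As a rough illustration of the subject-number calculation mentioned above,
the sketch below uses the standard normal approximation for comparing two
group means. The function name, the default alpha and power, and the example
effect sizes are assumptions made for illustration, not values from this
discussion; an exact calculation based on the t distribution gives slightly
larger numbers.</p>

```python
import math
from statistics import NormalDist


def subjects_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided, two-group comparison.

    effect_size is the standardized difference: the difference in means
    divided by the common standard deviation (e.g. estimated from a pilot study).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta (Type II error rate)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical small, medium and large standardized effects.
for d in (0.2, 0.5, 0.8):
    print(d, subjects_per_group(d))
```

<p>The number of subjects grows with the inverse square of the effect size,
which is why small studies can only be expected to detect large differences.</p>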
<p>My own personal view is that stats cause more problems than they are
worth in rehab research. I'd far rather see people just present the data
and let me make up my own mind. Unfortunately, stats have become expected
(even though they are usually very dubious because of the small numbers
of subjects). The risk is that people just look at the stats and don't
look at the data.
<p>Even when you have enough subjects for the various criteria to be satisfied
(e.g. normal distributions) conventional (Fisher) statistics can still
be quite misleading. If you speak to mathematicians these days they will
often laugh when you mention Fisher and tell you that the only reason
he did stats this way was because of the limited computing power available
at the time. Modern statisticians are much more interested in computer
simulation studies, which are apparently much more informative.
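<p>One simple example of a computer-intensive approach is a permutation
test, sketched below. This is only one possibility and not necessarily what
the statisticians referred to above have in mind. The group names and the
measurements are invented for illustration. The idea is to reshuffle the
group labels in every possible way and ask how often a difference at least
as large as the observed one arises purely from the relabelling.</p>

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    n_extreme = 0
    n_total = 0
    # Enumerate every way of relabelling the pooled data into two groups
    # of the original sizes (feasible only for small samples).
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        first = [pooled[i] for i in chosen]
        second = [pooled[i] for i in indices if i not in chosen_set]
        n_total += 1
        if abs(mean(first) - mean(second)) >= observed:
            n_extreme += 1
    return n_extreme / n_total


# Hypothetical measurements (e.g. walking speed in cm/s) for two small groups.
control = [100, 120, 110, 130, 90]
treated = [140, 150, 130, 160, 120]
p = exact_permutation_p_value(control, treated)
print(round(p, 4))
```

<p>No assumption of a normal distribution is needed here, although the test
still assumes that the subjects are exchangeable between groups if there is
no real effect.</p>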
<p>Chris
<br>--
<br>Dr. Chris Kirtley MD PhD
<br>Associate Professor
<br>Dept. of Biomedical Engineering
<br>Catholic University of America
<br>620 Michigan Ave NE, Washington, DC 20064
<br>Tel. 202-319-6134,&nbsp; fax 202-319-4287
<br>Email: kirtleymd@yahoo.com
<br><a target="_blank" href="http://faculty.cua.edu/kirtley">http://faculty.cua.edu/kirtley</a>
<br>
<hr WIDTH="100%">
<br>ANOVA is the right test: it looks globally to see if there are differences
between the groups - if (and only if) there are, you can then inspect your
<br>individual groups to see which are different&nbsp; - from the global
mean.
<br>&nbsp;
<br>From<a href="mailto:christopher.smith@kcl.ac.uk"> Dr Christopher Smith</a>
<br>Head of Biomedical Science BSc
<br>Centre for Applied Biomedical Research, King's College London
<br>4.1 Shepherd's House, Guy's Campus, London Bridge SE1 1UK
<br>Phone/fax 020 7848 6301, Biomed Office 020 7848 6400
<br>If need be, phone me on 0797 0713507
<br>christopher.smith@kcl.ac.uk
<hr WIDTH="100%">
<br>Just some food for thought:
<p><i>&nbsp;Figures often beguile me, particularly when I have the arranging
of them myself; in which case the remark attributed to Disraeli would</i>
<br><i>often apply with justice and force: "There are three kinds of lies:
lies, damned lies and statistics."</i>
<br>- Autobiography of Mark Twain
<p><a href="mailto:lovejoya@cleveland.edu">Alan Lovejoy</a>
<br>
<hr WIDTH="100%"><font color="#000080">There was a good <a href="#rothstein">editorial</a>
in this regard written by <a href="#rothstein">Jules Rothstein</a>, the
editor of the Journal of the American Physical Therapy Assoc, this past
May.&nbsp; Here is the hyperlink: <a target="_blank" href="http://www.ptjournal.org/May2003/May03_EdNote.cfm">http://www.ptjournal.org/May2003/May03_EdNote.cfm</a>
.&nbsp; Speak to everyone soon,</font>&nbsp;John J Fraser, PT, MS
<hr WIDTH="100%">
<br>Greetings Chris et al:
<br>I'm reminded of an epidemiologist citing <b>Winifred Castle</b> (a
statistician) who, apparently, said:
<p><b><i>We researchers use statistics the way a drunkard uses a lamp post
- more for support than illumination.</i></b>
<p>Cheers,
<p><a href="mailto:tstevens@sbgh.mb.ca">Ted Stevenson</a>
<hr WIDTH="100%">
<br>At least you are thinking about this issue. It's exceedingly common
to see published papers whose authors have run statistical tests on
<br>a thousand different parameters and found that 20 of them show "statistically
significant results" at the 5% level, when chance alone would be
<br>expected to produce about 50. It's particularly prevalent in gait analysis,
where there is no shortage of parameters to look at. A related, though
<br>not identical, phenomenon is that the more parameters you look at, the
higher the proportion of your statistically significant results that will
<br>be the product of random chance. There are various solutions.
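<p>To put rough numbers on the multiple-testing problem described above, the
sketch below computes the chance of at least one false positive and the
expected number of false positives when no real effects exist. It assumes
the tests are independent, which (as noted elsewhere in this discussion) is
rarely true of gait parameters, so treat the figures as a guide only. The
function names are made up for illustration.</p>

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of 'significant' results produced by chance alone."""
    return n_tests * alpha


# 18 is the number of gait parameters in the original enquiry;
# 1000 matches the example of a paper testing a thousand parameters.
for n in (1, 5, 18, 1000):
    print(n, round(familywise_error_rate(n), 3), round(expected_false_positives(n), 1))
```

<p>Under these assumptions, 18 tests at the 5% level carry roughly a 60%
chance of at least one spurious "significant" result.</p>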
<p>The most commonly used solution is the <b>Bonferroni correction</b>, which
effectively says the more tests you do the lower the level at which
<br>you should accept statistical significance. Any decent stats book will
guide you through this process as it's applied to multiple t-tests.
<br>The principle is the same for ANOVA but I'm not sure whether the technical
details are the same.
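<p>A minimal sketch of the Bonferroni correction as it is usually applied to
a set of p-values: divide the overall significance level by the number of
tests and compare each p-value with that stricter threshold. The p-values
below are invented for illustration, and the function name is an assumption.</p>

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and whether each p-value passes it."""
    threshold = alpha / len(p_values)
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values from five separate comparisons.
p_values = [0.001, 0.012, 0.030, 0.049, 0.20]
threshold, significant = bonferroni(p_values)
print(threshold)
print(significant)
```

<p>Four of these five results would pass an uncorrected 5% level, but only
the first survives the correction. The price is reduced power, so the
correction can be very conservative when many parameters are tested.</p>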
<p>A much stronger method is to<b> limit the number of parameters </b>you
look at before you start. Preferably nominate one key parameter in
<br>advance and stick to this - whatever you do, make sure you can count
the number of parameters on one hand (and no polydactyly).
<br>How you do this is up to you. You can either use your clinical skill
and judgement to nominate these or do a pilot study, run tests on
<br>all the variables, and use the data to nominate the top five variables
for a definitive trial. The problem with this is that in the present
<br>environment no-one will believe you. Running multiple statistical comparisons
on data is so common that it will be assumed that
<br>you've done all those tests&nbsp; and just reported the good results.
In <b>big scale projects you can now actually pre-declare your primary</b>
<br><b>outcome measures with the Lancet</b> before you start to ensure
that you don't cheat. This is a little over the top for most of us mere
<br>mortals though.
<p>Another approach which I've heard proposed recently from a visiting
lecturer&nbsp; from the UK (Dr Jonathan Sterne, University of
<br>Bristol, UK - I gather he's just brought a new book out which it may
be worth looking for) is to move away from assuming that
<br>anything below 5% is significant and anything above is not. Clearly
there's little difference between p=0.0499 and p=0.0501 and it's
<br>daft to have a precise cut-off. Sterne would have you <b>look at the
p-values as indicating comparative levels of confidence</b> in results.
<br>This then forms the basis for a balanced assessment of the data and
suggestion of probable explanations (which may include the
<br>suggestion that any particular result is a chance finding). <b>In biomechanics
it is rare that your parameters are ever fully independent </b>and
<br>finding patterns within your significance values amongst related parameters
can be powerful evidence of a real effect rather than an
<br>aberration. Using 5% as a clear cut-off makes the process of science
appear objective but this is a lie. We should accept that the
<br>interpretation of results is subjective and get down to the nitty-gritty
of doing this honestly and intelligently.
<p>Another hang-up of Sterne's, which is partially related and reasonably
well supported in the literature, is to<b> focus more on confidence</b>
<br><b>limits in interpreting data rather than p-values</b>.
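<p>As a sketch of what reporting confidence limits might look like, the code
below estimates the difference between two independent group means with an
approximate 95% interval. The data and names are invented for illustration.
By default it uses a normal critical value, which is too narrow for small
samples; a critical value from t tables can be passed in instead. A
within-subject (paired) design would need a different calculation.</p>

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, confidence=0.95, critical_value=None):
    """Difference in means (b minus a) with approximate confidence limits."""
    if critical_value is None:
        # Large-sample (normal) approximation; pass a t value for small samples.
        critical_value = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = mean(group_b) - mean(group_a)
    # Standard error of the difference between two independent means.
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff, diff - critical_value * se, diff + critical_value * se


# Hypothetical walking speeds (m/s) in two independent groups.
baseline = [1.0, 1.2, 1.1, 1.3, 0.9]
with_orthosis = [1.4, 1.5, 1.3, 1.6, 1.2]
diff, lower, upper = mean_difference_ci(baseline, with_orthosis)
print(round(diff, 3), round(lower, 3), round(upper, 3))
```

<p>Reporting the estimated difference together with its limits shows the
reader both the size of the effect and how precisely it has been measured,
which a bare p-value does not.</p>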
<p>I find <b>Martin Bland's <i>An introduction to medical statistics</i></b>
to be an excellent guide to these issues (although it is a little superficial
in its
<br>treatment of ANOVA). Bland (mostly with Doug Altman) has also written
a number of articles on related issues for the BMJ which
<br>can be accessed easily through his web-site (<a target="_blank" href="http://www.mbland.sghms.ac.uk/jmb.htm">http://www.mbland.sghms.ac.uk/jmb.htm</a>).
<p>This whole area is a can of worms but you've got no option but to get
to grips with it if you want to do valid science.
<p>Hope this is useful.
<p><a href="mailto:richard.baker@rch.org.au">Richard Baker</a>
<p>Gait Analysis Service Manager, Royal Children's Hospital
<br>Flemington Road, Parkville, Victoria 3052
<br>Tel: +613 9345 5354, Fax +613 9345 5447
<p>Adjunct Associate Professor, Physiotherapy, La Trobe University
<br>Honorary Senior Fellow, Mechanical and Manufacturing Engineering, Melbourne
University
<hr WIDTH="100%">
<br>
<hr WIDTH="100%">
<h3>
<a NAME="rothstein"></a>Living With Error (from <i><a target="_blank" href="http://www.ptjournal.org/May2003/May03_EdNote.cfm">Physical
Therapy</a></i> May 2003)</h3>

<p><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Even as we mourn the tragic end of the shuttle Columbia, we can marvel
at NASA's
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
incredible history of 10 successful manned Apollo missions and more than
100 successful
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
shuttle missions. In addition to highly trained personnel, spaceflight
requires highly reliable
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
technology. When NASA talks about reliability, the agency ultimately is
talking about how
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
long parts function before they are likely to fail. Consider the Apollo
space program:
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Some experts have suggested that there were a total of about 2,000,000
functional parts
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
in the Saturn V rocket, lunar module, and command module. How much error
could have
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
been tolerated in this complicated array? Even if the reliability for each
part had been
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
99.9% for its contribution to the mission, the potential existed for about
2,000 parts to
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
fail—in which case, the command module almost certainly would not have
made it to the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
moon and back!
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; When
we physical therapists talk about reliability, of course, we're talking
about the error
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
associated with a measurement. Reliability of 99.9% for a measurement used
in physical
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
therapy would almost always be astoundingly good! We could only dream….
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Reliability
can be a critical issue in the planning of a study. It also is a critical
issue in
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
clinical practice. The Journal has found that, regardless of whether authors
are describing
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
research or a patient case, the reason why a measurement was used cannot
always be
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
discerned in the submitted manuscript. Some authors do discuss the selection
of
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
measurements; other authors have to be asked to do so during revision.
Either way,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
however, authors rarely clarify their clinical decision making—clarification
that would
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
enhance an otherwise superb article.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; The
truth is, no measurement is perfect. Whether physical therapists are making
a
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
diagnosis or determining a change in impairment or disability, all measurements
have some
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
error associated with them. All of our decisions can be error ridden! And
errors are not
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
eliminated or minimized by ignoring their presence.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Authors
often try to justify the use of tests with the statement, "The reliability
and validity
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
of the measurements have been established." Even with a supporting reference,
that
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
statement is untenable. Reliability isn't like pregnancy. You can say that
a woman is either
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
pregnant or not pregnant—but you can't say that a measurement is either
reliable or
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
unreliable. Not only does error always exist, it is context dependent,
and it relates to how
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
the measurement will be used.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Is
the error so large that using the measurement would be unlikely to provide
useful
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
information? Both in research and in routine practice, we have to consider
whether the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
error could interfere with understanding the results of research or practice.
Unlike
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
statements of pregnancy, estimates of reliability lie along a continuum,
and we need to
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
know where along the continuum they lie and what that means for how the
measurement
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
can be used.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; We
also benefit from knowing something about how other authors have studied
the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
reliability of the measurement being used. Did other authors study subjects
who are similar
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
to those currently being described? Did the physical therapists who took
the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
measurements in those other studies have training and experience similar
to the physical
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
therapists taking the measurements in the current study? Were the procedures
similar?
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Was the research sufficiently robust in terms of numbers of subjects and
methods that the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
estimated error can be accepted as an excellent approximation of the true
error? These
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
are not esoteric issues. They relate to the practical world in which we
live. And they are
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
the basis on which clinicians should choose the measurements they use with
patients.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Unless
authors share their thinking about the measurements they used, the concepts
in an
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
article cannot be developed, and readers are left to imagine what they
should actually
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
have been told. Instead of saying that reliability "was established," careful
authors say,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
"We believe that the measurement was sufficiently reliable to be used because…,"
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
followed by a logical argument, references, and details. In the Journal's
experience, it
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
takes only a few sentences to do this right. When the issue requires more
than a few
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
sentences, that usually means there is complexity, and the paper therefore
will be made
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
better by addressing the issue and the complexities forthrightly.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; In
characterizing estimates of error—specifically, statistics that describe
reliability—many
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
authors cite experts such as Landis and Koch,<sup>1</sup> who contended that values
of kappa
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
above 80% indicated excellent agreement, values above 60% indicated substantial
levels
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
of agreement, values of 40% to 60% indicated moderate agreement, and values
below
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
40% indicated poor to fair agreement. Other authors discuss other statistics.
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Unfortunately, what they have in common is an arbitrary method of judgment
that does
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
not relate to how a measurement will be used.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; For
authors, the convenience of being able to "classify" reliability estimates
and then give
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
them value-laden names is clear. By naming reliability estimates, authors
can discuss them
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
quickly and "be done with it," claiming that their measurements have been
blessed by the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
well-respected authors of the original papers (if those measurements, for
example, reach
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
excellent levels). The problem is that we have no basis for the classification.
If we were
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
considering the diagnostic accuracy of two surgeons, for instance, would
we find it
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
acceptable that there was a 20% chance of disagreement when it came to
the decision to
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
perform life-threatening surgery? On the other hand, that level of disagreement
about the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
necessity to remove a lipoma wouldn't (in my view) be so bad.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Can
we tolerate having the same amount of error in all of our measurements?
If we use a
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
measurement that has a possible 30% error to determine whether there is
normal
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
accessory motion at the glenohumeral joint, can we consider that measurement
to be as
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
useful as a measurement with a possible 30% error that is used to determine
whether we
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
should refer a patient to a physician in an effort to ward off possible
permanent
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
neurological damage due to disc disease?
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Authors
and clinicians have an obligation to provide an argument as to why any
problems
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
with a measurement are not sufficiently large to be consequential, and
the amount of error
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
that we can tolerate depends on what we are measuring and how a measurement
will be
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
used. The Landis and Koch approach has no context and does not take into
account the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
nature of the measurement and the decisions that might be made based on
the use of the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
measurements. Context and use are critical issues for both authors and
clinicians, the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
difference being that authors must discuss these issues explicitly in submitted
papers,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
whereas clinicians must consider these issues in patient management.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Measurements
are not equivalent to aerospace parts, of course, but there is something
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
that the Apollo space program can teach us about reliability. Because NASA
could not
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
reduce the error level to "acceptable," they adopted an alternate strategy:
planned
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
redundancy, usually triple redundancy. They developed so many backup systems
that a
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
catastrophic failure could occur only when there were multiple failures
of the same system.
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
When our clinical measurements have more error than we want, the Apollo
example
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
should remind us that alternate strategies can be developed—but authors
need to explain
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
these strategies, and, if they did not use any, authors should explain
why.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Journal
authors work hard in conducting studies and documenting practice, and harder
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
still at preparing and revising their papers. They do their own work a
disservice when they
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
fail to share their thought process in choosing measurements and other
aspects of their
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
research methods. The same is true of clinicians who do not elaborate on
why they chose
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
measurements and interventions in case reports or who practice without
regard to the
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
quality of their measurements. Ignorance about the error level associated
with
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
measurements or dogmatic refusal to consider research evidence is poor
practice.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Please
don't view this Note as a statement that reliability is more important
than other
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
measurement properties—it is not! Validity, specificity, sensitivity, and
a host of other
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
properties—as well as related topics such as receiver operating characteristics
(ROC
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
curves)—are equally, if not more, important. The issue for all of these
topics is the use,
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
and the usefulness, of measurements. We need to justify and explain what
we do, thereby
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
achieving better articles, better practice, and, in the long run, better
physical therapists.
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Jules
M Rothstein, PT, PhD, FAPTA
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
Editor in Chief
<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a href="mailto:jules-rothstein@attbi.com">jules-rothstein@attbi.com</a>
<br>&nbsp;
<h4>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; References</h4>

<p><br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<a target="_blank" href="http://www4.ncbi.nlm.nih.gov:80/htbin-post/Entrez/query?uid=843571&form=6&db=m&Dopt=r">1
Landis JR, Koch GG. The measurement of observer agreement for categorical
data. Biometrics. 1977;33:159–174.</a>
<br>&nbsp;

<p><br><a href="../faq.html">Back to FAQ</a><a href="../faq.html"><img SRC="signpost.gif" BORDER=0 height=32 width=33></a>
</body>
</html>