Teach-in #23: Gait Cycle Data Averaging Techniques, What people said...

CGA members,

I'll start with the disclaimer that our lab does not ensemble patient data on a regular basis.  Having said that, I'd be glad to give my input.

I'll assume that the ensemble analysis averages a point from one trial at, say, 30% of the gait cycle with the same point in the other trials.  If something like a peak flexion is the point of interest in a particular curve, the peak may not lie at exactly the same timing in the gait cycle in each trial.  The peak flexion of the ensemble average is therefore not mathematically identical
to the average of the peak flexion of each trial.

All other factors being equal (marker placement, etc), the more accurate method would have to be the second, i.e., the average of the individual peak flexions. I'm not overly confident that I can identify the gait cycle events exactly the same in each trial, and I do not trust the autocorrelate functions.  These factors will add to inaccuracy of the ensemble average.

Lanie Gutierrez
Gait lab engineer, PhD Student
Karolinska Hospital, Stockholm, Sweden


This is a great question.  I am curious if someone has actually done it both ways and compared the results.  That would be the best way to answer the question.

When I thought about this before I concluded that the second method (extract gait parameters from each gait cycle) should be more correct, but that for most applications the difference is so small that it would not be worth the extra trouble.

Your specific questions:

> 1) Would the mean and SD calculated for a given selected point (e.g. peak
> knee swing flexion) be identical if it were calculated by the second method
> as opposed to the original, ensemble averaging method?

Ensemble averaging will always lower the peaks and introduce temporal blurring, because events do not occur at the same time in different cycles.  So you will get a systematic underestimation of peak values.  This could be a problem if there is large variability in event timing between gait cycles.  On the other hand, estimating peaks from individual cycles will systematically overestimate the true peak value, because there is always noise on the curves.  Ensemble averaging will reduce the effect of noise on peak values.

How each technique would affect the SD is not immediately obvious to me.

> 2) If not, then which technique is the more mathematically correct?

If noise is not big enough to cause errors, I would say that the second technique is more correct because it does not rely on the
assumption that events occur at the same time in each gait cycle. However, usually that assumption is quite good.  Also, right at
the peak, the ensemble average is not very sensitive to timing of the peak because the derivative is zero at that point.  So it is
probably not worth the extra trouble to implement the second method.

I have not really thought about gait parameters other than peak values, though.

Another aspect that may be relevant is the subsequent statistical analysis.  If all you have is a mean and SD from an ensemble
average, you will probably do a statistical test based on the assumption of a normal distribution (e.g. more than 2*SD from the mean of healthy subjects is labeled as abnormal).  If you had extracted the gait parameters from each gait cycle, you could actually find out what the statistical distribution is to better choose/justify the
statistical tests.

Ton van den Bogert
A.J. (Ton) van den Bogert, PhD
Department of Biomedical Engineering
Cleveland Clinic Foundation
9500 Euclid Avenue (ND-20)
Cleveland, OH 44195, USA
Phone/Fax: (216) 444-5566/9198

Dear Chris and others,

An interesting topic.

1) Would the mean and SD calculated for a given selected point (e.g. peak knee swing flexion) be
identical if it were calculated by the second method as opposed to the original, ensemble averaging

No. The mean and SD calculated by the 2nd method are defined relative to the amplitude of peak knee
flexion independent of timing.
The mean and SD calculated from the first method (EA) are defined relative to the average timing of
peak knee flexion. As it is unlikely that peak knee flexion will occur at the same % of the cycle for all
trials the mean calculated by the EA method will underestimate the true peak knee flexion and the SD
should also be greater for the EA method than the alternative method.
I say should because on data for one subject that I calculated means and SD's for from both methods the
SD was slightly smaller for the EA method.

21 year old male walking at self-selected natural gait velocity. Data was normalised to a percentage of the
swing phase (FO = 0 %, FC = 100 %)

Peak knee flexion (rads) @ % of swing phase

Method 1 (EA)
Mean: -1.0652
SD: 0.0197

Method 2 (Individual trial peaks)
Mean: -1.0655
SD: 0.0218

Timing of peak knee flexion (% of swing)
Mean: 35.2
SD: 0.8367

So for this data set the peak knee flexion calculated by the EA method is an average of the magnitude of
knee flexion at 35 % of the swing phase (35.2 rounded down) and therefore for those trial in which peak
knee flexion did not occur at 35 % of swing, the peak will be under- represented. I would hypothesise
that the slightly lower SD for the EA method is not a true indication of what would happen if these 2
methods were applied to many subjects however I don't have the time at present to test this hypothesis :-(

2) If not, then which technique is the more mathematically correct?

I think that both techniques are mathematically valid, however for clinical evaluation I would think more information can be gained from looking at both the mean and SD of the magnitude and timing of the relevant biomechanical variables and comparing these to a normative database.

I should state that my experience in gait biomechanics is from a research not clinical perspective and therefore this comment is not based on practical experience.


Peter Mills
Biomechanics-Dynamics Group
School of Physiotherapy and Exercise Science
Griffith University
Gold Coast PMB 50
Gold Coast Mail Centre 9726
Queensland Australia
Ph: +61 7 5552 8357 Fax: +61 7 5552 8674
Email: p.mills@mailbox.gu.edu.au

Dear all,

Don't think this debate should proceed much further before you've all had a  chance to read Gait Posture 2000 Dec;12(3):257-64. Reduction of gait data  variability using curve registration. Sadeghi H, Allard P, Shafie K,
Mathieu PA, Sadeghi S, Prince F, Ramsay J. Abstract and link to full text at

The paper addresses exactly this issue with regard to muscle powers which are resolved into components in the different planes (but that's a whole different discussion). It shows, amongst other things that H1 muscle power (sagittal) is 20% lower when calculated from ensemble averaged data than by  taking the average of the H1 peaks from individual trials. The paper then
goes on to propose an exceedingly sensible alternative strategy for time normalising data to get around this.

If there were more than 24 hours in the day I'd have worked on introducing these methods into our clinical practice by now.


Richard Baker PhD CEng
Gait Analysis Service Manager and Director of Research
Hugh Williamson Gait Laboratory
Royal Children's Hospital
Flemington Road,
Victoria 3052, Australia

Adjunct Associate Professor, Schools of Physiotherapy and Human
Biosciences, La Trobe University
Honorary Senior Fellow, Department of Mechanical and Manufactuing
Engineering, University of Melbourne

Tel:    +61(0)3 9345 5354
Fax:    +61(0)3 9345 5447
e-mail: bakerr@cryptic.rch.unimelb.edu.au

Hi Chris,

Ray forwarded me your question in regard to normalisation and the use of derived variables in gait. I have no real interest in the assessment of gait but I do have an interest in EMG profiles and motor control - the same issues raise their ugly head - basically we collect so much data (in EMG and gait analysis) that we are able to post process in so many different ways we have a problem of missing the clinical reason why one is actually doing the assessment.

[I was going to ask the Biomech-L group about the evidence for the clinical utility of some of the derived methods / variables.
For example, at what point does a specific parameter force a change in the clinical decision making process]

Anyway to answer the question one needs to revert to the reason why the data is being collected. Here is a test of validation for you that may assist in the decision of which derived variable to select. You get two groups of subjects where we can be [clinically] confident that there is a clinical difference. You could get a group of normals test them and then place a brace / weight on the lower limb. You perform the assessments and derive the variables.

You then perform a statistical test of within subject comparison and have a look at the effect size (essentially the p value). One
would expect that the derived variable of value would be sensitive to the clinical difference we can observe. Indeed the level of confidence of (p value) gives an indication as to the ability of the processing protocol to differentiate between groups. The level of confidence does not need to reach statistical significance since this is associated with the power of the analysis but if one derived variable is p = 0.06 and the other p = 0.23 then you can be confident that the first is more sensitive (and statistically more powerful) than the second.

I used this technique in describing different methods amplitude normalisation for EMG. Some use the coefficient of variation
(CV) as a measure of removing within group variance (Winter and Yang) - however it is plausible that the reduction in variance
may also reduce the true biological variance. If the CV is combined with the effect size (p-value technique) then it is clear that there is an advantage in certain amplitude normalisation protocols. I have that reference for you if you wish.

Anyway getting to your mathematics question.

Mean of the individual ensemble averages is equivalent to the mean of all trials The SD of the individual ensemble average is not equivalent to the SD of all trials placed together.

Therefore the use of the 2 techniques will generate same means but variance in the data set will change (I think) This impacts on the interpretation of the clinical and statistical differences.

This is particularly true for an unpaired test (two groups). With a within group comparison you may choose to select change
scores or other analysis technique
... for example.. I would suggest using a analysis of covariance (ANCOVA) i.e a regression model where the initial value is
used as the covariate.  This provides greater statistical power and is able to standardise the initial difference between groups.
Vickers AJ, Altman DG 2001 Statistics notes: Analysing controlled trials with baseline and follow up measurements. British
Medical Journal 323:1123-1124.

Just a few thoughts.


ps. Hopefully we have a large number of trials where we can be confident in the utility of SD. Otherwise we may wish to
consider the Median value or some truncated mean process. I have seen SD calculated from 3 trials - ....Then again one
could use the effect size analysis to see if the median is more applicable than the mean assessment protocol.


Garry T Allison  Associate Professor of Physiotherapy
The Centre for Musculoskeletal Studies     http://www.cms.uwa.edu.au/
Department of Surgery, The University of Western Australia.
Level 2 Medical Research Foundation Building
Rear 50 Murray Street
Perth  Western Australia 6000.
email  <gta@cms.uwa.edu.au>
ph: (618) 9224 0219
Fax (618) 9224 0204

Disclaimer: I’m don’t “do” gait analysis, nor am I a statistician:

 But, from the statistical definition, I am unsure if the EA method Standard Deviation is “correct” (emphasis on the quotes).

 My understanding is for a correct standard deviation, the data points must be independent, but when using the EA method, the SD calculated at any time (t) as compared to the previous or next time point (t+1 or t-1) are not necessarily independent.

Basically, gait data is time series data, and I remember my statistics professors emphasizing that other analysis methods were needed in such cases.

But I assume this only applies to comparisons between time points within the EA (for example peak at heel strike compared to
peak at toe off).

 Now does it matter? I don’t know, but if you’re going to do the stats, then I would assume it does.



Bryan Kirking

Dear CGA-ers:

     We looked at our time-series data to see when a selected point in the gait cycle occurs per stride
     and per trial (ie, ensembled averaged strides). So, for peak swing knee flexion, we found for three strides:

     Ensembled Averaged = 70.6 +/- 2.2 @ 83% gait cycle
     per Stride = 71.1 +/- 1.7 @ 81.7 +/- 1.5 % gait cycle

     1) Would the mean and SD calculated for a given selected point (e.g. peak knee swing flexion) be
     identical if it were calculated by the second method as opposed to the original, ensemble
     averaging method?
     For the most part, no, due to the fact that the timing of when a given point occurs (ie, peak
     knee swing flexion) will usually be different, even for a very consisitent walker.

     2) If not, then which technique is the more mathematically correct?
     I would say the second technique is statistically correct because it addresses the real issue, which
     is the magnitude of the selected point independent of timing. Furthermore, the "size" of the time of
     occurrence of the selected point for a given number of strides is indicated with the second method
     whereas it is assumed when strides are ensembled averaged. As seen in the above results, there
     is no significant difference between the magnitudes of both methods but the timing range (for one
     SD) is known with the per stride method. I've found this to be an advantage, particularly in research,
     and some of my clinical colleagues have asked for this as well.

     Thanks and good luck - Arnel

     lllllllllll  Arnel Aguinaldo, MA, ATC
     (.)**(.)  Bioengineer, Motion Analysis Laboratory
        !.      Children's Hospital San Diego
       -U-    Tel: 858.966.5807 [aaguinaldo@chsd.org]
                Online: www.sandiegogaitlab.com

In the example illustrated for knee flexion, if one wants to obtain an average for peak knee flexion, it seems that there are two issues to consider:
  1. the timing of peak knee flexion and
  2. the magnitude of peak knee flexion.
As I understand it, ensemble averaging assumes that there is no difference in timing, and therefore may produce an invalid magnitude.  I have experienced this problem with my research - when a max does not occur at the same time point (% gait cycle) for each trial used in the ensemble average, then the max for the resulting curve will NOT match the absolute
max, which is what would be of interest for statistical comparison.

Therefore, I cannot see many circumstances (in research) where ensemble averaging would be appropriate.  I suppose that when a qualitative impression is all that is desired, then ensemble averaging may be acceptable.

Chriss Stanford

Christian Stanford, MA ATC CSCS
Orthopedic Research Associate
Motion Analysis Laboratory
Children's Hospital and Health Center
Mail Code 5054
3020 Children's Way
San Diego, CA 92123
phone:  (858)-576-1700 ext.3186
fax:  (858)-966-7494

The attached slide (click to enlarge) is from some teaching material I prepared for the last
ESMAC course in Milan. It's three ankle power traces identical except that  one is displaced forwards in time by 2% of a gait cycle and one displaced backwards by 2%. The yellow trace is the mean and the shaded area the standard deviation of the two traces. As you can see the peak average is only 70% of the average peak. More worrying if you take two standard
deviations from mean to indicate abnormality a person must have less than 25% of average peak power to be classed as abnormal in this way. The standard deviation for the peaks independent of time is, of course, zero.

Weird eh!


Richard Baker PhD CEng
Gait Analysis Service Manager and Director of Research
Hugh Williamson Gait Laboratory
Royal Children's Hospital
Flemington Road,
Victoria 3052, Australia

Adjunct Associate Professor, Schools of Physiotherapy and Human
Biosciences, La Trobe University
Honorary Senior Fellow, Department of Mechanical and Manufactuing
Engineering, University of Melbourne

Tel:    +61(0)3 9345 5354
Fax:    +61(0)3 9345 5447
e-mail: bakerr@cryptic.rch.unimelb.edu.au

Dear Chris and Others,

Now we are calculating about the conditions of continuous clinical gait analysis. Could you answer the following questions:

  1. Required time of one complete measurement? (gait analysis - 10-15 steps, + EMG 8 channels + force plates)
  2. time and work of the calculation for all usual gait parameters? (coordinates of the anatomical points, angles, inverse dynamical calculation, EMG evaluation, averages +SD etc.)
  3. Cost of that type of measurement?
  4. What is the usual maximum number of patients/day in a gait lab?
Answering the original question of Data averaging techniques:
according to our practice at calculation of the different usual
diagrams comparing more gait cycles on treadmill the changes of
the diagrams can characterize the condition of a patient better than
the averages and the SD. I think very soon we can summarize our
results and conclusions.

We calculate also the angular velocities of the body segments
based on the coordinates of three arbitrary not colinear points of a
segment. The results are in the coordinate system of the points
(measuring coordinate system). We developed an Excel workbook
to calculate and demonstrate the angular velocity vector. For use of
this program you need only the following coordinates for the
investigating time interval: time (msec) x,y,z coordinates (mm) of the mentioned 3 points.
one more additional question: would somebody like to try this software?


Laszlo Kocsis, PhD
associate professor
Budapest University of Technology and Economics
Department of Applied Mechanics
Head of Biomechanical laboratory

Hi Chris,

I believe that the answer to your first question regarding the relative
agreement between statistics derived from
ensemble averaged curves versus  statistics derived from individual
curves will depend almost entirely on the degree of within-patient
variability.  The discrepancy between methods becomes exacerbated if the
clinician wishes to establish normative values for a group of patients.

Consequently, we've chosen to utilize both methods of averaging, but for
different purposes.  When presenting waveform data, we utilize ensemble
averaging, but only to provide the clinician with a sense of the trends
presented in the gait cycle.  We also present this data in two forms: a
plot with all raw curves present, and a plot with the curve and standard
deviation lines surrounding the curve.  This way, the clinician can
determine if any of the gait cycles can be considered 'outliers' prior
to looking at the ensemble average and standard deviation plot.


We calculate the descriptive statistics by obtaining the information
from each waveform (i.e. peak flexion during swing, and time of peak
flexion during swing) and using this data to calculate the mean and
standard deviation for each parameter of interest.  As a result, we then
know the true variability associated with the peaks (or whatever
parameter is chosen) and the variability associated with the time at
which the peak (or other event) occured.  We use the data derived in
this manner to make quantitative comparisons between patients or between
trials.  My personal belief is that this is the appropriate statistical
method for making quantative evaluations of gait-cycle data.

I suppose that  both methods can be considered 'correct', even though
the discrepancy between results can become substantial if the patient of
group of patients is extremely variable.  For us, it's a matter of
selecting the right tool for the task at hand.

Jim Richards
University of Delaware
Health and Exercise Science

Back to Teach-in