Comparing methods for agreement

Researchers working in MRI often compare one MRI technique against another, or an MRI technique against another method of measuring the same quantity. Comparing methods of measurement for agreement requires careful statistical analysis. It is easy to choose the wrong statistical test for this problem. Let’s briefly consider correlation, regression, and the t test for comparing methods, and learn why these techniques are not the most appropriate way of comparing methods for agreement. Then we’ll see why the bias plot does the job properly.

So, you’ve measured data with method 1, and with method 2. You’ve probably plotted the data on a scatter plot. You figure you’ll fit a line to the data, right? Seems like a sensible first step. Your stats package dutifully does as you ask, and spits out a correlation coefficient (a.k.a. Pearson’s correlation coefficient, or the product-moment correlation coefficient).

The correlation coefficient only describes the “straight line” association (the linear relation) between two variables. It is not an indicator of agreement (see Bland & Altman, Lancet 1986, p307-310), because it is possible to have a high correlation coefficient when agreement is poor. For example, if method 2 always reads exactly twice the value of method 1, the correlation is perfect, yet the two methods plainly disagree. So let’s try simple linear regression.
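To see how a high correlation can coexist with poor agreement, here is a minimal sketch in plain Python. The readings are invented for illustration, and `pearson_r` is just a helper name chosen here, not something from a particular stats package.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical readings: method 2 always reports twice what method 1 reports.
method_1 = [10.0, 12.0, 15.0, 18.0, 20.0]
method_2 = [2 * v for v in method_1]

r = pearson_r(method_1, method_2)
mean_difference = sum(b - a for a, b in zip(method_1, method_2)) / len(method_1)

print(f"r = {r:.3f}")                              # correlation is perfect
print(f"mean difference = {mean_difference:.1f}")  # yet the methods differ a lot
```

The correlation comes out as 1, while the average difference between the methods is as large as a typical reading from method 1.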

Regression is used to show how well one variable (or method) can be used to predict another. You might be tempted to use regression if one of your methods of measurement is a gold standard, or is considered the “true” value. Regression can be useful in this case, because it describes the strength of any relationship between two variables (with the R² value). However, R² is not an indication of agreement (agreement may be deduced from the y = mx + c equation, if m is close to 1 and c is close to zero). Modified regression methods are also available if there is imprecision in both the predictor variable (x) and the response variable (y). But there’s a better method of assessing agreement.
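As a rough illustration of reading agreement off the fitted line rather than off R², the sketch below fits y = mx + c by ordinary least squares. The data and the name `fit_line` are made up for this example, and it assumes the gold standard (x) is measured without error.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    m = sxy / sxx
    c = mean_y - m * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return m, c, r_squared

# Hypothetical data: x is the gold standard, y is the method under test.
gold_standard = [1.0, 2.0, 3.0, 4.0, 5.0]
test_method = [1.1, 2.0, 2.9, 4.2, 4.8]
doubled_method = [2 * v for v in gold_standard]  # reads twice the true value

m, c, r_squared = fit_line(gold_standard, test_method)
m2, c2, r_squared2 = fit_line(gold_standard, doubled_method)

print(f"test method:    m = {m:.2f}, c = {c:.2f}, R^2 = {r_squared:.3f}")
print(f"doubled method: m = {m2:.2f}, c = {c2:.2f}, R^2 = {r_squared2:.3f}")
```

Both fits have a high R², but only the first has a slope near 1 and an intercept near zero.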

What about the t test? Actually, using the t test to compare methods for agreement is just wrong. You might think that when the means are shown to be not significantly different (by a high P value), you can deduce that the methods agree, but this is incorrect. The reason is that a high scatter of differences between the methods can lead to an important difference in means (that is, bias) being non-significant. In other words, poor agreement between two methods can be hidden in the scatter of the data, and the two methods can appear to agree!
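Here is a small sketch of that effect, assuming a paired t test on the between-method differences. Both sets of differences are invented and share the same bias of 5 units; only the scatter changes. The critical value 2.262 is the two-sided 5% value of Student's t for 9 degrees of freedom, hard-coded so the example needs nothing beyond the standard library.

```python
import math
import statistics

def paired_t_statistic(differences):
    """t statistic for testing whether the mean difference is zero."""
    n = len(differences)
    bias = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return bias / standard_error

# Hypothetical differences (method 1 minus method 2), both with mean 5.
low_scatter = [4, 6, 5, 5, 4, 6, 5, 5, 4, 6]
high_scatter = [-15, 25, -10, 20, 5, -20, 30, 0, 15, 0]

T_CRITICAL = 2.262  # two-sided 5% critical value, 9 degrees of freedom

for label, diffs in [("low scatter", low_scatter), ("high scatter", high_scatter)]:
    t = paired_t_statistic(diffs)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{label}: bias = {statistics.mean(diffs)}, t = {t:.2f} ({verdict})")
```

The bias is identical in both cases, but the noisier data gives a non-significant result, which is exactly when it would be tempting (and wrong) to conclude that the methods agree.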

A bias plot is the appropriate way to show the degree of agreement between two methods of measurement. Bias plots are sometimes referred to as Bland-Altman plots (see the Lancet article cited above). For each subject, the difference between the two methods of measurement is plotted (the ordinate) against the mean of the two methods (the abscissa). If one of the methods is a “true” estimate, or gold standard, you can plot the difference between the two methods against that reference method (instead of the mean of the two methods). Read Bland & Altman’s short and easy-to-read article; the bias plot provides a number of visual clues about the agreement between your methods.
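If you want to see what goes into the plot, the sketch below computes the coordinates by hand. The measurements and the name `bias_plot_data` are hypothetical. The limits of agreement (bias ± 1.96 standard deviations of the differences) follow Bland & Altman's article rather than anything stated in this post, so treat them as an addition.

```python
import statistics

def bias_plot_data(test, reference, reference_is_gold_standard=False):
    """Return x values, differences, bias and limits of agreement for a bias plot."""
    differences = [t - r for t, r in zip(test, reference)]
    if reference_is_gold_standard:
        # Plot against the reference method itself.
        x_values = list(reference)
    else:
        # Plot against the per-subject mean of the two methods.
        x_values = [(t + r) / 2 for t, r in zip(test, reference)]
    bias = statistics.mean(differences)
    sd = statistics.stdev(differences)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return x_values, differences, bias, limits

# Hypothetical paired measurements, one pair per subject.
method_a = [10.2, 11.5, 9.8, 12.1, 10.9]
method_b = [10.0, 11.9, 9.5, 12.4, 10.6]

x_values, differences, bias, limits = bias_plot_data(method_a, method_b)

for x, d in zip(x_values, differences):
    print(f"mean = {x:.2f}, difference = {d:+.2f}")
print(f"bias = {bias:.2f}, limits of agreement = ({limits[0]:.2f}, {limits[1]:.2f})")
```

Each printed pair is one point on the plot; the bias and the limits would be drawn as horizontal lines.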

There’s software out there which does all this for you. Download and install Analyse-It (with the Clinical Laboratory module, 30-day fully-functional free trial), which is a Microsoft Office Excel add-on. Report the bias of the method you’re testing, including the 95% confidence interval (CI) of the bias value. If the CI is wide, you can’t be very certain of the bias value. If the CI includes zero, the data are consistent with the real bias being zero (which is what you want, if you want the methods to agree; a narrow CI that includes zero is the best outcome in this case).
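For a rough idea of where such a CI comes from, here is a sketch using the standard confidence interval for a mean applied to the differences. This is an assumption about the calculation, not a description of what Analyse-It does internally. The differences are invented, and the t value 2.262 (two-sided 95%, 9 degrees of freedom) is hard-coded to match the ten subjects in the example.

```python
import math
import statistics

def bias_confidence_interval(differences, t_critical):
    """Return (bias, lower, upper) for the mean of the differences."""
    n = len(differences)
    bias = statistics.mean(differences)
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    half_width = t_critical * standard_error
    return bias, bias - half_width, bias + half_width

# Hypothetical differences between the two methods for 10 subjects.
differences = [0.2, -0.4, 0.3, -0.3, 0.3, 0.1, -0.2, 0.0, 0.4, -0.2]

T_CRITICAL = 2.262  # two-sided 95% value of Student's t, 9 degrees of freedom

bias, lower, upper = bias_confidence_interval(differences, T_CRITICAL)
includes_zero = lower <= 0 <= upper

print(f"bias = {bias:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("CI includes zero" if includes_zero else "CI excludes zero")
```

With a different number of subjects you would need the t value for n - 1 degrees of freedom instead.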