Medical Device & Diagnostic Industry
Magazine
CLINICAL TRIALS
Biostatistics and the Analysis of Clinical Data
Richard P. Chiacchierini
Analysis of the data from a medical device clinical trial or study is one of many critical steps along the path to FDA approval and, ultimately, to the marketplace. It is the culmination of all prior planning and execution of the study protocol. In the course of a proper analysis, underlying assumptions are verified, study populations and sites are checked for comparability, and all primary and secondary study variables are evaluated.
Clinical data can arise from a controlled clinical trial or from other clinical studies that reveal information about the performance of a medical device. The term clinical study encompasses a broad spectrum of situations in which data are gathered in a clinical setting. A clinical trial is a very specific type of clinical study.
Depending on the way a study is conducted, statistical analysis of its data can be variously affected by such design considerations as sample size, comparison groups, masking, or randomization. The particular type of analysis conducted on clinical study data is dictated by the way the study was actually conducted--which may or may not be the same as originally designed. Changes in the protocol during the course of a study will also require changes in the methods of analysis to be used. This article presents the basic framework for a proper statistical analysis of data arising from the conduct of a medical device clinical trial or study. The methods discussed here are extremely powerful, but their effectiveness depends critically on the quality of the data to which they are applied. No statistical method, regardless of its sophistication, can overcome major data weaknesses that arise from seriously flawed study design or conduct.
Starting Points. The device manufacturer should recognize at the outset that analyzing the data from a clinical study is a painstaking and expensive proposition. Despite the common misconception that data analysis is "simple and straightforward, requiring little time, effort, or expense," statisticians know that "careful analysis requires a major investment of all three" (Friedman, et al., p. 241; see bibliography, p. 56). In recent years, the common misconception has been amplified by the growing number of user-friendly computer software packages that seemingly promise to make data analysis effortless. But giving the analysis of clinical data less effort than it requires often leads to incorrect or inappropriate analyses that cause major delays in FDA's product review process. Agency reviewers are skeptical of statements made by a sponsor that are not supported by a proper and appropriate analysis.
A good analysis should start with an analytical strategy. The strategy should be crudely developed at the time the protocol is written and refined as the study or trial goes to completion. It should describe in general terms:
- The anticipated analysis procedures.
- The basis for the sample size.
- The primary and secondary variables.
- The subgroups, if any, that will be investigated by hypothesis tests.
- The influencing variables (covariates) that are important, and why they are important.
Although refinement of the analytical strategy should not be taken to include wholesale changes that drastically alter the intention of the original study, it may include the addition of greater detail that moves the initial strategy from generality to specificity. The original strategy document should provide a skeleton for the analytical scheme and the refinements should provide the meat.
At first glance, many analytical methods may appear suited to the data, but only a few are likely to have underlying assumptions that are truly consistent with the data. To determine the correct analytical technique to be used, the manufacturer needs to know the answers to a number of critical questions:
- Why were the data gathered?
- How were the data gathered?
- From whom were the data gathered?
- When and for how long were the data gathered?
- Where were the data gathered?
A database with rows representing patients and columns representing variables can yield summary data tables that might appear capable of analysis by a number of different methods. In actuality, however, there are likely to be a very limited number of methods (possibly one or two) for which the analytical assumptions are satisfied. Using any other method, whose assumptions the data do not satisfy, is inappropriate, and the results are considered unreliable.
Although the term statistical analysis embraces an ever-increasing number of methods that might be used by a medical device sponsor, all such analytical methods can be classified into two main groups: hypothesis testing and estimation. In hypothesis testing, the researcher usually compares the occurrence of one or more features of interest in two or more groups of patients. Most hypothesis testing in medical device clinical trials compares the mean, proportion, or other features of the device-treated group to the same features in the control group. Features could involve such measures as the mean time to healing or hemostasis, or the proportion of patients who showed a preselected degree of improvement.
In estimation, the researcher's interest is to determine the value of a characteristic of interest in a group under study. The estimated value is usually accompanied by a statement about its certainty in the form of a confidence interval, whose confidence level is expressed as a percentage (for example, 95%). Estimation is a necessary part of hypothesis testing, but it is not the culmination of the method. Estimation is also important in the analysis of safety variables. For example, in a clinical study of a "me-too" device, where effectiveness is not an issue, FDA and the sponsor may be interested in estimating the proportion of patients that might experience a particular complication. To show how precisely that proportion has been estimated, the researchers would also need to determine the confidence interval for it.
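As an illustration of estimating a complication proportion together with its confidence interval, the following Python sketch uses the Wilson score interval. The choice of the Wilson method, the function name, and the patient counts are assumptions made for this example; the article does not prescribe a particular interval formula.

```python
import math

def wilson_interval(events, n, z=1.96):
    """Wilson score confidence interval for a proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical data: 6 complications observed among 120 patients.
low, high = wilson_interval(6, 120)
print(f"observed rate = {6 / 120:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

A narrower interval, obtained with a larger sample, indicates a more precise estimate of the complication rate.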
No single presentation on the statistical analysis of medical device clinical data can be sufficiently comprehensive to cover all aspects of this complicated and diverse methodology. Although this article is not intended to provide new or provocative material, it will cover the basic tenets that form the foundation for a proper analysis of clinical study data. These tenets are divided into three main sections: preliminary analysis, comprehensive analysis, and analytical interpretation.
PRELIMINARY ANALYSIS
Authors of textbooks about statistical data analysis rarely discuss the need to match the analytical method to the character of the data. Often, they simply assume that the reader is sophisticated enough to investigate whether the variance of the groups being compared is sufficiently similar, or whether the distribution of the data is suitable for the analytical method being proposed. This is clearly a leap of faith that is not supported by experience.
In the evaluation of any set of data, from whatever source, it is essential to begin with an investigation of the data's basic character.
- What is the nature of the distribution of the primary, secondary, and influencing variables?
- Is the distribution of variables consistent with normal (Gaussian) or another well-known distribution?
- If the data are not normally distributed, can they be changed by a function (a transformation) that preserves their order, but brings them into conformity with well-known assumptions about their distribution?
- Is the sample of adequate size such that normality of the means can be assumed even if the data are not normally distributed?
- Are the variances of the subgroups to be compared equal?
These questions are the realm of descriptive statistics. They can be answered by applying simple, well-known tests or by inspecting rudimentary data plots such as histograms or box plots. Such questions are essential for enabling the statistician to validate the assumptions that underlie the data, and to select the most appropriate analytical method consistent with the data.
Basic Character of the Data. Clinical data are similar to other forms of data in that there are two types of variables, quantitative and qualitative. Quantitative variables are numbers that can have any value within some acceptable range. For example, a person's weight in pounds could be 125.73. Qualitative variables, however, must conform to discrete classes, and are usually characterized numerically by whole numbers. For instance, a patient who is disease-free could be characterized by a zero, and a patient who has the disease could be classified as a one. The analytical procedures appropriate for these two types of variables are diverse. While there have recently been tremendous advances in the analysis of qualitative data, the techniques for analyzing quantitative variables remain more powerful because there is more numerical information in a number like 125.73 than there is in a zero or a one.
The distribution of variables in a sample is a critical factor in determining what method of analysis can be used. The normal, or Gaussian, distribution resembles the symmetrical bell-shaped curve by which most students are graded throughout their scholastic careers. It is fully characterized by two features: the mean, a measure of the location of the distribution, and the variance, a measure of the spread of the distribution. Many well-known statistical methods for analyzing means or averages--such as the t-test or the paired t-test--are based on the normal distribution. Such methods rely on normality to ensure that the mean represents a measure of the center of the distribution.
Because statistical theory holds that the means of large samples are approximately normally distributed, an assumption of normality becomes less important as sample sizes increase. However, when sample sizes are small, as they are likely to be in most medical device clinical studies, it is crucial to determine whether the data to be analyzed are consistent with a normal distribution or with another well-characterized distribution.
Most common statistical tests of quantitative variables, including the t-tests and analysis of variance (ANOVA), are tests of the equality of the measures of location belonging to two or more subgroups that are assumed to have equal variance. A measure of location, such as a mean or median, is a single number that best describes the placement of the distribution (usually its center) on a number line. Because equal variance provides the basis of nearly all tests that involve measures of location, in such cases an assumption of equal variance is more critical than an assumption of normality--even when the tests do not rely on any specific distribution of the data (called nonparametric tests). If the variances are not equal among the subgroups being compared, it is frequently possible to find a formula or function (a transformation) that preserves order and results in variables that do have equal variance.
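The effect of an order-preserving transformation on unequal variances can be sketched in a few lines of Python. The healing times below are hypothetical, and the logarithm is only one possible transformation; it is chosen here because the spread in these made-up data grows in proportion to the mean.

```python
import math
import statistics

# Hypothetical healing times (days). The second group equals the first
# multiplied by 3, so its spread grows along with its mean.
control = [4.0, 5.0, 6.5, 8.0, 10.0, 13.0]
treated = [12.0, 15.0, 19.5, 24.0, 30.0, 39.0]

def variance_ratio(a, b):
    """Ratio of the larger sample variance to the smaller one."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)

raw_ratio = variance_ratio(control, treated)
log_ratio = variance_ratio([math.log(x) for x in control],
                           [math.log(x) for x in treated])

print(f"variance ratio on the raw scale: {raw_ratio:.2f}")
print(f"variance ratio on the log scale: {log_ratio:.2f}")
```

Because the logarithm is an increasing function, the order of the observations is preserved while the two variances are brought into line. Real data will rarely behave this cleanly, and a formal test of equal variance would normally accompany such a check.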
When considering the distribution of data, it is also important to look at a picture of them. Data can be plotted for each group under consideration to determine whether the distribution is shifted toward higher or lower values (skewed). The presence of one or more values that are much higher or lower than the main body of data indicates possible outliers. Data plots can also help to locate other data peculiarities. Common, statistically sound adjustment methods can be used to correct for many types of data problems.
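One simple way to flag possible outliers, in the spirit of the box plot, is the common rule of 1.5 times the interquartile range. The sketch below is a hedged illustration: the rule, the function name, and the data are assumptions, and a flagged value is a candidate for review, not proof of an error.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying more than k interquartile ranges
    below the first quartile or above the third quartile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical measurements with one value far above the main body of data.
measurements = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 14.9]
print("possible outliers:", tukey_outliers(measurements))
```

The quartiles here come from the default method of statistics.quantiles; other quartile conventions can move the fences slightly.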
Baseline Variable Evaluation. Once the character of the variables of interest has been established, the analysis can test for comparability between the treatment and control groups. Comparability is established by performing statistical tests to compare demographic factors, such as age at the time of the study, age at the time of disease onset, or gender, or prognostic factors measured at baseline, such as disease severity, concomitant medication, or prior therapies. Biased results can occur when the comparison groups show discrepancies or imbalances in variables that are known or suspected to affect primary or secondary outcome measures. For instance, when one group includes a large proportion of patients whose disease is less advanced than that of patients in the comparison group, the final analysis will usually favor the outcomes for the former group, even without an effect that is due to the device.
About 30 years ago, another example of this effect occurred in a study that was comparing the effectiveness of surgery and iodine-131 for treatment of hyperthyroidism. The investigators found the seemingly inconsistent result that patients who received the supposedly less-traumatic radiation therapy had a much higher frequency of illness and death than those who underwent surgery. An investigation of the baseline characteristics of the two groups revealed that the patients selected for the surgery group were younger and in better general health than those selected for the iodine treatment. The inclusion criteria for the surgery group were more stringent than those for the iodine group because the patients had to be able to survive the surgery. In this example, noncomparability resulted in an inconsistent finding that was resolved only through investigation.
It is desirable to perform comparability tests using as many demographic or prognostic variables simultaneously as the method of analysis will allow. The reason for using this approach is that the influence of a single demographic or prognostic characteristic on the outcome variable may be strongly amplified or diminished by the simultaneous consideration of a second characteristic. However, the size of most medical device clinical studies is rarely sufficient to allow the simultaneous consideration of more than two variables. More commonly, the sample size of the trial will allow the investigator to consider only one variable at a time.
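A one-variable-at-a-time comparability check for a qualitative baseline factor such as gender can be sketched with a Pearson chi-square test on a two-by-two table. The counts below are hypothetical, and the test shown (without continuity correction) is one common choice, not necessarily the one a given study would use; with small cell counts an exact test would usually be preferred.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical baseline data: treated group 28 men / 22 women,
# control group 25 men / 25 women.
stat, p = chi_square_2x2(28, 22, 25, 25)
print(f"chi-square = {stat:.3f}, p = {p:.3f}")
```

A large p-value gives no evidence of imbalance, but in a small study it does not demonstrate that the groups are comparable either.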
As part of their comparability testing, one characteristic that manufacturers must always evaluate is the study site. Such an analysis should include not only the demographic and prognostic factors, but also the outcome variables. This evaluation is important because it provides the major basis for pooling the data from various clinical sites, which is very often essential to meeting the study sample size requirement.
Imbalances detected in comparability testing do not necessarily invalidate study results. By knowing that such differences exist, however, the analyst can account for their presence when comparing the outcomes data from the treatment and control groups. Many statistical procedures can be used to adjust for imbalances either before or during the comprehensive analysis, but such adjustments are usually restricted to instances where the extent of the difference is not great. Large differences in variables that affect data outcomes among comparison groups can rarely be adjusted adequately to make the comparison groups comparable.
COMPREHENSIVE ANALYSIS
The methods used for comprehensive analysis of clinical data vary according to the nature of the data, but also according to whether the analysis focuses on the effectiveness or the safety of the device. Selection of an appropriate method must also take into account the nature of the device under study. The following sections outline some of the statistical methods available for comprehensive analysis of effectiveness data for in vitro diagnostic products and therapeutic devices, and for assessing safety-related data.
Effectiveness Analyses for Diagnostic Devices. In vitro diagnostic devices require statistical techniques that are quite specialized. Usually the analysis is based on a specimen, such as a vial of blood, collected from a patient. The same specimen is analyzed by two or more laboratory methods to detect an analyte that is related to the presence of a condition or disease. Thus, each specimen results in a pair of measurements that are related to one another. In the case of a new method devised to detect the amount of serum cholesterol, for example, each blood sample would be used to produce two measures of serum cholesterol, one from the conventional method and one from the new method.
The statistical treatment of such related (or correlated) data is very different from that of unrelated (or uncorrelated) data because both measurements are attempting to measure exactly the same thing in the same individual. Generally, if both laboratory measurements result in a quantitative variable, the first analysis attempts to measure the degree of relationship between the measurements. The usual practice is to perform a simple linear regression analysis that assumes that the pairs of values resulting from the laboratory tests are related in a linear way.
In linear regression analysis, a best-fit line through the data is found statistically, and the slope is tested to determine whether it is statistically different from zero. A finding that the slope differs from zero indicates that the two variables are related, and careful attention should be paid to the correlation coefficient, a measure of the closeness of the points to the best-fit line. A correlation coefficient with a high value, either positive or negative, indicates a strong linear relationship between the two variables being compared. However, this correlation is an imperfect measure of the degree of relationship between the two measurements (i.e., although a good correlation with a coefficient near one may not indicate good agreement between the two measurements, a low correlation is almost surely indicative of poor agreement).
Although correlation can indicate whether there is a linear relationship between two laboratory measurements, it does not provide good information concerning their degree of equivalence. Perfect equivalence would be shown if the correlation were very near one, the slope very near one, and the intercept very near zero. It is possible to have a very good relationship between the two measures, but still have a slope that is statistically very different from one and an intercept that is very different from zero. Such a situation usually suggests that one of the two measurements is biased relative to the other.
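The distinction between a strong linear relationship and true equivalence can be seen in a small Python sketch. The cholesterol values are hypothetical and are constructed so that the new method reads 10% high plus a constant offset; the function name is likewise an assumption for illustration.

```python
import math

def simple_linear_regression(x, y):
    """Least-squares slope, intercept, and correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical serum cholesterol results (mg/dL) on the same specimens.
conventional = [150.0, 170.0, 190.0, 210.0, 230.0, 250.0]
new_method = [1.10 * c + 8.0 for c in conventional]  # deliberately biased

slope, intercept, r = simple_linear_regression(conventional, new_method)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.3f}")
```

In this constructed case the correlation is essentially perfect, yet the slope differs from one and the intercept differs from zero, which is the pattern the text describes as one measurement being biased relative to the other.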
If the conventional method used in the testing is a true "gold standard" or reference method, it may be possible to adjust the chemical or electronic measurement system of the device being evaluated to make the slope one and the intercept zero. If the conventional method is not a reference method or gold standard, then the sponsor is faced with the possibility that the new method under test may be better than the one to which it is being compared. In such a situation, tinkering with the device to force equivalence may be inadvisable.
When the conventional method is not a reference method or gold standard, the degree of agreement can be assessed by another method that goes beyond regression analysis. Recognizing that the absence of a gold standard means that the conventional method is imperfect, Bland and Altman devised a technique that compares the difference between the two measurements plotted against their mean (see bibliography, p. 56). The analyst establishes a confidence interval for the difference between the two measurements and assesses the number of differences falling within the interval. If the number is similar to that predicted by theory, and the width of the interval is small enough to be clinically acceptable, then the new measurement system is considered to be in good agreement with the conventional method. However, the determination that an interval's width is clinically acceptable cannot be established by statistical techniques and must involve the judgment of a health professional.
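The difference-versus-mean comparison can be sketched as follows. The interval is computed here as the mean difference plus or minus 1.96 standard deviations of the differences, which is commonly called the limits of agreement; that formula, the function name, and the paired readings are assumptions for this illustration.

```python
import statistics

def bland_altman(new, conventional, z=1.96):
    """Summarize agreement between two paired measurement methods."""
    diffs = [a - b for a, b in zip(new, conventional)]
    means = [(a + b) / 2 for a, b in zip(new, conventional)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    inside = sum(1 for d in diffs if lower <= d <= upper)
    return {
        "points": list(zip(means, diffs)),  # what would be plotted
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "inside": inside,
        "n": len(diffs),
    }

# Hypothetical paired cholesterol readings (mg/dL).
new_readings = [152, 171, 188, 214, 229, 252, 198, 205]
conventional_readings = [150, 170, 190, 210, 230, 250, 200, 204]

result = bland_altman(new_readings, conventional_readings)
print(f"mean difference = {result['bias']:.3f}")
print(f"interval = ({result['lower']:.2f}, {result['upper']:.2f})")
print(f"{result['inside']} of {result['n']} differences fall inside")
```

Whether an interval of this width is acceptable remains a clinical judgment, as the text notes; the code only supplies the numbers on which that judgment is made.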
Establishing agreement between the quantitative measures is only the first step in the analysis of an in vitro diagnostic device. Since these devices and those that are designed to give qualitative results are diagnostic, the analyst must also assess the ability of the device to detect the condition. Such an assessment requires that a value (a cutoff value) that specifies the disease state or condition has been identified for each measurement system. It is critical that this value be established on a different set of data from the measurements currently under analysis; it is unacceptable to use a value that characterizes a disease state by reference to its own data set.
The next step is to classify the patients into two groups, those with the condition and those without it. This is performed for both the new method and the conventional method by reference to a qualitative outcome or by use of the cutoff value. The result is a two-by-two table in which the four cells represent the number of patients found negative for the disease or condition by both measurement methods, the number found positive by the conventional method but negative by the new method, the number found negative by the conventional method but positive by the new method, and the number found positive by both methods. From this table it is possible to estimate the sensitivity, specificity, predictive value positive, and predictive value negative, along with their respective confidence intervals. These values are usually compared with those for other classification systems for the disease or condition under test to determine whether they are close to those known values.
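The four measures named above follow directly from the cells of the two-by-two table. In this sketch the conventional method is treated as the reference classification, and the counts are hypothetical; confidence intervals for each proportion would be added in a full analysis.

```python
def diagnostic_summary(both_pos, new_pos_only, conv_pos_only, both_neg):
    """Sensitivity, specificity, and predictive values of the new method,
    taking the conventional method as the reference classification.

    both_pos      -- positive by both methods
    new_pos_only  -- positive by the new method, negative by the conventional
    conv_pos_only -- positive by the conventional method, negative by the new
    both_neg      -- negative by both methods
    """
    return {
        "sensitivity": both_pos / (both_pos + conv_pos_only),
        "specificity": both_neg / (both_neg + new_pos_only),
        "predictive_value_positive": both_pos / (both_pos + new_pos_only),
        "predictive_value_negative": both_neg / (both_neg + conv_pos_only),
    }

# Hypothetical counts for 200 patients.
summary = diagnostic_summary(both_pos=45, new_pos_only=10,
                             conv_pos_only=5, both_neg=140)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The predictive values depend on how common the condition is in the study sample, so they carry over to other populations only when the prevalence is similar.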
The next step in the analysis of diagnostic devices involves either a relative risk assessment or a receiver operating characteristic (ROC) analysis. There is software available to perform either of these analyses. The relative risk is a ratio of the risk of the disease among patients with a positive test value to the risk of disease among patients with a negative test value. The relative risk analysis is particularly effective and can be done by use of either a logistic regression or a Cox regression depending on whether the patients have constant or variable follow-up, respectively. ROC analysis provides a measure of the robustness of the cutoff value as a function of sensitivity and specificity.
These techniques, described more fully below, allow the analysis of the measurement method along with any potential influencing variable. If the final model, fit to the data, contains a statistically significant contribution that is attributable to the sponsor's measurement system--whether or not there are significant effects attributable to other covariates--the test method provides an independent means of assessing the disease or condition. The reason for this powerful interpretation is that the test resulting from these methods is based on a statistic that has been adjusted for the presence of other significant covariates.
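As an illustration of the ROC analysis mentioned above, the following sketch traces sensitivity against the false-positive rate (one minus specificity) as the cutoff value is varied, and summarizes the curve by its area. The test values and disease labels are hypothetical, and the trapezoidal area is one common summary, not the only one.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, sensitivity) pairs, one per candidate
    cutoff, calling a patient positive when score >= cutoff.
    labels: 1 = condition present, 0 = condition absent."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both diseased and disease-free patients are needed")
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical test values and true disease status.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

curve = roc_points(scores, labels)
print("area under the ROC curve:", area_under_curve(curve))
```

An area near 1 indicates that sensitivity and specificity remain high over a range of cutoffs, while an area near 0.5 indicates a test that discriminates no better than chance.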
Finally, if the device is diagnostic for a condition that takes a relatively long time to develop (such as cancer), the analyst should evaluate the lead time afforded by the device. Sometimes this evaluation is a simple mean with a corresponding confidence interval. For these types of devices to be effective, the interval should not include zero. In addition, the farther away the lower limit of the interval is from zero, the better.
Effectiveness Analysis for Therapeutic Devices. In-depth analysis of a therapeutic device usually involves hypothesis testing to determine whether the device maintains or improves the health of patients. In some cases, FDA may permit a sponsor to compare a particular device operating performance characteristic (OPC) to a test treatment. Even in such cases, however, the result will be a test of the hypothesis that the treatment is better than or equal to a constant, the OPC. Selection of an appropriate method for in-depth analysis of data from such trials or studies depends on many factors, such as:
- Is the primary variable quantitative or qualitative?
- Was the primary variable measured only once or on several occasions?
- What other variables could affect the measurement under evaluation?
- Are those other variables qualitative (ordered or not) or quantitative?
Quantitative Primary Variables. If the primary variable under evaluation is quantitative, selection of an appropriate method of analysis will depend on how many times that variable was measured and on the nature of any other variables that need to be considered. If there is only a single measurement for each variable, and there are no differences among the potential covariates belonging to the treated and control groups, the appropriate method of analysis may be a parametric or nonparametric ANOVA or t-test. For example, a study of a new cardiovascular stent that is expected to offer better protection against restenosis, with all other things being equal, could compare the six-month luminal diameter by this method.
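For a single quantitative measurement in two groups, one nonparametric option is an exact permutation test on the difference in means. The article does not name a specific nonparametric test, so the choice here is an assumption, as are the luminal diameters; the sketch enumerates every possible reassignment of patients to groups and is therefore practical only for small samples.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_total = len(group_a), len(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    arrangements = 0
    as_extreme = 0
    for idx in combinations(range(n_total), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / (n_total - n_a))
        arrangements += 1
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / arrangements

# Hypothetical six-month luminal diameters (mm).
stent = [2.9, 3.1, 3.0, 3.2, 3.3]
control = [2.5, 2.6, 2.4, 2.7, 2.8]
print(f"p = {exact_permutation_test(stent, control):.4f}")
```

The p-value is the fraction of reassignments that produce a difference at least as large as the one observed, so it relies on the randomization itself and not on an assumed distribution.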
The choice of an appropriate analytical method changes if the covariates belonging to the two comparison groups differ and are measured qualitatively. Such cases may require use of a more complex analysis of variance or an analysis of covariance (ANCOVA). The ANCOVA method is particularly suited to analyzing variables that are measured before and after treatment, assuming that the two measurements are related in a linear or approximately linear manner. Using ANCOVA, the statistician first adjusts the posttreatment measure for its relationship with the pretreatment measure, and then performs an analysis of variance. Using the example of the cardiovascular stent, ANCOVA would be a suitable method of analysis if the amount of improvement in the six-month luminal diameter of the artery treated by the stent depended on the original luminal diameter of the artery.
In medical device studies, outcome variables are often measured more than once for each study subject. Although there are very powerful methods of statistical analysis that can be applied to such situations, they require what statisticians call balance; that is, every time a variable is measured it must be measured for every patient. A balanced repeated measures ANOVA can be performed with or without covariates. With covariates, this method reveals the effect of each patient's covariate value on the outcome variable, the effect of time for each patient, and whether the effect of time for each patient is changed by different values of the covariate. Continuing with the stent example, a repeated measures ANOVA could be applied to measurements of luminal diameter taken before implantation and at 3, 6, 9, and 12 months after implantation, together with the location of the coronary lesions. In this case, the primary outcome variable is luminal diameter, and the covariate is the location of the lesions.
A repeated measures ANOVA also can be used if a few patients missed one or possibly two measurements. However, doing so requires the statistician to use sophisticated statistical algorithms in order to estimate the missing outcome measures, and these can present problems. To find solutions, it is sometimes necessary to restrict the data or make other assumptions that may weaken the resulting statistical conclusions.
Some studies result in a quantitative outcome variable and one or more quantitative covariates. In this situation, multiple regression methods are useful in evaluating outcome variables (called dependent variables), especially if the study involves several levels or doses of treatment as well as other factors (independent variables). Regression is a powerful analytical technique that enables the statistician to simultaneously assess the primary variables as well as any covariates.
The regression model is an equation in which the primary outcome variable is represented as a function of the covariates and other independent variables. The importance of each independent variable is assessed by determining whether its corresponding coefficient is significantly different from zero. If the coefficient is statistically different from zero, then that independent variable is considered to have an effect on the dependent variable and is kept in the model; otherwise, it is discarded. The final model includes only those variables found to be statistically related to the dependent variable. The model enables the statistician to determine the strength of each independent variable relative to the others as well as to the device treatment. In the stent example, a multiple regression analysis would be appropriate for data where the luminal diameter was measured twice (say, at baseline and at 6 months), and the length of patient lesions was measured as an independent variable.
Qualitative Primary Variables. For device studies in which the outcome variable is qualitative, other types of analysis must be employed. Some of these resemble the methods used to analyze quantitative variables. For instance, log-linear modeling can be used to develop the same types of evaluations for a qualitative outcome variable as ANOVA and ANCOVA provide for quantitative measures.
Log-linear modeling techniques are equivalent to such commonly used chi-square methods as the Cochran-Mantel-Haenszel method. They enable the statistician to compare the distribution of treatment and control patients within outcome classes; some techniques also make it possible to determine how consistent the influence of covariates is, and to adjust for that influence.
Because qualitative variables are represented by whole numbers, these methods require special algorithms in order to estimate quantities of interest. For most studies, finding solutions for estimating those quantities can be accomplished readily with the aid of complex computer programs. Occasionally, however, the solutions are not forthcoming without further restrictions on the data. Again, such restrictions tend to limit the statistical conclusions that can be inferred from the data.
Logistic regression methods are the qualitative counterparts to the multiple regression techniques described for quantitative variables. While the two methods include models and interpretations that correspond closely, logistic regression computations are not as straightforward as those for multiple regression. Still, they enable the statistician to determine relationships between the outcome variable and independent variables, and they offer an extremely powerful analytical tool that has markedly raised the level of sophistication for qualitative data analysis. Logistic regression allows the use of either quantitative or qualitative covariates, but requires that all study patients have a follow-up time that is essentially the same.
In logistic regression methods, a proportion is represented by a complex formula, a part of which is an expression resembling a multiple regression equation. By estimating the coefficients for the independent variables, including the device treatment, the statistician is able to determine whether a particular independent variable is statistically related to the dependent variable. The final model contains only those independent variables whose coefficients differ significantly from zero. Further, the logistic regression method estimates the odds ratio, a measure that approximates the relative risk when the outcome is uncommon, for each independent variable, adjusted for the presence of the other variables. For example, if the device were a special light designed to treat a fungus on the toenail, and if the logistic regression measured the rate of cure at 3 months after treatment, then an odds ratio of 7.9 for the treatment would imply that, adjusted for other variables in the final model, the odds of a cure at 3 months were 7.9 times as great for patients who had the treatment as for patients who did not have it.
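A simplified view of the odds ratio can be had from a single two-by-two table, without any covariate adjustment. The sketch below is not a logistic regression; it computes the unadjusted odds ratio with an approximate 95% confidence interval based on the standard error of the log odds ratio. The counts are hypothetical and were chosen so that the result is close to the 7.9 used in the toenail example.

```python
import math

def odds_ratio(cured_treated, not_cured_treated,
               cured_control, not_cured_control, z=1.96):
    """Unadjusted odds ratio with a log-based confidence interval."""
    a, b = cured_treated, not_cured_treated
    c, d = cured_control, not_cured_control
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical 3-month results: 30 of 40 treated patients cured,
# 11 of 40 untreated patients cured.
estimate, lower, upper = odds_ratio(30, 10, 11, 29)
print(f"odds ratio = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print(f"ratio of cure proportions = {(30 / 40) / (11 / 40):.2f}")
```

In these made-up data the odds ratio is about 7.9 while the ratio of the cure proportions is under 3, which shows why an odds ratio should be read as a ratio of odds whenever the outcome is common.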
The Cox regression method is another powerful technique for analyzing qualitative outcome measures. Adapted from survival analyses, this method can determine the effect of treatments and other potential covariates even when the data do not have the same follow-up time. It yields a model and results that are analogous to those of the logistic regression method, but are not limited to patient survival outcomes. With caution, it can be applied to any outcome that includes measurement of the time to a particular event, such as time to healing or cure. A powerful characteristic of the Cox regression method is that it keeps the patient in the analysis until he or she is censored--that is, drops out of the study. This can be a critical factor in small medical device studies, in which statistical power can be compromised by even a modest number of patients lost to follow-up.
Safety Analyses. As in the case of effectiveness analyses, the selection of statistical methods appropriate for safety analyses depends on many factors. If FDA and the study sponsor have a great deal of knowledge about the complications associated with a specific disease and its therapies, estimating the rate of complication with corresponding 95% confidence intervals may be appropriate. But if little is known about those complications, a more elaborate statistical treatment is required.
The most common method used to analyze complications is to compute freedom-from-complication rates by survival methods; one of the most commonly used analysis procedures for survival data is the Kaplan-Meier method. The popularity of this method is partly attributable to the fact that it measures the time to occurrence of a complication, and, like the Cox regression method, keeps people in the life table until they are censored. In addition, at the occurrence of each event, the Kaplan-Meier method provides an estimate of the event rate and its standard error, enabling the statistician to compute a confidence interval at each event time.
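The Kaplan-Meier calculation can be sketched in a few lines. The sketch below is illustrative only: the follow-up times and event flags are invented, the standard error uses Greenwood's formula (a common choice that the article does not name), and the 95% interval is the simple plus-or-minus 1.96 standard errors form.

```python
import math

def kaplan_meier(times, events):
    """Return (time, at_risk, n_events, freedom_rate, std_error) rows,
    one per distinct event time. events[i] is True for a complication
    and False for a censored observation."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # complications observed at time t
        c = 0  # patients censored at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            survival *= (n_at_risk - d) / n_at_risk
            if n_at_risk > d:
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
                se = survival * math.sqrt(greenwood_sum)
            else:
                se = float("nan")  # Greenwood's formula is undefined here
            rows.append((t, n_at_risk, d, survival, se))
        # Patients with an event or censored at t leave the risk set.
        n_at_risk -= d + c
    return rows

# Hypothetical data: months to first complication or to censoring.
times = [2, 3, 3, 5, 6, 8, 9, 12, 12, 12]
events = [True, True, False, True, False, True, False, False, False, False]

rows = kaplan_meier(times, events)
for t, n, d, s, se in rows:
    lower = max(0.0, s - 1.96 * se)
    upper = min(1.0, s + 1.96 * se)
    print(f"month {t:>2}: at risk {n:>2}, events {d}, "
          f"freedom from complication {s:.3f} "
          f"(95% CI {lower:.3f} to {upper:.3f})")
```

Note that censored patients still count in the number at risk up to the time they are censored, which is how the method keeps them in the analysis.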
A related method is the life table method, in which the study duration is divided into equal segments and the proportion of events and censored patients is evaluated for each segment. For example, if the study had a one-year duration, the life table could be viewed as 12 one-month segments. Calculation of the rates would depend on the number of patients that entered the study each month, the number of events that occurred in that month, the number of patients that dropped out of the study in that month, and the number of patients who went on to the next month. The event rate is calculated for each month rather than at the occurrence of each event, and the standard error is also determined, again allowing the computation of confidence intervals.
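The month-by-month bookkeeping described above can be sketched as follows. The counts are invented for illustration, and two choices are assumptions not stated in the article: patients who drop out during a month are counted as at risk for half of that month (the usual actuarial convention), and the standard error uses Greenwood's formula.

```python
import math

def life_table(n_start, events, withdrawals):
    """Return one row per interval:
    (interval, n_entering, n_events, n_withdrawn,
     interval_event_rate, cumulative_event_free, std_error)."""
    rows = []
    n_entering = n_start
    cumulative = 1.0
    greenwood_sum = 0.0
    for interval, (d, w) in enumerate(zip(events, withdrawals), start=1):
        # Assumption: withdrawals are at risk for half the interval.
        effective_n = n_entering - w / 2.0
        q = d / effective_n if effective_n > 0 else 0.0
        cumulative *= 1.0 - q
        if effective_n > 0 and q < 1.0:
            greenwood_sum += q / (effective_n * (1.0 - q))
        se = cumulative * math.sqrt(greenwood_sum)
        rows.append((interval, n_entering, d, w, q, cumulative, se))
        # Patients going on to the next interval.
        n_entering -= d + w
    return rows

# Hypothetical one-year study split into 12 one-month segments.
events      = [3, 2, 2, 1, 1, 1, 0, 1, 0, 1, 0, 0]
withdrawals = [1, 0, 2, 1, 0, 1, 2, 0, 1, 0, 1, 3]

table = life_table(100, events, withdrawals)
for month, n, d, w, q, s, se in table:
    print(f"month {month:>2}: entered {n:>3}, events {d}, dropped out {w}, "
          f"monthly rate {q:.4f}, event-free {s:.3f} (SE {se:.3f})")
```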
If it is necessary to test the hypothesis that two samples (such as a control and treated group) have the same complication experience for the study duration in the presence of covariates, this can be accomplished by comparing survival (freedom from complication) rates derived through use of the Cochran-Mantel-Haenszel method or an equivalent procedure. Cox regression provides a good method with which to determine the relative importance of covariates on a complication rate.
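As a rough illustration of the Cochran-Mantel-Haenszel idea, the sketch below pools several 2 x 2 tables, one per covariate stratum, into a single adjusted odds ratio and a single chi-square test with one degree of freedom. The counts and the choice of strata are hypothetical, no continuity correction is applied, and the code assumes every stratum has at least two patients and that the pooled variance is not zero.

```python
import math

def cochran_mantel_haenszel(strata):
    """Each stratum is (a, b, c, d):
    a = treated with complication,  b = treated without,
    c = control with complication,  d = control without.
    Returns (pooled_odds_ratio, chi_square_statistic, p_value)."""
    sum_a = sum_expected = sum_variance = 0.0
    or_numerator = or_denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        sum_a += a
        # Expected count and variance of a under the null hypothesis.
        sum_expected += (a + b) * (a + c) / n
        sum_variance += ((a + b) * (c + d) * (a + c) * (b + d)
                         / (n * n * (n - 1)))
        or_numerator += a * d / n
        or_denominator += b * c / n
    statistic = (sum_a - sum_expected) ** 2 / sum_variance
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return or_numerator / or_denominator, statistic, p_value

# Hypothetical strata, e.g. two levels of a baseline covariate.
strata = [(4, 46, 9, 41), (6, 24, 12, 18)]
pooled_or, chi_square, p = cochran_mantel_haenszel(strata)
print(f"pooled odds ratio {pooled_or:.3f}, "
      f"chi-square {chi_square:.2f}, p = {p:.4f}")
```

When the strata are the successive event times of a survival analysis rather than covariate levels, the same calculation is the basis of the log-rank test.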
Such analytical methods are useful for comparing the rates at which a treated and control group encounter their first occurrence of a complication, but the occurrence of multiple complications, or of multiple occurrences of the same complication, does not lend itself readily to a single appropriate analytical technique. A combination of nonindependent analyses is usually required to completely explain the effects of multiple events.
ANALYTICAL INTERPRETATION
Every statistical analysis has limitations that place boundaries on its interpretation of study data. The writer of the sponsor's submission must be aware of those limitations and write statements consistent with them. For example, numerical relationships detected as statistically significant by regression techniques are associations, not cause-and-effect relationships. To support the associative evidence provided by such analyses, the sponsor should also make use of preclinical animal studies and other data that reinforce the determination of cause-and-effect. On the basis of regression analysis, it would be incorrect to say that each unit of treatment caused the outcome variable to change a certain number of units unless all the information associated with the effect indicated causality.
A common misinterpretation involves the confidence interval. Nonstatisticians will say, "There is a 95% chance that the true mean lies in the given interval." Technically, this is incorrect. The proper statistical interpretation is that the true mean will be contained in intervals constructed just like the one obtained from the given sample 95% of the time on repeated sampling. There is a subtle but important difference in the two statements. The first statement is false. After one has calculated the interval, the true mean of the population being sampled is either in the interval or it is not. If it is, then the probability of being in the interval is one; if it is not, then the probability is zero.
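The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under simplifying assumptions that are not part of the article: the data are drawn from a normal distribution whose true mean and standard deviation are known, the interval is the mean plus or minus 1.96 standard errors, and all numerical values are hypothetical.

```python
import math
import random

def coverage_of_95_percent_intervals(true_mean, true_sd, sample_size,
                                     n_repeats, seed=0):
    """Fraction of repeated-sample 95% intervals that contain true_mean."""
    rng = random.Random(seed)
    # Known-sigma interval: half-width is the same for every sample.
    half_width = 1.96 * true_sd / math.sqrt(sample_size)
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(sample_size)]
        sample_mean = sum(sample) / sample_size
        # Any single interval either contains the true mean or it does not.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / n_repeats

coverage = coverage_of_95_percent_intervals(
    true_mean=10.0, true_sd=2.0, sample_size=30, n_repeats=5000)
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
```

The 95% figure describes the long-run behavior of the procedure across the repeated samples, not the chance that any one computed interval is correct.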
In this short article, it is not possible to consider all the limitations that might affect a particular analytical method. The primary message here is that study sponsors should be aware that these limitations exist, and that the written description of the study results must be consistent with the limitations of the analytical technique used.
CONCLUSION
The best recommendation one can give sponsors of medical devices concerning the analysis of clinical data is to use the analytical method that is most consistent with the data and that best responds to the study objective. In the device submission, relate the analysis to the intended claim for the device and interpret the results in the light of the claim, considering the limitations inherent in the analytical method. Finally, fully document and reference the methods used; provide the statistical analysis in a technical appendix and describe the results, in English, in the body of the clinical study section.
Richard P. Chiacchierini is vice president for statistical services at C. L. McIntosh and Associates, Inc. (Rockville, MD).
BIBLIOGRAPHY
Agresti A, Categorical Data Analysis, New York, Wiley, 1990.
Bland J, and Altman D, "Statistical Methods for Assessing Agreement between Two Methods of Clinical Measurement," Lancet, February 8, pp 307-310, 1986.
Fleiss J, The Design and Analysis of Clinical Experiments, New York, Wiley, 1986.
Fleiss J, Statistical Methods for Rates and Proportions, New York, Wiley, 1981.
Friedman L, Furberg C, and DeMets D, Fundamentals of Clinical Trials, St. Louis, Mosby Year Book, 1985.
Hahn G, and Meeker W, Statistical Intervals: A Guide for Practitioners, New York, Wiley, 1991.
Kleinbaum D, Logistic Regression, New York, Springer-Verlag, 1994.
Lee E, Statistical Methods for Survival Data Analysis, Belmont, CA, Lifetime Learning Publications, 1980.
Meinert C, Clinical Trials: Design, Conduct, and Analysis, New York, Oxford University Press, 1986.
Myers R, Classical and Modern Regression with Applications, 2nd ed, Belmont, CA, Duxbury Press, 1990.
GLOSSARY OF STATISTICAL TERMS
Analysis of Covariance (ANCOVA). An analytical method for quantitative variables that adjusts for the presence of covariates before performing an analysis of variance on the groups being compared.
Analysis of Variance (ANOVA). An analytical method that compares the means of groups by analyzing each group's contribution to the overall uncertainty of the data, the variance.
Balance. The condition in a study in which all subgroups being analyzed have equal numbers of patients.
Chi-Square Methods. A group of qualitative variable techniques whose results are compared to values found in a theoretical Chi-square distribution table.
Cochran-Mantel-Haenszel Method. A Chi-square method that permits statistical comparison of odds ratios across subgroups and also allows differences in those ratios to be adjusted.
Correlation Coefficient. In linear regression, a measure of the closeness of data points to the best-fit line. It can assume a value between -1 and +1; the nearer the value to either -1 or +1, the nearer are the points to the line.
Cox Regression Method. An analytical method in which event data for each group under comparison are transformed to fit a linear model. Models for each group are then compared to determine whether they are equal. This method assumes that hazard rates for each group are at least proportional to each other.
Distribution (or Probability Distribution). A mathematical function characterized by constants, called parameters, that relate the values that a variable can assume to the probability that a particular value will occur.
Kaplan-Meier Method (or Product Limit Method). A method for analyzing survival data, based on the distribution of variable time periods between events (or deaths).
Linear Regression Method. For a single item, a method for determining the best-fit line through points representing the paired values of two measurement systems (one representing a dependent variable and the other representing an independent variable). Under certain conditions, statistical tests of the slope and intercept can be made, and confidence intervals about the line can be computed.
Life Table Method. A method for analyzing survival data, based on the proportion of study subjects surviving to fixed time intervals after treatment or study initiation.
Log-Linear Modeling Techniques. Methods for analyzing qualitative data in which a function of the probability that a particular event will occur is logarithmically transformed to fit a linear model.
Logistic Regression Method. A specialized log-linear modeling technique in which the logarithm of the proportion of a group having a particular characteristic, divided by one minus that proportion, is fit into a multiple regression linear model.
Multiple Regression Analysis. A multivariate extension of linear regression in which two or more independent variables are fit into a best linear model of a dependent variable.
Nonparametric Tests. Hypothesis tests that do not require data to be consistent with any particular theoretical distribution, such as the normal distribution.
Normal (Gaussian) Distribution. A theoretical distribution characterized by the traditional bell-shaped curve. It is fully characterized by its mean and standard deviation.
Paired T-Test. A test in which two related samples (such as before and after measurements) arise from a study; the test is based on the difference between the sample values, and the test statistic is called a Student's t.
Parametric Test. A hypothesis test that requires data to conform to some well-known theoretical distribution, such as the normal distribution.
Relative Risk Assessment. An evaluation of the risk of disease in a patient who possesses a certain characteristic relative to one who does not possess that characteristic. Relative risk can be assessed as a property of a clinical test.
Repeated Measures Analysis of Variance. An ANOVA that analyzes two or more related measurements of the same variable.
T-Test. A hypothesis test based on the theoretical Student's t distribution.