
Statistical Significance vs. Equivalence: What Clinical Investigations Really Show

At medXteam, the focus is on clinical data. As a CRO, we not only conduct clinical trials with medical devices in accordance with MDR and ISO 14155, but also offer all other options and forms of data collection, product approval and market surveillance. The focus of clinical trials is on the data collected, the evaluation of that data and the interpretation of the results. A common mistake at this stage is to treat the absence of a statistically significant difference between two treatments or products as evidence of their equivalence. In this blog post we examine why a non-significant difference does not mean equivalence and what consequences this can have for clinical studies of medical devices.
 
Underlying regulations
 
EU Regulation 2017/745 (MDR)
ISO 14155
 
1. Introduction
 
An essential step after collecting data in clinical trials is their evaluation. Testing statistical significance or equivalence plays a crucial role here, depending on the nature of the study and the aim of the investigation. Statistical significance refers to whether the observed results are likely due to a real effect rather than random fluctuations. Equivalence, on the other hand, means that two treatments or products can be considered equivalent because their differences are not clinically relevant.
 
2. What does a non-significant difference mean?

A non-significant difference in a clinical trial means that the observed difference between two groups is not large enough to rule out chance as an explanation with sufficient confidence. Typically, a p-value greater than 0.05 is considered not significant. The p-value indicates how likely it is, under the null hypothesis, to obtain data at least as extreme as those actually observed. The significance level (usually 0.05) is the threshold below which the p-value is considered small enough to reject the null hypothesis.

Example:

A clinical study compares a new implant with an existing implant and finds a p-value of 0.08. This means that, if there were in fact no difference between the implants, data at least as extreme as those observed would be expected in about 8% of studies. Since the p-value is above the pre-specified significance level of 0.05, the difference is considered not significant.
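
As a minimal sketch of how such a p-value is obtained in practice, the following Python example runs a two-sample t-test on hypothetical data; the group names, scores and sample sizes are invented purely for illustration:

# Minimal sketch with hypothetical data: two-sample t-test comparing
# a functional score between a new and an existing implant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
new_implant = rng.normal(loc=72.0, scale=10.0, size=40)       # hypothetical scores
existing_implant = rng.normal(loc=70.0, scale=10.0, size=40)  # hypothetical scores

t_stat, p_value = stats.ttest_ind(new_implant, existing_implant)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

alpha = 0.05
if p_value >= alpha:
    # Not significant - but this alone says nothing about equivalence.
    print("No statistically significant difference at the 5% level.")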

3. Why is this not equivalent to equivalence?

In contrast to testing for a statistically significant difference, equivalence testing aims to show that the differences between two treatments or products are so small that they lie within a clinically acceptable range. This is achieved through specific study designs such as equivalence or non-inferiority studies.

Equivalence studies:

These studies set two predefined limits (equivalence limits) within which the difference between treatments must lie to be considered equivalent. The goal is to show that the effectiveness or safety of the new product does not differ from that of the established product by more than a clinically acceptable amount.

Non-inferiority studies:

These studies check whether the new product is not unacceptably worse than the existing product. Only a single margin is defined: the new product's performance must not fall below this non-inferiority limit.
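
To make the non-inferiority hypothesis concrete, here is a small Python sketch on hypothetical data; the margin, scores and sample sizes are assumptions chosen purely for illustration:

# Sketch of a non-inferiority test on hypothetical data.
# H0: the new product is worse than the reference by more than the margin delta.
# H1: the new product is at most delta worse than the reference (non-inferior).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
new = rng.normal(69.5, 10.0, 60)   # hypothetical outcome scores, new product
ref = rng.normal(70.0, 10.0, 60)   # hypothetical outcome scores, reference

delta = 5.0                        # assumed non-inferiority margin
diff = new.mean() - ref.mean()
se = np.sqrt(new.var(ddof=1) / len(new) + ref.var(ddof=1) / len(ref))
df = len(new) + len(ref) - 2       # simple approximation of the degrees of freedom

# One-sided test of H0: diff <= -delta  vs  H1: diff > -delta
t_stat = (diff + delta) / se
p_value = 1 - stats.t.cdf(t_stat, df)
print(f"difference = {diff:.2f}, p (non-inferiority) = {p_value:.4f}")
# p < 0.05 would support non-inferiority at the 5% (one-sided) level.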

4. Differences in methodology

4.1 Null hypothesis

When testing for statistically significant differences, the null hypothesis is usually that there is no difference. In equivalence studies, however, the null hypothesis is that the treatments are not equivalent. The study must provide enough evidence to refute this null hypothesis.

Statistical significance tests play a central role in both types of studies, but the objectives and interpretation of the results differ. In classic tests of statistical significance, one looks for evidence that an observed difference did not occur by chance. The null hypothesis is rejected if a statistically significant difference is found (p-value < α).

In equivalence studies, however, the null hypothesis is that the treatments are not equivalent, i.e. that the true difference is larger than a clinically acceptable margin. To refute this null hypothesis, the study must show that the differences between treatments are small enough to fall within a predefined equivalence range. Statistical significance is also tested here, typically with two one-sided tests (TOST), which corresponds to checking a 90% confidence interval. The results must show that this confidence interval of the difference lies entirely within the equivalence range for equivalence to be demonstrated.

So in both cases statistical significance is used, but with different goals and interpretations.
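
The following Python sketch illustrates the TOST idea on hypothetical data; the equivalence limits, scores and sample sizes are assumptions for illustration only:

# Sketch of an equivalence test via two one-sided tests (TOST),
# using hypothetical data and assumed equivalence limits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
new = rng.normal(70.2, 8.0, 80)    # hypothetical outcome scores, new product
ref = rng.normal(70.0, 8.0, 80)    # hypothetical outcome scores, reference

low, upp = -3.0, 3.0               # assumed equivalence limits

diff = new.mean() - ref.mean()
se = np.sqrt(new.var(ddof=1) / len(new) + ref.var(ddof=1) / len(ref))
df = len(new) + len(ref) - 2

# Test 1: H0: diff <= low  vs  H1: diff > low
p_lower = 1 - stats.t.cdf((diff - low) / se, df)
# Test 2: H0: diff >= upp  vs  H1: diff < upp
p_upper = stats.t.cdf((diff - upp) / se, df)

p_tost = max(p_lower, p_upper)     # both one-sided tests must reject
print(f"difference = {diff:.2f}, TOST p-value = {p_tost:.4f}")
# p_tost < 0.05 would support equivalence within the assumed limits.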

4.2 Confidence intervals

When testing for significant differences, confidence intervals are used to express the uncertainty of the estimate. In equivalence studies, by contrast, the confidence interval of the difference is compared against the established equivalence limits: if the entire confidence interval lies within these limits, equivalence can be assumed.

These differences in methodology make it clear that the mere absence of a statistically significant difference is not sufficient to demonstrate equivalence. There are other factors that must be taken into account to ensure correct interpretation of the study results.

4.3 Lack of power of the study

A study with a small sample size or insufficient power may miss true differences. The lack of a significant difference may therefore simply be due to the study not being sufficiently powered to detect it. This is where sample size planning comes into play: it is crucial to ensuring adequate power. The power of a study is the probability that it will detect a real effect if one actually exists. Without appropriate sample size planning, there is a risk that a study will be unable to detect differences, even if they exist, because too few participants were included.
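
As a rough sketch of such sample size planning, the following Python example uses statsmodels; the effect size, significance level and target power are assumed planning values chosen for illustration:

# Sketch of sample size planning for a two-sample comparison,
# with assumed (illustrative) planning parameters.
from statsmodels.stats.power import TTestIndPower

effect_size = 0.5   # assumed standardized difference (Cohen's d)
alpha = 0.05        # two-sided significance level
power = 0.80        # desired probability of detecting the effect

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size,
                                    alpha=alpha, power=power,
                                    alternative='two-sided')
print(f"Required sample size per group: {n_per_group:.0f}")
# With d = 0.5, alpha = 0.05 and 80% power this is roughly 64 per group.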

4.4 Confidence intervals and uncertainty of the estimate

A non-significant difference can be associated with a wide confidence interval that is compatible with both clinically important differences and no difference at all. This reflects the uncertainty of the estimate and does not suggest equivalence.

4.5 False null hypothesis

The null hypothesis in most studies is that there is no difference. Failure to reject this null hypothesis does not prove that there is no difference; it only means that the data do not provide enough evidence to demonstrate one.

5. Examples of problems in clinical trials of medical devices

5.1 Comparison of two implants

In a study evaluating a new hip implant against an established product, a p-value of 0.06 was found. Although the difference is not statistically significant, the new implant could still be less effective or less safe. A wide confidence interval could range from substantial superiority to clinically relevant inferiority.
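
A purely hypothetical numerical sketch of this situation (the scores and sample sizes are invented, not taken from any real study):

# Hypothetical sketch: a non-significant difference with a wide
# confidence interval that is compatible with both benefit and harm.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
new_hip = rng.normal(68.0, 15.0, 25)   # small, hypothetical samples
old_hip = rng.normal(72.0, 15.0, 25)

diff = new_hip.mean() - old_hip.mean()
se = np.sqrt(new_hip.var(ddof=1) / 25 + old_hip.var(ddof=1) / 25)
df = 25 + 25 - 2
t_crit = stats.t.ppf(0.975, df)

ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
print(f"difference = {diff:.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
# A wide interval like this may include clinically relevant inferiority
# as well as superiority, despite p > 0.05.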

5.2 Evaluation of a new diagnostic device

A new diagnostic device is tested against a standard device and the results show a p-value of 0.09. This does not mean that both devices are equally good, only that the study did not find enough evidence to demonstrate a difference. The study may not have been large enough to detect small but clinically relevant differences.

6. How should equivalence be checked?

6.1 Equivalence and non-inferiority studies

To test equivalence, specific study designs such as equivalence or non-inferiority studies must be used. These studies have specific hypotheses and statistical methods to show that the differences between treatments are within a predefined tolerance limit.

Example:

An equivalence study could define that the new implant is clinically equivalent if the difference in functionality is within a range of ± 2% compared to the standard implant.

6.2 Confidence intervals and equivalence limits

Instead of just looking at p-values, confidence intervals should also be considered. If the entire confidence interval lies within the predefined equivalence limits, equivalence can be assumed.
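
A minimal sketch of this check, assuming hypothetical success counts and an equivalence margin of ±2 percentage points; the 90% confidence interval corresponds to the TOST convention:

# Sketch: check whether the 90% CI of a difference in success rates
# lies within assumed equivalence limits of +/- 2 percentage points.
# All counts are hypothetical.
import numpy as np
from scipy import stats

successes_new, n_new = 188, 200    # hypothetical results, new implant
successes_ref, n_ref = 186, 200    # hypothetical results, reference

p_new, p_ref = successes_new / n_new, successes_ref / n_ref
diff = p_new - p_ref
se = np.sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)

z = stats.norm.ppf(0.95)           # 90% two-sided CI (TOST convention)
ci_low, ci_high = diff - z * se, diff + z * se

margin = 0.02                      # assumed equivalence limit of +/- 2%
equivalent = (ci_low > -margin) and (ci_high < margin)
print(f"difference = {diff:.3f}, 90% CI = [{ci_low:.3f}, {ci_high:.3f}]")
print(f"Equivalence supported within +/- {margin:.0%}: {equivalent}")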

7. Practical steps to avoid misunderstandings

Clear study design:

The study should clearly define whether it aims to find differences (superiority study) or to prove equivalence or non-inferiority. This influences the choice of statistical methods and the interpretation of the results.

Adequate sample size:

A sufficient sample size is crucial to ensure the power of the study. This helps detect real differences and avoid false negatives.

Predefined equivalence limits:

Before starting the study, clear equivalence limits should be established based on clinical considerations. This helps to better assess the clinical relevance of the results.

8. Conclusion

The absence of a statistically significant difference in clinical trials does not automatically mean that the medical devices tested are equivalent. Specific study designs and statistical methods are required to demonstrate equivalence. Careful planning and interpretation of study results are crucial to assess the true effectiveness and safety of medical devices. This is the only way we can ensure that new products meet the high standards of clinical practice and offer real benefits for patients.

9. How we can help you

Our statisticians accompany you from data collection through analysis to interpretation of the results, so you are on the safe side.

As CRO, we support you throughout the entire process of generating and evaluating clinical data and in the approval and market monitoring of your product. And we start with the clinical strategy! We also create the complete clinical evaluation file for you.

In the case of clinical trials, we determine together with you whether a clinical trial needs to be carried out and, if so, which one, under what conditions and according to which requirements. We clarify this as part of the pre-study phase: in three steps, we determine the correct and cost-effective strategy for the clinical data collection required in your case.

If a clinical trial is to be carried out, basic safety and performance requirements must first be met. The data from the clinical trial then flow into the clinical evaluation, which in turn forms the basis for post-market clinical follow-up (PMCF) activities (including a PMCF study if necessary).

In addition, all medical device manufacturers require a quality management system (QMS), including when developing Class I products.

We support you throughout your entire project with your medical device, starting with a free initial consultation, help with the introduction of a QM system, study planning and implementation, through to technical documentation - always with primary reference to the clinical data on the product: from beginning to end.

Do you already have some initial questions?

You can get a free initial consultation here: free initial consultation

medXteam GmbH

Hetzelgalerie 2 67433 Neustadt / Weinstraße
+49 (06321) 91 64 0 00
kontakt (at) medxteam.de