Evaluation of a geriatrics primary care model using prospective matching to guide enrollment

Abstract

Background

Few definitive guidelines exist for rigorous large-scale prospective evaluation of nonrandomized programs and policies that require longitudinal primary data collection. In the Veterans Affairs (VA) system we identified a need to understand the impact of a geriatrics primary care model (referred to as GeriPACT); however, randomization of patients to GeriPACT vs. a traditional primary care model (referred to as PACT) was not feasible because GeriPACT has been rolled out nationally, and the decision to transition from PACT to GeriPACT is made jointly by a patient and provider. We describe the study design used to evaluate the comparative effectiveness of GeriPACT vs. PACT on patient experience and quality of care metrics.

Methods

We used prospective matching to guide enrollment of GeriPACT-PACT patient dyads across 57 VA Medical Centers. First, we identified matches based on an array of administratively derived characteristics, using a combination of coarsened exact and distance-function matching on 11 key variables identified as potential confounders. Once a GeriPACT patient was enrolled, matched PACT patients were contacted for recruitment using pre-assigned priority categories based on the distance function; if eligible and consented, patients were enrolled and followed with telephone surveys for 18 months.
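As a rough illustration of this two-step logic (exact matching within coarsened strata, then distance-based prioritization of candidate controls), the Python sketch below uses hypothetical variables (age, number of medications, frailty score, site) and a plain standardized Euclidean distance; it does not reproduce the study's 11 matching variables or its actual distance function.

```python
# Sketch: coarsened exact matching followed by distance-based ranking of controls.
# Variable names and bins are illustrative, not the study's 11 matching variables.
import numpy as np
import pandas as pd

def coarsen(df):
    """Coarsen continuous variables into strata used for exact matching."""
    out = df.copy()
    out["age_bin"] = pd.cut(out["age"], bins=[0, 65, 75, 85, 120], labels=False)
    out["meds_bin"] = pd.cut(out["n_meds"], bins=[-1, 5, 10, 100], labels=False)
    return out

def rank_controls(treated_row, controls, cols):
    """Rank candidate controls by Euclidean distance on standardized covariates."""
    x = controls[cols].to_numpy(dtype=float)
    mu, sd = x.mean(axis=0), x.std(axis=0) + 1e-9
    z = (x - mu) / sd
    t = (treated_row[cols].to_numpy(dtype=float) - mu) / sd
    dist = np.sqrt(((z - t) ** 2).sum(axis=1))
    return controls.assign(distance=dist).sort_values("distance")

rng = np.random.default_rng(0)
pact = pd.DataFrame({"age": rng.normal(78, 6, 500),
                     "n_meds": rng.poisson(8, 500),
                     "frailty_score": rng.normal(0.3, 0.1, 500),
                     "site": "A"})
geripact = pd.DataFrame({"age": [81], "n_meds": [12],
                         "frailty_score": [0.35], "site": ["A"]})

controls = coarsen(pact)
treated = coarsen(geripact).iloc[0]

# Step 1: exact match on site and coarsened strata; step 2: rank by distance
candidates = controls[(controls["site"] == treated["site"]) &
                      (controls["age_bin"] == treated["age_bin"]) &
                      (controls["meds_bin"] == treated["meds_bin"])]
priority = rank_controls(treated, candidates, ["age", "n_meds", "frailty_score"])
print(priority.head())   # PACT candidates to contact first, in priority order
```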

Results

We successfully enrolled 275 matched dyads in near real-time, with a median time of 7 days between enrolling a GeriPACT patient and a closely matched PACT patient. Standardized mean differences of < 0.2 for nearly all baseline variables indicate excellent baseline covariate balance. The exceptional balance on survey-collected baseline covariates not available at the time of matching suggests our procedure successfully controlled for many known, but administratively unobserved, drivers of entrance to GeriPACT.

Conclusions

We present an important process to prospectively evaluate the effects of different treatments when randomization is infeasible and provide guidance to researchers who may be interested in implementing a similar approach. Rich matching variables from the pre-treatment period that reflect treatment assignment mechanisms create a high quality comparison group from which to recruit. This design harnesses the power of national administrative data coupled with collection of patient reported outcomes, enabling rigorous evaluation of non-randomized programs or policies.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01360-4


Sampling strategies to evaluate the prognostic value of a new biomarker on a time-to-event end-point

Abstract

Background

The availability of large epidemiological or clinical datasets storing biological samples allows the prognostic value of novel biomarkers to be studied, but efficient designs are needed to select a subsample on which to measure them, for reasons of parsimony and cost. Two-phase stratified sampling is a flexible approach to such sub-sampling, but literature on the stratification variables to use in the sampling and on power evaluation is lacking, especially for survival data.

Methods

We compared the performance of different sampling designs to assess the prognostic value of a new biomarker on a time-to-event endpoint, applying a Cox model weighted by the inverse of the empirical inclusion probability.
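A minimal sketch of such a weighted analysis, assuming the lifelines package and a toy two-phase case-control subsample; the column names, inclusion probabilities, and simulated data are illustrative, not the designs compared in the paper.

```python
# Sketch: Cox model weighted by the inverse of the empirical inclusion probability,
# fitted on a toy two-phase (case-control) subsample. Column names are illustrative.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 2000
cohort = pd.DataFrame({
    "biomarker": rng.normal(size=n),
    "time": rng.exponential(10, size=n),
})
cohort["event"] = (rng.uniform(size=n) < 0.3).astype(int)

# Phase 2: keep all cases and a random 25% of controls
p_incl = np.where(cohort["event"] == 1, 1.0, 0.25)
sampled = cohort[rng.uniform(size=n) < p_incl].copy()
sampled["ipw"] = 1.0 / p_incl[sampled.index]

# Weighted Cox fit; robust=True requests a sandwich variance suited to weighting
cph = CoxPHFitter()
cph.fit(sampled, duration_col="time", event_col="event",
        weights_col="ipw", robust=True)
cph.print_summary()
```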

Results

Our simulation results suggest that case-control sampling stratified (or post-stratified) by a surrogate variable of the marker can yield higher performance than simple random, probability-proportional-to-size, and case-control sampling. In the presence of a high censoring rate, the results showed an advantage of nested case-control and counter-matching designs in terms of design effect, although the use of a fixed ratio between cases and controls might be disadvantageous. On real data on childhood acute lymphoblastic leukemia, we found that optimal sampling using pilot data is highly efficient.

Conclusions

Our study suggests that, in our sample, case-control sampling stratified by a surrogate variable and nested case-control sampling yield estimates and power comparable to those obtained in the full cohort, while strongly decreasing the number of patients required. We recommend planning the sample size and using such sampling designs when exploring novel biomarkers in clinical cohort data.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01283-0

Evaluating complex interventions in context: systematic, meta-narrative review of case study approaches

Abstract

Background

There is a growing need for methods that acknowledge and successfully capture the dynamic interaction between context and implementation of complex interventions. Case study research has the potential to provide such understanding, enabling in-depth investigation of the particularities of phenomena. However, there is limited guidance on how and when to best use different case study research approaches when evaluating complex interventions. This study aimed to review and synthesise the literature on case study research across relevant disciplines, and determine relevance to the study of contextual influences on complex interventions in health systems and public health research.

Methods

Systematic meta-narrative review of the literature comprising (i) a scoping review of seminal texts (n = 60) on case study methodology and on context, complexity and interventions, (ii) detailed review of empirical literature on case study, context and complex interventions (n = 71), and (iii) identifying and reviewing ‘hybrid papers’ (n = 8) focused on the merits and challenges of case study in the evaluation of complex interventions.

Results

We identified four broad (and to some extent overlapping) research traditions, all using case study in a slightly different way and with different goals: 1) developing and testing complex interventions in healthcare; 2) analysing change in organisations; 3) undertaking realist evaluations; 4) studying complex change naturalistically. Each tradition conceptualised context differently—respectively as the backdrop to, or factors impacting on, the intervention; sets of interacting conditions and relationships; circumstances triggering intervention mechanisms; and socially structured practices. Overall, these traditions drew on a small number of case study methodologists and disciplines. Few studies problematised the nature and boundaries of ‘the case’ and ‘context’ or considered the implications of such conceptualisations for methods and knowledge production.

Conclusions

Case study research on complex interventions in healthcare draws on a number of different research traditions, each with different epistemological and methodological preferences. The approach used and its consequences for the knowledge produced often remain implicit. This has implications for how researchers, practitioners and decision makers understand, implement and evaluate complex interventions in different settings. Deeper engagement with case study research as a methodology is strongly recommended.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01418-3

Pre-statistical harmonization of behavioral instruments across eight surveys and trials

Abstract

Background

Data harmonization is a powerful method to equilibrate items across measures that evaluate the same underlying construct. There are multiple measures for evaluating dementia-related behavioral symptoms. Pre-statistical harmonization of behavioral instruments in dementia research is the first step toward developing a statistical crosswalk between measures. It is a crucial step that entails careful review, documentation and scrutiny of source data to ensure sufficient comparability between items prior to data pooling, yet studies that conduct pre-statistical harmonization of behavioral instruments rarely document their methods in a structured, reproducible manner. Here, we document the pre-statistical harmonization of items measuring behavioral and psychological symptoms among people with dementia, and provide a box of recommended procedures for future studies.

Methods

We identified behavioral instruments that are used in clinical practice, a national survey, and randomized trials of dementia care interventions. We rigorously reviewed question content and scoring procedures to establish sufficient comparability across items as well as item quality prior to data pooling. Additionally, we standardized coding to Stata-readable format, which allowed us to automate approaches to identify potential cross-study differences in items and low-quality items. To ensure reasonable model fit for statistical co-calibration, we estimated two-parameter logistic Item Response Theory models within each of the eight studies.

Results

We identified 59 items from 11 behavioral instruments across the eight datasets. We found considerable cross-study heterogeneity in administration and coding procedures for items that measure the same attribute. Discrepancies existed in the directionality and quantification of behavioral symptoms even for seemingly comparable items. We resolved item response heterogeneity, missingness, skewness, and conditional dependency prior to estimation of item response theory models for statistical co-calibration, using several rigorous data transformation procedures, including re-coding and truncation.
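The kind of re-coding and truncation involved can be illustrated with a small pandas sketch; the item names, scales, and recoding rules below are hypothetical examples, not the study's actual codebook.

```python
# Sketch of pre-statistical harmonization steps: align directionality,
# recode out-of-range values, and truncate to a common range before pooling.
# Item names, scales, and rules are hypothetical, not the study's codebook.
import pandas as pd

study_a = pd.DataFrame({"agitation": [0, 1, 2, 3, 9]})    # 0-3 scale, 9 = missing
study_b = pd.DataFrame({"calmness": [5, 4, 1, 2, 3]})      # 1-5 scale, higher = calmer

# Study A: recode the out-of-range "9" to missing
study_a["agitation"] = study_a["agitation"].where(study_a["agitation"] <= 3)

# Study B: reverse directionality so higher always means more symptomatic,
# then truncate to the common 0-3 range before pooling
study_b["agitation"] = (6 - study_b["calmness"]) - 1       # now 0-4, higher = worse
study_b["agitation"] = study_b["agitation"].clip(upper=3)

pooled = pd.concat([study_a[["agitation"]].assign(study="A"),
                    study_b[["agitation"]].assign(study="B")])
print(pooled)
```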

Conclusions

This study highlights the importance of each aspect involved in the pre-statistical harmonization process of behavioral instruments. We provide guidelines and recommendations for how future research may detect and account for similar issues in pooling behavioral and related instruments.

https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01431-6

Mediation analysis methods used in observational research: a scoping review and recommendations

Abstract

Background

Mediation analysis methodology has advanced considerably over the years, the most recent and important advancement being the development of causal mediation analysis based on the counterfactual framework. However, a previous review showed that the uptake of causal mediation analysis in experimental studies remains low. The aim of this paper is to review the methodological characteristics of mediation analyses performed in observational epidemiologic studies published between 2015 and 2019 and to provide recommendations for the application of mediation analysis in future studies.

Methods

We searched the MEDLINE and EMBASE databases for observational epidemiologic studies published between 2015 and 2019 in which mediation analysis was applied as one of the primary analysis methods. Information was extracted on the characteristics of the mediation model and the applied mediation analysis method.

Results

We included 174 studies, most of which applied traditional mediation analysis methods (n = 123, 70.7%). Causal mediation analysis was not often used to analyze more complicated mediation models, such as multiple mediator models. Most studies adjusted their analyses for measured confounders, but did not perform sensitivity analyses for unmeasured confounders and did not assess the presence of an exposure-mediator interaction.

Conclusions

To ensure a causal interpretation of the effect estimates in the mediation model, we recommend that researchers use causal mediation analysis and assess the plausibility of the causal assumptions. The uptake of causal mediation analysis can be enhanced through tutorial papers that demonstrate the application of causal mediation analysis, and through the development of software packages that facilitate the causal mediation analysis of relatively complicated mediation models.
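For readers looking for a starting point, the Mediation class in statsmodels implements counterfactual-based (causal) mediation analysis along these lines; the sketch below uses simulated data and illustrative variable names, and is not tied to any study reviewed here.

```python
# Sketch: counterfactual (causal) mediation analysis with statsmodels.
# Data are simulated; variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.mediation import Mediation

rng = np.random.default_rng(42)
n = 1000
exposure = rng.binomial(1, 0.5, n)
mediator = 0.5 * exposure + rng.normal(size=n)
outcome = 0.3 * exposure + 0.6 * mediator + rng.normal(size=n)
df = pd.DataFrame({"exposure": exposure, "mediator": mediator, "outcome": outcome})

# Outcome model includes exposure and mediator (confounders could be added);
# mediator model regresses the mediator on exposure (and confounders).
outcome_model = sm.OLS.from_formula("outcome ~ exposure + mediator", data=df)
mediator_model = sm.OLS.from_formula("mediator ~ exposure", data=df)

med = Mediation(outcome_model, mediator_model,
                exposure="exposure", mediator="mediator")
res = med.fit(n_rep=200)      # parametric (quasi-Bayesian) simulation by default
print(res.summary())          # ACME (indirect), ADE (direct), total effect
```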


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01426-3

Major adverse cardiovascular event definitions used in observational analysis of administrative databases: a systematic review

Abstract

Background

Major adverse cardiovascular events (MACE) are increasingly used as composite outcomes in randomized controlled trials (RCTs) and observational studies. However, it is unclear how observational studies most commonly define MACE in the literature when using administrative data.

Methods

We identified peer-reviewed articles published in MEDLINE and EMBASE between January 1, 2010 and October 9, 2020. Studies utilizing administrative data to assess the MACE composite outcome using International Classification of Diseases 9th or 10th Revision diagnosis codes were included. Reviews, abstracts, and studies not providing outcome code definitions were excluded. Data extracted included data source, timeframe, MACE components, code definitions, code positions, and outcome validation.

Results

A total of 920 articles were screened, 412 were retained for full-text review, and 58 were included. Only 8.6% (n = 5/58) matched the traditional three-point MACE RCT definition of acute myocardial infarction (AMI), stroke, or cardiovascular death. None matched four-point (+unstable angina) or five-point MACE (+unstable angina and heart failure). The most common MACE components were: AMI and stroke, 15.5% (n = 9/58); AMI, stroke, and all-cause death, 13.8% (n = 8/58); and AMI, stroke and cardiovascular death 8.6% (n = 5/58). Further, 67% (n = 39/58) did not validate outcomes or cite validation studies. Additionally, 70.7% (n = 41/58) did not report code positions of endpoints, 20.7% (n = 12/58) used the primary position, and 8.6% (n = 5/58) used any position.

Conclusions

Components of MACE endpoints and diagnostic codes used varied widely across observational studies. Variability in the MACE definitions used and information reported across observational studies prohibit the comparison, replication, and aggregation of findings. Studies should transparently report the administrative codes used and code positions, as well as utilize validated outcome definitions when possible.
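As a purely illustrative sketch of the kind of transparent reporting recommended here (explicit code lists plus code positions), the snippet below flags a toy MACE-style outcome using two example ICD-10 prefixes; it is not a validated MACE definition.

```python
# Illustrative sketch of flagging a MACE-style outcome from administrative
# diagnosis codes, with and without restriction to the primary code position.
# The code list is an example only, not a validated MACE definition.
import pandas as pd

mace_prefixes = ("I21", "I63")          # e.g., acute MI, ischaemic stroke (ICD-10)

claims = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "dx_code":    ["I21.0", "E11.9", "I63.9", "I10"],
    "dx_position": [1, 2, 3, 1],        # 1 = primary diagnosis
})

claims["mace_dx"] = claims["dx_code"].str.startswith(mace_prefixes)
claims["mace_primary"] = claims["mace_dx"] & (claims["dx_position"] == 1)

# Patient-level flags under two reporting choices: any position vs primary position
flags = claims.groupby("patient_id")[["mace_dx", "mace_primary"]].any()
print(flags)
```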

https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01440-5


Advancing data science in drug development through an innovative computational framework for data sharing and statistical analysis

Abstract

Background

Novartis and the University of Oxford’s Big Data Institute (BDI) have established a research alliance with the aim of improving health care and drug development by making them more efficient and targeted. Using a combination of the latest statistical machine learning technology and an innovative IT platform developed to manage large volumes of anonymised data from numerous sources and types, we plan to identify novel, clinically relevant patterns that cannot be detected by humans alone, in order to identify phenotypes and early predictors of patient disease activity and progression.

Method

The collaboration focuses on highly complex autoimmune diseases and is developing a computational framework to assemble a research-ready dataset across numerous modalities. For the multiple sclerosis (MS) project, the collaboration has anonymised and integrated phase II to phase IV clinical and imaging trial data from ≈35,000 patients across all clinical phenotypes, collected in more than 2200 centres worldwide. For the “IL-17” project, the collaboration has anonymised and integrated clinical and imaging data from over 30 phase II and III Cosentyx clinical trials including more than 15,000 patients suffering from four autoimmune disorders (psoriasis, axial spondyloarthritis, psoriatic arthritis (PsA) and rheumatoid arthritis (RA)).

Results

A fundamental component of successful data analysis, and of the collaborative development of novel machine learning methods on these rich data sets, has been the construction of a research informatics framework that captures the data at regular intervals, anonymises images and integrates them with the de-identified clinical data, applies quality control, and compiles everything into a research-ready relational database available to multi-disciplinary analysts. Collaborative development by a group of software developers, data wranglers, statisticians, clinicians, and domain scientists across both organisations has been key. The framework is innovative in that it facilitates collaborative data management and makes a complex clinical trial data set from a pharmaceutical company available to academic researchers who become associated with the project.

Conclusions

An informatics framework has been developed to capture clinical trial data into a pipeline of anonymisation, quality control, data exploration, and subsequent integration into a database. Establishing this framework has been integral to the development of analytical tools.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01409-4

Impact of the COVID-19 pandemic on publication dynamics and non-COVID-19 research production

Abstract

Background

The COVID-19 pandemic has severely affected health systems and medical research worldwide but its impact on the global publication dynamics and non-COVID-19 research has not been measured. We hypothesized that the COVID-19 pandemic may have impacted the scientific production of non-COVID-19 research.

Methods

We conducted a comprehensive meta-research on studies (original articles, research letters and case reports) published between 01/01/2019 and 01/01/2021 in 10 high-impact medical and infectious disease journals (New England Journal of Medicine, Lancet, Journal of the American Medical Association, Nature Medicine, British Medical Journal, Annals of Internal Medicine, Lancet Global Health, Lancet Public Health, Lancet Infectious Disease and Clinical Infectious Disease). For each publication, we recorded publication date, publication type, number of authors, whether the publication was related to COVID-19, whether the publication was based on a case series, and the number of patients included in the study if the publication was based on a case report or a case series. We estimated the publication dynamics with a locally estimated scatterplot smoothing method. A Natural Language Processing algorithm was designed to calculate the number of authors for each publication. We simulated the number of non-COVID-19 studies that could have been published during the pandemic by extrapolating the publication dynamics of 2019 to 2020, and comparing the expected number to the observed number of studies.
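A minimal sketch of the smoothing and extrapolation steps, using statsmodels' LOWESS on simulated weekly counts; the counts and the naive carry-forward expectation are illustrative only, not the study's estimates.

```python
# Sketch: smooth weekly publication counts with LOWESS and compare observed 2020
# output against an expectation extrapolated from 2019. Counts are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
weeks = np.arange(104)                               # 2019-2020, weekly
counts_2019 = rng.poisson(12, 52)
counts_2020 = rng.poisson(10, 52)                    # hypothetical non-COVID output

smoothed = sm.nonparametric.lowess(np.r_[counts_2019, counts_2020],
                                   weeks, frac=0.2, return_sorted=False)
print("peak of smoothed weekly counts at week", int(weeks[np.argmax(smoothed)]))

# Naive expectation for 2020: carry forward the 2019 mean weekly output
expected_2020 = counts_2019.mean() * 52
observed_2020 = counts_2020.sum()
print(f"relative change vs expected: "
      f"{(observed_2020 - expected_2020) / expected_2020:.1%}")
```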

Results

Among the 22,525 studies assessed, 6319 met the inclusion criteria, of which 1022 (16.2%) were related to COVID-19 research. A dramatic increase in the number of publications in general journals was observed from February to April 2020, from a weekly median number of publications of 4.0 (IQR: 2.8–5.5) to 19.5 (IQR: 15.8–24.8) (p < 0.001), followed by a pattern of stability with a weekly median number of publications of 10.0 (IQR: 6.0–14.0) until December 2020 (p = 0.045 in comparison with April). Two prototypical editorial strategies were found: 1) journals that maintained the volume of non-COVID-19 publications while integrating COVID-19 research and thus increased their overall scientific production, and 2) journals that decreased the volume of non-COVID-19 publications while integrating COVID-19 publications. Using simulation models, we estimated that the COVID-19 pandemic was associated with an 18% decrease in the production of non-COVID-19 research. We also found a significant change in publication type in COVID-19 research compared with non-COVID-19 research, illustrated by a decrease in the proportion of original articles (47.9% of COVID-19 publications vs 71.3% of non-COVID-19 publications, p < 0.001). Last, COVID-19 publications had a higher number of authors, especially for case reports, with a median of 9.0 authors (IQR: 6.0–13.0) in COVID-19 publications compared to a median of 4.0 authors (IQR: 3.0–6.0) in non-COVID-19 publications (p < 0.001).

Conclusion

In this meta-research gathering publications from high-impact medical journals, we have shown that the dramatic rise in COVID-19 publications was accompanied by a substantial decrease of non-COVID-19 research.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01404-9

Modular literature review: a novel systematic search and review method to support priority setting in health policy and practice

Abstract

Background

There is an unmet need for review methods to support priority-setting, policy-making and strategic planning when a wide variety of interventions from differing disciplines may have the potential to impact a health outcome of interest. This article describes a Modular Literature Review, a novel systematic search and review method that employs systematic search strategies together with a hierarchy-based appraisal and synthesis of the resulting evidence.

Methods

We designed the Modular Review to examine the effects of 43 interventions on a health problem of global significance. Using the PICOS (Population, Intervention, Comparison, Outcome, Study design) framework, we developed a single four-module search template in which population, comparison and outcome modules were the same for each search and the intervention module was different for each of the 43 interventions. A series of literature searches were performed in five databases, followed by screening, extraction and analysis of data. “ES documents”, source documents for effect size (ES) estimates, were systematically identified based on a hierarchy of evidence. The evidence was categorised according to the likely effect on the outcome and presented in a standardised format with quantitative effect estimates, meta-analyses and narrative reporting. We compared the Modular Review to other review methods in health research for its strengths and limitations.

Results

The Modular Review method was used to review the impact of 46 antenatal interventions on four specified birth outcomes within 12 months. A total of 61,279 records were found; 35,244 were screened by title-abstract. Six thousand two hundred seventy-two full articles were reviewed against the inclusion criteria resulting in 365 eligible articles.

Conclusions

The Modular Review preserves principles that have traditionally been important to systematic reviews but can address multiple research questions simultaneously. The result is an accessible, reliable answer to the question of “what works?”. Thus, it is a well-suited literature review method to support prioritisation, decisions and planning to implement an agenda for health improvement.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01463-y

Feasibility of a hybrid clinical trial for respiratory virus detection in toddlers during the influenza season

Abstract

Background

Traditional clinical trials are conducted at investigator sites. Participants must visit healthcare facilities several times for the trial procedures. Decentralized clinical trials offer an interesting alternative. They use telemedicine and other technological solutions (apps, monitoring devices or web platforms) to decrease the number of visits to study sites, minimise the impact on daily routine, and decrease geographical barriers for participants. Not much information is available on the use of decentralization in randomized clinical trials with vaccines.

Methods

A hybrid clinical trial may be assisted by parental recording of symptoms in electronic diaries combined with home-collected nasal swabs. During two influenza seasons, children aged 12 to 35 months with a history of recurrent acute respiratory infections were recruited in 12 primary health centers of the Valencia Region in Spain. Parents completed a symptom diary through an ad hoc mobile app that subsequently assessed whether the episode was an acute respiratory infection and requested collection of a nasal swab. Feasibility was measured using the percentage of returned electronic diaries and the validity of nasal swabs collected during the influenza season. Respiratory viruses were detected by real-time PCR.

Results

Ninety-nine toddlers were enrolled. Parents completed 10,476 electronic diaries out of the 10,804 requested (97%). The mobile app detected 188 potential acute respiratory infections (ARIs) and requested a nasal swab. A swab was taken in 173 (92%) ARI episodes. Of these swabs, 165 (95.4%) were collected at home, and 144 (87.3%) of the home-collected swabs were considered valid for laboratory testing. Overall, 152 (81%) of the ARIs detected in the study had a corresponding valid sample collected.

Conclusions

Hybrid procedures used in this clinical trial with the influenza vaccine in toddlers were considered adequate, as we diagnosed most of the ARI cases on time, and had a valid swab in 81% of the cases. Hybrid clinical trials improve participant adherence to the study procedures and could improve recruitment and quality of life of the participants and the research team by decreasing the number of visits to the investigator site.

This report emphasises that the conduct of hybrid clinical trials is a valid alternative to traditional clinical trials with vaccines. This hybrid clinical trial achieved high participant adherence to the study procedures.

https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01474-9

Impact of vaccine prioritization strategies on mitigating COVID-19: an agent-based simulation study using an urban region in the United States

Abstract

Background

Approval of novel vaccines for COVID-19 brought hope and expectations, but not without additional challenges. One central challenge was understanding how to appropriately prioritize the use of a limited supply of vaccines. This study examined the efficacy of various vaccine prioritization strategies using the vaccination campaign underway in the U.S.

Methods

The study developed a granular agent-based simulation model for mimicking community spread of COVID-19 under various social interventions, including full and partial closures, isolation and quarantine, use of face masks and contact tracing, and vaccination. The model was populated with parameters of disease natural history, as well as demographic and societal data for an urban community in the U.S. with 2.8 million residents. The model tracks daily numbers of infected, hospitalized, and deceased individuals for all census age groups. The model was calibrated using parameters for viral transmission and the level of community circulation of individuals. Published data from the Florida COVID-19 dashboard were used to validate the model. Vaccination strategies were compared using a hypothesis test for pairwise comparisons.
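As a highly simplified, illustrative sketch of how prioritization strategies can be compared in an agent-based framework (homogeneous mixing, immediate full protection, toy parameters; not the calibrated model described above):

```python
# Toy agent-based comparison of vaccine prioritization strategies.
# All parameters and assumptions are illustrative, not the study's model.
import numpy as np

def run(strategy, n=20000, days=200, beta=0.00002, daily_doses=100,
        vax_start=30, seed=1):
    rng = np.random.default_rng(seed)
    age = rng.integers(0, 90, n)
    state = np.zeros(n, dtype=int)               # 0 = susceptible, 1 = infected, 2 = removed
    state[rng.choice(n, 20, replace=False)] = 1
    vaccinated = np.zeros(n, dtype=bool)
    infections = 20
    for day in range(days):
        # Vaccination: immunize susceptibles according to the chosen priority
        if day >= vax_start:
            eligible = np.flatnonzero((state == 0) & ~vaccinated)
            if eligible.size:
                if strategy == "oldest_first":
                    order = eligible[np.argsort(-age[eligible])]
                else:                              # random allocation
                    order = rng.permutation(eligible)
                todo = order[:daily_doses]
                vaccinated[todo] = True
                state[todo] = 2                    # assume full, immediate protection
        # Transmission: homogeneous mixing with per-contact probability beta
        n_inf = (state == 1).sum()
        sus = np.flatnonzero(state == 0)
        p_inf = 1 - (1 - beta) ** n_inf
        new = sus[rng.uniform(size=sus.size) < p_inf]
        # Recovery after ~7 days on average, then record new infections
        recov = np.flatnonzero(state == 1)
        state[recov[rng.uniform(size=recov.size) < 1 / 7]] = 2
        state[new] = 1
        infections += new.size
    return infections

for s in ("oldest_first", "random"):
    print(s, run(s))
```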

Results

Three prioritization strategies were examined: a minor variant of CDC’s recommendation, an age-stratified strategy, and a random strategy. The impact of vaccination was also contrasted with a no vaccination scenario. The study showed that the campaign against COVID-19 in the U.S. using vaccines developed by Pfizer/BioNTech and Moderna 1) reduced the cumulative number of infections by 10% and 2) helped the pandemic to subside below a small threshold of 100 daily new reported cases sooner by approximately a month when compared to no vaccination. A comparison of the prioritization strategies showed no significant difference in their impacts on pandemic mitigation.

Conclusions

The vaccines for COVID-19 were developed and approved much quicker than ever before. However, as per our model, the impact of vaccination on reducing cumulative infections was found to be limited (10%, as noted above). This limited impact is due to the explosive growth of infections that occurred prior to the start of vaccination, which significantly reduced the susceptible pool of the population for whom infection could be prevented. Hence, vaccination had a limited opportunity to reduce the cumulative number of infections. Another notable observation from our study is that instead of adhering strictly to a sequential prioritizing strategy, focus should perhaps be on distributing the vaccines among all eligible as quickly as possible, after providing for the most vulnerable. As much of the population worldwide is yet to be vaccinated, results from this study should aid public health decision makers in effectively allocating their limited vaccine supplies.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01458-9


Guidance for using artificial intelligence for title and abstract screening while conducting knowledge syntheses

Abstract

Background

Systematic reviews (SRs) are the cornerstone of evidence-based medicine. However, systematic reviews are time consuming and there is growing demand to produce evidence more quickly while maintaining robust methods. In recent years, artificial intelligence and active machine learning (AML) have been implemented in several SR software applications. Because some of the barriers to adoption of new technologies are the challenges of set-up and of deciding how best to use these technologies, we describe different situations and considerations for knowledge synthesis teams to weigh when using artificial intelligence and AML for title and abstract screening.

Methods

We retrospectively evaluated the implementation and performance of AML across a set of ten historically completed systematic reviews. Based upon the findings from this work and in consideration of the barriers we have encountered and navigated during the past 24 months in using these tools prospectively in our research, we discussed and developed a series of practical recommendations for research teams to consider in seeking to implement AML tools for citation screening into their workflow.

Results

We developed a seven-step framework and provide guidance for when and how to integrate artificial intelligence and AML into the title and abstract screening process. Steps include: (1) Consulting with Knowledge user/Expert Panel; (2) Developing the search strategy; (3) Preparing your review team; (4) Preparing your database; (5) Building the initial training set; (6) Ongoing screening; and (7) Truncating screening. During Step 6 and/or 7, you may also choose to optimize your team, by shifting some members to other review stages (e.g., full-text screening, data extraction).
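To make steps 5 and 6 concrete, the sketch below shows the core active-learning loop (train on the labels so far, then surface the unlabeled records predicted most likely to be relevant) using scikit-learn on toy citations; dedicated screening tools implement this loop internally, and the records and labels here are purely illustrative.

```python
# Sketch of an active-machine-learning screening loop: fit a model on the labels
# so far, then rank unlabeled titles/abstracts by predicted relevance. Toy data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

records = [
    "randomized trial of statins for cardiovascular prevention",
    "cohort study of statin use and myocardial infarction",
    "qualitative study of nurse staffing",
    "case report of rare dermatologic reaction",
    "meta-analysis of lipid lowering therapy outcomes",
]
labels = {0: 1, 2: 0}              # initial training set (step 5): include / exclude

X = TfidfVectorizer().fit_transform(records)
labeled = sorted(labels)
unlabeled = [i for i in range(len(records)) if i not in labels]

clf = LogisticRegression().fit(X[labeled], [labels[i] for i in labeled])
scores = clf.predict_proba(X[unlabeled])[:, 1]

# Ongoing screening (step 6): review the highest-scoring records first,
# add the new decisions to `labels`, and refit.
for i in np.argsort(-scores):
    print(f"{scores[i]:.2f}  {records[unlabeled[i]]}")
```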

Conclusion

Artificial intelligence and, more specifically, AML are well-developed tools for title and abstract screening and can be integrated into the screening process in several ways. Regardless of the method chosen, transparent reporting of these methods is critical for future studies evaluating artificial intelligence and AML.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01451-2

Effectiveness of exercise interventions on mental health and health-related quality of life in women with polycystic ovary syndrome: a systematic review

Abstract

Background

Polycystic ovary syndrome (PCOS) is a complex condition, impacting cardio-metabolic and reproductive health, mental health and health-related quality of life. The physical health benefits of exercise for women with PCOS are well-established and exercise is increasingly being recognised as efficacious for improving psychological wellbeing. The aim of this review was to summarise the evidence regarding the effectiveness of exercise interventions on mental health outcomes in women with PCOS.

Methods

A systematic search of electronic databases was conducted in March of 2020. Trials that evaluated the effect of an exercise intervention on mental health or health-related quality of life outcomes in reproductive aged women with diagnosed PCOS were included. Methodological quality was assessed using the modified Downs and Black checklist. Primary outcomes included symptoms of depression and anxiety, and health-related quality of life.

Results

Fifteen articles from 11 trials were identified and deemed eligible for inclusion. Exercise was associated with improvements in health-related quality of life in all of the included studies. Half of the included studies also reported significant improvements in depression and anxiety symptoms. There was large variation in the methodological quality of the included studies and in the interventions utilised.

Conclusions

The available evidence indicates that exercise is effective for improving health-related quality of life and PCOS symptom distress. Exercise also shows some efficacy for improving symptoms and/or prevalence of depression and anxiety in women with PCOS. However, due to large heterogeneity of included studies, conclusions could not be made regarding the impact of exercise intervention characteristics. High-quality trials with well reported exercise intervention characteristics and outcomes are required in order to determine effective exercise protocols for women with PCOS and facilitate translation into practice.


https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-021-12280-9

Economic burden of varicella in Europe in the absence of universal varicella vaccination

Abstract

Background

Though the disease burden of varicella in Europe has been reported previously, the economic burden is still unknown. This study estimated the economic burden of varicella in Europe in the absence of Universal Varicella Vaccination (UVV) in 2018 Euros from both payer (direct costs) and societal (direct and indirect costs) perspectives.

Methods

We estimated the country-specific and overall annual costs of varicella in the absence of UVV in 31 European countries (27 EU countries, plus Iceland, Norway, Switzerland and the United Kingdom). To obtain country-specific unit costs and associated healthcare utilization, we conducted a systematic literature review, searching PubMed, EMBASE, NEED, DARE, REPEC, Open Grey, and public health websites (1/1/1999–10/15/2019). The annual numbers of varicella cases, deaths, outpatient visits and hospitalizations were calculated (without UVV) based on age-specific incidence rates (Riera-Montes et al. 2017) and 2018 population data by country. Unit cost per varicella case and disease burden data were combined using stochastic modeling to estimate 2018 costs stratified by country, age and healthcare resource.
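A minimal sketch of this stochastic aggregation step, with placeholder distributions standing in for the case counts, healthcare utilization, and unit costs; none of the numbers below are the study's inputs.

```python
# Sketch of stochastic aggregation of varicella costs: draw uncertain case counts
# and unit costs, combine them per resource, and summarise the resulting
# distribution. All numbers are placeholders, not the study's inputs.
import numpy as np

rng = np.random.default_rng(3)
n_sim = 10_000

cases = rng.normal(5_500_000, 300_000, n_sim)               # annual cases (toy)
p_outpatient = rng.beta(40, 60, n_sim)                       # share of cases with a visit
cost_outpatient = rng.gamma(shape=4, scale=10, size=n_sim)   # euros per visit (toy)
cost_caregiver = rng.gamma(shape=2, scale=40, size=n_sim)    # indirect euros per case (toy)

direct = cases * p_outpatient * cost_outpatient
indirect = cases * cost_caregiver
total = direct + indirect

lo, med, hi = np.percentile(total, [2.5, 50, 97.5])
print(f"total annual cost: median {med:,.0f} (95% interval {lo:,.0f}-{hi:,.0f})")
```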

Results

Overall annual total costs associated with varicella were estimated to be €662,592,061 (Range: €309,552,363 to €1,015,631,760) in Europe in absence of UVV. Direct and indirect costs were estimated at €229,076,206 (Range €144,809,557 to €313,342,856) and €433,515,855 (Range €164,742,806 to €702,288,904), respectively. Total cost per case was €121.45 (direct: €41.99; indirect: €79.46). Almost half of the costs were attributed to cases in children under 5 years, owing mainly to caregiver work loss. The distribution of costs by healthcare resource was similar across countries. France and Germany accounted for 49.28% of total annual costs, most likely due to a combination of high numbers of cases and unit costs in these countries.

Conclusions

The economic burden of varicella across Europe in the absence of UVV is substantial (over €600 million annually), primarily driven by caregiver burden, including work productivity losses.


https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-021-12343-x

History of drinking problems diminishes the protective effects of within-guideline drinking on 18-year risk of dementia and CIND

Abstract

Objective

To examine the moderating effect of older adults’ history of drinking problems on the relationship between their baseline alcohol consumption and risk of dementia and cognitive impairment, no dementia (CIND) 18 years later.

Method

A longitudinal Health and Retirement Study cohort (n = 4421) was analyzed to demonstrate how older adults’ baseline membership in one of six drinking categories (non-drinker, within-guideline drinker, and outside-guideline drinker groups, divided to reflect absence or presence of a history of drinking problems) predicts dementia and CIND 18 years later.

Results

Among participants with no history of drinking problems, 13% of non-drinkers, 5% of within-guideline drinkers, and 9% of outside-guideline drinkers were classified as having dementia 18 years later. Among those with a history of drinking problems, 14% of non-drinkers, 9% of within-guideline drinkers, and 7% of outside-guideline drinkers were classified with dementia. With non-drinkers with no history of drinking problems as the reference category, being a baseline within-guideline drinker with no history of drinking problems reduced the likelihood of dementia 18 years later by 45%, independent of baseline demographic and health characteristics; being a baseline within-guideline drinker with a history of drinking problems reduced the likelihood by only 13% (n.s.). Similar patterns were observed for the prediction of CIND.

Conclusions

For older adults, consuming alcohol at levels within validated guidelines for low-risk drinking may offer moderate long-term protection from dementia and CIND, but this effect is diminished by having a history of drinking problems. Efforts to predict and prevent dementia and CIND should focus on older adults’ history of drinking problems in addition to how much alcohol they consume.


https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-021-12358-4

Reporting methodological issues of the mendelian randomization studies in health and medical research: a systematic review

Abstract

Background

Mendelian randomization (MR) studies using genetic risk scores (GRS) as instrumental variables (IVs) have increasingly been used to control for unmeasured confounding in observational healthcare databases. However, proper reporting of methodological issues is sparse in these studies. We aimed to review published MR studies and identify reporting problems.

Methods

We conducted a systematic review of clinical articles published between 2009 and 2019. We searched the PubMed, Scopus, and Embase databases. We retrieved information from every MR study, including the tests performed to evaluate assumptions and the modelling approach used for estimation. Applying our inclusion/exclusion criteria, we finally identified 97 studies and conducted the review according to the PRISMA statement.

Results

Only 66 (68%) of the studies empirically verified the first (relevance) assumption, and 40 (41.2%) reported appropriate tests (e.g., R2, F-test) to investigate the association. A total of 35.1% clearly stated and discussed theoretical justifications for the second and third assumptions. Overall, 30.9% of the studies used two-stage least squares and 11.3% used the Wald estimator for IV estimation. Also, 44.3% of the studies conducted a sensitivity analysis to assess the robustness of estimates to violations of the untestable assumptions.
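For readers unfamiliar with these estimators, the sketch below shows the Wald (ratio) estimator with a genetic risk score as a single instrument, together with the first-stage F-statistic and R2 used to check the relevance assumption; the data are simulated and the effect sizes are arbitrary.

```python
# Sketch of the Wald (ratio) IV estimator with a genetic risk score as instrument,
# plus the first-stage statistics used to check the relevance assumption.
# Data are simulated for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 5000
grs = rng.normal(size=n)                       # genetic risk score (instrument)
confounder = rng.normal(size=n)                # unmeasured confounder
exposure = 0.4 * grs + confounder + rng.normal(size=n)
outcome = 0.25 * exposure + confounder + rng.normal(size=n)

# First stage: instrument-exposure association (relevance assumption)
first = sm.OLS(exposure, sm.add_constant(grs)).fit()
print("first-stage F:", first.fvalue, "R2:", first.rsquared)

# Wald ratio: (instrument-outcome effect) / (instrument-exposure effect)
beta_gy = sm.OLS(outcome, sm.add_constant(grs)).fit().params[1]
beta_gx = first.params[1]
print("Wald IV estimate:", beta_gy / beta_gx)  # ~0.25 despite the confounding
```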

Conclusions

We found that incomplete justification of the instrumental variable assumptions was a common problem in the MR studies we reviewed. This may misdirect the findings of these studies.



Comparisons of statistical distributions for cluster sizes in a developing pandemic

Abstract

Background

We consider cluster size data for SARS-CoV-2 transmissions in a number of different settings, drawn from recently published data. The statistical characteristics of superspreading events are commonly described by fitting a negative binomial distribution to secondary infection and cluster size data as a longer-tailed alternative to the Poisson distribution, with emphasis given to the value of the extra parameter, which allows the variance to be greater than the mean. Here we investigate whether other long-tailed distributions from more general extended Poisson process modelling can better describe the distribution of cluster sizes for SARS-CoV-2 transmissions.

Methods

We use the extended Poisson process modelling (EPPM) approach with nested sets of models that include the Poisson and negative binomial distributions to assess the adequacy of models based on these standard distributions for the data considered.
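A minimal sketch of the nested Poisson/negative binomial comparison on toy cluster-size counts is shown below (maximum likelihood via scipy); the broader EPPM family itself is not implemented here, and the counts are invented.

```python
# Sketch: compare Poisson and negative binomial maximum-likelihood fits to toy
# cluster-size counts. The broader EPPM family used in the paper is not
# implemented; this only illustrates the nested Poisson/NB comparison.
import numpy as np
from scipy import stats, optimize

clusters = np.array([1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 7, 12, 25])  # toy cluster sizes

# Poisson: the MLE of the mean is the sample mean
lam = clusters.mean()
ll_pois = stats.poisson.logpmf(clusters, lam).sum()

# Negative binomial: maximise the log-likelihood over (n, p)
def nb_negll(params):
    n, p = params
    return -stats.nbinom.logpmf(clusters, n, p).sum()

res = optimize.minimize(nb_negll, x0=[1.0, 0.3],
                        bounds=[(1e-3, None), (1e-3, 1 - 1e-3)])
ll_nb = -res.fun

# Likelihood-ratio style comparison (the NB has one extra parameter)
print(f"logL Poisson {ll_pois:.1f}  logL NegBin {ll_nb:.1f}  "
      f"2*diff {2 * (ll_nb - ll_pois):.1f}")
```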

Results

We confirm the inadequacy of the Poisson distribution in most cases, and demonstrate the inadequacy of the negative binomial distribution in some cases.

Conclusions

The probability of a superspreading event may be underestimated by use of the negative binomial distribution, as EPPM distributions indicate much larger tail probabilities than negative binomial alternatives. We show that, of the settings considered, large shared accommodation, meal and work settings have the potential for more severe superspreading events than a negative binomial distribution would predict. Public health efforts to prevent transmission in such settings should therefore be prioritised.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01517-9

Statistical methods for evaluating the fine needle aspiration cytology procedure in breast cancer diagnosis

Abstract

Background

Statistical issues arising when evaluating a diagnostic procedure for breast cancer are not rare but are often ignored, leading to biased results. We aimed to evaluate the diagnostic accuracy of fine needle aspiration cytology (FNAC), a minimally invasive and rapid technique potentially used as a rule-in or rule-out test, while handling its statistical issues: suspect test results and verification bias.

Methods

We applied different statistical methods to handle suspect results by defining conditional estimates. When considering partial verification bias, the Begg and Greenes method and multivariate imputation by chained equations were applied, whereas a Bayesian approach with respect to each gold standard was used when considering differential verification bias. Finally, we extended the Begg and Greenes method to apply conditionally on the suspect results.
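The Begg and Greenes correction for partial verification bias can be written directly from the observed counts; the sketch below uses made-up counts and is only meant to show the structure of the correction, not the full analysis described above (which also handled suspect results and differential verification).

```python
# Sketch of the Begg and Greenes correction for partial verification bias:
# sensitivity and specificity are reconstructed from disease prevalence among
# the *verified* test-positives and test-negatives. Counts are made up.
def begg_greenes(n_pos, n_neg, v_pos_dis, v_pos_tot, v_neg_dis, v_neg_tot):
    """n_pos/n_neg: all test positives/negatives; v_*: verified subsets."""
    p_dis_pos = v_pos_dis / v_pos_tot          # P(D+ | T+) among verified
    p_dis_neg = v_neg_dis / v_neg_tot          # P(D+ | T-) among verified
    se = n_pos * p_dis_pos / (n_pos * p_dis_pos + n_neg * p_dis_neg)
    sp = (n_neg * (1 - p_dis_neg)
          / (n_pos * (1 - p_dis_pos) + n_neg * (1 - p_dis_neg)))
    return se, sp

# Example: 300 test-positives (200 verified, 150 diseased),
#          700 test-negatives (100 verified, 5 diseased)
se, sp = begg_greenes(300, 700, 150, 200, 5, 100)
print(f"corrected sensitivity {se:.3f}, specificity {sp:.3f}")
```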

Results

The specificity of the FNAC test, above 94%, was always higher than its sensitivity, regardless of the method used. All positive likelihood ratios were higher than 10, with variation among methods. The positive and negative yields were high, indicating precise discriminating properties of the test.

Conclusion

The FNAC test is more likely to be used as a rule-in test for diagnosing breast cancer. Our results contribute to advancing knowledge regarding the performance of the FNAC test and the methods to be applied for its evaluation.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01506-y

Assessing transferability in systematic reviews of health economic evaluations – a review of methodological guidance

Abstract

Objective

For assessing cost-effectiveness, Health Technology Assessment (HTA) organisations may use primary health economic evaluations (P-HEs) or systematic reviews of health economic evaluations (SR-HEs). A prerequisite for meaningful results of SR-HEs is that the results of existing P-HEs are transferable to the decision context (e.g., the HTA jurisdiction). A particularly pertinent issue is the high variability of costs and resource needs across jurisdictions. Our objective was to review the methods documents of HTA organisations and compare their recommendations on considering transferability in SR-HEs.

Methods

We systematically hand searched the webpages of 158 HTA organisations for relevant methods documents from 8th January to 31st March 2019. Two independent reviewers performed searches and selected documents according to pre-defined criteria. One reviewer extracted data in standardised and piloted tables and a second reviewer checked them for accuracy. We synthesised data using tabulations and in a narrative way.

Results

We identified 155 potentially relevant documents from 63 HTA organisations. Of these, 7 were included in the synthesis. The included organisations have different aims when preparing a SR-HE (e.g. to determine the need for conducting their own P-HE). The recommendations vary regarding the underlying terminology (e.g. transferability/generalisability), the assessment approaches (e.g. structure), the assessment criteria and the integration in the review process.

Conclusion

Only a few HTA organisations address the assessment of transferability in their methodological recommendations for SR-HEs. Transferability considerations relate to different purposes, and the assessment concepts and criteria are heterogeneous. Developing standards for considering transferability in SR-HEs is desirable.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01536-6

Locating and testing the healthy context paradox: examples from the INCLUSIVE trial

Abstract

Background

The healthy context paradox, originally described with respect to school-level bullying interventions, refers to the generation of differences in mental wellbeing amongst those who continue to experience bullying even after interventions successfully reduce victimisation. Using data from the INCLUSIVE trial of restorative practice in schools, we relate this paradox to the need to theorise potential harms when developing interventions; formulate the healthy context paradox in a more general form defined by mediational relationships and cluster-level interventions; and propose two statistical models for testing the healthy context paradox informed by multilevel mediation methods, with relevance to structural and individual explanations for this paradox.

Methods

We estimated two multilevel mediation models with bullying victimisation as the mediator and mental wellbeing as the outcome: one with a school-level interaction between intervention assignment and the mediator; and one with a random slope component for the student-level mediator-outcome relationship predicted by school-level assignment. We relate each of these models to contextual or individual-level explanations for the healthy context paradox.
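A simplified sketch of such a multilevel outcome model (cross-level interaction between assignment and the student-level mediator, plus a school-level random slope for the mediator), using statsmodels on simulated data; this is not the exact specification used in the trial analysis.

```python
# Simplified sketch of a multilevel outcome model with a cross-level interaction
# (arm x victimisation) and a random slope for victimisation across schools.
# Data are simulated; not the trial's exact specification.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_schools, n_per = 40, 50
school = np.repeat(np.arange(n_schools), n_per)
arm = np.repeat(rng.integers(0, 2, n_schools), n_per)        # school-level assignment
victimisation = rng.normal(size=school.size) - 0.3 * arm      # mediator
school_slope = rng.normal(0, 0.1, n_schools)[school]          # school-varying slope
wellbeing = ((-0.5 + school_slope) * victimisation
             + 0.2 * arm + rng.normal(size=school.size))

df = pd.DataFrame({"school": school, "arm": arm,
                   "victimisation": victimisation, "wellbeing": wellbeing})

model = smf.mixedlm("wellbeing ~ victimisation * arm", data=df,
                    groups="school", re_formula="~victimisation")
fit = model.fit()
print(fit.summary())      # interaction term plus random-slope variance components
```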

Results

Neither model suggested that the INCLUSIVE trial represented an example of the healthy context paradox. However, each model has different interpretations which relate to a multilevel understanding of the healthy context paradox.

Conclusions

Greater exploration of intervention harms, especially when those accrue to population subgroups, is an essential step in better understanding how interventions work and for whom. Our proposed tests for the presence of a healthy context paradox provide the analytic tools to better understand how to support development and implementation of interventions that work for all groups in a population.



Detecting the patient’s need for help with machine learning based on expressions

Abstract

Background

Developing machine learning models to support health analytics requires increased understanding of the statistical properties of self-rated expression statements used in health-related communication and decision making. To address this, our current research analyzes self-rated expression statements concerning the coronavirus COVID-19 epidemic and, with a new methodology, identifies how statistically significant differences between groups of respondents can be linked to machine learning results.

Methods

We conducted a quantitative cross-sectional study gathering “need for help” ratings for twenty health-related expression statements concerning the coronavirus epidemic on an 11-point Likert scale, along with nine answers about the person’s health and wellbeing, sex and age. The study involved online respondents between 30 May and 3 August 2020, recruited from Finnish patient and disabled people’s organizations, other health-related organizations and professionals, and educational institutions (n = 673). We propose and experimentally motivate a new influence-analysis methodology for machine learning, applied to evaluate how machine learning results depend on, and are influenced by, properties of the data that are identified with traditional statistical methods.
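A minimal sketch of the group-comparison statistics named above, applied to toy 11-point ratings; the variables and groupings are illustrative, not the survey's items.

```python
# Sketch of the statistics used on 0-10 "need for help" ratings: Kendall rank
# correlation between two statements, plus Wilcoxon rank-sum, Kruskal-Wallis,
# and one-way ANOVA across respondent groups. Toy data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
ratings_a = rng.integers(0, 11, 200)                                  # statement A
ratings_b = np.clip(ratings_a + rng.integers(-2, 3, 200), 0, 10)      # statement B
sex = rng.integers(0, 2, 200)
health = rng.integers(0, 3, 200)                                      # e.g. poor/fair/good

print("Kendall tau:", stats.kendalltau(ratings_a, ratings_b))
print("rank-sum by sex:", stats.ranksums(ratings_a[sex == 0], ratings_a[sex == 1]))
print("Kruskal-Wallis by health:",
      stats.kruskal(*[ratings_a[health == g] for g in range(3)]))
print("one-way ANOVA by health:",
      stats.f_oneway(*[ratings_a[health == g] for g in range(3)]))
```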

Results

We found statistically significant Kendall rank correlations and high cosine similarity values between various health-related expression statement pairs concerning the “need for help” ratings and a background question pair. Using Wilcoxon rank-sum, Kruskal-Wallis and one-way analysis of variance (ANOVA) tests between groups, we identified statistically significant rating differences for several health-related expression statements with respect to groupings based on the answers to background questions, such as ratings of suspecting to have, or having, the coronavirus infection, depending on the estimated health condition, quality of life and sex. Our new methodology enabled us to identify how statistically significant rating differences were linked to machine learning results, thus helping to develop better, human-understandable machine learning models.

Conclusions

The self-rated “need for help” for health-related expression statements differs statistically significantly depending on the person’s background information, such as estimated health condition, quality of life and sex. With our new methodology, statistically significant rating differences can be linked to machine learning results, enabling the development of better machine learning models to identify, interpret and address the patient’s needs for well-personalized care.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-021-01502-8

Should samples be weighted to decrease selection bias in online surveys during the COVID-19 pandemic? Data from seven datasets

Abstract

Background

Online surveys have triggered a heated debate regarding their scientific validity. Many authors have adopted weighting methods to enhance the quality of online survey findings, while others have not found an advantage to this method. This work aims to compare weighted and unweighted association measures after adjustment for potential confounders, taking into account dataset properties such as the initial gap between the population and the selected sample, the sample size, and the variable types.

Methods

This study assessed seven datasets collected between 2019 and 2021 during the COVID-19 pandemic through online cross-sectional surveys using the snowball sampling technique. Weighting methods were applied to align the online samples with sociodemographic features of the target population.
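A minimal sketch of one common weighting scheme, cell-based poststratification (weight = population share divided by sample share within age-by-sex cells); the cells and population proportions below are illustrative, and the surveys analysed here may have used different weighting methods.

```python
# Sketch of cell-based poststratification: weight = population share / sample share
# within age-by-sex cells. Population proportions are illustrative.
import pandas as pd

sample = pd.DataFrame({
    "age_group": ["18-39", "18-39", "40-64", "65+", "40-64", "18-39"],
    "sex":       ["F", "F", "M", "F", "F", "M"],
    "outcome":   [1, 0, 1, 1, 0, 0],
})
sample["cell"] = sample["age_group"] + "_" + sample["sex"]

population_share = {"18-39_F": 0.18, "18-39_M": 0.17, "40-64_F": 0.22,
                    "40-64_M": 0.21, "65+_F": 0.12, "65+_M": 0.10}

sample_share = sample["cell"].value_counts(normalize=True)
sample["weight"] = sample["cell"].map(lambda c: population_share[c] / sample_share[c])

# Weighted vs unweighted prevalence of the outcome
print("unweighted:", sample["outcome"].mean())
print("weighted:  ",
      (sample["outcome"] * sample["weight"]).sum() / sample["weight"].sum())
```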

Results

Despite varying age and gender gaps between weighted and unweighted samples, strong similarities were found for dependent and independent variables. When applied to the same datasets, the regression analysis results showed a high relative difference between methods for some variables, while a low difference was found for others. In terms of absolute impact, the highest impact on the association measure was related to the sample size, followed by the age gap, the gender gap, and finally, the significance of the association between weighted age and the dependent variable.

Conclusion

The results of this analysis of online surveys indicate that weighting methods should be used cautiously, as weighting did not affect the results in some databases, while it did in others. Further research is necessary to define situations in which weighting would be beneficial.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01547-3

Ripple effects mapping: capturing the wider impacts of systems change efforts in public health

Abstract

Background

Systems approaches are currently being advocated and implemented to address complex challenges in Public Health. These approaches work by bringing multi-sectoral stakeholders together to develop a collective understanding of the system, and then to identify places where they can leverage change across the system. Systems approaches are unpredictable, where cause-and-effect cannot always be disentangled, and unintended consequences – positive and negative – frequently arise. Evaluating such approaches is difficult and new methods are warranted.

Methods

Ripple Effects Mapping (REM) is a qualitative method which can capture the wider impacts, and adaptive nature, of a systems approach. Using a case study example from the evaluation of a physical activity-orientated systems approach in Gloucestershire, we: a) introduce the adapted REM method; b) describe how REM was applied in the example; c) explain how REM outputs were analysed; d) provide examples of how REM outputs were used; and e) describe the strengths, limitations, and future uses of REM based on our reflections.

Results

Ripple Effects Mapping is a participatory method that requires the active input of programme stakeholders in data gathering workshops. It produces visual outputs (i.e., maps) of the programme activities and impacts, which are mapped along a timeline to understand the temporal dimension of systems change efforts. The REM outputs from our example were created over several iterations, with data collected every 3–4 months, to build a picture of activities and impacts that have continued or ceased. Workshops took place both in person and online. An inductive content analysis was undertaken to describe and quantify the patterns within the REM outputs. Detailed guidance related to the preparation, delivery, and analysis of REM is included in this paper.

Conclusion

REM may help to advance our understanding and evaluation of complex systems approaches, especially within the field of Public Health. We therefore invite other researchers, practitioners and policymakers to use REM and continuously evolve the method to enhance its application and practical utility.

Developing a tool to assess the skills to perform a health technology assessment

Abstract

Background

Health technology assessment (HTA) brings together evidence from various disciplines while using explicit methods to assess the value of health technologies. In resource-constrained settings, there is a growing demand to measure and develop specialist skills, including those for HTA, to aid the implementation of Universal Healthcare Coverage. The purpose of this study was twofold: a) to find validated tools for the assessment of the technical capacity to conduct a HTA, and if none were found, to develop a tool, and b) to describe experiences of its pilot.

Methods

First, a mapping review identified tools to assess the skills to conduct a HTA. A medical librarian conducted a comprehensive search in four databases (MEDLINE, Embase, Web of Science, ERIC). Then, incorporating results from the mapping and following an iterative process involving stakeholders and experts, we developed a HTA skills assessment tool. Finally, using an online platform to gather and analyse responses, in collaboration with our institutional partner, we piloted the tool in Ghana, and sought feedback on their experiences.

Results

The database search yielded 3871 records; fifteen of these were selected based on a priori criteria. These records were published between 2003 and 2018, but none covered all technical skills to conduct a HTA. In the absence of an instrument meeting our needs, we developed a HTA skills assessment tool containing four sections (general information, core and soft skills, and future needs). The tool was designed to be administered to a broad range of individuals who would potentially contribute to the planning, delivery and evaluation of HTA. The tool was piloted with twenty-three individuals who completed the skills assessment and shared their initial impressions of the tool.

Conclusions

To our knowledge, this is the first comprehensive tool enabling the assessment of technical skills to conduct a HTA. This tool allows teams to understand where their individual strengths and weaknesses lie. The tool is in the early validation phases and further testing is needed.

https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01562-4

Research monitoring practices in critical care research: a survey of current state and attitudes

Abstract

Background/Aims

In 2016, international standards governing clinical research recommended that the approach to monitoring a research project should be undertaken based on risk; however, it is unknown whether this approach has been adopted in Australia and New Zealand (ANZ) throughout critical care research. The aims of the project were to: 1) Gain an understanding of current research monitoring practices in academic-led clinical trials in the field of critical care research, 2) Describe the perceived barriers and enablers to undertaking research monitoring.

Methods

An electronic survey was distributed to investigators, research co-ordinators and other research staff currently undertaking and supporting academic-led clinical trials in the field of critical care in ANZ.

Results

Of the 118 respondents, 70 were involved in the co-ordination of academic trials; the remaining results pertain to this sub-sample. Fifty-eight (83%) were working in research units associated with hospitals, 29 (41%) were experienced Research Coordinators and 19 (27%) Principal Investigators; 31 (44%) were primarily associated with paediatric research. Fifty-six (80%) developed monitoring plans, with 33 (59%) of these undertaking a risk assessment; the most common barrier reported was lack of expertise. Nineteen (27%) indicated that centralised monitoring was used, noting that technology to support centralised monitoring (45/51; 88%) along with support from data managers and statisticians (45/52; 87%) were key enablers. Coronavirus disease-19 (COVID-19) impacted monitoring for 82% (45/55) by increasing remote (25/45; 56%) and reducing onsite (29/45; 64%) monitoring.

Conclusions

Contrary to Good Clinical Practice guidance, risk assessments to inform monitoring plans are not being consistently performed due to lack of experience and guidance. There is an urgent need to enhance risk assessment methodologies and develop technological solutions for centralised statistical monitoring.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01551-7



Estimation of treatment effects in observational stroke care data: comparison of statistical approaches

Abstract

Introduction

Various statistical approaches can be used to deal with unmeasured confounding when estimating treatment effects in observational studies, each with its own pros and cons. This study aimed to compare treatment effects as estimated by different statistical approaches for two interventions in observational stroke care data.

Patients and methods

We used prospectively collected data from the MR CLEAN registry including all patients (n = 3279) with ischemic stroke who underwent endovascular treatment (EVT) from 2014 to 2017 in 17 Dutch hospitals. Treatment effects of two interventions – i.e., receiving an intravenous thrombolytic (IVT) and undergoing general anesthesia (GA) before EVT – on good functional outcome (modified Rankin Scale ≤2) were estimated. We used three statistical regression-based approaches that vary in assumptions regarding the source of unmeasured confounding: individual-level (two subtypes), ecological, and instrumental variable analyses. In the latter, the preference for using the interventions in each hospital was used as an instrument.
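
To make the instrumental variable idea concrete, the following minimal Python sketch runs a two-stage least squares analysis on simulated data, using a hospital-level treatment preference as the instrument on the linear probability scale; variable names and data are hypothetical, and the registry analysis may have used a different IV estimator.

    # Two-stage least squares with hospital treatment preference as the instrument.
    # Note: standard errors from this naive two-step fit are not valid without
    # further correction; only the point estimate is illustrated here.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 2000
    hospital = rng.integers(0, 17, n)
    preference = np.linspace(0.2, 0.9, 17)[hospital]   # hospital-level use of treatment
    u = rng.normal(size=n)                              # unmeasured confounder
    treated = (rng.uniform(size=n) < preference + 0.05 * u).astype(float)
    outcome = (rng.uniform(size=n) < 0.30 + 0.05 * treated + 0.10 * u).astype(float)

    # Stage 1: predict treatment from the instrument (hospital preference).
    stage1 = sm.OLS(treated, sm.add_constant(preference)).fit()
    treated_hat = stage1.fittedvalues

    # Stage 2: regress the outcome on the predicted treatment.
    stage2 = sm.OLS(outcome, sm.add_constant(treated_hat)).fit()
    print("IV risk difference estimate:", stage2.params[1])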

Results

Use of IVT (range 66–87%) and GA (range 0–93%) varied substantially between hospitals. For IVT, the individual-level analyses (OR ~ 1.33) resulted in significant positive effect estimates, whereas the instrumental variable analysis found no significant treatment effect (OR 1.11; 95% CI 0.58–1.56). The ecological analysis indicated no statistically significant difference in the likelihood (β = − 0.002%; P = 0.99) of good functional outcome at hospitals using IVT 1% more frequently. For GA, the point estimates of the treatment effect pointed in opposite, non-significant directions in the individual-level analyses (ORs ~ 0.60) versus the instrumental variable approach (OR = 1.04). The ecological analysis also resulted in a non-significant negative association (0.03% lower probability).

Discussion and conclusion

Both the magnitude and direction of the estimated treatment effects for both interventions depend strongly on the statistical approach and thus on the assumed source of (unmeasured) confounding. These issues should be considered in light of the specific characteristics of the data before applying an approach and interpreting the results. Instrumental variable analysis might be considered when unobserved confounding and practice variation are expected in observational multicenter studies.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01590-0

Characterising and justifying sample size sufficiency in interview-based studies: systematic analysis of qualitative health research over a 15-year period

Abstract

Background

Choosing a suitable sample size in qualitative research is an area of conceptual debate and practical uncertainty. That sample size principles, guidelines and tools have been developed to enable researchers to set, and justify the acceptability of, their sample size is an indication that the issue constitutes an important marker of the quality of qualitative research. Nevertheless, research shows that sample size sufficiency reporting is often poor, if not absent, across a range of disciplinary fields.

Methods

A systematic analysis of single-interview-per-participant designs within three health-related journals from the disciplines of psychology, sociology and medicine, over a 15-year period, was conducted to examine whether and how sample sizes were justified and how sample size was characterised and discussed by authors. Data pertinent to sample size were extracted and analysed using qualitative and quantitative analytic techniques.

Results

Our findings demonstrate that provision of sample size justifications in qualitative health research is limited; is not contingent on the number of interviews; and relates to the journal of publication. Defence of sample size was most frequently supported across all three journals with reference to the principle of saturation and to pragmatic considerations. Qualitative sample sizes were predominantly – and often without justification – characterised as insufficient (i.e., ‘small’) and discussed in the context of study limitations. Sample size insufficiency was seen to threaten the validity and generalizability of studies’ results, with the latter being frequently conceived in nomothetic terms.

Conclusions

We recommend, firstly, that qualitative health researchers be more transparent about evaluations of their sample size sufficiency, situating these within broader and more encompassing assessments of data adequacy. Secondly, we invite researchers to consider critically how saturation parameters found in prior methodological studies and sample size community norms might best inform, and apply to, their own project, and we suggest that data adequacy is best appraised with reference to features that are intrinsic to the study at hand. Finally, those reviewing papers have a vital role in supporting and encouraging transparent study-specific reporting.

Bias amplification in the g-computation algorithm for time-varying treatments: a case study of industry payments and prescription of opioid products

Abstract

Background

It is often challenging to determine which variables need to be included in the g-computation algorithm under the time-varying setting. Conditioning on instrumental variables (IVs) is known to introduce greater bias when there is unmeasured confounding in point-treatment settings, and this is also true for near-IVs, which are weakly associated with the outcome other than through the treatment. However, it is unknown whether adjusting for (near-)IVs amplifies bias in g-computation algorithm estimators for time-varying treatments compared to estimators that ignore such variables. We thus aimed to compare the magnitude of bias introduced by adjusting for (near-)IVs across their different relationships with treatments in time-varying settings.

Methods

After presenting a case study of the association between the receipt of industry payments and physicians’ opioid prescribing rate in the US, we conducted a Monte Carlo simulation to investigate the extent to which the bias due to unmeasured confounders is amplified by adjusting for a (near-)IV across several g-computation algorithms.
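
For readers unfamiliar with the g-formula, the following minimal Python sketch shows the point-treatment version of g-computation (fit an outcome model, then standardise predictions over the confounder distribution); the time-varying algorithm examined in the study additionally models covariates over time. Data and variable names are synthetic.

    # Point-treatment g-computation (g-formula) on simulated data.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 5000
    L = rng.normal(size=n)                                           # measured confounder
    A = (rng.uniform(size=n) < 1 / (1 + np.exp(-L))).astype(float)   # treatment
    Y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(-1 + 0.5 * A + L)))).astype(float)

    df = pd.DataFrame({"L": L, "A": A, "Y": Y})
    X = pd.DataFrame({"const": 1.0, "A": df["A"], "L": df["L"]})
    outcome_model = sm.GLM(df["Y"], X, family=sm.families.Binomial()).fit()

    # Standardise: predict for everyone with A set to 1 and to 0, then average.
    r1 = outcome_model.predict(X.assign(A=1.0)).mean()
    r0 = outcome_model.predict(X.assign(A=0.0)).mean()
    print("risk difference:", r1 - r0, "risk ratio:", r1 / r0)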

Results

In our simulation study, adjusting for a perfect IV of the time-varying treatments in the g-computation algorithm increased bias due to unmeasured confounding, particularly when the IV had a strong relationship with the treatment. Bias also increased when adjusting for a near-IV whose association with the unmeasured confounders of the treatment–outcome relationship was very weak relative to its association with the time-varying treatments. In contrast, this bias-amplifying behaviour was not observed (i.e., bias due to unmeasured confounders decreased) when adjusting for a near-IV that had a stronger association with the unmeasured confounders (correlation coefficient ≥ 0.1 in our multivariate normal setting).

Conclusion

We recommend avoiding adjustment for a perfect IV in the g-computation algorithm in order to obtain a less biased estimate of the time-varying treatment effect. On the other hand, it may be reasonable to include a near-IV in the algorithm unless its association with the unmeasured confounders is very weak. These findings should help researchers anticipate the magnitude of bias when adjusting for (near-)IVs and select variables for the g-computation algorithm in the time-varying setting when they are aware of the presence of unmeasured confounding.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01563-3

Learning from COVID-19 related trial adaptations to inform efficient trial design—a sequential mixed methods study

Abstract

Background

Many clinical trial procedures were undertaken in person prior to the COVID-19 pandemic, and the pandemic has forced adaptations to these procedures to enable trials to continue. The aim of this study was to understand whether the adaptations made to clinical trials by UK Clinical Trials Units (CTUs) during the pandemic have the potential to improve the efficiency of trials post-pandemic.

Methods

This was a mixed methods study, initially involving an online survey administered to all registered UK CTUs to identify studies that had made adaptations due to the pandemic. Representatives from selected studies were qualitatively interviewed to explore the adaptations made and their potential to improve the efficiency of future trials. A literature review was undertaken to locate published evidence concerning the investigated adaptations. The findings from the interviews were reviewed by a group of CTU and patient representatives within a workshop, where discussions focused on the potential of the adaptations to improve the efficiency of future trials.

Results

Forty studies were identified by the survey. Fourteen studies were selected and fifteen CTU staff were interviewed about the adaptations. The workshop included 15 CTU and 3 patient representatives. Adaptations were not seen as leading to direct efficiency savings for CTUs. However, three adaptations may have the potential to directly improve efficiencies for trial sites and participants beyond the pandemic: a split remote-first eligibility assessment, recruitment outside the NHS via a charity, and remote consent. There was a lack of published evidence to support the former two adaptations; remote consent, however, is widely supported in the literature. Other identified adaptations may offer benefits by improving flexibility for participants. Barriers to using these adaptations include the impact on scientific validity, limitations in the role of the CTU, and participants’ access to technology.

Conclusions

Three adaptations (a split remote-first eligibility assessment, recruitment outside the NHS via a charity, and remote consent) have the potential to improve clinical trials but only one (remote consent) is supported by evidence. These adaptations could be tested in future co-ordinated ‘studies within a trial’ (SWAT).

A flexible approach for variable selection in large-scale healthcare database studies with missing covariate and outcome data

Abstract

Background

Prior work has shown that combining bootstrap imputation with tree-based machine learning variable selection methods can approach the performance achievable on fully observed data when covariate and outcome data are missing at random (MAR). This approach, however, is computationally expensive, especially on large-scale datasets.

Methods

We propose an inference-based method, called RR-BART, which leverages the likelihood-based Bayesian machine learning technique, Bayesian additive regression trees, and uses Rubin’s rule to combine the estimates and variances of the variable importance measures on multiply imputed datasets for variable selection in the presence of MAR data. We conduct a representative simulation study to investigate the practical operating characteristics of RR-BART, and compare it with the bootstrap imputation based methods. We further demonstrate the methods via a case study of risk factors for 3-year incidence of metabolic syndrome among middle-aged women using data from the Study of Women’s Health Across the Nation (SWAN).
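
As a small illustration of the pooling step, the sketch below applies Rubin’s rules to combine an estimate and its variance across multiply imputed datasets; the numbers are placeholders, not output from RR-BART.

    # Rubin's rules: pooled estimate, within-, between- and total variance.
    import numpy as np

    estimates = np.array([0.42, 0.39, 0.45, 0.41, 0.44])    # one per imputed dataset
    variances = np.array([0.010, 0.012, 0.009, 0.011, 0.010])
    m = len(estimates)

    pooled = estimates.mean()                    # pooled point estimate
    within = variances.mean()                    # within-imputation variance
    between = estimates.var(ddof=1)              # between-imputation variance
    total_var = within + (1 + 1 / m) * between   # Rubin's total variance
    print(pooled, np.sqrt(total_var))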

Results

The simulation study suggests that, even in complex conditions of nonlinearity and nonadditivity with a large percentage of missingness, RR-BART can reasonably recover both the prediction and variable selection performance achievable on the fully observed data. RR-BART matches the best performance that the bootstrap imputation based methods can achieve with the optimal selection threshold value. In addition, RR-BART demonstrates a substantially stronger ability to detect discrete predictors. Furthermore, RR-BART offers substantial computational savings. When implemented on the SWAN data, RR-BART adds to the literature by selecting a set of predictors that had been less commonly identified as risk factors but had substantial biological justifications.

Conclusion

The proposed variable selection method for MAR data, RR-BART, offers both computational efficiency and good operating characteristics and is utilitarian in large-scale healthcare database studies.

Global prediction model for COVID-19 pandemic with the characteristics of the multiple peaks and local fluctuations

Abstract

Background

With the spread of COVID-19, the time-series prediction of COVID-19 has become a research hotspot. Unlike previous epidemics, COVID-19 has a new pattern of long-time series, large fluctuations, and multiple peaks. Traditional dynamical models are limited to curves with short-time series, single peak, smoothness, and symmetry. Secondly, most of these models have unknown parameters, which bring greater ambiguity and uncertainty. There are still major shortcomings in the integration of multiple factors, such as human interventions, environmental factors, and transmission mechanisms.

Methods

A dynamical model with only infected humans and removed humans was established. Then the process of COVID-19 spread was segmented using a local smoother. The change of infection rate at different stages was quantified using a continuous and periodic logistic growth function to quantitatively describe the comprehensive effects of natural and human factors. Then, a non-linear variable and NO2 concentrations were introduced to quantify the number of people who have been prevented from infection through human interventions.
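
As a simplified illustration of the growth component, the sketch below fits a basic logistic growth curve to synthetic cumulative case counts with scipy; the study’s actual model (segmented, with a periodic infection rate and NO2 terms) is considerably richer.

    # Fit a logistic growth curve to synthetic cumulative case counts.
    import numpy as np
    from scipy.optimize import curve_fit

    def logistic(t, K, r, t0):
        """Logistic growth: carrying capacity K, growth rate r, midpoint t0."""
        return K / (1 + np.exp(-r * (t - t0)))

    t = np.arange(120)
    cases = logistic(t, K=50000, r=0.12, t0=60)
    cases = cases + np.random.default_rng(2).normal(scale=500, size=t.size)

    params, _ = curve_fit(logistic, t, cases, p0=[cases.max(), 0.1, t.mean()])
    print(dict(zip(["K", "r", "t0"], params)))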

Results

The experiments and analysis showed the R2 of fitting for the US, UK, India, Brazil, Russia, and Germany was 0.841, 0.977, 0.974, 0.659, 0.992, and 0.753, respectively. The prediction accuracy of the US, UK, India, Brazil, Russia, and Germany in October was 0.331, 0.127, 0.112, 0.376, 0.043, and 0.445, respectively.

Conclusion

The model can not only better describe the effects of human interventions but also better simulate the temporal evolution of COVID-19 with local fluctuations and multiple peaks, which can provide valuable assistant decision-making information.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01604-x

A systematic review of methods to estimate colorectal cancer incidence using population-based cancer registries

Abstract

Background

Epidemiological studies of incidence play an essential role in quantifying disease burden, resource planning, and informing public health policies. A variety of measures for estimating cancer incidence have been used. Appropriate reporting of incidence calculations is essential to enable clear interpretation. This review uses colorectal cancer (CRC) as an exemplar to summarize and describe variation in commonly employed incidence measures and evaluate the quality of reporting incidence methods.

Methods

We searched four databases for CRC incidence studies published between January 2010 and May 2020. Two independent reviewers screened all titles and abstracts. Eligible studies were population-based cancer registry studies evaluating CRC incidence. We extracted data on study characteristics and author-defined criteria for assessing the quality of reporting incidence. We used descriptive statistics to summarize the information.

Results

This review retrieved 165 relevant articles. The age-standardized incidence rate (ASR) (80%) was the most commonly reported incidence measure, and the 2000 U.S. standard population the most commonly used reference population (39%). Slightly more than half (54%) of the studies reported CRC incidence stratified by anatomical site. The quality of reporting incidence methods was suboptimal. Of all included studies: 45 (27%) failed to report the classification system used to define CRC; 63 (38%) did not report CRC codes; and only 20 (12%) documented excluding certain CRC cases from the numerator. Concerning the denominator estimation: 61% of studies failed to state the source of population data; 24 (15%) indicated census years; 10 (6%) reported the method used to estimate yearly population counts; and only 5 (3%) explicitly explained the population size estimation procedure to calculate the overall average incidence rate. Thirty-three (20%) studies reported the confidence interval for incidence, and only 7 (4%) documented methods for dealing with missing data.
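
For readers less familiar with the ASR, the sketch below shows the arithmetic of direct age standardisation: age-specific rates weighted by a standard population’s age distribution. The numbers are illustrative, not taken from the review.

    # Directly age-standardised incidence rate (ASR) per 100,000 person-years.
    import numpy as np

    cases      = np.array([10, 40, 150])             # cases by age group
    person_yrs = np.array([200000, 150000, 80000])   # person-years by age group
    std_weight = np.array([0.5, 0.3, 0.2])           # standard population weights (sum to 1)

    age_specific_rates = cases / person_yrs
    asr = (age_specific_rates * std_weight).sum() * 100000
    print(round(asr, 1))                             # 48.0 per 100,000 in this example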

Conclusion

This review identified variations in incidence calculation and inadequate reporting of methods. We outlined recommendations to optimize incidence estimation and reporting practices. There is a need to establish clear guidelines for incidence reporting to facilitate assessment of the validity and interpretation of reported incidence.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01632-7

Using observational study data as an external control group for a clinical trial: an empirical comparison of methods to account for longitudinal missing data

Abstract

Background

Observational data are increasingly being used to conduct external comparisons to clinical trials. In this study, we empirically examined whether different methodological approaches to longitudinal missing data affected study conclusions in this setting.

Methods

We used data from one clinical trial and one prospective observational study, both Norwegian multicenter studies including patients with recently diagnosed rheumatoid arthritis and implementing similar treatment strategies, but with different stringency. A binary disease remission status was defined at 6, 12, and 24 months in both studies. After identifying patterns of longitudinal missing outcome data, we evaluated the following five approaches to handle missingness: analyses of patients with complete follow-up data, multiple imputation (MI), inverse probability of censoring weighting (IPCW), and two combinations of MI and IPCW.
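
To illustrate the censoring-weight component, the sketch below computes inverse probability of censoring weights for a binary outcome observed at a single follow-up time and uses them to reweight complete cases; variable names and data are hypothetical, and the study combined IPCW with multiple imputation in several ways.

    # Inverse probability of censoring weighting (IPCW) sketch.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 1000
    age = rng.normal(55, 10, n)
    baseline_das = rng.normal(4, 1, n)
    observed = (rng.uniform(size=n) < 1 / (1 + np.exp(-(2 - 0.02 * age)))).astype(float)
    remission = (rng.uniform(size=n) < 0.4).astype(float)

    df = pd.DataFrame({"age": age, "das": baseline_das,
                       "observed": observed, "remission": remission})

    # Model P(remaining under observation | baseline covariates).
    X = pd.DataFrame({"const": 1.0, "age": df["age"], "das": df["das"]})
    cens_model = sm.GLM(df["observed"], X, family=sm.families.Binomial()).fit()
    p_obs = cens_model.predict(X)

    # Complete cases weighted by 1 / P(observed) estimate the full-cohort mean.
    cc = df["observed"] == 1
    w = 1 / p_obs[cc]
    weighted_remission = (df.loc[cc, "remission"] * w).sum() / w.sum()
    print(weighted_remission)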

Results

We found a complex non-monotone missing data pattern in the observational study (N = 328), while missing data in the trial (N = 188) was monotone due to drop-out. In the observational study, only 39.0% of patients had complete outcome data, compared to 89.9% in the trial. All approaches to missing data indicated favorable outcomes of the treatment strategy in the trial and resulted in similar study conclusions. Variations in results across approaches were mainly due to variations in estimated outcomes for the observational data.

Conclusions

Five different approaches to handle longitudinal missing data resulted in similar conclusions in our example. However, the extent and complexity of missing observational data affected estimated comparative outcomes across approaches, highlighting the need for careful consideration of methods to account for missingness in this setting. Based on this empirical examination, we recommend using a prespecified advanced missing data approach to account for longitudinal missing data, and to conduct alternative approaches in sensitivity analyses.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01639-0

Estimating risk ratio from any standard epidemiological design by doubling the cases

Abstract

Background

Despite the ease of interpretation and communication of a risk ratio (RR), and several other advantages in specific settings, the odds ratio (OR) is more commonly reported in epidemiological and clinical research. This is due to the familiarity of the logistic regression model for estimating adjusted ORs from data gathered in a cross-sectional, cohort or case-control design. The preservation of the OR (but not the RR) in case-control samples has contributed to the perception that it is the only valid measure of relative risk from case-control samples. For cohort or cross-sectional data, a method known as ‘doubling-the-cases’ provides valid estimates of the RR, and an expression for a robust standard error has been derived, but this method is not available in statistical software packages.

Methods

In this paper, we first describe the doubling-of-cases approach in the cohort setting and then extend its application to case-control studies by incorporating sampling weights and deriving an expression for a robust standard error. The performance of the estimator is evaluated using simulated data, and its application illustrated in a study of neonatal jaundice. We provide an R package that implements the method for any standard design.
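
A minimal Python sketch of the doubling-of-cases idea in the cohort setting is shown below: cases are duplicated with the outcome recoded to 0, and logistic regression on the expanded data with cluster-robust standard errors then estimates the risk ratio. This is a generic illustration, not the authors’ R package, and the robust variance expression in the paper may differ.

    # Doubling-of-cases sketch: logistic regression on expanded data estimates the RR.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 5000
    exposed = rng.integers(0, 2, n)
    risk = np.where(exposed == 1, 0.15, 0.10)            # true RR = 1.5
    case = (rng.uniform(size=n) < risk).astype(int)
    df = pd.DataFrame({"id": np.arange(n), "exposed": exposed, "case": case})

    # Duplicate the cases and set their outcome to 0 in the copies.
    dup = df[df["case"] == 1].assign(case=0)
    expanded = pd.concat([df, dup], ignore_index=True)

    X = pd.DataFrame({"const": 1.0, "exposed": expanded["exposed"]})
    fit = sm.GLM(expanded["case"], X, family=sm.families.Binomial()).fit(
        cov_type="cluster", cov_kwds={"groups": expanded["id"]})
    print("estimated RR:", np.exp(fit.params["exposed"]))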

Results

Our work illustrates that the doubling-of-cases approach for estimating an adjusted RR from cross-sectional or cohort data can also yield valid RR estimates from case-control data. The approach is straightforward to apply, involving simple modification of the data followed by logistic regression analysis. The method performed well for case-control data from simulated cohorts with a range of prevalence rates. In the application to neonatal jaundice, the RR estimates were similar to those from relative risk regression, whereas the OR from naive logistic regression overestimated the RR despite the low prevalence of the outcome.

Conclusions

By providing an R package that estimates an adjusted RR from cohort, cross-sectional or case-control studies, we have enabled the method to be easily implemented with familiar software, so that investigators are not limited to reporting an OR and can examine the RR when it is of interest.

Automating risk of bias assessment in systematic reviews: a real-time mixed methods comparison of human researchers to a machine learning system

Abstract

Background

Machine learning and automation are increasingly used to make the evidence synthesis process faster and more responsive to policymakers’ needs. In systematic reviews of randomized controlled trials (RCTs), risk of bias assessment is a resource-intensive task that typically requires two trained reviewers. One function of RobotReviewer, an off-the-shelf machine learning system, is an automated risk of bias assessment.

Methods

We assessed the feasibility of adopting RobotReviewer within a national public health institute using a randomized, real-time, user-centered study. The study included 26 RCTs and six reviewers from two projects examining health and social interventions. We randomized these studies to one of two RobotReviewer platforms. We operationalized feasibility as accuracy, time use, and reviewer acceptability. We measured accuracy by the number of corrections made by human reviewers (either to automated assessments or another human reviewer’s assessments). We explored acceptability through group discussions and individual email responses after presenting the quantitative results.

Results

Reviewers were equally likely to accept a judgement by RobotReviewer as each other’s judgements during the consensus process when measured dichotomously; risk ratio 1.02 (95% CI 0.92 to 1.13; p = 0.33). We were not able to compare time use. The acceptability of the program by researchers was mixed. Less experienced reviewers were generally more positive; they saw more benefits and were able to use the tool more flexibly. Reviewers positioned human input and human-to-human interaction as superior to even a semi-automation of this process.

Conclusion

Despite being presented with evidence of RobotReviewer’s equal performance to humans, participating reviewers were not interested in modifying standard procedures to include automation. If further studies confirm equal accuracy and reduced time compared to manual practices, we suggest that the benefits of RobotReviewer may support its future implementation as one of two assessors, despite reviewer ambivalence. Future research should study barriers to adopting automated tools and how highly educated and experienced researchers can adapt to a job market that is increasingly challenged by new technologies.

A progressive three-state model to estimate time to cancer: a likelihood-based approach

Abstract

Background

To optimize colorectal cancer (CRC) screening and surveillance, information regarding the time-dependent risk of advanced adenomas (AA) to develop into CRC is crucial. However, since AA are removed after diagnosis, the time from AA to CRC cannot be observed in an ethically acceptable manner. We propose a statistical method to indirectly infer this time in a progressive three-state disease model using surveillance data.

Methods

Sixteen models were specified, with and without covariates. Parameters of the parametric time-to-event distributions from the adenoma-free state (AF) to AA and from AA to CRC were estimated simultaneously, by maximizing the likelihood function. Model performance was assessed via simulation. The methodology was applied to a random sample of 878 individuals from a Norwegian adenoma cohort.

Results

Estimates of the parameters of the time distributions are consistent and the 95% confidence intervals (CIs) have good coverage. For the Norwegian sample (AF: 78%, AA: 20%, CRC: 2%), a Weibull model for both transition times was selected as the final model based on information criteria. Among individuals who transitioned from AA to CRC within 50 years of AA onset, the mean time to transition was estimated to be 4.80 years (95% CI: 0–7.61). The 5-year and 10-year cumulative incidence of CRC from AA was 13.8% (95% CI: 7.8–23.8%) and 15.4% (95% CI: 8.2–34.0%), respectively.

Conclusions

The time-dependent risk from AA to CRC is crucial to explain differences in the outcomes of microsimulation models used for the optimization of CRC prevention. Our method allows for improving models by the inclusion of data-driven time distributions.



The effectiveness of hand hygiene interventions for preventing community transmission or acquisition of novel coronavirus or influenza infections: a systematic review

Abstract

Background

Novel coronaviruses and influenza can cause infection, epidemics, and pandemics. Improving hand hygiene (HH) of the general public is recommended for preventing these infections. This systematic review examined the effectiveness of HH interventions for preventing transmission or acquisition of such infections in the community.

Methods

PubMed, MEDLINE, CINAHL and Web of Science databases were searched (January 2002–February 2022) for empirical studies related to HH in the general public and to the acquisition or transmission of novel coronavirus infections or influenza. Studies on healthcare staff, and with outcomes of compliance or absenteeism were excluded. Study selection, data extraction and quality assessment, using the Cochrane Effective Practice and Organization of Care risk of bias criteria or Joanna Briggs Institute Critical Appraisal checklists, were conducted by one reviewer, and double-checked by another. For intervention studies, effect estimates were calculated while the remaining studies were synthesised narratively. The protocol was pre-registered (PROSPERO 2020: CRD42020196525).

Results

Twenty-two studies were included. Six were intervention studies evaluating the effectiveness of HH education and provision of products, or hand washing against influenza. Only two school-based interventions showed a significant protective effect (OR: 0.64; 95% CI 0.51, 0.80 and OR: 0.40; 95% CI 0.22, 0.71), with risk of bias being high (n = 1) and unclear (n = 1). Of the 16 non-intervention studies, 13 reported the protective effect of HH against influenza, SARS or COVID-19 (P < 0.05), but risk of bias was high (n = 7), unclear (n = 5) or low (n = 1). However, evidence in relation to when, and how frequently HH should be performed was inconsistent.

Conclusions

To our knowledge, this is the first systematic review of effectiveness of HH for prevention of community transmission or acquisition of respiratory viruses that have caused epidemics or pandemics, including SARS-CoV-1, SARS-CoV-2 and influenza viruses. The evidence supporting the protective effect of HH was heterogeneous and limited by methodological quality; thus, insufficient to recommend changes to current HH guidelines. Future work is required to identify in what circumstances, how frequently and what product should be used when performing HH in the community and to develop effective interventions for promoting these specific behaviours in communities during epidemics.


https://bmcpublichealth.biomedcentral.com/articles/10.1186/s12889-022-13667-y

Machine learning approach for the prediction of 30-day mortality in patients with sepsis-associated encephalopathy

Abstract

Objective

Our study aimed to identify predictors as well as develop machine learning (ML) models to predict the risk of 30-day mortality in patients with sepsis-associated encephalopathy (SAE).

Materials and methods

ML models were developed and validated based on a public database named Medical Information Mart for Intensive Care (MIMIC)-IV. Models were compared by the area under the curve (AUC), accuracy, sensitivity, specificity, positive and negative predictive values, and the Hosmer–Lemeshow goodness-of-fit test.

Results

Of 6994 patients in MIMIC-IV included in the final cohort, a total of 1232 (17.62%) patients died following SAE. Recursive feature elimination (RFE) selected 15 variables, including acute physiology score III (APSIII), Glasgow coma score (GCS), sepsis related organ failure assessment (SOFA), Charlson comorbidity index (CCI), red blood cell volume distribution width (RDW), blood urea nitrogen (BUN), age, respiratory rate, PaO2, temperature, lactate, creatinine (CRE), malignant cancer, metastatic solid tumor, and platelet (PLT). The validation cohort demonstrated that all ML approaches had higher discriminative ability than the bagged trees (BT) model, although the difference was not statistically significant. Furthermore, in terms of calibration performance, the artificial neural network (NNET), logistic regression (LR), and adaptive boosting (Ada) models had good calibration (i.e., high accuracy of prediction), with P-values of 0.831, 0.119, and 0.129, respectively.

Conclusions

The ML models, as demonstrated by our study, can be used to evaluate the prognosis of SAE patients in the intensive care unit (ICU). An online calculator could facilitate the sharing of these predictive models.

Machine learning is an effective method to predict the 90-day prognosis of patients with transient ischemic attack and minor stroke

Abstract

Objective

We aimed to investigate factors related to the 90-day poor prognosis (mRS ≥ 3) in patients with transient ischemic attack (TIA) or minor stroke, construct 90-day poor prognosis prediction models for patients with TIA or minor stroke, and compare the predictive performance of machine learning models and a logistic regression model.

Method

We selected TIA and minor stroke patients from a prospective registry study (CNSR-III). Demographic characteristics, smoking history, drinking history (≥ 20 g/day), physiological data, medical history, secondary prevention treatment, in-hospital evaluation and education, laboratory data, neurological severity, mRS score and TOAST classification were assessed. Patients were randomly divided into a training set and a test set in a 70:30 ratio. Univariate and multivariate logistic regression analyses were performed in the training set to identify predictors associated with poor outcome (mRS ≥ 3). These predictors were used to build the machine learning models and the traditional logistic regression model; the training set was used to construct the prediction models, and the test set was used to evaluate them. The evaluation indicators included the area under the curve (AUC) for discrimination and the Brier score (or calibration plot) for calibration.
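
The sketch below illustrates the same workflow on synthetic data: a 70:30 split, a logistic regression and a gradient-boosting classifier, and evaluation by AUC (discrimination) and Brier score (calibration); the study itself used CNSR-III variables and models such as CatBoost and XGBoost.

    # Train/test split with discrimination (AUC) and calibration (Brier) metrics.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import brier_score_loss, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=3000, n_features=10, weights=[0.85],
                               random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    for name, model in [("logistic", LogisticRegression(max_iter=1000)),
                        ("boosting", GradientBoostingClassifier())]:
        prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        print(name, "AUC:", round(roc_auc_score(y_te, prob), 3),
              "Brier:", round(brier_score_loss(y_te, prob), 3))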

Result

A total of 10967 patients with TIA or minor stroke were enrolled in this study, with an average age of 61.77 ± 11.18 years; women accounted for 30.68%. Factors associated with poor prognosis in TIA and minor stroke patients included sex, age, stroke history, heart rate, D-dimer, creatinine, TOAST classification, admission mRS, discharge mRS, and discharge NIHSS score. All models, both those constructed by logistic regression and those by machine learning, performed well in predicting the 90-day poor prognosis (AUC > 0.800). The best performing model in the test set was the CatBoost model (AUC = 0.839), followed by the XGBoost, GBDT, random forest and AdaBoost models (AUCs of 0.838, 0.835, 0.832, and 0.823, respectively). CatBoost and XGBoost predicted poor prognosis at 90 days better than the logistic model, and the difference was statistically significant (P < 0.05). All models, whether constructed by logistic regression or machine learning, showed good calibration.

Conclusion

Machine learning algorithms were not inferior to the logistic regression model in predicting the poor prognosis of patients with TIA and minor stroke at 90 days. Among them, the CatBoost model had the best predictive performance. All models provided good discrimination.

Nested and multipart prospective observational studies, flaming fiasco or efficiently economical?: The Brain, Bone, Heart case study

Abstract

Background

Collecting new data from cross-sectional/survey and cohort observational study designs can be expensive and time-consuming. Nested (hierarchically cocooned within an existing parent study) and/or Multipart (≥ 2 integrally interlinked projects) study designs can expand the scope of a prospective observational research program beyond what might otherwise be possible with available funding and personnel. The Brain, Bone, Heart (BBH) study provides an exemplary case to describe the real-world advantages, challenges, considerations, and insights from these complex designs.

Main

BBH is a Nested, Multipart study conducted by the Specialized Center for Research Excellence (SCORE) on Sex Differences at Emory University. BBH is designed to examine whether estrogen insufficiency-induced inflammation compounds HIV-induced inflammation, leading to end-organ damage and aging-related co-morbidities affecting the neuro-hypothalamic–pituitary–adrenal axis (brain), musculoskeletal (bone), and cardiovascular (heart) organ systems. Using BBH as a real-world case study, we describe the advantages and challenges of Nested and Multipart prospective cohort study design in practice. While excessive dependence on its parent study can pose challenges in a Nested study, there are significant advantages to the study design as well. These include the ability to leverage a parent study’s resources and personnel; more comprehensive data collection and data sharing options; a broadened community of researchers for collaboration; dedicated longitudinal research participants; and access to historical data. Multipart, interlinked studies that share a common cohort of participants and pool of resources have the advantage of dedicated key personnel and the challenge of increased organizational complexity. Important considerations for each study design include the stability and administration of the parent study (Nested) and the cohesiveness of linkage elements and staff organizational capacity (Multipart).

Conclusion

Using the experience of BBH as an example, Nested and/or Multipart study designs have both distinct advantages and potential vulnerabilities that warrant consideration and require strong biostatistics and data management leadership to optimize programmatic success and impact.

Sample size recalculation based on the prevalence in a randomized test-treatment study

Abstract

Background

Randomized test-treatment studies aim to evaluate the clinical utility of diagnostic tests by providing evidence on their impact on patient health. However, the sample size calculation is affected by several factors involved in the test-treatment pathway, including the prevalence of the disease. Sample size planning is therefore subject to strong uncertainty about the necessary assumptions, which may have to be compensated for by adjusting prospectively determined study parameters during the course of the study.

Method

An adaptive design with a blinded sample size recalculation in a randomized test-treatment study based on the prevalence is proposed and evaluated by a simulation study. The results of the adaptive design are compared to those of the fixed design.
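
As a rough illustration of the kind of recalculation involved, the sketch below re-derives a per-arm sample size for a binary outcome from a blinded interim estimate of the prevalence, assuming the test can only change outcomes in diseased patients so that the detectable difference is diluted by the prevalence; the actual recalculation rule evaluated in the study may differ, and all numbers are assumptions.

    # Blinded sample size recalculation sketch driven by the disease prevalence.
    from scipy.stats import norm

    def n_per_arm(p_control, delta, alpha=0.05, power=0.8):
        """Two-proportion sample size per arm (normal approximation)."""
        p_exp = p_control + delta
        z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
        var = p_control * (1 - p_control) + p_exp * (1 - p_exp)
        return (z_a + z_b) ** 2 * var / delta ** 2

    planned_prev, observed_prev = 0.30, 0.20   # assumed vs. blinded interim estimate
    effect_in_diseased = 0.15                  # assumed benefit among diseased patients
    p_control = 0.50

    print("planned n/arm:     ", round(n_per_arm(p_control, planned_prev * effect_in_diseased)))
    print("recalculated n/arm:", round(n_per_arm(p_control, observed_prev * effect_in_diseased)))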

Results

The adaptive design achieves the desired theoretical power, under the assumption that all other nuisance parameters have been specified correctly, while wrong assumptions regarding the prevalence may lead to an over- or underpowered study in the fixed design. The empirical type I error rate is sufficiently controlled in the adaptive design as well as in the fixed design.

Conclusion

Considering a blinded recalculation of the sample size already during the planning of the study may be advisable in order to increase the chance of success and improve the conduct of the study. However, the application of the method is subject to a number of limitations associated with the study design in terms of feasibility, the sample sizes that need to be achieved, and the fulfilment of necessary prerequisites.



Estimating causal effects in the presence of competing events using regression standardisation with the Stata command standsurv

Abstract

Background

When interested in a time-to-event outcome, competing events that prevent the occurrence of the event of interest may be present. In the presence of competing events, various estimands have been suggested for defining the causal effect of treatment on the event of interest. Depending on the estimand, the competing events are either accommodated or eliminated, resulting in causal effects with different interpretations. The former approach captures the total effect of treatment on the event of interest while the latter approach captures the direct effect of treatment on the event of interest that is not mediated by the competing event. Separable effects have also been defined for settings where the treatment can be partitioned into two components that affect the event of interest and the competing event through different causal pathways.

Methods

We outline various causal effects that may be of interest in the presence of competing events, including total, direct and separable effects, and describe how to obtain estimates using regression standardisation with the Stata command standsurv. Regression standardisation is applied by obtaining the average of individual estimates across all individuals in a study population after fitting a survival model.
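
A minimal Python analogue of regression standardisation for a survival outcome is sketched below: fit a Cox model, predict each individual’s survival at a fixed time with the exposure set to each level, and average. This only illustrates the idea behind a total-effect contrast; the standsurv command additionally handles competing events, other contrasts and delta-method confidence intervals. Data and variable names are synthetic.

    # Regression standardisation for survival: average individual predictions.
    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(5)
    n = 1000
    treat = rng.integers(0, 2, n)
    age = rng.normal(65, 8, n)
    time = rng.exponential(scale=np.exp(2 - 0.4 * treat - 0.02 * (age - 65)), size=n)
    event = (rng.uniform(size=n) < 0.8).astype(int)
    df = pd.DataFrame({"time": time, "event": event, "treat": treat, "age": age})

    cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

    # Standardise: average predicted survival at t=5 with treat set to 1 and to 0.
    t = 5.0
    s1 = cph.predict_survival_function(df.assign(treat=1), times=[t]).mean(axis=1)
    s0 = cph.predict_survival_function(df.assign(treat=0), times=[t]).mean(axis=1)
    print("standardised survival difference at t=5:", float(s1.iloc[0] - s0.iloc[0]))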

Results

With standsurv several contrasts of interest can be calculated including differences, ratios and other user-defined functions. Confidence intervals can also be obtained using the delta method. Throughout we use an example analysing a publicly available dataset on prostate cancer to allow the reader to replicate the analysis and further explore the different effects of interest.

Conclusions

Several causal effects can be defined in the presence of competing events and, under assumptions, estimates of those can be obtained using regression standardisation with the Stata command standsurv. The choice of which causal effect to define should be given careful consideration based on the research question and the audience to which the findings will be communicated.

Restrictions and their reporting in systematic reviews of effectiveness: an observational study

Abstract

Background

Restrictions in systematic reviews (SRs) can lead to bias and may affect conclusions. Therefore, it is important to report whether and which restrictions were used. This study aims to examine the use of restrictions regarding language, publication period, and study type, as well as the transparency of reporting in SRs of effectiveness.

Methods

A retrospective observational study was conducted with a random sample of 535 SRs of effectiveness indexed in PubMed between 2000 and 2019. The use of restrictions and their reporting were analysed using descriptive statistics.

Results

Of the total 535 SRs included, four out of every ten (41.3%) lacked information on at least one of the three restrictions considered (language, publication period, or study type). Overall, 14.6% of SRs did not provide information on restrictions regarding publication period, 19.1% regarding study type, and 18.3% regarding language. Of all included SRs, language was restricted in 46.4%, and in more than half of the SRs with restricted language (130/248), it was unclear whether the restriction was applied during either the search or the screening process, or both. The restrictions were justified for publication period in 22.2% of the respective SRs (33/149), study type in 6.5% (28/433), and language in 3.2% (8/248). Differences in reporting were found between countries as well as between Cochrane and non-Cochrane reviews.

Conclusions

This study suggests that there is a lack of transparency in reporting on restrictions in SRs. Authors as well as editors and reviewers should be encouraged to improve the reporting and justification of restrictions to increase the transparency of SRs.



Assessing the quality of evidence on safety: specifications for application and suggestions for adaptions of the GRADE-criteria in the context of preparing a list of potentially inappropriate medications for older adults

Abstract

Background

Systematic reviews that synthesize safety outcomes pose challenges (e.g. rare events), which raise questions for grading the strength of the body of evidence. This may be one reason why, in many potentially inappropriate medication (PIM) lists, the recommendations are not based on formalized systems for assessing the quality of the body of evidence, such as GRADE.

In this contribution, we describe specifications and suggest adaptions of the GRADE system for grading the quality of evidence on safety outcomes, which were developed in the context of preparing a PIM-list, namely PRISCUS.

Methods

We systematically assessed each of the five GRADE domains for rating-down (study limitations, imprecision, inconsistency, indirectness, publication bias) and the criteria for rating-up, considering whether special considerations or revisions of the original approach were indicated. The results were gathered in a written document and discussed in a group meeting of five members with various backgrounds until consensus was reached. Subsequently, we performed a proof-of-concept application using a convenience sample of systematic reviews and applied the approach to systematic reviews on 19 different clinical questions.

Results

We describe specifications and suggest adaptions for the criteria “study limitations”, “imprecision”, “publication bias” and “rating-up for large effect”. In addition, we suggest a new criterion to account for data from subgroup analyses. The proof-of-concept application did not reveal a need for further revision, and thus we used the approach for the systematic reviews that were prepared for the PRISCUS-list.

We assessed 51 outcomes. Each of the proposed adaptions was applied. There was neither an excessive number of low and very low ratings nor an excessive number of high ratings, and the differing methodological quality of the safety outcomes appeared to be well reflected.

Conclusion

The suggestions appear to have the potential to overcome some of the challenges when grading the methodological quality of harms and thus may be helpful for producers of evidence syntheses considering safety.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01715-5

Estimating effects of health policy interventions using interrupted time-series analyses: a simulation study

Abstract

Background

A classic methodology used in evaluating the impact of health policy interventions is interrupted time-series (ITS) analysis, applying a quasi-experimental design that uses both pre- and post-policy data without randomization. In this paper, we took a simulation-based approach to estimating intervention effects under different assumptions.

Methods

Each of the simulated mortality rates contained a linear time trend, seasonality, autoregressive, and moving-average terms. The simulations of the policy effects involved three scenarios: 1) immediate-level change only, 2) immediate-level and slope change, and 3) lagged-level and slope change. The estimated effects and biases of these effects were examined via three matched generalized additive mixed models, each of which used two different approaches: 1) effects based on estimated coefficients (estimated approach), and 2) effects based on predictions from models (predicted approach). The robustness of these two approaches was further investigated assuming misspecification of the models.
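
For orientation, the sketch below runs a basic segmented-regression interrupted time-series analysis on simulated monthly death counts, with an immediate level and slope change (as in scenario 2), and derives an effect both from the estimated coefficients and from model predictions against the extrapolated pre-policy counterfactual; the study’s own models were generalized additive mixed models with seasonality and ARMA terms.

    # Segmented-regression ITS sketch: level change + slope change after the policy.
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    months = np.arange(60)
    policy_start = 36
    post = (months >= policy_start).astype(float)
    time_since = np.where(post == 1, months - policy_start, 0)

    deaths = 50 - 0.1 * months - 3.0 * post - 0.2 * time_since + rng.normal(0, 1, 60)

    X = pd.DataFrame({"const": 1.0, "time": months,
                      "level_change": post, "slope_change": time_since})
    fit = sm.OLS(deaths, X).fit()
    print(fit.params)   # coefficient-based ("estimated") effects

    # Prediction-based ("predicted") effect: counterfactual minus fitted, post-policy.
    counterfactual = fit.params["const"] + fit.params["time"] * months
    prevented = (counterfactual - fit.fittedvalues)[post == 1].sum()
    print("deaths prevented post-policy:", prevented)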

Results

When one simulated dataset was analyzed with the matched model, the two analytical approaches produced similar estimates. However, when the models were misspecified, the numbers of deaths prevented estimated using the predicted vs. estimated approaches were very different, with the predicted approach yielding estimates closer to the real effect. The discrepancy was larger when the policy was applied early in the time-series.

Conclusion

Even when the sample size appears to be large enough, one should still be cautious when conducting ITS analyses, since the power also depends on when in the series the intervention occurs. In addition, the intervention lagged effect needs to be fully considered at the study design stage (i.e., when developing the models).


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01716-4

“We’re already doing this work”: ethical research with community-based organizations

Abstract

Background

Public health research frequently relies on collaborations with community-based organizations, and these partnerships can be essential to the success of a project. However, while public health ethics and oversight policies have historically focused on ensuring that individual subjects are protected from unethical or unfair practices, there are few guidelines to protect the organizations which facilitate relationships with – and are frequently composed of – these same vulnerable populations. As universities, governments, and donors place a renewed emphasis on the need for community engaged research to address systematic drivers of health inequity, it is vital that the ways in which research is conducted do not uphold the same intersecting systems of gender, race, and class oppression which led to the very same health inequities of interest.

Methods

To understand how traditional notions of public health research ethics might be expanded to encompass partnerships with organizations as well as individuals, we conducted qualitative interviews with 39 staff members (executive directors and frontline) at community-based organizations that primarily serve people who use drugs, Black men who have sex with men, and sex workers across the United States from January 2016 – August 2017. We also conducted 11 in-depth interviews with professional academic researchers with experience partnering with CBOs that serve similar populations. Transcripts were analyzed thematically using emergent codes and a priori codes derived from the Belmont Report.

Results

The concepts of respect, beneficence, and justice are a starting point for collaboration with CBOs, but participants deepened them beyond traditional regulatory concepts to consider the ethics of relationships, care, and solidarity. These concepts could and should apply to the treatment of organizations that participate in research just as they apply to individual human subjects, although their implementation will differ when applied to CBOs vs individual human subjects.

Conclusions

Academic-CBO partnerships are likely to be more successful for both academics and CBOs if academic researchers work to center individual-level relationship building that is mutually respectful and grounded in cultural humility. More support from academic institutions and ethical oversight entities can enable more ethically grounded relationships between academic researchers, academic institutions, and community based organizations.

Prospective sampling bias in COVID-19 recruitment methods: experimental evidence from a national randomized survey testing recruitment materials

Abstract

Background

In the context of the COVID-19 pandemic, social science research has required recruiting many prospective participants. Many researchers have explicitly taken advantage of widespread public interest in COVID-19 to advertise their studies. Leveraging this interest, however, risks creating unrepresentative samples due to differential interest in the topic. In this study, we investigate the design of survey recruitment materials with respect to the views of resultant participants.

Methods

Within a pan-Canadian survey (stratified random mail sampling, n = 1969), the design of recruitment invitations to prospective respondents was experimentally varied, with some prospective respondents receiving COVID-specific recruitment messages and others receiving more general recruitment messages (described as research about health and health policy). All respondents participated, however, in the same survey, allowing comparison of both demographic and attitudinal features between these groups.

Results

Respondents recruited via COVID-19 specific postcards were more likely to agree that COVID-19 is serious and believe that they were likely to contract COVID-19 compared to non-COVID respondents (odds = 0.71, p = 0.04; odds = 0.74, p = 0.03 respectively; comparing health to COVID-19 framed respondents). COVID-19 specific respondents were more likely to disagree that the COVID-19 threat was exaggerated compared to the non-COVID survey respondents (odds = 1.44, p = 0.02).

Conclusions

COVID-19 recruitment framing garnered a higher response rate, as well as a sample with greater concern about coronavirus risks and impacts than respondents who received more neutrally framed recruitment materials.

Qualitative longitudinal research in health research: a method study

Abstract

Background

Qualitative longitudinal research (QLR) comprises qualitative studies, with repeated data collection, that focus on the temporality (e.g., time and change) of a phenomenon. The use of QLR is increasing in health research since many topics within health involve change (e.g., progressive illness, rehabilitation). A method study can provide an insightful understanding of the use, trends and variations within this approach. The aim of this study was to map how QLR articles within the existing health research literature are designed to capture aspects of time and/or change.

Methods

This method study used an adapted scoping review design. Articles were eligible if they were written in English, published between 2017 and 2019, and reported results from qualitative data collected at different time points/time waves with the same sample or in the same setting. Articles were identified using EBSCOhost. Two independent reviewers performed the screening, selection and charting.

Results

A total of 299 articles were included. There was great variation among the articles in the use of methodological traditions, type of data, length of data collection, and components of longitudinal data collection. However, the majority of articles represented large studies and were based on individual interview data. Approximately half of the articles self-identified as QLR studies or as following a QLR design, although slightly less than 20% of them included QLR method literature in their method sections.

Conclusions

QLR is often used in large complex studies. Some articles were thoroughly designed to capture time/change throughout the methodology, aim and data collection, while other articles included few elements of QLR. Longitudinal data collection includes several components, such as what entities are followed across time, the tempo of data collection, and to what extent the data collection is preplanned or adapted across time. Therefore, there are several practices and possibilities researchers should consider before starting a QLR project.

A Bayesian approach to model the underlying predictors of early recurrence and postoperative death in patients with colorectal cancer

Abstract

Objective

This study aimed to apply a Bayesian semi-competing risks technique to model the underlying predictors of early recurrence and postoperative death in patients with colorectal cancer (CRC).

Methods

In this prospective cohort study, 284 patients with colorectal cancer who underwent surgery were referred to the Imam Khomeini clinic in Hamadan from 2001 to 2017. The primary outcomes were the probability of recurrence, the probability of mortality without recurrence, and the probability of mortality after recurrence. The patients' recurrence status was determined from their records. Bayesian survival modeling was carried out with semi-competing risks illness-death models, using an accelerated failure time (AFT) approach, in R 4.1 software. The best model was chosen according to the lowest deviance information criterion (DIC) and the highest logarithm of the pseudo-marginal likelihood (LPML).
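As a rough, non-Bayesian illustration of the log-normal AFT component, the sketch below fits a log-normal accelerated failure time model for a single transition (time to recurrence) with lifelines on simulated data; the paper's semi-competing risks illness-death model, priors, and R implementation are considerably richer than this.

```python
# A minimal single-transition AFT sketch on simulated data (toy effect sizes),
# not the paper's full illness-death model.
import numpy as np
import pandas as pd
from lifelines import LogNormalAFTFitter

rng = np.random.default_rng(0)
n = 200
age = rng.normal(65, 10, n)
tumor = rng.normal(3, 1, n)
# Log-normal recurrence times whose scale shrinks with larger tumours (toy effect).
t = np.exp(3.0 - 0.1 * (tumor - 3) + rng.normal(0, 0.5, n))
censor = rng.exponential(40, n)

df = pd.DataFrame({
    "time": np.minimum(t, censor),
    "recurred": (t <= censor).astype(int),
    "age_at_diagnosis": age,
    "tumor_size_cm": tumor,
})

aft = LogNormalAFTFitter()
aft.fit(df, duration_col="time", event_col="recurred")
aft.print_summary()  # exponentiated coefficients are time ratios, as reported below
```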

Results

The log-normal model (DIC = 1633, LPML = -811) was the optimal model. The results showed that gender (Time Ratio = 0.764; 95% Confidence Interval = 0.456–0.855), age at diagnosis (0.764: 0.538–0.935), T3 stage (0.601: 0.530–0.713), N2 stage (0.714: 0.577–0.935), tumor size (0.709: 0.610–0.929), grade of differentiation at poor (0.856: 0.733–0.988) and moderate (0.648: 0.503–0.955) levels, and the number of chemotherapies (1.583: 1.367–1.863) were significantly related to recurrence. Also, age at diagnosis (0.396: 0.313–0.532), metastasis to other sites (0.566: 0.490–0.835), T3 stage (0.363: 0.301–0.592), T4 stage (0.434: 0.347–0.545), grade of differentiation at moderate level (0.527: 0.387–0.674), tumor size (0.595: 0.500–0.679), and the number of chemotherapies (1.541: 1.332–2.243) significantly predicted death. In addition, age at diagnosis (0.659: 0.559–0.803) and the number of chemotherapies (2.029: 1.792–2.191) were significantly related to mortality after recurrence.

Conclusion

According to the results obtained from the optimal Bayesian log-normal model for terminal and non-terminal events, appropriate screening strategies and earlier detection of CRC can lead to substantial improvements in patient survival.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01746-y

Statistical methods and graphical displays of quality of life with survival outcomes in oncology clinical trials for supporting the estimand framework

Abstract

Background

Although standards for the analysis of patient-reported outcomes and quality of life (QOL) in oncology clinical trials are under discussion, the analysis of QOL in the presence of death events is not within their scope. For example, ignoring death can lead to bias in the QOL analysis for patients with moderate or high mortality rates in the palliative care setting. This is discussed in the estimand framework but remains controversial. Information loss from summary measures under the estimand framework may make it challenging for clinicians to interpret the QOL analysis results. This study illustrated the use of graphical displays in the framework; they can be helpful for discussions between clinicians and statisticians and for decision-making by stakeholders.

Methods

We reviewed the time-to-deterioration analysis, prioritized composite outcome approach, semi-competing risk analysis, survivor analysis, linear mixed model for repeated measures, and principal stratification approach. We summarized attributes of estimands and graphs in the statistical analysis and evaluated them in various hypothetical randomized controlled trials.
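As a minimal illustration of one of the reviewed displays, the sketch below draws Kaplan-Meier curves for time to first QOL deterioration by arm on toy data; how death events are handled (as deterioration, censoring, or a competing risk) is exactly the analytic choice the paper examines and is not resolved here.

```python
# Toy time-to-deterioration display; data and column names are hypothetical.
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

df = pd.DataFrame({
    "arm":          ["A", "A", "A", "B", "B", "B"],
    "ttd_months":   [3.0, 7.5, 12.0, 2.0, 5.5, 9.0],  # time to QOL deterioration (toy)
    "deteriorated": [1, 0, 1, 1, 1, 0],                # 1 = deterioration observed
})

fig, ax = plt.subplots()
for arm, grp in df.groupby("arm"):
    km = KaplanMeierFitter(label=f"Arm {arm}")
    km.fit(grp["ttd_months"], event_observed=grp["deteriorated"])
    km.plot_survival_function(ax=ax)
ax.set_xlabel("Months from randomisation")
ax.set_ylabel("Probability of remaining free of QOL deterioration")
plt.show()
```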

Results

Graphs for each analysis method provide different information and impressions. In the time-to-deterioration analysis, it was not easy to interpret the difference in the curves as an effect on QOL. The prioritized composite outcome approach provided new insights into QOL in the presence of death by defining "better" outcomes based on the distinction between overall survival and QOL. The semi-competing risk analysis provided different insights compared with the time-to-deterioration analysis and the prioritized composite outcome approach. Because of the missing-data assumptions, graphs based on the linear mixed model for repeated measures should be interpreted carefully, even for descriptive purposes. The principal stratification approach provided a pure comparison, but interpretation was difficult because the target population was unknown.

Conclusions

Graphical displays can capture different aspects of treatment effects that should be described in the estimand framework.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01735-1

Reliability of the evidence to guide decision-making in foot ulcer prevention in diabetes: an overview of systematic reviews

Abstract

Background

Reliable evidence on the effectiveness of interventions to prevent diabetes-related foot ulceration is essential to inform clinical practice. Well-conducted systematic reviews that synthesise evidence from all relevant trials offer the most robust evidence for decision-making. We conducted an overview to assess the comprehensiveness and utility of the available secondary evidence as a reliable source of robust estimates of effect with the aim of informing a cost-effective care pathway using an economic model. Here we report the details of the overview. [PROSPERO Database (CRD42016052324)].

Methods

Medline (Ovid), Embase (Ovid), Epistemonikos, the Cochrane Database of Systematic Reviews (CDSR), the Database of Abstracts of Reviews of Effectiveness (DARE), and the Health Technology Assessment Journals Library were searched to 17th May 2021, without restrictions, for systematic reviews of randomised controlled trials (RCTs) of preventive interventions in people with diabetes. The primary outcomes of interest were new primary or recurrent foot ulcers. Two reviewers independently extracted data and assessed the risk of bias in the included reviews.

Findings

The overview identified 30 systematic reviews of patient education, footwear and off-loading, complex interventions and other interventions. Many are poorly reported and have fundamental methodological shortcomings associated with an increased risk of bias. Most concerns relate to vague inclusion criteria (60%), weak search or selection strategies (70%), weak quality appraisal methods (53%), and inexpert conduct and interpretation of quantitative and narrative evidence syntheses (57%). The 30 reviews collectively assessed 26 largely poor-quality RCTs, with substantial overlap.

Interpretation

The majority of these systematic reviews of the effectiveness of interventions to prevent diabetic foot ulceration are at high risk of bias and fail to provide reliable evidence for decision-making. Adherence to the core principles of conducting and reporting systematic reviews is needed to improve the reliability of the evidence generated to inform clinical practice.

A stratified adaptive two-stage design with co-primary endpoints for phase II clinical oncology trials

Abstract

Background

Given the inherent challenges of conducting randomized phase III trials in older cancer patients, single-arm phase II trials which assess the feasibility of a treatment that has already been shown to be effective in a younger population may provide a compelling alternative. Such an approach would need to evaluate treatment feasibility based on a composite endpoint that combines multiple clinical dimensions and to stratify older patients as fit or frail, accounting for the heterogeneity of the study population, in order to recommend an appropriate treatment approach. In this context, stratified adaptive two-stage designs for binary or composite endpoints, initially developed for biomarker studies, allow two subgroups to be included whilst maintaining competitive statistical performance. In practice, heterogeneity may affect more than one dimension, and incorporating co-primary endpoints, which independently assess each individual clinical dimension, therefore appears pertinent. The current paper presents a novel phase II design for co-primary endpoints which takes into account the heterogeneity of a population.

Methods

We developed a stratified adaptive Bryant & Day design based on the algorithms of Jones et al. and Parashar et al. This two-stage design allows two dimensions (e.g. activity and toxicity) to be assessed jointly in two different subgroups. The operating characteristics of this new design were evaluated using examples and simulation comparisons with the Bryant & Day design in the context where the study population is stratified according to a pre-defined criterion.
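For intuition, the sketch below simulates the operating characteristics of a simple, non-stratified two-stage rule with co-primary endpoints (activity and toxicity); all thresholds, sample sizes, and scenario probabilities are invented, and the authors' stratified adaptive algorithm is more elaborate than this.

```python
# Illustrative simulation of a generic two-stage go/no-go rule with
# co-primary endpoints; thresholds are hypothetical, not the paper's design.
import numpy as np

rng = np.random.default_rng(2023)

def simulate_trial(p_resp, p_tox, n1=20, n2=20, r1=5, t1=12, r=13, t2=22):
    """Return True if the treatment is declared promising.

    Stage 1: continue only if responses > r1 and toxicities <= t1 among n1 patients.
    Stage 2: declare promising if, among all n1 + n2 patients, responses > r
    and toxicities <= t2 (all thresholds illustrative).
    """
    resp1 = rng.binomial(n1, p_resp)
    tox1 = rng.binomial(n1, p_tox)
    if resp1 <= r1 or tox1 > t1:
        return False
    resp = resp1 + rng.binomial(n2, p_resp)
    tox = tox1 + rng.binomial(n2, p_tox)
    return resp > r and tox <= t2

def prob_promising(p_resp, p_tox, n_sim=20000):
    return np.mean([simulate_trial(p_resp, p_tox) for _ in range(n_sim)])

print("P(declare promising) under an unfavourable scenario (resp 0.20, tox 0.40):",
      round(prob_promising(0.20, 0.40), 3))
print("P(declare promising) under a favourable scenario  (resp 0.40, tox 0.20):",
      round(prob_promising(0.40, 0.20), 3))
```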

Results

Simulation results demonstrated that the new design minimized the expected and maximum sample sizes as compared to parallel Bryant & Day designs (one in each subgroup), whilst controlling type I error rates and maintaining a competitive statistical power as well as a high probability of detecting heterogeneity.

Conclusions

In a heterogeneous population, this two-stage stratified adaptive phase II design provides a useful alternative to classical designs and allows a subgroup of interest to be identified without dramatically increasing the sample size. As heterogeneity is not limited to older populations, this new design may also be relevant to other study populations, such as children or adolescents and young adults, or to the development of targeted therapies based on a biomarker.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01748-w

Real-world data: a brief review of the methods, applications, challenges and opportunities

Abstract

Background

The increased adoption of the internet, social media, wearable devices, e-health services, and other technology-driven services in medicine and healthcare has led to the rapid generation of various types of digital data, providing a valuable data source beyond the confines of traditional clinical trials, epidemiological studies, and lab-based experiments.

Methods

We provide a brief overview of the types and sources of real-world data (RWD) and the common models and approaches used to utilize and analyze them. We discuss the challenges and opportunities of using real-world data for evidence-based decision making. This review does not aim to be comprehensive or to cover all aspects of this intriguing topic (from both the research and practical perspectives) but serves as a primer and provides useful sources for readers who are interested in it.

Results and Conclusions

Real-world data hold great potential for generating real-world evidence for designing and conducting confirmatory trials and for answering questions that may not be addressed otherwise. The volume and complexity of real-world data also call for the development of more appropriate, sophisticated, and innovative data processing and analysis techniques, while maintaining scientific rigor in research findings and attention to data ethics, to harness the power of real-world data.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01768-6

A proposed methodology for uncertainty extraction and verification in priority setting partnerships with the James Lind Alliance: an example from the Common Conditions Affecting the Hand and Wrist Priority Setting Partnership

Abstract

Background

We report our recommended methodology for extracting and then confirming research uncertainties – areas where research has failed to answer a research question – derived from previously published literature during a broad-scope Priority Setting Partnership (PSP) with the James Lind Alliance (JLA).

Methods

This process was completed in the UK as part of the PSP for “Common Conditions Affecting the Hand and Wrist”, which comprised health professionals, patients and carers; here we report the data (uncertainty) extraction phase. The PSP followed the robust methodology set out by the JLA and sought to identify knowledge gaps, termed “uncertainties” by the JLA. Published Cochrane Systematic Reviews, Guidelines and Protocols, NICE (National Institute for Health and Care Excellence) Guidelines, and SIGN (Scottish Intercollegiate Guidelines Network) Guidelines were screened for documented “uncertainties”. A robust method of screening, internally verifying and then checking uncertainties was adopted. This included independent screening and data extraction by multiple researchers, use of a PRISMA flowchart, and steering group consensus processes.

Selection of research uncertainties was guided by the scope of the Common Conditions Affecting the Hand and Wrist PSP, which focused on “common” hand conditions routinely treated by hand specialists (including hand surgeons and hand therapists) and was limited to questions concerning the results of intervention, rather than the basic science or epidemiology behind disease.

Results

Of the 2358 records identified (after removal of duplicates) which entered the screening process, 186 records were presented to the PSP steering group for eligibility assessment; 79 were deemed within scope and included for the purpose of research uncertainty extraction (45 full Cochrane Reviews, 18 Cochrane Review protocols, 16 Guidelines). These yielded 89 research uncertainties, which were compared with the stakeholder survey and added to the longlist where necessary, before the derived uncertainties were checked against non-Cochrane published systematic reviews.

Conclusions

In carrying out this work, beyond reporting on the output of the Common Conditions Affecting the Hand and Wrist PSP, we detail the methodology and processes that we hope can inform and facilitate the work of future PSPs and other evidence reviews, especially those with a broader scope beyond a single disease or condition.

Using a cohort study of diabetes and peripheral artery disease to compare logistic regression and machine learning via random forest modeling

Abstract

Background

This study illustrates the use of logistic regression and machine learning methods, specifically random forest models, in health services research by analyzing outcomes for a cohort of patients with concomitant peripheral artery disease and diabetes mellitus.

Methods

This was a cohort study of fee-for-service Medicare beneficiaries newly diagnosed with peripheral artery disease and diabetes mellitus in 2015. Exposure variables indicate whether patients received preventive measures in the 6 months following their index date: an HbA1c test, a foot exam, or a vascular imaging study. Outcomes include any reintervention, lower extremity amputation, and death. We fit both logistic regression and random forest models.
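A minimal sketch of this modelling comparison, assuming simulated stand-ins for the claims-derived covariates, is shown below; it fits a logistic regression and a random forest to the same binary outcome and contrasts discrimination and variable importance, in the spirit of (but not reproducing) the study's analysis.

```python
# Simulated stand-in data; covariates, effect sizes, and outcome are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.integers(0, 2, n),   # age 85+ (hypothetical indicator)
    rng.integers(0, 2, n),   # urban residence
    rng.integers(0, 2, n),   # foot exam in first 6 months
    rng.integers(0, 2, n),   # HbA1c test
    rng.integers(0, 2, n),   # vascular imaging
])
logit_p = -3 + 1.2 * X[:, 0] + 0.4 * X[:, 1] - 0.5 * X[:, 2]   # toy data-generating model
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Compare discrimination and the two notions of "most predictive":
# coefficients for the logit, impurity-based importances for the forest.
print("Logistic AUC:", round(roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1]), 3))
print("Forest   AUC:", round(roc_auc_score(y_te, forest.predict_proba(X_te)[:, 1]), 3))
print("Logistic coefficients:        ", logit.coef_.round(3))
print("Forest variable importances:  ", forest.feature_importances_.round(3))
```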

Results

There were 88,898 fee-for-service Medicare beneficiaries diagnosed with peripheral artery disease and diabetes mellitus in our cohort. The rates of preventive treatment in the first six months following diagnosis were 52% (n = 45,971) for foot exams, 43% (n = 38,393) for vascular imaging, and 50% (n = 44,181) for HbA1c testing. The direction of influence for every covariate considered agreed between the random forest and logistic regression models. The most predictive covariate differed between approaches, as determined by the t-statistics from logistic regression and the variable importance (VI) in the random forest model: for amputation, the models highlighted age 85+ (t = 53.17) and urban residence (VI = 83.42), respectively, while for death (t = 65.84, VI = 88.76) and reintervention (t = 34.40, VI = 81.22) both models indicated that age was most predictive.

Conclusions

The use of random forest models to analyze data and provide predictions for patients holds great potential for identifying modifiable patient-level and health-system factors, and cohorts for increased surveillance and intervention, to improve outcomes for patients. Random forests are high-performing but hard-to-interpret models, best suited to settings where accurate prediction is the priority, and they can be used in tandem with more common approaches to provide a more thorough analysis of observational data.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01774-8

Evaluating the sensitivity of jurisdictional heterogeneity and jurisdictional mixing in national level HIV prevention analyses: context of the U.S. ending the HIV epidemic plan

Abstract

Background

The U.S. Ending the HIV Epidemic (EHE) plan aims to reduce annual HIV incidence by 90% by 2030, by first focusing interventions on 57 regions (EHE jurisdictions) that contributed more than 50% of annual HIV diagnoses. Mathematical models that project HIV incidence evaluate the impact of interventions and inform intervention decisions. However, current models are either national-level, which do not consider jurisdictional heterogeneity, or independent jurisdiction-specific, which do not consider cross-jurisdictional interactions. Data suggest that a significant proportion of persons have sexual partnerships outside their own jurisdiction. However, the sensitivity of model outcomes and intervention decisions to these jurisdictional interactions has not been studied.

Methods

We developed an ordinary differential equation based compartmental model to generate national-level projections of HIV in the U.S., through dynamic simulations of 96 epidemiological sub-models representing 54 EHE and 42 non-EHE jurisdictions. A Bernoulli equation modeled HIV transmissions using a mixing matrix to simulate sexual partnerships within and outside jurisdictions. To evaluate the sensitivity of model outputs to jurisdictional interactions, we analyzed 16 scenarios, combinations of: a) proportion of sexual partnerships mixing outside the jurisdiction (no-mixing, low-level-mixing-within-state, high-level-mixing-within-state, or high-level-mixing-within-and-outside-state); b) jurisdictional heterogeneity in care and demographics (homogeneous or heterogeneous); and c) intervention assumptions for 2019–2030 (baseline or EHE-plan: diagnose, treat, and prevent).
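The sketch below illustrates the core idea of a mixing matrix in a compartmental model: a stylised two-jurisdiction susceptible-infected system in which a row-stochastic matrix allocates each jurisdiction's partnerships across jurisdictions. All parameter values are invented, and the model is far simpler than the paper's 96 interacting sub-models.

```python
# Stylised two-jurisdiction SI model with a mixing matrix; all values are toy.
import numpy as np
from scipy.integrate import solve_ivp

beta = 0.3                      # transmission rate per partnership-month (toy)
mix = np.array([[0.9, 0.1],     # row j: share of jurisdiction j's partnerships
                [0.2, 0.8]])    # formed with each jurisdiction (rows sum to 1)

def rhs(t, y):
    S, I = y[:2], y[2:]
    N = S + I
    # Force of infection in jurisdiction j: beta * sum_k mix[j, k] * prevalence_k
    foi = beta * mix @ (I / N)
    new_inf = foi * S
    return np.concatenate([-new_inf, new_inf])

y0 = [9900.0, 99000.0, 100.0, 1000.0]   # S1, S2, I1, I2
sol = solve_ivp(rhs, (0, 120), y0, t_eval=np.linspace(0, 120, 121))  # 120 months
print("Infected in jurisdictions 1 and 2 after 10 years:",
      sol.y[2, -1].round(0), sol.y[3, -1].round(0))
```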

Results

The change in incidence under mixing compared with no-mixing scenarios varied by EHE and non-EHE jurisdictions and by aggregation level. When assuming jurisdictional heterogeneity and baseline interventions, the change in aggregated incidence ranged from − 2 to 0% for EHE and 5 to 21% for non-EHE jurisdictions, but within individual jurisdictions it ranged from − 31 to 46% for EHE and − 18 to 109% for non-EHE jurisdictions. Thus, incidence estimates were more sensitive to jurisdictional mixing at the jurisdictional level. As a result, jurisdiction-specific HIV-testing intervals inferred from the model to achieve the EHE plan were also sensitive; e.g., when no-mixing scenarios suggested testing every 1 year (or every 3 years), the three mixing levels suggested testing every 0.8 to 1.2 years, 0.6 to 1.5 years, and 0.6 to 1.5 years, respectively (or every 2.6 to 3.5 years, 2 to 4.8 years, and 2.2 to 4.1 years, respectively). Similar patterns were observed when assuming jurisdictional homogeneity; however, changes in incidence under mixing compared with no-mixing scenarios were large even for aggregated incidence.

Conclusions

Accounting for jurisdictional mixing and heterogeneity could improve model-based analyses.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01756-w

Performance of several types of beta-binomial models in comparison to standard approaches for meta-analyses with very few studies

Abstract

Background

Meta-analyses are used to summarise the results of several studies on a specific research question. Standard methods for meta-analyses, namely inverse variance random effects models, have unfavourable properties if only very few (2 – 4) studies are available. Therefore, alternative meta-analytic methods are needed. In the case of binary data, the “common-rho” beta-binomial model has shown good results in situations with sparse data or few studies. The major concern of this model is that it ignores the fact that each treatment arm is paired with a respective control arm from the same study. Thus, the randomisation to a study arm of a specific study is disrespected, which may lead to compromised estimates of the treatment effect. Therefore, we extended this model to a version that respects randomisation.

The aim of this simulation study was to compare the “common-rho” beta-binomial model and several other beta-binomial models with standard meta-analyses models, including generalised linear mixed models and several inverse variance random effects models.

Methods

We conducted a simulation study comparing beta-binomial models and various standard meta-analysis methods. The design of the simulation aimed to consider meta-analytic situations occurring in practice.
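To convey the flavour of the beta-binomial approach, the sketch below fits, by maximum likelihood, a beta-binomial model with a common overdispersion parameter rho across arms and a treatment effect on the logit scale; the event counts are invented, and the authors' exact "common-rho" specification and estimation details may differ.

```python
# Illustrative maximum-likelihood fit of a beta-binomial model with common rho;
# event counts below are invented, not data from the simulation study.
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln, expit

# Toy meta-analysis: 3 studies, control arms then treatment arms.
events = np.array([4, 7, 3, 2, 5, 1])
totals = np.array([50, 80, 40, 50, 80, 40])
treat  = np.array([0, 0, 0, 1, 1, 1])           # arm indicator

def betabinom_logpmf(k, n, mu, rho):
    # Mean/overdispersion parameterisation: a = mu(1-rho)/rho, b = (1-mu)(1-rho)/rho.
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    choose = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    return choose + betaln(k + a, n - k + b) - betaln(a, b)

def negloglik(params):
    intercept, log_or, logit_rho = params
    mu = expit(intercept + log_or * treat)       # arm-level event probability
    rho = expit(logit_rho)                       # shared overdispersion parameter
    return -np.sum(betabinom_logpmf(events, totals, mu, rho))

fit = minimize(negloglik, x0=[-2.0, 0.0, -2.0], method="Nelder-Mead")
print("Estimated log odds ratio for treatment:", round(fit.x[1], 3))
```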

Results

No method performed well in random-effects scenarios with only 2 studies. In this situation, a fixed-effect model or a qualitative summary of the study results may be preferable. In scenarios with 3 or 4 studies, most methods achieved the nominal coverage probability. The “common-rho” beta-binomial model showed the highest power under the alternative hypothesis. The beta-binomial model respecting randomisation did not improve performance.

Conclusion

The “common-rho” beta-binomial model appears to be a good option for meta-analyses of very few studies. As residual concerns about the consequences of disrespecting randomisation may still exist, we recommend a sensitivity analysis with a standard meta-analysis method that respects randomisation.

Machine learning-based techniques to improve lung transplantation outcomes and complications: a systematic review

Abstract

Background

Machine learning has been used to develop predictive models to support clinicians in making better and more reliable decisions. The high volume of collected data in the lung transplant process makes it possible to extract hidden patterns by applying machine learning methods. Our study aims to investigate the application of machine learning methods in lung transplantation.

Method

A systematic search was conducted in five electronic databases from January 2000 to June 2022. The titles, abstracts, and full texts of retrieved articles were screened following the PRISMA checklist, and eligible articles were selected according to the inclusion criteria. Information regarding the developed models was extracted from the reviewed articles using a data extraction sheet.

Results

Searches yielded 414 citations, of which 136 studies were excluded after title and abstract screening. Finally, 16 articles met our inclusion criteria and were deemed eligible. The objectives of the eligible articles fell into eight main categories. The applied machine learning methods included support vector machines (SVM) (n = 5, 31.25%), logistic regression (n = 4, 25%), random forests (RF) (n = 4, 25%), Bayesian networks (BN) (n = 3, 18.75%), linear regression (LR) (n = 3, 18.75%), decision trees (DT) (n = 3, 18.75%), neural networks (n = 3, 18.75%), Markov models (n = 1, 6.25%), KNN (n = 1, 6.25%), K-means (n = 1, 6.25%), gradient boosting trees (XGBoost) (n = 1, 6.25%), and convolutional neural networks (CNN) (n = 1, 6.25%). Most studies (n = 11) employed more than one machine learning technique, or a combination of techniques, to build their models. Data from pulmonary function tests were the input variables most often used in predictive model development. Most studies (n = 10) used only post-transplant patient information to develop their models, and UNOS was the most frequently used data source in the reviewed articles. In most cases, the developed machine learning models were used to predict the incidence of acute conditions after lung transplantation (n = 4) or to estimate survival (n = 4).

Conclusion

The outcomes of these prediction models could aid clinicians in making better and more reliable decisions by extracting new knowledge from the huge volume of lung transplantation data.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01823-2

Interpretable generalized neural additive models for mortality prediction of COVID-19 hospitalized patients in Hamadan, Iran

Abstract

Background

The high number of COVID-19 deaths is a serious threat to the world. Demographic and clinical biomarkers are significantly associated with the mortality risk of this disease. This study aimed to implement a Generalized Neural Additive Model (GNAM), an interpretable machine learning method, to predict COVID-19 mortality in patients.

Methods

This cohort study included 2181 COVID-19 patients admitted from February 2020 to July 2021 to the Sina and Besat hospitals in Hamadan, in the west of Iran. A total of 22 baseline features, including patients' demographic information and clinical biomarkers, were collected. Four strategies were used to deal with missing data: removing missing values, and mean, K-Nearest Neighbor (KNN), and Multivariate Imputation by Chained Equations (MICE) imputation. First, the important features for predicting the binary outcome (1: death, 0: recovery) were selected using the Random Forest (RF) method, and the synthetic minority over-sampling technique (SMOTE) was used to handle the imbalanced data. Next, considering the selected features, the predictive performance of the GNAM for predicting the mortality outcome was compared with logistic regression, RF, generalized additive model (GAM), gradient boosting decision tree (GBDT), and deep neural network (DNN) classifiers. Each model was trained on fifty different train-test splits to ensure stable estimates of model performance. The average accuracy, F1-score, and area under the curve (AUC) were used to compare the predictive performance of the models.
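A minimal sketch of the surrounding pipeline (random forest feature ranking, SMOTE oversampling of the training data only, and a simple model comparison) on simulated data is shown below; the GNAM itself is not implemented here, and all data and effect sizes are invented.

```python
# Simulated stand-in for the pre-processing and comparison pipeline; not the GNAM.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, f1_score
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)
n, p = 2000, 22                          # 22 candidate features, as in the study
X = rng.normal(size=(n, p))
logit_p = -2.2 + 0.8 * X[:, 0] - 0.6 * X[:, 1] + 0.5 * X[:, 2]   # few informative features
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))                  # imbalanced outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# 1) Rank features with a random forest and keep the 10 most important.
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
top10 = rf.feature_importances_.argsort()[::-1][:10]

# 2) Oversample the minority class in the training data only.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr[:, top10], y_tr)

# 3) Fit candidate models on the balanced data and compare on the held-out test set.
for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    model.fit(X_bal, y_bal)
    prob = model.predict_proba(X_te[:, top10])[:, 1]
    pred = model.predict(X_te[:, top10])
    print(name, "AUC:", round(roc_auc_score(y_te, prob), 3),
          "F1:", round(f1_score(y_te, pred), 3))
```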

Results

Of the 2181 COVID-19 patients, 624 died during hospitalization and 1557 recovered. The missing-data rate was 3% per patient. The mean age of patients who died (71.17 ± 14.44 years) was significantly higher than that of patients who recovered (58.25 ± 16.52 years). Based on the RF, the 10 features with the highest relative importance were selected as the most influential features: blood urea nitrogen (BUN), lymphocytes (Lym), age, blood sugar (BS), serum glutamic-oxaloacetic transaminase (SGOT), monocytes (Mono), blood creatinine (CR), neutrophils (NUT), alkaline phosphatase (ALP) and hematocrit (HCT). The predictive performance comparison showed that the GNAM, with a mean accuracy, F1-score, and AUC in the test dataset of 0.847, 0.691, and 0.774, respectively, had the best performance. The smooth function graphs learned by the GNAM were descending for Lym and ascending for the other important features.

Conclusions

The interpretable GNAM performed well in predicting the mortality of COVID-19 patients. The use of such a reliable model can therefore help physicians prioritize important demographic and clinical biomarkers by identifying the influential features and the direction of their predictive trends in disease progression.

Semiparametric modelling of diabetic retinopathy among people with type II diabetes mellitus

Abstract

Background

The proportion of patients with diabetic retinopathy (DR) has grown with the increasing number of people with diabetes mellitus worldwide. It is among the major causes of blindness. The main objective of this study was to identify contributing risk factors for DR among people with type II diabetes mellitus.

Method

A sample of 191 people with type II diabetes mellitus was selected from the diabetic unit of the Black Lion Specialized Hospital between 1 March 2018 and 1 April 2018. A multivariate stochastic regression imputation technique was applied to impute missing values. The response variable, DR, is a categorical variable with two outcomes. The exploratory analysis indicated that the odds of having DR were not necessarily linearly related to the continuous predictors in this sample of patients. Therefore, a semiparametric model was proposed to identify the risk factors for DR.
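As an illustration of a semiparametric (partially linear) logistic model, the sketch below uses pygam to combine linear terms for binary covariates with smooth terms for continuous ones on simulated data; the study's actual model includes gender interactions and other covariates not reproduced here.

```python
# Simulated partially linear logistic GAM; data, covariates, and effects are invented.
import numpy as np
from pygam import LogisticGAM, l, s

rng = np.random.default_rng(0)
n = 500
male = rng.integers(0, 2, n)
hypertension = rng.integers(0, 2, n)
age = rng.uniform(30, 80, n)
duration = rng.uniform(0, 25, n)

# Toy data-generating model with a nonlinear effect of diabetes duration.
logit_p = (-1.5 + 0.6 * male + 0.8 * hypertension
           + 0.02 * (age - 55) + 0.01 * (duration - 10) ** 2)
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = np.column_stack([male, hypertension, age, duration])
gam = LogisticGAM(l(0) + l(1) + s(2) + s(3)).fit(X, y)
gam.summary()   # effective degrees of freedom indicate how nonlinear each smooth is
```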

Result

Of the 191 people with type II diabetes mellitus in the sample, 98 (51.3%) had DR. The results of the semiparametric regression model revealed that being male, hypertension, insulin treatment, and frequency of clinical visits had significant linear relationships with the odds of having DR. In addition, the log-odds of having DR had significant nonlinear relationships with the interaction of age by gender (for female patients), duration of diabetes, the interaction of cholesterol level by gender (for female patients), haemoglobin A1c, and the interaction of haemoglobin A1c by fasting blood glucose, with degrees of freedom 3.2, 2.7, 3.6, 2.3 and 3.7, respectively. The interactions of age by gender and of cholesterol level by gender were not significant for male patients. The interaction of haemoglobin A1c (HbA1c) by fasting blood glucose (FBG) showed that the risk of DR was high when the levels of HbA1c and FBG were simultaneously high.

Conclusion

Clinical variables related to people with type II diabetes mellitus were strong predictive factors for DR. Hence, health professionals should be aware of the possible nonlinear effects of clinical variables, interactions among clinical variables, and interactions of clinical variables with sociodemographic variables on the log-odds of having DR. Furthermore, to improve intervention strategies, similar studies should be conducted across the country.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01794-4

Patient-reported outcome measures for physical function in cancer patients: content comparison of the EORTC CAT Core, EORTC QLQ-C30, SF-36, FACT-G, and PROMIS measures using the International Classification of Functioning, Disability and Health

Abstract

Background

Patient-reported physical function (PF) is a key endpoint in cancer clinical trials. Using complex statistical methods, common metrics have been developed to compare scores from different patient-reported outcome (PRO) measures, but such methods do not account for possible differences in questionnaire content. Therefore, the aim of our study was a content comparison of frequently used PRO measures for PF in cancer patients.

Methods

Relying on the framework of the International Classification of Functioning, Disability and Health (ICF), we categorized the item content of the physical domains of the following measures: EORTC CAT Core, EORTC QLQ-C30, SF-36, PROMIS Cancer Item Bank for Physical Function, PROMIS Short Form for Physical Function 20a, and the FACT-G. Item content was linked to ICF categories by two independent reviewers.

Results

The 118 items investigated were assigned to 3 components (‘d – Activities and Participation’, ‘b – Body Functions’, and ‘e – Environmental Factors’) and 11 first-level ICF categories. All PF items of the EORTC measures but one were assigned to the first-level ICF categories ‘d4 – Mobility’ and ‘d5 – Self-care’, all within the component ‘d – Activities and Participation’. The SF-36 additionally included item content related to ‘d9 – Community, social and civic life’ and the PROMIS Short Form for Physical Function 20a also included content related to ‘d6 – domestic life’. The PROMIS Cancer Item Bank (v1.1) covered, in addition, two first-level categories within the component ‘b – Body Functions’. The FACT-G Physical Well-being scale was found to be the most diverse scale with item content partly not covered by the ICF framework.

Discussion

Our results provide information about conceptual differences between common PRO measures for the assessment of PF in cancer patients. Our results complement quantitative information on psychometric characteristics of these measures and provide a better understanding of the possibilities of establishing common metrics.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-022-01826-z

Impact of sampling and data collection methods on maternity survey response: a randomised controlled trial of paper and push-to-web surveys and a concurrent social media survey

Abstract

Background

Novel survey methods are needed to tackle declining response rates. The 2020 National Maternity Survey included a randomised controlled trial (RCT) and social media survey to compare different combinations of sampling and data collection methods with respect to: response rate, respondent representativeness, prevalence estimates of maternity indicators and cost.

Methods

A two-armed parallel RCT and concurrent social media survey were conducted. Women in the RCT were sampled from ONS birth registrations and randomised to either a paper or push-to-web survey. Women in the social media survey self-selected through online adverts. The primary outcome was response rate in the paper and push-to-web surveys. In all surveys, respondent representativeness was assessed by comparing distributions of sociodemographic characteristics in respondents with those of the target population. External validity of prevalence estimates of maternity indicators was assessed by comparing weighted survey estimates with estimates from national routine data. Cost was also compared across surveys.

Results

The response rate was higher in the paper survey (n = 2,446) than in the push-to-web survey (n = 2,165) (30.6% versus 27.1%, difference = 3.5%, 95% CI = 2.1–4.9, p < 0.0001). Compared with the target population, respondents in all surveys were less likely to be aged < 25 years, of Black or minority ethnicity, born outside the UK, living in disadvantaged areas, living without a partner, or primiparous. Women in the social media survey (n = 1,316) were less representative of the target population than women in the paper and push-to-web surveys. For some maternity indicators, weighted survey estimates were close to estimates from routine data; for others there were discrepancies. No survey demonstrated consistently higher external validity than the other two. Compared with the paper survey, the cost saving per respondent was £5.45 for the push-to-web survey and £22.42 for the social media survey.
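As a back-of-the-envelope check, the sketch below approximates each arm's denominator from the reported respondent counts and response rates (the exact numbers sampled are not given in the abstract) and recomputes the difference in response rates with a normal-approximation confidence interval.

```python
# Rough reconstruction of the response-rate comparison; denominators are inferred,
# not taken from the paper.
import numpy as np
from scipy.stats import norm

resp = np.array([2446, 2165])            # paper and push-to-web respondents
rate = np.array([0.306, 0.271])          # reported response rates
n = np.round(resp / rate)                # approximate invitations per arm

p = resp / n
diff = p[0] - p[1]
se = np.sqrt(p[0] * (1 - p[0]) / n[0] + p[1] * (1 - p[1]) / n[1])
ci = diff + np.array([-1, 1]) * norm.ppf(0.975) * se
print(f"Difference {diff:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
# Yields roughly a 3.5 percentage-point difference with a CI near 2.1-4.9%,
# consistent with the figures quoted above.
```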

Conclusions

Push-to-web surveys may cost less than paper surveys but do not necessarily result in higher response rates. Social media surveys cost significantly less than paper and push-to-web surveys, but sample size may be limited by eligibility criteria and recruitment window and respondents may be less representative of the target population. However, reduced representativeness does not necessarily introduce more bias in weighted survey estimates.


https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-023-01833-8

Developing a Bayesian hierarchical model for a prospective individual patient data meta-analysis with continuous monitoring

Abstract

Background

Numerous clinical trials have been initiated to find effective treatments for COVID-19. These trials have often been initiated in regions where the pandemic has already peaked. Consequently, achieving full enrollment in a single trial might require additional COVID-19 surges in the same location over several years. This has inspired us to pool individual patient data (IPD) from ongoing, paused, prematurely-terminated, or completed randomized controlled trials (RCTs) in real-time, to find an effective treatment as quickly as possible in light of the pandemic crisis. However, pooling across trials introduces enormous uncertainties in study design (e.g., the number of RCTs and sample sizes might be unknown in advance). We sought to develop a versatile treatment efficacy assessment model that accounts for these uncertainties while allowing for continuous monitoring throughout the study using Bayesian monitoring techniques.

Methods

We provide a detailed look at the challenges and solutions encountered during model development, describing the process that used extensive simulations to enable us to finalize the analysis plan. This includes establishing prior distribution assumptions, assessing and improving model convergence under different study composition scenarios, and assessing whether we can extend the model to accommodate multi-site RCTs and evaluate heterogeneous treatment effects. In addition, we recognized that we would need to assess our model for goodness-of-fit, so we explored an approach that used posterior predictive checking. Lastly, given the urgency of the research in the context of an evolving pandemic, we were committed to frequent monitoring of the data to assess efficacy, and we set Bayesian monitoring rules calibrated for type I error rate and power.

Results

The primary outcome is an 11-point ordinal scale. We present the operating characteristics of the proposed cumulative proportional odds model for estimating treatment effectiveness. The model can estimate the treatment’s effect under enormous uncertainties in study design. We investigate to what degree the proportional odds assumption has to be violated to render the model inaccurate. We demonstrate the flexibility of a Bayesian monitoring approach by performing frequent interim analyses without increasing the probability of erroneous conclusions.
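As a simple frequentist stand-in for the cumulative proportional odds model, the sketch below fits an ordinal logistic model to a simulated 11-point outcome with statsmodels; the study's actual analysis is Bayesian, pools individual patient data across trials, and applies interim monitoring rules that are not reproduced here.

```python
# Frequentist proportional odds fit on simulated data; not the paper's Bayesian model.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)
n = 400
treat = rng.integers(0, 2, n)
# Simulate an 11-point ordinal scale (0-10) with a modest treatment shift
# on an underlying logistic latent variable.
latent = 0.4 * treat + rng.logistic(size=n)
y = pd.Series(pd.Categorical(pd.cut(latent, bins=11, labels=False), ordered=True))

model = OrderedModel(y, treat.astype(float).reshape(-1, 1), distr="logit")
res = model.fit(method="bfgs", disp=False)
print(res.summary())   # the first coefficient is the common treatment log-odds ratio
```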

Conclusion

This paper describes a translatable framework using simulation to support the design of prospective IPD meta-analyses.
