Top 10 Research Articles of 2024, #1 – 5

We have seen some truly excellent articles published in BMJ Quality and Safety in 2024. The articles discussed in this blog represent the best of these, and were selected based on engagement metrics and scores assigned by the editorial board. Further information about the selection process can be found in a past blog. This post discusses the articles ranked first to fifth; those ranked sixth to tenth were discussed in a previous blog.

We would like to express our thanks to the authors of these excellent papers and to all of the BMJ Quality and Safety team who helped during the selection process – this was not an easy task!

5 – Care Under Pressure 2: a realist synthesis of causes and interventions to mitigate psychological ill health in nurses, midwives and paramedics. Accompanying editorial available here.

“Hard hats and protective equipment are mandatory on building sites” – this quote from the paper’s discussion perfectly illustrates the contrasting approaches organisations take to mitigating occupational risks of physical ill health, compared with psychological ill health. We have known about the risks of psychological ill health in healthcare workers for decades, and the Covid-19 pandemic brought them into laser-sharp focus. Despite this, the problem remains endemic. The personal and organisational consequences of psychological ill health are substantial. For example, the 2024 National Health Service (NHS) Staff Survey in England suggested that over 40% of staff had felt unwell as a result of work-related stress in the previous 12 months. Psychological ill health leads to presenteeism, absenteeism, and loss of healthcare staff, and is estimated to cost the NHS over £12 billion per year. This review posed two broad questions: how, why and in what contexts do nurses, midwives, and paramedics experience work-related psychological ill health, and why are existing interventions insufficient to mitigate it?

The review was informed by the findings of “Care Under Pressure 1”, a realist review of interventions to address mental ill health in doctors and medical students. It was also informed by key reports identified through expert solicitation, and by relevant literature. The authors highlighted the negative consequences of blame cultures, which prevent staff from speaking up and taking responsibility for mistakes, with significant implications for patient safety and for the psychological wellbeing of staff. By contrast, where workplace cultures are psychologically safe, staff feel empowered to speak up and learn from mistakes. The review also highlights the impact of the ‘serve and sacrifice’ culture in healthcare, where the needs of the system are prioritised at the expense of healthcare workers’ needs. This concept is all too familiar and expresses itself in various ways: pressure to take on extra-contractual work, the normalisation of high workloads and insufficient rest, and the moral distress of delivering sub-standard care because of organisational pressures and staff shortages. The authors found that interventions frequently focused on individuals and individual behaviours, with limited recognition of wider systemic factors such as the work environment. This may reflect the comparative difficulty of initiating and sustaining organisational change in the resource-constrained environment in which healthcare is delivered, but it does not diminish the importance of such change. Finally, the review highlights the challenge of designing complex interventions to reduce psychological ill health that meet the dynamic and diverse needs of workers. Fundamentally, psychological ill health is an occupational risk for healthcare workers. Despite the challenges, more must be done to ensure that working environments prevent psychological ill health and support those who are experiencing it.

4 – Artificial intelligence-powered chatbots in search engines: a cross-sectional study on the quality and risks of drug information for patients. Accompanying editorial available here.

Patients frequently leave medical consultations with unanswered questions and conduct their own further research. The information they gather is likely derived from diverse sources including their support networks, relevant literature, and internet search engines. With the recent proliferation of Large Language Model (LLM) powered chatbots in search engines, it is inevitable that patients will use them to seek health information. However, they are not without risk – they are known to ‘hallucinate’ and to provide false information. This study sought to investigate the potential safety risk of patients acquiring drug information from such chatbots.

The authors posed 10 questions about each of 50 commonly prescribed drugs to Bing Copilot. The chatbot’s answers were evaluated for readability, completeness and accuracy, and for the extent to which they were congruent with scientific consensus or could cause patient harm. Most answers required readers to have a reasonably high level of education to understand the text, with one question assessed as requiring college graduate level education. Completeness and accuracy were assessed by comparing the content with answers derived from drugs.com, a peer-reviewed and up-to-date source of information. Most answers were complete, but there was variability between questions: less specific questions, such as ‘what do I have to consider when taking the drug?’, were much less likely to be answered completely. Accuracy was also high for most questions, but not all. For example, the mean accuracy of answers to a question about whether the drug can be used in renal failure was only around 70%. The authors then selected a subset of 20 chatbot answers with low accuracy, low completeness, or the potential to pose a risk to patient safety. A group of medication safety experts rated these answers for their alignment with scientific consensus and their potential to cause patient harm. Amongst these answers, 39% opposed scientific consensus, and 22% were considered to have the potential to lead to severe harm or death if the advice were followed.

This study clearly shows the potential dangers of patients using LLM chatbots to acquire medical information, and it is imperative that patients are made aware of such risks. In day-to-day life, patients are likely to use chatbots conversationally, asking follow-up questions rather than posing single queries. Future research could explore whether such interactions increase or mitigate these risks, and how the risks relate to health literacy. While some safety risks posed by artificial intelligence may be mitigated through robust legal and regulatory frameworks, general-purpose LLM chatbots pose unique regulatory challenges. Although they are not specifically designed for medical use, many do provide medical information, and the frameworks needed to regulate such technology are not yet clear.

3 – Locum doctor working and quality and safety: a qualitative study in English primary and secondary care Accompanying editorial available here.

Locum doctors are a transitory workforce that can be used to fill staffing gaps and bolster capacity at times of increased demand. High turnover in the medical workforce has increased organisational reliance on temporary staff in many countries. Several concerns have been raised about increasing dependence upon the locum workforce, including higher costs and potential implications for patient safety and quality. This qualitative study aimed to explore the relationships between locum working arrangements and quality and safety in the English NHS.

The authors conducted 130 semi-structured interviews and focus groups with locum doctors, their colleagues and patients across primary and secondary care, locum agencies, and statutory NHS bodies. The findings highlight how, given the transitory and often short-term nature of locum work, unfamiliarity with the work environment can have negative consequences. Locum doctors may find it harder to locate and use critical equipment, and may know less about support structures within the organisation. They are also less likely to be integrated within the clinical team and may not be involved in team development exercises. Given the central role of multidisciplinary teams in the delivery of healthcare, this can pose challenges. Locum doctors also report negative attitudes and behaviours directed towards them by other staff, and as a result some locums feel stigmatised, marginalised, and excluded. There was some indication that clinical decision-making may differ systematically amongst locum doctors, and that the negative attitudes directed towards them may contribute to this: locums perceive that they are likely to be scapegoated if things go wrong, and therefore practise more defensive medicine. Finally, governance arrangements are less robust for locums than for permanent employees, which makes it more difficult to ensure competency prior to employment.

This study highlights some of the potential quality and safety issues surrounding locum doctor working arrangements. As outlined in the accompanying editorial, this study clearly articulates the current situation but questions remain about how these challenges can be overcome. Based on the themes identified in this study there are several potential solutions. It is important to note that there is a clear need for the locum workforce and it is here to stay. However, conscious effort should be made to optimise the balance between substantive and temporary staff. Efforts also need to be made to ensure that locums are integrated within clinical teams and do not face discrimination, and that governance arrangements are improved.

2 – The good, the bad and the ugly: What do we really do when we identify the best and the worst organisations? 

Measuring organisational performance accurately is both complex and consequential. Organisations deemed to be poor performers may be subject to scrutiny and high-profile inquiries into their failings, while high performing organisations may receive financial incentives and be used as exemplars of good practice. Given these consequences, it is imperative that organisations deemed to be poor performers are actually poor performers.

This study explores the importance of reliability when measuring organisational performance. In this context, reliability refers to how consistently a measurement reflects true performance rather than noise; higher reliability means that observed scores more closely reflect true performance. The authors assessed two commonly used methods for identifying the best and worst performing organisations (standard z-scores and overdispersed z-scores), using simulation to introduce differing amounts of noise into the data. They found that when reliability was very low, organisations were flagged as the best and worst performers almost at random, irrespective of the method used. With standard z-score methods, organisations identified as the best and worst performers were in fact average performers, even when reliability was high. Overdispersed z-score methods, which are considered the gold standard, were able to identify the truly best and worst organisations when reliability was high, but misclassification was common when reliability was low. Based on their results, the authors suggest a minimum reliability of 0.7 to reduce the risk of misclassification.
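To make the link between reliability and misclassification more concrete, here is a minimal Python sketch, assuming NumPy is available. It is not the authors' simulation: the parameters (200 organisations, the top 10 flagged, normally distributed true performance and noise) are hypothetical, and reliability is taken to be the share of variance in observed scores that reflects true performance. Every organisation has the same noise level, so the standard z-score only rescales the scores, and the sketch makes no attempt to reproduce the overdispersed z-score method. It simply estimates how often the organisations flagged as the "best" are truly among the best.

```python
import numpy as np

def simulate_flagging(reliability, n_orgs=200, n_flag=10, n_sims=500, seed=0):
    """Average share of the n_flag organisations with the highest observed
    z-scores that are truly among the n_flag best (hypothetical set-up)."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    rng = np.random.default_rng(seed)
    # Reliability = true variance / (true variance + noise variance).
    # With true variance fixed at 1, noise variance = (1 - r) / r.
    noise_sd = np.sqrt((1 - reliability) / reliability)
    overlap = 0.0
    for _ in range(n_sims):
        true = rng.normal(0.0, 1.0, n_orgs)
        observed = true + rng.normal(0.0, noise_sd, n_orgs)
        # Standard z-score; with equal noise for all organisations this
        # rescales the scores but does not change their ranking.
        z = (observed - observed.mean()) / observed.std()
        flagged = set(np.argsort(z)[-n_flag:])
        truly_best = set(np.argsort(true)[-n_flag:])
        overlap += len(flagged & truly_best) / n_flag
    return overlap / n_sims

for r in (0.3, 0.5, 0.7, 0.9):
    print(f"reliability {r}: {simulate_flagging(r):.2f} of flagged 'best' are truly in the top 10")
```

Under these assumptions the overlap should shrink as reliability falls, echoing the study's point that flagging drifts towards chance when reliability is low; with perfect reliability (1.0) the flagged and truly best sets coincide.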

This study provides compelling evidence that reliability should be considered when selecting measures to be used as performance indicators. Reliability is inevitably more difficult to measure with real-world data, but given these findings, a conscious effort to do so must be made if we are to identify the best and worst performing organisations accurately. These results also highlight the need to interpret such measures cautiously and in the context of other factors.

1 – Implementation of an enhanced recovery after surgery protocol for colorectal cancer in a regional hospital network supported by audit and feedback: a stepped wedge, cluster randomised trial. Accompanying editorial available here

Enhanced recovery after surgery (ERAS) protocols are holistic care pathways that standardise perioperative care and promote early recovery. The authors note that these protocols challenge traditional surgical doctrine, which has likely contributed to their lack of universal adoption. This study aimed to evaluate the effectiveness of an audit and feedback intervention to support implementation of an ERAS protocol for colorectal cancer in the Piemonte region of Italy.

The intervention involved the formation of a local ERAS team to support implementation, and feedback to the team on implementation progress. Twenty-nine of the thirty-six general surgery departments in the region participated. The authors used a stepped wedge cluster randomised design in which all departments started with a 3-month baseline period, after which the intervention was rolled out across departments every 3 months until it had been implemented in all of them. The intervention was associated with a 13% increase in compliance with the ERAS protocol and a 0.6 day decrease in length of hospital stay. Postoperative complications were broadly similar in the intervention and control groups, and there was no change in 30-day hospital readmission.
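The stepped wedge roll-out described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the trial's actual randomisation: the department labels and the number of steps are assumptions, and each period stands for one 3-month interval. Period 0 is the shared baseline; at each later period another randomly ordered group of departments crosses from control (0) to intervention (1) until all have crossed.

```python
import random

def stepped_wedge_schedule(clusters, n_steps, seed=None):
    """Return {cluster: [0 or 1 for each period]} for a stepped wedge design.

    Period 0 is a baseline in which every cluster is under control conditions;
    at periods 1..n_steps successive groups of clusters cross to the intervention.
    """
    if not 1 <= n_steps <= len(clusters):
        raise ValueError("n_steps must be between 1 and the number of clusters")
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # randomise the order in which clusters cross over
    schedule = {}
    for i, cluster in enumerate(order):
        # Spread clusters as evenly as possible across steps 1..n_steps.
        step = 1 + i * n_steps // len(order)
        schedule[cluster] = [1 if period >= step else 0 for period in range(n_steps + 1)]
    return schedule

# Hypothetical example: 29 departments and 4 steps (assumed, not from the trial).
departments = [f"dept{n:02d}" for n in range(1, 30)]
for dept, row in sorted(stepped_wedge_schedule(departments, n_steps=4, seed=1).items()):
    print(dept, row)
```

Randomising the crossover order is what makes the design a randomised trial; spreading departments evenly across steps is one common choice, but it is an assumption of this sketch.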

This study used a robust yet pragmatic methodology to evaluate the impact of a relatively simple and low-resource intervention that could be used in other settings. The authors showed that audit and feedback can modestly increase compliance with ERAS protocols and influence outcomes such as length of hospital stay. The results also underscore the challenge of initiating cultural change in clinical practice: even with concerted efforts to improve compliance, approximately one third of clinical management remained non-compliant with the ERAS protocol. However, such change is worth working for, particularly when recommendations are grounded in high quality evidence.
