The subject of heterogeneity (mixed-up-ness) in systematic reviews is tricky. A bit like ‘significance’, you can think about it as both a clinical and a statistical concept, and in the same way, you can get results that aren’t always concordant.
Many old lags will remember a blog post about a statistically significant association between platelets and renal involvement in HSP. In that case, there was a statistical association that was unlikely to be due to chance, but it was clinically irrelevant.
The same questions need to be asked of heterogeneity between the studies in a review.
Ask first: are the studies so clinically different that any attempt to add them up is daft, or are they similar enough to try to combine? Then, if combining makes sense, look at whether the results appear similar or different, and perhaps have a look at the statistical measures of heterogeneity (which in this setting means “more different than you’d expect by chance”).
But a quick “hold on” before the answer arrives too simply.
What does ‘clinically too mixed up’ mean? Remember, the differences between the studies – for them to be too heterogeneous – should mean that we expect the treatment* to have a genuinely different effect in the different study groups.
Then, if you can explain why they should differ, and why you’d expect to need to do something different with those groups of patients, you can quite reasonably say that any sort of lumping is the wrong thing to do and ignore any meta-analytic results that emerge.
* Clearly, it’s not always treatment, but give a writer a break…