### Epistemology of Causal Inference in Pharmacology. Towards a Framework for the Assessment of Harms


Jürgen Landes, Barbara Osimani and Roland Poellinger

Published in EJPS (2017): http://doi.org/10.1007/s13194-017-0169-1

Recorded presentation of the framework on LMUcast (R. Poellinger)

Abstract:

Philosophical discussions on causal inference in medicine are stuck in dyadic camps, each defending one kind of evidence or method rather than another as best support for causal hypotheses. Whereas Evidence Based Medicine advocates invoke the use of Randomised Controlled Trials and systematic reviews of RCTs as the gold standard, philosophers of science emphasise the importance of mechanisms and their distinctive informational contribution to causal inference and assessment. Some have suggested the adoption of a pluralistic approach to causal inference, and an inductive rather than hypothetico-deductive inferential paradigm. However, these proposals deliver no clear guidelines about how such a plurality of evidence sources should jointly justify hypotheses of causal associations. In this paper, we develop the pluralistic approach along Hill’s (1965) famous criteria for discerning causal associations by employing Bovens’ and Hartmann’s general Bayes net reconstruction of scientific inference to model the assessment of harms in an evidence-amalgamation framework.

The Bayesian network we propose here allows the amalgamation of various pieces of evidence from heterogeneous sources and methods and provides an overall estimate of the causal hypothesis. In particular, our approach

1. identifies possible indicators of causality on the basis of the methodological and philosophical literature on causality, evidence, and causal inference;
2. embeds them in a topological framework of probabilistic dependencies and independencies grounded in assumptions regarding their reciprocal epistemic interconnections;
3. weakly orders some of these probabilistic dependencies as a function of their inferential strength with respect to the confirmation of causal hypotheses.

For this, we adopt the Bovens and Hartmann proposal, which uses Bayesian confirmation theory to account for (and mathematically explain) several phenomena related to scientific inference, such as the confirmatory power of the coherence of the body of evidence, the epistemic interaction of consistency of measurements and reliability of information sources, as well as the modular contribution of different “lines of evidence” related to diverse observable consequences of the investigated hypothesis. We adapt this framework to situations of causal inference and specify a concrete structure for that purpose. We then illustrate its epistemic and heuristic virtues as an instrument for evidence amalgamation in the context of causal inference of drug-induced harm.

Our approach thereby satisfies the desiderata listed at the end of the Section on Decision Making (see below): probabilistic hypothesis confirmation, incorporation of heterogeneous kinds of data, facilitation of diverse types of inferential patterns (more on this on future work below) with a particular focus on causal assessment in pharmacology.

Summary

Decision Making in Pharmacology

The decision whether or not to withdraw a drug depends on some threshold which reflects the nature of the medication, the pharmaceutical environment (i.e., the availability of alternative treatments for the same condition), policy and ethical dimensions, as well as the perceived acceptability of the risk.

In order to avoid commitment (for the moment) to any of the many notions of causation offered by the philosophical-methodological literature, we use the formula D©H to express the proposition “D causes H”, sometimes abbreviated as © when no ambiguity arises. Hence, by adopting the classical cost-effectiveness analysis formula, we can infer the probability threshold for causality, p*, at which the expected utility of withdrawing the drug equals the expected utility of keeping it on the market.

Let w stand for the act of withdrawing the drug D from the market while ¬w stands for not withdrawing D. The utility of (not) withdrawing given that © or ¬© holds is denoted by the two-place utility function U. At the probability threshold p*, the expected utility for withdrawing the drug equals the expected utility for not withdrawing it. p* can hence be obtained by solving

p* · U(w, ©) + (1 − p*) · U(w, ¬©) = p* · U(¬w, ©) + (1 − p*) · U(¬w, ¬©)

for p*. Therefore, we can find p* to be determined by the utilities:

p* = (U(¬w, ¬©) − U(w, ¬©)) / ((U(w, ©) − U(¬w, ©)) + (U(¬w, ¬©) − U(w, ¬©)))

Figure: p* partitions degrees of belief in © into two intervals.

There is a fact of the matter: either © is true or the opposite holds. So, in order to make the best decision, it is necessary and sufficient to adopt degrees of belief which fall in the interval between p* and the truth value of © — where a truth value of 1 stands for ‘true’ and a truth value of 0 stands for ‘false’. Therefore, p* allows for a certain margin of error; however, the chance of falling into the right interval increases with the amount of evidence one takes into account: the more one “samples” from reality, the closer one’s beliefs get to the truth (Edwards 1963). This is the standard justification of inductive inference (Carnap 1947; Howson 2006); we do not enter into the related philosophical debate here. However, nothing hinges on the particular philosophical position about truth.
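The threshold p* can be computed directly from the four utilities described above. A minimal sketch in Python, with purely illustrative utility values (not taken from the paper):

```python
def causal_threshold(u_w_c, u_w_nc, u_nw_c, u_nw_nc):
    """p* at which the expected utility of withdrawing the drug (w)
    equals that of keeping it on the market (not-w), given the four
    utilities U(w, C), U(w, not-C), U(not-w, C), U(not-w, not-C)."""
    return (u_nw_nc - u_w_nc) / ((u_w_c - u_nw_c) + (u_nw_nc - u_w_nc))

# Illustrative utilities: withdrawing a genuinely harmful drug is good,
# withdrawing a harmless one wastes a treatment, keeping a harmful one
# on the market is very bad, keeping a harmless one is mildly good.
p_star = causal_threshold(u_w_c=10, u_w_nc=-5, u_nw_c=-20, u_nw_nc=5)

# Any degree of belief in © above p_star favours withdrawal; below it,
# keeping the drug has higher expected utility.
```

With these numbers p* = 0.25: because keeping a harmful drug is heavily penalised relative to the cost of a needless withdrawal, even a moderate degree of belief in the causal hypothesis tips the decision towards withdrawal.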

A straightforward consequence of this state of affairs is that there is a need of instruments which allow a probabilistic assessment of the suspected causal link between drug and side-effect, by taking into account all available evidence at the time of decision. In particular, four desiderata are essential for a framework of causal assessment of drug induced harm:

1. It must allow for probabilistic hypothesis confirmation.
2. It must be able to incorporate heterogeneous kinds of data.
3. It must be able to integrate diverse types of inferential patterns, in order to optimise the epistemic import of available evidence.
4. The framework should be particularly focused on causal assessment in pharmacology and therefore consider the specific issues which arise in this context.

Philosophical underpinnings of Hill’s Criteria

In the following, we discuss the rationales which epistemologically underpin Hill’s viewpoints on causality by appealing to the philosophical literature, and derive our list of indicators of causality. While there may be further indicators of causality (in pharmacology), we think that those presented below are the most pertinent ones discussed in the philosophical literature. Figure XXX presents a mapping from Hill’s list onto our framework which will be discussed in detail below. Not every viewpoint is mapped onto an indicator of causality in our sense — e.g., we locate analogy among the set of inferential patterns.

Neither coherence of evidence nor the subsequent viewpoints listed by Hill (“experiment” and “analogy”) are strictly speaking indicators of the presence of causal relationships themselves. Rather, they refer to the inferential/methodological process itself, and they appeal to particular methods (experimental), or kinds of reasoning (“analogy”), or theoretical/epistemological virtues (“coherence”) which may be adopted to “optimise causal inference”.

Strength of association

Strength of association between a putative cause and an effect results from diverse (statistical) phenomena. Hence, we relate Hill’s strength of association to the following phenomena:

• Probabilistic Dependence: causality has been interpreted by some philosophers in probabilistic terms. In such a framework causes change the probability of their effect. According to (Reichenbach 1971), probabilistic dependencies can only be explained by the presence of some causal connection.
• Rate of Growth: the strength of the association, as measured for instance by the (regression) coefficient, indicates how much changes in the putative cause variable (treatment or exposure) bring about changes in the effect variable. In this respect, strength refers to the functional relationship itself, independently of whether it holds universally or only in some subgroup (see below the notion of stability). We refer to a strong association of this kind as “high rate of growth” or simply “rate of growth”.
• Dose-Response relationship: the presence of a dose-response relationship between a cause x and an effect y, dy/dx not equal to zero (for most x in the domain), suggests that there is a systematic relationship between the putative cause and the effect. The relationship can be linear or nonlinear (in which case it can be monotonic or non-monotonic).

A large difference in effect size may result from the following two facts: 1. The amount of change in the effect per unit produced by the cause is large (high rate of growth). 2. The causal relationship holds under various background conditions (i.e., in various subgroups). This phenomenon has also been termed “stability” (Woodward, Pearl).

Dose-response relationship and (high) rate of growth are epistemically related not only because the latter conceptually subsumes the former, but also because the higher the rate of growth, the more likely it is that both will be detected in observational or experimental studies. However, it is important not to conflate them, because they point to distinct properties of the causal relationship.

Consistency

Consistency relates to replication of results in studies with identical methods, or in (systematically) varied study settings.

Systematic variation of study design and setting serves to provide evidence that the result is not an artifact of the particular circumstances in which a given study has been carried out, or of the particular method, or theoretical model adopted, and related assumptions (e.g., sensitivity threshold). Hence, the role of consistency should be distinguished along the following lines:

1. Replication of (ideally) identical studies (same “background conditions” — same inclusion and exclusion criteria, mode of administration/exposure, etc. — and same design, e.g., cohort study, RCT, etc.): this is a means to increase accuracy of measurement.
2. Replication of the observation through different methods, but analogous background conditions: this tests the results against the suspicion of being created by the specific study design/setting (“study artifacts”) and guarantees methodological robustness.
3. Replication of the observation through similar methods, but in different background conditions: this tests the stability of the causal link itself in different populations/circumstances and shows the extent to which it is “ontologically robust” (Wimsatt 2012; OSC 2015; Meehl 1990; Woodward 2006).

Consequently, from a confirmatory point of view, consistency across studies may mean very different things depending on whether such studies share the same design, the same kind of population, or both; see Table XXX.

Specificity: quantitative and qualitative versions

Specificity also refers to diverse phenomena. The traditional concept of specificity (following Hill) approximates the classic conception of a cause as a necessary and sufficient condition for its effect to occur, in contrast to the possibility of it being produced by other candidate causes. It is a sort of bi-conditional relationship where knowledge of the cause event (in a given class of events C), ideally predicts the effect event (in another class of events E), and vice versa.

Specificity as a property of causality has been discussed by (Lewis 2000; Waters 2007; Woodward 2010) as the sort of functional relationship which systematically holds between the values of one variable and the values of another variable, such that changing the value of the former in specific ways also changes the values of the latter in specific ways, ideally in a bijective fashion; see for instance (Lewis 2000). It is very unlikely that such bi-conditional relationships occur by chance alone; therefore, specificity may be considered an indicator of a causal association being present.

Another, related notion of specificity refers to the “geometrical complementarity” on which many biological phenomena are based, such as the key-hole relationship between antigens and antibodies, or between target receptors and drug molecules, also known as “affinity” (see Weber 2006).

Indeed, affinity is a function of stereometric properties (structural and biological similarity); see for instance (Xie 2009). This kind of specificity pertains to mechanistic reasoning, hence it should be considered insofar as it is used to glean causal evidence from knowledge about mechanisms (see below, biological plausibility: evidence of mechanisms).

Specificity and stability are independent concepts: a causal relationship may be at the same time highly specific and unstable (e.g., multi-factorial genetic diseases), or it may be highly stable but show a low degree of specificity (e.g., intestinal inflammation caused by various kinds of antibiotics).

Neither stability nor specificity will be used in our framework to distinguish different types of causes, but rather as signs of the presence of a causal connection.

In our framework, specificity is encoded as difference-making, since those very studies aiming at detecting the causal efficacy of a drug under investigation also yield information about the variance of the effect under different tests. Observing a bijective difference-making relationship between a class of causes and a class of effects is taken to raise the probability that the association between the cause–effect pairs is indeed causal. Evidence about stability may be gleaned through studies showing the same cause–effect association in different sample populations.

Temporal order, distance, and duration

In his question “which is the cart and which the horse?”, Hill addresses the aspect of temporal precedence, seen as one of the most important markers of causality. The importance of this criterion is mirrored by the fact that many theorists of causality consider it a necessary prerequisite, postulate alignment of the causal and the temporal direction, or even explicitly incorporate it into their formal frameworks. Precedence in time is considered so essential to causes that Russell bases his denial of their existence on the temporal symmetry of laws in physics (Russell 1912).

(Suppes 1970) goes beyond Reichenbach’s common cause principle in explicitly building on the direction of time in his probabilistic definition of causality: an event genuinely causes a subsequent second event if it is identified as a “prima facie cause” — i.e., it precedes the effect and raises its probability — and guaranteed not to be a side effect, i.e., an instance of “spurious causation”. Hence, although temporal precedence is a necessary condition for causality, it is not sufficient for it because of the possibility of confounding (reverse causation, residual confounding, confounding by indication).

In our framework, temporality is expressed straightforwardly by a temporality variable to sum up the above distinctive criteria.

Related to specificity as influence is also the dose-response relationship, where a clear pattern of quantitative dependency manifests. Quite in line with Lewis’ idea of fine-grained relevance as a characteristic feature of causation (Lewis2000), Hill sees it as a strong indicator of a causal relation if an investigation reveals a biological gradient, in that a clear dose-response curve admits a simple explanation.

Drawing causal inferences from functional or statistical relations alone is a hard task and in many cases not feasible. If a functional description (like a structural equation) or a statistical connection (like a high measure of covariance) is available though (and has proven stable), it can be used for intervention and prediction — two hallmarks of causal knowledge. Although David Freedman criticises the Spirtes-Glymour-Scheines approach (Spirtes et al. 2000) towards automatically inferring causal claims from raw data, he points precisely to the practical use of formal dose-response relations when he writes that “[t]hree possible uses for regression equations are (i) to summarise data, or (ii) to predict values of the dependent variable, or (iii) to predict the results of interventions” (Freedman 1997, p. 62).

In our framework, the existence of a dose-response model is an important indicator of causality especially because the detection of a clear dose-response curve (e.g., a linear relation between treatment or exposure and observed effect values with small error terms) is best explained by positing a truly ontological influence structure along Reichenbach’s principle: The variables are either directly causally related or correlated due to the presence of a common cause. Further research (e.g., RCTs) and the inclusion of other causal indicators (e.g., temporal order) will be useful in disambiguating explanatory candidates.
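To make the biological-gradient idea concrete, one can fit a simple linear model to exposure/response data and inspect the slope; in the linear case, a slope clearly bounded away from zero is exactly the dy/dx ≠ 0 condition mentioned above. A minimal sketch with hypothetical data (not from any study cited in the paper):

```python
def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Hypothetical exposure/response data showing a clear linear gradient.
doses = [0, 1, 2, 3, 4, 5]
responses = [0.1, 1.9, 4.2, 5.8, 8.1, 9.9]
intercept, slope = fit_linear(doses, responses)

# A slope clearly bounded away from zero, with small residuals, is what
# the detection of a "clear dose-response curve" amounts to here.
```

As the paragraph notes, such a fit by itself only licenses the disjunction “direct cause or common cause”; disambiguating requires further indicators.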

Inferential Patterns

Whereas Hill does not make an explicit epistemic distinction among his guidelines, we demarcate causal indicators (points 1 to 5 above) from inferential patterns: plausibility of biological mechanisms, coherent evidence, support by experiment, and analogy.

Plausibility of the biological mechanism

The role of evidence about possible/plausible/actual mechanisms (Craver 2007; Darden 2006; Machamer 2000) linking the putative cause to its phenotypic effect is strongly debated in philosophy, especially in relation to evidence standards. Philosophers closer to the Evidence Based Medicine approach, even while granting some value to knowledge about mechanisms, still doubt that it can complement statistical black-box evidence because of the limited and fragmentary knowledge of the “causal web” in which mechanisms are embedded (see for instance Howick 2011). Other philosophers instead generally recognise that knowledge about mechanisms plays a plurality of roles, both in combination with statistical information and in a stand-alone fashion.

In the epidemiological literature, especially in narrative reviews, it is typical to combine evidence about mechanisms with statistical evidence at the population level, as an argumentative move in favour of causal hypotheses which are only suggested by statistical associations. Evidence about mechanisms is meant to provide support for the “physical” connection that would obtain if the observed statistical association were due to a causal association. In particular, concerning risk assessment, the role of evidence about mechanisms of chemical substances has recently been analysed by (Lujan 2015). Two issues are particularly relevant for causal inference: 1) the questioned applicability of animal data to humans; 2) the lack of guarantee that similarity of modes of action warrants extrapolation of phenotypic effects from one chemical to another. Indeed, both issues relate to the problem of extrapolation: the former regards whether a given chemical will produce the same effect in the study and in the target population; the latter refers to whether similar chemicals produce similar effects (on a given population). We address the problem of extrapolation by considering the following: I. In the case of risk assessment the focus is on false negatives, that is, on discovery and risk detection; hence any signal should be accounted for as a possible sign unveiling latent risks — if it happened here, it can also happen elsewhere. II. Warrant for extrapolation is also taken to come in degrees and is therefore incorporated in a probabilistic approach. This lets the degree of confidence in such warrant guide the decision at hand in combination with other available evidence and with the degree of expected harmfulness.

The plausibility part of the “plausibility of mechanisms” indicator refers to the general fit of the hypothesised mechanisms to available background knowledge and this leads us directly to the subsequent “viewpoint” on causality (as Hill would call it), namely: “coherence of evidence”.

Coherent evidence

Coherence is a property of the body of evidence, rather than of the phenomenon under investigation (here, the causal link between drug and side effect).

In the epidemiological literature, consistency and coherence are not explicitly distinguished, and evaluation of the latter is left to informal/implicit judgment in narrative reviews. The philosophical literature, by contrast, has investigated coherence in several respects: 1) general epistemology offers “coherentism” as a response to skepticism and as an alternative to “foundationalism”: according to this view, beliefs are justified by their fitting together in a system and standing in a relation of mutual support (BonJour 2010), like the stones of an arch (simul stabunt, simul cadent); 2) formal epistemology has investigated the confirmatory value of coherence of beliefs, also by developing various measures of coherence, in the attempt both to formalise its content and to track its truth-conduciveness; see for example (Bovens 2003; Crupi 2014; Dietrich 2005; Fitelson 2003; Moretti 2007).
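Several measures in this literature are simple probabilistic ratios. As one illustration (our choice of example, not a measure singled out in the paper), Shogenji’s ratio measure compares the joint probability of two propositions with the product of their marginals:

```python
def shogenji(p_joint, p_a, p_b):
    """Shogenji's ratio measure of coherence for two propositions:
    values above 1 indicate that the propositions support one another;
    exactly 1 corresponds to probabilistic independence."""
    return p_joint / (p_a * p_b)

# Two evidence reports that co-occur more often than independence
# would predict cohere:
coherent = shogenji(p_joint=0.25, p_a=0.4, p_b=0.4)     # > 1
independent = shogenji(p_joint=0.16, p_a=0.4, p_b=0.4)  # exactly 1
```

Whether such a measure is truth-conducive is precisely what the cited debate is about; the sketch only shows the kind of formal object under discussion.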

As a theoretical virtue, coherence is particularly relevant in risk assessment, where evidence may come from different sources and relate to diverse levels/dimensions of the suspected causal association between drug and side-effects.

Support by experiment

The scientific method relies strongly on systematic observation and experiment. In particular, carefully controlled experiments are considered a privileged way to inquire into nature, in that they ideally allow the scientist to isolate the phenomenon under investigation from interfering and disturbing factors. In clinical medicine, randomised controlled studies represent the standard approach to testing causal hypotheses experimentally.

Since (Rubin 1974), the standard conceptualisation of causal claims resulting from RCTs (and comparative studies) is counterfactual: the “causal effect” is the difference between what would have happened to the subject, had it been exposed to the treatment and what would have happened to it, had it been exposed to the control. Since the subject cannot undergo the same experimental conditions at the same time, the causal effect is calculated as the average difference of the effects observed in the group of exposed and the group of unexposed subjects (this is also known as the “potential outcome approach” to causal inference).
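The averaging step just described can be sketched in a few lines; the adverse-event data below are hypothetical and serve only to show the computation:

```python
def average_causal_effect(treated_outcomes, control_outcomes):
    """Rubin-style estimate of the causal effect: the difference between
    the mean outcome in the exposed group and the mean outcome in the
    control group (since no subject can occupy both conditions)."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(treated_outcomes) - mean(control_outcomes)

# Hypothetical per-subject adverse-event indicators (1 = event observed).
treated = [1, 0, 1, 1, 0, 1, 1, 0]
control = [0, 0, 1, 0, 0, 1, 0, 0]
effect = average_causal_effect(treated, control)
```

The estimate stands in for the unobservable within-subject counterfactual contrast, which is why randomisation (making the two groups exchangeable) carries so much evidential weight in this approach.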

Causality has indeed been analysed in terms of counterfactuals in several respects. Lewis (Lewis 1973; Lewis 1986; Lewis 2000) proposes a possible-worlds semantics for the truth conditions of individual causal claims in terms of counterfactual dependence; Woodward (Woodward 2003) identifies necessary and sufficient conditions for causality in terms of invariance under intervention, where his notion of intervention captures the counterfactual gist of causal graphs and structural equation modeling. Finally, (Pearl 2000) focuses on counterfactuals related to potential effects of interventions, and relies on causal knowledge to predict the effect of such interventions (e.g., policy interventions). Hence, counterfactuals have different roles in analysing causal claims, defining causality, or using causal knowledge for predicting the effect of interventions. However, they share the intuition that a cause must make some difference to whether the effects occur or not (holding other variables fixed).

In our framework, we identify RCTs as particularly reliable sources of evidence for the difference-making effect attributed to causes by the philosophical literature; and difference-making itself will in turn be taken as a specific causal indicator.

Support by analogy

Hill briefly mentions reasoning by analogy as additionally contributing to the assessment of the causal claim: if a specific drug is on trial, available evidence of a similar second drug’s effects might be used for inference about the former. This touches upon central and notorious epistemological questions: What does it mean to be sufficiently similar in the case under consideration? In what way does the difference between the first and the second drug influence changes in expected outcome values? How specific are a drug’s properties? If they are highly specific — to what extent can this drug be used in an analogical argument, if at all? Although similarity seems to be a concept difficult to spell out in formal terms, the applicability and fruitfulness of parallel reasoning is of great interest (Bartha 2010; Hesse 1959), and analogical arguments are employed across disciplines. When coupled with a suitable theory of confirmation, analogy can finally be used to support a scientific hypothesis where only evidence from an analogue system is obtained (see, e.g., Hesse 1964 or Beebe & Poellinger forthcoming).

In epidemiology, explanation and prediction by analogy both rest on sufficiently well-described background conditions and knowledge about the relevant biological mechanisms at work. Describing all relevant differences between two drugs might be the first step towards justifying assessment by analogy — the second step might then be inference in a unified model where all the relevant differences are integrated as parameters. Formal models relating different pieces of evidence can be of help precisely for this task. Once relevant influences are distinguished from irrelevant ones and the contribution of differences in the relevant factors is determined, analogy will justifiably help in identifying causation.

Reasoning by analogy is also at the basis of inductive inferences from study to target population. Indeed, because of the context sensitivity of many causal associations in the biological realm, these can hold only in specific populations, and therefore evidence about causal effects related to one population may not license similar conclusions about another population, unless the two populations are analogous.

A Probabilistic Framework for Causal Assessment

By borrowing the quite general reconstruction of scientific inference from Bovens & Hartmann, we import their direct and indirect probabilistic dependencies and conditional independencies (see our summary of the relevant independencies from (Bovens 2003) in Section 4.2e). Beyond that, our choice of causal indicators and subsequently their formalisation as variables (i.e., nodes in the graph) requires us to make the theoretical, implicatory dependencies transparent by expressing them as links in the network. The following list motivates our modelling choices for the formalisation of the conceptually related indicators RoG, PD, DR, and Δ:

1. We model the implication relations between the four indicators as edges on the second level of our network: the detection of a high rate of growth implies a dose-response relationship which in turn means that the variables under consideration are probabilistically dependent.
Note that the edges on the second level are not a superfluous addition: if probabilistic dependence is measured but the causal hypothesis is known to be false (i.e., © is fixed to FALSE), the variables PD, DR, and RoG remain dependent since they overlap conceptually. Inserting direct edges on the second level precisely expresses this overlap. The categorical independence between the hypothesis level and the indicator level becomes apparent in this structure: PD, DR, and RoG are not dependent via the © node, but directly linked to one another on the indicator level.
2. No direct edge links RoG and PD since an observed high rate of growth implies an observed dose-response which implies an observed probabilistic dependence in turn and mediates the inference from RoG to PD (in other words, DR screens off PD from RoG).
3. Our choice not to insert an edge between DR, RoG, PD and Δ reflects our intention to clearly demarcate the conceptual/methodological dividing line between observational/static and interventional/dynamic support for the causal hypothesis: © screens off Δ from the observational/static indicators. By ‘observational/static’ we refer to inference from observation alone, whereas by ‘interventional/dynamic’ we refer to inference from data collected in interaction with the investigated system or population. For example, this contrast becomes evident in the difference between standard probabilistic conditioning (which amounts to shifting the focus in a probabilistic model) and conditioning with Pearl’s do-operator (which amounts to transforming the probabilistic model). And formally: Δ is independent of DR, RoG, PD given ©. This principled distinction is already laid out in Hume’s famous twofold definition of causation, which can be seen as a point of reference both for regularity/supervenience as well as for counterfactual/manipulationist theories of causation. Finally, all indicators listed above are imperfect ones, except for difference-making. Nevertheless, we are not collapsing Δ and © into a single node: following the philosophical literature on causality, we consider that when a difference-making relationship between two events or variables holds, then this is a sufficient — although not necessary — condition for causality. This can be characterised in logical terms as an entailment relationship: Δ ⊃ ©. Hence, in our system, the probability of a causal relationship given a genuine difference-making relationship is 1: P(©|Δ) = 1. The inverse entailment, © ⊃ Δ, does not hold, however.
Although Δ — representing the possibility of ideal controlled variance — implies © in a definitional way, knowledge of © does not necessitate the existence of difference-making — e.g., in cases of “holistic causation”. The latter case makes the conceptual divide even more obvious: if one knows the hypothesis to be true, learning that there is no difference-making would not change one’s belief in a positive dose-response. In this case the causal relation under investigation would then be explained as holistic causation. We are thankful to an anonymous reviewer for pointing this case out to us.
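The asymmetry between Δ ⊃ © and its failed converse can be checked on a two-node fragment of such a network. The numerical priors below are purely illustrative; only the zero likelihood P(Δ|¬©) = 0, which encodes the entailment, is fixed by the text:

```python
# Illustrative priors and conditionals for the fragment C -> Delta.
p_c = 0.3                    # prior P(C), the causal hypothesis
p_delta_given_c = 0.6        # P(Delta | C): causes need not make a difference
p_delta_given_not_c = 0.0    # P(Delta | not-C) = 0 encodes Delta => C

def posterior_c_given_delta(delta):
    """P(C | Delta = delta) by Bayes' theorem on the two-node fragment."""
    like_c = p_delta_given_c if delta else 1 - p_delta_given_c
    like_not_c = p_delta_given_not_c if delta else 1 - p_delta_given_not_c
    num = like_c * p_c
    return num / (num + like_not_c * (1 - p_c))

# Observing difference-making makes the causal hypothesis certain,
# P(C | Delta) = 1, while observing its absence merely lowers the
# belief in C without excluding it (holistic causation remains open).
```

The sketch reproduces both claims of the paragraph: a genuine difference-making relation suffices for causality, while its absence leaves the hypothesis live at reduced probability.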

Note also that we are purposely choosing to direct the edge between © and M towards M: We understand the existence of a mechanism as a testable consequence of the causal hypothesis, i.e., as a constitutive element of © rather than a pre-requisite (or even somehow causally prior) — in accordance with all other indicator nodes.

Figures 8–10 graphically represent aspects of the graph of our Bayesian network. Figure 8 displays the epistemic dimensions at stake, Figure 9 shows the causal indicators, their reciprocal relations, and the studies which inform us about single indicators. Figure 10 depicts a case in which studies are informative about more than one indicator.

The success of Bayesian reasoning hinges on the choice of a suitable prior, and incorporating domain knowledge plays a major role in that choice. Domain knowledge may be elicited from experts. Elicitation of parts of priors from experts in a medical context has recently been reviewed in (Johnson 2010). Determination of prior distributions combining expert opinion with historical data is reported in (Hampson 2014).

While the prior has to satisfy the conditional independencies discussed above and incorporate prior domain knowledge, there are further properties in the problem specification that a sensible prior has to satisfy, which we shall now discuss.

In (16) all report, reliability and relevance variables pertain to the same indicator variable Ind.

(9) means that conditioning on one causal indicator boosts the belief in the causal hypothesis being true, while (10) means that conditioning on the negation of an indicator lowers the belief in the causal hypothesis. The same holds in the presence of another instantiated indicator; see (11) and (12). The inequality in (11) is strict if Ind_k is not a descendant of Ind_i. (13) and (14) express that probabilistic dependence is a weaker causal indicator than difference-making or high rate of growth. Consequently, conditioning on difference-making, a dose-response relationship, or high rate of growth gives a greater boost to the belief in the causal hypothesis than conditioning on probabilistic dependence. Vice versa, conditioning on the negation of probabilistic dependence reduces belief in the causal hypothesis more strongly than conditioning on the negation of a dose-response relationship or high rate of growth.
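The pattern behind these inequalities can be illustrated with a Bayes update for a binary indicator. The likelihoods below are invented for illustration; a “strong” indicator is simply one whose likelihood ratio P(Ind|©)/P(Ind|¬©) exceeds that of a “weak” one:

```python
def update(prior, tpr, fpr, observed=True):
    """Posterior P(C | Ind = observed) for a binary indicator with
    likelihoods P(Ind | C) = tpr and P(Ind | not-C) = fpr."""
    like_c = tpr if observed else 1 - tpr
    like_not = fpr if observed else 1 - fpr
    num = like_c * prior
    return num / (num + like_not * (1 - prior))

prior = 0.2
# Strong indicator (difference-making-like): high likelihood ratio.
strong = update(prior, tpr=0.9, fpr=0.1)
# Weak indicator (probabilistic-dependence-like): modest ratio.
weak = update(prior, tpr=0.7, fpr=0.4)

# Observing either indicator raises belief in C (as in (9)), observing
# its negation lowers it (as in (10)), and the strong indicator gives
# the larger boost (as in (13)/(14)).
```

This is only a two-node caricature of the full network, which additionally has to respect the second-level edges and screening-off relations described earlier.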

A single study which is reliable and relevant for the target population, and which finds that, say, there is probabilistic dependence between the drug and adverse drug reactions, significantly boosts the belief that there is probabilistic dependence in the target population; see (15).

(16) formalises the following thought: belief in the inconsistency of a report and the respective indicator is boosted if the study is either irrelevant, unreliable, or both (¬(Rel & Rlv) serves to explain the inconsistency).
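The qualitative flavour of constraints such as (9)–(16) can be illustrated with a toy network. The sketch below is a minimal pure-Python enumeration over five binary variables, mirroring the node types of our framework (hypothesis, indicator, reliability, relevance, report); all conditional-probability numbers are hypothetical choices for illustration only, not values fixed by the framework:

```python
from itertools import product

# Hypothetical CPT entries; the framework only constrains their ordering.
P_HYP = 0.3                                  # prior P(Hyp)
P_IND_GIVEN_HYP = {True: 0.8, False: 0.2}    # P(Ind | Hyp)
P_REL, P_RLV = 0.7, 0.7                      # priors of reliability / relevance

def p_rep(ind, rel, rlv):
    # A reliable and relevant study reports what the indicator says;
    # otherwise the report is close to pure noise.
    if rel and rlv:
        return 0.9 if ind else 0.1
    return 0.5

def joint(hyp, ind, rel, rlv, rep):
    # Factorisation along the graph: Hyp -> Ind; Ind, Rel, Rlv -> Rep.
    p = P_HYP if hyp else 1 - P_HYP
    p *= P_IND_GIVEN_HYP[hyp] if ind else 1 - P_IND_GIVEN_HYP[hyp]
    p *= P_REL if rel else 1 - P_REL
    p *= P_RLV if rlv else 1 - P_RLV
    p *= p_rep(ind, rel, rlv) if rep else 1 - p_rep(ind, rel, rlv)
    return p

def prob(query, evidence):
    # P(query = True | evidence) by brute-force enumeration.
    num = den = 0.0
    for hyp, ind, rel, rlv, rep in product([True, False], repeat=5):
        a = dict(Hyp=hyp, Ind=ind, Rel=rel, Rlv=rlv, Rep=rep)
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(hyp, ind, rel, rlv, rep)
            den += p
            if a[query]:
                num += p
    return num / den

print(prob("Hyp", {}))               # prior belief in the causal hypothesis
print(prob("Hyp", {"Ind": True}))    # boost from the indicator itself, cf. (9)
print(prob("Hyp", {"Rep": True}))    # smaller boost from a mere report
```

With these numbers, conditioning on the indicator itself boosts the hypothesis more than conditioning on a single positive report, since the report is filtered through the reliability and relevance nodes.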

Discussion

Limitations

Our model, like every other model, is a simplification of the phenomenon of interest, which entails a regrettable but unavoidable loss of information. One simplification was the use of binary propositional variables, which stands in tension with concepts such as probabilistic dependence and rate of growth that clearly come in degrees. We made this simplification for the sake of simple exposition and tractable calculation. Nothing hinges on it: the machinery of Bayesian networks can be applied to non-binary variables in the same manner. In general, the more information is available, the stronger the need for variables with more values.
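As a sketch of why nothing hinges on binarity, here is a hypothetical example in which one indicator is replaced by a three-valued variable (a dose-response gradient taking the values none, weak, strong); the numbers are again illustrative placeholders, not values from our model:

```python
# Hypothetical prior and categorical CPT for a three-valued indicator.
P_HYP = 0.3
P_GRAD_GIVEN_HYP = {
    True:  {"none": 0.1, "weak": 0.3, "strong": 0.6},
    False: {"none": 0.7, "weak": 0.2, "strong": 0.1},
}

def posterior(grad_value):
    # Bayes' rule works unchanged with non-binary evidence variables.
    num = P_HYP * P_GRAD_GIVEN_HYP[True][grad_value]
    den = num + (1 - P_HYP) * P_GRAD_GIVEN_HYP[False][grad_value]
    return num / den

for g in ("none", "weak", "strong"):
    print(g, round(posterior(g), 3))
```

The stronger the observed gradient, the higher the posterior belief in the causal hypothesis, exactly as with a graded indicator variable.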

Furthermore, we have limited the model to indicators of causality in pharmacology which derive from Hill’s guidelines. We freely admitted in Section 3.2 that there may be further indicators. We do not suggest that all of these further indicators can be accounted for in the model; in the outlook below we discuss a relevant inference which cannot be drawn within our model.

Virtues and Context

With respect to other proposals for evidence evaluation and causal assessment, our approach has the following virtues:

1.  Our method accommodates many intuitions already expressed by philosophers of medicine regarding pluralistic approaches to evidence evaluation.
2. Our method allows for many inferential patterns to contribute to the overall causal assessment of drug-induced harm (coherence, consistency, reasoning by analogy, etc.) and explicitly (as well as formally) accommodates these patterns in the belief propagation network.

The first point is especially relevant for the current debate on evidence standards in medicine. For instance, we can see how the EBM paradigm takes difference-making as a highly reliable indicator of causality and the others as much weaker indicators; hence it concentrates its efforts on obtaining evidence for that kind of indicator that is as reliable as possible. The contending view is that different indicators may have complementary epistemic roles in supporting the hypothesis of causality. For instance, (Clarke et al. 2014) claim that evidence about difference-making helps in unmasking causes which might be cancelled out by back-up/compensatory mechanisms in the organ system, whereas evidence about mechanisms is needed in order to design and interpret statistical studies. Hence, such different kinds of evidence reciprocally support each other and jointly (dis-)confirm the causal claim under investigation. This proposal stems from (Russo 2007), where both statistical evidence at the phenotypic/population level and evidence about (molecular-cellular) mechanisms are required to establish causal claims.

In general, the “new mechanists” (henceforth the “Kentians”), led (among others) by Jon Williamson, maintain that a sample which is small compared to the target population is on its own not sufficient to license causal claims about the target population. In addition, evidence is required that there exists some mechanism responsible for the phenomenon in the observed sample which is also present in the target population, cf. (Russo 2007). The Kentians construe the term “mechanism” in a slightly different fashion than we did here: for them, a mechanism need not be described at the (sub-)molecular level. This detail is not relevant for our current discussion.

(Howick2011, p. 939) expresses the opposing view: “There are many cases where patient-relevant effects of medical therapies have been established by comparative clinical studies alone.” Our framework captures this dissent: Howick takes it to be the case that P(Hyp|Δ) is large enough to establish the causal claim, while for the Kentians P(Hyp|Δ) is too small. For them, only after also updating with the mechanistic indicator is the posterior probability, P(Hyp|Δ & M), large enough to establish the causal claim.

Furthermore, our framework also responds to the view defended by methodological pluralists such as Cartwright and Stegenga, among others, according to whom classical “linear” approaches to causal inference cannot do justice to the complexity of causal phenomena in the biological and social sciences, characterized by nonlinear causation and causal interactions. Stegenga, in particular, explicitly mentions Hill’s viewpoint on causal inference and claims that “a plurality of reasoning strategies appealed to by the epidemiologist Sir Bradford Hill is a superior strategy for assessing a large volume and diversity of evidence” (Stegenga2011).

In relation to evidence hierarchies, which are a strong point of contention among philosophers of medicine and methodologists, our inequalities (13) and (14) nicely parallel the ranking such hierarchies propose. Evidence hierarchies have been developed as a decision tool to help clinicians, pressed by time constraints, to integrate their clinical expertise with evidence coming from basic and clinical research (Sackett 1996, Straus2000). In these rankings, randomised studies are ceteris paribus preferred to non-randomised studies. The rationale for this ranking is provided by methodological-foundational considerations mainly developed within standard statistics, and follows a kind of hypothetico-deductive approach to scientific inference. However, the strength of the effect magnitude and the dose-response gradient are also considered essential features in evaluating evidence (Howick 2011, Glasziou 2007). Hence, our inequality constraints mirror the categorical ordering recommended in such hierarchies.

What differentiates our framework from standard evidence rankings, however, is that the latter have predominantly been formalised as lexicographic decision rules. This means that higher-level studies trump lower-level ones: when two studies of different levels deliver contradictory findings, the study higher in the evidence hierarchy is considered more reliable and one is allowed to discard the lower-level one. A somewhat unwanted consequence of this “take the best” approach is that it has become commonplace to assume an uncommitted attitude towards observed associations unless they are “proved” by gold-standard evidence (see the still ongoing debate on the possible causal association between paracetamol and asthma: Shaheen 2000, Eneli 2005, Shaheen 2008, Henderson 2013, Allmers 2009, McBride 2011, Heintze 2013, Martinez 2013). This runs counter to the precautionary principle in risk assessment and to how decisions should be made in health settings, see above. Our framework encapsulates the rationale for ranking evidence in (13) and (14) but at the same time allows one to take into account all evidence and to act accordingly as soon as the probability of the causal hypothesis exceeds the threshold established by the other dimensions of the decision (the utility of withdrawing/not withdrawing the drug, conditional on the probability of it causing the suspected harm).
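The threshold idea can be made concrete with a small expected-utility sketch. The utilities below are hypothetical placeholders on an arbitrary scale; the point is only that they jointly fix a probability threshold above which withdrawing the drug maximises expected utility:

```python
# Hypothetical utilities for the withdraw/keep decision (arbitrary scale).
U = {
    ("withdraw", "harm"): 0,       # harm avoided, therapeutic benefit lost
    ("withdraw", "no_harm"): -20,  # drug withdrawn unnecessarily
    ("keep", "harm"): -100,        # patients exposed to the adverse reaction
    ("keep", "no_harm"): 10,       # drug kept and no harm occurs
}

def expected_utility(action, p_harm):
    return p_harm * U[(action, "harm")] + (1 - p_harm) * U[(action, "no_harm")]

def threshold():
    # p* solves EU(withdraw, p) = EU(keep, p);
    # withdraw as soon as P(Hyp | all evidence) exceeds p*.
    a = U[("keep", "no_harm")] - U[("withdraw", "no_harm")]
    b = U[("withdraw", "harm")] - U[("keep", "harm")]
    return a / (a + b)

print(threshold())  # with these utilities: 30/130 ≈ 0.2308
```

On this picture, all evidence, not only gold-standard studies, feeds into P(Hyp), and action is triggered once that probability crosses the utility-determined threshold.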

Furthermore, the Bayesian network and its nodes, representing epistemic categories (relevance, reliability, various causal indicators, etc.), give us greater insight into the philosophical dissent around EBM. For instance, whereas Worrall’s criticism (Worrall 2007, Worrall 2010) of the privilege accorded to RCTs for causal inference in EBM insists on questioning their high reliability, Cartwright’s view (Cartwright 2011) is that they provide very limited information on the effect of the intervention in populations other than the study population (“external validity”). Hence, these criticisms address different nodes in the causal inference, although they concern the same study type. Moreover, our approach explicitly addresses the issue of external validity by formally incorporating reasoning by analogy.

Pragmatically, the model presented here has the virtue of being computationally simple in the following sense. Defining prior probabilities on rich structures can be a hard practical problem. A graphical representation of the conditional independencies in terms of the graph of the Bayesian network allows one to specify the full prior by specifying all conditional probabilities at the variables. For a relatively sparse graph such as ours (every variable has at most three parents), specifying all conditional probabilities requires far less input than specifying the full prior by assigning probabilities to all states.
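The savings can be counted explicitly. For n binary variables, the full joint distribution needs 2^n − 1 numbers, whereas a network whose nodes have at most k parents needs one conditional probability per parent configuration, i.e. at most n · 2^k. The sketch below uses a hypothetical 20-node network with at most three parents per node, not the actual node count of our model:

```python
# Parameter counts: full joint vs. a sparse Bayesian network (binary variables).
def full_joint_params(n):
    # One probability per state, minus one for normalisation.
    return 2**n - 1

def bn_params(parent_counts):
    # One entry P(X = true | parent configuration) per configuration.
    return sum(2**k for k in parent_counts)

# Hypothetical 20-node graph in which every node has at most 3 parents:
parents = [0, 1, 2, 3, 3, 3, 2, 2, 1, 3, 3, 2, 1, 0, 3, 3, 2, 2, 3, 3]
print(full_joint_params(20))  # 1048575 numbers for the full joint
print(bn_params(parents))     # 104 numbers for the sparse network
```

Even at this modest size the graphical factorisation cuts the elicitation burden by four orders of magnitude.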

Concerning the second point, our framework promises to provide a fruitful platform for integrating insights developed in the philosophy of science around such topics as the role of replication in assessing the reliability of evidence (Meehl 1990, Lamal 1990, Hempel 1968, Platt 1964), the confirmatory role of explanatory power (McGrew 2003, Crupi 2013, Cohen 2015, Lipton 2003), and coherence (Dietrich 2005, Moretti 2007, Wheeler 2013, Fitelson 2003, Bovens2003). Our approach indeed lends itself not only to accommodating heterogeneous evidence but also to expressing various patterns of inference.