Most frequently asked questions about NCA are answered in more depth in the online NCA book or on the online NCA Forum. If your question is not answered in these sources, or if you want to elaborate on an answer, please raise your question or make your comment on the NCA Forum, or contact a member of the NCA Development team.
NCA Theory
A necessary condition is a (level of a) condition that must be present to have a given (level of the) outcome. Without the condition the outcome will not occur.
A necessary condition allows the outcome to exist; a sufficient condition produces the outcome.
A necessary condition can be formulated in two ways:
- The presence or high level of the condition is necessary for the presence or high level of the outcome
- The absence or low level of the condition is sufficient for the absence or low level of the outcome
If we use “X is necessary for Y” then we mean that the presence or high level of X is necessary for the presence or high level of Y.
It is also possible to formulate necessity relationships with combinations of presence/high level and absence/low level of the condition and the outcome.
A dichotomous necessary condition is a necessary condition that can have only two discrete levels (0/1, absent/present, Low/High, etc.). The outcome can have any number of levels (dichotomous, discrete or continuous).
A discrete necessary condition is a necessary condition that can have more than two discrete levels, for example trichotomous (0/1/2, absent/neutral/present, Low/Medium/High, etc.). The outcome can have any number of levels (dichotomous, discrete or continuous).
A continuous necessary condition is a necessary condition that can have an infinite number of levels between a minimum value and a maximum value. The outcome can have any number of levels (dichotomous, discrete or continuous).
A necessary condition “in kind” is a qualitative expression of necessity of the condition for the outcome: “X is necessary for Y”.
A necessary condition “in degree” is a quantitative expression of necessity of the condition for the outcome: “X=Xc is necessary for Y=Yc ”, where Xc and Yc are specific levels of the condition and the outcome, respectively.
A necessary configuration is a combination of conditions that is necessary for an outcome (necessary AND configuration, or necessary OR configuration).
A necessary AND configuration is a necessary configuration where each single condition is necessary. For example, Apple and Flour are single necessary conditions and both are necessary for an apple pie. A combination of single necessary conditions is therefore a necessary AND configuration.
A necessary OR configuration is a necessary configuration where the conditions are substitutable. For example the combination Green apple or Red apple is necessary for an apple pie, but the separate conditions are not necessary (the conditions are substitutable). A necessary OR configuration may indicate that a higher level condition is necessary (Apple).
The general mathematical description of necessary conditions is Y ≤ fi(Xi), where Y is the outcome, Xi is the condition, and fi(Xi) is the ceiling line for Xi. NCA assumes that the ceiling line is non-decreasing. This allows making statements like "Xi ≥ Xic is necessary for Y = Yc", where C is a point (Xic ,Yc) on the ceiling line.
With one condition, NCA puts a line on the data and this line is the ceiling line f1(X1), such that X1 ≥ X1c is necessary for Y = Yc. The maximum possible Y = yc for a given value X1 = x1, is yc = f1(x1).
With two conditions (X1 and X2), there is a three-dimensional space (X1,X2,Y) in which an imaginary blanket is put on the data (ceiling surface). NCA considers only the projection of the ceiling surface on the X1,Y plane to get the ceiling line f1(X1) for X1, and the projection of the ceiling surface on the X2,Y plane to get the ceiling line f2(X2) for X2. Both X1 ≥ X1c AND X2 ≥ X2c are necessary for Y = Yc. The maximum possible Y = yc for given values X1 = x1 and X2 = x2 is yc = min {f1(x1), f2(x2)}.
With more than two conditions there is an imaginary multidimensional ceiling with projections fi(Xi). The mathematical description of a necessary AND configuration with several conditions is: Xi ≥ Xic is necessary for Y = Yc for every i, and the maximum possible Y = yc for given values Xi = xi is yc = min { fi(xi) }, where Y is the outcome, Xi is the i-th condition, fi(Xi) is the i-th ceiling line and C is a point (Xic, Yc) on the i-th ceiling line.
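As an illustration of the minimum rule, the following is a minimal Python sketch. It is not part of the NCA software; the function name `max_possible_outcome` and the two linear ceiling lines are hypothetical and chosen only for this example.

```python
from typing import Callable, Sequence


def max_possible_outcome(
    ceilings: Sequence[Callable[[float], float]],
    xs: Sequence[float],
) -> float:
    """Return min_i f_i(x_i): the highest outcome level that all conditions allow."""
    if len(ceilings) != len(xs) or len(xs) == 0:
        raise ValueError("need one condition level per ceiling line")
    return min(f(x) for f, x in zip(ceilings, xs))


# Hypothetical ceiling lines for two conditions (assumed, not estimated from data).
def f1(x: float) -> float:
    return 0.2 + 0.8 * x


def f2(x: float) -> float:
    return 0.5 + 0.5 * x


# X1 = 0.5 allows Y up to f1(0.5); X2 = 0.4 allows Y up to f2(0.4).
# The lower of the two ceilings is the binding constraint.
y_max = max_possible_outcome([f1, f2], [0.5, 0.4])
print(round(y_max, 3))
```

In this sketch X1 is the bottleneck: raising X2 further would not raise the maximum possible outcome as long as X1 stays at 0.5.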
Necessary condition theories are simple (parsimonious) because a single factor can explain the absence of the outcome: if the necessary condition is absent, the outcome is absent. This contrasts with sufficiency theories and models (e.g., structural equation models), where many factors together predict the presence of the outcome.
One reason for the complexity of sufficiency theories and models is that for multi-causal phenomena the outcome can only be properly explained when many factors that help to produce the outcome are included. Furthermore, when population parameters are estimated with regression models, not including relevant variables (confounders) may cause “omitted variable bias” (incorrect estimation of the regression coefficients of the variables that are included in the model). This bias occurs when variables that correlate with other variables in the model and with the outcome are omitted. By adding “control” variables, the explanatory power of the model can be improved, and omitted variable bias can be reduced. As a result sufficiency models can become complex.
Necessity theory and necessity conceptual models are unaffected by omitted variable bias because the necessary condition operates in isolation from the rest of the causal structure. The NCA parameters of the necessary condition are unaffected by adding or deleting other variables from the theory/conceptual model. One reason is that NCA takes the projections of the ceiling surface in XY planes, rather than that it estimates the multidimensional ceiling surface. NCA estimates the 'absolute necessity' rather than 'necessity dependent on other variables'.
Necessity theory and necessity conceptual models do not need control variables because the NCA parameters are the same with and without having control variables in the theory/model.
NCA Research design
In an experiment the condition (X) is manipulated and the effect on the outcome (Y) is observed. In the regular sufficiency experiment cases without (or with a low level of) the outcome are selected and the condition is added (or increased) to observe whether the outcome appears (“Gain of function experiment”) by looking at the average effect compared to a control group where the manipulation was not done. In the necessity experiment cases with (or with a high level of) the outcome are selected and the condition is eliminated (or reduced) to observe whether the outcome disappears (“Loss of function experiment”) by looking at the maximum effect compared to a control group where the manipulation was not done.
In a (large N) observational study the condition (X) and the outcome (Y) are observed in real-life context without manipulation of the condition. There is no difference between a sufficiency observational research design, and necessity observational research design. Only the data analysis differs. For causal interpretation a theoretical explanation is essential (necessity hypothesis).
In a (small N) case study the condition (X) and the outcome (Y) are observed in real-life context. The regular sufficiency case study explores conditions (X) that may produce the outcome (Y). The necessity case study identifies the common (hence necessary) conditions (X), in one or more cases with the outcome (Y). For exploration (theory building) there is no difference between a sufficiency case study research design, and necessity case study research design. Only the data analysis differs. However, whereas the case study can not be used for testing sufficiency theory, it can be used for testing necessity theory. Even one case (and a deterministic view on necessity) can reject a necessity theory, when the assumed necessary condition is not present in a case where the outcome is present.
In an archival study, an existing data set (usually from an observational study) is used to perform a theory building or testing study. In the regular archival study the data are (re)analysed for building or testing sufficiency theory. In the necessity archival study the data are analysed using NCA for building or testing necessity theory. Using NCA with archival data is particularly useful since the data are usually only analysed with a probabilistic sufficiency causal perspective on XY relationships, whereas NCA can add the necessity perspective.
Sampling is the selection of cases from a population of cases for subsequent measurement and data analysis. For NCA the common sampling methods are the same as for any other research approach and data analysis technique (e.g., census, probability sampling, convenience sampling). Only purposive sampling (usually done in qualitative research) is performed differently.
A census is a “sample” in which all cases of the population become part of the “sample” (and the data base). With a census statistical inference from sample to population is not relevant and unnecessary because the population information is available in the “sample”. For NCA the census is the same as for any other research approach and data analysis technique.
A probability sample (random sample) is a sample in which all cases of the population had the same probability to become part of the sample (and the data base). This can be achieved by random sampling from a complete list of cases of the population (sampling frame). A probability sample is a requirement for statistical inference from sample to population. For NCA the probability sample is the same as for any other research approach and data analysis technique.
A convenience sample is a sample in which cases from the population are selected for convenience of the researcher, for example, because cases are easily accessible. When a convenience sample is used as a substitute of a probability sample, statistical inference is flawed. For NCA the convenience sample is the same as for any other research approach and data analysis technique.
A purposive sample is a sample in which cases from the population are specifically selected with a certain purpose. In NCA purposive sampling can be applied for testing necessity theory by sampling only cases with the outcome present. In a necessity experiment this sample is used to observe if the outcome disappears when the condition is removed. In a necessity observational study this sample is used to observe if the condition is present in all sampled cases where the outcome is present. If not, necessity is rejected.
With an observational study it is also possible to sample only cases where the condition is absent and to observe if the outcome is present, which results in a rejection.
With a purposive sample, estimations for general parameters of the population (e.g., mean) cannot be obtained because these estimations require information about the entire population.
The short answer is “At least 1, but the larger the better”.
NCA may be performed with a single case. This is possible when the condition and outcome can have only two values (absent/present, low/high, 0/1, etc.) and with a deterministic view on necessity causality. For testing the hypothesis that the presence of X is necessary for the presence of Y, a case with the outcome present should be selected. Then the researcher observes if the condition is present or not. If the condition is not present, the hypothesis is rejected (falsification). This is one test of the hypothesis with a single case. Such test could be replicated with multiple cases, but the test itself is done with a single case.
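The falsification logic for the dichotomous, deterministic situation can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a function of the NCA software; the name `necessity_rejected` and the coding of cases as (x, y) pairs of 0/1 values are assumptions.

```python
from typing import Iterable, Tuple


def necessity_rejected(cases: Iterable[Tuple[int, int]]) -> bool:
    """Return True if any case has the outcome present (y == 1)
    while the condition is absent (x == 0)."""
    return any(y == 1 and x == 0 for x, y in cases)


# One selected case with the outcome present and the condition present:
# the hypothesis survives this single test.
print(necessity_rejected([(1, 1)]))  # False

# One selected case with the outcome present but the condition absent:
# the hypothesis is falsified.
print(necessity_rejected([(0, 1)]))  # True
```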
In most NCA applications X and Y can have several levels, and samples are drawn from a population for the estimation of the NCA parameters, for example the effect size. Like with other methods, the estimation becomes better when sample size increases.
There is no recommended minimum, optimum or maximum sample size. The quality of the estimation depends on many factors.
NCA Data analysis
NCA is a specific approach and data analysis technique for identifying necessary conditions in data sets. It distinguishes different types of necessary condition analyses.
A bivariate necessary condition analysis is a necessary condition analysis with one condition and one outcome
A multiple necessary condition analysis is a necessary condition analysis with more than one condition and one outcome. It consists of multiple bivariate necessary condition analyses, as the results of the analyses are independent of each other.
A “dichotomous” necessary condition analysis is a necessary condition analysis in which the condition and/or the outcome is dichotomous.
A “discrete” necessary condition analysis is a necessary condition analysis in which the condition and/or the outcome is discrete with more than two levels.
A “continuous” necessary condition analysis is a necessary condition analysis where both the condition and the outcome are (nearly) continuous (a very large or infinite number of possible levels).
A ceiling is the border between an empty space without observations and a full space with observations. This can be mathematically expressed as Y ≤ f(X), where Y is the outcome, X are the conditions, and f(X) is the ceiling function.
A floor is the border between an empty space without observations and a full space with observations and that is mathematically expressed as Y ≥ f(X), where Y is the outcome, X are the conditions, and f(X) is the floor function.
A ceiling line is a ceiling in the two-dimensional space, where the ceiling function f(X) is a line.
A ceiling surface is a ceiling in the three-dimensional space, where the ceiling function f(X) is a surface.
A ceiling technique is a mathematical or statistical approach to approximate the ceiling (see table). In the NCA software package for R, the CE-FDH ceiling technique is the default ceiling technique for dichotomous and discrete (with few levels) necessary condition analysis or when a continuous ceiling is "jumpy". The CR-FDH ceiling technique is the default ceiling technique for discrete (with many levels) and continuous necessary condition analysis.
Common ceiling techniques in NCA:

Abbreviation | Name in NCA software | Name |
---|---|---|
CE-FDH | ce_fdh | Ceiling Envelopment with Free Disposal Hull |
CR-FDH | cr_fdh | Ceiling Regression with Free Disposal Hull |
C-LP | c_lp | Ceiling - Linear Programming |
CE-VRS | ce_vrs | Ceiling Envelopment with Varying Return to Scale |
CR-VRS | cr_vrs | Ceiling Regression with Varying Return to Scale |
The Ceiling Envelopment - Free Disposal Hull (CE-FDH) ceiling technique is a ceiling approximation obtained from the Free Disposal Hull (FDH) data envelopment technique that assumes that the ceiling is non-decreasing, resulting in a non-decreasing step function. In the NCA software package for R, the CE-FDH ceiling technique is the default ceiling technique for dichotomous and discrete (with few values) necessary conditions or when the border is 'jumpy'.
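To make the step function concrete, here is a minimal Python sketch of a CE-FDH style ceiling. It is an illustration under stated assumptions, not the implementation in the NCA software for R: the function names `ce_fdh_corners` and `ce_fdh_value` and the small data set are hypothetical. The ceiling at a given x is taken as the highest observed y among cases with a condition level at or below x.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def ce_fdh_corners(xs: Sequence[float], ys: Sequence[float]) -> List[Point]:
    """Return the upper-left corner points of the non-decreasing step ceiling."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    # Sort by x; for equal x put the highest y first so it becomes the corner.
    points = sorted(zip(xs, ys), key=lambda p: (p[0], -p[1]))
    corners: List[Point] = []
    highest = float("-inf")
    for x, y in points:
        if y > highest:  # a new running maximum creates a new step
            corners.append((x, y))
            highest = y
    return corners


def ce_fdh_value(corners: Sequence[Point], x: float) -> float:
    """Ceiling height at x: the y of the last corner with corner x <= x."""
    reachable = [cy for cx, cy in corners if cx <= x]
    if not reachable:
        raise ValueError("x lies below the smallest observed condition value")
    return reachable[-1]  # corner heights increase from left to right


# Hypothetical observations.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
corners = ce_fdh_corners(xs, ys)
print(corners)                   # [(1, 2), (3, 4), (5, 6)]
print(ce_fdh_value(corners, 4))  # 4
```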
The Ceiling Regression - Free Disposal Hull (CR-FDH) ceiling technique is a ceiling approximation that smooths the step function obtained by the CE-FDH technique by using OLS regression through the upper-left corners of the step function. In the NCA software package for R, the CR-FDH ceiling technique is the default ceiling technique for discrete (with many values) and continuous necessary conditions.
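The smoothing step can be sketched as an ordinary least squares fit through the corner points. The Python code below is only an illustration; the function name `cr_fdh_line` and the corner points are hypothetical, and the corners are assumed to have been obtained beforehand from a CE-FDH step function.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cr_fdh_line(corners: Sequence[Point]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x through the corner points with OLS."""
    n = len(corners)
    if n < 2:
        raise ValueError("at least two corner points are needed to fit a line")
    mean_x = sum(x for x, _ in corners) / n
    mean_y = sum(y for _, y in corners) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in corners)
    if sxx == 0:
        raise ValueError("corner points must differ in x")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in corners)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical upper-left corners of a step function.
corners = [(1, 2), (3, 3), (5, 6)]
intercept, slope = cr_fdh_line(corners)
print(round(intercept, 3), round(slope, 3))
```

Because the line is a regression through the corners, some corners end up slightly above it (here the corner at x = 1), which is why a ceiling regression line usually has a ceiling accuracy below 100%.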
The Ceiling - Linear Programming (C-LP) ceiling technique is a ceiling approximation that selects two upper left points from the step function obtained by the CE-FDH such that the empty space is maximum. The ceiling accuracy of this line is 100%. It can be used for simulations when there is no measurement error in the data.
The Ceiling Envelopment - Varying Return to Scale (CE-VRS) ceiling technique is a ceiling approximation obtained from Data Envelopment Analysis (DEA) that assumes that the ceiling is convex, resulting in a piecewise linear convex ceiling function.
The Ceiling Regression - Varying Return to Scale (CR-VRS) ceiling technique is a ceiling approximation that smooths the piecewise linear function obtained by the CE-VRS technique by using OLS regression through the corners of the piecewise linear function.
The ceiling zone (C) is the “empty” space above the ceiling. Because a small proportion of observations (e.g., 5%) may be in the “empty” space, the ceiling zone may be “virtually” or “typically” empty.
Ceiling accuracy is the percentage of observations that are on or below the ceiling.
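A minimal Python sketch of this percentage, assuming the ceiling is available as a function of x. The function name `ceiling_accuracy`, the linear ceiling and the data are hypothetical and serve only as an example.

```python
from typing import Callable, Sequence


def ceiling_accuracy(
    xs: Sequence[float],
    ys: Sequence[float],
    ceiling: Callable[[float], float],
) -> float:
    """Percentage of observations that lie on or below the ceiling."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    on_or_below = sum(1 for x, y in zip(xs, ys) if y <= ceiling(x))
    return 100.0 * on_or_below / len(xs)


# Hypothetical ceiling line y = 1 + x and five observations.
xs = [1, 2, 3, 4, 5]
ys = [2.5, 1, 4, 3, 6]
accuracy = ceiling_accuracy(xs, ys, lambda x: 1 + x)
print(accuracy)  # 80.0: only the first observation lies above the line
```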
The scope (S) is the total potential space with observations given the minimum and maximum values of the condition and outcome.
The theoretical scope is the scope with theoretical minimum and maximum values of the condition and the outcome.
The empirical scope is the scope with observed minimum and maximum values of the condition and the outcome.
The necessary condition effect size (d) is the proportion of the scope above the ceiling: d = C/S. It ranges from 0 to 1 (0 ≤ d ≤ 1). The effect size indicates to what extent the condition is necessary for the outcome. In other words: to what extent the condition constrains the outcome, and the outcome is constrained by the condition.
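The ratio d = C/S can be worked out by hand for a step ceiling. The Python sketch below assumes a CE-FDH style step function and the empirical scope; it is an illustration, not the computation performed by the NCA software, and the function name `ce_fdh_effect_size` and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def ce_fdh_effect_size(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Effect size d = C / S for a non-decreasing step ceiling on the empirical scope."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    scope = (x_max - x_min) * (y_max - y_min)  # S
    if scope == 0:
        raise ValueError("scope is zero: X or Y does not vary")

    # Upper-left corners of the step function (running maximum of y over x).
    corners: List[Tuple[float, float]] = []
    highest = float("-inf")
    for x, y in sorted(zip(xs, ys), key=lambda p: (p[0], -p[1])):
        if y > highest:
            corners.append((x, y))
            highest = y

    # C: area between each step and the top of the scope.
    ceiling_zone = 0.0
    for i, (cx, cy) in enumerate(corners):
        next_x = corners[i + 1][0] if i + 1 < len(corners) else x_max
        ceiling_zone += (next_x - cx) * (y_max - cy)
    return ceiling_zone / scope


# Hypothetical observations: steps at (1, 2), (3, 4), (5, 6).
# C = 2*4 + 2*2 + 1*0 = 12, S = 5 * 5 = 25, so d = 0.48.
d = ce_fdh_effect_size([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])
print(round(d, 2))  # 0.48
```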
“An effect size can be valued as important or not, depending on the context. A given effect size can be small in one context and large in another. General qualifications for the size of an effect as ‘small,’ ‘medium,’ or ‘large’ are therefore disputable. If, nevertheless, a researcher wishes to have a general benchmark for necessary condition effect size, I would offer 0 < d < 0.1 as a ‘small effect,’ 0.1 ≤ d < 0.3 as a ‘medium effect,’ 0.3 ≤ d < 0.5 as a ‘large effect,’ and d ≥ 0.5 as a ‘very large effect’.”(Dul, 2016, p.30).
Condition inefficiency is the percentage of the range of the condition where the condition is not necessary for the outcome. In other words: where the condition does not constrain the outcome.
Outcome inefficiency is the percentage of the range of the outcome where the condition is not necessary for the outcome. In other words: where the outcome is not constrained by the condition.
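For a straight ceiling line, both inefficiencies can be computed from the points where the line leaves the scope. The Python sketch below assumes a linear ceiling y = intercept + slope · x with a positive slope; the function name `inefficiencies` and the example values are hypothetical, and the formulas are one common way to express the definitions above, not necessarily the exact computation in the NCA software.

```python
from typing import Tuple


def inefficiencies(
    intercept: float,
    slope: float,
    x_min: float,
    x_max: float,
    y_min: float,
    y_max: float,
) -> Tuple[float, float]:
    """Return (condition inefficiency %, outcome inefficiency %) for a linear ceiling."""
    if slope <= 0:
        raise ValueError("this sketch assumes an increasing ceiling line")
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("scope must have positive width and height")

    # X level at which the ceiling reaches the top of the scope (clipped to the scope).
    x_at_y_max = min(max((y_max - intercept) / slope, x_min), x_max)
    # Ceiling height at the lowest X level (clipped to the scope).
    y_at_x_min = min(max(intercept + slope * x_min, y_min), y_max)

    # Range of X above which X no longer constrains Y.
    condition_ineff = 100.0 * (x_max - x_at_y_max) / (x_max - x_min)
    # Range of Y below which Y is not constrained by X.
    outcome_ineff = 100.0 * (y_at_x_min - y_min) / (y_max - y_min)
    return condition_ineff, outcome_ineff


# Hypothetical ceiling y = 0.2 + 1.0 * x on a [0, 1] x [0, 1] scope.
cond, out = inefficiencies(0.2, 1.0, 0.0, 1.0, 0.0, 1.0)
print(round(cond, 1), round(out, 1))
```

In this example the ceiling reaches the maximum outcome at X = 0.8, so the top 20% of the condition range does not constrain the outcome, and outcome levels up to 0.2 are possible at any condition level.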
The bottleneck table is a tabular representation of the ceiling. It indicates the required necessary level of the condition(s) for a given level of the outcome ("necessity in degree"). The bottleneck table is particularly useful for interpreting multiple necessary conditions.
NCA's statistical significance test is an 'approximate permutation test' that estimates the p value by randomly drawing samples from the permutation distribution. This test can be activated in the NCA software using the argument test.rep = 10000 (or another number) in the nca_analysis function.
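A bottleneck table can be sketched by inverting each ceiling line at a set of outcome levels. The Python code below assumes linear ceiling lines and reports actual values on a 0 to 1 scale; the function names, the two ceiling lines and the use of `None` (condition not necessary at that outcome level) and `math.inf` (outcome level not attainable within the scope) are choices made for this illustration, not the output format of the NCA software.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def required_level(
    intercept: float, slope: float, y_target: float, x_min: float, x_max: float
) -> Optional[float]:
    """Minimum X needed for y_target under the ceiling y = intercept + slope * x."""
    if slope <= 0:
        raise ValueError("this sketch assumes an increasing ceiling line")
    x_required = (y_target - intercept) / slope
    if x_required <= x_min:
        return None      # any observed level of X allows y_target
    if x_required > x_max:
        return math.inf  # y_target lies above the ceiling everywhere in the scope
    return x_required


def bottleneck_table(
    ceilings: Dict[str, Tuple[float, float]],
    y_levels: Sequence[float],
    x_min: float = 0.0,
    x_max: float = 1.0,
) -> List[Tuple[float, Dict[str, Optional[float]]]]:
    """One row per outcome level with the required level of every condition."""
    return [
        (y, {name: required_level(a, b, y, x_min, x_max) for name, (a, b) in ceilings.items()})
        for y in y_levels
    ]


# Hypothetical ceiling lines (intercept, slope) for two conditions.
ceilings = {"X1": (0.2, 1.0), "X2": (0.6, 0.5)}
table = bottleneck_table(ceilings, [0.0, 0.5, 1.0])
for y, row in table:
    print(y, row)
```

Reading the rows from low to high outcome levels shows which condition becomes a bottleneck first: in this example X1 is already required at Y = 0.5, whereas X2 only becomes necessary at higher outcome levels.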
A simulation of the power of the test for estimating a desirable sample size for a new NCA study can be activated in the NCA software using the nca_power function.
NCA’s significance test has the following parts:
- Calculate the necessity effect size for the observed sample.
- Formulate the null-hypothesis that suggests that X and Y in the population are not related. Any effect size is a random effect.
- Create a large set of random resamples (e.g., 10,000) using approximate permutation. In a permutation test the X and Y values that are observed in the sample are shuffled to create new resamples (same sample size) with ‘cases’ where X and Y are unrelated.
- Calculate the effect size of all resamples. The set of effect sizes comprises an estimated distribution of effect size under the assumption that X and Y are not related.
- Compare the effect size of the observed sample (see part 1) with the distribution of effect sizes of the random resamples. The fraction of random resamples for which the effect size is equal to, or greater than the observed effect size (p value) informs us about the statistical (in)compatibility of the data with the null hypothesis.
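The parts listed above can be sketched in Python as follows. This is a simplified illustration, not the implementation behind `test.rep` in the NCA software: the effect size is computed with a CE-FDH style step ceiling on the empirical scope, the Y values are shuffled against fixed X values, and the function names, the seed and the data are hypothetical.

```python
import random
from typing import Sequence, Tuple


def effect_size(xs: Sequence[float], ys: Sequence[float]) -> float:
    """d = C / S for a non-decreasing step ceiling on the empirical scope."""
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    scope = (x_max - x_min) * (y_max - y_min)
    if scope == 0:
        raise ValueError("scope is zero: X or Y does not vary")
    corners = []
    highest = float("-inf")
    for x, y in sorted(zip(xs, ys), key=lambda p: (p[0], -p[1])):
        if y > highest:
            corners.append((x, y))
            highest = y
    zone = 0.0
    for i, (cx, cy) in enumerate(corners):
        next_x = corners[i + 1][0] if i + 1 < len(corners) else x_max
        zone += (next_x - cx) * (y_max - cy)
    return zone / scope


def permutation_p_value(
    xs: Sequence[float], ys: Sequence[float], reps: int = 1000, seed: int = 0
) -> Tuple[float, float]:
    """Return (observed d, estimated p value) from `reps` random permutations."""
    rng = random.Random(seed)
    d_observed = effect_size(xs, ys)          # part 1: observed effect size
    shuffled = list(ys)
    at_least_as_large = 0
    for _ in range(reps):                     # part 3: random resamples
        rng.shuffle(shuffled)                 # break the pairing of X and Y
        if effect_size(xs, shuffled) >= d_observed:  # parts 4 and 5
            at_least_as_large += 1
    return d_observed, at_least_as_large / reps


# Hypothetical data in which Y never exceeds X.
xs = list(range(1, 11))
ys = list(range(1, 11))
d_obs, p = permutation_p_value(xs, ys, reps=200)
print(round(d_obs, 3), p)
```

Shuffling leaves the minimum and maximum of X and Y unchanged, so the scope is the same in every resample and only the ceiling zone varies.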
NCA’s significance test (p value) is only one part of the decision process about necessity (e.g. p < 0.05). The other parts are the availability of theoretical support (e.g., a formulated and justified necessity hypothesis) which focuses the analysis on the right expected empty corner of the XY plot, and a large enough effect size that is considered practically relevant (e.g., > 0.10). If one of these requirements is not satisfied NCA rejects necessity. NCA considers necessity being supported (not rejected) only if all three requirements are met. Simulations show that with this approach NCA has a high True Positive Rate (sensitivity) and a high True Negative Rate (specificity), which help to minimize the risk of false positive and false negative conclusions about necessity.
NCA limitations
No. NCA can only predict the absence of an outcome, not its presence. NCA focuses on single conditions that each prevent the outcome from occurring when the condition is absent or has a low level. Traditional sufficiency methods such as Multiple Regression, Structural Equation Modeling, Partial Least Squares, as well as methods like QCA consider the complex causal structures that produce the outcome. These methods must be used to predict the presence of the outcome from a set of conditions.
No. Just like other data analysis approaches NCA presumes that the data to be analysed are valid, reliable and meaningful. If this assumption is not correct the results of the NCA analysis can be flawed. NCA is not sensitive to measurement error of observations (far) below the ceiling line, but it is sensitive to measurement error of observations around the ceiling line.
No. Just like other data analysis approaches, NCA may be sensitive to outliers. NCA has a specific outlier analysis approach. In NCA an outlier is defined as a case that, if removed from the data set, has a large influence on the effect size. Two types of outliers exist: ceiling outliers are cases that define the ceiling, and scope outliers are cases that define the scope. If the outlier is caused by measurement error that cannot be corrected, or by sampling error because the case does not belong to the theoretical domain, the outlier is removed from the data set. If there is an outlier without a known reason, the outlier is usually kept in the data set.
No. Just like other data analysis techniques, NCA alone cannot prove causality. It depends largely on the research design (see above) and the available theory (see above) whether or not it is plausible that the condition is a necessary cause.
No. Just like other quantitative data analysis approaches NCA presumes that the sample (see above) is a probability sample (e.g., random sample) from the population. If this assumption is not true and the sample is not representative for the population, the results of the NCA analysis (and any other data analysis approach for statistical inference) can be flawed.
NCA and other data analysis methods
A comparison between NCA and OLS regression is available in the downloads below.
A more comprehensive comparison between NCA and QCA is available in the downloads below.
Qualitative Comparative Analysis (QCA; Ragin 1987, 2000, 2008, see also http://www.compasss.org/) is a method for identifying necessary and sufficient conditions using fuzzy sets. Condition X and outcome Y are expressed in terms of set membership scores, rather than conventional variable scores. With respect to a certain characteristic a case can be fully out of the set (set membership score = 0) or fully in the set (set membership score = 1). For example, the Netherlands is a case (of all countries) that can be considered as “fully in the set” of rich countries (based on the economic variable Gross Domestic Product, GDP), and Ethiopia can be considered as a case that is “fully out of the set” of rich countries. In crisp-set QCA (csQCA) the set membership scores can only be 0 and 1. In fuzzy-set QCA the membership scores can also have values between 0 and 1. For example, Croatia could be allocated a set membership score of 0.7 indicating that it is “more in the set” than “out of the set” of rich countries.
Set membership scores are obtained by transforming variable data and other information into set membership scores. This transformation process is called “calibration”. Calibration can be based on the distribution of the data, the measurement scale, or expert knowledge. Users of QCA can evaluate the (potentially large) effect of calibration on necessity using a calibration evaluation tool.
QCA performs two separate analyses: a necessity analysis for identifying necessary conditions, and a truth table analysis for identifying sufficient configurations.
Although in most NCA applications conventional variable scores are used for quantifying condition X and outcome Y, NCA can also employ set membership scores for the conditions and the outcome, allowing a comparison between NCA and QCA. CsQCA and fsQCA have different procedures for identifying necessary conditions.
Necessity analysis of csQCA compared to NCA
The necessity analysis of csQCA is basically the same as NCA’s necessity analysis for dichotomous scores of X and Y. A necessary condition is assumed to exist if the condition is present (X=1) in virtually all cases where the outcome is present (Y=1), hence virtually no cases exist in which the outcome is present (Y=1) and the condition is absent (X=0). This is illustrated with the contingency table (see figure).

When cases are present in the “empty” zone above the diagonal (open circles) fsQCA considers these cases as “deviant cases”. FsQCA accepts some deviant cases as long as the necessity consistency level, which is computed from the total vertical distances of the deviant cases to the diagonal, is not smaller than a certain threshold, usually 0.9. FsQCA makes a qualitative (“in kind”) statement about the necessity of X for Y: “X is necessary for Y” (e.g., the presence of X is necessary for the presence of Y).
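The necessity consistency level can be sketched in Python as follows. The source text does not give the formula; the code assumes the standard fsQCA definition, in which consistency equals the sum of min(x, y) divided by the sum of y, which is the same as one minus the total vertical distance of the deviant cases to the diagonal divided by the sum of y. The function name `necessity_consistency` and the membership scores are hypothetical.

```python
from typing import Sequence


def necessity_consistency(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Assumed fsQCA necessity consistency: 1 - sum(max(y - x, 0)) / sum(y)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    total_y = sum(ys)
    if total_y == 0:
        raise ValueError("consistency is undefined when all outcome scores are 0")
    # Vertical distance to the diagonal, counted only for cases above it (y > x).
    deviation = sum(max(y - x, 0.0) for x, y in zip(xs, ys))
    return 1.0 - deviation / total_y


# Hypothetical fuzzy-set membership scores; cases 1 and 4 lie above the diagonal.
xs = [0.2, 0.6, 0.9, 0.4]
ys = [0.3, 0.5, 0.8, 0.9]
consistency = necessity_consistency(xs, ys)
print(round(consistency, 2))   # 0.76
print(consistency >= 0.9)      # False: below the usual threshold
```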
In contrast, NCA uses the ceiling line (see above) as the reference line (see right figure) for evaluating the necessity of X for Y (with possibly some cases above the ceiling line; accuracy below 100%). In situations where fsQCA observes “deviant cases”, NCA includes these cases in the analysis by moving the reference line from the diagonal position to the boundary between the zone with cases and the zone without cases. NCA considers cases around the ceiling line (and usually above the diagonal) as “best practice” cases rather than “deviant” cases. These cases are able to reach a high level of outcome (e.g., an output that is desired) for a relatively low level of condition (e.g., an input that requires effort).
In NCA the size of the “empty” zone as a fraction of the total zone (empty plus full zone) is called the necessity effect size (see above). If the effect size is greater than zero (an empty zone is present) NCA has identified a necessary condition “in kind” that can be formulated as: “X is necessary for Y”, indicating that for at least a part of the range of X and the range of Y a certain level of X is necessary for a certain level of Y.
Additionally, NCA can quantitatively formulate necessary condition “in degree” by using the ceiling line: “level Xc of X is necessary for level Yc of Y”. The ceiling line represents all combinations X and Y where X is necessary for Y. (Although also fsQCA’s diagonal reference line allows for making quantitative necessary conditions statements, e.g. X>0.3 is necessary for Y=0.3, fsQCA does not make such statements).
When the ceiling line coincides with the diagonal (corresponding to the situation that fsQCA considers) the statement “X is necessary for Y” applies to all X-levels [0,1] and all Y-levels [0,1] and the results of the qualitative necessity analysis of fsQCA and NCA are the same. When the ceiling line is positioned above the diagonal “X is necessary for Y” only applies to a specific range of X and a specific range of Y. Outside these ranges X is not necessary for Y (“inefficiency”, see above). Then the results of the qualitative necessity analysis of fsQCA and NCA can be different.
Normally, NCA identifies more necessary conditions than fsQCA (see Comparing NCA and QCA in the downloads below). In the example NCA identifies that X is necessary for Y because there is an empty zone above the ceiling line. For example, for reaching an outcome level of Yc = 0.8 the necessary level of the condition is Xc = 0.6. Thus, when the condition level is below 0.6, it is not possible to reach an outcome level of 0.8. However, fsQCA would conclude that X is not necessary for Y, because the necessity consistency level is too small in this example (<0.9).
FsQCA’s necessity analysis can be considered as a special case of NCA (an NCA analysis with discrete or continuous fuzzy set membership scores for X and Y, a ceiling line that is diagonal, an allowance of a specific number of cases in the empty zone given by the necessity consistency threshold, and the formulation of a qualitative “in kind” necessity statement).
Although QCA can perform a necessity analysis as shown above, most QCA researchers focus on the sufficiency analysis. Whereas single necessary conditions are not uncommon (and very relevant because when the necessary levels of the condition are not in place the outcome will not occur), single sufficient conditions are extremely rare (in multicausal phenomena no single condition can produce the outcome). Therefore QCA focuses on combinations of conditions that are sufficient for the outcome (sufficient configurations). For identifying sufficient configurations QCA uses binary logic (using a truth table where the condition and outcome can have only two values: true or false). With k conditions of two values, a total of 2^k possible combinations can be formulated. QCA identifies which of these combinations are observed in cases. Usually several configurations that can produce the outcome are observed (“equifinality”). When identified configurations are parsimonious (non-redundant) causal interpretations may be possible (Baumgartner, 2015).
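The 2^k combinations of a truth table can be enumerated directly. The short Python sketch below is only an illustration of the counting argument; the function name `truth_table_rows` and the condition labels are hypothetical and it does not reproduce any QCA software.

```python
from itertools import product
from typing import Dict, List, Sequence


def truth_table_rows(conditions: Sequence[str]) -> List[Dict[str, bool]]:
    """All logically possible combinations of k two-valued conditions."""
    return [
        dict(zip(conditions, values))
        for values in product([False, True], repeat=len(conditions))
    ]


rows = truth_table_rows(["X1", "X2", "X3"])
print(len(rows))  # 8, i.e. 2 ** 3
print(rows[0])    # {'X1': False, 'X2': False, 'X3': False}
```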
QCA’s logical statements are expressed for example as follows (adapted from Goertz, 2003):
Y = X1*X2*X3 + X4*X5 (1)
In this example five conditions (X1, X2, X3, X4, X5) and one outcome Y are present. The symbol “+” means the logical “OR” and the symbol “*” means the logical “AND.” Equation (1) indicates that the presence of Y can be achieved via only two paths. The first path is configuration X1*X2*X3 (the presence of X1 AND X2 AND X3) and the second possible path is configuration X4*X5 (the presence of X4 AND X5).
Each configuration has certain conditions (e.g., X1, X2,... ) that must be part of the configuration such that the configuration can produce the outcome. For example, X3 is part of the first configuration that can produce the outcome. The necessary parts of an outcome producing configuration are called INUS conditions (Mackie,1965): “Insufficient but Non-redundant (i.e., Necessary) part of an Unnecessary but Sufficient condition.” Non-redundant means that the conditions that are part of the configuration that produces (is sufficient for) the outcome, are essential (necessary).
Mackie (1965) repeatedly makes a distinction between an INUS condition and a necessary condition. For example he states on page 253: "Again, some causal statements pick out something that is not only an INUS condition but also a necessary condition". An INUS condition is only necessary for a specific configuration (non-redundant part of it), whereas a necessary condition is necessary for the outcome (hence must be present in all configurations that can produce the outcome). In equation (1) X3 is an INUS condition that is necessary for the configuration to produce the outcome, but it is not necessary for the outcome because configuration X4*X5 can also produce the outcome and does not contain X3. In contrast, in equation (2) below, where Y can be produced by only two configurations (X1*X2*X3 or X4*X3), X3 is part of both configurations that can produce the outcome.
Y = X1*X2*X3 + X4*X3 (2)
In equation (2) X3 is not only an INUS condition for the two configurations, but also a necessary condition for the outcome. A condition is necessary if it is present in all configurations that produce the outcome. Normally, an INUS condition is not a necessary condition, but a necessary condition is always an INUS condition. Or as Mackie (1965, p. 253) puts it: "... some causal statements pick out something that is not only an INUS condition, but also a necessary condition."
It is possible that no necessary condition exists in the causal structure, and that only INUS conditions exist. For example, in equation (1) there is no necessary condition and there are five INUS conditions. However, when a necessary condition is identified its practical relevance is clear-cut: if the condition is not in place, none of the possible configurations will produce the outcome. When the outcome is desired (e.g., performance in business applications) a practitioner must ensure that the necessary condition is always present. When the outcome is undesired (e.g., disease in medical applications) the practitioner must ensure that the necessary condition is never present (see NOTE below).
NCA is a method for identifying necessary conditions (hence its name). It is not designed for identifying sufficient configurations, nor for identifying INUS conditions of sufficient configurations. NCA only identifies necessary conditions for the outcome. These necessary conditions must be part of all sufficient configurations (and therefore also become INUS conditions for the configurations: insufficient but necessary parts of all unnecessary sufficient configurations).
NOTE: In his influential work on causes in epidemiological research, Rothman (1976) introduces the notion of component causes of disease, which corresponds to Mackie’s INUS logic. “If there exists a component cause which is a member of every sufficient cause, such a component is termed a necessary cause. Necessary causes are often identifiable as part of the definition of effect” (p. 588, emphasis added). For example, certain bacteria and viruses are dichotomous necessary causes of infectious diseases: the tubercle bacillus is a necessary cause of tuberculosis, the Human Immunodeficiency Virus (HIV) is a necessary cause of AIDS, and the Human Papilloma Virus (HPV) is a necessary cause of cervical cancer (Dul, 2016b). Rothman et al. (2008) refer to a necessary cause (i.e., component cause of all sufficient causes) as a “universally necessary cause”. Prevention can then focus on the necessary cause (e.g., HPV screening and vaccination of women for prevention of cervical cancer).
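The difference between equations (1) and (2) can be checked with a small Python sketch. This is only an illustration of the Boolean logic, not of how NCA or QCA software works; the function names `eq1`, `eq2` and `necessary_conditions` are hypothetical. A condition is reported as necessary when it is present in every truth table row in which the outcome is present.

```python
from itertools import product

def eq1(x1, x2, x3, x4, x5):
    """Equation (1): Y = X1*X2*X3 + X4*X5."""
    return (x1 and x2 and x3) or (x4 and x5)

def eq2(x1, x2, x3, x4, x5):
    """Equation (2): Y = X1*X2*X3 + X4*X3 (X5 is accepted but plays no role)."""
    return (x1 and x2 and x3) or (x4 and x3)

def necessary_conditions(formula, k=5):
    """Return the names of conditions present in every row where Y is present."""
    rows_with_outcome = [
        row for row in product([False, True], repeat=k) if formula(*row)
    ]
    return [
        f"X{i + 1}" for i in range(k)
        if all(row[i] for row in rows_with_outcome)
    ]

print(necessary_conditions(eq1))  # []
print(necessary_conditions(eq2))  # ['X3']
```

For equation (1) no single condition appears in all outcome-producing rows, whereas for equation (2) X3 does.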
Conditional statements are not the same as causal statements. According to the logic of conditional statements, the statement “A is necessary for B” is equivalent to the conditional statements “not A is sufficient for not B”, “B is sufficient for A”, and “not B is necessary for not A”. For example, when A is HIV and B is AIDS the following four logical conditional statements apply if HIV is necessary for AIDS:
- HIV is necessary for AIDS
- No HIV is sufficient for no AIDS
- AIDS is sufficient for HIV
- No AIDS is necessary for no HIV
Conditional logic makes no assumption about the causal direction of the relationship between the concepts A and B. A fundamental difference between conditional statements and causal statements is that the latter presume a temporal order between the concepts: first the cause (antecedent), then the effect (consequent). An infection with HIV precedes the disease AIDS (A causes B). It is then not possible that AIDS precedes HIV (B causes A). Therefore conditional statements 3 and 4 (the last two in the list above) are not correct causal statements. Only conditional statements 1 and 2 (the first two in the list) reflect that HIV is a necessary cause of AIDS. Statement 1 is the necessity of presence formulation of the necessary cause and statement 2 is the sufficiency of absence formulation of the necessary cause (Dul, 2016b).
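The logical equivalence of the four conditional statements can be verified by enumerating all combinations of A and B. The Python sketch below is a minimal illustration; the helper names `necessary`, `sufficient` and `four_statements` are hypothetical. Note that the code only checks conditional logic and contains no information about temporal order or causal direction.

```python
from itertools import product

def necessary(a, b):
    """'a is necessary for b': b is not present without a."""
    return a or not b

def sufficient(a, b):
    """'a is sufficient for b': a is not present without b."""
    return b or not a

def four_statements(a, b):
    """Truth values of the four conditional statements for given A and B."""
    return (
        necessary(a, b),           # 1. A is necessary for B
        sufficient(not a, not b),  # 2. not A is sufficient for not B
        sufficient(b, a),          # 3. B is sufficient for A
        necessary(not b, not a),   # 4. not B is necessary for not A
    )

for a, b in product([False, True], repeat=2):
    values = four_statements(a, b)
    # All four statements always have the same truth value.
    assert len(set(values)) == 1
    print(f"A={a}, B={b}: statements hold = {values[0]}")
```

The only combination that violates all four statements is the absence of A together with the presence of B (e.g., AIDS without HIV).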
When researchers use NCA and other data analysis techniques for building and testing theory they presume a causal relationship between concepts. A fundamental characteristic of any scientific theory is that causal relationships exist between the concepts of the theory. These causal relationships are represented by arrows in a conceptual model, and allow for making predictions, which is one main goal of theory in applied sciences (“there is nothing as practical as a good theory”). In the context of theory, causal statements are named “propositions”, and in the context of theory-building or theory-testing empirical research they are called “hypotheses”. In a necessary condition theory it is known or presumed that the condition precedes the outcome. This is emphasized by using the convention “X” for the condition (the cause), and “Y” for the outcome (the effect). Hence, in the context of theory and of theory-building or theory-testing research, “X is a necessary condition for Y” means “the presence of X is a necessary cause for the presence of Y”.
Empirical data may or may not support a causal necessary condition relationship. If the data suggest that A is a necessary condition for B, and the researcher has information that A precedes B, it is plausible that A is a necessary cause of B. Then the alternative reverse causality that B is a sufficient cause of A is not plausible, as B does not precede A. In other words, although the data may also support that B is a sufficient condition for A, the researcher’s information on the causal direction excludes the possibility that B is a sufficient cause of A. Information about the causal direction can be obtained, for example, by using a (quasi) experimental research design (where the condition X changes before the outcome Y), by providing theoretical arguments for the causal direction X --> Y, or by process tracing of cases to evaluate whether condition X changes before outcome Y. This additional information for making causal interpretations is similar to the additional information that is needed for making causal interpretations of correlations and associations in regression analyses.
A researcher in the applied sciences who uses a tool for analysing empirical data for building or testing theory assumes causal directions between the concepts of interest (reflected in propositions), based on additional information. Researchers who use NCA's (or QCA's) necessity analyses usually assume that the condition causes the outcome, hence that either A or B is the antecedent condition (X) and the other one is the outcome (Y) that follows. If it is assumed that B (assigned Y) follows A (assigned X), the data suggest that A (X) is a necessary cause of B (Y). However, if it were assumed that A (assigned Y) follows B (assigned X), then the data suggest that B (X) is a sufficient cause of A (Y).
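The role of the assumed temporal order can be sketched in Python for the dichotomous case. The data and the function names `no_b_without_a` and `interpret` are hypothetical and serve only as an illustration; this is not the procedure implemented in NCA software. The same data pattern (no case with B present and A absent) leads to a different causal reading depending on which concept is assumed to come first.

```python
# Hypothetical dichotomous cases as (A, B) pairs; illustrative values only.
cases = [(1, 1), (1, 0), (0, 0), (1, 1), (0, 0)]

def no_b_without_a(pairs):
    """True if no case shows B present (1) while A is absent (0)."""
    return not any(a == 0 and b == 1 for a, b in pairs)

def interpret(pairs, a_precedes_b):
    """Causal reading of the data pattern, given the assumed temporal order."""
    if not no_b_without_a(pairs):
        return "data do not support the pattern"
    if a_precedes_b:
        # A is assigned X, B is assigned Y.
        return "A (X) is a necessary cause of B (Y)"
    # B is assigned X, A is assigned Y.
    return "B (X) is a sufficient cause of A (Y)"

print(interpret(cases, a_precedes_b=True))
print(interpret(cases, a_precedes_b=False))
```

The data themselves are identical in both calls; only the researcher's additional information about the causal direction changes the interpretation.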
Baumgartner, M. (2015). Parsimony and causality. Quality & Quantity, 49(2), 839-856.
Dul, J. (2016a). Identifying single necessary conditions with NCA and fsQCA. Journal of Business Research, 69(4), 1516-1523.
Dul, J. (2016b). Necessary Condition Analysis (NCA). Logic and methodology of “necessary but not sufficient” causality. Organizational Research Methods, 19(1), 10-52.
Goertz, G. (2003). The substantive importance of necessary condition hypotheses. Necessary conditions: Theory, methodology, and applications, 65-94.
Mackie, J. L. (1965). Causes and conditions. American philosophical quarterly, 2(4), 245-264.
Ragin, C.C. (1987). The comparative method: Moving beyond qualitative and quantitative strategies. Los Angeles: University of California Press.
Ragin, C. C. (2000). Fuzzy-set Social Science. Chicago: The University of Chicago Press.
Ragin, C. C. (2008). Redesigning Social Inquiry: Fuzzy Sets and Beyond. Chicago: University of Chicago Press.
Rothman, K. J. (1976). Causes. American Journal of Epidemiology, 104(6), 587-592.
Rothman, K. J., Greenland, S., Poole, C., & Lash, T. L. (2008). Causation and causal inference. In: Rothman, K. J., Greenland, S., & Lash, T. L. (Eds.). Modern epidemiology. Lippincott Williams & Wilkins.
Vis, B. & Dul, J. (2016). Analyzing relationships of necessity not just in kind but also in degree: Complementing fsQCA with NCA. Sociological Methods and Research (in press).