Abstract
Signal detection theory (SDT) was developed to provide a measure of the discriminability of a signal against background noise, independently of response bias. However, equal discriminability over a range of bias is only achieved by the traditional signal detection measure d\(^{\prime }\) under a narrow set of conditions – i.e., binormal noise and signal distributions of equal variance and base rates. In response to observed departures from these conditions, more robust alternatives to d\(^{\prime }\) have been developed, including da and, more recently, d\(^{\prime }_{p}\). Each of these alternatives addresses some, but not all, of the difficulties that arise when the assumptions of SDT are violated. Moreover, none of these measures directly follows from a central idea of discriminability by an observer that adopts a minimize error count (MEC) strategy. I propose a new d\(^{\prime }\) alternative, d\(^{\prime }_{o}\), that is robust to violations of the standard signal detection assumptions, remains consistent with varying bias, and is grounded in the principle of discriminability following a MEC strategy. Simulations illustrate how d\(^{\prime }_{o}\) is similar to the recently developed d\(^{\prime }_{p}\) when the observer optimizes their criterion placement to minimize the number of errors but, unlike d\(^{\prime }_{p}\), remains consistent irrespective of the observer’s criterion placement. Moreover, unlike da, d\(^{\prime }_{o}\) reflects changes in discriminability related to base rates of signal vs. noise presentations. The use of d\(^{\prime }_{o}\) also has implications for the interpretation of bias metrics, such as β and c, which are examined at the optimal criterion under a variety of conditions.
d\(^{\prime }_{o}\): Sensitivity at the optimal criterion location
Signal detection theory (SDT) aims to establish the discriminability between a noise and signal distribution independently of an observer’s bias (i.e., propensity towards saying “Yes” or “No”). The metric d\(^{\prime }\) was developed as an alternative to examining proportion correct in order to separate bias, which impacts proportion correct, from the observer’s sensitivity to the underlying noise and signal distributions. In the standard case, there are an equal number of noise and signal trials, the noise and signal distributions are normally distributed, and both are of equal variance (EVSDT). The subject makes a “Yes” or “No” decision using a fixed criterion along an “evidence” axis, which determines the number of hits and false alarms. The decision space can be represented as a plot of the false alarm rate (FAR) on the x-axis against the hit rate (HR) on the y-axis and illustrates the receiver operating characteristic (ROC) of the observer (see Fig. 1A). The ROC may be presented in a z-transformed space (zROC) in which the inverse cumulative distribution scores of the HR and FAR are plotted against one another (see Fig. 1B). The transformed space is convenient for illustrating SDT metrics. According to SDT, while performance changes as a function of the observer’s criterion placement, the difference between the z-transformed hit (zHR) and false alarm rates (zFAR) is constant independently of bias and is equal to the number of standard deviations between the means of the noise and signal distributions (Swets, Tanner, & Birdsall, 1961). While originally developed in the perception literature, SDT has been adapted for classification tasks broadly – including recognition memory (Macmillan & Creelman, 2005; Egan, 1958), as well as type II metacognitive tasks (Barrett, Dienes, & Seth, 2013).
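To make the zROC relationship concrete, the single-point computation is a one-liner in R (the environment used for the provided materials). This is a minimal sketch with hypothetical hit and false-alarm rates, not data from the present study:

```r
# d' is the difference between z-transformed hit and false-alarm rates;
# qnorm() is base R's inverse normal CDF (the z-transform).
hr  <- 0.85   # hypothetical hit rate
far <- 0.20   # hypothetical false-alarm rate
d_prime <- qnorm(hr) - qnorm(far)
d_prime       # ~1.88 noise SDs between the distribution means (EVSDT)
```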
Rationally, one should expect an observer to adopt a strategy that minimizes the error count (MEC) in any given decision scenario. To do so, the observer needs to set the criterion at the point where the base-rate scaled noise and signal distributions intersect – the location at which the stimulus is equally likely to be from the noise or signal distribution – which minimizes the total number of errors (i.e., the number of misses plus the number of false alarms) (Wickens, 2002). Note that the use of base-rate scaled distributions represents a departure from some traditional descriptions of SDT but is consistent with the visualization of a decision scenario with unequal base rates (Thomson, Besner, & Smilek, 2016; Mueser, Cowan, & Mueser, 1999) and the conceptualization of how the criterion location relates to base rates (Balakrishnan, 1998b; Balakrishnan & Macdonald, 2002; Balakrishnan, 1998a; Birnbaum, 1983; Mueser et al., 1999). An assumption of the current approach is that the base-rate scaled distributions reflect the decision scenario under consideration by the observer. An MEC strategy is consistent with numerous empirical results across a range of tasks including vigilance (Balakrishnan, 1998b; Balakrishnan & Macdonald, 2002), perceptual categorization (Maddox & Bohil, 1998a; 1998b), applied visual inspection (Madhavan, Gonzalez, & Lacson, 2007), and recognition memory (Stretch & Wixted, 1998). However, when describing decision strategy, the term neutral or unbiased observer is sometimes taken to mean that the observer sets the criterion, λ, at the midpoint between the means of the noise and signal distributions (Swets et al., 1961; Macmillan & Creelman, 2005). Under EVSDT conditions, bias measures taken at the midpoint, such as c and β, relate to performance such that “unbiased” performance (i.e., c = 0 or β = 1) corresponds to optimal performance (Swets et al., 1961; Macmillan & Creelman, 2005). Equation 1 describes the relationship between c and the location along the decision axis at which the observer places their criterion. Equation 2 describes how β is simply the height of the probability density function of the signal distribution, fSN, relative to that of the noise distribution, fN, at a criterion location x. Equation 3 illustrates the relationship between d\(^{\prime }\) and the bias metrics β and c.
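In standard form (e.g., Macmillan & Creelman, 2005), with the noise distribution centered at 0 and λ denoting the criterion location, these relationships can be written as:

$$c = \lambda - \frac{d^{\prime}}{2} = -\frac{z(HR) + z(FAR)}{2} \qquad (1)$$

$$\beta = \frac{f_{SN}(x)}{f_{N}(x)} \qquad (2)$$

$$\ln \beta = c \cdot d^{\prime} \qquad (3)$$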
Swets et al. (1961) previously described an approach to identifying the optimal criterion by finding the point at which a stimulus is equally likely to be from the noise or signal distribution. Unlike β, they took the base rates of noise and signal presentations into account. Equation 4, which is adapted from Equation 9 of Swets et al. (1961), describes the relationship between the intersection of the probability density functions of the noise and signal distributions and the optimal criterion location. Specifically, the optimal decision criterion corresponds to the location where a stimulus is equally likely to be drawn from the noise and signal distributions. L(x) is the likelihood ratio of the probability density function of the signal-to-noise distribution at any location x along the decision axis, p(SN) is the base rate of signal presentations, and p(N) is the base rate of noise presentations.
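In this notation, Eq. 4 states that the optimal criterion, λ∗, is the location at which the likelihood ratio equals the ratio of base rates:

$$L(x) = \frac{f_{SN}(x)}{f_{N}(x)}, \qquad L(\lambda^{*}) = \frac{p(N)}{p(SN)} \qquad (4)$$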
Notably, Swets et al. (1961) assume equal noise and signal variance in their model despite noting that empirical studies suggest unequal variance, particularly with increasing distance between the noise and signal distributions. However, the correspondence between these bias measures and performance diverges once violations of the standard EVSDT case occur. Therefore, setting the criterion at the intersection of the base-rate scaled noise and signal distributions, as one should expect from a MEC strategy, results in bias metric values that depart from those that generally indicate unbiased performance. If one equates “optimal” and “unbiased”, this invites misinterpretation of responding: shifts towards an optimal strategy, as an observer gains information about the decision scenario, may be incorrectly identified as suboptimal performance. To avoid confusion around “bias”-related metrics, the phrase “MEC response strategy” will be used throughout the present manuscript to indicate that the criterion was set at the equal-likelihood location of the base-rate scaled noise and signal distributions.
Two types of situations that d\(^{\prime }\) fails to handle well in real-world scenarios are those in which there is unequal variance between the noise and signal distributions (UVSDT) and those in which the base rates of signal and noise stimuli differ. With respect to the former, Wixted (2007) examined responding from subjects in recognition memory tasks and reported that the variance in the target distribution is greater than the variance in the lure distribution – typically σlure/σtarget ≈ 0.80. Therefore, d\(^{\prime }\) is not an equal discriminability function across levels of bias (Wixted, 2007; Mickes, Wixted, & Wais, 2007; Rotello, Masson, & Verde, 2008). With respect to the latter, a common scenario in which the number of stimuli presented from the noise and signal distributions diverges substantially is vigilance, in which an observer is tasked with monitoring for a rare signal against background noise (Jerison, 1967). However, because the equations that underlie SDT metrics are built around rates of hits (HR) and false alarms (FAR) rather than the number of errors that an observer experiences, a strategy to minimize error count may not be interpreted appropriately. For example, Thomson et al. (2016) interpret increases in c, caused by a rightward shift in λ, as reflecting increasing conservativeness in a vigilance task. However, these data may also reflect a criterion shift towards the intersection of the base-rate scaled noise and signal distributions, which also serves to minimize error, as would be expected from following a MEC strategy and similar to what is observed when base rates of signal trials are manipulated (Swets et al., 1961; Madhavan et al., 2007; Creelman, 1965; Maddox and Bohil, 2004). The distortion of bias metrics and decline in d\(^{\prime }\) arise because the SDT metrics reflect rates rather than counts of hits and false alarms. These diverge in Thomson et al. (2016) because the proportion of signal trials is 0.1 – an asymmetry that is typical in vigilance scenarios (Jerison, 1967; See, Warm, Dember, & Howe, 1997).
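The divergence between rates and counts is easy to see with a toy calculation in R (hypothetical numbers, chosen only for illustration): identical hit and false-alarm rates imply different numbers of errors once the base rate of signal trials departs from 0.5.

```r
hr <- 0.80; far <- 0.10; n_trials <- 1000

for (p_signal in c(0.5, 0.1)) {
  n_signal <- n_trials * p_signal
  n_noise  <- n_trials * (1 - p_signal)
  misses       <- n_signal * (1 - hr)  # signal trials answered "No"
  false_alarms <- n_noise * far        # noise trials answered "Yes"
  cat(sprintf("p(signal) = %.2f: %.0f errors (%.0f misses + %.0f false alarms)\n",
              p_signal, misses + false_alarms, misses, false_alarms))
}
# p(signal) = 0.50: 150 errors (100 misses + 50 false alarms)
# p(signal) = 0.10: 110 errors (20 misses + 90 false alarms)
```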
Alternatives to d\(^{\prime }\) have been developed which aim to provide stable measures of observer sensitivity, independently of bias, outside of EVSDT (Macmillan & Creelman, 2005). Two straightforward alternatives to d\(^{\prime }\) that accomplish this goal are d\(^{\prime }_{1}\) and d\(^{\prime }_{2}\), which correspond to the x- and y-intercepts of the z-transformed decision space (see Fig. 2). These values correspond to the distance between the noise and signal distributions in terms of number of noise and signal distribution standard deviations, respectively. However, because each only reflects the attributes of one distribution, neither provides a reasonable measure of the true discriminability of the base-rate scaled distributions from one another when the variances are unequal. Another alternative that has been developed, and which incorporates variance information from both signal and noise distributions, is da (Simpson & Fitter, 1973; Macmillan & Creelman, 2005). Equation 5 illustrates how da is based on a mathematical compromise between the variances of the noise and signal distributions. In the z-transformed decision space, the slope, s, corresponds to the ratio of the standard deviation of the noise distribution to the standard deviation of the signal distribution.
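In standard form (Simpson & Fitter, 1973; Macmillan & Creelman, 2005), Eq. 5 is:

$$d_{a} = \sqrt{\frac{2}{1+s^{2}}}\,\bigl[z(HR) - s \cdot z(FAR)\bigr] \qquad (5)$$

Equivalently, in distribution parameters, \(d_{a} = \sqrt{2}\,\mu_{s}/\sqrt{1+\sigma_{s}^{2}}\), which reduces to d\(^{\prime }\) when σsignal = 1.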
Unlike d\(^{\prime }\), da provides stable values over a range of bias values that fall between d\(^{\prime }_{1}\) and d\(^{\prime }_{2}\) when the standard deviations of the noise and signal differ. Moreover, da is directly related to Az, the area under the ROC curve, which is an alternative measure of discriminability and one that is intended to correspond to the maximum proportion correct of an unbiased observer (see Eq. 6) (Simpson & Fitter, 1973; Green & Swets, 1966).
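In standard form, Eq. 6 is:

$$A_{z} = \Phi\!\left(\frac{d_{a}}{\sqrt{2}}\right) \qquad (6)$$

where Φ is the standard normal cumulative distribution function.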
A recently developed d\(^{\prime }\) alternative is d\(^{\prime }_{p}\), which is purported to produce values that are similar to da but is easier to compute, as it requires only the means of the z-transformed hit and false alarm rates (Vokey, 2016). The principle underlying d\(^{\prime }_{p}\) is that the value corresponds to the point of intersection between three lines in the z-transformed decision space – the regression lines of zHR on zFAR, zFAR on zHR, and the best fit line taken from the first principal component (PC1) of the principal components analysis (PCA) of zHR and zFAR (Vokey, 2016). While the theoretical basis for d\(^{\prime }_{p}\) comes from the intersection of the two regression and PC1 lines, the same value can be obtained from the much simpler approach of subtracting the mean of the zFAR from the mean of the zHR over confidence cutpoints or individual scores (see Eq. 7; Vokey, 2016).
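That is, Eq. 7 reduces to a difference of means:

$$d^{\prime}_{p} = \overline{z(HR)} - \overline{z(FAR)} \qquad (7)$$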
Limitations of current sensitivity measures
While da provides stability over a range of bias values, it does not necessarily correspond to the performance of an observer following a MEC strategy. In justifying the development of da, Simpson and Fitter (1973) note that Green and Swets (1966) relate the area under the ROC curve to detectability – “the maximum percentage correct (i.e., that achieved by an unbiased observer) in the two-alternative forced-choice situation is equal to the area under the yes–no receiver operating characteristic curve. This area, P(A), is an acceptable index of detectability because two-alternative forced-choice percentage is itself an intuitively acceptable measure of performance.” Despite the intention of Simpson and Fitter (1973) to develop a measure that corresponds to the maximum percentage correct – i.e., the area under the ROC curve – the lack of consideration of noise and signal trial base rates prevents the area under the ROC curve, da, or Az from doing so in scenarios where the numbers of noise and signal trials are not equal.
The strategy of the observer is important because, under unequal variance conditions where d\(^{\prime }\) fails to provide equal discriminability across observer bias, the value of the sensitivity metric at the criterion location that corresponds to the “maximum percentage correct” is in line with the characterization of detectability of Green and Swets (1966). Recall that when examining a best fit line through the space with zFAR on the x-axis and zHR on the y-axis, the x-intercept corresponds to d\(^{\prime }\) when the criterion is set to the mean of the signal distribution and the y-intercept corresponds to d\(^{\prime }\) when the criterion is set to the mean of the noise distribution (see Fig. 2). In practice, observers typically set their criteria somewhere between these points.
A) Receiver operating characteristic (ROC) curve and B) fit of the line from the first principal component (PC1) to z-transformed ROC data from Hanley and McNeil (1982)
Adapted from Figure 3.3 in Macmillan and Creelman (2005) illustrating that da is an intermediate point between d\(^{\prime }_{1}\) and d\(^{\prime }_{2}\)
Unlike d\(^{\prime }\), calculation of da does not correspond to a decision criterion but is implicitly related to the decision process via the stated goal of Simpson and Fitter (1973) of corresponding to the “maximum percentage correct” (Green & Swets, 1966). This relationship between the area under the ROC curve and the proportion correct is also noted by Stanislaw and Todorov (1999). However, while Az provides an estimate for the area under the ROC curve (AUC), the AUC may not necessarily correspond to the proportion of correct responses in scenarios involving unequal numbers of noise and signal trials for observers that adopt a MEC strategy. Moreover, if one considers the value of a sensitivity measure as a reflection of distributional overlap, as suggested by Turner and Van Zandt (2014), then da fails to differentiate changes in base-rate scaled distributional overlap between scenarios in which the μsignal and σsignal remain constant but the base rates of noise and signal trials change, such as vigilance situations (Jerison, 1967; Thomson et al., 2016; See et al., 1997). This presents a problem because there is evidence that base rate manipulations shift observers’ criteria and, consequently, d\(^{\prime }\) when there is also unequal noise and signal variance (Macmillan & Creelman, 2005; Dorfman, 1969; Maddox & Bohil, 2004; 1998b). A case for da could be made on the basis that incorporating both the signal and noise variance captures discriminability across criteria over the entire decision space. However, it is not the case that observers set decision criteria across such a broad range; rather, they tend to select criteria near locations which optimize performance (Wixted & Gaitan, 2002; Brown, Steyvers, & Hemmer, 2007; Arnold, Higham, & Martín-Luengo, 2013). Even when collecting confidence ratings, the far tails of the noise and signal distributions are not necessarily reflected in the confidence cutpoints, yielding “truncated” Gaussian noise and signal distributions (Mickes et al., 2007). Moreover, in the UVSDT model, the crossover problem describes the issue that the likelihood of a signal exceeds that of noise if one sets the criterion towards the leftward extreme of the noise distribution. Unlike da, the recently introduced d\(^{\prime }_{p}\) does correspond to performance following a MEC strategy when the data are centered on the criterion that corresponds to the intersection of the base-rate scaled noise and signal distributions and minimizes error. However, it is not clear how d\(^{\prime }_{p}\) performs when bias is introduced. Lastly, failure to take base rates into consideration leads to distorted interpretation of response bias measures, such as β and c, if one conflates “unbiased” with a MEC strategy.
An alternative scaled distance metric: d\(^{\prime }_{o}\)
I suggest an alternative approach to assessing observer sensitivity and the discriminability of the base-rate scaled signal and noise distributions. The proposed sensitivity metric is termed \(d^{\prime }_{o}\), where the subscripted “o” stands for “optimal”. The assumption underlying d\(^{\prime }_{o}\) is that observers generally employ a decision strategy that minimizes the error count (MEC) in a detection or classification task. To do so, the observer selects the fixed criterion that corresponds to the intersection of the base-rate scaled noise and signal distributions, which minimizes the number of errors in any given decision scenario. As noted earlier, a MEC strategy is consistent with empirical data across a broad range of tasks (Balakrishnan, 1998b; Balakrishnan & Macdonald, 2002; Maddox & Bohil, 1998a; 1998b; Madhavan et al., 2007; Stretch & Wixted, 1998). In a recent review of the SDT literature, Wixted (2020) notes: “If humans always placed their decision criterion at that point, then they would be behaving as ideal observers. Moreover, when d’ decreased, the hit rate would decrease and the false alarm rate would increase. In studies of recognition memory, this predicted phenomenon is so universally observed that it is considered a lawful regularity, one known as the mirror effect (Glanzer & Adams, 1985; Glanzer & Adams, 1990; Glanzer, Adams, Iverson, & Kim, 1993). Thus, in this regard, humans behave a lot like (though not exactly like) ideal observers, and that fact is now incorporated into most models of recognition memory (McClelland & Chappell, 1998; Osth, Bora, Dennis, & Heathcote, 2017; Shiffrin & Steyvers, 1997)”. Moreover, Dorfman and Alf (1969) provide a brief analysis of how probability matching provides a better account for observed data than a “conservative” shift in decision criterion. Thus, the SDT goal of separating sensitivity from decision processes is achieved, in the proposed approach, by standardizing the decision process on a MEC strategy with the parameters of the base-rate scaled noise and signal distributions derived from the subject’s own data following the same μsignal and σsignal estimation method as is used for d\(^{\prime }_{p}\) (Vokey, 2016). It is important to note that the suggested approach is not equivalent to ideal observer analysis, which seeks to describe the performance of a system under specified parameters for the purpose of solving a practical detection or classification problem (Edwards, 2004) or for providing a benchmark against which to compare human observers (Legge, Hooven, Klitz, Stephen Mansfield, & Tjan, 2002; Sims, Jacobs, & Knill, 2012; Ziebell, Collin, Rainville, Mazalu, & Weippert, 2020; MacDonald, 2011). The proposed solution provides a consistent value for each subject, with a given sensitivity, by determining d\(^{\prime }\) at the criterion location that corresponds to a MEC strategy – i.e., the optimal criterion, λ∗. The key to doing so across SDT scenarios in which there may be unequal variances, unequal base rates of signal and noise trials, or both, is taking these factors into account when identifying the criterion location that corresponds to the intersection of the base-rate scaled noise and signal distributions and which minimizes error. In order to do this, performance must be evaluated across the full range of decision criteria for any given decision scenario. Given Gaussian noise and signal distributions, as is often assumed in SDT, the function that describes the total number of errors is given in Eq. 8.
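In the notation defined below, the error-count function can be written as:

$$E_{\lambda} = N_{s}\,\Phi_{(\mu_{s},\,\sigma_{s})}(\lambda) + N_{n}\,\bigl[1 - \Phi_{(0,\,1)}(\lambda)\bigr] \qquad (8)$$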
Eλ is the number of errors (i.e., misses + false alarms) committed at a particular criterion, λ. Φ(μ,σ) is the cumulative distribution function for the normal distribution with mean, μ, and standard deviation, σ. μs is the distance of the signal distribution from the noise distribution in terms of the number of noise distribution standard deviations, and σs is the standard deviation of the signal distribution relative to the noise distribution. Ns is the number of signal trials and Nn is the number of noise trials.
The computational resources that are now available make this approach accessible, in contrast to when d\(^{\prime }\) measures were originally developed. Computer code is provided that implements the method in R (R Core Team, 2021), a widely used statistical programming environment, as well as in Julia, a freely available high-level programming language (Bezanson, Edelman, Karpinski, & Shah, 2017). Materials may be accessed at https://osf.io/x52aw/?view_only=4f2ed814b852484d9de2a4377f7eaeb1. In addition, an analytic alternative for calculating the optimal criterion location, λ∗, and associated metrics is provided in Appendix A as an option for the R implementation. The analytic method produces the same results as the optimization method. I use d\(^{\prime }_{o}\) to refer to this proposed discriminability measure; the subscripted “o” acknowledges that the value corresponds to d\(^{\prime }\) when the criterion is set to the optimal point – i.e., the point that minimizes the number of errors. In order to evaluate the suitability of d\(^{\prime }_{o}\) as a sensitivity measure, I compare the value of d\(^{\prime }_{o}\) to d\(^{\prime }_{1}\), d\(^{\prime }_{2}\), da, and d\(^{\prime }_{p}\) across a number of simulations in which bias, the relative variance of the signal distribution, and the base rate of signal trials are systematically varied. Moreover, I examine the relationship between these sensitivity measures, the area under the ROC curve, and the proportion correct of an observer adopting a MEC strategy.
Method
The principle behind d\(^{\prime }_{o}\) is to provide a stable sensitivity metric, independent of bias, by calculating d\(^{\prime }\) for the observer that follows a MEC strategy in any given decision scenario. The core assumption is that the MEC strategy requires the subject to set their decision criterion at the intersection of the base-rate scaled noise and signal distributions under two constraints: 1) the noise and signal distributions are Gaussian and, 2) the subject uses a fixed criterion. No further assumptions are made regarding the relative variance of the signal-to-noise distribution, the base rates of stimuli presented from each distribution, or the response strategies of the subjects that provide the data. Prior to being able to identify the criterion that minimizes error, accurate representations of the noise and signal distributions must be generated. This requires working backwards from the data to reconstruct the signal distribution \(\mathcal {N}(\mu , \sigma )\) relative to a reference noise distribution \(\mathcal {N}(0,1)\).
The slope of the ROC line in the z-transformed decision space provides the ratio of the standard deviation of the noise distribution to that of the signal distribution. In order to find the slope, a principal components analysis (PCA) line is fit through this space, as described by Vokey (2016). The mean of the signal distribution is given by the x-intercept (i.e., where zHR = 0) and σ = 1/slope. See Fig. 2 for an illustration of the PCA fit through the data from a single observer from Hanley and McNeil (1982) and Fig. 3 for an illustration of the base-rate scaled noise and signal distributions generated using the data from Hanley and McNeil (1982).
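A minimal sketch of this reconstruction step in R follows; the cumulative hit and false-alarm rates over five confidence cutpoints are hypothetical, and base R’s prcomp() stands in for the PCA fit described by Vokey (2016):

```r
zF <- qnorm(c(0.02, 0.08, 0.20, 0.40, 0.65))  # z-transformed false-alarm rates
zH <- qnorm(c(0.25, 0.45, 0.65, 0.82, 0.94))  # z-transformed hit rates

# PC1 minimizes squared distances to the line in both dimensions
pc1   <- prcomp(cbind(zF, zH))$rotation[, 1]
slope <- unname(pc1["zH"] / pc1["zF"])
intercept <- mean(zH) - slope * mean(zF)      # PC1 line passes through the centroid

mu_s    <- intercept / slope   # signal mean (magnitude of the x-intercept, zHR = 0)
sigma_s <- 1 / slope           # signal SD in noise-SD units
```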
A) Optimal criterion corresponds to the intersection point of the base-rate scaled noise and signal distributions. Data from Hanley and McNeil (1982). The numbers of misses and false alarms are reflected in the dark and light grey shaded regions. The total number of errors, i.e., misses + false alarms, is minimized at λ∗. B) Optimal criterion minimizes the total number of errors. Data from Hanley and McNeil (1982)
The optimal criterion is established by finding the intersection of the base-rate scaled noise and signal distributions, which minimizes the number of errors, in keeping with the goal of simulating a MEC strategy. Note that this is not the same point that minimizes the rates of misses and false alarms when there are an unequal number of stimuli presented from the noise and signal distributions. The computer code provided (Aujla, 2022) determines the optimal criterion using the Optim package in Julia (Mogensen & Riseth, 2018). While the Optim package provides a variety of optimization algorithms, the present simulations were conducted with the default Nelder–Mead algorithm – a direct search approach that uses a simplex to iteratively identify the optimal location (Nelder & Mead, 1965). Similar packages exist for R (e.g., “Optimization”) (Husmann, Lange, & Spiegel, 2017). In addition, under the assumption of Gaussian distributions, the optimal criterion may be determined analytically, following Eq. 10 for UVSDT (Stretch & Wixted, 1998) and Eq. 11 for EVSDT (Wixted & Gaitan, 2002). R code in Appendix A provides options for both analytic and optimization-based generation of d\(^{\prime }_{o}\) and associated bias metrics. See panel A of Fig. 3 for an illustration of how the optimal criterion corresponds to the intersection between the noise and signal distributions, scaled according to the number of stimuli from each distribution, and panel B for an illustration of how the optimal criterion corresponds to the point of the minimum number of errors using the data from Hanley and McNeil (1982). The sensitivity measure d\(^{\prime }_{o}\) is simply zHR − zFAR at the optimal criterion, λ∗. It is critical to note that the optimal criterion does not necessarily correspond to one that would be unbiased as determined by traditional SDT metrics. For example, as computed, the optimal criterion produces c = 0.238 and β = 1.137, which both indicate a conservatively biased observer. The reason that these metrics do not reflect the “optimal” criterion location in the present scenario is that the data are drawn from distributions with unequal variance and with unequal numbers of noise and signal trials (i.e., Nnoise = 58, Nsignal = 51) (Hanley & McNeil, 1982).
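The following R sketch illustrates the optimization step; it is not the OSF implementation, uses base R’s one-dimensional optimize() in place of Nelder–Mead, and carries over hypothetical μsignal and σsignal estimates, with the 58/51 noise/signal split from the Hanley and McNeil (1982) example:

```r
# Expected error count at criterion lambda (cf. Eq. 8)
error_count <- function(lambda, mu_s, sigma_s, n_noise, n_signal) {
  misses       <- n_signal * pnorm(lambda, mean = mu_s, sd = sigma_s)
  false_alarms <- n_noise * (1 - pnorm(lambda))
  misses + false_alarms
}

mu_s <- 1.28; sigma_s <- 1.09   # hypothetical reconstructed parameters
opt <- optimize(error_count, interval = c(-5, 10),
                mu_s = mu_s, sigma_s = sigma_s, n_noise = 58, n_signal = 51)
lambda_star <- opt$minimum      # optimal criterion

zH <- qnorm(1 - pnorm(lambda_star, mu_s, sigma_s))  # z(HR) at lambda*
zF <- qnorm(1 - pnorm(lambda_star))                 # z(FAR) at lambda*
d_o    <- zH - zF                                   # d'_o
c_o    <- -(zH + zF) / 2                            # c at lambda*
beta_o <- dnorm(lambda_star, mu_s, sigma_s) / dnorm(lambda_star)
# At lambda*, beta_o equals n_noise/n_signal (cf. Eq. 4): 58/51 ~ 1.137
```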
Simulation 1: d\({^{\prime }}\) across bias levels in equal and unequal variance scenarios
Greater signal than noise variance has been observed in recognition memory experiments (Wixted, 2007; Mickes et al., 2007; Macmillan & Creelman, 2005) and poses difficulty for traditional signal detection analysis, as d\(^{\prime }\) is no longer an equal discriminability measure over changes in bias (Macmillan & Creelman, 2005). Specifically, increasingly conservative criteria generate higher d\(^{\prime }\) values and increasingly liberal criteria generate lower d\(^{\prime }\) values with the same underlying distributions (Rotello et al., 2008).
The goal of Simulation 1 was to systematically evaluate how d\(^{\prime }_{o}\) compares to other sensitivity measures in handling UVSDT scenarios across levels of bias. Samples were generated from a noise distribution \(\mathcal {N}(0,1)\) and signal distributions \(\mathcal {N}(2,\sigma )\) with σsignal set to 0.5, 1.0, or 2.0. Each variance scenario included conditions for three levels of bias – liberal, optimal, and conservative. In all conditions the distance between decision cutpoints was set to 0.25. For example, in the EVSDT scenario with μ = 2.0 and σ = 1.0, the cutpoints were 0.5, 0.75, 1.0, 1.25, and 1.5. The optimal condition reflected simulated data centered on the optimal criterion, λ∗ (i.e., that which minimized the total number of errors). The liberal and conservative conditions reflected criterion and cutpoints shifted by 0.25 to the left or right of the optimal condition, respectively. That is, the cutpoints for the liberal condition were 0.25, 0.5, 0.75, 1.0, and 1.25 and the cutpoints for the conservative condition were 0.75, 1.0, 1.25, 1.5, and 1.75. See panel A of Fig. 4 for an illustration of λ∗ and the cutpoints used for the optimal condition in the equal variance scenario.
Results
When bias was introduced into the EVSDT scenario – i.e., binormal noise and signal distributions of equal variance – values for all sensitivity measures were equal to μ (i.e., the difference between the noise and signal means in noise standard deviation units), irrespective of bias (see panel B of Fig. 4).
In UVSDT scenarios, the intercept-based sensitivity measures (d\(^{\prime }_{1}\) and d\(^{\prime }_{2}\)) were unaffected by bias but over- or under-represented true discriminability because each takes into account only the variance of the noise or signal distribution, respectively (see panels C and D in Fig. 4). The intermediate scaled distance metric da is a mathematical compromise between the two intercept measures (see Eq. 5) and exhibited stability over levels of bias. In the optimal condition, the criterion corresponded to the point of minimal error and both d\(^{\prime }_{p}\) and d\(^{\prime }_{o}\) reflected discriminability following a MEC strategy. However, unlike d\(^{\prime }_{o}\), d\(^{\prime }_{p}\) was not stable when a conservative or liberal bias was introduced (see panels C and D in Fig. 4). In other words, d\(^{\prime }_{o}\) provides a measure of discriminability of the observer that follows a MEC strategy even when the data are generated by observers that do not do so. While both d\(^{\prime }_{o}\) and da provided stability across response strategies – i.e., liberal or conservative biases – and produced intermediate values between d\(^{\prime }_{1}\) and d\(^{\prime }_{2}\), they were not identical. The difference between da and d\(^{\prime }_{o}\) indicates that da does not reflect nominal d\(^{\prime }\) for the observer that adopts a MEC strategy.
A) The location of the optimal criterion, λ∗, and cutpoints when \(noise = \mathcal {N}(0,1)\) and \(signal = \mathcal {N}(2,1)\). Simulation results for scaled distance metrics, d, across levels of bias when B) \(noise = \mathcal {N}(0,1)\) and \(signal = \mathcal {N}(2,1)\), C) \(noise = \mathcal {N}(0,1)\) and \(signal = \mathcal {N}(2,2)\), and D) \(noise = \mathcal {N}(0,1)\) and \(signal = \mathcal {N}(2,0.5)\). “Liberal” bias was generated by decreasing λ and cutpoints by 0.25 and “conservative” bias was generated by increasing λ and cutpoints by 0.25
Simulation 2: d\({^{\prime }}\) across base rates of signal trials in equal and unequal variance scenarios
An often-neglected consideration in signal detection scenarios is the base rate of signal trials. Vigilance scenarios, in particular, typically involve a low proportion of signal trials (Jerison, 1967; Thomson et al., 2016; See et al., 1997). Interpretation of signal detection metrics in vigilance scenarios often fails to account for the impact of unequal base rates despite evidence that they influence how subjects set their decision criteria (Bohil & Wismer, 2015; Madhavan et al., 2007; Dorfman, 1969). The goal of Simulation 2 was to systematically evaluate how d\(^{\prime }_{o}\) compares to other d\(^{\prime }\) measures in handling scenarios involving varying proportions of signal trials in both EVSDT and UVSDT scenarios.
Samples were generated from a noise distribution \(\mathcal {N}(0,1)\) and signal distributions \(\mathcal {N}(2,\sigma )\) with the base rate of signal trials set to 0.1, 0.25, or 0.5 and σ set to 0.5, 1.0, 1.5, or 2.0. In all conditions, as per Simulation 1, five confidence cutpoints were again utilized with the distances between cutpoints set at 0.25 and centered on the optimal criterion, λ∗.
An illustration of overlap between noise (left) and signal (right) base-rate scaled distributions with a signal proportion of 0.5 (row A), 0.25 (row B), or 0.10 (row C), with σsignal = 0.50, 1.0, 1.5, or 2.0 and a nominal μ = 2. Note that distributional overlap, as reflected in the total error proportion, 𝜖, increases monotonically with σsignal when the signal proportion is 0.5 but not when the signal proportion is 0.10
Implied ROC functions with a signal proportion of 0.5 (row A), 0.25 (row B), or 0.10 (row C), with σsignal = 0.50, 1.0, 1.5, or 2.0 and a nominal μ = 1, 2, or 3. Slopes of the tangent lines at the optimal criterion (indicated with a circle) correspond to optimal β following Eq. 4. Note the one exception, in which β does not correspond to the slope of the tangent line at the optimal criterion: when μ = 1.0 and σ = 0.5, there is no intersection between the base-rate scaled distributions because the signal distribution is subsumed by the noise distribution
Results
Under equal variance conditions, all sensitivity measures reflected the mean difference between the base-rate scaled noise and signal distributions irrespective of the proportion of signal trials (see the equality of all sensitivity measures at σ = 1.0 in Fig. 7).
Under unequal variance conditions all sensitivity measures, excepting d\(^{\prime }_{1}\), declined with increasing variance, as was the case in Simulation 1. This decline reflected the decrease in discriminability that results from increased variance. However, the degree to which measured sensitivity declined varied across the scaled distance metrics d. As expected, d\(^{\prime }_{2}\) showed the greatest decline with increasing variance because it represents the distance between the base-rate scaled noise and signal distributions as a function of the signal distribution, unlike da, which reflects both the noise and signal variance (see Eq. 5). While d\(^{\prime }_{1}\), d\(^{\prime }_{2}\), and da remained invariant over changes in the base rate of signal trials, d\(^{\prime }_{p}\) and d\(^{\prime }_{o}\) did not. Moreover, the degree of change between d\(^{\prime }_{o}\) or d\(^{\prime }_{p}\) and da was related to both signal base rate and signal variance. At σ = 2, d\(^{\prime }_{o}\) and d\(^{\prime }_{p}\) increased as the base rate of signal trials decreased, while at σ = 0.5, d\(^{\prime }_{p}\) and d\(^{\prime }_{o}\) decreased with decreasing base rate of signal trials. More generally, Fig. 7 illustrates how the relationship between d\(^{\prime }_{o}\) or d\(^{\prime }_{p}\) and signal variance increasingly departed from that between da and signal variance such that values “flattened” as the base rate of signal trials decreased. This “flattening” is similar to the pattern seen in proportion correct for an observer following a MEC strategy – i.e., one who sets their criterion at the intersection of the base-rate scaled noise and signal distributions (see the left column of panels in Fig. 7).
Simulation 3: Correspondence between scaled distance metrics d, unbiased performance, and the area under the ROC curve
Results from Simulation 1 establish that da and d\(^{\prime }_{o}\) provide stable measures of sensitivity independently of bias while d\(^{\prime }_{p}\) is susceptible to bias. Results from Simulation 2 demonstrate that, for an observer that follows a MEC strategy, all measures of sensitivity are stable over changes in base rate in the EVSDT case. However, only da, d\(^{\prime }_{1}\), and d\(^{\prime }_{2}\) are stable across changes in the base rate of signal trials in the unequal variance case. The goal of Simulation 3 was to examine whether the observed changes in d\(^{\prime }_{p}\) and d\(^{\prime }_{o}\) across the base rate of signal trials from Simulation 2, in the unequal variance case, reflect changes in the discriminability of the underlying decision space.
A standard approach to establishing the discriminability of a decision scenario is to calculate the area under the ROC curve (AUC), which is purported to correspond to the maximum proportion correct (Green & Swets, 1966; Simpson & Fitter, 1973; Hanley & McNeil, 1982; Donaldson, 1996). This is consistent with the adoption of a MEC strategy. Equation 6 calculates Az from da, which provides an estimate of the AUC and, therefore, of the maximum proportion correct. However, results from Simulations 1 and 2 reveal differences between d\(^{\prime }_{o}\) and da under unequal variance conditions, with equal or unequal signal base rates. One difficulty with relating the area under the ROC curve to the maximum proportion correct achievable in a decision scenario is that doing so assumes an equal number of noise and signal trials. This is evident as the ROC curve is a function of the hit and false alarm rates rather than the numbers of hits and false alarms. As a result, changes in overlap of the base-rate scaled noise and signal distributions that occur with base rate changes are not reflected in the AUC. The actual proportion of correct responses is simply 1 − the proportion of errors, and the proportion of errors is given by the combined misses and false alarms, which is equal to the overlap of the base-rate scaled noise and signal distributions when the criterion is set to the intersection of the two. Figure 5 illustrates how overlap of the base-rate scaled noise and signal distributions relates to σsignal with equal (top row) or unequal (middle and bottom rows) base rates of noise and signal trials. The corresponding implied ROC functions illustrate the correspondence between the hit and false alarm rates at the optimal criterion and optimal β as a function of changes in the signal base rate (see Fig. 6). Note that, following Eq. 4, β at the optimal criterion is equal to the ratio of noise to signal trials (Swets et al., 1961).
Returning to the concept of discriminability following a MEC strategy suggests that optimal performance provides a benchmark against which to evaluate sensitivity measures. Thus, the goal of Simulation 3 was to evaluate how changes in sensitivity metrics across levels of signal variance and base rate correspond to performance following a MEC strategy and to the area under the ROC curve (AUC). Pearson correlations were calculated between each of the scaled distance metrics and both the AUC and optimal performance over levels of σsignal for each base rate condition. The parameters for the base rates of signal trials and σ were the same as in Simulation 2 (i.e., base rates of 0.1, 0.25, or 0.5 and σ of 0.5, 1.0, 1.5, or 2.0).
Results from Simulation 3 examining changes in proportion correct for an observer following a MEC strategy and the area under the ROC curve across levels of signal variance when the base rate of signal trials is 0.5 (A), 0.25 (B), or 0.10 (C). Pearson correlation coefficients, r, between optimal performance and the AUC decline with decreasing signal base rates
Results
Figure 8 illustrates the optimal performance and the AUC results for Simulation 3. Under the equal base rate condition (i.e., signal proportion = 0.5), both optimal performance and the AUC exhibited a monotonic decline with increasing variance. Note that optimal performance reflects maximum proportion correct in a yes/no task while the AUC and related Az reflect the maximum proportion correct if subjects had performed a two-alternative forced-choice task (Stanislaw & Todorov, 1999). The difference between the tasks changes sensitivity such that \(d^{\prime }_{{2AFC}} = \sqrt {2}d^{\prime }_{Y/N}\).
However, while the AUC was unaffected by changes in signal base rate, optimal performance was not stable across changes in the base rate of signal trials (see Fig. 8). Specifically, optimal performance increased as the base rate of signal trials declined. Across levels of variance, changes in d\(^{\prime }_{o}\) and d\(^{\prime }_{p}\) tracked changes in optimal performance in each signal proportion scenario (see Table 1). As expected, following Eq. 6, the AUC corresponded to Az derived from da but diverged from optimal performance (see Table 1). The correlation between optimal performance and the AUC declined with decreasing signal base rate (see Fig. 8). This was driven by changes in optimal performance rather than the area under the ROC curve, which remained invariant across base rates. The correspondence of d\(^{\prime }_{o}\) and d\(^{\prime }_{p}\) with optimal performance and of da with the area under the ROC curve was quantified using Pearson correlations between the scaled distance metric and the AUC or optimal performance (see Table 1). As expected, at the optimal criterion, β was equal to the ratio of noise-to-signal trials, consistent with a criterion at which the stimulus is equally likely to be from the base-rate scaled signal and noise distributions (see Eq. 4).
Thus, there are two main patterns that emerge from Simulation 3. First, the correspondence between the area under the ROC curve and optimal performance only holds for the textbook SDT case in which the noise and signal distributions have equal base rates. Second, the difference between da and d\(^{\prime }_{o}\) across changes in variance reflects differences in two approaches to discriminability – the AUC and optimal performance, respectively.
Discussion
Signal detection theory provides a framework in which to understand the decision space in which observers operate. In the textbook case, in which the variances and number of stimuli drawn from each distribution are equal, d\(^{\prime }\) provides an invariant measure of the distance between the means irrespective of decision criteria. When variances and base rates of signal and noise stimuli are not equal, da provides a stable sensitivity measure over the criterion range and across levels of bias, but is dissociated from the discriminability of an observer that adopts a MEC strategy. This is surprising, as it is contrary to a central justification for da over other scaled distance metrics, including de – that it corresponds to the maximum performance of an “unbiased observer” (Simpson & Fitter, 1973). If one considers the degree of overlap between the base-rate scaled noise and signal distributions to reflect discriminability (see Fig. 5), then da fails to capture the increased discriminability that accompanies the decrease in base-rate scaled distributional overlap when the base rate of signal trials decreases (see Figs. 7 and 8). The recently developed d\(^{\prime }_{p}\) does reflect changes in decision scenarios involving different base rates but does not provide a stable measure of discriminability across levels of bias in UVSDT (see Fig. 4). The proposed d\(^{\prime }_{o}\) has been developed to address these issues by providing a measure of sensitivity corresponding to the criterion set by following a MEC strategy, irrespective of the variance of the signal distribution relative to the noise distribution, the number of stimuli drawn from each distribution, or the strategy of the observer from which the data are obtained. Moreover, d\(^{\prime }_{o}\) is the only sensitivity measure that directly corresponds to base-rate scaled distributional overlap at the optimal criterion, λ∗, where the combined number of misses and false alarms is equal to the base-rate scaled overlap of the noise and signal distributions (see Fig. 5).
The principle guiding d\(^{\prime }_{o}\) is, in any scenario, to assess sensitivity for an observer that follows a MEC strategy. Providing that solution requires finding the criterion which minimizes the number of errors and computing d\(^{\prime }\) as the difference between zHR and zFAR at that criterion. Note that the criterion that minimizes the number of errors is not the same as that which minimizes the rate of errors – i.e., the combined miss and false-alarm rates. A previous analytic solution to finding the optimal criterion identified the location at which the signal and noise distributions intersect. However, such solutions for the intersection often do not take the base rates of each stimulus type into account. For example, Macmillan and Creelman (2005) provide the following equation to solve for the intersection, which does not include any consideration of stimulus base rate.
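Solving fSN(x) = fN(x) for noise \(\mathcal {N}(0,1)\) and signal \(\mathcal {N}(\mu _{s},\sigma _{s})\) gives the intersection below; this is a reconstruction in the present notation, and Macmillan and Creelman’s symbols may differ:

$$\lambda = \frac{-\mu_{s} \pm \sigma_{s}\sqrt{\mu_{s}^{2} + 2\,(\sigma_{s}^{2}-1)\ln \sigma_{s}}}{\sigma_{s}^{2}-1} \qquad (9)$$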
Equation 9 does not take into account changes in discriminability that result from changes in the proportion of signal trials under unequal variance conditions. Figure 5 illustrates the difference in discriminability between unequal variance SDT scenarios that differ only in the proportion of signal trials. Not only does the intersection point change between these scenarios, but so does the discriminability, as reflected in the overlap between the base-rate scaled noise and signal distributions and, as a result, the number of errors made at the optimal criterion. While both d\(^{\prime }_{p}\) and d\(^{\prime }_{o}\) track the change in discriminability in such scenarios, only d\(^{\prime }_{o}\) is stable over changes in bias (see Fig. 4). Stretch and Wixted (1998) provide an alternative approach to finding the criterion location that does take base rate information into account and that corresponds to optimal performance under unequal variance conditions.
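With r = 1/slope and L the target likelihood-ratio value (at the optimal criterion, L = p(N)/p(SN), per Eq. 4), the unequal variance solution takes the following form; this is a reconstruction in the present notation, and the symbols in Stretch and Wixted (1998) may differ:

$$\lambda = \frac{-\mu_{s} \pm r\sqrt{\mu_{s}^{2} + 2\,(r^{2}-1)\ln(rL)}}{r^{2}-1} \qquad (10)$$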
In Eq. 10, λ is the criterion location that corresponds to a desired likelihood ratio of new to old, or noise to signal, L. The slope of the line in the z-transformed space of the hit to false alarm rates corresponds to the ratio of the noise to signal standard deviations, and the parameter r is equal to 1/slope. Note that because there are two intersections between the density functions of noise and signal in the unequal variance scenario, the equation provides two solutions for any given set of parameters – this is known as the “crossover problem” (Kaernbach, 1991; Green & Swets, 1974). Only one intersection, however, provides the criterion location for the point of minimal errors, as would be expected from following a MEC strategy. Also note that Eq. 10 does not provide a solution in the equal variance case, as an r value of 1 results in the denominator being 0 and, therefore, a criterion value of \(\infty\). For the equal variance SDT case, Wixted and Gaitan (2002) provide Eq. 11 to calculate the familiarity value, f, that corresponds to a desired likelihood ratio of new to old, L(f), in recognition memory. When L(f) is set to the ratio of new to old, or noise to signal, trials, f corresponds to the location of the optimal criterion. In the context of signal detection theory in general, λ is equivalent to f and d\(^{\prime }\) is equal to the mean of the signal distribution.
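In the equal variance case, the likelihood ratio is log-linear in f, so the solution consistent with the description of Wixted and Gaitan (2002) is:

$$f = \frac{\ln L(f)}{d^{\prime}} + \frac{d^{\prime}}{2} \qquad (11)$$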
Alternatives to Gaussian distributions for representing signal and noise stimuli, such as the exponential (Green & Swets, 1974) or Poisson (Kaernbach, 1991) distributions, solve the “crossover problem” and, unlike the UVSDT model, produce symmetric ROC curves. An advantage of the computational approach to finding the optimal criterion described in the present manuscript is that it is flexible enough to accommodate such alternative distributions.
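As a hedged sketch of that flexibility, the Gaussian terms in the error-count function can simply be swapped for another family; here, exponential distributions with hypothetical rate parameters:

```r
error_count_exp <- function(lambda, rate_n = 1.0, rate_s = 0.4,
                            n_noise = 100, n_signal = 100) {
  misses       <- n_signal * pexp(lambda, rate = rate_s)       # signal below criterion
  false_alarms <- n_noise * (1 - pexp(lambda, rate = rate_n))  # noise above criterion
  misses + false_alarms
}
optimize(error_count_exp, interval = c(0, 20))$minimum   # lambda* ~ 1.53
```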
Changes in discriminability related to base rates in UVSDT are reflected in d\(^{\prime }_{o}\), consistent with the measured sensitivity of the observer that follows a MEC strategy. Unlike d\(^{\prime }_{o}\), da remains invariant with changes in discriminability that result from altering the base rates of signal trials. The failure of da to track optimal performance may be unexpected given the formal correspondence between da and Az, which reflects the area under the ROC curve. However, once the decision scenario deviates from the assumption of equal base rates of noise and signal stimuli, the area under the ROC curve no longer corresponds to optimal performance (see Fig. 8). In essence, da and d\(^{\prime }_{o}\) reflect different approaches to understanding discriminability. The area under the ROC curve reflects data collected from observers, with equal sensitivity, distributed over the full spectrum of criterion locations. The correspondence between Az, da, and the area under the ROC curve reflects this idea and, accordingly, takes into account how the noise and signal distributions interact over a broad range of decision criteria. Alternatively, d\(^{\prime }_{o}\) considers the relationship between the base-rate scaled noise and signal distributions only at the criterion at which they intersect and at which the number of errors is minimized. This reflects the discriminability value that would be obtained by following a MEC strategy. I propose that the latter approach is the principled one, as it reflects the connection between SDT and the minimization of error (Green and Swets, 1966; Swets et al., 1961).
In addition to the theoretical connection between SDT and the observer which achieves the “maximum percentage correct” (Green & Swets, 1966), there are empirical reports that observers optimize – i.e., are sensitive to errors and adjust their decision criterion to minimize the number of errors (Wixted & Gaitan, 2002; Brown et al., 2007; Arnold et al., 2013). However, it is important to note that there is variability in the willingness of some subjects to shift criterion (Miller & Kantner, 2020) and that factors such as category discriminability also influence the impact of base rates on criterion shifting (Bohil & Wismer, 2015). At face value, it may appear unrealistic that subjects are able to optimize responding in a signal detection scenario. Minimizing error by setting the criterion at the point at which the likelihood ratio equals the ratio of noise to signal base rates appears to require the observer to have a priori knowledge of the underlying noise and signal distributions. However, Wixted and Gaitan (2002) show that such optimization is demonstrated in both non-human animal and human subjects and requires only standard learning theory to generate. Moreover, in vigilance scenarios, it has long been noted that observers generate large β scores (Jerison, 1967; See et al., 1997; Thomson et al., 2016). While these scores may be interpreted as inattention (increasing β reflects a decrease in hit rate), once the proportion of trials is accounted for, the optimal β, under equal variance conditions, is the ratio of noise to signal trials. The shift in propensity to say “yes” as a function of signal base rate has also been noted outside of vigilance scenarios (Creelman, 1965; Dorfman, 1969; Madhavan et al., 2007). In fact, the sensitivity of observers to base rate information has been shown to have a disproportionately large influence on decision making (Maddox & Bohil, 2004). Brown et al. (2007) have also shown that subjects are able to strategically shift their criterion to optimize performance in response to changing task difficulty.
While both d\(^{\prime }_{o}\) and d\(^{\prime }_{p}\) capture changes in discriminability with unequal base rates combined with unequal noise and signal variance, only d\(^{\prime }_{o}\) maintains stability when response bias is introduced. Maintaining discriminability, in the unequal variance SDT case, over changes in decision criteria is particularly important in recognition memory, where it has been proposed that “knowing” and “remembering” may represent different confidence levels along the same decision dimension (Donaldson, 1996). Given greater signal relative to noise variance and a higher decision criterion for a “remember” vs. “know” response, following a MEC strategy would be expected to shift the criterion to a greater extent than would be predicted by the difference in the mean of the signal distribution alone. A core feature of d\(^{\prime }_{o}\) is that, unlike d\(^{\prime }_{p}\), it is not affected by systematic bias, instructions, etc., because it is derived by simulating a subject that adopts a MEC strategy independently of the subject’s actual decision strategy.
As is the case with da, accurate reconstruction of the signal and noise distributions is critical to deriving d\(^{\prime }_{o}\). The reconstruction of the complete noise and signal distributions permits selection of any theoretical observer across levels of bias with the same sensitivity, including the observer that follows a MEC strategy. The parameters for reconstructing the distributions are derived from the slope and intercept of the best fit line through the z-transformed ROC space. Specifically, the x- and y-intercepts correspond to d\(^{\prime }_{1}\) and d\(^{\prime }_{2}\), and 1/slope corresponds to σsignal relative to the noise distribution. Unlike linear regression, the first principal component line (PC1) provides the least-squares solution that minimizes the sum of squared distances to the line in both dimensions. Moreover, the fit of the PC1 line is comparable to fits derived from maximum-likelihood solutions (Vokey, 2016).
Bayesian approaches to deriving μsignal and σsignal from hit and false alarm data provide an alternative to the PCA approach used in the present investigation. Researchers employing these methods (Rouder & Lu, 2005; Selker, van den Bergh, Criss, & Wagenmakers, 2019; DeCarlo, 2012; Turner & Van Zandt, 2014; Lee, 2008; Maddox & Bohil, 1998b) must choose an appropriate, typically uninformative, prior in the form of a uniform distribution or a Gaussian distribution with a large variance, to generate estimates of μsignal and σsignal (Rouder & Lu, 2005). However, Rouder and Lu (2005) also note that “vaguely informative” priors present an advantage over uninformative priors due to convenience and to providing better estimation when some domain-specific knowledge is available. Bayes’ rule has also been applied to incorporate base-rate information into estimating the criterion location (Birnbaum, 1983; Mueser et al., 1999). Birnbaum (1983) and Mueser et al. (1999) demonstrate that doing so results in criterion placement and performance consistent with that of an optimal observer in the “cab problem”, as described by Tversky and Kahneman (1980). Unlike the present approach, hierarchical Bayesian parameter estimation can improve parameter estimation by better accounting for item- and participant-level variability (Rouder & Lu, 2005). For example, when applying a hierarchical Bayesian approach to a group of subjects, it is assumed that a common process underlies the behavior of the subjects such that a “parent” distribution representing the group can be used to provide information that improves parameter estimation for individual subjects (Turner & Van Zandt, 2014). The hierarchical approach results in the adjustment of parameter estimates towards the group mean in cases where a high level of noise results in an extreme estimate (Rouder & Lu, 2005; Turner & Van Zandt, 2014).
However, the advantages of Bayesian parameter estimation come at the expense of requiring greater statistical sophistication on the part of the investigator. For example, in their application of Bayesian parameter estimation to SDT, Rouder and Lu (2005) choose a beta distribution, and specify \(a^{\prime }\) and \(b^{\prime }\) parameters, as a prior for estimating parameters of a binomial distribution. When estimating μ, Rouder and Lu (2005) select a Gaussian prior with large variance – a noninformative prior – but note that the degree of information about the dependent variable determines whether an informative or noninformative prior is preferred. The choice of prior is not trivial and can make the difference between whether a Bayesian approach provides an advantage or disadvantage vs. conventional parameter estimation – “Whether the Bayesian or the conventional estimate is better depends on the accuracy of the prior” (Rouder & Lu, 2005). In their application of Bayesian parameter estimation to SDT, Selker et al. (2019), for example, do not use a noninformative prior to estimate μ. For a typical SDT user, these types of considerations may lead to avoidance of Bayesian estimation or, worse, inferior estimates if an inappropriate prior is chosen. The use of Bayesian parameter estimation to more accurately derive μsignal and σsignal parameters is compatible with the present approach. The goal of d\(^{\prime }_{o}\) – to derive a sensitivity estimate from an optimal criterion location – is agnostic with respect to the method by which μsignal and σsignal are derived. Thus, one could replace the PCA-based parameter estimation from Vokey (2016) with Bayesian estimation.
While I have proposed that d\(^{\prime }_{o}\) is superior to the alternative scaled distance metrics d with respect to quantifying the discriminability of a decision scenario, some outstanding issues remain that d\(^{\prime }_{o}\) does not fully resolve. First, while d\(^{\prime }_{o}\) tracks changes in discriminability that result from the interaction of unequal variance and unequal signal-to-noise base rates, it does not capture the changes in discriminability, reflected in decreased overlap between the base-rate-scaled noise and signal distributions (see Fig. 5) and in changes in optimal performance, that result from base rate changes in the equal variance case (see Fig. 7). This limitation is shared by all discriminability measures assessed in the present investigation. Second, as is the case for da and d\(^{\prime }_{p}\), d\(^{\prime }_{o}\) can only be calculated when a range of confidence levels is available. Third, the present simulations did not manipulate the cost/payoff structure of the decision scenario. However, the present approach can accommodate such information by "scaling" the noise and signal distributions appropriately (see Appendix B).
Previous investigations of the effects of cost/payoff structure on decision processes have revealed that incentives have an impact on criterion placement (Bohil & Maddox, 2001, 2003; Maddox & Bohil, 1998a, 1998b; Lynn & Barrett, 2014). Lynn and Barrett (2014) provide a framework, their "utilized" SDT model, that takes such information into account, following Tanner and Swets (1954), who describe a cost/payoff adjusted β.
Tanner and Swets (1954) place the optimal criterion where the likelihood ratio equals βopt = P(N)/P(SN) × (VN⋅CA + KN⋅A)/(VSN⋅A + KSN⋅CA), where P(SN) is the probability of a signal, P(N) is the probability of noise, VN⋅CA is the value, or payoff, for a correct rejection, VSN⋅A is the value, or payoff, for a hit, KN⋅A is the cost of a false alarm, and KSN⋅CA is the cost of a miss. When accounting for base rate information in the present investigation, the sizes of the noise and signal distributions were "scaled" by their base rates (see Eq. 8). Essentially, the signal distribution determines the total cost of misses and the total payoff from hits via the area under the curve to the left and right of the criterion location, respectively. Similarly, the noise distribution determines the total payoff from correct rejections and the total cost of false alarms via the area under the curve to the left and right of the criterion location, respectively. To account for the cost/payoff structure, an additional scaling of the distributions can be applied, following Eq. 12, resulting in Eq. 13.
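To illustrate the double scaling, the following minimal R sketch (with illustrative parameter values and assumed variable names, not the published code) weights each density by its base rate and by the sum of the payoff and cost attached to that distribution, then locates the criterion where the two weighted densities cross – i.e., where the likelihood ratio equals βopt.

  ## Hedged sketch: doubly scaled densities and the optimal criterion.
  p.signal <- 0.25; p.noise <- 1 - p.signal
  mu.n <- 0; sd.n <- 1; mu.s <- 1; sd.s <- 1.25
  V.h <- 1.0; C.m <- 1.0     # payoff for a hit; cost of a miss
  V.cr <- 0.5; C.fa <- 0.5   # payoff for a correct rejection; cost of a false alarm

  ## Difference between the weighted signal and noise densities; its zero
  ## crossing is the point where the likelihood ratio equals beta_opt.
  weighted.diff <- function(x) {
    p.signal * (V.h + C.m) * dnorm(x, mu.s, sd.s) -
      p.noise * (V.cr + C.fa) * dnorm(x, mu.n, sd.n)
  }
  lambda.star <- uniroot(weighted.diff, interval = c(mu.n, mu.s + 3))$root

This is the same criterion an observer would reach by applying βopt directly to the unscaled likelihood ratio; scaling the densities simply folds the base rate and cost/payoff information into the geometry of the decision space.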
Applying the cost/payoff adjusted β to the present approach for calculating d\(^{\prime }_{o}\) produces changes similar to those seen with base rate manipulations (see Appendix B for methods and results). This approach is also consistent with previous work on base rate and payoff factors in which the decision criterion, reflected by β, is effectively scaled using information from the cost/payoff matrix (Maddox & Bohil, 1998a; Bohil & Maddox, 2001). As with base rate manipulations, the extent to which subjects' decision processes are sensitive to the cost/payoff structure can be inferred from the difference between a subject's nominal bias and sensitivity metrics vs. those generated by the present approach.
When to use d\(^{\prime }_{o}\)?
d\(^{\prime }_{o}\) and its associated bias metrics, co and βo, incorporate base rate information, the variance of the signal distribution, and the cost/payoff structure (see Table 2 and Appendix B). The extent to which this information influences how subjects set their decision criterion directly determines how closely d\(^{\prime }_{o}\) matches a subject's measured d\(^{\prime }\) or d\(^{\prime }_{p}\). It is important to note, however, that independently of such considerations, d\(^{\prime }_{o}\) still reflects a subject's sensitivity following the same chain of reasoning that Simpson and Fitter (1973) provide for da – i.e., that it corresponds to the "maximum percentage correct". Moreover, as demonstrated in the results of Simulation 3, d\(^{\prime }_{o}\) tracks optimal performance across σsignal and signal base rates more closely than does da (see Table 1). For this reason, I suggest that d\(^{\prime }_{o}\) provides a principled measure of sensitivity, grounded in its correspondence with performance under a MEC strategy, that can be used wherever confidence thresholds are available, particularly in scenarios in which subjects are known to be influenced by base rate or cost/payoff information.
Alternatively, one can use d\(^{\prime }_{o}\) and the bias metrics associated with λ∗ to infer the degree to which subjects' decision processes are influenced by base rate or cost/payoff information. The difference between a subject's measured c and β and the optimal co and βo generated for the subject with the "optimal.d" code provides an indication of the impact of the base rate and/or cost/payoff information in a particular SDT scenario, as sketched below.
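For instance, the following hypothetical R sketch compares a subject's nominal c with the c implied by the MEC criterion; the distribution parameters and the subject's measured c are made-up illustrative values, and c is assumed to be the standard −(zHR + zFAR)/2.

  ## Hypothetical comparison of a subject's nominal c with c at lambda*.
  mu.s <- 1.2; sd.s <- 1.4; p.s <- 0.25
  err <- function(l) p.s * pnorm(l, mu.s, sd.s) + (1 - p.s) * (1 - pnorm(l))
  lambda.star <- optimize(err, c(-4, 8))$minimum   # MEC criterion
  hr  <- 1 - pnorm(lambda.star, mu.s, sd.s)
  far <- 1 - pnorm(lambda.star)
  c.o <- -(qnorm(hr) + qnorm(far)) / 2             # optimal c
  measured.c <- 0.31                               # subject's nominal c
  measured.c - c.o   # positive: more conservative than optimal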
Conclusions
Empirical data suggest that some SDT scenarios involve unequal signal and noise distribution variance, sometimes in combination with unequal base rates. While alternative sensitivity measures, such as d\(^{\prime }_{1}\), d\(^{\prime }_{2}\), and da, provide stable estimates across response bias, they do not reflect the discriminability of the distributions, as captured by base-rate-scaled distributional overlap, for an observer that follows a MEC strategy using a fixed criterion. Moreover, none of these measures capture changes in discriminability, under unequal variance conditions, when base rates are manipulated, because they are calculated from parameters that do not account for such information. While d\(^{\prime }_{p}\) does reflect changes in noise and signal proportion, it lacks stability over changes in bias in the unequal variance case. The proposed d\(^{\prime }_{o}\) addresses both issues: it reflects changes in discriminability as a function of base rate while remaining stable over changes in response bias. It achieves this by providing the nominal d\(^{\prime }\) value at the optimal criterion – that which minimizes errors or maximizes payoffs – irrespective of the distributional scenario. This removes the onus from the researcher to account for the impact of such changes on d\(^{\prime }\), particularly in scenarios in which subjects are known to be sensitive to such information.
Open Practices Statement
All materials, including the computer code used for the simulations, plotting the results, and generating the bias metrics in the tables, are provided for complete transparency and replication. An Open Science Framework repository housing these materials, along with instructions, may be accessed at:
https://osf.io/x52aw/?view_only=4f2ed814b852484d9de2a4377f7eaeb1
Notes
λ is adopted as a symbol for the criterion following Wickens (2002).
L is substituted for λ to represent the likelihood ratio because λ is used to indicate the criterion value in the present manuscript.
References
Arnold, M. M., Higham, P. A., & Martín-Luengo, B. (2013). A little bias goes a long way: The effects of feedback on the strategic regulation of accuracy on formula-scored tests. Journal of Experimental Psychology: Applied, 19(4), 383–402. https://doi.org/10.1037/a0034833.
Aujla, H. (2022). Methods and code for d\(^{\prime }_{o}\): Sensitivity at the optimal criterion location. Retrieved June 16, 2022, from https://osf.io/x52aw/?view_only=4f2ed814b852484d9de2a4377f7eaeb1.
Balakrishnan, J. D. (1998a). Measures and interpretations of vigilance performance: Evidence against the detection criterion. Human Factors: The Journal of the Human Factors and Ergonomics Society, 40(4), 601–623. https://doi.org/10.1518/001872098779649337.
Balakrishnan, J. D. (1998b). Some more sensitive measures of sensitivity and response bias. Psychological Methods, 3, 68–90. https://doi.org/10.1037/1082-989X.3.1.68.
Balakrishnan, J. D., & Macdonald, J. A. (2002). Decision criteria do not shift: Reply to Treisman. Psychonomic Bulletin & Review, 9(4), 858–865. https://doi.org/10.3758/BF03196345.
Barrett, A. B., Dienes, Z., & Seth, A. K. (2013). Measures of metacognition on signal-detection theoretic models. Psychological Methods, 18(4), 535–552. https://doi.org/10.1037/a0033268.
Bezanson, J., Edelman, A., Karpinski, S., & Shah, V. B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1), 65–98.
Birnbaum, M. H. (1983). Base rates in Bayesian inference: Signal detection analysis of the cab problem. The American Journal of Psychology, 96(1), 85. https://doi.org/10.2307/1422211.
Bohil, C. J., & Maddox, W. T. (2001). Category discriminability, base-rate, and payoff effects in perceptual categorization. Perception & Psychophysics, 63(2), 361–376. https://doi.org/10.3758/BF03194476.
Bohil, C. J., & Maddox, W. T. (2003). A test of the optimal classifiers independence assumption in perceptual categorization. Perception & Psychophysics, 65(3), 478–493. https://doi.org/10.3758/BF03194577.
Bohil, C. J., & Wismer, A. J. (2015). Implicit learning mediates base rate acquisition in perceptual categorization. Psychonomic Bulletin & Review, 22(2), 586–593. https://doi.org/10.3758/s13423-014-0694-2.
Brown, S., Steyvers, M., & Hemmer, P (2007). Modeling experimentally induced strategy shifts. Psychological Science, 18(1), 40–45. https://doi.org/10.1111/j.1467-9280.2007.01846.x.
Creelman, C. D. (1965). Discriminability and scaling of linear extent. Journal of Experimental Psychology, 70(2), 192–200. https://doi.org/10.1037/h0022193.
DeCarlo, L. T. (2012). On a signal detection approach to m-alternative forced choice with bias, with maximum likelihood and Bayesian approaches to estimation. Journal of Mathematical Psychology, 56(3), 196–207. https://doi.org/10.1016/j.jmp.2012.02.004.
Donaldson, W. (1996). The role of decision processes in remembering and knowing. Memory & Cognition, 24(4), 523–533. https://doi.org/10.3758/bf03200940.
Dorfman, D. D. (1969). Probability matching in signal detection. Psychonomic Science, 17(2), 103. https://doi.org/10.3758/bf03336468.
Dorfman, D. D., & Alf, E. (1969). Maximum-likelihood estimation of parameters of signal-detection theory and determination of confidence intervals-rating method data. Journal of Mathematical Psychology, 6(3), 487–496. https://doi.org/10.1016/0022-2496(69)90019-4.
Edwards, D. C. (2004). Ideal observer estimation and generalized ROC analysis for computer-aided diagnosis. Medical Physics, 31(5), 1308–1308. https://doi.org/10.1118/1.1688038.
Egan, J. P. (1958). Recognition memory and the operating characteristic. USAF Operational Applications Laboratory Technical Note, 58-51.
Glanzer, M., & Adams, J. K. (1985). The mirror effect in recognition memory. Memory & Cognition, 13(1), 8–20. https://doi.org/10.3758/BF03198438.
Glanzer, M., & Adams, J. K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(1), 5–16. https://doi.org/10.1037/0278-7393.16.1.5.
Glanzer, M., Adams, J. K., Iverson, G. J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100(3), 546–567. https://doi.org/10.1037/0033-295X.100.3.546.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. John Wiley.
Green, D. M., & Swets, J. A. (1974). Signal detection and psychophysics. Krieger.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36.
Husmann, K., Lange, A., & Spiegel, E. (2017). The R package optimization: Flexible global optimization with simulated annealing.
Jerison, H. J. (1967). Signal detection theory in the analysis of human vigilance. Human Factors: The Journal of the Human Factors and Ergonomics Society, 9(3), 285–288. https://doi.org/10.1177/001872086700900310.
Kaernbach, C. (1991). Poisson signal-detection theory: Link between threshold models and the gaussian assumption. Perception & Psychophysics, 50(5), 498–506. https://doi.org/10.3758/BF03205066.
Lee, M. D. (2008). BayesSDT: Software for Bayesian inference with signal detection theory. Behavior Research Methods, 40(2), 450–456. https://doi.org/10.3758/BRM.40.2.450.
Legge, G. E., Hooven, T. A., Klitz, T. S., Stephen Mansfield, J., & Tjan, B. S. (2002). Mr. chips 2002: New insights from an ideal-observer model of reading. Vision Research, 42(18), 2219–2234. https://doi.org/10.1016/S0042-6989(02)00131-1.
Lynn, S. K., & Barrett, L. F. (2014). “Utilizing” signal detection theory. Psychological Science, 25(9), 1663–1673. https://doi.org/10.1177/0956797614541991.
MacDonald, J. A. (2011). Using the ideal observer to predict performance in perceptual tasks: An example from the auditory temporal masking domain. Attention, Perception, & Psychophysics, 73(8), 2639–2648. https://doi.org/10.3758/s13414-011-0213-8.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide (2nd ed.). Lawrence Erlbaum Associates.
Maddox, W. T., & Bohil, C. J. (1998a). Base-rate and payoff effects in multidimensional perceptual categorization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1459–1482.
Maddox, W. T., & Bohil, C. J. (1998b). Overestimation of base-rate differences in complex perceptual categories. Perception & Psychophysics, 60(4), 575–592. https://doi.org/10.3758/BF03206047.
Maddox, W. T., & Bohil, C. J. (2004). Probability matching, accuracy maximization, and a test of the optimal classifiers independence assumption in perceptual categorization. Perception & Psychophysics, 66, 104–118. https://doi.org/10.3758/BF03194865.
Madhavan, P., Gonzalez, C., & Lacson, F. C. (2007). Differential base rate training influences detection of novel targets in a complex visual inspection task. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 51(4), 392–396. https://doi.org/10.1177/154193120705100451.
McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105(4), 724–760.
Mickes, L., Wixted, J. T., & Wais, P. E. (2007). A direct test of the unequal-variance signal detection model of recognition memory. Psychonomic Bulletin & Review, 14(5), 858–865.
Miller, M. B., & Kantner, J. (2020). Not all people are cut out for strategic criterion shifting. Current Directions in Psychological Science, 29(1), 9–15. https://doi.org/10.1177/0963721419872747.
Mogensen, P. K., & Riseth, A. N. (2018). Optim: A mathematical optimization package for Julia. Journal of Open Source Software, 3(24), 615. https://doi.org/10.21105/joss.00615.
Mueser, P. R., Cowan, N., & Mueser, K. T. (1999). A generalized signal detection model to predict rational variation in base rate use. Cognition, 69(3), 267–312. https://doi.org/10.1016/S0010-0277(98)00072-9.
Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313. https://doi.org/10.1093/comjnl/7.4.308.
Osth, A. F., Bora, B., Dennis, S., & Heathcote, A. (2017). Diffusion vs. linear ballistic accumulation: Different models, different conclusions about the slope of the zROC in recognition memory. Journal of Memory and Language, 96, 36–61. https://doi.org/10.1016/j.jml.2017.04.003.
R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Rotello, C. M., Masson, M. E. J., & Verde, M. F. (2008). Type I error rates and power analyses for single-point sensitivity measures. Perception & Psychophysics, 70(2), 389–401. https://doi.org/10.3758/PP.70.2.389.
Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750.
See, J. E., Warm, J. S., Dember, W. N., & Howe, S. R. (1997). Vigilance and signal detection theory: An empirical evaluation of five measures of response bias. Human Factors: The Journal of the Human Factors and Ergonomics Society, 39(1), 14–29. https://doi.org/10.1518/001872097778940704.
Selker, R., van den Bergh, D., Criss, A. H., & Wagenmakers, E.-J. (2019). Parsimonious estimation of signal detection models from confidence ratings. Behavior Research Methods, 51(5), 1953–1967. https://doi.org/10.3758/s13428-019-01231-3.
Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM: Retrieving effectively from memory. Psychonomic Bulletin & Review, 4(2), 145–166. https://doi.org/10.3758/BF03209391.
Simpson, A. J., & Fitter, M. J. (1973). What is the best index of detectability?. Psychological Bulletin, 80(6), 481–488. https://doi.org/10.1037/h0035203.
Sims, C. R., Jacobs, R. A., & Knill, D. C. (2012). An ideal observer analysis of visual working memory. Psychological Review, 119(4), 807–830. https://doi.org/10.1037/a0029856.
Stanislaw, H., & Todorov, N. (1999). Calculation of signal detection theory measures. Behavior Research Methods, Instruments, & Computers, 31(1), 137–149. https://doi.org/10.3758/BF03207704.
Stretch, V., & Wixted, J. T. (1998). Decision rules for recognition memory confidence judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(6), 1397–1410. https://doi.org/10.1037/0278-7393.24.6.1397.
Swets, J. A., Tanner, W. P., & Birdsall, T. G. (1961). Decision processes in perception. Psychological Review, 68(5), 301–340. https://doi.org/10.1037/h0040547.
Tanner, W. P., & Swets, J. A. (1954). A decision-making theory of visual detection. Psychological Review, 61(6), 401–409. https://doi.org/10.1037/h0058700.
Thomson, D. R., Besner, D., & Smilek, D. (2016). A critical examination of the evidence for sensitivity loss in modern vigilance tasks. Psychological Review, 123(1), 70–83. https://doi.org/10.1037/rev0000021.
Turner, B. M., & Van Zandt, T. (2014). Hierarchical approximate Bayesian computation. Psychometrika, 79(2), 185–209. https://doi.org/10.1007/s11336-013-9381-x.
Tversky, A., & Kahneman, D. (1980). Causal schemas in judgments under uncertainty. In M. Fishbein (Ed.) Progress in social psychology (pp. 49–72). Psychology Press, Taylor & Francis Group.
Vokey, J. R. (2016). Single-step simple ROC curve fitting via PCA. Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, 70(4), 301–305. https://doi.org/10.1037/cep00000.
Wickens, T.D. (2002). Elementary signal detection theory. Oxford University Press.
Wixted, J. T. (2007). Dual-process theory and signal-detection theory of recognition memory. Psychological Review, 114(1), 152–176. https://doi.org/10.1037/0033-295X.114.1.152.
Wixted, J. T. (2020). The forgotten history of signal detection theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46(2), 201–233. https://doi.org/10.1037/xlm0000732.
Wixted, J. T., & Gaitan, S. C. (2002). Cognitive theories as reinforcement history surrogates: The case of likelihood ratio models of human recognition memory. Animal Learning & Behavior, 30(4), 289–305. https://doi.org/10.3758/BF03195955.
Ziebell, L., Collin, C., Rainville, S., Mazalu, M., & Weippert, M. (2020). Using an ideal observer analysis to investigate the visual perceptual efficiency of individuals with a history of non-suicidal self-injury when identifying emotional expressions. PLOS ONE, 15(2), e0227019. https://doi.org/10.1371/journal.pone.0227019.
Appendices
Appendix A: Code listing for optimal d
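The published listing is available from the OSF repository (Aujla, 2022). As a stand-in, the following is a minimal R sketch of the logic that listing implements, under an assumed interface (argument names, return values) and with the noise distribution fixed at μnoise = 0 and σnoise = 1; it is not the published code. The MEC criterion λ∗ is found by minimizing the base-rate-weighted error rate, and d\(^{\prime }_{o}\) is the nominal d\(^{\prime }\) (zHR − zFAR) evaluated there.

  ## Minimal sketch of an "optimal.d"-style function (assumed interface;
  ## see the OSF repository for the published listing).
  optimal.d <- function(mu.signal, sigma.signal, p.signal = 0.5) {
    ## Expected error rate at criterion lambda: base-rate-weighted misses
    ## (signal mass below lambda) plus false alarms (noise mass above it).
    err <- function(lambda) {
      p.signal * pnorm(lambda, mu.signal, sigma.signal) +
        (1 - p.signal) * (1 - pnorm(lambda))
    }
    ## lambda* is the MEC criterion: the placement minimizing expected error.
    lambda.star <- optimize(err,
                            interval = c(-6, mu.signal + 6 * sigma.signal))$minimum
    hr  <- 1 - pnorm(lambda.star, mu.signal, sigma.signal)  # hit rate at lambda*
    far <- 1 - pnorm(lambda.star)                           # false alarm rate at lambda*
    ## d'_o is the nominal d' (zHR - zFAR) evaluated at lambda*.
    list(lambda.star = lambda.star, d.o = qnorm(hr) - qnorm(far),
         hr = hr, far = far)
  }

  ## Example: unequal variance with a rare signal.
  optimal.d(mu.signal = 1, sigma.signal = 1.5, p.signal = 0.25)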
Appendix B: Cost/payoff structure
Following Eq. 13, a simulation was conducted to examine the effects of cost/payoff manipulations on d\(^{\prime }_{o}\) and its associated bias metrics in three base rate conditions (signal proportion = 0.5, 0.25, or 0.1) and across a range of σsignal (0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0). Cost/payoff scenarios were constructed as follows: a "liberal" structure, requiring a leftward criterion shift to maximize gains, had payoffs of 1.0 for hits and 0.5 for correct rejections and costs of 1.0 for misses and 0.5 for false alarms; a "conservative" structure, requiring a rightward shift to maximize gains, had payoffs of 0.5 for hits and 1.0 for correct rejections and costs of 0.5 for misses and 1.0 for false alarms; and a "neutral" structure had all payoffs and costs set to 0.5. d\(^{\prime }_{o}\) was computed at the criterion location that optimizes utility (i.e., maximizes gains), following the adjustment to β described by Tanner and Swets (1954) and adapted by Lynn and Barrett (2014) in their "utilized" SDT model.
Tables B1, B2, and B3 summarize the bias metrics and d\(^{\prime }_{o}\) for cost/payoff scenarios that favor a leftward shift in the decision criterion ("liberal"), no shift ("neutral"), or a rightward shift ("conservative"). Note that the β corresponding to the optimal criterion reflected both the base rate and the cost/payoff structure, such that βo = Nnoise/Nsignal × (Vcr + Cfa)/(Vh + Cm).
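As a quick numerical check of this expression (an R sketch with assumed variable names, not the published code):

  ## beta_o: base rate term times the Tanner and Swets (1954) cost/payoff term.
  beta.o <- function(n.noise, n.signal, V.cr, C.fa, V.h, C.m) {
    (n.noise / n.signal) * (V.cr + C.fa) / (V.h + C.m)
  }

  ## "Conservative" structure at signal proportion 0.25:
  beta.o(n.noise = 75, n.signal = 25, V.cr = 1, C.fa = 1, V.h = 0.5, C.m = 0.5)
  ## returns 6: respond "yes" only when the likelihood ratio exceeds 6

With the "neutral" structure, the cost/payoff term equals 1 and βo reduces to the base rate ratio Nnoise/Nsignal.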
Appendix C: Impact of sample size on zROC and scaled distance metrics d
Appendix D: Comprehensive Monte Carlo simulations