********************************************************************************.
* Description: Relative efficacy of a screening test - SPSS Calculator (Advanced).
*
*	       The diagnostic utility of a screening test is relative to the prior probability
*	       (or base rate/prevalence) of the condition/disease in the population.  
*	       You may use this calculator to see how changing the prevalence  
*	       at a given sensitivity and specificity impacts the test's predictive power and
*	       criterion cost value.  
*	       
*	       As an example, a clinical psychologist uses a test to screen for PTSD in a 
*	       young adult male with HIV who has self-presented in crisis to a mental health 
*              inpatient unit. She uses a PTSD screening test that has a specificity of 81% 
*              and sensitivity of 83% at the cut-off score (> 31) to make a positive diagnosis – 
*	       as recommended by the test developer. These parameters came from an analysis
*	       in which the prevalence of PTSD in the research sample was 50%. 
*	       The male scores 34 on the screen (which is associated with a specificity of 
*              84% and sensitivity of 86%) and she makes a provisional diagnosis of PTSD.  
*              She is unaware that PTSD is rare in this population, with a prevalence of around 
*              2.5%.  When the prevalence of 2.5% is taken into account, the patient's score of 34 
*              has an associated positive predictive value (PPV) of just 12.11%. 
*	       In other words, the probability that this patient actually has PTSD is only 12.11%. 
*	       Had the prevalence of PTSD been 50% in the population of patients the psychologist was treating,
*	       the PPV would have been about 84.31%, close to (though not identical to) the sensitivity of 86%.
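*
*	       Worked check of the 12.11% figure (a sketch using Bayes' theorem with the values quoted
*	       above: Se = .86, Sp = .84, p = .025):
*	       PPV = (Se*p) / (Se*p + (1-Sp)*(1-p))
*	           = (.86*.025) / (.86*.025 + .16*.975)
*	           = .0215 / .1775
*	           = .1211, i.e., 12.11%.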
*
*	       For a useful discussion of these issues see:
*
*	       Rosenfeld, B. (2000). Have We Forgotten the Base Rate Problem? 
*	       Methodological Issues in the Detection of Distortion. Archives of 
*	       Clinical Neuropsychology, 15(4), 349–359. 
*	       https://doi.org/10.1016/s0887-6177(99)00025-6
*
* Instructions:
*	       Enter the number of cases in the diseased group that 
*              test positive (a) and negative (b), 
*              the number of cases in the non-diseased group that test positive (c) 
*              and negative (d), and the costs associated with correct and incorrect classifications.
*
*	       If the sample sizes in the positive (diagnosis present) and the negative (diagnosis absent)
*	       groups do not reflect the real prevalence of the psychological condition, replace the 999 
*	       assigned to p with the prevalence expressed as a proportion (e.g., .025 for 2.5%); otherwise leave it as is.
*
* References:  
*	       Example data was taken from Table VIII (Mercaldo, Lau, & Zhou, 2007, p. 2179).
*	    
*	       Mercaldo, N. D., Lau, K. F., & Zhou, X. H. (2007). Confidence intervals for predictive 
*	       values with an emphasis to case–control studies. Statistics in Medicine, 26(10), 
*              2170–2183. https://doi.org/10.1002/sim.2677
*
*	       Altman, D.G., Machin, D., Bryant, T.N., & Gardner, M.J. (Eds) (2000). Statistics with confidence, 
*	       2nd ed. BMJ Books.
*
*	       The following is also a useful read on how to choose the optimal criterion threshold value taking 
*              into account the disease prevalence and cost of false and true positive and negative decisions.
*	      
*	       Zweig, M. H., & Campbell, G. (1993). Receiver-operating characteristic (ROC) plots: A fundamental 
*	       evaluation tool in clinical medicine. Clinical Chemistry, 39(4), 561–577. 
*	       https://doi.org/10.1093/clinchem/39.4.561
*
*	       Greiner, M., Pfeiffer, D., & Smith, R. D. (2000). Principles and practical application of the 
*              receiver-operating characteristic analysis for diagnostic tests. Preventive veterinary medicine, 
*              45(1-2), 23–41. https://doi.org/10.1016/s0167-5877(00)00115-x
*
*
*	       Cebul, R. D., Hershey, J. C., & Williams, S. V. (1982). Using Multiple Tests: Series and Parallel 
*	       Approaches. Clinics in Laboratory Medicine, 2(4), 871–890. https://doi.org/10.1016/s0272-2712(18)31018-7
*
* Warnings:    
*	       There are no confidence intervals for Accuracy - I couldn't find a suitable method to
*	       estimate these - at any rate Accuracy is not a terribly useful statistic.  Knowing
*	       the Criterion Cost Value (s) is important but ideally this would be plotted by prevalence
*	       and threshold scores.  Nonetheless it has been provided for reasons of thoroughness.
*	       Note that a data file containing at least one case must be open for this program to run. 
*	       Use this program at your own risk - I do not guarantee the accuracy of the results.
*
*Use: 	       Free for non-commercial use with acknowledgement.  This SPSS program is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International licence.
*
*	       Suggested citation:
*
*	       Little, J. (2015-2023). SPSS calculator for the efficacy of a screening test (Version 1.09). 
*	       https://www.statpsychservices.com.au/resources. Melbourne: Statistical Psychological Services.
*
*
* Writer: Jonathon Little
* Date: 	 08/12/2015
* Date Modified: 05/03/2023
* Version: 1.09
*********************************************************************************.

*
* Sample data:
*
* Screening  |	Diagnosis Present   n		Diagnosis Absent  n	  Total
* Test	     |	
* ___________|___________________________________________________________________________
* Positive   |	True Positive	a   = 240	False Positive	c = 87    a + c = 327
* Negative   |	False Negative	b   = 178       True Negative	d = 288   b + d = 466
* ___________|___________________________________________________________________________
* Total	     |			a+b = 418	 	      c+d = 375
*
*
*******************************ENTER USER VALUES BELOW************************************.

*Prevalence of the condition/disease/diagnosis - enter your prevalence as a proportion in place of 999; 
*otherwise leave as is - see instructions above.
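*For illustration only (the 2.5% figure comes from the PTSD example in the header, not from the
*sample data): to evaluate the test at a prevalence of 2.5%, the assignment below would be written as
*COMPUTE p = .025.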

COMPUTE p = 999.
*True positive.
COMPUTE a = 240.
*False negative.
COMPUTE b = 178.
*False positive.
COMPUTE c = 87.
*True negative.
COMPUTE d = 288.
EXECUTE.

*ENTER COSTS*.
*FPc: the cost of a false positive decision.
COMPUTE FPc=1.
*FNc: the cost of a false negative decision. 
COMPUTE FNc=1.
*TPc: the cost of a true positive decision. 
COMPUTE TPc=0. 
*TNc: the cost of a true negative decision. 
COMPUTE TNc=0. 


COMPUTE alpha = .05.
EXECUTE.
*Significance level - default is .05, which produces 95% confidence intervals.
*Note that alpha is only used by the exact (Clopper-Pearson) intervals for sensitivity and
*specificity; the intervals for the predictive values and likelihood ratios use a fixed
*multiplier of 1.96 and so remain 95% intervals. All intervals are labelled 95% confidence
*intervals in the output regardless of the value entered here.
******************************DO NOT ALTER THE CODE BELOW THIS LINE***********************.

DO IF p = 999.
COMPUTE p = ((a+b)/(a+b+c+d)).
END IF.
EXECUTE.

COMPUTE Se =  a / (a+b).	 	 
COMPUTE Sp =  d / (c+d). 
EXECUTE.	 
COMPUTE PLR = Se / (1-Sp).	 	 
COMPUTE NLR = (1-Se) / Sp.
	 	  
COMPUTE PPV =  ((Se*p)/((Se*p + (1-Sp) * (1 - p)))).	 
COMPUTE NPV =  ((Sp * (1-p))/((1-Se) * p + Sp * (1 - p))).	 	 
COMPUTE Accuracy = (Se * p) + Sp * (1 - p).
EXECUTE.
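*Worked check of the formulas above, assuming the sample data are left unchanged and p = 999
*(so that p is estimated from the table as 418/793 = .5271). Values are proportions at this
*point and are converted to percentages before printing.
*  Se       = 240/418 = .5742       Sp  = 288/375 = .7680
*  PLR      = .5742/.2320 = 2.47    NLR = .4258/.7680 = 0.55
*  PPV      = 240/327 = .7339       NPV = 288/466 = .6180
*  Accuracy = 528/793 = .6658.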

*(Greiner et al., 2000, p. 31 and 38).
COMPUTE s = (((FPc-TNc)/(FNc-TPc))*((1-p)/p)).
EXECUTE.
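*Worked check, assuming the default costs (FPc = FNc = 1, TPc = TNc = 0) and the sample
*prevalence of .5271: s = ((1-0)/(1-0)) * (.4729/.5271) = 0.90. At the 2.5% prevalence used in
*the header example the same costs would give s = .975/.025 = 39.00. Following Greiner et al.
*(2000), s can be read as the slope of the ROC curve at the cost-optimal threshold.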

*Equation 3 (Mercaldo et al 2007).
COMPUTE logit_PPV = ln((Se*p)/((1-Sp)*(1-p))).
COMPUTE logit_NPV = ln((Sp*(1-p))/((1-Se)*p)).
EXECUTE.

*Equations 8 and 9 (Mercaldo et al 2007).
COMPUTE VAR_logit_PPV = ((1-Se)/Se)*(1/(a+b))+(Sp/(1-Sp))*(1/(c+d)).
COMPUTE VAR_logit_NPV = (Se/(1-Se))*(1/(a+b))+((1-Sp)/Sp)*(1/(c+d)).
EXECUTE.

*Equation 10 - standard logit interval (Mercaldo et al 2007).
COMPUTE Upper_Logit_PPV = logit_PPV + (1.96*(SQRT(VAR_logit_PPV))).
COMPUTE Lower_Logit_PPV = logit_PPV - (1.96*(SQRT(VAR_logit_PPV))).
COMPUTE Upper_Logit_NPV = logit_NPV + (1.96*(SQRT(VAR_logit_NPV))).
COMPUTE Lower_Logit_NPV = logit_NPV - (1.96*(SQRT(VAR_logit_NPV))).
EXECUTE.

*Equation 11 - (Mercaldo et al 2007) - Standard logit interval transformed back to original scale for interpretability.
COMPUTE PPV_Upper = (exp(Upper_Logit_PPV))/(1+(exp(Upper_Logit_PPV))).
COMPUTE PPV_Lower = (exp(Lower_Logit_PPV))/(1+(exp(Lower_Logit_PPV))).
COMPUTE NPV_Upper = (exp(Upper_Logit_NPV))/(1+(exp(Upper_Logit_NPV))).
COMPUTE NPV_Lower = (exp(Lower_Logit_NPV))/(1+(exp(Lower_Logit_NPV))).
EXECUTE.

**************Exact Clopper-Pearson confidence intervals***START***************************.

COMPUTE n = a+b.
COMPUTE n1 = a.
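*Lower limit uses the alpha/2 quantile of F(2*n1, 2*(n-n1+1)); upper limit uses the 1-alpha/2
*quantile of F(2*(n1+1), 2*(n-n1)).
COMPUTE f1 = IDF.F(alpha/2, 2*n1, 2*(n-n1+1)).
COMPUTE f2 = IDF.F(1-alpha/2, 2*(n1+1), 2*(n-n1)).
COMPUTE Se_Lower = (1+(n-n1+1)/(n1*f1))**(-1).
COMPUTE Se_Upper = (1+(n-n1)/((n1+1)*f2))**(-1).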
EXECUTE.

COMPUTE n = c+d.
COMPUTE n1 = d.
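*Lower limit uses the alpha/2 quantile of F(2*n1, 2*(n-n1+1)); upper limit uses the 1-alpha/2
*quantile of F(2*(n1+1), 2*(n-n1)).
COMPUTE f1 = IDF.F(alpha/2, 2*n1, 2*(n-n1+1)).
COMPUTE f2 = IDF.F(1-alpha/2, 2*(n1+1), 2*(n-n1)).
COMPUTE Sp_Lower = (1+(n-n1+1)/(n1*f1))**(-1).
COMPUTE Sp_Upper = (1+(n-n1)/((n1+1)*f2))**(-1).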
EXECUTE.

**************Exact Clopper-Pearson confidence intervals***END***************************.

**************Confidence intervals for likelihood ratios****BEGIN*************************.
*Uses Log Method given by Altman (2000, p. 109-110)*.

*This small adjustment below overcomes the problem of a zero cell count, which
*would create an infinite SE_log.

COMPUTE a = a+.01.
COMPUTE b = b+.01.
COMPUTE c = c+.01.
COMPUTE d = d+.01.
EXECUTE.

*The PLR standard error uses the positive-test cells: a (true positives) and c (false positives).
COMPUTE PLR_SE_log = SQRT((1/a)-(1/(a+b))+(1/c)-(1/(c+d))).
EXECUTE.

COMPUTE Upper_Logit_PLR = (ln(PLR)) + (1.96*(PLR_SE_log)).
COMPUTE Lower_Logit_PLR = (ln(PLR)) - (1.96*(PLR_SE_log)).
EXECUTE.

COMPUTE PLR_Upper = (exp(Upper_Logit_PLR)).
COMPUTE PLR_Lower = (exp(Lower_Logit_PLR)).
EXECUTE.

*The NLR standard error uses the negative-test cells: b (false negatives) and d (true negatives).
COMPUTE NLR_SE_log = SQRT((1/b)-(1/(a+b))+(1/d)-(1/(c+d))).
EXECUTE.

COMPUTE Upper_Logit_NLR = (ln(NLR)) + (1.96*(NLR_SE_log)).
COMPUTE Lower_Logit_NLR = (ln(NLR)) - (1.96*(NLR_SE_log)).
EXECUTE.

COMPUTE NLR_Upper = (exp(Upper_Logit_NLR)).
COMPUTE NLR_Lower = (exp(Lower_Logit_NLR)).
EXECUTE.

**************Confidence intervals for likelihood ratios****END***************************.

COMPUTE Se = se*100.
COMPUTE Sp = sp*100.
COMPUTE Se_upper = se_upper*100.
COMPUTE Se_lower = se_lower*100.
COMPUTE Sp_upper = sp_upper*100.
COMPUTE Sp_lower = sp_lower*100.
COMPUTE PPV = PPV*100.
COMPUTE NPV = NPV*100.
COMPUTE Accuracy = Accuracy*100.
COMPUTE P = p *100.
COMPUTE PPV_Upper = PPV_upper*100.
COMPUTE PPV_Lower = PPV_lower*100.
COMPUTE NPV_upper = NPV_upper*100.
COMPUTE NPV_lower = NPV_lower*100.
EXECUTE.

*Widths allow for values such as 100.00% and for likelihood ratios or cost values of 10 or more.
FORMATS Se Sp Accuracy p PPV NPV (PCT7.2).
FORMATS PLR NLR s (F8.2).

PRINT /'Sensitivity: 'Se'(95% CI'Se_lower','Se_Upper')'.
PRINT /'Specificity: 'Sp'(95% CI'Sp_lower','Sp_Upper')'.
PRINT /'Positive Likelihood Ratio: 'PLR'(95% CI'PLR_lower','PLR_Upper')'.
PRINT /'Negative Likelihood Ratio: 'NLR'(95% CI'NLR_lower','NLR_Upper')'.
PRINT /'Accuracy: 'Accuracy''.
PRINT /'Prevalence: 'p''.
PRINT /'Positive Predictive Value: 'PPV'(95% CI'PPV_lower','PPV_Upper')'.
PRINT /'Negative Predictive Value: 'NPV'(95% CI'NPV_lower','NPV_Upper')'.
PRINT /'Criterion Cost Value: 's''.
PRINT /''.
PRINT /'Have a nice day!'.
PRINT /''.
PRINT /''.
EXECUTE.