The sensitivity and specificity of clinical and digital structural examinations may vary widely depending on the comparison test. The risk of bias in study designs is significant.
Systematic review
A systematic review was conducted through October 6, 2011, and existing databases were screened to identify relevant systematic reviews. The quantity, quality, and consistency of the body of available evidence were assessed to answer the question 'What is the predictive value of screening tests for open-angle glaucoma?'
One systematic review (Burr et al., 2007) addressed the diagnostic test accuracy of candidate diagnostic and screening tests for the detection of open-angle glaucoma (OAG). Highly sensitive systematic electronic searches were run through December 2005. The investigators included 40 studies totaling more than 48,000 participants 40 years of age and older and those at high risk for the development of OAG based on demographic characteristics or comorbidities. The focus was on studies of participants likely to be encountered in a routine screening setting. The primary reference standard was confirmation of OAG at follow-up examination. Diagnosis of OAG requiring treatment was also considered. No studies were at low risk of bias. A small subset of eight studies was judged to be of higher quality.
Searches for studies published after the Burr et al. (2007) systematic review identified 4,960 studies, of which 83 addressing the accuracy of screening tests were eligible. The sensitivity of standard automated perimetry (SAP) was higher than that of Goldmann tonometry, similar to that of the Heidelberg retina tomograph (HRT), and lower than that of disc photos or frequency doubling technology (FDT) visual field testing. The specificity of SAP was higher than that of disc photos and FDT, similar to that of HRT, and lower than that of Goldmann tonometry. Some comparisons of tests could not be performed because of variability in populations and reported thresholds. No other studies were identified.
68% of studies were at high risk of spectrum bias (not representative of those who would receive the test in practice). 6% had differential verification bias (different reference standards). The candidate tests were interpreted without knowledge of reference standard in only 29% of studies. 48% of the studies did not include an explanation of withdrawals from the study, and 46% of the studies reported the number of uninterpretable test results. Only 3 of 83 studies included a population-based sample.
1. Tests of Optic Nerve Structure
1.1. Heidelberg Retina Tomograph II
Evidence From Burr et al., 2007
HRT II was a diagnostic test of interest in 3 studies. Using the common criterion of one or more results that are borderline or outside normal limits, the pooled sensitivity was 86 percent (95% credible interval [CrI], 55 to 97) and the pooled specificity was 89 percent (95% CrI, 66 to 98).
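To relate these pooled figures to the review question on predictive value, the following minimal sketch applies Bayes' theorem to the pooled HRT II sensitivity (86 percent) and specificity (89 percent). The 2 percent prevalence is an assumed, purely illustrative value that does not come from the review, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled HRT II point estimates reported by Burr et al. (2007).
# ASSUMPTION: the 2 percent prevalence is illustrative only, not from the review.
ppv, npv = predictive_values(0.86, 0.89, 0.02)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Under a low assumed prevalence, false positives outnumber true positives even at these pooled accuracy levels, so the predictive value depends on the screened population as well as on the test. The sketch uses point estimates only and ignores the width of the credible intervals.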
Evidence From Primary Studies
Seventeen primary studies included measures of diagnostic accuracy for HRT II. Two studies specifically focused on detecting early or moderate glaucoma. One study enrolled 60 participants with glaucoma (30 with early visual field defects and 30 with moderate visual field defects) and 60 healthy volunteers. AUC values were reported to be in the range of 0.474 (disc area ratio parameter) to 0.852 (vertical cup-to-disc ratio parameter). Another study enrolled 70 participants with early or moderate glaucoma and 70 healthy volunteers. The range of sensitivity across 12 parameters was from 47 percent (RNFL cross-sectional area) to 74 percent (linear cup/disc area ratio), and the range of specificity was from 47 percent (mean RNFL thickness) to 71 percent (cup shape measure). The remaining 15 studies explored comparisons of HRT II with other devices, such as the GDx with VCC (variable corneal compensation), OCT, HRT III, and FDT. Overall, HRT II was found not to perform as well as GDx VCC, OCT, or FDT. HRT II and HRT III were found to have a similar diagnostic profile. Three of the included studies concluded that HRT II was not an appropriate tool for population-based glaucoma screening studies.
1.2. Heidelberg Retina Tomograph III
Evidence From Primary Studies
Eleven studies examined the diagnostic accuracy of HRT III. One study identified 81 participants with early visual field loss (out of 247 participants with glaucoma) and 142 healthy volunteers. Early visual field loss was defined as a mean deviation less than 5 dB. At a fixed specificity of 92 percent, the sensitivity of the Glaucoma Probability Score for distinguishing eyes with early field loss from healthy eyes was 68 percent, and that of the Moorfields Regression Analysis was 72 percent. The investigators concluded: "Moorfields Regression Analysis and Glaucoma Probability Score have similar ability to detect glaucomatous changes, and typically agree. The relative ease and sensitivity of the operator-independent Glaucoma Probability Score function of the HRT III may facilitate glaucoma screening."
Another study compared four imaging methods for their ability to distinguish early glaucoma from healthy eyes. Forty-six eyes of 46 participants with early OAG and 46 eyes from healthy volunteers were enrolled. Sensitivity ranged from 4 percent (parameter: reference height) to 70 percent (Frederick S. Mikelberg discriminant function and Reinhard O. W. Burk discriminant function) when the specificity of the test was held constant at 95 percent.
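Several results in this section are reported as sensitivity at a fixed specificity. As a rough illustration of what that means, the sketch below picks the cutoff from the healthy group so that the target specificity is met, then counts how many glaucoma eyes exceed it. The scores, the 90 percent target, and the function name `sensitivity_at_fixed_specificity` are hypothetical and are not taken from any included study; the sketch also assumes that a higher score means a more abnormal eye.

```python
import math


def sensitivity_at_fixed_specificity(healthy_scores, glaucoma_scores, specificity):
    """Return (sensitivity, cutoff) with the cutoff chosen so that at least
    `specificity` of healthy eyes score at or below it (higher = more abnormal)."""
    if not 0.0 < specificity <= 1.0:
        raise ValueError("specificity must be in (0, 1]")
    ordered = sorted(healthy_scores)
    # Fewest healthy eyes that must test negative; the small offset guards
    # against floating-point noise pushing the product just above an integer.
    n_negative = max(1, math.ceil(specificity * len(ordered) - 1e-9))
    cutoff = ordered[n_negative - 1]
    # An eye tests positive only if its score is strictly above the cutoff.
    positives = sum(1 for score in glaucoma_scores if score > cutoff)
    return positives / len(glaucoma_scores), cutoff


# HYPOTHETICAL scores for illustration only.
healthy = [0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 0.4, 0.5, 0.6, 0.9]
glaucoma = [0.3, 0.5, 0.6, 0.7, 0.7, 0.8, 0.8, 0.9, 1.0, 1.1]
sens, cutoff = sensitivity_at_fixed_specificity(healthy, glaucoma, 0.90)
print(f"cutoff = {cutoff}, sensitivity = {sens:.2f}")
```

Because the cutoff moves with the chosen specificity, a sensitivity reported at 95 percent specificity is not directly comparable with one reported at 92 percent.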
2. Ophthalmoscopy
Evidence From Burr et al., 2007
Burr et al. (2007) included seven studies addressing the diagnostic accuracy of ophthalmoscopy. Using a common cutoff point of a vertical cup-to-disc ratio greater than or equal to 0.7, pooled sensitivity for the five studies with this common criterion was 60 percent (95% CrI, 34 to 82 percent), and specificity was 94 percent (95% CrI, 76 to 99). The diagnostic odds ratio (DOR) was 25.7 (95% CrI, 5.79 to 109.50), suggesting a 26-fold higher odds of a positive test among those with glaucoma than those without glaucoma.
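The link between sensitivity, specificity, and the diagnostic odds ratio can be sketched as follows. The function name `diagnostic_odds_ratio` is hypothetical. The calculation simply plugs in the pooled point estimates quoted above, so it is an approximation and need not reproduce the pooled DOR of 25.7, which the review estimated from its own model.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Odds of a positive test among the diseased divided by the odds of a
    positive test among the non-diseased."""
    odds_pos_diseased = sensitivity / (1.0 - sensitivity)
    odds_pos_healthy = (1.0 - specificity) / specificity
    return odds_pos_diseased / odds_pos_healthy


# Pooled ophthalmoscopy point estimates (sensitivity 60%, specificity 94%).
dor = diagnostic_odds_ratio(0.60, 0.94)
print(f"DOR from pooled point estimates = {dor:.1f}")
```

The plug-in value is about 23.5, of the same order as the pooled estimate and well inside its credible interval.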
3. Optical Coherence Tomography (OCT)
Evidence From Primary Studies
Of the 47 included studies that investigated the diagnostic accuracy of OCT, 34 considered the Stratus OCT, 10 included the Cirrus OCT, 6 considered the RTVue OCT, 2 included the Spectralis OCT, 2 examined the OTI OCT, and 1 included the OTI Spectral OCT/SLO. Across the 34 studies that examined the Stratus OCT, all were at high risk of spectrum bias because those with known disease as well as those with healthy eyes were enrolled in the studies. The sample size ranged from 26 to 95 participants with glaucoma or suspected glaucoma and 37 to 128 healthy volunteers, with one study also enrolling 130 participants with ocular hypertension. For the parameter average RNFL thickness, the range of sensitivity was 24 to 96 percent, suggesting appreciable heterogeneity among the studies. The range of specificity was 66 to 100 percent.
4. Optic Disc Photography
Evidence From Burr et al., 2007
There were six studies of optic disc photography. The range of sensitivity was from 65 to 77 percent, and the range of specificity was from 59 to 98 percent.
Evidence From Primary Studies
Two studies of the diagnostic accuracy of optic disc photography and one study of cup-to-disc ratio as measured by an ophthalmologist using a slit-lamp biomicroscope and a 78-diopter lens were included. Danesh-Meyer et al. (2006) included participants with OAG as well as glaucoma suspects and healthy volunteers. The AUC (comparison of those deemed to have glaucoma and borderline disease vs. normal) was 0.84 (95% confidence interval [CI], 0.74 to 0.92) for the cup-to-disc ratio and 0.95 (95% CI, 0.80 to 0.98) for the Disc Damage Likelihood Score, suggesting that the Disc Damage Likelihood Score is a more effective means of discriminating between people with and without disease. The diagnostic accuracy of cup-to-disc ratio measurement from the Francis et al. (2011) study is described in the section on FDT C-20 perimetry.
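For readers less familiar with the AUC, one common interpretation is the probability that a randomly chosen diseased eye receives a more abnormal score than a randomly chosen healthy eye. The sketch below computes it that way by counting pairs. The scores and the function name `auc_by_pair_counting` are hypothetical and are not data from Danesh-Meyer et al. (2006).

```python
def auc_by_pair_counting(healthy_scores, diseased_scores):
    """AUC as the share of (diseased, healthy) pairs in which the diseased
    eye has the higher score; tied pairs count as one half."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased_scores) * len(healthy_scores))


# HYPOTHETICAL ordinal grading scores for illustration only.
healthy = [1, 2, 2, 3, 4]
diseased = [3, 4, 5, 6, 7]
auc = auc_by_pair_counting(healthy, diseased)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which is the scale on which the 0.84 and 0.95 values above are compared.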
5. RNFL Photography
Evidence From Burr et al., 2007
The common cutoff point for the four included studies was a diffuse and/or localized defect observed on RNFL photographs. The pooled diagnostic odds ratio was 23.1 (95% CrI, 4.41 to 123.50), and the pooled sensitivity and specificity were 75 and 88 percent, respectively.
Evidence From Primary Studies
Two studies examined the accuracy of RNFL photography. One study analyzed RNFL photographs of 72 glaucoma and 48 healthy participants. Results showed the RNFL defect score II, with an AUC of 0.75 (p < 0.001), was the best parameter for discriminating early glaucoma from healthy eyes (sensitivity, 58.3 percent; specificity, 95.8 percent). Another study compared RNFL photography with the GDx with VCC in 42 participants with OAG, 32 persons suspected of having OAG, and 40 healthy volunteers. The sensitivities of the global RNFL score were 36 and 81%, respectively, for fixed specificities of 95 and 80%. At a fixed specificity of 95%, the sensitivity of the Nerve Fiber Indicator was 71% versus the 36% reported above for red-free photos. Overall, the global RNFL score determined from red-free photos did not perform as well as scanning laser polarimetry. The AUC was 0.91 for the GDx with VCC Nerve Fiber Indicator versus 0.84 for the global RNFL score.
6. Scanning Laser Polarimetry (GDx)
Evidence From Primary Studies
Twenty-seven studies included an investigation of the GDx with VCC. The aim of eight studies was to discriminate early glaucoma from no disease. In the studies that focused on early OAG, the range of sensitivity across all comparisons and cutoffs for the most frequently reported parameter (Temporal, Superior, Nasal, Inferior, Temporal average) was 30 to 82 percent. Specificity was fixed at 80, 90, or 95 percent in three studies, and the lowest reported specificity was 66 percent. The range of sensitivity for the Nerve Fiber Indicator parameter across all comparisons and cutoffs was 28 to 93 percent. For this parameter, specificity was either fixed at 80, 90, or 95 percent or reported directly, with a lowest reported value of 53 percent.
Three studies examined the GDx with enhanced corneal compensation (ECC). The sample sizes of the included studies ranged from 63 to 92 glaucoma participants and 41 to 95 healthy volunteers. One study compared the AUCs for GDx with VCC and GDx with ECC, and reported that GDx with ECC performed significantly better than GDx with VCC for the parameters Temporal, Superior, Nasal, Inferior, Temporal average, Superior average, and Inferior average (p < 0.01). Two other studies concurred that imaging with ECC appears to improve the ability to diagnose OAG.