Nat Methods. 2014 Apr;11(4):385-92.
doi: 10.1038/nmeth.2855. Epub 2014 Feb 23.

Sleep-spindle detection: crowdsourcing and evaluating performance of experts, non-experts and automated methods

Simon C Warby et al. Nat Methods. 2014 Apr.

Abstract

Sleep spindles are discrete, intermittent patterns of brain activity observed in human electroencephalographic data. Increasingly, these oscillations are of biological and clinical interest because of their role in development, learning and neurological disorders. We used an Internet interface to crowdsource spindle identification by human experts and non-experts, and we compared their performance with that of automated detection algorithms in data from middle- to older-aged subjects from the general population. We also refined methods for forming group consensus and evaluating the performance of event detectors in physiological data such as electroencephalographic recordings from polysomnography. Compared to the expert group consensus gold standard, the highest performance was by individual experts and the non-expert group consensus, followed by automated spindle detectors. This analysis showed that crowdsourcing the scoring of sleep data is an efficient method to collect large data sets, even for difficult tasks such as spindle identification. Further refinements to spindle detection algorithms are needed for middle- to older-aged subjects.

Conflict of interest statement

COMPETING INTERESTS STATEMENT

All authors report no conflicts of interest.

Figures

Figure 1
Generation of the gold standard and spindle detection performance of individual experts. (a) Histogram of the number of epochs viewed by each of the 24 expert scorers. Each bin represents one expert; bins are arranged in descending order. (b) Histogram of the number of epochs viewed by a specific number of experts. (c) Mean by-event performance (F1-score) of individual experts (shading is standard deviation) at varying thresholds of expert group consensus (Tegc). Average performance was maximized at Tegc = 0.25, and this level of group consensus was used to generate the gold standard expert dataset. (d) Number of spindles found at each Tegc threshold bin. Vertical line indicates optimal performance at Tegc = 0.25. (e) Cumulative number of spindles over the Tegc range. Horizontal line indicates that the expert group identified 1,987 spindles at Tegc = 0.25. (f) Precision-recall plot of individual expert performance. Each square is one expert; the intensity of the color is scaled according to how many epochs each expert viewed, so the darkest squares are the experts who saw the most data. The position of each square indicates performance after correction by a leave-one-out analysis, which excludes the individual from the expert group consensus to correct for reporting bias. The line connected to each square indicates the resulting decrease in performance.
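The thresholding described in panels (c) to (e) can be illustrated with a minimal sketch. The caption does not give the exact consensus rule, so this assumes a per-sample vote: each scorer's annotations are turned into a 0/1 mask, the masks are averaged across scorers, and samples whose average is at or above the threshold are merged into consensus events. The function name `consensus_events`, the half-open sample intervals, and the use of `>=` rather than `>` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in samples (assumed convention)


def consensus_events(
    annotations: List[List[Interval]], n_samples: int, threshold: float
) -> List[Interval]:
    """Merge samples marked by at least `threshold` of the scorers into events.

    `annotations` holds one list of intervals per scorer who viewed the data;
    a scorer who viewed it but marked nothing contributes an empty list.
    """
    n_scorers = len(annotations)
    votes = [0] * n_samples
    for scorer in annotations:
        # Build a per-scorer mask so overlapping marks count once per scorer.
        marked = [False] * n_samples
        for start, end in scorer:
            for i in range(max(start, 0), min(end, n_samples)):
                marked[i] = True
        for i, is_marked in enumerate(marked):
            if is_marked:
                votes[i] += 1

    events: List[Interval] = []
    run_start = None
    for i in range(n_samples):
        above = n_scorers > 0 and votes[i] / n_scorers >= threshold
        if above and run_start is None:
            run_start = i
        elif not above and run_start is not None:
            events.append((run_start, i))
            run_start = None
    if run_start is not None:
        events.append((run_start, n_samples))
    return events


# Four hypothetical scorers viewed the same 30-sample stretch; two marked a spindle.
scorers = [[(10, 20)], [(12, 22)], [], []]
low = consensus_events(scorers, 30, 0.25)   # one vote out of four is enough
high = consensus_events(scorers, 30, 0.5)   # two votes out of four are required
```

Under these assumptions a low threshold keeps every sample that any one of the four scorers marked, while a higher threshold keeps only the region where the marks agree, which is why the number of events falls as the threshold rises.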
Figure 2
By-event and by-subject characteristics of 1,988 spindles in the gold standard dataset. (a) Duration. (b) Frequency. (c) Maximum peak-to-peak amplitude in the 11–16 Hz band. (d) Symmetry, measured as the location of the maximum peak-to-peak amplitude relative to the length of the spindle. Example spindles for each characteristic are provided above the histogram. Black bars indicate spindle identification. (e) Spindle density in the 110 subjects. (f) Correlation between spindle density and subject age (R² = 0.055, p-value = 0.013). (g) Mean maximum peak-to-peak amplitude of spindles in females versus males (t-test p-value = 3.03e−6). (h) Correlation between maximum peak-to-peak amplitude and subject age (R² = 0.059, p-value = 0.016). (i) Spindle oscillation frequency between subjects (ANOVA p-value = 9.93e−70), ordered by descending mean frequency.
Figure 3
Consensus and performance of the non-expert group for spindle detection. (a) Histogram of the number of epochs viewed by each of 114 non-expert scorers. Each bin represents one non-expert; bins are arranged in descending order. (b) Histogram of the number of epochs viewed by a specific number of non-experts. (c) By-event precision-recall plot of non-expert performance. Each circle is one non-expert; the non-experts who viewed the most data are the darkest circles. Non-expert group consensus is plotted as a green line; performance at each consensus threshold (0–0.9) is indicated with a green circle. Performance of the group consensus is remarkably good, even though some individuals performed very poorly (bottom left). (d) F1-score of the non-expert group consensus at different consensus thresholds (Tngc) in the by-event analysis. Performance was optimal at Tngc = 0.4. (e) Number of spindles found at each Tngc threshold bin. Vertical orange line indicates optimal performance at Tngc = 0.4. (f) Cumulative number of spindles over the Tngc range. Horizontal orange line indicates that the non-expert group identified 1,669 spindles at Tngc = 0.4. (g) By-subject correlation between spindle density in the gold standard and spindle density of the non-expert group consensus (Tngc = 0.4). Each data point is one sleeping subject; darker circles indicate multiple subjects at the same position in the plot.
Figure 4
Automated spindle detector performance. (a) Precision-recall plot of 6 automated detectors (labeled ‘a1’–‘a6’) and the automated group consensus curve (black line, labeled 0.1–0.9) at different levels of consensus. (b) F1-score of the automated group consensus at different levels of consensus. Maximum performance was at Tagc = 0.5. (c–e) Correlation between the gold standard spindle density (Tegc = 0.25) and (c) the density estimate of automated detector a5, (d) the relative sigma power of the same segments of N2 sleep, or (e) the density estimate of the automated group consensus (Tagc = 0.5). Each data point is one subject; darker points indicate multiple subjects at the same position in the plot.
Figure 5
Performance of experts, non-experts and automated spindle detection algorithms. (a) Precision-recall plot of experts (red boxes, after correction with the leave-one-out analysis), the non-expert group (Tngc = 0.0–0.9, green circles; see also Fig. 3) and automated methods (a1–a6) in the by-event analysis. The highest performance is closest to the top-right corner of the plot. (b) By-subject density estimates for each of the automated methods (a1–a6) and the non-expert group (ng) against the gold standard (gs). Each dot is one subject. The dotted line is the mean density in the gold standard. The mean and standard deviation of each detector are indicated by orange horizontal and vertical lines, respectively. (c) By-subject spindle duration estimates; the dotted line is the mean spindle duration in the gold standard. (d) Effect on performance of varying the amount of overlap (Toverlap) required between an event and a detection for the detection to count as a true positive, shown for the automated detectors (a1–a6), the non-expert group (ng, n = 114, at Tngc = 0.4) and the mean individual expert (e, n = 24; red shading is standard deviation). Vertical orange line indicates the Toverlap threshold used by default in this study (0.2).
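The by-event scoring in panels (a) and (d) can be sketched as follows. The captions do not define how overlap is measured or how events are paired, so this example assumes overlap is intersection over union of the two intervals and that each gold standard event is matched to at most one detection, greedily from the largest overlap down. The names `overlap_ratio` and `by_event_scores` are hypothetical; only the default threshold of 0.2 is taken from the caption.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end), in samples or seconds


def overlap_ratio(a: Interval, b: Interval) -> float:
    """Intersection over union of two intervals (assumed overlap measure)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def by_event_scores(
    gold: List[Interval], detected: List[Interval], t_overlap: float = 0.2
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) after one-to-one greedy event matching."""
    # Rank every gold/detection pair by overlap, largest first.
    pairs = sorted(
        (
            (overlap_ratio(g, d), gi, di)
            for gi, g in enumerate(gold)
            for di, d in enumerate(detected)
        ),
        reverse=True,
    )
    matched_gold, matched_det = set(), set()
    for ratio, gi, di in pairs:
        if ratio < t_overlap:
            break  # all remaining pairs overlap even less
        if gi in matched_gold or di in matched_det:
            continue  # each event takes part in at most one match
        matched_gold.add(gi)
        matched_det.add(di)

    true_pos = len(matched_gold)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Hypothetical events: one good match, one marginal overlap, one false alarm.
gold_events = [(0, 100), (200, 300)]
detections = [(10, 110), (500, 600), (290, 400)]
strict = by_event_scores(gold_events, detections, t_overlap=0.2)
lenient = by_event_scores(gold_events, detections, t_overlap=0.01)
```

In this toy example the marginal detection is rejected at the default threshold and accepted at the lenient one, so both precision and recall rise as Toverlap is lowered, which is the kind of dependence panel (d) examines.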
