How EEG preprocessing shapes decoding performance

Working title: Multiverse 4 Decoding (m4d)

Kessler et al., 2024, How EEG preprocessing shapes decoding performance. arXiv. doi.org/10.48550/arXiv.2410.14453

  • Read the preprint here

  • Feel free to send me feedback via email

  • An interactive dashboard to explore the impact of changing single preprocessing steps on decoding performance is available on Streamlit

Abstract:

EEG preprocessing varies widely between studies, but its impact on classification performance remains poorly understood. To address this gap, we analyzed seven experiments with 40 participants drawn from the public ERP CORE dataset. We systematically varied key preprocessing steps, such as filtering, referencing, baseline interval, detrending, and multiple artifact correction steps. Then we performed trial-wise binary classification (i.e., decoding) using neural networks (EEGNet) or time-resolved logistic regression. Our findings demonstrate that preprocessing choices influenced decoding performance considerably. All artifact correction steps reduced decoding performance across experiments and models, while higher high-pass filter cutoffs consistently increased decoding performance. For EEGNet, baseline correction further increased decoding performance; for time-resolved classifiers, linear detrending and lower low-pass filter cutoffs increased decoding performance. The influence of other preprocessing choices was specific to each experiment or event-related potential component. The current results underline the importance of carefully selecting preprocessing steps for EEG-based decoding. While uncorrected artifacts may increase decoding performance, this comes at the expense of interpretability and model validity, as the model may exploit structured noise rather than the neural signal.

Structure of this repository

Subfolders contain more specific READMEs.

Note: The multiverse-preprocessed epoch data comprises >15 TB of storage.

If you are interested in the TBs of epoch data, send me an email and we will figure out a way to share them.

Single large files can be accessed via Zenodo, such as the summary CSVs for analysis and modeling (single accuracy and T-sum values per participant, experiment, and forking path).
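
For orientation, a minimal pandas sketch of loading such a summary table; the file name and column names are hypothetical placeholders, so check the Zenodo record for the actual layout:

# Minimal sketch: inspect a summary CSV downloaded from Zenodo.
# NOTE: the file name and column names are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("summary_accuracies.csv")  # one row per participant x experiment x forking path
print(df.groupby("experiment")["accuracy"].describe())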

If you reuse the scripts or pipeline, please adapt all the paths in the scripts! Some paths are absolute because data was shared across file servers to meet computing requirements.
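
One way to keep this manageable is to collect the machine-specific roots in a single place and derive all other paths from them; a hypothetical sketch, not part of the repository:

# Hypothetical sketch: centralize machine-specific roots before running the scripts.
from pathlib import Path

BASE_DIR = Path("/your/fileserver/m4d")  # adjust to your system
RAW_DIR = BASE_DIR / "data" / "raw"
INTERIM_DIR = BASE_DIR / "data" / "interim"
MODELS_DIR = BASE_DIR / "models"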

General project structure was adapted from cookiecutter:


├── README.md          <- The top-level README.
│
├── dashboard          <- dashboard submodule pointing to a different repository used for the streamlit app
│
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── env                <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `conda env export > env.json`
│
├── julia              <- Julia scripts.
│
├── manuscript         <- Manuscript submodule pointing to a different repository synced with Overleaf
│
├── models             <- Trained models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks and similar with one-off analyses
│
├── plots              <- Plots (other plots are directly plotted into the manuscript folder)
│
├── poster             <- Conference posters
│
├── presentation       <- Project presentations
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│
├── src                <- Python source code for use in this project.
│
└── targets            <- R (targets) pipeline.

Environments / Packages

The conda environment is saved in the folder env. All Python/Bash/SLURM scripts can be found in src.

The R environment used in the targets pipeline, together with all related processing scripts, can be found in targets; a list of packages is in env.

The Julia environment for LMM fitting is in env; the Julia scripts are in julia.

The system architecture and hardware details of the HPC used for all Python and Bash scripts (run via the SLURM job scheduling system) can be found in the MPCDF RAVEN user guide.

The system architecture and hardware details of the MacBook Pro (2020, M1, 16 GB RAM) used to run the targets pipeline in R and Julia can be found here.

Run analyses

The following is done on an HPC cluster with the SLURM job scheduling system and the conda environment set up.

Multiverse preprocessing and machine learning model fitting

Download the ERP CORE data for all participants and experiments. ⏳ several minutes to hours, depending on bandwidth

python3 src/0-download.py

Prepare the data ⏳ <1h (steps sketched below)

  • rearrange trigger values
  • rename annotations
  • get times
  • resample to 256 Hz
  • calculate artificial EOG channels
  • set montage

python3 src/1-pre-multiverse.py
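
The following is a hedged MNE-Python sketch of these preparation steps, not the repository's exact code; the file name, annotation mapping, and channel names are placeholders:

# Hedged MNE-Python sketch of the preparation steps (placeholders throughout).
import mne

raw = mne.io.read_raw_fif("sub-001_task-N170_raw.fif", preload=True)  # placeholder file
raw.annotations.rename({"stimulus/11": "face"})  # rename annotations (illustrative mapping)
raw.resample(256)                                # resample to 256 Hz
# derive an artificial bipolar EOG channel from two monopolar electrodes (placeholder names)
raw = mne.set_bipolar_reference(raw, anode="FP2", cathode="VEOG_lower",
                                ch_name="VEOG", drop_refs=False)
raw.set_channel_types({"VEOG": "eog"})
raw.set_montage("standard_1020", on_missing="ignore")  # set montage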

Run multiverse preprocessing: For each experiment and participant, preprocess the raw data using >2500 different preprocessing pipelines. ⏳ 24h per participant and experiment

bash src/2-multiverse.sh
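
Conceptually, the forking paths are the Cartesian product of the preprocessing options; a toy sketch of enumerating such a grid, where the option names and values are illustrative rather than the exact grid from the paper:

# Toy sketch: enumerate forking paths as the Cartesian product of options.
# Option names and values are illustrative, not the paper's exact grid.
from itertools import product

options = {
    "hpf": [None, 0.1, 0.5],                  # high-pass cutoff (Hz)
    "lpf": [None, 6, 20, 45],                 # low-pass cutoff (Hz)
    "reference": ["average", "Cz", "P9P10"],
    "emc": [None, "ica"],                     # ocular artifact correction
    "mac": [None, "ica"],                     # muscle artifact correction
    "baseline": [None, (-0.2, 0.0)],          # baseline interval (s)
    "detrend": [None, "linear"],
    "autoreject": [None, "interpolate"],
}
forking_paths = [dict(zip(options, values)) for values in product(*options.values())]
print(len(forking_paths))  # number of pipelines in this toy grid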

Calculate evoked responses and visualize them, in particular for an example forking path. ⏳ <1h

python3 src/3-evoked.py
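
A minimal MNE-Python sketch of this step; the file and condition names are placeholders:

# Minimal sketch: average the epochs of one forking path into an evoked response.
import mne

epochs = mne.read_epochs("sub-001_N170-epo.fif")  # placeholder file name
evoked = epochs["face"].average()                 # condition name is illustrative
evoked.plot()                                     # butterfly plot of the ERP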

Run decoding for each forking path, participant, and experiment:

  • EEGNet decoding ⏳ 24h per participant and experiment
  • Time-resolved decoding ⏳ <1h per participant and experiment (sketched below)

bash src/4a-eegnet.sh
bash src/4b-sliding.sh
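
For the time-resolved case, a hedged sketch with scikit-learn and MNE that fits one logistic regression per time point; this is not the repository's exact implementation, and the file name and labels are placeholders:

# Hedged sketch: time-resolved decoding with one logistic regression per time point.
import mne
from mne.decoding import SlidingEstimator, cross_val_multiscore
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

epochs = mne.read_epochs("sub-001_N170-epo.fif")  # placeholder file name
X = epochs.get_data()                             # trials x channels x time points
y = (epochs.events[:, 2] == epochs.event_id["face"]).astype(int)  # illustrative binary labels

clf = make_pipeline(StandardScaler(), LogisticRegression())
time_decoder = SlidingEstimator(clf, scoring="accuracy")
scores = cross_val_multiscore(time_decoder, X, y, cv=5)  # folds x time points
print(scores.mean(axis=0))                               # accuracy over time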

Aggregate EEGNet results for analysis in R/targets. ⏳ <1h

python src/5a-aggregate_results.py
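
In spirit, this amounts to concatenating the per-run result files into one long table; a hedged pandas sketch, where the directory layout and output path are assumptions:

# Hedged sketch: collect per-run accuracy files into one long CSV for R/targets.
# The glob pattern and output path are assumptions about the layout.
from pathlib import Path
import pandas as pd

files = sorted(Path("models").rglob("accuracy_*.csv"))
df = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
df.to_csv("data/processed/eegnet_accuracies.csv", index=False)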

Aggregate time-resolved results at the group level for analysis in R/targets, and visualize them for an example forking path. ⏳ <1h

python src/5b-sliding_group.py

Fitting Linear Mixed Models in Julia

All the following steps were performed on a MacBook Pro (2020, M1).

Run from a terminal with Julia installed, using the environment provided in env. ⏳ <24h

julia julia/pretarget_model_fitting_en.jl
julia julia/pretarget_model_fitting_tr.jl

Model fitting in Julia is orders of magnitude faster than in R, especially for large models and data sets. The bottleneck, however, is the conversion from a Julia LMM object to an R LMM object, which takes a few hours per model (for reasons that escape me).

These steps were performed before the targets pipeline to prevent computationally intensive steps from rerunning after pipeline invalidation. Other, less intensive steps shown in the manuscript appendix, which also run in Julia, are performed from within the targets pipeline.

Modeling the impact of preprocessing on decoding performance

The following is performed within an R targets pipeline with access to Julia. From within RStudio, source targets/renv/activate.R and targets/_targets.R; _targets.R contains the entire pipeline.

The pipeline (and the status of each node) can be visualized using

tar_visnetwork()

The complete pipeline is run using ⏳ <2h

tar_make()

The resulting plots are directly plotted into the manuscript folder (git submodule).

License

CC BY 4.0

This work is licensed under a Creative Commons Attribution 4.0 International License.

