EEG Preprocessing

MariusKlug edited this page May 16, 2022 · 2 revisions

Preprocessing of EEG data in the BeMoBIL pipeline can be done using the bemobil_process_all_preprocessing function. This is a wrapper that incorporates all necessary processing steps, from the basic EEG set (e.g. all blocks merged together and non-experiment parts removed) up to the preprocessed dataset, which has line noise removed, bad channels interpolated, the average reference applied, and relevant information stored in the EEG.etc struct. It stores intermediate files on disk in the location provided in bemobil_config.EEG_preprocessing_data_folder and creates several analytics plots, which are saved alongside their respective files.

Basics

As a first step, some basics are prepared in the bemobil_process_EEG_basics function:

  1. The EEG structure is filled with ur-data, mainly to make sure events are safe.
  2. Unused electrodes are removed. This can be set in the config file using the bemobil_config.channels_to_remove entry.
  3. The data is resampled (if it is not already at the correct sampling rate) to the frequency declared in bemobil_config.resample_freq.
  4. Line noise is removed with ZapLine-plus. If bemobil_config.zaplineConfig.noisefreqs is declared empty ([]), the function will use the fully automatic adaptation. This is the default and is recommended. However, there are plenty of parameters that can be adjusted if the cleaning is not working as intended. See 'help clean_data_with_zapline_plus' for more info about parameter tweaking. This step can be skipped by setting the whole 'bemobil_config.zaplineConfig' field to [].
  5. Channel names can be changed in case they were named incorrectly or contain an unnecessary prefix. Use the bemobil_config.rename_channels setting for this. Here, a cell array ({}) of text should be entered. If this cell array has only one entry, it is assumed that this entry is a prefix that should be removed (e.g. {'BrainVision RDA_'}). If it is a 2D array of size EEG.nbchan,2, the channels in the first column are renamed to their respective entries in the second column.
  6. A reference channel can be added with zeros when declared in bemobil_config.ref_channel. This allows feeding back the data of the reference channel when the data is re-referenced to the average in a later step.
  7. Channel locations are imported. If the bemobil_config.channel_locations_filename entry is empty, it is assumed that either channel locations have been added during import (using our xdf2bids and bids2set functions) or that channels are in the standard 10-20 system, in which case standard locations will be looked up. If a filename is provided, the file is loaded and channels receive their respective locations. In that case, if a reference channel was declared before, the file must contain the location of the reference under the name specified above.
  8. If a reference was provided earlier, all channels get this information entered.
  9. The channel types are declared to be EEG (default), EOG (as provided in bemobil_config.eog_channels; these channels are ignored in bad channel detection and re-referencing), or REF (if a reference channel was entered above).
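The two renaming cases from step 5 can be sketched as follows. This is only an illustration in Python, not the pipeline's MATLAB code; the function name `rename_channels` and the use of a list of (old, new) pairs in place of a two-column cell array are assumptions made for the example.

```python
def rename_channels(labels, rename_spec):
    """Illustrative sketch of the two cases of bemobil_config.rename_channels.

    labels      : list of current channel names
    rename_spec : either a single-entry list holding a prefix to strip,
                  or a list of (old_name, new_name) pairs (hypothetical
                  stand-in for the two-column cell array)
    """
    if len(rename_spec) == 1 and isinstance(rename_spec[0], str):
        # Case 1: one entry, interpreted as a prefix to remove
        prefix = rename_spec[0]
        return [lab[len(prefix):] if lab.startswith(prefix) else lab
                for lab in labels]
    # Case 2: pairs of old and new names; unlisted channels stay unchanged
    mapping = dict(rename_spec)
    return [mapping.get(lab, lab) for lab in labels]


stripped = rename_channels(["BrainVision RDA_Fp1", "BrainVision RDA_Cz"],
                           ["BrainVision RDA_"])
renamed = rename_channels(["1", "2"], [("1", "Fp1"), ("2", "Fp2")])
```

Here `stripped` holds the names without the prefix and `renamed` holds the names from the second column of the mapping.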

The data set will now be saved with the name provided in bemobil_config.basic_prepared_filename.

Bad channel detection

After the basics are done, bad channels are detected in the bemobil_detect_bad_channels function. The data is first re-referenced to the average using the bemobil_avref function in order to have an approximation of the final data. This average reference is not used later on; it only serves to detect the bad channels.

The function then repeatedly runs the clean_artifacts function of the clean_rawdata EEGLAB plugin, with the number of repetitions specified by bemobil_config.chan_detect_num_iter. The original clean_artifacts function uses a random sample consensus (RANSAC) algorithm and stores the random sampling in a hidden microcache. When the function is simply run again, it will thus appear as if the output is consistent, but after a restart of MATLAB, or after clearing the microcache, the detected channels might differ. To allow consistent and reproducible detection, the process is repeated several times (recommended >10) and the cache is cleared after each run. Only channels that were flagged as "bad" in more than a given proportion of the runs (specified in bemobil_config.chan_detected_fraction_threshold) are then removed.
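The voting over repeated detection runs can be illustrated with a minimal Python sketch. It is not the pipeline's MATLAB implementation: the function name `vote_bad_channels` is hypothetical, the per-run detections are made-up example data, and the strict "more than" comparison follows the wording above rather than the actual code.

```python
from collections import Counter


def vote_bad_channels(detections, fraction_threshold):
    """Keep channels flagged in more than `fraction_threshold` of all runs.

    detections         : list of runs, each a list of channel names flagged bad
    fraction_threshold : plays the role of
                         bemobil_config.chan_detected_fraction_threshold
    """
    n_iter = len(detections)
    # Count in how many runs each channel was flagged (once per run at most)
    counts = Counter(ch for run in detections for ch in set(run))
    return sorted(ch for ch, n in counts.items()
                  if n / n_iter > fraction_threshold)


# Four hypothetical runs with slightly different RANSAC outcomes
runs = [["Fp1", "T7"], ["Fp1"], ["Fp1", "O2"], ["Fp1", "T7"]]
final_bad = vote_bad_channels(runs, 0.5)  # only Fp1 exceeds 50% of the runs
```

In this example T7 is flagged in exactly half of the runs and therefore does not exceed the threshold of 0.5.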

Within the clean_artifacts function, the data is split into short windows and robust interpolations of each channel are computed. The following parameters can be adjusted:

  • bemobil_config.chancorr_crit is the main parameter. This is a correlation threshold. If a channel is correlated at less than this value to its own robust estimate (based on other channels), it is considered abnormal in the given time window. Recommended are values of 0.75 (rather lax) to 0.85 (rather strict).
  • bemobil_config.chan_max_broken_time sets the maximum proportion of time windows in which a given channel may be flagged as bad before it is detected as bad in the final output. Recommended are values from 0.2 (at most 20% of the time, strict) to 0.5 (at most 50% of the time, lax).
  • bemobil_config.flatline_crit is a criterion for detecting channels that are flat. It is recommended to set this to 'off' since a) flat channels will not correlate with their interpolation anyway, and b) sometimes, especially in MoBI, data may be lost, but this does not mean the complete channel should be discarded.
  • bemobil_config.line_noise_crit rejects channels that have increased noise. However, line noise should already be removed by ZapLine-plus, and this criterion may also falsely reject channels that are closer to muscles (like a neck band), hence it is recommended to keep it 'off', too.
  • bemobil_config.num_chan_rej_max_target determines the fraction of channels that can be maximally removed (e.g. 1/5). This is to ensure that even in the case of very noisy data or incorrect bad channel detection, the processing does not remove too many channels to reconstruct them.
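How the correlation threshold and the maximum broken time interact can be sketched in Python. This is a simplified illustration under stated assumptions, not the clean_artifacts implementation: the per-window correlations are taken as given (the robust estimation itself is not reproduced), and the function name `flag_bad_channels` and the example values are hypothetical.

```python
def flag_bad_channels(window_corrs, chancorr_crit, chan_max_broken_time):
    """Flag channels that are poorly predicted for too large a share of time.

    window_corrs         : dict mapping channel name to a list of per-window
                           correlations with the channel's robust estimate
    chancorr_crit        : correlation below this marks a window as abnormal
    chan_max_broken_time : maximum tolerated fraction of abnormal windows
    """
    bad = []
    for name, corrs in window_corrs.items():
        # Fraction of windows in which the channel falls below the threshold
        broken_fraction = sum(c < chancorr_crit for c in corrs) / len(corrs)
        if broken_fraction > chan_max_broken_time:
            bad.append(name)
    return bad


# Made-up correlations for two channels across four windows
corrs = {
    "Cz": [0.90, 0.95, 0.70, 0.92],  # abnormal in 1 of 4 windows
    "T7": [0.50, 0.60, 0.90, 0.70],  # abnormal in 3 of 4 windows
}
bad = flag_bad_channels(corrs, chancorr_crit=0.8, chan_max_broken_time=0.3)
```

With these example values only T7 is flagged, since Cz is abnormal in 25% of the windows, which is below the 30% limit.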

Once the final set of bad channels has been determined, channels that were declared as 'EOG' before are removed from this list. This is done because EOG channels are often not highly correlated with their neighboring channels even if they contain perfectly fine data. The drawback of this approach is that EOG channels cannot be rejected and interpolated, even if they are in fact bad, but as they are ignored in the re-referencing step this should not be too problematic.

Interpolation and re-referencing

Subsequently, the bad channels are interpolated using spherical interpolation in EEGLAB. When this is done, the rank of the data matrix is reduced by the number of interpolated channels, which is important for AMICA later on, so this information is stored in EEG.etc.
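The rank reduction can be checked on a toy example. The Python sketch below is only an illustration: a plain average of two channels stands in for spherical interpolation (both are linear combinations of the remaining channels), the data values are made up, and `matrix_rank` is a small helper written for this example.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a non-zero entry
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Three channels (rows) with four samples each, linearly independent
data = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]

# Replace the third channel by a linear combination of the other two
interpolated = [Fraction(a + b, 2) for a, b in zip(data[0], data[1])]
repaired = [data[0], data[1], interpolated]

rank_before = matrix_rank(data)      # 3
rank_after = matrix_rank(repaired)   # 2, i.e. 3 channels minus 1 interpolated
```

The number of channels stays the same, but the rank drops by one per interpolated channel, which is why the value needs to be stored for later use.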

As a final step, the data is re-referenced to the average. Channels that were declared as EOG channels are ignored here since they might skew the data. Then, there are two options:

  1. The reference channel was declared previously, which means it was added with zero-entries. In this case it will now be filled and be available for analysis.
  2. No reference was declared. In this case we follow the approach of the fullrankaveref EEGLAB plugin: a new dummy channel with zeros is added, the data is re-referenced, then the dummy channel is deleted again.

In both options, the data rank stays the same.
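Both options can be illustrated with one small Python sketch. It is a simplified stand-in for the MATLAB code, with a hypothetical function name and made-up data: the average is computed over the N data channels plus one all-zero channel, and the caller decides whether to keep the resulting reference channel (option 1) or drop it (option 2).

```python
def full_rank_avref(data):
    """Average reference computed with an additional all-zero channel.

    data : list of channels, each a list of samples
    Returns (re-referenced channels, re-referenced zero channel).
    """
    n_chan = len(data)
    n_samp = len(data[0])
    # Mean over N data channels plus one zero channel: sum / (N + 1)
    mean = [sum(ch[t] for ch in data) / (n_chan + 1) for t in range(n_samp)]
    reref = [[ch[t] - mean[t] for t in range(n_samp)] for ch in data]
    # The former zero channel becomes 0 - mean
    ref = [-mean[t] for t in range(n_samp)]
    return reref, ref


data = [[3, 6], [0, 3]]            # two channels, two samples
reref, ref = full_rank_avref(data)
# reref == [[2.0, 3.0], [-1.0, 0.0]], ref == [-1.0, -3.0]

option_1 = reref + [ref]           # reference channel kept for analysis
option_2 = reref                   # dummy channel deleted again
```

In option 1 the N + 1 channels sum to zero at every sample, so the added channel carries no additional rank. In option 2 the remaining N channels do not sum to zero, so no rank is lost compared to the data before re-referencing.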

The final preprocessed data set is then saved with the filename provided in bemobil_config.preprocessed_filename.