Rules File Schema¶
The rules file setups the way the conversion is done from a general point of view. It is intended to be configurable by a power-user, mainly at an institutional or organization level (or whatever level data seems to be collected uniformly).
To make the rules file by hand we need to understand the schema of the file. For starts, the file is written in yaml. As of now the purpose of this documentation is not to teach yaml (we may have a dedicated file for that in the future). For now, you can check this guide though.
The Typical Rules File¶
A typical rules file looks like this:
entities:
task : resting
session : ses1
dataset_description :
Name : MyDataset
Authors :
- Alice
- Bob
sidecar :
EEGReference : 50
PowerLineFrequency : FCz
channels :
name :
heo : HEO
veo : VEO
type :
HEO : HEOG
VEO : VEOG
non-bids:
eeg_extension : .vhdr
path_analysis:
pattern : _data/%dataset_description.Name%/ses-%entities.session%/%entities.task%/sub-%entities.subject%.vhdr
code_execution:
- print(raw.info)
- To understand the rules file you first need to know a bit of the bids specification for eeg . Mainly that it contains 6 major files:
dataset_description
sidecar
channels
electrodes
coordinate system
events
- The currently supported files are:
For each of these supported files, the yaml will have an object with its corresponding “child” properties. That is, we will have dataset_description, sidecar and channels objects.
Besides these objects, the yaml will have an entities object. Although it seems a bit obscure, entities is just the name the bids specification gives to the properties that affect file name structure; you can find more information of these entities here .
At last, we have the non-bids object which setups additional configuration that do not clearly belong to one of the previous objects.
So we will have something like this up to this point.
entities:
something
dataset_description:
something
channels:
something
sidecar:
something
non-bids:
something
We will now delve into each of these objects.
entities¶
The general schema of the entities object is:
entities:
subject : something
task : something
session : something
acquisition: something
run : something
In essence the entities object holds information that triangulates the file in the study typically subject, session and task. Assume the subject is “001”, the session is “ses1” and the task is “resting”. The configuration will be then:
entities:
subject : 001
task : resting
session : ses1
Tip
In general, you only configure what you have, it is not necessary to set up empty fields.
Warning
Notice at least a white-space separating key and value is important to avoid errors. This applies as a general principle for the rules file. That is, task:resting
may give errors, so it is recommended to use task : resting
.
Apart from subject, session and task, we can have acquisition and run; refer to the entities documentation to know more about these two; they are setup in the same way as the previous ones.
Now, remember this file is supposed to configure general information, so one wouldn’t tipically setup the subject in such a static way. In the path_analysis section of the non-bids object it will be explained how to infer varying properties like the subject directly from the path of the file.
Warning
In general, you should use each of these Rules File objects to configure properties that apply to the dataset as a whole. The only part (for now) of the Rules File that allows customization in a per-file basis is the path_analysis functionality of the non-bids object.
Note that the only properties that are obligatory for eeg are the subject and the task. Since the subject is usually inferred from the path (ie the filename), the entities object doesn’t need that property.
Warning
The Rules File must have in someway or another the obligatory properties of the specification.
So in the end you may end up with something like this:
entities:
task : resting
session : ses1
dataset_description¶
The dataset description describes the dataset; you must fill it using the properties and value formats of the given by the specification
As of now, we are only supporting the following info in the dataset_description object:
dataset_description:
Name : something
Authors : something
Suppose the dataset is named “MyDataset”, and that it has two authors: “Alice” and “Bob”. Then this part of the yaml will look like the following:
dataset_description :
Name : MyDataset
Authors :
- Alice
- Bob
Note
Notice that the “BIDSVersion” (which is a REQUIRED field) is automatically set up by mne-bids, so we dont put it here in the rules.
Warning
Some properties (like “Authors”) are arrays of strings rather than just strings. For these fields we use the yaml list notation (-
).
sidecar¶
The sidecar file that accompanies eegs in the bids specification describes some technical properties of it. You can find more information here.
The currently supported schema of the sidecar object is:
sidecar :
EEGReference : something
PowerLineFrequency : something
Note
If you read the bids specification you may notice that the only field left that is required is the “TaskName” one. Since this is taken care by the “entities.task” object , it is not included here. The difference between the task label (Task) and TaskName is that the task label is obtained from the TaskName by removing non-alphanumeric characters. This idea is not currently implemented on sovabids, rather, the Task and TaskName are written as if they were the same.
The sidecar object ends up looking something like the following:
sidecar :
EEGReference : 50
PowerLineFrequency : FCz
channels¶
Channel information is mostly inferred by MNE upon reading the eeg file so usually you wouldn’t need to set a rule for this. In this sense, the rules file setups whatever corrections need to be done on top of the assumptions MNE makes upon reading.
Tip
One strategy is to do the conversion without any corrections and see what was wrong, then changing the rules file accordingly.
The channels object currently supports the following functionality:
Renaming channels
Setting the type of channels
The schema of the channels object is :
channels :
name :
Name : NewName
type :
Name : NewType
Note
The types in NewType must correspond to the types mentioned in the specification . That is: EEG EOG ECG EMG EYEGAZE GSR HEOG MISC PPG PUPIL REF RESP SYSCLOCK TEMP TRIG VEOG .
Renaming Example¶
Here we rename channel “FCZ” to “FCz”, and “FPZ” to “Fpz”:
channels :
name :
FCZ : FCz
FPZ : Fpz
Note
Notice here we are not using the list notation (-
) of yaml; this is because of a technical reason. Internally these operations are encoded as python dictionaries rather than lists.
Type Example¶
Here we give the channel “heo” the type “HEOG” and the channel “veo” the type “VEOG”.
channels :
type :
heo : HEOG
veo : VEOG
Renaming and Typing simultaniously:
It is possible that we need to change the name and the type of a channel. In that case only to refer to such channel as the OldName in the name part of the channels object. In the other parts use the NewName.
To illustrate the procedure, suppose we want to rename “veo” to “VEO” , “heo” to “HEO” and also set their types to “VEOG” and “HEOG” respecitively. You would do then:
channels :
name :
heo : HEO
veo : VEO
type :
HEO : HEOG
VEO : VEOG
non-bids¶
This is the most complex object. As of now what is supported is:
non-bids :
eeg_extension : something
path_analysis :
something
code_execution :
something
eeg_extension¶
This property just defines the extension of the eeg files we want to read. If this property is non-existent then the eeg files will be any from the following extensions: [‘.set’ ,’.cnt’ ,’.vhdr’ ,’.bdf’ ,’.fif’].
Note
Notice it is preferable that you put the dot before the extension; the code should add it if you dont though.
Supposing you are using brainvision files, then you would configure it as:
non-bids :
eeg_extension : '.vhdr'
path_analysis¶
Is used to infer information from the path. Any of the fields from the previous objects are supported as long they consist of a single simple value (anything that is a single number or string). The pattern is applied to every file that has the eeg_extension mentioned before.
There are two ways to do the path_analysis, by a regex pattern or by a placeholder pattern.
regex pattern¶
For this you will need to know regex. Mainly you need to set a capture group for each property you want to infer from the path. You will also need to set the properties you want to infer through a list.
The schema we want to arrive at is :
non-bids:
path_analysis:
pattern : regex-pattern
fields :
- field1
- field2
This will be explained better with an example, suppose you want to extract information from the following file path:
Y:\code\sovabids\_data\lemon\ses-001\resting\sub-010002.vhdr
Intuitively you identify the following pattern:
some useless path\ dataset name \ ses-session \ task \ sub- subject.vhdr
You associate each of the properties you want to extract to the properties of the Rules File. That is:
dataset name is dataset_description.Name
session is entities.session
task is entities.task
subject is entities.subject
These have to be written in a property called fields and in the order as they appear in the regex pattern (from left to right).
Note
Notice sovabids uses dot notation to nest properties. That is `field1.field2
means that we get inside field1
and then inside field2
.
Now you make your regex using capture groups and the forward-slash as the path separator. In this case it would suffice to use:
_data\/(.+)\/ses-(.+)\/(.+)\/sub-(.+).vhdr
Tip
The capture group (.+) is recommended. The dot will match any character (except line terminators) and the plus sign will match it 1 to unlimited times. Basically it will try to match the longest strings it can given the pattern you gave.
Warning
Use the forward-slash as the path separator (/
) in your pattern regardless of the symbol your OS uses. This is to avoid problems when reading strings. This applies both to regex patterns and placeholder patterns.
Warning
We need to escape the forward slash in the regex pattern so /
becomes \/
.
So your path_analysis object is wrote in the Rules File as:
non-bids:
path_analysis:
pattern : _data\/(.+)\/ses-(.+)\/(.+)\/sub-(.+).vhdr
fields :
- dataset_description.Name
- entities.session
- entities.task
- entities.subject
placeholder pattern¶
The placeholder pattern is intended for less technical users as it is more intuitive (but less powerful) than regex. Internally it is regex nevertheless. Mainly we just need to pass a pattern that mimics the structure of the files by using placeholders to indicate what fields are there.
The schema we want to arrive at is :
non-bids:
path_analysis:
pattern : placeholder-pattern
The placeholder functionality is inspired by the mp3tag software feature of “format strings”:
Using the same example as before, suppose you want to extract information from the following file path:
Y:\code\sovabids\_data\lemon\ses-001\resting\sub-010002.vhdr
Again, you identify the following pattern:
some useless path\ dataset name \ ses-session \ task \ sub- subject.vhdr
You associate each of the properties you want to extract to the properties of the Rules File as done before. That is:
dataset name is dataset_description.Name
session is entities.session
task is entities.task
subject is entities.subject
Now you just need to set the pattern, remember we need to use forward-slash notation:
_data/%dataset_description.Name%/ses-%entities.session%/%entities.task%/sub-%entities.subject%.vhdr
Note
- Notice that the differences are:
that we enclose the desired fields between percentages.
that the fields are already in the pattern string
that there is no need to escape the forward-slash (
`/
) character
Tip
You can use %ignore% if that part of the pattern varies but you don’t care about its value.
The placeholder has two advanced configurations which define how the pattern is translated to a regex pattern:
non-bids:
path_analysis:
pattern : placeholder-pattern
matcher : something
encloser : something
The matcher is the regex string that replaces the fields in the placeholder pattern. By default is (.+)
The encloser is the character that encloses the fields. By default is %
.
So if you don’t set up these configurations, it is equivalent to having:
non-bids:
path_analysis:
pattern : placeholder-pattern
matcher : (.+)
encloser : "%"
What we need to write in the Rules File is then:
non-bids:
path_analysis:
pattern : _data/%dataset_description.Name%/ses-%entities.session%/%entities.task%/sub-%entities.subject%.vhdr
Tip
You don’t need to put the whole path structure, just from where it interests you. In the examples here we are only interested from the _data
folder.
Warning
It is advisable to include the folder just before the one that is of interest to you. This is so that the sofware is able to discriminate what is of interest in the first field. In this example we started from _data
although in reality we are interested is in the next folder (lemon
). If we do %dataset_description.Name%/ses-%entities.session%/%entities.task%/sub-%entities.subject%.vhdr
(this is the same pattern but without the _data
folder), the software will have trouble distinguishing what is of interest at the start of the string. This warning applies both to regex and placeholder patterns.
code_execution¶
Is used to hold a list of commands you want to run for additional flexibility but at the cost of knowing a bit about the backend of this package. Mainly that mne eeg object is called “raw”. This means you can manipulate the eeg object if you know how to use mne.
The schema of this property is:
non-bids:
code_execution:
- command1
- command2
A useless but simple way to illustrate this is just to print the information of the eeg object when the code is executed. For this you just need to put the following:
non-bids:
code_execution:
- print(raw.info)
Note
Notice it is a list of commands, so each command has a (-
); even if we execute only one command.
The complete non-bids object¶
Finally, we will have this non-bids object (if using the placeholder pattern option):
non-bids:
eeg_extension : .vhdr
path_analysis:
pattern : _data/%dataset_description.Name%/ses-%entities.session%/%entities.task%/sub-%entities.subject%.vhdr
code_execution:
- print(raw.info)