Datasets

Datasets are used in Eisen to bring data into the training/validation/testing or serving pipeline. Together with transforms, I/O operations, models and other constructs, they are core Eisen functionality.

Eisen Datasets are very similar to those commonly used in PyTorch: they implement the __init__, __len__ and __getitem__ methods.

Users only need to instantiate these modules with the appropriate set of parameters and the rest will be handled by Eisen. An example of how to get started with Datasets can be found in the Eisen Colab example and is summarized here.

from eisen.datasets import MSDDataset
from torch.utils.data import DataLoader

# ... define transform chain ...

dataset = MSDDataset(
    PATH_DATA,
    NAME_MSD_JSON,
    'training',
    transform=transform
)

# create data loader, this functionality is pure pytorch
data_loader = DataLoader(
    dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4
)

Upon instantiation it is necessary to create and define the transforms that will manipulate the dataset. Further documentation about the transforms currently implemented in Eisen, as well as general directions on how to implement new ones, is included below.

It is important to know that data in Eisen is always represented as a list of dictionaries. Each entry of the list is a dictionary containing data belonging to the same datapoint instance.

my_example_dataset = [
    {"image": "/path/to/image1.jpg", "label": "/path/to/label1.jpg"},
    {"image": "/path/to/image2.jpg", "label": "/path/to/label2.jpg"},
    {"image": "/path/to/image3.jpg", "label": "/path/to/label3.jpg"},
]

The example above conveys the general form that a dataset assumes inside Eisen. This form has to be taken into account when implementing your own Datasets. Once the data is organized in this way it can be processed by Transforms. The transforms are fed individual entries of the list and act on one or multiple fields of the resulting dictionary.
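
As a minimal sketch of this convention, the plain-Python class below stores a list of dictionaries and applies an optional transform to one entry at a time. The names ListOfDictsDataset and add_flag are hypothetical and are not part of Eisen; a real implementation would typically subclass torch.utils.data.Dataset.

```python
class ListOfDictsDataset:
    """Hypothetical dataset holding a list of dictionaries (not part of Eisen)."""

    def __init__(self, entries, transform=None):
        self.entries = entries
        self.transform = transform

    def __len__(self):
        return len(self.entries)

    def __getitem__(self, idx):
        # Copy the entry so the transform cannot modify the stored list.
        item = dict(self.entries[idx])
        if self.transform is not None:
            item = self.transform(item)
        return item


def add_flag(data):
    # Hypothetical transform: acts on one entry and returns the updated dictionary.
    data["has_label"] = "label" in data
    return data


my_example_dataset = [
    {"image": "/path/to/image1.jpg", "label": "/path/to/label1.jpg"},
    {"image": "/path/to/image2.jpg", "label": "/path/to/label2.jpg"},
]

dataset = ListOfDictsDataset(my_example_dataset, transform=add_flag)
first_item = dataset[0]
```

The transform only ever sees a single dictionary, which is what allows the same transform to be reused across datasets that share field names.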

class eisen.datasets.JsonDataset(data_dir, json_file, transform=None)[source]

This object implements the capability of reading arbitrary data contained in a properly structured JSON file into Eisen. The expected JSON file structure is a list of dictionaries. Each entry of the list contains one element of the dataset. Each key of the dictionary stores different information about that data point.

Example of JSON structure:

[
    {"image": "image_file1.png", "label": "label_file1.png"},
    {"image": "image_file2.png", "label": "label_file2.png"}
]

Note

This dataset will generate data entries with fields corresponding to what is stored in each entry of the json dataset list.
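
The expected file layout can be checked with the standard library alone. The sketch below writes a small list of dictionaries to a temporary JSON file and reads it back; it does not use Eisen, and the file name and variable names are illustrative assumptions.

```python
import json
import os
import tempfile

entries = [
    {"image": "image_file1.png", "label": "label_file1.png"},
    {"image": "image_file2.png", "label": "label_file2.png"},
]

# Write the list of dictionaries to a temporary JSON file.
tmp_dir = tempfile.mkdtemp()
json_path = os.path.join(tmp_dir, "file.json")
with open(json_path, "w") as f:
    json.dump(entries, f)

# Read it back and check the structure: a list whose items are dictionaries.
with open(json_path) as f:
    loaded = json.load(f)

is_valid = isinstance(loaded, list) and all(isinstance(e, dict) for e in loaded)
field_names = sorted(loaded[0].keys())
```

Note that json.dump writes double-quoted strings, as the JSON format requires.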

from eisen.datasets import JsonDataset
dset = JsonDataset('/abs/path/to/data', '/abs/path/to/file.json', transform)
__init__(data_dir, json_file, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located

  • json_file (str) – the name of the json file containing the data

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import JsonDataset
dset = JsonDataset(
    data_dir='/abs/path/to/data',
    json_file='/abs/path/to/file.json',
    transform=transform
)
class eisen.datasets.MSDDataset(data_dir, json_file, phase, transform=None)[source]

This object allows Medical Segmentation Decathlon data to be easily imported into Eisen. More information about the data can be found at http://medicaldecathlon.com

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type MSDDataset which will make use of the directory structure and the descriptive json file included in it and make the data available to Eisen.

Note

This dataset will return data items with fields: ‘image’ and, optionally, ‘label’.

from eisen.datasets import MSDDataset

dataset = MSDDataset(
    '/abs/path/to/data',
    '/path/to/dataset.json',
    'training',
    transform,
)
__init__(data_dir, json_file, phase, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located (dataset location after unzipping)

  • json_file (str) – the name of the json file describing the MSD dataset

  • phase (str) – training or test phase as per MSD dataset convention (look at MSD json file)

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import MSDDataset

dataset = MSDDataset(
    data_dir='/abs/path/to/data',
    json_file='/path/to/dataset.json',
    phase='training',
    transform=transform,
)
class eisen.datasets.PatchCamelyon(data_dir, x_h5_file, y_h5_file, mask_h5_file=None, transform=None)[source]

This object implements the capability of reading PatchCamelyon data. Further information about this dataset can be found on the official website https://patchcamelyon.grand-challenge.org/Introduction/

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type PatchCamelyon which will make use of the data in the directory as well as the h5 files that are part of the dataset and make it available to Eisen.

Note

This dataset will generate data entries with keys: ‘image’, ‘label’ and optionally ‘mask’. The generated image and label are tensors.

from eisen.datasets import PatchCamelyon

dset = PatchCamelyon(
    '/data/root/path',
    'camelyon_patch_level_2_split_train_x.h5',
    'camelyon_patch_level_2_split_train_y.h5',
    'camelyon_patch_level_2_split_train_mask.h5'
)
__init__(data_dir, x_h5_file, y_h5_file, mask_h5_file=None, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located

  • x_h5_file (str) – the relative path of the H5 file containing x (the images)

  • y_h5_file (str) – the relative path of the H5 file containing y (the labels)

  • mask_h5_file (str) – the relative path of the H5 file containing masks

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import PatchCamelyon

dset = PatchCamelyon(
    data_dir='/data/root/path',
    x_h5_file='camelyon_patch_level_2_split_train_x.h5',
    y_h5_file='camelyon_patch_level_2_split_train_y.h5',
    mask_h5_file='camelyon_patch_level_2_split_train_mask.h5',
    transform=transform
)
class eisen.datasets.CAMUS(data_dir, with_ground_truth, with_2CH=True, with_4CH=True, with_entire_sequences=False, transform=None)[source]

This object implements the capability of reading CAMUS data. The CAMUS dataset is a dataset of ultrasound images of the heart. Further information about this dataset can be found on the official website https://www.creatis.insa-lyon.fr/Challenge/camus/index.html

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type CAMUS which will make use of the data in the directory and make it available to Eisen.

Note

This dataset will generate data entries with keys: ‘type’, ‘image_2CH’, ‘label_2CH’, ‘sequence_2CH’, ‘image_4CH’, ‘label_4CH’, ‘sequence_4CH’ depending on the selected input parameter configuration. The data generated consists of paths to images and type (string).

from eisen.datasets import CAMUS

dset = CAMUS('/data/root/path', True)
__init__(data_dir, with_ground_truth, with_2CH=True, with_4CH=True, with_entire_sequences=False, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located

  • with_ground_truth (bool) – whether ground truth annotation should be included (won’t work during testing)

  • with_2CH (bool) – whether 2 chambers data should be included (default True)

  • with_4CH (bool) – whether 4 chambers data should be included (default True)

  • with_entire_sequences (bool) – whether the entire sequences for 4CH and 2CH data should be included (default False)

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import CAMUS

dset = CAMUS(
    data_dir='/data/root/path',
    with_ground_truth=True,
    with_2CH=True,
    with_4CH=True,
    with_entire_sequences=False,
    transform=None
)
class eisen.datasets.RSNAIntracranialHemorrhageDetection(data_dir, training, transform=None)[source]

This object implements the capability of reading the Kaggle RSNA Intracranial Hemorrhage Detection dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/overview

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type RSNAIntracranialHemorrhageDetection which will parse said directory and make the data available to Eisen.

Note

This dataset will return data points in form of a dictionary having keys: ‘image’ and during training ‘label’ as well.

from eisen.datasets import RSNAIntracranialHemorrhageDetection

dset = RSNAIntracranialHemorrhageDetection('/data/root/path', True)
__init__(data_dir, training, transform=None)[source]
Parameters
  • data_dir (str) – The dataset root path directory where the challenge dataset is stored

  • training (bool) – Boolean indicating whether training or test data should be loaded

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import RSNAIntracranialHemorrhageDetection

dset = RSNAIntracranialHemorrhageDetection(
    data_dir='/data/root/path',
    training=True
)
class eisen.datasets.RSNABoneAgeChallenge(data_dir, training, transform=None)[source]

This object implements the capability of reading the Kaggle RSNA Bone Age Estimation challenge dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/kmader/rsna-bone-age

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type RSNABoneAgeChallenge which will parse said directory and make the data available to Eisen.

Note

This dataset will return data points as dictionaries having fields: ‘image’, ‘male’ (boolean) and during training ‘label’.

from eisen.datasets import RSNABoneAgeChallenge

dset = RSNABoneAgeChallenge('/data/root/path', True)

__init__(data_dir, training, transform=None)[source]
Parameters
  • data_dir (str) – The dataset root path directory where the challenge dataset is stored

  • training (bool) – Boolean indicating whether training or test data should be loaded

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import RSNABoneAgeChallenge

dset = RSNABoneAgeChallenge(
    data_dir='/data/root/path',
    training=True
)
class eisen.datasets.MedSegCovid19(data_dir, image_file, mask_file=None, transform=None)[source]

This object allows the medical segmentation covid-19 dataset to be easily imported into Eisen. Find more information about this dataset here: http://medicalsegmentation.com/covid19/

In summary, this dataset is a collection of 100 slices of CT images that have been annotated and were made available to the community.

When instantiating this module it is necessary to point it to the nifti image file and, optionally, the segmentation. The first argument is the data base directory. The second and third argument should be strings representing the name of the nifti images of this dataset, and the fourth argument is a transform (or composition of transforms).

Each entry of this dataset after loading will be a dictionary with one (in case only images are loaded) or two (in case both images and labels are loaded) keys. Each key stores a numpy array containing the 2D data relative to one image.

Note

This dataset will generate data entries with keys: ‘image’ and (optionally) ‘label’.

from eisen.datasets import MedSegCovid19

dataset = MedSegCovid19(
    '/abs/path/to/data',
    'tr_im.nii',
    'tr_mask.nii',
    transform,
)
__init__(data_dir, image_file, mask_file=None, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located (results of download)

  • image_file (str) – the name of the nifti file containing the images

  • mask_file (str) – the name of the nifti file containing the masks (optional)

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import MedSegCovid19

dataset = MedSegCovid19(
    data_dir='/abs/path/to/data',
    image_file='tr_im.nii',
    mask_file='tr_mask.nii',
    transform=transform,
)
class eisen.datasets.UCSDCovid19(data_dir, positive_dir, negative_dir, transform=None)[source]

This object allows the UCSD Covid-19 2D dataset to be easily imported into Eisen. Find more information about this dataset here: https://github.com/UCSD-AI4H/COVID-CT. This dataset is meant to be used for classification tasks. It also contains metadata which are currently NOT supported in Eisen.

When instantiating this module it is necessary to point it to two directory names: one containing cases of sick patients, and the other containing images from healthy people (not affected by Covid-19).

The first argument is the data base directory. The second and third argument should be strings representing the name of the two directories relative to the base directory and the fourth argument is a transform (or composition of transforms).

Each entry of this dataset after loading will be a dictionary with two keys. The ‘image’ key stores the path to a png image file, which can be read with the LoadPILImageFromFilename I/O module. The ‘label’ key contains an integer that is 0 for healthy scans and 1 for sick individuals.
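
The two-directory layout described above can be pictured with a short standard-library sketch. The helper list_labelled_images is hypothetical and is not Eisen's implementation; storing paths relative to the base directory is an assumption of this sketch.

```python
import os
import tempfile


def list_labelled_images(data_dir, positive_dir, negative_dir):
    """Hypothetical helper: one entry per file, label 1 for positive, 0 for negative."""
    entries = []
    for rel_dir, label in ((positive_dir, 1), (negative_dir, 0)):
        for name in sorted(os.listdir(os.path.join(data_dir, rel_dir))):
            # Paths are stored relative to data_dir (an assumption of this sketch).
            entries.append({"image": os.path.join(rel_dir, name), "label": label})
    return entries


# Build a tiny throwaway directory tree to run the helper on.
root = tempfile.mkdtemp()
for rel_dir, names in (("positive", ["a.png"]), ("negative", ["b.png", "c.png"])):
    os.makedirs(os.path.join(root, rel_dir))
    for name in names:
        open(os.path.join(root, rel_dir, name), "wb").close()

entries = list_labelled_images(root, "positive", "negative")
```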

Note

This dataset will return data entries in form of a dictionary having fields: ‘image’ and ‘label’

from eisen.datasets import UCSDCovid19

dataset = UCSDCovid19(
    '/abs/path/to/data',
    'positive',
    'negative',
    transform,
)
__init__(data_dir, positive_dir, negative_dir, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located (dataset location after unzipping)

  • positive_dir (str) – relative path of directory containing positive cases

  • negative_dir (str) – relative path of directory containing negative cases

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import UCSDCovid19

dataset = UCSDCovid19(
    data_dir='/abs/path/to/data',
    positive_dir='positive',
    negative_dir='negative',
    transform=transform,
)
class eisen.datasets.PANDA(data_dir, csv_file, training, transform=None)[source]

This object implements the capability of reading PANDA dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/c/prostate-cancer-grade-assessment/overview

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type PANDA which will make use of the data in the directory as well as the csv file that are part of the dataset and make it available to Eisen.

Note

This dataset will return data points in form of a dictionary with fields: ‘image’, ‘provider’ and optionally (during training) ‘mask’, ‘isup’, ‘gleason’.

from eisen.datasets import PANDA

dset = PANDA(
    '/data/root/path',
    'train.csv',
    True
)
__init__(data_dir, csv_file, training, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located

  • csv_file (str) – the relative path of the csv file relative to current task

  • training (bool) – whether the dataset is a training dataset or not

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import PANDA

dset = PANDA(
    data_dir='/data/root/path',
    csv_file='train.csv',
    training=True,
    transform=transform
)
class eisen.datasets.KaggleCovid19(data_dir, csv_file, transform=None)[source]

This object implements the capability of reading Kaggle COVID 19 CT dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/andrewmvd/covid19-ct-scans

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type KaggleCovid19 which will make use of the data in the directory as well as the csv file that are part of the dataset and make it available to Eisen.

Note

This dataset will generate data entries with fields: ‘image’, ‘lung_mask’, ‘infection_mask’, ‘lung_infection_mask’. This data is returned in form of relative paths (to data_dir) of image and mask files.

from eisen.datasets import KaggleCovid19

dset = KaggleCovid19(
    '/data/root/path',
    'metadata.csv',
)
__init__(data_dir, csv_file, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located

  • csv_file (str) – the relative path of the csv file relative to current task

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import KaggleCovid19

dset = KaggleCovid19(
    data_dir='/data/root/path',
    csv_file='metadata.csv',
    transform=transform
)
class eisen.datasets.ABCsDataset(data_dir, training, flat_dir_structure=False, transform=None)[source]

This object allows data from the ABC challenge (2020) to be easily imported into Eisen. More information about the data and challenge can be found at https://abcs.mgh.harvard.edu

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type ABCsDataset which will make use of the directory structure and the descriptive json file included in it and make the data available to Eisen.

For labels and data structure, refer to: https://abcs.mgh.harvard.edu/index.php/data/download/send/3-data-for-abcs/14-readme

Note

This dataset returns the following fields: ‘ct’, ‘t1’, ‘t2’ and ‘label_task1’, ‘label_task2’ when training. The content of these fields consists of paths relative to data_dir, to ct, MR and labels.

Get started code can be found here: https://gist.github.com/faustomilletari/af430acfecf0841d71508455cdadcbbf

from eisen.datasets import ABCsDataset

dataset = ABCsDataset(
    '/abs/path/to/data',
    True,
    False,
    transform,
)
__init__(data_dir, training, flat_dir_structure=False, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located (dataset location after unzipping)

  • training (bool) – whether data relative to the training phase should be loaded

  • flat_dir_structure (bool) – whether data is stored in a directory containing all images (without sub-dirs)

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import ABCsDataset

dataset = ABCsDataset(
    data_dir='/abs/path/to/data',
    training=True,
    flat_dir_structure=False,
    transform=transform,
)
class eisen.datasets.EMIDEC(data_dir, training, transform=None)[source]

This object allows data from the EMIDEC challenge (2020) to be easily imported into Eisen. More information about the data and challenge can be found at http://emidec.com

Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type EMIDEC which will make use of the directory structure and the descriptive json file included in it and make the data available to Eisen.

For what concerns labels and data structure refer to the official website http://emidec.com

Note

This dataset returns the following fields: image, metadata and - during training - pathological and label.

from eisen.datasets import EMIDEC

dataset = EMIDEC(
    '/abs/path/to/data',
    True,
    transform,
)
__init__(data_dir, training, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located (dataset location after unzipping)

  • training (bool) – whether data relative to the training phase should be loaded

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import EMIDEC

dataset = EMIDEC(
    data_dir='/abs/path/to/data',
    training=True,
    transform=transform,
)
class eisen.datasets.Brats2020(data_dir, training, transform=None)[source]

BraTS 2020 challenge dataset. This multi modal brain tumor segmentation and survival prediction dataset contains multi-center and multi-stage MRI images of brain tumors. It contains images obtained via ‘t1’, ‘t1c’, ‘t2’ and ‘flair’ MRI acquisition sequences, and annotations relative to the GD-enhancing tumor (ET — label 4), the peritumoral edema (ED — label 2), and the necrotic and non-enhancing tumor core (NCR/NET — label 1).

Find more info here: https://www.med.upenn.edu/cbica/brats2020/data.html

Note

This dataset will generate data entries with keys: ‘t1’, ‘t1c’, ‘t2’, ‘flair’ and ‘name_mapping’. If the training flag is set during initialization it will also provide ‘label’ and ‘survival_info’. The data in ‘name_mapping’ and ‘survival_info’ is also represented as a dictionary and contains data obtained from the fields (columns) of name_mapping.csv and survival_info.csv.

from eisen.datasets import Brats2020

dset = Brats2020('/data/root/path', True, tform)
__init__(data_dir, training, transform=None)[source]
Parameters
  • data_dir (str) – the base directory where the data is located (after unzipping the archive)

  • training (bool) – whether the labels and survival information should be loaded for training

  • transform (callable) – a transform object (can be the result of a composition of transforms)

from eisen.datasets import Brats2020

dset = Brats2020(
    data_dir='/data/root/path',
    training=True,
    transform=tform
)

I/O

Eisen I/O functionality is contained in this module. I/O functionality is implemented by transforms. That is, this functionality behaves just like any other eisen.transforms module. The only difference is that I/O transforms operate on disk. Another reason for this distinction is that we decided to follow the package structure of torchvision, which has a torchvision.io sub-package.

An example of how I/O functionality can be used to load Nifti data is contained in the Eisen example Colab notebook and it is also shown in compact form here:

from eisen.datasets import MSDDataset
from eisen.io import LoadNiftiFromFilename

my_reader = LoadNiftiFromFilename(['image', 'label'], PATH_DATA)

dataset = MSDDataset(
    PATH_DATA,
    NAME_MSD_JSON,
    'training',
    transform=my_reader
)

Just like regular transforms, the I/O transforms can be called on data. The data needs to be a Python dictionary and the call can be done in this way:

from eisen.io import LoadNiftiFromFilename

# Assuming my_data_dictionary to be a dictionary containing data

my_reader = LoadNiftiFromFilename(['image', 'label'], PATH_DATA)

my_data_dictionary = my_reader(my_data_dictionary)
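
To illustrate the calling convention without any imaging dependency, here is a minimal sketch of an I/O-style transform written in plain Python. The class LoadTextFromFilename is hypothetical and not part of Eisen; it only mimics the pattern of joining data_dir with the stored path and replacing the path with the loaded content.

```python
import os
import tempfile


class LoadTextFromFilename:
    """Hypothetical I/O transform (not part of Eisen): replaces paths with file content."""

    def __init__(self, fields, data_dir):
        self.fields = fields
        self.data_dir = data_dir

    def __call__(self, data):
        for field in self.fields:
            # The stored path is joined with data_dir before reading.
            path = os.path.join(self.data_dir, data[field])
            with open(path) as f:
                data[field] = f.read()
        return data


# Create a small file to read from a throwaway directory.
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "note.txt"), "w") as f:
    f.write("hello")

reader = LoadTextFromFilename(["image"], data_dir)
result = reader({"image": "note.txt", "label": "untouched.txt"})
```

Only the fields listed at construction time are read; other keys of the dictionary pass through unchanged.
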
class eisen.io.LoadITKFromFilename(fields, data_dir)[source]

This transform loads ITK data from filenames contained in specific fields of the data dictionary. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it’s kept separated from the others as it is responsible for I/O operations interacting with the disk.

from eisen.io import LoadITKFromFilename
tform = LoadITKFromFilename(['image', 'label'], '/abs/path/to/dataset')
__init__(fields, data_dir)[source]

LoadITKFromFilename loads ITK compatible files. The data is always read as float32.

Parameters
  • fields (list) – fields of the dictionary containing ITK file paths that need to be read

  • data_dir (str) – source data directory where data is located. This directory will be joined with data paths

from eisen.io import LoadITKFromFilename
tform = LoadITKFromFilename(
    fields=['image', 'label'],
    data_dir='/abs/path/to/dataset'
)
class eisen.io.LoadDICOMFromFilename(fields, data_dir, store_data_array=True)[source]

This transform loads DICOM data from filenames contained in a specific field of the data dictionary. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it’s kept separated from the others as it is responsible for I/O operations interacting with the disk

from eisen.io import LoadDICOMFromFilename
tform = LoadDICOMFromFilename(['image', 'label'], '/abs/path/to/dataset')
__init__(fields, data_dir, store_data_array=True)[source]
Parameters
  • fields (list) – list of names of the field of data dictionary to work on. These fields should contain data paths

  • data_dir (str) – source data directory where data is located. This directory will be joined with data paths

  • store_data_array (bool) – whether image data as numpy array should be stored (in “field” + “_pixel_array”)

from eisen.io import LoadDICOMFromFilename
tform = LoadDICOMFromFilename(
    fields=['image'],
    data_dir='/abs/path/to/dataset',
    store_data_array=True
)
class eisen.io.LoadPILImageFromFilename(fields, data_dir)[source]

This transform loads Images from filenames contained in a specific field of the data dictionary. The images are loaded via Pillow, an imaging library for Python. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it’s kept separated from the others as it is responsible for I/O operations interacting with the disk

from eisen.io import LoadPILImageFromFilename
tform = LoadPILImageFromFilename(['image', 'label'], '/abs/path/to/dataset')
__init__(fields, data_dir)[source]
Parameters
  • fields (list) – list of names of the field of data dictionary to work on. These fields should contain data paths

  • data_dir (str) – source data directory where data is located. This directory will be joined with data paths

from eisen.io import LoadPILImageFromFilename
tform = LoadPILImageFromFilename(
    fields=['image'],
    data_dir='/abs/path/to/dataset'
)
class eisen.io.WriteNiftiToFile(fields, name_fields=None, filename_prefix='./')[source]

This transform writes NIFTI data to a file on disk. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it’s kept separated from the others as it is responsible for I/O operations interacting with the disk.

from eisen.io import WriteNiftiToFile
tform = WriteNiftiToFile(['image', 'label'], filename_prefix='/abs/path/to/filename')
__init__(fields, name_fields=None, filename_prefix='./')[source]
Parameters
  • fields (list) – list of names of the field of data dictionary to work on. These fields should contain data paths

  • filename_prefix (str) – absolute path plus file prefix of output file

from eisen.io import WriteNiftiToFile
tform = WriteNiftiToFile(
    fields=['image', 'label'],
    name_fields=['image_name', 'label_name'],
    filename_prefix='/abs/path/to/dataset'
)

Transforms

Transforms are used to manipulate data in Eisen. Transforms are Python objects that are instantiated through the __init__ method and implement a __call__ method. Transforms can be stacked and composed together using torchvision.transforms.Compose.

The reason for the flexibility and composability of Eisen Transforms, and of PyTorch transforms in general, is that their __call__ method implements a standard interface. In the case of Eisen it is:

def __call__(self, data):
    # Do something
    return data

In this code snippet, data is a Python dictionary that contains the data to be processed by the transforms. Each key in this dictionary is a different data field. For example, in the case of an imaging dataset, keys could be [‘image’, ‘label’]. The transform will operate on one or more keys of the dictionary and return a new dictionary with data updated as a result of the transform.

A transform is therefore implemented as a Python object as demonstrated here:

class Transform:
    def __init__(self):
        # Do something
        pass

    def __call__(self, data):
        # Do something
        return data
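
As a sketch of how this interface enables composition, the example below defines two hypothetical transforms (AddOffset and Scale, neither of which exists in Eisen) and applies them in sequence. The helper apply_in_order is a plain-Python stand-in for a composition utility such as torchvision.transforms.Compose, which likewise calls each transform on the output of the previous one.

```python
class AddOffset:
    """Hypothetical transform: adds a constant to the selected fields."""

    def __init__(self, fields, offset):
        self.fields = fields
        self.offset = offset

    def __call__(self, data):
        for field in self.fields:
            data[field] = data[field] + self.offset
        return data


class Scale:
    """Hypothetical transform: multiplies the selected fields by a factor."""

    def __init__(self, fields, factor):
        self.fields = fields
        self.factor = factor

    def __call__(self, data):
        for field in self.fields:
            data[field] = data[field] * self.factor
        return data


def apply_in_order(transforms, data):
    # Stand-in for a composition: each transform receives the previous output.
    for transform in transforms:
        data = transform(data)
    return data


chain = [AddOffset(["image"], 1.0), Scale(["image"], 2.0)]
result = apply_in_order(chain, {"image": 3.0, "label": 1})
```

Because every transform takes a dictionary and returns a dictionary, the order of the chain is the only thing that determines the result.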

A more concrete example can be seen below. In this example we instantiate a ResampleNiftiVolumes transform which operates on ‘image’ and ‘label’ and imposes a resolution of [1.0, 1.0, 1.0] millimeters on the Nifti images contained in data[‘image’] and data[‘label’]. The interpolation used here is linear.

from eisen.transforms import ResampleNiftiVolumes

resample_tform = ResampleNiftiVolumes(
    ['image', 'label'],
    [1.0, 1.0, 1.0],
    'linear'
)

data = resample_tform(data)

Further documentation of the functionality of the Transforms module is reported below.

Imaging Transforms

Imaging transforms are used to handle 2D and 3D imaging data. These transforms implement basic data manipulation such as resampling, cropping, thresholding, normalizing, etc., which are useful in the context of data pre-processing for deep learning tasks.

class eisen.transforms.imaging.CreateConstantFlags(fields, values)[source]

Transform that creates new fields in the data dictionary containing constants of any type.

from eisen.transforms import CreateConstantFlags
tform = CreateConstantFlags(['my_field', 'my_text'], [42.0, 'hello'])
data = tform(data)
__init__(fields, values)[source]
Parameters
  • fields (list of str) – names of the fields of data dictionary to work on

  • values (list of values) – list of values to add to data

from eisen.transforms import CreateConstantFlags

tform = CreateConstantFlags(
    fields=['my_field', 'my_text'],
    values=[42.0, 'hello']
)
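
The effect on the data dictionary can be pictured with a plain-Python stand-in. The function create_constant_flags below is hypothetical and only mirrors the behavior described above; it is not Eisen's implementation.

```python
def create_constant_flags(data, fields, values):
    """Hypothetical stand-in: store each constant under its field name."""
    for field, value in zip(fields, values):
        data[field] = value
    return data


data = {"image": "/path/to/image1.jpg"}
data = create_constant_flags(data, ["my_field", "my_text"], [42.0, "hello"])
```
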
class eisen.transforms.imaging.RenameFields(fields, new_fields)[source]

Transform that renames fields in the data dictionary.

from eisen.transforms import RenameFields
tform = RenameFields(['old_name1', 'old_name2'], ['new_name1', 'new_name2'])
data = tform(data)
__init__(fields, new_fields)[source]
Parameters
  • fields (list of str) – list of names of the fields of data dictionary to rename

  • new_fields (list of str) – new field names for the data dictionary

from eisen.transforms import RenameFields

tform = RenameFields(
    fields=['old_name1', 'old_name2'],
    new_fields=['new_name1', 'new_name2']
)
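
A plain-Python stand-in shows what renaming means for the dictionary keys. The function rename_fields is hypothetical and assumes every old name is present in the dictionary; how Eisen handles a missing field is not covered here.

```python
def rename_fields(data, fields, new_fields):
    """Hypothetical stand-in: move each value from its old key to the new key."""
    for old_name, new_name in zip(fields, new_fields):
        data[new_name] = data.pop(old_name)
    return data


data = {"old_name1": 1, "old_name2": 2, "other": 3}
data = rename_fields(data, ["old_name1", "old_name2"], ["new_name1", "new_name2"])
```
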
class eisen.transforms.imaging.FilterFields(fields)[source]

Transform that retains in the data dictionary only the fields listed in the init argument.

from eisen.transforms import FilterFields
tform = FilterFields(['field1', 'field2'])
tform = tform(data)

The resulting data dictionary will only have ‘field1’ and ‘field2’ as keys.

__init__(fields)[source]
Parameters

fields (list of str) – list of fields to KEEP after the transform

from eisen.transforms import FilterFields
tform = FilterFields(fields=['field1', 'field2'])
data = tform(data)
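
The three dictionary-level transforms documented above (CreateConstantFlags, RenameFields and FilterFields) can be pictured as ordinary dictionary operations. The following is a sketch of their described effect using only built-in types; it does not call Eisen, and the field names are made up for the example.

```python
data = {"image": "img.nii", "label": "lbl.nii", "scratch": 0}

# Effect described for CreateConstantFlags(['my_field'], [42.0]):
# each listed field is created and set to the matching constant.
for field, value in zip(["my_field"], [42.0]):
    data[field] = value

# Effect described for RenameFields(['label'], ['mask']):
# the value moves from the old key to the new key.
for old_name, new_name in zip(["label"], ["mask"]):
    data[new_name] = data.pop(old_name)

# Effect described for FilterFields(['image', 'mask']):
# only the listed keys survive.
keep = ["image", "mask"]
data = {key: data[key] for key in keep}
```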
class eisen.transforms.imaging.ResampleNiftiVolumes(fields, resolution, interpolation='linear')[source]

Transform resampling Nifti volumes to a new resolution (expressed in millimeters). This transform can only be applied to fields of the data dictionary containing objects of type Nifti (nibabel).

from eisen.transforms import ResampleNiftiVolumes
tform = ResampleNiftiVolumes(['nifti_data'], [1.0, 1.0, 1.0], 'linear')
data = tform(data)
__init__(fields, resolution, interpolation='linear')[source]
Parameters
  • fields (list of str) – list of names of the fields of data dictionary to work on

  • resolution (list of float) – vector of float values expressing desired resolution in mm

  • interpolation (string) – interpolation strategy to use

from eisen.transforms import ResampleNiftiVolumes
tform = ResampleNiftiVolumes(
    fields=['nifti_data'],
    resolution=[1.0, 1.0, 1.0],
    interpolation='linear'
)
class eisen.transforms.imaging.ResampleITKVolumes(fields, resolution, interpolation='linear')[source]

Transform resampling ITK volumes to a new resolution (expressed in millimeters). This transform can only be applied to fields of the data dictionary containing objects of type ITK (SimpleITK).

from eisen.transforms import ResampleITKVolumes
tform = ResampleITKVolumes(['itk_data'], [1.0, 1.0, 1.0], 'linear')
data = tform(data)
__init__(fields, resolution, interpolation='linear')[source]
Parameters
  • fields (list of str) – list of names of the fields of data dictionary to work on

  • resolution (list of float) – vector of float values expressing desired resolution in mm

  • interpolation (string) – interpolation strategy to use

from eisen.transforms import ResampleITKVolumes
tform = ResampleITKVolumes(
    fields=['itk_data'],
    resolution=[1.0, 1.0, 1.0],
    interpolation='linear'
)
class eisen.transforms.imaging.NiftiToNumpy(fields, multichannel=False)[source]

This transform converts a Nifti volume to Numpy format. It is necessary to have this transform at some point in every transformation chain that handles Nifti data, as PyTorch uses data in Numpy format before converting it to PyTorch Tensors.

from eisen.transforms import NiftiToNumpy
tform = NiftiToNumpy(['image', 'label'])
data = tform(data)
__init__(fields, multichannel=False)[source]
Parameters
  • fields (list of str) – list of names of the fields of data dictionary to convert from Nifti to Numpy

  • multichannel (bool) – need to set this parameter to True if data is multichannel

from eisen.transforms import NiftiToNumpy
tform = NiftiToNumpy(fields=['image', 'label'])
data = tform(data)
class eisen.transforms.imaging.ITKToNumpy(fields, multichannel=False)[source]

This transform converts an ITK volume to Numpy format. It is necessary to have this transform at some point in every transformation chain that handles ITK data, as PyTorch uses data in Numpy format before converting it to PyTorch Tensors.

from eisen.transforms import ITKToNumpy
tform = ITKToNumpy(['image', 'label'])
data = tform(data)
__init__(fields, multichannel=False)[source]
Parameters
  • fields (list of str) – list of names of the fields of data dictionary to convert from ITK to Numpy

  • multichannel (bool) – need to set this parameter to True if data is multichannel

from eisen.transforms import ITKToNumpy
tform = ITKToNumpy(fields=['image', 'label'], multichannel=False)
data = tform(data)
class eisen.transforms.imaging.PilToNumpy(fields, multichannel=False)[source]

This transform converts a PIL image to Numpy format. It is necessary to have this transform at some point in every transformation chain that handles PIL images, as PyTorch uses data in Numpy format before converting it to PyTorch Tensors.

from eisen.transforms import PilToNumpy
tform = PilToNumpy(['image', 'label'])
data = tform(data)
__init__(fields, multichannel=False)[source]
Parameters
  • fields (list of str) – list of names of the fields of data dictionary to convert from PIL to Numpy

  • multichannel (bool) – need to set this parameter to True if data is multichannel

from eisen.transforms import PilToNumpy
tform = PilToNumpy(fields=['image', 'label'])
data = tform(data)
class eisen.transforms.imaging.MapValues(fields, min_value=0, max_value=1, channelwise=False)[source]

Transform implementing normalization by standardizing the range of data to a known interval. The formula used here subtracts the minimum value from each data tensor and divides the result by the tensor's range (maximum minus minimum). After that the tensor is multiplied by max_value.

from eisen.transforms import MapValues
tform = MapValues(['image'], 0, 10)
data = tform(data)

This is a usage example in which data is normalized to fit the range [0, 10].

__init__(fields, min_value=0, max_value=1, channelwise=False)[source]
Parameters
  • fields (list of str) – list of fields of the data dictionary that will be affected by this transform

  • min_value (float) – minimum desired data value

  • max_value (float) – maximum desired data value

  • channelwise (bool) – whether the transformation should be applied to each channel separately

from eisen.transforms import MapValues
tform = MapValues(
    fields=['image'],
    min_value=0,
    max_value=1,
    channelwise=False
)
data = tform(data)
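
The normalization described for MapValues can be sketched on a flat list of numbers. This is a minimal illustration of the stated formula for the case min_value=0 only; how a non-zero min_value or the channelwise option enters the computation is not covered here, and map_values is a hypothetical helper, not an Eisen function.

```python
def map_values(values, max_value=1.0):
    """Rescale a flat list of numbers to [0, max_value]."""
    lowest = min(values)
    value_range = max(values) - lowest
    if value_range == 0:
        # Constant input: avoid dividing by zero (a choice made for this sketch).
        return [0.0 for _ in values]
    # Subtract the minimum, divide by the range, multiply by max_value.
    return [(v - lowest) / value_range * max_value for v in values]


scaled = map_values([2.0, 4.0, 6.0], max_value=10.0)
```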
class eisen.transforms.imaging.ThresholdValues(fields, threshold, direction='greater')[source]

This transformation thresholds the values contained in a tensor. Depending on a parameter supplied by the user, all the values greater than, smaller than, greater than or equal to, or smaller than or equal to a certain threshold are set to one, while the others are set to zero.

from eisen.transforms import ThresholdValues
tform = ThresholdValues(['label'], 0.5, 'greater')
data = tform(data)

This example thresholds the values of the tensor stored under the key ‘label’ such that those greater than 0.5 are set to one and all the others are set to zero.

__init__(fields, threshold, direction='greater')[source]
Parameters
  • fields (list of str) – list of fields of the data dictionary that will be affected by this transform

  • threshold (float) – threshold value for the transform

  • direction (string) – direction of the comparison between the values and the threshold. Possible values are: greater, smaller, greater/equal, smaller/equal

from eisen.transforms import ThresholdValues
tform = ThresholdValues(
    fields=['image'],
    threshold=0,
    direction='greater/equal'
)
data = tform(data)
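
The four documented values of direction map naturally onto the standard comparison operators. The sketch below illustrates that mapping on a flat list; threshold_values is a hypothetical helper and not Eisen's implementation.

```python
import operator

# One comparison function per documented value of `direction`.
COMPARISONS = {
    "greater": operator.gt,
    "smaller": operator.lt,
    "greater/equal": operator.ge,
    "smaller/equal": operator.le,
}


def threshold_values(values, threshold, direction="greater"):
    """Return 1 where the comparison holds and 0 elsewhere."""
    compare = COMPARISONS[direction]
    return [1 if compare(v, threshold) else 0 for v in values]


strict = threshold_values([0.2, 0.5, 0.9], 0.5, "greater")
inclusive = threshold_values([0.2, 0.5, 0.9], 0.5, "greater/equal")
```

The only difference between the strict and inclusive variants is how a value exactly equal to the threshold is treated.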
class eisen.transforms.imaging.AddChannelDimension(fields)[source]

This transformation adds a “channel dimension” to a tensor. Since Eisen uses an NCHWD representation for its data, with channels first, this transform creates a new axis as the first dimension of the resulting data tensor.

from eisen.transforms import AddChannelDimension
tform = AddChannelDimension(['image', 'label'])
data = tform(data)

This example adds a singleton dimension to the data stored under the keys ‘image’ and ‘label’ of the data dictionary.

__init__(fields)[source]
Parameters

fields (list of str) – list of fields of the data dictionary that will be affected by this transform

from eisen.transforms import AddChannelDimension
tform = AddChannelDimension(
    fields=['image', 'label']
)
data = tform(data)
class eisen.transforms.imaging.StackImagesChannelwise(fields, dst_field, create_new_dim=True)[source]

This transform allows stacking together different tensors of the same size stored at different fields of the data dictionary. The tensors are stacked along the channel dimension. The resulting tensor is therefore multi-channel and contains data from all the fields passed as argument by the user.

from eisen.transforms import StackImagesChannelwise
tform = StackImagesChannelwise(['modality1', 'modality2', 'modality3'], 'allmodalities')
data = tform(data)

This example stacks together multiple modalities in one multi-channel tensor.

__init__(fields, dst_field, create_new_dim=True)[source]
Parameters
  • fields (list of str) – list of fields of the data dictionary that will be stacked together in the output tensor

  • dst_field (str) – string representing the destination field of the data dictionary where outputs will be stored.

  • create_new_dim (bool) – whether a new dimension should be created as a result of the concatenation.

from eisen.transforms import StackImagesChannelwise
tform = StackImagesChannelwise(
    fields=['modality1', 'modality2', 'modality3'],
    dst_field='allmodalities',
    create_new_dim=True
)
data = tform(data)
class eisen.transforms.imaging.CropCenteredSubVolumes(fields, size)[source]

Transform implementing padding/cropping of the last 3 dimensions of an N-channel 3D volume. A 3D volume processed with this transform will be cropped or padded so that its final size corresponds to the size specified by the user during instantiation.

from eisen.transforms import CropCenteredSubVolumes
tform = CropCenteredSubVolumes(['image', 'label'], [128, 128, 128])
data = tform(data)

This example crops the content of the data dictionary at keys ‘image’ and ‘label’ (which need to be N-channel 3D numpy volumes) to a size of 128 × 128 × 128 voxels.

__init__(fields, size)[source]
Parameters
  • fields (list of str) – field of the data dictionary to modify and replace with cropped volumes

  • size (list of int) – list of 3 integers expressing the desired size of the cropped volumes

from eisen.transforms import CropCenteredSubVolumes
tform = CropCenteredSubVolumes(
    fields=['image', 'label'],
    size=[128, 128, 128]
)
data = tform(data)
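
The crop-or-pad behaviour can be illustrated along a single axis. The sketch below is a one-dimensional analogue on a plain list; the rounding used when the size difference is odd, and the padding value, are assumptions of this example and may not match what CropCenteredSubVolumes does.

```python
def center_crop_or_pad(values, size, fill=0):
    """Return `values` cropped or padded around its center to length `size`."""
    length = len(values)
    if length >= size:
        # Crop: drop the same amount on both sides (extra element from the end).
        start = (length - size) // 2
        return values[start:start + size]
    # Pad: split the missing elements between both sides.
    missing = size - length
    before = missing // 2
    after = missing - before
    return [fill] * before + values + [fill] * after


cropped = center_crop_or_pad([1, 2, 3, 4, 5, 6], 4)
padded = center_crop_or_pad([1, 2], 5)
```

A 3D version would apply the same rule independently to each of the last three dimensions.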
class eisen.transforms.imaging.LabelMapToOneHot(fields, classes)[source]

This transformation converts labels having integer values to one-hot labels. In other words, a single channel tensor data containing integer values representing classes is converted to a corresponding multi-channel tensor data having one-hot entries channel-wise. Each channel corresponds to a class.

from eisen.transforms import LabelMapToOneHot
tform = LabelMapToOneHot(['label'], [1, 2, 25, 3])
data = tform(data)

This example converts the single channel data[‘label’] tensor to a 4-channel tensor where each entry represents the corresponding entry of the original tensor in one-hot encoding.

__init__(fields, classes)[source]
Parameters
  • fields (list of str) – list of fields of the data dictionary that will be affected by this transform

  • classes (list of int) – list of class identifiers (integers) to be converted to one-hot representation

from eisen.transforms import LabelMapToOneHot
tform = LabelMapToOneHot(
    fields=['label'],
    classes=[1, 2, 25, 3]
)
data = tform(data)
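
The one-hot conversion can be sketched on a flat list of labels: each class identifier yields one binary channel. label_map_to_one_hot is a hypothetical helper used only for illustration; in this sketch a label that is not listed in classes yields zeros in every channel, which is an assumption rather than documented behaviour.

```python
def label_map_to_one_hot(labels, classes):
    """Return one binary channel per class identifier."""
    return [[1 if value == cls else 0 for value in labels] for cls in classes]


# Channels follow the order of `classes`, not the numeric order of the ids.
one_hot = label_map_to_one_hot([1, 25, 3, 2, 0], classes=[1, 2, 25, 3])
```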
class eisen.transforms.imaging.FixedMeanStdNormalization(fields, mean, std)[source]

This transform subtracts the mean from data tensors and divides them by the standard deviation. The values for mean and standard deviation need to be provided by the user.

from eisen.transforms import FixedMeanStdNormalization
tform = FixedMeanStdNormalization(['image'], 0.5, 1.2)
data = tform(data)

This example manipulates the data stored in data[‘image’] by subtracting the mean (0.5) and dividing by the standard deviation (1.2).

__init__(fields, mean, std)[source]
Parameters
  • fields (list of str) – list of fields of the data dictionary that will be affected by this transform

  • mean (float) – float value representing the mean. This value will be subtracted from the data

  • std (float) – float value representing the standard deviation. The data will be divided by this value.

from eisen.transforms import FixedMeanStdNormalization
tform = FixedMeanStdNormalization(
    fields=['image'],
    mean=0.5,
    std=1.2
)
data = tform(data)
class eisen.transforms.imaging.RepeatTensor(fields, reps)[source]

This transform repeats tensors “reps” times on each axis according to user parameters.

from eisen.transforms import RepeatTensor
tform = RepeatTensor(['image'], (10, 1, 1))
data = tform(data)

This example repeats the tensor 10 times along the first axis and leaves the other axes unchanged (a repetition count of 1).

__init__(fields, reps)[source]
Parameters
  • fields (list of str) – list of fields of the data dictionary that will be affected by this transform

  • reps (list of int) – list of integers representing repetitions along each axis

from eisen.transforms import RepeatTensor
tform = RepeatTensor(
    fields=['image'],
    reps=(10, 1, 2)
)
data = tform(data)
class eisen.transforms.imaging.NumpyToNifti(fields, affine=None, data_types=None)[source]

This transform converts a Numpy volume to a Nifti image object (nibabel). This transformation may be useful when writing Numpy arrays to disk in Nifti format using the WriteNiftiToFilename I/O transform. Note: the transform currently does not explicitly handle multichannel data.

from eisen.transforms import NumpyToNifti
tform = NumpyToNifti(['image', 'label'])
data = tform(data)
__init__(fields, affine=None, data_types=None)[source]
Parameters
  • fields (list of str) – list of names of the fields of data dictionary to convert from Numpy to Nifti

  • affine (np.ndarray) – affine transformation matrix (see nibabel) specifying spacing, origin, and orientation

  • data_types (dict) – dictionary in which key is the field and value is the output data type

import numpy as np
from eisen.transforms import NumpyToNifti
tform = NumpyToNifti(
    fields=['image', 'label'],
    affine=np.eye(4),
    data_types={'image': np.float32, 'label': np.float32}
)
data = tform(data)

Models

In order to favor reproducibility of deep learning approaches and easy benchmarking, as well as to provide “starter-kit” tools to users approaching a certain problem for the first time, we include several well-known neural network architectures within Eisen. This is similar to the approach taken by torchvision, which ships network architectures for classification, segmentation and beyond within the package.

Models can be used in any custom code, with or without the rest of the functionality provided by Eisen. In this sense, they are standard torch.nn.Module objects.

Models can be used within Eisen workflows (see below) after wrapping them via EisenModuleWrapper which is available in the eisen.utils submodule. This also allows third party models, such as those available in torchvision, to be used within Eisen.

Segmentation Models

Several models for segmentation are already included in Eisen. These approaches have been successfully used in several academic works. Refer to the related publications to obtain more information.

class eisen.models.segmentation.UNet3D(input_channels, output_channels, n_filters=16, outputs_activation='sigmoid', normalization='groupnorm')[source]
__init__(input_channels, output_channels, n_filters=16, outputs_activation='sigmoid', normalization='groupnorm')[source]
Parameters
  • input_channels (int) – number of input channels

  • output_channels (int) – number of output channels

  • n_filters (int) – number of filters

  • outputs_activation (str) – output activation type either sigmoid, softmax or none

  • normalization (str) – normalization either groupnorm, batchnorm or none

forward(x)[source]

Computes output of the network.

Parameters

x (torch.Tensor) – Input tensor containing images

Returns

prediction

class eisen.models.segmentation.UNet(input_channels, output_channels, n_filters=64, bilinear=False, outputs_activation='sigmoid', normalization='groupnorm')[source]
__init__(input_channels, output_channels, n_filters=64, bilinear=False, outputs_activation='sigmoid', normalization='groupnorm')[source]
Parameters
  • input_channels (int) – number of input channels

  • output_channels (int) – number of output channels

  • n_filters (int) – number of filters

  • outputs_activation (str) – output activation type either sigmoid, softmax or none

  • normalization (str) – normalization either groupnorm, batchnorm or none

forward(x)[source]

Computes output of the network.

Parameters

x (torch.Tensor) – Input tensor containing images

Returns

prediction

class eisen.models.segmentation.VNet(input_channels=3, output_channels=2, n_filters=16, filter_size=3, normalization='none', outputs_activation='sigmoid')[source]
__init__(input_channels=3, output_channels=2, n_filters=16, filter_size=3, normalization='none', outputs_activation='sigmoid')[source]
Parameters
  • input_channels (int) – number of input channels

  • output_channels (int) – number of output channels

  • n_filters (int) – number of filters

  • filter_size (int) – spatial size of the filters

  • normalization (str) – normalization either groupnorm, batchnorm, instancenorm or none

  • outputs_activation (str) – output activation. either sigmoid, softmax or none

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class eisen.models.segmentation.ObeliskMIDL(num_labels, full_res, outputs_activation='sigmoid')[source]
__init__(num_labels, full_res, outputs_activation='sigmoid')[source]
Parameters
  • num_labels (int) – number of output channels

  • full_res (list) – final resolution (sizes)

  • outputs_activation (str) – output activation type either sigmoid, softmax or none

forward(images, sample_grid=None)[source]

Computes output of the Obelisk network.

Parameters
  • images (torch.Tensor) – Input tensor containing images

  • sample_grid (torch.Tensor) – Optional parameter, sampling grid. can be obtained via F.affine_grid(…)

Returns

prediction

class eisen.models.segmentation.HighRes2DNet(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]
__init__(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]
Parameters
  • input_channels (int) – number of input channels

  • output_channels (int) – number of output channels

  • initial_out_channels_power (int) – initial output channels power

  • outputs_activation (str) – output activation type either sigmoid, softmax or none

forward(x)

Computes output of the network.

Parameters

x (torch.Tensor) – Input tensor containing images

Returns

prediction

class eisen.models.segmentation.HighRes3DNet(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]
__init__(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]
Parameters
  • input_channels (int) – number of input channels

  • output_channels (int) – number of output channels

  • initial_out_channels_power (int) – initial output channels power

  • outputs_activation (str) – output activation type either sigmoid, softmax or none

forward(x)

Computes output of the network.

Parameters

x (torch.Tensor) – Input tensor containing images

Returns

prediction

Ops

Eisen includes various operations that are useful when developing deep learning models. The operations are always implemented in PyTorch and derived from the class torch.nn.Module as suggested by the PyTorch documentation itself. Eisen contains implementations of layers, metrics and losses. The loss and metric implementations include methods such as the Dice loss, which are especially useful in tasks belonging to the medical domain.

Losses

When optimizing neural network parameters during training, it is crucial to specify a suitable loss that pushes the network towards solving the problem at hand. Eisen includes losses that can be optimized during training and computed during validation.

class eisen.ops.losses.DiceLoss(weight=1.0, dim=None)[source]

Dice loss is often used in segmentation tasks to optimize the overlap between the ground truth contour and the prediction. Dice loss is robust to class imbalance and therefore suitable to segment small foreground regions in images or volumes.

This version of the Dice loss supports multi-class segmentation (although in a naive manner).

__init__(weight=1.0, dim=None)[source]
Parameters

weight (float) – absolute weight of this loss

forward(predictions, labels)[source]

Computes Dice loss between predictions and labels.

Parameters
  • predictions (torch.Tensor) – Predictions by the neural network

  • labels – Ground truth annotation from dataset

Returns

Dice loss
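
The overlap measure behind the Dice loss can be sketched on flat binary masks. The snippet uses the standard Dice coefficient, 2|A∩B| / (|A| + |B|), and turns it into a loss as one minus the coefficient, which is a common convention. Whether DiceLoss uses exactly this form (for example with a smoothing term, or how weight and dim enter) is not stated here, so this is an illustration rather than Eisen's implementation.

```python
def dice_coefficient(predictions, labels):
    """Dice overlap between two flat binary masks of equal length."""
    intersection = sum(p * l for p, l in zip(predictions, labels))
    total = sum(predictions) + sum(labels)
    if total == 0:
        # Both masks empty: treated as perfect overlap in this sketch.
        return 1.0
    return 2.0 * intersection / total


dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
loss = 1.0 - dice

# Adding a large, correctly predicted background does not change the score,
# which is the sense in which Dice is robust to class imbalance.
background = [0] * 1000
dice_with_background = dice_coefficient(
    [1, 1, 0, 0] + background, [1, 0, 0, 0] + background
)
```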

Metrics

Benchmarking, testing and validating models often requires computing metrics that can give an estimate of the performance of the network on the problem at hand. Eisen includes metrics modules that can be computed during training, validation and testing.

class eisen.ops.metrics.DiceMetric(weight=1.0, dim=None)[source]

The Dice coefficient is often used in segmentation tasks to evaluate the performance of algorithms by providing a scalar result expressing the amount of overlap between the ground truth contour and the prediction. The Dice coefficient is robust to class imbalance and therefore suitable to evaluate small foreground regions in images or volumes.

This version of the Dice metric supports multi-class segmentation (although in a naive manner).

__init__(weight=1.0, dim=None)[source]
Parameters

weight (float) – absolute weight of this metric

forward(predictions, labels)[source]

Computes Dice metric between predictions and labels.

Parameters
  • predictions (torch.Tensor) – Predictions by the neural network

  • labels – Ground truth annotation from dataset

Returns

Dice metric

Workflows

Workflows realize high-level functionality that joins several building blocks such as losses, metrics, transforms, optimizers and models together in order to perform operations on them.

Warning

Eisen-Core versions after 0.0.5 (Eisen versions after 0.1.6), as well as current versions installed from the GitHub repository, introduce breaking changes to workflows and wrappers. Model or data parallelism needs to be taken care of before passing the model to the workflow. This documentation illustrates the most recent way of using workflows.

Workflows possess unique IDs. It is possible to retrieve this ID as workflow.id, where workflow is a Workflow instance.

Typical examples of workflows are Training, Testing and Validation. These basic workflows implement respectively the training, testing and validation loops. Workflows are implemented as python objects similar to this:

from eisen.utils.workflows import GenericWorkflow

class DemoWorkflow(GenericWorkflow):
    def __init__(
        self,
        model,
        data_loader,
        losses,
        optimizer,
        metrics
    ):
        # ... (body omitted)
        pass

In particular, the arguments of the __init__ function represent building blocks that need to be used together during the workflow.

Refer to the following documentation to learn more.

Hooks

Eisen hooks are triggered when specific events are generated by Eisen workflows. These hooks are python modules implementing things such as logging, model serialization and summary export, which are executed at the end of an epoch or upon detection of a model exhibiting superior performance with respect to losses or metrics.

Hooks are designed to process output_dictionaries obtained from workflows. These output dictionaries are generated and aggregated across an epoch during execution of the workflow. They are ultimately sent to all the hooks listening to events originated from each specific workflow.

The supported events are:

  • eisen.EISEN_END_BATCH_EVENT

  • eisen.EISEN_END_EPOCH_EVENT

  • eisen.EISEN_BEST_MODEL_LOSS

  • eisen.EISEN_BEST_MODEL_METRIC

Hooks only listen to events generated by the workflows that they are monitoring. Workflows are identified by their unique ID. Hooks require at least three parameters to be instantiated:

  • workflow_id

  • phase

  • artifacts_dir

It is possible to associate an arbitrary number of hooks to each workflow. They will be executed in sequence. An example of hook instantiation is shown here:

from eisen.utils.logging import LoggingHook
from eisen.utils.logging import TensorboardSummaryHook

# let us suppose a workflow has been already instantiated

first_hook = LoggingHook(workflow.id, 'Training', './')
second_hook = TensorboardSummaryHook(workflow.id, 'Training', './')

The two hooks in this example, first_hook and second_hook will listen for events such as EISEN_END_EPOCH_EVENT in order to execute their actions on the workflow’s output_dictionary. The typical content (keys) of this dictionary is:

  • losses which contains a list of the losses computed during a batch or epoch

  • metrics which contains a list of the metrics computed during a batch or epoch

  • inputs which contains a dictionary of most recent inputs fed to the network

  • outputs which contains a dictionary of most recent outputs produced by the network

  • model which contains a pointer (not a deep copy) to the current model

  • epoch which is an integer representing the current epoch number
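
The interplay between workflow IDs, events and the output dictionary can be sketched with a tiny event registry. Everything in this snippet is hypothetical and self-contained: the registry, the END_EPOCH name and AverageLossHook are stand-ins written for illustration, losses are simplified to a list of floats, and none of it reflects how Eisen dispatches events internally.

```python
from collections import defaultdict

END_EPOCH = "end_epoch"  # hypothetical event name for this sketch

# Maps (event, workflow_id) to the callbacks listening for it.
listeners = defaultdict(list)


def connect(event, workflow_id, callback):
    listeners[(event, workflow_id)].append(callback)


def emit(event, workflow_id, output_dictionary):
    # Callbacks run in the order in which they were connected.
    for callback in listeners[(event, workflow_id)]:
        callback(output_dictionary)


class AverageLossHook:
    """Hypothetical hook recording the mean loss of each finished epoch."""

    def __init__(self, workflow_id, phase, artifacts_dir):
        self.phase = phase
        self.artifacts_dir = artifacts_dir
        self.history = []
        connect(END_EPOCH, workflow_id, self.end_epoch)

    def end_epoch(self, output_dictionary):
        losses = output_dictionary["losses"]
        mean_loss = sum(losses) / len(losses)
        self.history.append((output_dictionary["epoch"], mean_loss))


hook = AverageLossHook("workflow-a", "Training", "./")
emit(END_EPOCH, "workflow-a", {"losses": [1.0, 3.0], "metrics": [], "epoch": 0})
# Events from a different workflow ID are not delivered to this hook.
emit(END_EPOCH, "workflow-b", {"losses": [9.0], "metrics": [], "epoch": 0})
```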

Refer to the following documentation to learn more.

Logging hooks

These hooks are mainly used to monitor training. They are able to extract information such as losses, metrics and examples of inputs and outputs and display them to the user.

class eisen.utils.logging.LoggingHook(workflow_id, phase, artifacts_dir)[source]

Logging object aiming at printing on the console the progress of model training/validation/testing. This logger uses an event based system. The training, validation and test workflows emit events such as EISEN_END_BATCH_EVENT and EISEN_END_EPOCH_EVENT which are picked up by this object and handled.

Once the user instantiates such an object, the workflow corresponding to the ID passed as argument will be tracked and the results of the workflow in terms of losses and metrics will be printed on the console.

from eisen.utils.logging import LoggingHook

workflow = ...  # placeholder: e.g. an instance of the Training workflow

logger = LoggingHook(workflow.id, 'Training', '/artifacts/dir')
__init__(workflow_id, phase, artifacts_dir)[source]
Parameters
  • workflow_id (UUID) – string containing the workflow id of the workflow being monitored (workflow_instance.id)

  • phase (str) – string containing the name of the phase (training, testing, …) of the workflow monitored

  • artifacts_dir (str) – The path of the directory where the artifacts of the workflow are stored

from eisen.utils.logging import LoggingHook

workflow = ...  # placeholder: e.g. an instance of the Training workflow

logger = LoggingHook(
    workflow_id=workflow.id,
    phase='Training',
    artifacts_dir='/artifacts/dir'
)
class eisen.utils.logging.TensorboardSummaryHook(workflow_id, phase, artifacts_dir, comparison_pairs=None, show_all_axes=False)[source]

Logging object allowing Tensorboard summaries to be automatically exported to the tensorboard. Much of its functionality is automated. This means that the hook will export as much information as possible to the tensorboard.

Losses, Metrics, Inputs and Outputs are all interpreted and exported according to their dimensionality. Vectors result in mean and standard deviation estimates as well as histograms; pictures result in image summaries and histograms; etc.

There is also the possibility of comparing pairs of inputs and outputs. This needs to be specified during object instantiation.

Once the user instantiates this object, the workflow corresponding to the ID passed as argument will be tracked and the results of the workflow will be exported to the tensorboard.

from eisen.utils.logging import TensorboardSummaryHook

workflow = ...  # placeholder: e.g. an instance of the Training workflow

logger = TensorboardSummaryHook(workflow.id, 'Training', '/artifacts/dir')
__init__(workflow_id, phase, artifacts_dir, comparison_pairs=None, show_all_axes=False)[source]

This method instantiates an object of type TensorboardSummaryHook. The signature of this method is similar to that of every other hook. There is one additional parameter called comparison_pairs which is meant to hold a list of lists each containing a pair of input/output names that share the same dimensionality and can be compared to each other.

A typical use of comparison_pairs is when users want to plot a pr_curve or a confusion matrix by comparing some input with some output. Eg. by comparing the labels with the predictions.

from eisen.utils.logging import TensorboardSummaryHook

workflow = ...  # placeholder: e.g. an instance of the Training workflow

logger = TensorboardSummaryHook(
    workflow_id=workflow.id,
    phase='Training',
    artifacts_dir='/artifacts/dir',
    comparison_pairs=[['labels', 'predictions']]
)
Parameters
  • workflow_id (UUID) – string containing the workflow id of the workflow being monitored (workflow_instance.id)

  • phase (str) – string containing the name of the phase (training, testing, …) of the workflow monitored

  • artifacts_dir (str) – the path of the directory where the artifacts of the workflow are stored

  • comparison_pairs (list of lists of strings) – list of lists of pairs, which are names of inputs and outputs to be compared directly

  • show_all_axes (bool) – whether any volumetric data should be shown as axial + sagittal + coronal

Artifacts generation hooks

These hooks are used to save artifacts as a result of a workflow. They save model snapshots in different formats. Note that the snapshots are saved as unwrapped torch.nn.Module objects (they do not belong to the EisenModuleWrapper class).

class eisen.utils.artifacts.SaveTorchModelHook(workflow_id, phase, artifacts_dir, select_best_loss=True, save_history=False)[source]

Saves a Torch model snapshot of the current best model. The best model can be selected based on the best average loss or the best average metric. It is possible to save the whole history of best models seen throughout the workflow.

from eisen.utils.artifacts import SaveTorchModelHook

workflow = ...  # placeholder: e.g. an instance of the Validation workflow

saver = SaveTorchModelHook(workflow.id, 'Validation', '/my/artifacts')
__init__(workflow_id, phase, artifacts_dir, select_best_loss=True, save_history=False)[source]
Parameters
  • workflow_id (UUID) – the ID of the workflow that should be tracked by this hook

  • phase (str) – the phase where this hook is being used (Training, Testing, etc.)

  • artifacts_dir (str) – the path of the artifacts where the results of this hook should be stored

  • select_best_loss (bool) – whether the criterion for saving the model should be best loss or best metric

  • save_history (bool) – whether the history of all models that were at a certain point the best should be saved

from eisen.utils.artifacts import SaveTorchModelHook

workflow = ...  # placeholder: e.g. an instance of the Validation workflow

saver = SaveTorchModelHook(
    workflow_id=workflow.id,
    phase='Validation',
    artifacts_dir='/my/artifacts',
    select_best_loss=True,
    save_history=False
)
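
The meaning of select_best_loss and save_history can be illustrated with a small bookkeeping class. BestModelTracker is a hypothetical helper written for this sketch: it only records which snapshots would be kept, it assumes that lower losses and higher metrics are better, and it does not reflect where or how Eisen performs the comparison.

```python
class BestModelTracker:
    """Hypothetical helper mirroring select_best_loss / save_history."""

    def __init__(self, select_best_loss=True, save_history=False):
        self.select_best_loss = select_best_loss
        self.save_history = save_history
        self.best = None
        self.saved = []  # labels of the snapshots that would be kept

    def update(self, epoch, avg_loss, avg_metric):
        # Assumption: lower is better for losses, higher for metrics.
        score = -avg_loss if self.select_best_loss else avg_metric
        if self.best is not None and score <= self.best:
            return False
        self.best = score
        label = "epoch_{}".format(epoch)
        if self.save_history:
            self.saved.append(label)  # keep every model that was once best
        else:
            self.saved = [label]  # keep only the current best
        return True


tracker = BestModelTracker(select_best_loss=True, save_history=False)
results = [
    tracker.update(epoch, loss, 0.0)
    for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.5])
]
```

With save_history=True the same sequence of losses would keep the snapshots of epochs 0, 1 and 3 instead of only the last one.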
class eisen.utils.artifacts.SaveONNXModelHook(workflow_id, phase, artifacts_dir, input_size, select_best_loss=True, save_history=False)[source]

Saves an ONNX model snapshot of the current best model. The best model can be selected based on the best average loss or the best average metric. It is possible to save the whole history of best models seen throughout the workflow.

from eisen.utils.artifacts import SaveONNXModelHook

workflow = ...  # placeholder: e.g. an instance of the Validation workflow

saver = SaveONNXModelHook(workflow.id, 'Validation', '/my/artifacts', [1, 3, 224, 224])
__init__(workflow_id, phase, artifacts_dir, input_size, select_best_loss=True, save_history=False)[source]
Parameters
  • workflow_id (UUID) – the ID of the workflow that should be tracked by this hook

  • phase (str) – the phase where this hook is being used (Training, Testing, etc.)

  • artifacts_dir (str) – the path of the directory where the results of this hook should be stored

  • input_size (list of int) – a list of integers expressing the input size that the saved model will process

  • select_best_loss (bool) – whether the criterion for saving the model should be best loss or best metric

  • save_history (bool) – whether the history of all models that were, at some point, the best should be saved

from eisen.utils.artifacts import SaveONNXModelHook

workflow = ...  # E.g. an instance of a Validation workflow

saver = SaveONNXModelHook(
    workflow_id=workflow.id,
    phase='Validation',
    artifacts_dir='/my/artifacts',
    input_size=[1, 3, 224, 224],
    select_best_loss=True,
    save_history=False
)

Artifacts

Artifacts can be generated without using hooks. Objects of type torch.nn.Module can be serialized in different formats using functionality provided by Eisen. Naturally, they behave just like any other torch.nn.Module and can therefore also be saved using other methods. This is true for any model used in Eisen, even when no workflow has been used for training, testing and validation.

It is suggested NOT to use wrapped modules during serialization. That is, serializing an EisenModuleWrapper object, even though such an object is derived from torch.nn.Module, can result in issues depending on the chosen type of serialization.

Refer to the following documentation to learn more.

class eisen.utils.artifacts.SaveTorchModel(artifacts_dir)[source]

This object implements model saving for pytorch models. Once instantiated with a parameter consisting of a string representing the path of the directory where the model shall be saved, it can be called on a model in order to save it.

No information about optimizer and training is saved in the process.

from eisen.utils.artifacts import SaveTorchModel

my_model = ...  # E.g. a torch.nn.Module instance

saver = SaveTorchModel('/my/artifacts')

saver(my_model)
__call__(model, filename='model.pt')[source]

Saves a model passed as argument. The model will be saved in Torch (state_dict) format.

Parameters
  • model (torch.nn.Module) – Model to be saved (refrain from using wrapped modules, see EisenModuleWrapper)

  • filename (str) – The filename that shall be used to save the model

Returns

None

__init__(artifacts_dir)[source]

Initializes a SaveTorchModel object.

Parameters

artifacts_dir (str) – The path of the directory where the model shall be stored after serialization
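
The call pattern documented above (instantiate the saver with a directory, then call it on the object to be saved) can be illustrated with a standard-library stand-in. The class SaveStateAsJSON below is hypothetical and writes a plain dictionary as JSON. It is not how SaveTorchModel serializes models, and composing the output path as artifacts_dir/filename is an assumption.

```python
import json
import os
import tempfile


class SaveStateAsJSON:
    """Hypothetical saver that mirrors the documented call pattern."""

    def __init__(self, artifacts_dir):
        self.artifacts_dir = artifacts_dir
        os.makedirs(artifacts_dir, exist_ok=True)

    def __call__(self, state, filename='model.json'):
        # Assumption: the file is written inside the artifacts directory.
        path = os.path.join(self.artifacts_dir, filename)
        with open(path, 'w') as handle:
            json.dump(state, handle)
        # Like the documented __call__, nothing is returned.


artifacts_dir = tempfile.mkdtemp()
saver = SaveStateAsJSON(artifacts_dir)
saver({'weight': [0.1, 0.2], 'bias': [0.0]})

# Read the file back to check what was stored.
saved_path = os.path.join(artifacts_dir, 'model.json')
with open(saved_path) as handle:
    restored = json.load(handle)
```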

class eisen.utils.artifacts.SaveONNXModel(artifacts_dir, input_size)[source]

This object exports a torch.nn.Module in ONNX format. The user is asked to supply two parameters for initialization. The first parameter is the artifact directory, a string representing the path where the model is supposed to be stored after serialization. The second parameter is the input size, a list of integers containing the size of the inputs to be processed by the network.

from eisen.utils.artifacts import SaveONNXModel

my_model = ...  # E.g. a torch.nn.Module instance

saver = SaveONNXModel('/my/artifacts', [1, 1, 224, 224])

saver(my_model)
__call__(model, filename='model.onnx')[source]

Saves a model passed as argument. The model will be saved in ONNX format.

Parameters
  • model (torch.nn.Module) – Model to be saved (refrain from using wrapped modules, see EisenModuleWrapper)

  • filename (str) – The filename that shall be used to save the model

Returns

None

__init__(artifacts_dir, input_size)[source]

Initializes a SaveONNXModel object.

Parameters
  • artifacts_dir (str) – The path of the directory where the model shall be stored after serialization

  • input_size (list of int) – The size of the input the network will be processing after serialization

Wrappers

Many packages in the PyTorch ecosystem, such as torchvision, are not fully compatible with Eisen. Eisen makes heavy use of dictionaries throughout most of the objects in the module eisen.utils and beyond.

Dataset entries are represented as dictionaries, batches are also dictionaries, input and output of modules (models, losses, metrics) are also expected to be dictionaries.
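
As a rough illustration of this convention, the sketch below groups a list of dictionary entries into one dictionary batch using plain Python lists. The function name collate is hypothetical. In an actual pipeline this step is performed by the PyTorch DataLoader, which stacks tensors rather than building lists.

```python
def collate(entries):
    """Group a list of per-sample dictionaries into one dictionary batch."""
    keys = entries[0].keys()
    return {key: [entry[key] for entry in entries] for key in keys}


entries = [
    {'image': 'a.jpg', 'label': 0},
    {'image': 'b.jpg', 'label': 1},
]
batch = collate(entries)

# A module expecting keyword arguments can then be fed with **batch.
def count_images(image, label):
    return len(image)

n_images = count_images(**batch)
```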

This architecture is unfortunately not universally adopted. Since it is key to the flexibility of Eisen, adaptors have been developed to make functionality inherited from other packages fully compatible with Eisen, with no significant impact on performance.

Our wrappers are adaptors that perform a simple translation of input and output variables from and to the specific format expected by Eisen. We include below the documentation of our wrappers with usage examples.

Warning

Eisen-Core versions after 0.0.5 (Eisen versions after 0.1.6) and current versions installed from GitHub repository introduce breaking changes to workflows and wrappers. Wrappers require an instance of a Module, Transform or Dataset rather than a Module, Transform or Dataset type. This documentation illustrates the most recent way of using wrappers.

class eisen.utils.EisenModuleWrapper(module, input_names, output_names)[source]

This object implements a wrapper allowing standard PyTorch Modules (Eg. those implemented in torchvision) to be used within Eisen.

Modules in Eisen accept positional and named arguments in the forward() method. They return values or a tuple of values.

Eisen workflows make use of dictionaries. That is, data batches are represented as dictionaries and directly fed into modules using the **kwargs mechanism provided by Python.

This wrapper causes standard Modules to behave as prescribed by Eisen. Wrapped modules accept as input a dictionary of keyword arguments with arbitrary (user defined) keys. They return as output a dictionary of keyword values with arbitrary (user defined) keys.

# We import the Module we want to wrap. In this case we import from torchvision

from torchvision.models import resnet18
from eisen.utils import EisenModuleWrapper

# We can then instantiate an object of class EisenModuleWrapper, passing the Module instance we want to
# wrap as well as the fields of the data dictionary that will be interpreted as input, and the fields
# where we want the output to be stored. Arguments for the Module itself are passed when the
# Module is instantiated.

module = resnet18(pretrained=False)

adapted_module = EisenModuleWrapper(module, ['image'], ['prediction'])
__init__(module, input_names, output_names)[source]
Parameters
  • module (torch.nn.Module) – This is a Module instance

  • input_names (list of str) – list of names for positional arguments of module. Must match field names in data batches

  • output_names (list of str) – list of names for the outputs of the module
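
The translation performed by such a wrapper can be sketched without PyTorch. The class DictModuleAdapter below is a hypothetical stand-in, not the EisenModuleWrapper source: it picks the fields listed in input_names from the keyword arguments, passes them positionally to the wrapped callable, and stores the results under output_names.

```python
class DictModuleAdapter:
    """Hypothetical adaptor from dictionary batches to positional arguments."""

    def __init__(self, module, input_names, output_names):
        self.module = module
        self.input_names = input_names
        self.output_names = output_names

    def __call__(self, **batch):
        # Select the requested fields and pass them positionally.
        args = [batch[name] for name in self.input_names]
        outputs = self.module(*args)
        # A single return value is treated as a one-element tuple.
        if not isinstance(outputs, tuple):
            outputs = (outputs,)
        return dict(zip(self.output_names, outputs))


def double(x):
    # Plays the role of a standard module with one positional input.
    return [2 * value for value in x]


adapted = DictModuleAdapter(double, ['image'], ['prediction'])
batch = {'image': [1, 2, 3], 'label': [0, 1, 0]}
result = adapted(**batch)
```

Fields that are not listed in input_names, such as 'label' here, are simply ignored by the adaptor.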

class eisen.utils.EisenTransformWrapper(transform, fields)[source]

This object implements a wrapper allowing standard PyTorch Transforms (Eg. those implemented in torchvision) to be used within Eisen.

Transforms in Eisen operate on dictionaries. They are in fact always called on a dictionary containing multiple keys that store data.

This wrapper causes standard Transforms to behave as prescribed by Eisen.

# We import the transform we want to wrap. In this case we import from torchvision

from torchvision.transforms import CenterCrop
from eisen.utils import EisenTransformWrapper

# We can then instantiate an object of class EisenTransformWrapper and specify the Transformation we want to
# wrap as well as the field of the data dictionary that should be affected by such Transformation.
# Arguments for the Transformation itself are passed when the Transformation is instantiated.

transform = CenterCrop((224, 224))

adapted_transform = EisenTransformWrapper(transform, ['image'])
__init__(transform, fields)[source]

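Parameters
  • transform (callable) – an instance of the transform to be wrapped

  • fields (list of str) – names of the fields of the data dictionary the transform should be applied to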
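
The idea behind this wrapper can be sketched in plain Python. DictTransformAdapter and center_crop_1d are hypothetical names used only for illustration, and returning a shallow copy of the dictionary is an assumption of this sketch rather than documented behaviour.

```python
class DictTransformAdapter:
    """Hypothetical adaptor applying a plain transform to selected fields."""

    def __init__(self, transform, fields):
        self.transform = transform
        self.fields = fields

    def __call__(self, data):
        data = dict(data)  # shallow copy, the caller's dictionary is left untouched
        for field in self.fields:
            data[field] = self.transform(data[field])
        return data


def center_crop_1d(values, size=2):
    # Toy transform: keep the central `size` elements of a list.
    start = (len(values) - size) // 2
    return values[start:start + size]


adapted_transform = DictTransformAdapter(center_crop_1d, ['image'])
original = {'image': [1, 2, 3, 4], 'label': 7}
sample = adapted_transform(original)
```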

class eisen.utils.EisenDatasetWrapper(dataset, field_names, transform=None)[source]

This object implements a wrapper allowing standard PyTorch Datasets (Eg. those implemented in torchvision) to be used within Eisen.

Datasets in Eisen return items that are always dictionaries. Each key of the dictionary contains information from the dataset.

This wrapper causes standard Datasets to behave as prescribed by Eisen.

# We import the dataset we want to wrap. In this case we import from torchvision

from torchvision.datasets import MNIST
from eisen.utils import EisenDatasetWrapper

# We can then instantiate an object of class EisenDatasetWrapper and specify the Dataset we want to
# wrap as well as the fields of the data dictionary that will be returned by the adapted __getitem__ method.
# Arguments for the Dataset itself are passed when the Dataset is instantiated.

dataset = MNIST('./', download=True)

adapted_dataset = EisenDatasetWrapper(dataset, ['image', 'label'])
__init__(dataset, field_names, transform=None)[source]

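Parameters
  • dataset (torch.utils.data.Dataset) – an instance of the dataset to be wrapped

  • field_names (list of str) – names of the dictionary fields assigned to the elements returned by the wrapped dataset

  • transform (callable, optional) – a transform to apply to the data (default: None)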
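
A minimal sketch of this adaptation, using a list of tuples in place of a real dataset, is given below. DictDatasetAdapter is a hypothetical name, and applying the optional transform to the resulting dictionary is an assumption of this sketch.

```python
class DictDatasetAdapter:
    """Hypothetical adaptor turning tuple items into dictionaries."""

    def __init__(self, dataset, field_names, transform=None):
        self.dataset = dataset
        self.field_names = field_names
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Pair each element of the tuple with the corresponding field name.
        item = dict(zip(self.field_names, self.dataset[index]))
        if self.transform is not None:
            item = self.transform(item)  # assumption: operates on the dictionary
        return item


tuples = [('img0', 5), ('img1', 0)]  # stands in for a dataset returning (image, label)
adapted_dataset = DictDatasetAdapter(tuples, ['image', 'label'])
first = adapted_dataset[0]
```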

Other utilities

Eisen contains other utility functions and objects that can be used to further improve functionality, often independently from Eisen itself. We therefore report the documentation for these modules.

class eisen.utils.PipelineExecutionStreamer(operations_sequence, split_size)[source]

This execution streamer takes a sequence of operations (torch.nn.Module) and executes them in a pipeline. Clearly this is only useful when each operation is executed on a different device. In this way, the execution can be asynchronously kicked off on each device separately, therefore maximizing the GPU usage. More details about this idea can be found here: https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html#speed-up-by-pipelining-inputs

__init__(operations_sequence, split_size)[source]
Parameters
  • operations_sequence (list of torch.nn.Module) – A list containing operations that should be done in sequence

  • split_size (int) – Split size in order to obtain chunks of each batch to fill the pipeline
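
The role of split_size can be sketched in plain Python: a batch is cut into chunks and each chunk passes through the operations in order. The function names below are hypothetical and this loop runs strictly sequentially. The benefit of pipelining only appears when the operations live on different devices and their execution is launched asynchronously, which this sketch does not attempt to show.

```python
def split_batch(batch, split_size):
    """Cut a batch (here a plain list) into consecutive chunks."""
    return [batch[i:i + split_size] for i in range(0, len(batch), split_size)]


def run_pipeline(operations_sequence, batch, split_size):
    outputs = []
    for chunk in split_batch(batch, split_size):
        # Each chunk goes through every operation in sequence.
        for operation in operations_sequence:
            chunk = operation(chunk)
        outputs.extend(chunk)
    return outputs


operations = [
    lambda xs: [x + 1 for x in xs],   # first stage
    lambda xs: [x * 10 for x in xs],  # second stage
]
chunks = split_batch([0, 1, 2, 3, 4], 2)
result = run_pipeline(operations, [0, 1, 2, 3, 4], split_size=2)
```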

class eisen.utils.ModelParallel(module, split_size, device_ids=None, output_device=None)[source]

This object implements model parallelism for PyTorch models. Model parallelism refers to the practice of using multiple GPUs for training by splitting layers across different GPUs. In this way huge models can be stored and trained. This module offers pipelined execution for model parallelism as shown in the PyTorch documentation: https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html#speed-up-by-pipelining-inputs

Additionally, this module works in a completely automatic manner and behaves similarly to torch.nn.DataParallel. The interface implemented here will look familiar to anyone using torch.nn.DataParallel.

Warning

Only single-input models can be parallelized via the current version of ModelParallel implemented here. Most models, such as ResNet, VNet, UNet, etc., have a single input (for example a batch of images), therefore we trust that most use cases are covered by the current implementation.

from eisen.utils import ModelParallel
from eisen.models.segmentation import UNet

# Transforming a model instance into a model parallel model instance

model = ModelParallel(UNet(input_channels=1, output_channels=1), split_size=2)

# model is ModelParallel and will execute on multiple GPUs
__init__(module, split_size, device_ids=None, output_device=None)[source]

This method instantiates a ModelParallel Module from a module instance passed by the user. The model must have a single input (forward(x) type of signature for the forward method), otherwise an error is raised.

An example is here:

from eisen.utils import ModelParallel
from eisen.models.segmentation import UNet


model = ModelParallel(
    module=UNet(input_channels=1, output_channels=1),
    split_size=2,
    device_ids=[0, 1, 2, 3],
    output_device=0
)
Parameters
  • module (torch.nn.Module) – an instance of the model that should be parallelized

  • split_size (int) – split size for pipelined execution

  • device_ids (list) – list of int or torch devices indicating GPUs to use

  • output_device (int or torch device) – int or torch device indicating output devices
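
To give an intuition of what splitting layers across GPUs means, the sketch below assigns an ordered list of layer names to device ids in contiguous blocks. The function assign_layers and the contiguous, evenly sized partitioning are assumptions made for illustration. The strategy actually used by ModelParallel may differ.

```python
def assign_layers(layer_names, device_ids):
    """Hypothetical placement: contiguous blocks of layers per device."""
    n_layers, n_devices = len(layer_names), len(device_ids)
    per_device = -(-n_layers // n_devices)  # ceiling division
    return {
        device: layer_names[i * per_device:(i + 1) * per_device]
        for i, device in enumerate(device_ids)
    }


layers = ['conv1', 'conv2', 'conv3', 'conv4', 'head']
placement = assign_layers(layers, device_ids=[0, 1])
```

During the forward pass, the output of the last layer on one device is moved to the next device before the following block of layers is executed.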
