Datasets
Datasets are used in Eisen to bring data into the training, validation, testing or serving pipeline. Together with transforms, I/O operations, models and other constructs, they are part of Eisen's core functionality.
Eisen Datasets are very similar to those commonly used in PyTorch: they implement the __init__, __len__ and __getitem__ methods.
Users only need to instantiate these modules with the appropriate set of parameters, and the rest will be handled by Eisen. An example of how to get started with Datasets can be found in the Eisen Colab example and is summarized here.
from eisen.datasets import MSDDataset
from torch.utils.data import DataLoader
# PATH_DATA, NAME_MSD_JSON and BATCH_SIZE are user-defined constants
# ... define transform chain ...
dataset = MSDDataset(
    PATH_DATA,
    NAME_MSD_JSON,
    'training',
    transform=transform
)
# create data loader, this functionality is pure pytorch
data_loader = DataLoader(
    dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4
)
Before instantiating a dataset, it is necessary to create and define the transforms that will manipulate its data. Further documentation about the transforms currently implemented in Eisen, as well as general directions on how to implement new ones, is included below.
It is important to know that data in Eisen is always represented as a list of dictionaries. Each entry of the list is a dictionary containing the data belonging to a single data point.
my_example_dataset = [
{"image": "/path/to/image1.jpg", "label": "/path/to/label1.jpg"},
{"image": "/path/to/image2.jpg", "label": "/path/to/label2.jpg"},
{"image": "/path/to/image3.jpg", "label": "/path/to/label3.jpg"},
]
The example above conveys the general form that a dataset assumes inside Eisen. This form has to be taken into account when implementing your own Datasets. Once the data is organized in this way, it can be processed by Transforms: the transforms are fed individual entries of the list and act on one or more fields of each dictionary.
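To make this convention concrete, here is a minimal sketch of a dataset that stores one dictionary per data point and implements __init__, __len__ and __getitem__. The class name ListOfDictsDataset and the toy transform add_case_id are hypothetical illustrations, not part of Eisen. The class deliberately does not import torch so it runs standalone; because it implements __len__ and __getitem__, an object like this should still be usable as a map-style dataset with torch.utils.data.DataLoader.

```python
from typing import Any, Callable, Dict, List, Optional


class ListOfDictsDataset:
    """Hypothetical dataset holding one dictionary per data point (not an Eisen class)."""

    def __init__(self, entries: List[Dict[str, Any]],
                 transform: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None):
        self.entries = entries
        self.transform = transform

    def __len__(self) -> int:
        # Number of data points in the dataset.
        return len(self.entries)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        # Copy the entry so that transforms never mutate the stored list.
        item = dict(self.entries[idx])
        if self.transform is not None:
            item = self.transform(item)
        return item


def add_case_id(data: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical toy transform: derive a new field from the 'image' path.
    data["case_id"] = data["image"].rsplit("/", 1)[-1].split(".")[0]
    return data


my_example_dataset = [
    {"image": "/path/to/image1.jpg", "label": "/path/to/label1.jpg"},
    {"image": "/path/to/image2.jpg", "label": "/path/to/label2.jpg"},
]

dataset = ListOfDictsDataset(my_example_dataset, transform=add_case_id)
print(len(dataset))           # 2
print(dataset[1]["case_id"])  # image2
```

Copying each entry in __getitem__ is a design choice of this sketch: it keeps the underlying list unchanged no matter how many times a transform is applied.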
class eisen.datasets.JsonDataset(data_dir, json_file, transform=None)
This object implements the capability of reading arbitrary data contained in a properly structured JSON file into Eisen. The expected JSON file structure is a list of dictionaries. Each entry of the list contains one element of the dataset, and each key of the dictionary stores different information about that data point.
Example of JSON structure:
[
    {"image": "image_file1.png", "label": "label_file1.png"},
    {"image": "image_file2.png", "label": "label_file2.png"}
]
Note
This dataset will generate data entries with fields corresponding to what is stored in each entry of the json dataset list.
from eisen.datasets import JsonDataset
dset = JsonDataset('/abs/path/to/data', '/abs/path/to/file.json', transform)
__init__(data_dir, json_file, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located
json_file (str) – the name of the json file containing the data
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import JsonDataset
dset = JsonDataset(
    data_dir='/abs/path/to/data',
    json_file='/abs/path/to/file.json',
    transform=transform
)
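A JSON file in the structure JsonDataset expects can be produced with the standard json module. The sketch below writes two hypothetical entries to a temporary directory and checks that the file holds a list of dictionaries; the entry contents and the file name file.json are illustrative assumptions.

```python
import json
import os
import tempfile

# Hypothetical entries: one dictionary per data point.
entries = [
    {"image": "image_file1.png", "label": "label_file1.png"},
    {"image": "image_file2.png", "label": "label_file2.png"},
]

data_dir = tempfile.mkdtemp()
json_path = os.path.join(data_dir, "file.json")

with open(json_path, "w") as f:
    json.dump(entries, f, indent=2)

# Sanity-check the structure: a list whose items are all dictionaries.
with open(json_path) as f:
    loaded = json.load(f)

assert isinstance(loaded, list)
assert all(isinstance(entry, dict) for entry in loaded)
print(sorted(loaded[0].keys()))  # ['image', 'label']
```

Note that JSON requires double-quoted strings; a file written with json.dump always satisfies this.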
class eisen.datasets.MSDDataset(data_dir, json_file, phase, transform=None)
This object allows Medical Segmentation Decathlon data to be easily imported into Eisen. More information about the data can be found at http://medicaldecathlon.com
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type MSDDataset which will make use of the directory structure and the descriptive json file included in it and make the data available to Eisen.
Note
This dataset will return data items with fields: ‘image’ and, optionally, ‘label’.
from eisen.datasets import MSDDataset
dataset = MSDDataset(
    '/abs/path/to/data',
    '/path/to/dataset.json',
    'training',
    transform,
)
__init__(data_dir, json_file, phase, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located (dataset location after unzipping)
json_file (str) – the name of the JSON file describing the MSD dataset
phase (str) – training or test phase, as per the MSD dataset convention (see the MSD JSON file)
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import MSDDataset
dataset = MSDDataset(
    data_dir='/abs/path/to/data',
    json_file='/path/to/dataset.json',
    phase='training',
    transform=transform,
)
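The 'label' field is optional because, to our understanding of the MSD format, the dataset.json file stores a 'training' list of {'image', 'label'} dictionaries and a 'test' list of bare image paths. The function msd_entries below is a hypothetical sketch of how a phase key could be turned into Eisen-style entries under that assumption. It is not MSDDataset's actual implementation, and the file names in the toy excerpt are made up.

```python
from typing import Any, Dict, List


def msd_entries(msd_json: Dict[str, Any], phase: str) -> List[Dict[str, str]]:
    """Hypothetical helper: convert one phase of an MSD-style JSON into Eisen entries."""
    entries = []
    for item in msd_json[phase]:
        if isinstance(item, dict):
            # Training items are assumed to carry both 'image' and 'label' paths.
            entries.append(dict(item))
        else:
            # Test items are assumed to be bare image paths without labels.
            entries.append({"image": item})
    return entries


# Toy excerpt shaped like an MSD dataset.json (file names are made up).
msd_json = {
    "training": [{"image": "./imagesTr/case_001.nii.gz", "label": "./labelsTr/case_001.nii.gz"}],
    "test": ["./imagesTs/case_002.nii.gz"],
}

print(msd_entries(msd_json, "training"))
print(msd_entries(msd_json, "test"))  # [{'image': './imagesTs/case_002.nii.gz'}]
```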
class eisen.datasets.PatchCamelyon(data_dir, x_h5_file, y_h5_file, mask_h5_file=None, transform=None)
This object implements the capability of reading PatchCamelyon data. Further information about this dataset can be found on the official website https://patchcamelyon.grand-challenge.org/Introduction/
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type PatchCamelyon which will make use of the data in the directory as well as the h5 files that are part of the dataset and make it available to Eisen.
Note
This dataset will generate data entries with keys: ‘image’, ‘label’ and optionally ‘mask’. The generated image and label are tensors.
from eisen.datasets import PatchCamelyon
dset = PatchCamelyon(
    '/data/root/path',
    'camelyon_patch_level_2_split_train_x.h5',
    'camelyon_patch_level_2_split_train_y.h5',
    'camelyon_patch_level_2_split_train_mask.h5'
)
__init__(data_dir, x_h5_file, y_h5_file, mask_h5_file=None, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located
x_h5_file (str) – the relative path of the H5 file containing x (the images)
y_h5_file (str) – the relative path of the H5 file containing y (the labels)
mask_h5_file (str) – the relative path of the H5 file containing masks
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import PatchCamelyon
dset = PatchCamelyon(
    data_dir='/data/root/path',
    x_h5_file='camelyon_patch_level_2_split_train_x.h5',
    y_h5_file='camelyon_patch_level_2_split_train_y.h5',
    mask_h5_file='camelyon_patch_level_2_split_train_mask.h5',
    transform=transform
)
class eisen.datasets.CAMUS(data_dir, with_ground_truth, with_2CH=True, with_4CH=True, with_entire_sequences=False, transform=None)
This object implements the capability of reading CAMUS data. The CAMUS dataset is a dataset of ultrasound images of the heart. Further information about this dataset can be found on the official website https://www.creatis.insa-lyon.fr/Challenge/camus/index.html
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type CAMUS which will make use of the data in the directory and make it available to Eisen.
Note
This dataset will generate data entries with keys: ‘type’, ‘image_2CH’, ‘label_2CH’, ‘sequence_2CH’, ‘image_4CH’, ‘label_4CH’, ‘sequence_4CH’, depending on the selected input parameter configuration. The generated data consists of image paths and the type (a string).
from eisen.datasets import CAMUS
dset = CAMUS('/data/root/path', True)  # with_ground_truth has no default
__init__(data_dir, with_ground_truth, with_2CH=True, with_4CH=True, with_entire_sequences=False, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located
with_ground_truth (bool) – whether ground truth annotation should be included (won’t work during testing)
with_2CH (bool) – whether 2 chambers data should be included (default True)
with_4CH (bool) – whether 4 chambers data should be included (default True)
with_entire_sequences (bool) – whether the entire sequences for 4CH and 2CH data should be included (default False)
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import CAMUS
dset = CAMUS(
    data_dir='/data/root/path',
    with_ground_truth=True,
    with_2CH=True,
    with_4CH=True,
    with_entire_sequences=False,
    transform=None
)
class eisen.datasets.RSNAIntracranialHemorrhageDetection(data_dir, training, transform=None)
This object implements the capability of reading the Kaggle RSNA Intracranial Hemorrhage Detection dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/overview
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type RSNAIntracranialHemorrhageDetection which will parse said directory and make the data available to Eisen.
Note
This dataset will return data points in the form of a dictionary with the key ‘image’ and, during training, also ‘label’.
from eisen.datasets import RSNAIntracranialHemorrhageDetection
dset = RSNAIntracranialHemorrhageDetection('/data/root/path', True)
__init__(data_dir, training, transform=None)
Parameters:
data_dir (str) – The dataset root path directory where the challenge dataset is stored
training (bool) – Boolean indicating whether training or test data should be loaded
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import RSNAIntracranialHemorrhageDetection
dset = RSNAIntracranialHemorrhageDetection(
    data_dir='/data/root/path',
    training=True
)
class eisen.datasets.RSNABoneAgeChallenge(data_dir, training, transform=None)
This object implements the capability of reading the Kaggle RSNA Bone Age Estimation challenge dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/kmader/rsna-bone-age
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type RSNABoneAgeChallenge which will parse said directory and make the data available to Eisen.
Note
This dataset will return data points as dictionaries having fields: ‘image’, ‘male’ (boolean) and during training ‘label’.
from eisen.datasets import RSNABoneAgeChallenge
dset = RSNABoneAgeChallenge('/data/root/path', True)
__init__(data_dir, training, transform=None)
Parameters:
data_dir (str) – The dataset root path directory where the challenge dataset is stored
training (bool) – Boolean indicating whether training or test data should be loaded
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import RSNABoneAgeChallenge
dset = RSNABoneAgeChallenge(
    data_dir='/data/root/path',
    training=True
)
class eisen.datasets.MedSegCovid19(data_dir, image_file, mask_file=None, transform=None)
This object allows the medical segmentation covid-19 dataset to be easily imported into Eisen. Find more information about this dataset here: http://medicalsegmentation.com/covid19/
In summary, this dataset is a collection of 100 slices of CT images that have been annotated and were made available to the community.
When instantiating this module it is necessary to point it to the Nifti image file and, optionally, the segmentation. The first argument is the base data directory. The second and third arguments are strings with the names of the Nifti image and mask files of this dataset, and the fourth argument is a transform (or composition of transforms).
Each entry of this dataset after loading will be a dictionary with one (in case only images are loaded) or two (in case both images and labels are loaded) keys. Each key stores a numpy array containing the 2D data relative to one image.
Note
This dataset will generate data entries with keys: ‘image’ and (optionally) ‘label’.
from eisen.datasets import MedSegCovid19
dataset = MedSegCovid19(
    '/abs/path/to/data',
    'tr_im.nii',
    'tr_mask.nii',
    transform,
)
__init__(data_dir, image_file, mask_file=None, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located (results of download)
image_file (str) – the name of the nifti file containing the images
mask_file (string) – the name of the nifti file containing the masks (optional)
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import MedSegCovid19
dataset = MedSegCovid19(
    data_dir='/abs/path/to/data',
    image_file='tr_im.nii',
    mask_file='tr_mask.nii',
    transform=transform,
)
class eisen.datasets.UCSDCovid19(data_dir, positive_dir, negative_dir, transform=None)
This object allows the UCSD Covid-19 2D dataset to be easily imported into Eisen. Find more information about this dataset here: https://github.com/UCSD-AI4H/COVID-CT. This dataset is meant to be used for classification tasks. It also contains metadata which are currently NOT supported in Eisen.
When instantiating this module it is necessary to point it to two directory names: one containing cases of sick patients, and the other containing images of healthy people (not affected by Covid-19).
The first argument is the base data directory. The second and third arguments are strings with the names of the two directories, relative to the base directory, and the fourth argument is a transform (or composition of transforms).
Each entry of this dataset after loading will be a dictionary with two keys. The ‘image’ key stores a path to a png file containing the image, which can be read with the LoadPILImageFromFilename I/O module, and the ‘label’ key contains an integer that is 0 for healthy scans and 1 for sick individuals.
Note
This dataset will return data entries in form of a dictionary having fields: ‘image’ and ‘label’
from eisen.datasets import UCSDCovid19
dataset = UCSDCovid19(
    '/abs/path/to/data',
    'positive',
    'negative',
    transform,
)
__init__(data_dir, positive_dir, negative_dir, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located (dataset location after unzipping)
positive_dir (str) – relative path of directory containing positive cases
negative_dir (string) – relative path of directory containing negative cases
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import UCSDCovid19
dataset = UCSDCovid19(
    data_dir='/abs/path/to/data',
    positive_dir='positive',
    negative_dir='negative',
    transform=transform,
)
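The labelling rule described above (1 for files in the positive directory, 0 for files in the negative one) can be sketched with the standard library. The helper ucsd_style_entries is hypothetical and does not reproduce UCSDCovid19's code; keeping paths relative to data_dir is an assumption made here so that a loader configured with data_dir could join them.

```python
import os
import tempfile
from typing import Dict, List, Union


def ucsd_style_entries(data_dir: str, positive_dir: str,
                       negative_dir: str) -> List[Dict[str, Union[str, int]]]:
    """Hypothetical helper: label 1 for png files in positive_dir, 0 for negative_dir."""
    entries = []
    for sub_dir, label in ((positive_dir, 1), (negative_dir, 0)):
        for name in sorted(os.listdir(os.path.join(data_dir, sub_dir))):
            if name.lower().endswith(".png"):
                # Assumption: paths are kept relative to data_dir.
                entries.append({"image": os.path.join(sub_dir, name), "label": label})
    return entries


# Build a tiny fake directory tree to exercise the helper.
root = tempfile.mkdtemp()
for sub_dir, names in (("positive", ["a.png", "b.png"]), ("negative", ["c.png"])):
    os.makedirs(os.path.join(root, sub_dir))
    for name in names:
        open(os.path.join(root, sub_dir, name), "wb").close()

entries = ucsd_style_entries(root, "positive", "negative")
print([e["label"] for e in entries])  # [1, 1, 0]
```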
class eisen.datasets.PANDA(data_dir, csv_file, training, transform=None)
This object implements the capability of reading the PANDA dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/c/prostate-cancer-grade-assessment/overview
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type PANDA which will make use of the data in the directory as well as the csv file that is part of the dataset and make it available to Eisen.
Note
This dataset will return data points in form of a dictionary with fields: ‘image’, ‘provider’ and optionally (during training) ‘mask’, ‘isup’, ‘gleason’.
from eisen.datasets import PANDA
dset = PANDA(
    '/data/root/path',
    'train.csv',
    True
)
__init__(data_dir, csv_file, training, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located
csv_file (str) – the relative path of the csv file relative to current task
training (bool) – whether the dataset is a training dataset or not
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import PANDA
dset = PANDA(
    data_dir='/data/root/path',
    csv_file='train.csv',
    training=True,
    transform=transform
)
class eisen.datasets.KaggleCovid19(data_dir, csv_file, transform=None)
This object implements the capability of reading the Kaggle COVID-19 CT dataset. Further information about this dataset can be found on the official website https://www.kaggle.com/andrewmvd/covid19-ct-scans
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type KaggleCovid19 which will make use of the data in the directory as well as the csv file that is part of the dataset and make it available to Eisen.
Note
This dataset will generate data entries with fields: ‘image’, ‘lung_mask’, ‘infection_mask’, ‘lung_infection_mask’. This data is returned in form of relative paths (to data_dir) of image and mask files.
from eisen.datasets import KaggleCovid19
dset = KaggleCovid19(
    '/data/root/path',
    'metadata.csv',
)
__init__(data_dir, csv_file, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located
csv_file (str) – the relative path of the csv file relative to current task
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import KaggleCovid19
dset = KaggleCovid19(
    data_dir='/data/root/path',
    csv_file='metadata.csv',
    transform=transform
)
class eisen.datasets.ABCsDataset(data_dir, training, flat_dir_structure=False, transform=None)
This object allows data from the ABCs challenge (2020) to be easily imported into Eisen. More information about the data and the challenge can be found at https://abcs.mgh.harvard.edu
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type ABCsDataset which will make use of the directory structure and the descriptive json file included in it and make the data available to Eisen.
For details about labels and data structure, refer to: https://abcs.mgh.harvard.edu/index.php/data/download/send/3-data-for-abcs/14-readme
Note
This dataset returns the following fields: ‘ct’, ‘t1’, ‘t2’ and, when training, ‘label_task1’ and ‘label_task2’. The content of these fields consists of paths, relative to data_dir, to the CT, MR and label files.
Get started code can be found here: https://gist.github.com/faustomilletari/af430acfecf0841d71508455cdadcbbf
from eisen.datasets import ABCsDataset
dataset = ABCsDataset(
    '/abs/path/to/data',
    True,
    False,
    transform,
)
__init__(data_dir, training, flat_dir_structure=False, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located (dataset location after unzipping)
training (bool) – whether data relative to the training phase should be loaded
flat_dir_structure (bool) – whether data is stored in a directory containing all images (without sub-dirs)
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import ABCsDataset
dataset = ABCsDataset(
    data_dir='/abs/path/to/data',
    training=True,
    flat_dir_structure=False,
    transform=transform,
)
class eisen.datasets.EMIDEC(data_dir, training, transform=None)
This object allows data from the EMIDEC challenge (2020) to be easily imported into Eisen. More information about the data and the challenge can be found at http://emidec.com
Through this module, users are able to make use of the challenge data by simply specifying the directory where the data is locally stored. Therefore it is necessary to first download the data, store or unpack it in a specific directory and then instantiate an object of type EMIDEC which will make use of the directory structure and make the data available to Eisen.
For details about labels and data structure, refer to the official website http://emidec.com
Note
This dataset returns the following fields: ‘image’, ‘metadata’ and, during training, ‘pathological’ and ‘label’.
from eisen.datasets import EMIDEC
dataset = EMIDEC(
    '/abs/path/to/data',
    True,
    transform,
)
__init__(data_dir, training, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located (dataset location after unzipping)
training (bool) – whether data relative to the training phase should be loaded
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import EMIDEC
dataset = EMIDEC(
    data_dir='/abs/path/to/data',
    training=True,
    transform=transform,
)
class eisen.datasets.Brats2020(data_dir, training, transform=None)
BraTS 2020 challenge dataset. This multimodal brain tumor segmentation and survival prediction dataset contains multi-center and multi-stage MRI images of brain tumors. It contains images obtained via ‘t1’, ‘t1c’, ‘t2’ and ‘flair’ MRI acquisition sequences, and annotations relative to the GD-enhancing tumor (ET, label 4), the peritumoral edema (ED, label 2), and the necrotic and non-enhancing tumor core (NCR/NET, label 1).
Find more info here: https://www.med.upenn.edu/cbica/brats2020/data.html
Note
This dataset will generate data entries with keys: ‘t1’, ‘t1c’, ‘t2’, ‘flair’ and ‘name_mapping’. If the training flag is set during initialization, it will also provide ‘label’ and ‘survival_info’. The data in ‘name_mapping’ and ‘survival_info’ is also represented in the form of a dictionary and contains data obtained from the fields (columns) of name_mapping.csv and survival_info.csv.
from eisen.datasets import Brats2020
dset = Brats2020('/data/root/path', True, tform)
__init__(data_dir, training, transform=None)
Parameters:
data_dir (str) – the base directory where the data is located (after unzipping the archive)
training (bool) – whether the labels and survival information should be loaded for training
transform (callable) – a transform object (can be the result of a composition of transforms)
from eisen.datasets import Brats2020
dset = Brats2020(
    data_dir='/data/root/path',
    training=True,
    transform=tform
)
I/O
Eisen I/O functionality is contained in this module. I/O functionality is implemented by transforms; that is, it behaves just like any other eisen.transforms module. The only difference is that I/O transforms interact with the disk. Another reason for this distinction is that we decided to follow the package structure of torchvision, which has a torchvision.io sub-package.
An example of how I/O functionality can be used to load Nifti data is contained in the Eisen example Colab notebook and it is also shown in compact form here:
from eisen.datasets import MSDDataset
from eisen.io import LoadNiftyFromFilename
my_reader = LoadNiftyFromFilename(['image', 'label'], PATH_DATA)
dataset = MSDDataset(
    PATH_DATA,
    NAME_MSD_JSON,
    'training',
    transform=my_reader
)
Just like regular transforms, the I/O transforms can be called on data. The data needs to be a Python dictionary and the call can be done in this way:
from eisen.io import LoadNiftyFromFilename
# Assuming my_data_dictionary to be a dictionary containing data
my_reader = LoadNiftyFromFilename(['image', 'label'], PATH_DATA)
my_data_dictionary = my_reader(my_data_dictionary)
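The I/O transform interface can be illustrated without any imaging library. LoadTextFromFilename below is a hypothetical example (it does not exist in eisen.io) that mirrors the (fields, data_dir) constructor used by the loaders documented in this section: it treats each listed field as a path relative to data_dir, reads the file, and, as an assumption of this sketch, stores the loaded content back in the same field.

```python
import os
import tempfile
from typing import Any, Dict, List


class LoadTextFromFilename:
    """Hypothetical I/O transform: replaces file paths in `fields` with file contents."""

    def __init__(self, fields: List[str], data_dir: str):
        self.fields = fields
        self.data_dir = data_dir

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for field in self.fields:
            # The relative path stored in the dictionary is joined with data_dir.
            with open(os.path.join(self.data_dir, data[field])) as f:
                data[field] = f.read()
        return data


data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, "note1.txt"), "w") as f:
    f.write("hello")

reader = LoadTextFromFilename(["note"], data_dir)
print(reader({"note": "note1.txt", "id": 7}))  # {'note': 'hello', 'id': 7}
```

Fields not listed in fields (here 'id') pass through untouched, which is what allows I/O transforms to be chained with other transforms.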
class eisen.io.LoadITKFromFilename(fields, data_dir)
This transform loads ITK data from filenames contained in a specific field of the data dictionary. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it is kept separate from the others as it is responsible for I/O operations interacting with the disk.
from eisen.io import LoadITKFromFilename
tform = LoadITKFromFilename(['image', 'label'], '/abs/path/to/dataset')
__init__(fields, data_dir)
LoadITKFromFilename loads ITK compatible files. The data is always read as float32.
Parameters:
fields (list) – fields of the dictionary containing ITK file paths that need to be read
data_dir (str) – source data directory where data is located. This directory will be joined with data paths
from eisen.io import LoadITKFromFilename
tform = LoadITKFromFilename(
    fields=['image', 'label'],
    data_dir='/abs/path/to/dataset'
)
class eisen.io.LoadDICOMFromFilename(fields, data_dir, store_data_array=True)
This transform loads DICOM data from filenames contained in a specific field of the data dictionary. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it is kept separate from the others as it is responsible for I/O operations interacting with the disk.
from eisen.io import LoadDICOMFromFilename
tform = LoadDICOMFromFilename(['image', 'label'], '/abs/path/to/dataset')
__init__(fields, data_dir, store_data_array=True)
Parameters:
fields (list) – list of names of the field of data dictionary to work on. These fields should contain data paths
data_dir (str) – source data directory where data is located. This directory will be joined with data paths
store_data_array (bool) – whether the image data should be stored as a numpy array (in the field named “field” + “_pixel_array”)
from eisen.io import LoadDICOMFromFilename
tform = LoadDICOMFromFilename(
    fields=['image'],
    data_dir='/abs/path/to/dataset',
    store_data_array=True
)
class eisen.io.LoadPILImageFromFilename(fields, data_dir)
This transform loads images from filenames contained in a specific field of the data dictionary. The images are loaded via Pillow, an imaging library for Python. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it is kept separate from the others as it is responsible for I/O operations interacting with the disk.
from eisen.io import LoadPILImageFromFilename
tform = LoadPILImageFromFilename(['image', 'label'], '/abs/path/to/dataset')
__init__(fields, data_dir)
Parameters:
fields (list) – list of names of the field of data dictionary to work on. These fields should contain data paths
data_dir (str) – source data directory where data is located. This directory will be joined with data paths
from eisen.io import LoadPILImageFromFilename
tform = LoadPILImageFromFilename(
    fields=['image'],
    data_dir='/abs/path/to/dataset'
)
class eisen.io.WriteNiftiToFile(fields, name_fields=None, filename_prefix='./')
This transform writes NIFTI data to a file on disk. Although this transform follows the general structure of other transforms, such as those contained in eisen.transforms, it is kept separate from the others as it is responsible for I/O operations interacting with the disk.
from eisen.io import WriteNiftiToFile
# the second positional argument is name_fields, so pass the prefix by keyword
tform = WriteNiftiToFile(['image', 'label'], filename_prefix='/abs/path/to/filename')
__init__(fields, name_fields=None, filename_prefix='./')
Parameters:
fields (list) – list of names of the field of data dictionary to work on. These fields should contain data paths
filename_prefix (str) – absolute path plus file prefix of output file
from eisen.io import WriteNiftiToFile
tform = WriteNiftiToFile(
    fields=['image', 'label'],
    name_fields=['image_name', 'label_name'],
    filename_prefix='/abs/path/to/dataset'
)
Transforms
Transforms are used to manipulate data in Eisen. Transforms are Python objects that are instantiated through the __init__ method and implement a __call__ method. Transforms can be stacked and composed together using torchvision.transforms.Compose.
The flexibility and composability of Eisen Transforms, and of PyTorch transforms in general, come from the fact that their __call__ method implements a standard interface. In the case of Eisen it is:
def __call__(self, data):
    # Do something
    return data
In this code snippet, data is a Python dictionary that contains the data to be processed by the transforms. Each key in this dictionary is a different data field. For example, in the case of an imaging dataset, keys could be [‘image’, ‘label’]. The transform will operate on one or more keys of the dictionary and return a new dictionary with data updated as a result of the transform.
A transform is therefore implemented as a Python object as demonstrated here:
class Transform:
    def __init__(self):
        # Do something
        pass

    def __call__(self, data):
        # Do something
        return data
A more concrete example can be seen below. In this example we instantiate a ResampleNiftiVolumes transform which operates on the fields ‘image’ and ‘label’ and resamples both Nifti images, contained in data[‘image’] and data[‘label’], to a resolution of [1.0, 1.0, 1.0] millimeters. The interpolation used here is linear.
from eisen.transforms import ResampleNiftiVolumes
resample_tform = ResampleNiftiVolumes(
['image', 'label'],
[1.0, 1.0, 1.0],
'linear'
)
data = resample_tform(data)
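User-defined transforms follow the same skeleton, and composing them amounts to calling them in sequence on the output of the previous one, which is what torchvision.transforms.Compose does. The sketch below uses a hypothetical AddSuffix transform and a minimal SimpleCompose stand-in (both illustrative, not Eisen or torchvision classes) so that it runs without torchvision installed.

```python
from typing import Any, Callable, Dict, List


class AddSuffix:
    """Hypothetical transform: appends a suffix to the string values in `fields`."""

    def __init__(self, fields: List[str], suffix: str):
        self.fields = fields
        self.suffix = suffix

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for field in self.fields:
            data[field] = data[field] + self.suffix
        return data


class SimpleCompose:
    """Minimal stand-in for torchvision.transforms.Compose: applies transforms in order."""

    def __init__(self, transforms: List[Callable[[Dict[str, Any]], Dict[str, Any]]]):
        self.transforms = transforms

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        for t in self.transforms:
            data = t(data)
        return data


pipeline = SimpleCompose([
    AddSuffix(["image"], ".nii"),
    AddSuffix(["image", "label"], ".gz"),
])
print(pipeline({"image": "case1", "label": "case1_seg"}))
# {'image': 'case1.nii.gz', 'label': 'case1_seg.gz'}
```

In a real pipeline, SimpleCompose would simply be replaced by torchvision.transforms.Compose, since both rely only on each transform accepting and returning a dictionary.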
Further documentation of the functionality of the Transforms module is reported below.
Imaging Transforms
Imaging transforms are used to handle 2D and 3D imaging data. These transforms implement basic data manipulation such as resampling, cropping, thresholding and normalizing, which is useful for data pre-processing in deep learning tasks.
class eisen.transforms.imaging.CreateConstantFlags(fields, values)
Transform that creates new fields in the data dictionary containing constants of any type.
from eisen.transforms import CreateConstantFlags
tform = CreateConstantFlags(['my_field', 'my_text'], [42.0, 'hello'])
data = tform(data)
__init__(fields, values)
Parameters:
fields (list of str) – names of the fields of data dictionary to work on
values (list of values) – list of values to add to data
from eisen.transforms import CreateConstantFlags
tform = CreateConstantFlags(
    fields=['my_field', 'my_text'],
    values=[42.0, 'hello']
)
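The effect of CreateConstantFlags on a data dictionary can be sketched as a plain function that pairs each field with its value. create_constant_flags is a hypothetical illustration, not Eisen's implementation; the length check is a choice of this sketch, since the behavior for mismatched lists is not documented here.

```python
from typing import Any, Dict, List


def create_constant_flags(data: Dict[str, Any], fields: List[str],
                          values: List[Any]) -> Dict[str, Any]:
    """Hypothetical equivalent: set data[field] = value for each (field, value) pair."""
    if len(fields) != len(values):
        raise ValueError("fields and values must have the same length")
    for field, value in zip(fields, values):
        data[field] = value
    return data


data = {"image": "img.nii.gz"}
data = create_constant_flags(data, ["my_field", "my_text"], [42.0, "hello"])
print(data)  # {'image': 'img.nii.gz', 'my_field': 42.0, 'my_text': 'hello'}
```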
class eisen.transforms.imaging.RenameFields(fields, new_fields)
Transform that renames fields in the data dictionary.
from eisen.transforms import RenameFields
tform = RenameFields(['old_name1', 'old_name2'], ['new_name1', 'new_name2'])
data = tform(data)
__init__(fields, new_fields)
Parameters:
fields (list of str) – list of names of the fields of data dictionary to rename
new_fields (list of str) – new field names for the data dictionary
from eisen.transforms import RenameFields
tform = RenameFields(
    fields=['old_name1', 'old_name2'],
    new_fields=['new_name1', 'new_name2']
)
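Renaming fields amounts to moving each value from its old key to its new key. The function rename_fields below is a hypothetical sketch of that behavior, not RenameFields' source; fields not listed are left untouched.

```python
from typing import Any, Dict, List


def rename_fields(data: Dict[str, Any], fields: List[str],
                  new_fields: List[str]) -> Dict[str, Any]:
    """Hypothetical equivalent: move each value from an old key to the matching new key."""
    for old, new in zip(fields, new_fields):
        data[new] = data.pop(old)
    return data


data = {"old_name1": 1, "old_name2": 2, "other": 3}
print(rename_fields(data, ["old_name1", "old_name2"], ["new_name1", "new_name2"]))
# {'other': 3, 'new_name1': 1, 'new_name2': 2}
```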
class eisen.transforms.imaging.FilterFields(fields)
Transform that retains in the data dictionary only the fields specified as the init argument.
from eisen.transforms import FilterFields
tform = FilterFields(['field1', 'field2'])
data = tform(data)
The resulting data dictionary will only have ‘field1’ and ‘field2’ as keys.
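Filtering can be sketched as a dictionary comprehension over the requested keys. filter_fields is a hypothetical illustration, not FilterFields' implementation; in this sketch a requested key that is missing from the dictionary raises a KeyError.

```python
from typing import Any, Dict, List


def filter_fields(data: Dict[str, Any], fields: List[str]) -> Dict[str, Any]:
    """Hypothetical equivalent: keep only the listed keys of the data dictionary."""
    return {key: data[key] for key in fields}


data = {"field1": "a", "field2": "b", "field3": "c"}
print(filter_fields(data, ["field1", "field2"]))  # {'field1': 'a', 'field2': 'b'}
```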
class eisen.transforms.imaging.ResampleNiftiVolumes(fields, resolution, interpolation='linear')
Transform resampling Nifti volumes to a new resolution (expressed in millimeters). This transform can only be applied to fields of the data dictionary containing objects of type Nifti (nibabel).
from eisen.transforms import ResampleNiftiVolumes
tform = ResampleNiftiVolumes(['nifti_data'], [1.0, 1.0, 1.0], 'linear')
data = tform(data)
__init__(fields, resolution, interpolation='linear')
Parameters:
fields (list of str) – list of names of the fields of data dictionary to work on
resolution (list of float) – vector of float values expressing desired resolution in mm
interpolation (string) – interpolation strategy to use
from eisen.transforms import ResampleNiftiVolumes
tform = ResampleNiftiVolumes(
    fields=['nifti_data'],
    resolution=[1.0, 1.0, 1.0],
    interpolation='linear'
)
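Resampling to a new resolution changes the number of voxels along each axis. Assuming the physical field of view is preserved, the new size along each axis is roughly old size × old spacing / new spacing. The helper `resampled_shape` below is a hypothetical illustration of that arithmetic only: it performs no interpolation, is not part of Eisen, and the exact rounding used by the real transform may differ.

```python
def resampled_shape(shape, spacing, new_spacing):
    """Approximate voxel grid size after resampling, preserving the field of view.

    shape:       voxels per axis, e.g. (256, 256, 128)
    spacing:     current voxel size in mm per axis
    new_spacing: desired voxel size in mm per axis (the `resolution` argument)
    """
    return tuple(
        int(round(n * s / ns)) for n, s, ns in zip(shape, spacing, new_spacing)
    )


# A 256 x 256 x 128 volume with 0.5 mm in-plane and 2 mm slice spacing,
# resampled to an isotropic 1 mm grid.
print(resampled_shape((256, 256, 128), (0.5, 0.5, 2.0), (1.0, 1.0, 1.0)))  # (128, 128, 256)
```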
-
-
class
eisen.transforms.imaging.
ResampleITKVolumes
(fields, resolution, interpolation='linear')[source]¶ Transform that resamples ITK volumes to a new resolution (expressed in millimeters). This transform can only be applied to fields of the data dictionary containing ITK objects (SimpleITK).
from eisen.transforms import ResampleITKVolumes
tform = ResampleITKVolumes(['itk_data'], [1.0, 1.0, 1.0], 'linear')
data = tform(data)
-
__init__
(fields, resolution, interpolation='linear')[source]¶ - Parameters
fields (list of str) – list of names of the fields of data dictionary to work on
resolution (list of float) – vector of float values expressing desired resolution in mm
interpolation (string) – interpolation strategy to use
from eisen.transforms import ResampleITKVolumes
tform = ResampleITKVolumes(
    fields=['itk_data'],
    resolution=[1.0, 1.0, 1.0],
    interpolation='linear'
)
-
-
class
eisen.transforms.imaging.
NiftiToNumpy
(fields, multichannel=False)[source]¶ This transform converts a Nifti volume to Numpy format. Every transform chain operating on Nifti data needs to include it at some point, because data must be in Numpy format before it can be converted into PyTorch tensors.
from eisen.transforms import NiftiToNumpy
tform = NiftiToNumpy(['image', 'label'])
data = tform(data)
-
__init__
(fields, multichannel=False)[source]¶ - Parameters
fields (list of str) – list of names of the fields of data dictionary to convert from Nifti to Numpy
multichannel (bool) – set this parameter to True if the data is multichannel
from eisen.transforms import NiftiToNumpy
tform = NiftiToNumpy(fields=['image', 'label'])
data = tform(data)
-
-
class
eisen.transforms.imaging.
ITKToNumpy
(fields, multichannel=False)[source]¶ This transform converts an ITK volume to Numpy format. Every transform chain operating on ITK data needs to include it at some point, because data must be in Numpy format before it can be converted into PyTorch tensors.
from eisen.transforms import ITKToNumpy
tform = ITKToNumpy(['image', 'label'])
data = tform(data)
-
__init__
(fields, multichannel=False)[source]¶ - Parameters
fields (list of str) – list of names of the fields of data dictionary to convert from ITK to Numpy
multichannel (bool) – set this parameter to True if the data is multichannel
from eisen.transforms import ITKToNumpy
tform = ITKToNumpy(fields=['image', 'label'], multichannel=False)
data = tform(data)
-
-
class
eisen.transforms.imaging.
PilToNumpy
(fields, multichannel=False)[source]¶ This transform converts a PIL image to Numpy format. Every transform chain operating on PIL images needs to include it at some point, because data must be in Numpy format before it can be converted into PyTorch tensors.
from eisen.transforms import PilToNumpy
tform = PilToNumpy(['image', 'label'])
data = tform(data)
-
__init__
(fields, multichannel=False)[source]¶ - Parameters
fields (list of str) – list of names of the fields of data dictionary to convert from PIL to Numpy
multichannel (bool) – set this parameter to True if the data is multichannel
from eisen.transforms import PilToNumpy
tform = PilToNumpy(fields=['image', 'label'])
data = tform(data)
-
-
class
eisen.transforms.imaging.
MapValues
(fields, min_value=0, max_value=1, channelwise=False)[source]¶ Transform implementing normalization by standardizing the range of data to a known interval. The minimum value of each data tensor is subtracted from it, and the result is divided by the tensor's range (maximum minus minimum). The tensor is then multiplied by max_value.
from eisen.transforms import MapValues
tform = MapValues(['image'], 0, 10)
data = tform(data)
This is a usage example in which the data is normalized to fit the range [0, 10].
-
__init__
(fields, min_value=0, max_value=1, channelwise=False)[source]¶ - Parameters
fields (list of str) – list of fields of the data dictionary that will be affected by this transform
min_value (float) – minimum desired data value
max_value (float) – maximum desired data value
channelwise (bool) – whether the transformation should be applied to each channel separately
from eisen.transforms import MapValues
tform = MapValues(
    fields=['image'],
    min_value=0,
    max_value=1,
    channelwise=False
)
data = tform(data)
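The min-max mapping described above can be sketched in NumPy as follows. The function `map_values` is hypothetical and not Eisen's code. As an assumption, it maps linearly onto [min_value, max_value], which reduces to "divide by the range, then multiply by max_value" when min_value is 0; the handling of constant input is also this sketch's own choice.

```python
import numpy as np


def map_values(x, min_value=0.0, max_value=1.0, channelwise=False):
    """Hypothetical min-max rescaling sketch (not Eisen's code).

    Maps the values of `x` linearly onto [min_value, max_value].
    With channelwise=True, each channel (axis 0) is rescaled independently.
    """
    x = np.asarray(x, dtype=np.float64)
    if channelwise:
        return np.stack([map_values(c, min_value, max_value) for c in x])
    value_range = x.max() - x.min()
    if value_range == 0:
        # Constant input: avoid division by zero and return the lower bound.
        return np.full_like(x, min_value)
    unit = (x - x.min()) / value_range  # now in [0, 1]
    return unit * (max_value - min_value) + min_value


image = np.array([[2.0, 4.0], [6.0, 10.0]])
print(map_values(image, 0, 10))  # values 0.0, 2.5, 5.0, 10.0
```

With channelwise=True, a channel with values [10, 30] and another with [0, 1] are each stretched to [0, 1] independently, instead of sharing one global minimum and maximum.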
-
-
class
eisen.transforms.imaging.
ThresholdValues
(fields, threshold, direction='greater')[source]¶ This transformation thresholds the values contained in a tensor. Depending on the direction supplied by the user, all values greater than, smaller than, greater than or equal to, or smaller than or equal to a certain threshold are set to 1, while the others are set to 0.
from eisen.transforms import ThresholdValues
tform = ThresholdValues(['label'], 0.5, 'greater')
data = tform(data)
This example thresholds the values of the tensor stored at the key ‘label’ such that those greater than 0.5 are set to one and all others (including values equal to 0.5) are set to zero.
-
__init__
(fields, threshold, direction='greater')[source]¶ - Parameters
fields (list of str) – list of fields of the data dictionary that will be affected by this transform
threshold (float) – threshold value for the transform
direction (string) – direction of the comparison between the values and the threshold; possible values are: greater, smaller, greater/equal, smaller/equal
from eisen.transforms import ThresholdValues
tform = ThresholdValues(
    fields=['image'],
    threshold=0,
    direction='greater/equal'
)
data = tform(data)
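The four `direction` options map naturally onto Python's comparison operators. The function `threshold_values` below is a hypothetical NumPy sketch of that behavior, not Eisen's implementation; the float32 output type is an assumption of this sketch.

```python
import operator

import numpy as np

# Comparison used for each `direction` value listed above.
_COMPARISONS = {
    "greater": operator.gt,
    "smaller": operator.lt,
    "greater/equal": operator.ge,
    "smaller/equal": operator.le,
}


def threshold_values(x, threshold, direction="greater"):
    """Hypothetical sketch: 1.0 where `x <direction> threshold` holds, 0.0 elsewhere."""
    compare = _COMPARISONS[direction]
    return compare(np.asarray(x), threshold).astype(np.float32)


label = np.array([0.2, 0.5, 0.7])
print(threshold_values(label, 0.5, "greater"))        # [0. 0. 1.]
print(threshold_values(label, 0.5, "greater/equal"))  # [0. 1. 1.]
```

The difference between the two printed results is exactly the treatment of the value equal to the threshold.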
-
-
class
eisen.transforms.imaging.
AddChannelDimension
(fields)[source]¶ This transformation adds a “channel dimension” to a tensor. Since Eisen represents data in the NCHWD layout (channels first), this transform inserts a new axis at the first position of each data tensor; the batch dimension N is added later, when individual entries are collated into batches.
from eisen.transforms import AddChannelDimension
tform = AddChannelDimension(['image', 'label'])
data = tform(data)
This example adds a singleton dimension to the data stored at the keys ‘image’ and ‘label’ of the data dictionary.
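In NumPy terms, adding a channel dimension corresponds to inserting a new leading axis. The sketch below uses `np.expand_dims` to illustrate the shapes involved. It assumes, without confirming, that Eisen's transform behaves equivalently, and `np.stack` stands in for the batching performed by a PyTorch DataLoader.

```python
import numpy as np

# A single 3D volume (H, W, D), e.g. as it might look after NiftiToNumpy.
volume = np.zeros((64, 64, 32), dtype=np.float32)

# Insert a singleton channel axis in front: (C, H, W, D) with C == 1.
with_channel = np.expand_dims(volume, axis=0)
print(with_channel.shape)  # (1, 64, 64, 32)

# Batching several entries then yields the full N, C, H, W, D layout.
batch = np.stack([with_channel, with_channel])
print(batch.shape)  # (2, 1, 64, 64, 32)
```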
-
class
eisen.transforms.imaging.
StackImagesChannelwise
(fields, dst_field, create_new_dim=True)[source]¶ This transform allows stacking together different tensors of the same size stored at different fields of the data dictionary. The tensors are stacked along the channel dimension. The resulting tensor is therefore multi-channel and contains data from all the fields passed as argument by the user.
from eisen.transforms import StackImagesChannelwise
tform = StackImagesChannelwise(['modality1', 'modality2', 'modality3'], 'allmodalities')
data = tform(data)
This example stacks together multiple modalities in one multi-channel tensor.
-
__init__
(fields, dst_field, create_new_dim=True)[source]¶ - Parameters
fields (list of str) – list of fields of the data dictionary that will be stacked together in the output tensor
dst_field (str) – string representing the destination field of the data dictionary where outputs will be stored.
create_new_dim (bool) – whether a new dimension should be created as a result of the concatenation.
from eisen.transforms import StackImagesChannelwise
tform = StackImagesChannelwise(
    fields=['modality1', 'modality2', 'modality3'],
    dst_field='allmodalities',
    create_new_dim=True
)
data = tform(data)
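One plausible reading of `create_new_dim` is the difference between stacking (which creates a new channel axis) and concatenating along an existing channel axis. The function `stack_channelwise` below is a hypothetical NumPy sketch under that assumption; it is not Eisen's implementation.

```python
import numpy as np


def stack_channelwise(data, fields, dst_field, create_new_dim=True):
    """Hypothetical sketch of channel-wise stacking into data[dst_field]."""
    arrays = [data[field] for field in fields]
    if create_new_dim:
        # Inputs of shape (H, W, D) become one tensor of shape (len(fields), H, W, D).
        data[dst_field] = np.stack(arrays, axis=0)
    else:
        # Inputs already have a channel axis (C, H, W, D); join them along it.
        data[dst_field] = np.concatenate(arrays, axis=0)
    return data


data = {f"modality{i}": np.ones((8, 8, 8)) * i for i in (1, 2, 3)}
data = stack_channelwise(data, ["modality1", "modality2", "modality3"], "allmodalities")
print(data["allmodalities"].shape)  # (3, 8, 8, 8)
```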
-
-
class
eisen.transforms.imaging.
CropCenteredSubVolumes
(fields, size)[source]¶ Transform implementing padding/cropping of the last 3 dimensions of an N-channel 3D volume. A 3D volume processed with this transform is cropped or padded so that its final size matches the size specified by the user at instantiation.
from eisen.transforms import CropCenteredSubVolumes
tform = CropCenteredSubVolumes(['image', 'label'], [128, 128, 128])
data = tform(data)
This example crops (or pads) the content of the data dictionary at keys ‘image’ and ‘label’, which need to be N-channel 3D Numpy volumes, to a size of 128 × 128 × 128 voxels.
-
__init__
(fields, size)[source]¶ - Parameters
fields (list of str) – field of the data dictionary to modify and replace with cropped volumes
size (list of int) – list of 3 integers expressing the desired size of the cropped volumes
from eisen.transforms import CropCenteredSubVolumes
tform = CropCenteredSubVolumes(
    fields=['image', 'label'],
    size=[128, 128, 128]
)
data = tform(data)
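The centered crop-or-pad logic can be sketched per axis: crop a centered window when the axis is too long, pad on both sides when it is too short. The function `crop_or_pad_centered` is hypothetical and not Eisen's code. Zero padding and the way an odd remainder is split between the two sides are assumptions of this sketch.

```python
import numpy as np


def crop_or_pad_centered(volume, size):
    """Hypothetical sketch: center-crop or zero-pad the last 3 axes of `volume` to `size`.

    `volume` has shape (C, H, W, D); leading (channel) axes are left untouched.
    """
    out = volume
    for axis_offset, target in enumerate(size):
        axis = volume.ndim - 3 + axis_offset
        current = out.shape[axis]
        if current > target:
            # Crop: keep a window of length `target` centered on the axis.
            start = (current - target) // 2
            out = np.take(out, np.arange(start, start + target), axis=axis)
        elif current < target:
            # Pad with zeros, splitting the missing voxels between both sides.
            before = (target - current) // 2
            pad = [(0, 0)] * out.ndim
            pad[axis] = (before, target - current - before)
            out = np.pad(out, pad, mode="constant")
    return out


vol = np.ones((2, 150, 100, 128), dtype=np.float32)
print(crop_or_pad_centered(vol, [128, 128, 128]).shape)  # (2, 128, 128, 128)
```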
-
-
class
eisen.transforms.imaging.
LabelMapToOneHot
(fields, classes)[source]¶ This transformation converts labels having integer values to one-hot labels. In other words, a single channel tensor data containing integer values representing classes is converted to a corresponding multi-channel tensor data having one-hot entries channel-wise. Each channel corresponds to a class.
from eisen.transforms import LabelMapToOneHot
tform = LabelMapToOneHot(['label'], [1, 2, 25, 3])
data = tform(data)
This example converts the single channel data[‘label’] tensor to a 4-channel tensor where each entry represents the corresponding entry of the original tensor in one-hot encoding.
-
__init__
(fields, classes)[source]¶ - Parameters
fields (list of str) – list of fields of the data dictionary that will be affected by this transform
classes (list of int) – list of class identifiers (integers) to be converted to one-hot representation
from eisen.transforms import LabelMapToOneHot
tform = LabelMapToOneHot(
    fields=['label'],
    classes=[1, 2, 25, 3]
)
data = tform(data)
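The one-hot conversion builds one channel per entry of `classes`, in the order given, with channel i equal to 1 wherever the label equals classes[i]. The function `label_map_to_one_hot` is a hypothetical NumPy sketch, not Eisen's code; it assumes the single-channel label already carries a leading channel axis of size 1, and the float32 output type is this sketch's choice.

```python
import numpy as np


def label_map_to_one_hot(label, classes):
    """Hypothetical sketch: (1, ...) integer map -> (len(classes), ...) one-hot tensor."""
    # Channel i is 1 wherever the label equals classes[i], 0 elsewhere.
    return np.concatenate([(label == c).astype(np.float32) for c in classes], axis=0)


label = np.array([[[1, 2], [25, 3]]])  # shape (1, 2, 2): one channel, 2 x 2 pixels
one_hot = label_map_to_one_hot(label, [1, 2, 25, 3])
print(one_hot.shape)  # (4, 2, 2)
# Channel 2 corresponds to class 25 and is 1 only at pixel [1, 0].
```

Note that the channel index follows the position in `classes` (class 25 lands in channel 2), not the numeric class value.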
-
-
class
eisen.transforms.imaging.
FixedMeanStdNormalization
(fields, mean, std)[source]¶ This transform subtracts a fixed mean from data tensors and divides them by a fixed standard deviation. The values for mean and standard deviation need to be provided by the user.
from eisen.transforms import FixedMeanStdNormalization
tform = FixedMeanStdNormalization(['image'], 0.5, 1.2)
data = tform(data)
This example normalizes the data stored in data[‘image’] by subtracting the mean (0.5) and dividing by the std (1.2).
-
__init__
(fields, mean, std)[source]¶ - Parameters
fields (list of str) – list of fields of the data dictionary that will be affected by this transform
mean (float) – float value representing the mean. This value will be subtracted from the data
std (float) – float value representing the standard deviation. The data will be divided by this value.
from eisen.transforms import FixedMeanStdNormalization
tform = FixedMeanStdNormalization(
    fields=['image'],
    mean=0.5,
    std=1.2
)
data = tform(data)
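As a worked example of the arithmetic, with the mean 0.5 and std 1.2 used above, each value x becomes (x - 0.5) / 1.2. This NumPy snippet only illustrates the formula on made-up values and does not call Eisen.

```python
import numpy as np

mean, std = 0.5, 1.2
image = np.array([0.5, 1.7, -0.7])

# Subtract the fixed mean, then divide by the fixed standard deviation.
normalized = (image - mean) / std
print(normalized)  # approximately [0, 1, -1]
```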
-
-
class
eisen.transforms.imaging.
RepeatTensor
(fields, reps)[source]¶ This transform repeats tensors “reps” times on each axis according to user parameters
from eisen.transforms import RepeatTensor
tform = RepeatTensor(['image'], (10, 1, 1))
data = tform(data)
This example repeats the tensor 10 times along the first axis and leaves the other axes unchanged (one repetition each).
-
__init__
(fields, reps)[source]¶ - Parameters
fields (list of str) – list of fields of the data dictionary that will be affected by this transform
reps (list of int) – list of integers representing repetitions along each axis
from eisen.transforms import RepeatTensor
tform = RepeatTensor(
    fields=['image'],
    reps=(10, 1, 2)
)
data = tform(data)
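Assuming RepeatTensor follows the semantics of `numpy.tile` (an assumption, not confirmed by the text above), each entry of `reps` is the number of copies along the corresponding axis, so a value of 1 leaves that axis unchanged. The snippet shows the resulting shapes for the two `reps` values used in the examples.

```python
import numpy as np

image = np.arange(6).reshape(1, 2, 3)  # shape (1, 2, 3)

# reps gives the number of copies per axis; 1 leaves an axis unchanged.
print(np.tile(image, (10, 1, 1)).shape)  # (10, 2, 3)
print(np.tile(image, (10, 1, 2)).shape)  # (10, 2, 6)
```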
-
-
class
eisen.transforms.imaging.
NumpyToNifti
(fields, affine=None, data_types=None)[source]¶ This transform converts a Numpy volume to a Nifti image object (nibabel). It may be useful when writing Numpy arrays to disk in Nifti format using the WriteNiftiToFilename I/O transform. Note: the transform currently does not explicitly handle multichannel data.
from eisen.transforms import NumpyToNifti
tform = NumpyToNifti(['image', 'label'])
data = tform(data)
-
__init__
(fields, affine=None, data_types=None)[source]¶ - Parameters
fields (list of str) – list of names of the fields of data dictionary to convert from Numpy to Nifti
affine (np.ndarray) – affine transformation matrix (see nibabel) specifying spacing, origin, and orientation
data_types (dict) – dictionary in which each key is a field and each value is the output data type for that field
import numpy as np
from eisen.transforms import NumpyToNifti
tform = NumpyToNifti(
    fields=['image', 'label'],
    affine=np.eye(4),
    data_types={'image': np.float32, 'label': np.uint8}
)
data = tform(data)
-
Models¶
In order to favor reproducibility of deep learning approaches and easy benchmarking, as well as to provide “starter-kit” tools to users approaching a certain problem for the first time, we include several well-known neural network architectures within Eisen. This is similar to the approach taken by torchvision, which ships network architectures for classification, segmentation and beyond within the package.
Models can be used in any custom code, with or without the rest of the functionality provided by Eisen. In this sense, they are standard torch.nn.Module objects.
Models can be used within Eisen workflows (see below) after wrapping them via EisenModuleWrapper which is available in the eisen.utils submodule. This also allows third party models, such as those available in torchvision, to be used within Eisen.
Segmentation Models¶
Several models for segmentation are already included in Eisen. These approaches have been successfully used in several academic works. Refer to the related publications to obtain more information.
-
class
eisen.models.segmentation.
UNet3D
(input_channels, output_channels, n_filters=16, outputs_activation='sigmoid', normalization='groupnorm')[source]¶ -
__init__
(input_channels, output_channels, n_filters=16, outputs_activation='sigmoid', normalization='groupnorm')[source]¶ - Parameters
input_channels (int) – number of input channels
output_channels (int) – number of output channels
n_filters (int) – number of filters
outputs_activation (str) – output activation type either sigmoid, softmax or none
normalization (str) – normalization either groupnorm, batchnorm or none
-
-
class
eisen.models.segmentation.
UNet
(input_channels, output_channels, n_filters=64, bilinear=False, outputs_activation='sigmoid', normalization='groupnorm')[source]¶ -
__init__
(input_channels, output_channels, n_filters=64, bilinear=False, outputs_activation='sigmoid', normalization='groupnorm')[source]¶ - Parameters
input_channels (int) – number of input channels
output_channels (int) – number of output channels
n_filters (int) – number of filters
outputs_activation (str) – output activation type either sigmoid, softmax or none
normalization (str) – normalization either groupnorm, batchnorm or none
-
-
class
eisen.models.segmentation.
VNet
(input_channels=3, output_channels=2, n_filters=16, filter_size=3, normalization='none', outputs_activation='sigmoid')[source]¶ -
__init__
(input_channels=3, output_channels=2, n_filters=16, filter_size=3, normalization='none', outputs_activation='sigmoid')[source]¶ - Parameters
input_channels (int) – number of input channels
output_channels (int) – number of output channels
n_filters (int) – number of filters
filter_size (int) – spatial size of the filters
normalization (str) – normalization either groupnorm, batchnorm, instancenorm or none
outputs_activation (str) – output activation: either sigmoid, softmax or none
-
forward
(x)[source]¶ Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
-
class
eisen.models.segmentation.
ObeliskMIDL
(num_labels, full_res, outputs_activation='sigmoid')[source]¶
-
class
eisen.models.segmentation.
HighRes2DNet
(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]¶ -
__init__
(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]¶ - Parameters
input_channels (int) – number of input channels
output_channels (int) – number of output channels
initial_out_channels_power (int) – initial output channels power
outputs_activation (str) – output activation type either sigmoid, softmax or none
-
forward
(x)¶ Computes the output of the network.
- Parameters
x (torch.Tensor) – Input tensor containing images
- Returns
prediction
-
-
class
eisen.models.segmentation.
HighRes3DNet
(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]¶ -
__init__
(input_channels, output_channels, initial_out_channels_power=4, outputs_activation='sigmoid', *args, **kwargs)[source]¶ - Parameters
input_channels (int) – number of input channels
output_channels (int) – number of output channels
initial_out_channels_power (int) – initial output channels power
outputs_activation (str) – output activation type either sigmoid, softmax or none
-
forward
(x)¶ Computes the output of the network.
- Parameters
x (torch.Tensor) – Input tensor containing images
- Returns
prediction
-
Ops¶
Eisen includes various operations that are useful when developing deep learning models. The operations are always implemented in PyTorch and derive from the class torch.nn.Module, as suggested by the PyTorch documentation itself. Eisen contains implementations of layers, metrics and losses. The losses and metrics include methods such as the Dice loss, which are especially useful in medical imaging tasks.
Losses¶
When optimizing neural network parameters during training, it is crucial to specify a suitable loss that pushes the network towards solving the problem at hand. Eisen includes losses that can be optimized during training and computed during validation.
-
class
eisen.ops.losses.
DiceLoss
(weight=1.0, dim=None)[source]¶ Dice loss is often used in segmentation tasks to optimize the overlap between the ground truth contour and the prediction. Dice loss is robust to class imbalance and therefore suitable to segment small foreground regions in images or volumes.
This version of the Dice loss supports multi-class segmentation (although in a naive manner).
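For reference, a common "soft" Dice formulation is loss = 1 - 2·Σ(P·T) / (ΣP + ΣT), where P holds predicted probabilities and T the binary ground truth. The function `soft_dice_loss` below is a hypothetical NumPy sketch of that formula for a single class. It does not model the `weight` and `dim` arguments or the multi-class handling of Eisen's DiceLoss, and the smoothing term `eps` is this sketch's own choice. The corresponding Dice coefficient (as used by DiceMetric) is simply 1 minus this loss.

```python
import numpy as np


def soft_dice_loss(prediction, target, eps=1e-6):
    """Hypothetical sketch of a soft Dice loss: 1 - 2*sum(P*T) / (sum(P) + sum(T)).

    prediction: probabilities in [0, 1]; target: binary ground truth (same shape).
    `eps` avoids division by zero when both are empty.
    """
    intersection = np.sum(prediction * target)
    total = np.sum(prediction) + np.sum(target)
    dice = (2.0 * intersection + eps) / (total + eps)
    return 1.0 - dice


target = np.array([0, 1, 1, 0], dtype=np.float32)
print(soft_dice_loss(target, target))                          # ~0.0 (perfect overlap)
print(soft_dice_loss(np.array([1.0, 1.0, 0.0, 0.0]), target))  # ~0.5 (half overlap)
```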
Metrics¶
Benchmarking, testing and validating models often requires computing metrics that can give an estimate of the performance of the network on the problem at hand. Eisen includes metrics modules that can be computed during training, validation and testing.
-
class
eisen.ops.metrics.
DiceMetric
(weight=1.0, dim=None)[source]¶ The Dice coefficient is often used in segmentation tasks to evaluate the performance of algorithms by providing a scalar result expressing the amount of overlap between the ground truth contour and the prediction. The Dice coefficient is robust to class imbalance and therefore suitable to evaluate small foreground regions in images or volumes.
This version of the Dice metric supports multi-class segmentation (although in a naive manner).
Workflows¶
Workflows realize high-level functionality that joins several building blocks, such as losses, metrics, transforms, optimizers and models, in order to perform operations on them.
Warning
Eisen-Core versions after 0.0.5 (Eisen versions after 0.1.6) and current versions installed from the GitHub repository introduce breaking changes to workflows and wrappers. Model or data parallelism must be set up before the model is passed to the workflow. This documentation illustrates the most recent way of using workflows.
Workflows possess unique IDs. It is possible to retrieve this ID as workflow.id, where workflow is a Workflow instance.
Typical examples of workflows are Training, Testing and Validation. These basic workflows implement respectively the training, testing and validation loops. Workflows are implemented as python objects similar to this:
from eisen.utils.workflows import GenericWorkflow
class DemoWorkflow(GenericWorkflow):
    def __init__(
        self,
        model,
        data_loader,
        losses,
        optimizer,
        metrics
    ):
        # ... store the building blocks and set up the workflow here ...
        pass
In particular, the arguments of the __init__ function represent building blocks that need to be used together during the workflow.
Refer to the following documentation to learn more.
Hooks¶
Eisen hooks are triggered when specific events are generated by Eisen workflows. These hooks are Python modules implementing functionality such as logging, model serialization and summary export, which is executed at the end of an epoch or upon detection of a model exhibiting superior performance with respect to losses or metrics.
Hooks are designed to process output dictionaries obtained from workflows. These output dictionaries are generated and aggregated across an epoch during execution of the workflow. They are ultimately sent to all the hooks listening to events originating from each specific workflow.
The supported events are:
eisen.EISEN_END_BATCH_EVENT
eisen.EISEN_END_EPOCH_EVENT
eisen.EISEN_BEST_MODEL_LOSS
eisen.EISEN_BEST_MODEL_METRIC
Hooks only listen to events generated by the workflows they monitor. Workflows are identified by their unique ID. Hooks require at least three parameters to be instantiated:
workflow_id
phase
artifacts_dir
It is possible to associate an arbitrary number of hooks to each workflow. They will be executed in sequence. An example of hook instantiation is shown here:
from eisen.utils.logging import LoggingHook
from eisen.utils.logging import TensorboardSummaryHook
# let us suppose a workflow has already been instantiated
first_hook = LoggingHook(workflow.id, 'Training', './')
second_hook = TensorboardSummaryHook(workflow.id, 'Training', './')
The two hooks in this example, first_hook and second_hook, will listen for events such as EISEN_END_EPOCH_EVENT in order to execute their actions on the workflow’s output_dictionary. The typical content (keys) of this dictionary is:
losses which contains a list of the losses computed during a batch or epoch
metrics which contains a list of the metrics computed during a batch or epoch
inputs which contains a dictionary of most recent inputs fed to the network
outputs which contains a dictionary of most recent outputs produced by the network
model which contains a reference (not a deep copy) to the current model
epoch which is an integer representing the current epoch number
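The event mechanism described above is essentially publish/subscribe keyed by event type and workflow ID. The following pure-Python sketch illustrates that idea only; it is not Eisen's dispatch implementation, and the names `connect`, `emit`, `MeanLossHook` and `END_EPOCH_EVENT` are hypothetical. For simplicity it also assumes "losses" is a flat list of numbers.

```python
import uuid
from collections import defaultdict

END_EPOCH_EVENT = "end_epoch"  # hypothetical stand-in for eisen.EISEN_END_EPOCH_EVENT

# Registry: (event, workflow_id) -> callbacks, in registration order.
_listeners = defaultdict(list)


def connect(callback, event, workflow_id):
    """Register `callback` for `event`, but only when emitted by `workflow_id`."""
    _listeners[(event, workflow_id)].append(callback)


def emit(event, workflow_id, output_dictionary):
    """Deliver an output dictionary to every hook monitoring this workflow."""
    for callback in _listeners[(event, workflow_id)]:
        callback(output_dictionary)


class MeanLossHook:
    """Hypothetical hook recording and printing the mean loss at each epoch end."""

    def __init__(self, workflow_id, phase, artifacts_dir):
        self.phase = phase
        self.artifacts_dir = artifacts_dir  # unused here; mirrors the hook signature
        self.history = []
        connect(self.end_epoch, END_EPOCH_EVENT, workflow_id)

    def end_epoch(self, message):
        mean_loss = sum(message["losses"]) / len(message["losses"])
        self.history.append(mean_loss)
        print(f"{self.phase} epoch {message['epoch']}: mean loss {mean_loss:.3f}")


workflow_id = uuid.uuid4()
hook = MeanLossHook(workflow_id, "Training", "./")
emit(END_EPOCH_EVENT, workflow_id, {"epoch": 1, "losses": [0.5, 0.3]})
emit(END_EPOCH_EVENT, uuid.uuid4(), {"epoch": 1, "losses": [9.9]})  # other workflow: ignored
```

Keying the registry on the workflow ID is what lets several workflows (for example Training and Validation) run side by side while each hook only reacts to the workflow it was created for.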
Refer to the following documentation to learn more.
Logging hooks¶
These hooks are mainly used to monitor training. They are able to extract information such as losses, metrics and examples of inputs and outputs and display them to the user.
-
class
eisen.utils.logging.
LoggingHook
(workflow_id, phase, artifacts_dir)[source]¶ Logging object that prints the progress of model training/validation/testing to the console. This logger uses an event-based system. The training, validation and test workflows emit events such as EISEN_END_BATCH_EVENT and EISEN_END_EPOCH_EVENT, which are picked up and handled by this object.
Once the user instantiates this object, the workflow corresponding to the ID passed as argument is tracked, and the results of the workflow in terms of losses and metrics are printed on the console.
from eisen.utils.logging import LoggingHook
workflow = ...  # e.g. an instance of the Training workflow
logger = LoggingHook(workflow.id, 'Training', '/artifacts/dir')
-
__init__
(workflow_id, phase, artifacts_dir)[source]¶ - Parameters
workflow_id (UUID) – string containing the workflow id of the workflow being monitored (workflow_instance.id)
phase (str) – string containing the name of the phase (training, testing, …) of the workflow monitored
artifacts_dir (str) – The path of the directory where the artifacts of the workflow are stored
from eisen.utils.logging import LoggingHook
workflow = ...  # e.g. an instance of the Training workflow
logger = LoggingHook(
    workflow_id=workflow.id,
    phase='Training',
    artifacts_dir='/artifacts/dir'
)
-
-
class
eisen.utils.logging.
TensorboardSummaryHook
(workflow_id, phase, artifacts_dir, comparison_pairs=None, show_all_axes=False)[source]¶ Logging object that automatically exports summaries to Tensorboard. Most of its functionality is automated: the hook exports as much information as possible to Tensorboard.
Losses, metrics, inputs and outputs are all interpreted and exported according to their dimensionality. Vectors result in mean and standard deviation estimates as well as histograms; images result in image summaries and histograms; etc.
It is also possible to compare pairs of inputs and outputs. These pairs need to be specified during object instantiation.
Once the user instantiates this object, the workflow corresponding to the ID passed as argument is tracked, and the results of the workflow are exported to Tensorboard.
from eisen.utils.logging import TensorboardSummaryHook
workflow = ...  # e.g. an instance of the Training workflow
logger = TensorboardSummaryHook(workflow.id, 'Training', '/artifacts/dir')
-
__init__
(workflow_id, phase, artifacts_dir, comparison_pairs=None, show_all_axes=False)[source]¶ This method instantiates an object of type TensorboardSummaryHook. The signature of this method is similar to that of every other hook. There is one additional parameter called comparison_pairs which is meant to hold a list of lists each containing a pair of input/output names that share the same dimensionality and can be compared to each other.
A typical use of comparison_pairs is when users want to plot a pr_curve or a confusion matrix by comparing some input with some output. Eg. by comparing the labels with the predictions.
from eisen.utils.logging import TensorboardSummaryHook
workflow = ...  # e.g. an instance of the Training workflow
logger = TensorboardSummaryHook(
    workflow_id=workflow.id,
    phase='Training',
    artifacts_dir='/artifacts/dir',
    comparison_pairs=[['labels', 'predictions']]
)
- Parameters
workflow_id (UUID) – string containing the workflow id of the workflow being monitored (workflow_instance.id)
phase (str) – string containing the name of the phase (training, testing, …) of the workflow monitored
artifacts_dir (str) – the path of the directory where the artifacts of the workflow are stored
comparison_pairs (list of lists of strings) – list of lists of pairs, which are names of inputs and outputs to be compared directly
show_all_axes (bool) – whether any volumetric data should be shown as axial + sagittal + coronal
-
Artifacts generation hooks¶
These hooks are used to save artifacts as a result of a workflow. They save model snapshots in different formats. Note that snapshots are saved as unwrapped torch.nn.Module objects (they do not belong to the EisenModuleWrapper class).
-
class
eisen.utils.artifacts.
SaveTorchModelHook
(workflow_id, phase, artifacts_dir, select_best_loss=True, save_history=False)[source]¶ Saves a Torch model snapshot of the current best model. The best model can be selected based on the best average loss or the best average metric. It is possible to save the whole history of best models seen throughout the workflow.
from eisen.utils.artifacts import SaveTorchModelHook
workflow = ...  # e.g. an instance of the Validation workflow
saver = SaveTorchModelHook(workflow.id, 'Validation', '/my/artifacts')
-
__init__
(workflow_id, phase, artifacts_dir, select_best_loss=True, save_history=False)[source]¶ - Parameters
workflow_id (UUID) – the ID of the workflow that should be tracked by this hook
phase (str) – the phase where this hook is being used (Training, Testing, etc.)
artifacts_dir (str) – the path of the artifacts directory where the results of this hook should be stored
select_best_loss (bool) – whether the criterion for saving the model should be best loss or best metric
save_history (bool) – whether the history of all models that were at a certain point the best should be saved
from eisen.utils.artifacts import SaveTorchModelHook
workflow = ...  # e.g. an instance of the Validation workflow
saver = SaveTorchModelHook(
    workflow_id=workflow.id,
    phase='Validation',
    artifacts_dir='/my/artifacts',
    select_best_loss=True,
    save_history=False
)
-
-
class
eisen.utils.artifacts.
SaveONNXModelHook
(workflow_id, phase, artifacts_dir, input_size, select_best_loss=True, save_history=False)[source]¶ Saves an ONNX model snapshot of the current best model. The best model can be selected based on the best average loss or the best average metric. It is possible to save the whole history of best models seen throughout the workflow.
from eisen.utils.artifacts import SaveONNXModelHook
workflow = ...  # e.g. an instance of the Validation workflow
saver = SaveONNXModelHook(workflow.id, 'Validation', '/my/artifacts', [1, 3, 224, 224])
-
__init__
(workflow_id, phase, artifacts_dir, input_size, select_best_loss=True, save_history=False)[source]¶ - Parameters
workflow_id (UUID) – the ID of the workflow that should be tracked by this hook
phase (str) – the phase where this hook is being used (Training, Testing, etc.)
artifacts_dir (str) – the path of the artifacts directory where the results of this hook should be stored
input_size (list of int) – a list of integers expressing the input size that the saved model will process
select_best_loss (bool) – whether the criterion for saving the model should be best loss or best metric
save_history (bool) – whether the history of all models that were at a certain point the best should be saved
from eisen.utils.artifacts import SaveONNXModelHook
workflow = ...  # e.g. an instance of the Validation workflow
saver = SaveONNXModelHook(
    workflow_id=workflow.id,
    phase='Validation',
    artifacts_dir='/my/artifacts',
    input_size=[1, 3, 224, 224],
    select_best_loss=True,
    save_history=False
)
-
Artifacts¶
Artifacts can be generated without using hooks. Objects of type torch.nn.Module can be serialized in different formats using functionality provided by Eisen. Naturally, they behave just like any other torch.nn.Module and can therefore also be saved using other methods. This is true for any model used in Eisen, even when no workflow has been used for training, testing and validation.
It is recommended NOT to use wrapped modules during serialization. That is, serializing an EisenModuleWrapper object, even though it derives from torch.nn.Module, can result in issues depending on the chosen type of serialization.
Refer to the following documentation to learn more.
-
class
eisen.utils.artifacts.
SaveTorchModel
(artifacts_dir)[source]¶ This object implements model saving for pytorch models. Once instantiated with a parameter consisting of a string representing the path of the directory where the model shall be saved, it can be called on a model in order to save it.
No information about optimizer and training is saved in the process.
from eisen.utils.artifacts import SaveTorchModel
my_model = ...  # e.g. a torch.nn.Module instance
saver = SaveTorchModel('/my/artifacts')
saver(my_model)
-
__call__
(model, filename='model.pt')[source]¶ Saves a model passed as argument. The model will be saved in Torch (state dict) format.
- Parameters
model (torch.nn.Module) – Model to be saved (refrain from using wrapped modules, see EisenModuleWrapper)
filename (str) – The filename that shall be used to save the model
- Returns
None
-
-
class
eisen.utils.artifacts.
SaveONNXModel
(artifacts_dir, input_size)[source]¶ This object exports a torch.nn.Module in ONNX format. The user is asked to supply two parameters for initialization. The first parameter is the artifact directory, a string representing the path where the model is supposed to be stored after serialization. The second parameter is the input size, a list of integers containing the size of the inputs to be processed by the network.
from eisen.utils.artifacts import SaveONNXModel
my_model = ...  # e.g. a torch.nn.Module instance
saver = SaveONNXModel('/my/artifacts', [1, 1, 224, 224])
saver(my_model)
-
__call__
(model, filename='model.onnx')[source]¶ Saves a model passed as argument. The model will be saved in ONNX format.
- Parameters
model (torch.nn.Module) – Model to be saved (refrain from using wrapped modules, see EisenModuleWrapper)
filename (str) – The filename that shall be used to save the model
- Returns
None
Wrappers
Many packages in the PyTorch ecosystem, such as torchvision, are not fully compatible with Eisen. Eisen makes heavy use of dictionaries throughout most of the objects in the module eisen.utils and beyond.
Dataset entries are represented as dictionaries, batches are also dictionaries, input and output of modules (models, losses, metrics) are also expected to be dictionaries.
This convention is unfortunately not universally adopted. Since it is key to Eisen's flexibility, adaptors have been developed to make functionality inherited from other packages fully compatible with Eisen, with no significant impact on performance.
Our wrappers are adaptors that perform a simple translation of input and output variables from and to the format expected by Eisen. The documentation of our wrappers, with usage examples, is included below.
Warning
Eisen-Core versions after 0.0.5 (Eisen versions after 0.1.6), as well as current versions installed from the GitHub repository, introduce breaking changes to workflows and wrappers. Wrappers now require an instance of a Module, Transform or Dataset rather than the Module, Transform or Dataset class itself. This documentation illustrates the most recent way of using wrappers.
class eisen.utils.EisenModuleWrapper(module, input_names, output_names)
This object implements a wrapper allowing standard PyTorch Modules (e.g. those implemented in torchvision) to be used within Eisen.
Standard PyTorch Modules accept positional and named arguments in the forward() method and return a single value or a tuple of values.
Eisen workflows make use of dictionaries. That is, data batches are represented as dictionaries and directly fed into modules using the **kwargs mechanism provided by Python.
This wrapper causes standard Modules to behave as prescribed by Eisen. Wrapped modules accept as input a dictionary of keyword arguments with arbitrary (user defined) keys. They return as output a dictionary of keyword values with arbitrary (user defined) keys.
# Import the Module we want to wrap. In this case we import it from torchvision
from torchvision.models import resnet18

from eisen.utils import EisenModuleWrapper

# Instantiate the Module to wrap (its own arguments are passed here), then wrap it by
# specifying the fields of the data dictionary that will be interpreted as inputs,
# and the fields in which the outputs should be stored.
module = resnet18(pretrained=False)
adapted_module = EisenModuleWrapper(module, ['image'], ['prediction'])
__init__(module, input_names, output_names)
- Parameters
module (torch.nn.Module) – This is a Module instance
input_names (list of str) – list of names for positional arguments of module. Must match field names in data batches
output_names (list of str) – list of names for the outputs of the module
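To make the translation concrete, here is a minimal pure-Python sketch of what a dictionary-in/dictionary-out module adapter does. `DictModuleAdapter` is a hypothetical name and this is not Eisen's actual implementation. A plain function stands in for a torch.nn.Module so that the snippet runs without PyTorch.

```python
from typing import Any, Callable, Dict, List


class DictModuleAdapter:
    """Hypothetical sketch of a dict-in/dict-out module adapter (not Eisen's actual code)."""

    def __init__(self, module: Callable[..., Any], input_names: List[str], output_names: List[str]):
        self.module = module
        self.input_names = input_names
        self.output_names = output_names

    def __call__(self, **batch: Any) -> Dict[str, Any]:
        # Pick the inputs from the batch dictionary, in the positional order the module expects
        args = [batch[name] for name in self.input_names]
        outputs = self.module(*args)
        # A single return value is treated as a one-element tuple
        if not isinstance(outputs, tuple):
            outputs = (outputs,)
        if len(outputs) != len(self.output_names):
            raise ValueError("number of outputs does not match output_names")
        # Store each output under its user-defined key
        return dict(zip(self.output_names, outputs))


# A plain function standing in for a module with two positional inputs and two outputs
def add_and_multiply(x, y):
    return x + y, x * y


adapted = DictModuleAdapter(add_and_multiply, ['a', 'b'], ['sum', 'product'])
result = adapted(a=3, b=4, unused_field='ignored')
print(result)  # {'sum': 7, 'product': 12}
```

Fields of the batch that are not listed in `input_names` are simply ignored. This lets a whole data batch be passed with `**batch`, as Eisen workflows do.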
class eisen.utils.EisenTransformWrapper(transform, fields)
This object implements a wrapper allowing standard PyTorch Transforms (e.g. those implemented in torchvision) to be used within Eisen.
Transforms in Eisen operate on dictionaries. They are in fact always called on a dictionary containing multiple keys that store data.
This wrapper causes standard Transforms to behave as prescribed by Eisen.
# Import the transform we want to wrap. In this case we import it from torchvision
from torchvision.transforms import CenterCrop

from eisen.utils import EisenTransformWrapper

# Instantiate the Transform to wrap (its own arguments are passed here), then wrap it by
# specifying the fields of the data dictionary that the Transform should affect.
transform = CenterCrop((224, 224))
adapted_transform = EisenTransformWrapper(transform, ['image'])
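The idea behind a transform adapter fits in a few lines. The sketch below is hypothetical (`DictTransformAdapter` is not part of Eisen). It applies a single-argument transform to each selected field of a datapoint dictionary and leaves the other fields untouched. A toy list-cropping function stands in for a torchvision transform.

```python
from typing import Any, Callable, Dict, List


class DictTransformAdapter:
    """Hypothetical sketch: apply a single-argument transform to selected fields of a datapoint."""

    def __init__(self, transform: Callable[[Any], Any], fields: List[str]):
        self.transform = transform
        self.fields = fields

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Work on a shallow copy so the caller's dictionary is not modified in place
        out = dict(data)
        for field in self.fields:
            out[field] = self.transform(out[field])
        return out


# A toy "transform" standing in for e.g. CenterCrop: keep the middle two elements of a list
def center_two(values):
    start = (len(values) - 2) // 2
    return values[start:start + 2]


adapted = DictTransformAdapter(center_two, ['image'])
item = {'image': [1, 2, 3, 4, 5, 6], 'label': 'cat'}
print(adapted(item))  # {'image': [3, 4], 'label': 'cat'}
```

Copying the dictionary is a design choice of this sketch. The source does not say whether Eisen's wrapper modifies the datapoint in place.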
class eisen.utils.EisenDatasetWrapper(dataset, field_names, transform=None)
This object implements a wrapper allowing standard PyTorch Datasets (e.g. those implemented in torchvision) to be used within Eisen.
Datasets in Eisen return items that are always dictionaries. Each key of the dictionary contains information from the dataset.
This wrapper causes standard Datasets to behave as prescribed by Eisen.
# Import the dataset we want to wrap. In this case we import it from torchvision
from torchvision.datasets import MNIST

from eisen.utils import EisenDatasetWrapper

# Instantiate the Dataset to wrap (its own arguments are passed here), then wrap it by
# specifying the fields of the dictionary returned by the adapted __getitem__ method.
dataset = MNIST('./', download=True)
adapted_dataset = EisenDatasetWrapper(dataset, ['image', 'label'])
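A dataset adapter only needs to rename the elements of each item. The following is a hypothetical pure-Python sketch (`DictDatasetAdapter` is not Eisen's code). It assumes the wrapped dataset returns tuples, as torchvision datasets such as MNIST do, and that the optional transform is an Eisen-style transform operating on the whole dictionary. A list of tuples stands in for the real dataset.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


class DictDatasetAdapter:
    """Hypothetical sketch: turn tuple-returning dataset items into dictionaries."""

    def __init__(self, dataset: Sequence[Tuple[Any, ...]], field_names: List[str],
                 transform: Optional[Callable[[Dict[str, Any]], Dict[str, Any]]] = None):
        self.dataset = dataset
        self.field_names = field_names
        self.transform = transform

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        item = self.dataset[index]
        if len(item) != len(self.field_names):
            raise ValueError("item length does not match field_names")
        # Assign each element of the tuple to the corresponding field name
        entry = dict(zip(self.field_names, item))
        # Eisen-style transforms operate on the whole dictionary
        if self.transform is not None:
            entry = self.transform(entry)
        return entry


# A list of (image, label) tuples standing in for a torchvision dataset such as MNIST
raw = [([0, 1], 7), ([1, 1], 3)]
adapted = DictDatasetAdapter(raw, ['image', 'label'])
print(len(adapted), adapted[1])  # 2 {'image': [1, 1], 'label': 3}
```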
Other utilities
Eisen contains other utility functions and objects that can be used to further improve functionality, often independently of Eisen itself. Their documentation is reported below.
class eisen.utils.PipelineExecutionStreamer(operations_sequence, split_size)
This execution streamer takes a sequence of operations (torch.nn.Module) and executes them in a pipeline. This is only useful when each operation is executed on a different device. Execution can then be kicked off asynchronously on each device, so that the devices work concurrently and GPU usage is maximized. More details about this idea can be found here: https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html#speed-up-by-pipelining-inputs
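The benefit of pipelining becomes visible once the schedule is written out. The sketch below uses plain Python and no GPUs. It assumes, as in the linked PyTorch tutorial, that the input batch is cut into chunks that flow through the stages one time step apart, and that every stage takes the same time per chunk. `pipeline_schedule` is a hypothetical helper, not part of Eisen.

```python
from typing import List, Optional


def pipeline_schedule(num_stages: int, num_chunks: int) -> List[List[Optional[int]]]:
    """Return, for each time step, the chunk index processed by each stage (None = idle)."""
    steps = num_stages + num_chunks - 1
    schedule = []
    for t in range(steps):
        # Stage s works on chunk t - s, if such a chunk exists at this step
        row = [t - s if 0 <= t - s < num_chunks else None for s in range(num_stages)]
        schedule.append(row)
    return schedule


# 2 stages (e.g. two GPUs), batch cut into 4 chunks
for t, row in enumerate(pipeline_schedule(2, 4)):
    print(t, row)
# 0 [0, None]
# 1 [1, 0]
# 2 [2, 1]
# 3 [3, 2]
# 4 [None, 3]
```

Under these idealized assumptions, the whole batch finishes in 5 steps instead of the 8 needed when stage 2 has to wait for stage 1 to process all 4 chunks. The devices are idle only during the first and last step.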
class eisen.utils.ModelParallel(module, split_size, device_ids=None, output_device=None)
This object implements model parallelism for PyTorch models. Model parallelism refers to the practice of training on multiple GPUs by splitting the layers of a model across them. In this way, models too large to fit on a single GPU can be stored and trained. This module offers pipelined execution for model parallelism, as shown in the PyTorch documentation: https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html#speed-up-by-pipelining-inputs
Additionally, this module works fully automatically and behaves similarly to torch.nn.DataParallel, so its interface will look familiar to anyone who has used torch.nn.DataParallel.
Warning
Only single-input models can be parallelized by the current version of ModelParallel. Most models, such as ResNet, VNet and UNet, have a single input (for example a batch of images), so we expect most use cases to be covered by the current implementation.
from eisen.utils import ModelParallel
from eisen.models.segmentation import UNet

# Transform a model instance into a model-parallel model instance
model = ModelParallel(UNet(input_channels=1, output_channels=1), split_size=2)
# model is a ModelParallel instance and will execute on multiple GPUs
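The source does not specify how ModelParallel distributes layers across GPUs. As an illustration only, the following sketch splits an ordered list of layers into contiguous groups of roughly equal size, one per device. `partition_layers` is a hypothetical helper and the layer names are placeholders.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition_layers(layers: Sequence[T], num_devices: int) -> List[List[T]]:
    """Split layers into num_devices contiguous groups whose sizes differ by at most one."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    base, extra = divmod(len(layers), num_devices)
    groups, start = [], 0
    for d in range(num_devices):
        # The first `extra` devices receive one additional layer
        size = base + (1 if d < extra else 0)
        groups.append(list(layers[start:start + size]))
        start += size
    return groups


layers = ['conv1', 'conv2', 'conv3', 'conv4', 'conv5', 'head']
print(partition_layers(layers, 4))
# [['conv1', 'conv2'], ['conv3', 'conv4'], ['conv5'], ['head']]
```

Keeping each group contiguous means activations only cross from one device to the next. A practical implementation would more likely balance groups by memory or compute cost rather than by layer count.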
__init__(module, split_size, device_ids=None, output_device=None)
This method instantiates a ModelParallel Module from a module instance passed by the user. The model must have a single input (a forward(x) signature for its forward method); otherwise an error is raised.
For example:
from eisen.utils import ModelParallel
from eisen.models.segmentation import UNet

model = ModelParallel(
    module=UNet(input_channels=1, output_channels=1),
    split_size=2,
    device_ids=[0, 1, 2, 3],
    output_device=0
)
- Parameters
module (torch.nn.Module) – an instance of the model that should be parallelized
split_size (int) – split size for pipelined execution
device_ids (list) – list of int or torch devices indicating GPUs to use
output_device (int or torch device) – int or torch device indicating the output device
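Whether a model meets the single-input requirement can be checked before wrapping it. The snippet below is a hypothetical sketch based on Python's inspect module, not Eisen's own validation logic. The two toy classes stand in for torch.nn.Module subclasses.

```python
import inspect


def has_single_input(module) -> bool:
    """Return True if module.forward takes exactly one positional argument besides self."""
    # On a bound method, inspect.signature already excludes `self`
    params = inspect.signature(module.forward).parameters.values()
    positional = [
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    return len(positional) == 1


class SingleInput:
    def forward(self, x):
        return x


class TwoInputs:
    def forward(self, image, mask):
        return image * mask


print(has_single_input(SingleInput()), has_single_input(TwoInputs()))  # True False
```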