DataHandler
Table of Contents
- datahandler.data_feature_distribution.gaussian
- GaussianNoiseTransform
- datahandler.data_feature_distribution
- datahandler.data_feature_distribution.data_feature_distribution
- DataFeatureDistribution
- datahandler.datahandler
- DataHandler
- datahandler.data_label_distribution.uniform
- Uniform
- datahandler.data_label_distribution.data_label_distribution
- DataLabelDistribution
- datahandler.data_label_distribution
- datahandler.data_label_distribution.discrete
- Discrete
- datahandler.data_label_distribution.dirichlet
- Dirichlet
- datahandler.mnist
- MNISTDataHandler
- datahandler.cifar10
- Cifar10DataHandler
- datahandler.data_quantity_distribution.uniform
- Uniform
- datahandler.data_quantity_distribution.data_quantity_distribution
- DataQuantityDistribution
- datahandler.data_quantity_distribution
- datahandler.data_quantity_distribution.dirichlet
- Dirichlet
datahandler.data_feature_distribution.gaussian
GaussianNoiseTransform Objects
Add Gaussian noise to a tensor
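As a rough illustration of what such a transform does, the sketch below adds zero-mean Gaussian noise to an array. It is not the library's implementation: the class name `GaussianNoise`, the `mean`, `std` and `seed` parameters, and the use of NumPy arrays in place of tensors are all assumptions made for the example.

```python
import numpy as np


class GaussianNoise:
    """Hypothetical stand-in for a Gaussian noise transform (NumPy instead of tensors)."""

    def __init__(self, mean=0.0, std=0.1, seed=None):
        self.mean = mean
        self.std = std
        self.rng = np.random.default_rng(seed)

    def __call__(self, x):
        # Draw noise with the same shape as the input and add it element-wise.
        noise = self.rng.normal(self.mean, self.std, size=x.shape)
        return x + noise


image = np.zeros((1, 28, 28))                      # toy single-channel "image"
noisy = GaussianNoise(mean=0.0, std=0.1, seed=0)(image)
```

Making the transform a callable object lets it be placed in a transform pipeline like any other per-sample transform.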
datahandler.data_feature_distribution
This module contains methods of skewing data features
datahandler.data_feature_distribution.data_feature_distribution
DataFeatureDistribution is an abstract class that defines the interface for any implemented data feature distributions
DataFeatureDistribution Objects
DataFeatureDistribution is an abstract class that defines the interface for any implemented data feature distributions
apply_feature_skew
Applies the feature skew to the data
datahandler.datahandler
This module contains the abstract data handler, which defines the interface for any implemented data handlers and provides some universal methods
DataHandler Objects
DataHandler is an abstract class that defines the interface for any implemented data handlers
load_distributed_datasets
Called to load the dataset
get_classes
Returns the classes of the dataset
split_and_transform_data
Split the data into partitions and create DataLoaders
Arguments:
testset: test dataset
trainset: training dataset
Returns:
testloader, trainloaders, valloaders
distribute_data
Distribute the data according to the label distribution and partition sizes
Arguments:
label_distribution: np.array of shape (NUM_CLIENTS, NUM_CLASSES)
partition_sizes: np.array of shape (NUM_CLIENTS)
trainset: torch.utils.data.Dataset
Returns:
list of torch.utils.data.Subset
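One way to picture how a label distribution and partition sizes combine is the sketch below. It is an assumption-laden illustration, not the method's actual code: the helper `distribute_indices`, its `labels` and `seed` parameters, and the rounding rule are hypothetical, and it returns index arrays rather than `Subset` objects.

```python
import numpy as np


def distribute_indices(labels, label_distribution, partition_sizes, seed=0):
    """Return one array of sample indices per client (hypothetical helper).

    labels: array of shape (NUM_SAMPLES,) with integer class ids
    label_distribution: array of shape (NUM_CLIENTS, NUM_CLASSES), rows sum to 1
    partition_sizes: array of shape (NUM_CLIENTS,)
    """
    rng = np.random.default_rng(seed)
    num_clients, num_classes = label_distribution.shape
    # Shuffled pool of indices for every class; samples are handed out without reuse.
    pools = [rng.permutation(np.flatnonzero(labels == k)) for k in range(num_classes)]
    offsets = [0] * num_classes
    partitions = []
    for client in range(num_clients):
        chosen = []
        for k in range(num_classes):
            # Number of class-k samples this client should receive (rounded).
            want = int(round(label_distribution[client, k] * partition_sizes[client]))
            take = pools[k][offsets[k]:offsets[k] + want]
            offsets[k] += len(take)
            chosen.append(take)
        partitions.append(np.concatenate(chosen))
    return partitions


labels = np.repeat(np.arange(2), 50)               # 50 samples each of class 0 and 1
label_distribution = np.array([[0.8, 0.2], [0.2, 0.8]])
partition_sizes = np.array([40, 40])
parts = distribute_indices(labels, label_distribution, partition_sizes)
```

Each index array could then be wrapped as `torch.utils.data.Subset(trainset, indices)` to obtain the list of subsets described above.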
load_existing_distribution
Load an existing data distribution from a file
Arguments:
trainset: torch.utils.data.Dataset
Returns:
List of torch.utils.data.Subset
generate_transforms
Generate the transforms for the dataset
Custom transforms are applied after the tensor has been created and before normalization and feature skewing
Arguments:
custom_transforms: List of custom transforms
Returns:
Composed transforms
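The ordering described above can be sketched with plain callables. Everything here is illustrative: `compose`, `generate_transforms_sketch` and the `step` recorder are hypothetical names, and placing normalization before feature skewing is an assumption, since the description only states that both come after the custom transforms.

```python
def compose(transforms):
    """Chain callables left to right, similar in spirit to a Compose transform."""
    def apply(x):
        for t in transforms:
            x = t(x)
        return x
    return apply


def generate_transforms_sketch(to_tensor, custom_transforms, normalize, feature_skew):
    # Assumed order: tensor creation, custom transforms, normalization, feature skew.
    return compose([to_tensor, *custom_transforms, normalize, feature_skew])


# Toy stand-ins (hypothetical) that record the order in which they run.
calls = []


def step(name):
    def run(x):
        calls.append(name)
        return x
    return run


pipeline = generate_transforms_sketch(
    step("to_tensor"),
    [step("custom_a"), step("custom_b")],
    step("normalize"),
    step("feature_skew"),
)
pipeline([0.0, 1.0])
```

After the call, `calls` holds the step names in execution order, which makes the position of the custom transforms easy to check.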
datahandler.data_label_distribution.uniform
Uniform distribution of labels
Uniform Objects
Uniform distribution of labels
get_label_distribution
Returns the label distribution as an array of dimension (no_clients, no_classes)
Uses uniform distribution to (not-)skew the data label distribution
Returns:
label_distribution
datahandler.data_label_distribution.data_label_distribution
DataLabelDistribution is an abstract class that defines the interface for any implemented data label distributions
DataLabelDistribution Objects
DataLabelDistribution is an abstract class that defines the interface for any implemented data label distributions
get_label_distribution
Returns the label distribution as an array of dimension (no_clients, no_classes)
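A minimal sketch of such an interface, together with a uniform implementation, might look as follows. The class names `LabelDistributionSketch` and `UniformSketch` and the constructor arguments are assumptions for illustration; only the method name `get_label_distribution` and the `(no_clients, no_classes)` shape come from the description above.

```python
from abc import ABC, abstractmethod

import numpy as np


class LabelDistributionSketch(ABC):
    """Hypothetical abstract base: subclasses decide how labels are spread over clients."""

    def __init__(self, no_clients, no_classes):
        self.no_clients = no_clients
        self.no_classes = no_classes

    @abstractmethod
    def get_label_distribution(self):
        """Return an array of shape (no_clients, no_classes)."""


class UniformSketch(LabelDistributionSketch):
    def get_label_distribution(self):
        # Every client sees every class with equal probability (no label skew).
        return np.full((self.no_clients, self.no_classes), 1.0 / self.no_classes)


dist = UniformSketch(no_clients=3, no_classes=10).get_label_distribution()
```

Because the base class declares an abstract method, it cannot be instantiated directly; only concrete distributions can.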
datahandler.data_label_distribution
This module contains methods of skewing data labels
datahandler.data_label_distribution.discrete
Discrete data label distribution
Discrete Objects
Discrete data label distribution
get_label_distribution
Returns the label distribution as an array of dimension (no_clients, no_classes)
Allows each client to have only a subset of the classes
Returns:
label_distribution
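To illustrate a distribution in which each client holds only a subset of the classes, the sketch below assigns a fixed number of randomly chosen classes to every client. The function name, the `classes_per_client` parameter, and the equal weighting of the chosen classes are assumptions, not the documented behaviour.

```python
import numpy as np


def discrete_label_distribution(no_clients, no_classes, classes_per_client, seed=0):
    """Hypothetical helper: each client gets probability mass on a few classes only."""
    rng = np.random.default_rng(seed)
    dist = np.zeros((no_clients, no_classes))
    for client in range(no_clients):
        # Pick which classes this client is allowed to see.
        chosen = rng.choice(no_classes, size=classes_per_client, replace=False)
        dist[client, chosen] = 1.0 / classes_per_client
    return dist


dist = discrete_label_distribution(no_clients=4, no_classes=10, classes_per_client=2)
```

Every row still sums to one, but all entries outside the chosen classes are exactly zero.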
datahandler.data_label_distribution.dirichlet
Dirichlet distribution for data label distribution
Dirichlet Objects
Dirichlet distribution for data label distribution
get_label_distribution
Returns the label distribution as an array of dimension (no_clients, no_classes)
Uses a dirichlet distribution to skew the data label distribution
Returns:
label_distribution
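A Dirichlet-based label skew can be sketched with NumPy's `Generator.dirichlet`. The function name and the `alpha` and `seed` parameters are assumptions for this example; the class may be configured differently.

```python
import numpy as np


def dirichlet_label_distribution(no_clients, no_classes, alpha, seed=0):
    """Hypothetical helper: one Dirichlet draw over the classes per client."""
    rng = np.random.default_rng(seed)
    # Smaller alpha concentrates each client's mass on fewer classes.
    return rng.dirichlet(np.full(no_classes, alpha), size=no_clients)


skewed = dirichlet_label_distribution(5, 10, alpha=0.1)
balanced = dirichlet_label_distribution(5, 10, alpha=100.0)
```

With a small concentration parameter, most clients are dominated by one or two classes; with a large one, the rows approach the uniform distribution.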
datahandler.mnist
MNIST data handler
LeCun, Yann, Corinna Cortes, and C. J. Burges. n.d. “MNIST Handwritten Digit Database.” ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist.
MNISTDataHandler Objects
load_distributed_datasets
Load the MNIST dataset and divide it into partitions
get_classes
Returns the classes of the dataset
Returns:
List of classes
datahandler.cifar10
CIFAR-10 data handler
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning for Image Recognition.” arXiv [cs.CV]. arXiv. http://arxiv.org/abs/1512.03385.
Cifar10DataHandler Objects
Data handler for CIFAR-10
load_distributed_datasets
Load the CIFAR-10 dataset and divide it into partitions
Returns:
Train, validation and test data loaders
get_classes
Get the classes of the CIFAR-10 dataset
Returns:
List of classes
datahandler.data_quantity_distribution.uniform
Uniform data quantity distribution
Uniform Objects
Uniform data quantity distribution
get_partition_sizes
Returns the partition sizes as an array of dimension (no_clients)
Uses a uniform distribution to (not-)skew the data quantities
Arguments:
testset: test dataset
trainset: train dataset
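A uniform quantity split can be sketched as integer division of the training set size by the number of clients. The function name and its parameters are hypothetical, and leaving the remainder unassigned is an assumption about how leftovers are handled.

```python
import numpy as np


def uniform_partition_sizes(num_samples, no_clients):
    """Hypothetical helper: every client receives the same number of samples."""
    # Any remainder from the division is left unassigned in this sketch.
    return np.full(no_clients, num_samples // no_clients)


sizes = uniform_partition_sizes(num_samples=60000, no_clients=7)
```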
datahandler.data_quantity_distribution.data_quantity_distribution
This module contains the abstract class DataQuantityDistribution, which is the base class for all implemented data quantity distributions
DataQuantityDistribution Objects
DataQuantityDistribution is an abstract class that defines the interface for any implemented data quantity distributions
get_partition_sizes
Returns the number of samples to be allocated to every client
Arguments:
testset: test dataset
trainset: training dataset
datahandler.data_quantity_distribution
This module contains the classes for skewing data quantity distributions
datahandler.data_quantity_distribution.dirichlet
Dirichlet distribution for data quantity distribution
Dirichlet Objects
get_partition_sizes
Returns the number of samples to be allocated to every client
Arguments:
testset: test dataset
trainset: training dataset
Returns:
Array of size (no_clients) containing the number of samples for every client
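As a sketch of a Dirichlet quantity skew, the example below draws one share per client and converts the shares into sample counts. The function name, the `alpha` and `seed` parameters, and the choice to round down are assumptions, not the documented implementation.

```python
import numpy as np


def dirichlet_partition_sizes(num_samples, no_clients, alpha, seed=0):
    """Hypothetical helper: client sizes proportional to a single Dirichlet draw."""
    rng = np.random.default_rng(seed)
    shares = rng.dirichlet(np.full(no_clients, alpha))
    # Round down so the total never exceeds the number of available samples.
    return np.floor(shares * num_samples).astype(int)


sizes = dirichlet_partition_sizes(num_samples=50000, no_clients=10, alpha=1.0)
```

Rounding down can leave up to `no_clients` samples unassigned, a trade-off accepted here to keep the total within the dataset size.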