DataHandler

datahandler.data_feature_distribution.gaussian

GaussianNoiseTransform Objects

class GaussianNoiseTransform(object)

Add Gaussian noise to a tensor
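As a rough illustration of what such a transform might look like, the sketch below adds Gaussian noise inside `__call__`. The class name `GaussianNoiseSketch`, the `mean`, `std` and `seed` parameters, and the use of a NumPy array in place of a tensor are all assumptions made for this example; the constructor of `GaussianNoiseTransform` is not documented here.

```python
import numpy as np


class GaussianNoiseSketch:
    """Hypothetical stand-in for a Gaussian noise transform."""

    def __init__(self, mean=0.0, std=1.0, seed=None):
        self.mean = mean
        self.std = std
        self.rng = np.random.default_rng(seed)

    def __call__(self, array):
        # Draw one noise value per element and add it to the input.
        noise = self.rng.normal(self.mean, self.std, size=array.shape)
        return array + noise


image = np.zeros((1, 28, 28))  # placeholder for an image tensor
noisy = GaussianNoiseSketch(std=0.1, seed=0)(image)
```

Writing the transform as a callable object lets it sit in a transform pipeline next to other per-sample transforms.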

datahandler.data_feature_distribution

This module contains methods of skewing data features

datahandler.data_feature_distribution.data_feature_distribution

DataFeatureDistribution is an abstract class that defines the interface for any implemented data feature distributions

DataFeatureDistribution Objects

class DataFeatureDistribution(ABC)

DataFeatureDistribution is an abstract class that defines the interface for any implemented data feature distributions

apply_feature_skew

def apply_feature_skew(datahandler)

Applies the feature skew to the data

datahandler.datahandler

This module contains the abstract data handler, which defines the interface for all implemented data handlers and provides some universal methods

DataHandler Objects

class DataHandler(ABC)

DataHandler is an abstract class that defines the interface for any implemented data handlers

load_distributed_datasets

@abstractmethod
def load_distributed_datasets()

Called to load the dataset

get_classes

@abstractmethod
def get_classes()

Returns the classes of the dataset
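The two abstract methods above form the contract that every concrete handler has to fulfil. The following sketch shows that pattern with a reduced, hypothetical base class (`DataHandlerSketch`) and a toy subclass (`ToyDataHandler`); neither is part of the package, and the returned data is a placeholder.

```python
from abc import ABC, abstractmethod


class DataHandlerSketch(ABC):
    """Hypothetical, reduced version of the abstract interface."""

    @abstractmethod
    def load_distributed_datasets(self):
        """Load the dataset."""

    @abstractmethod
    def get_classes(self):
        """Return the classes of the dataset."""


class ToyDataHandler(DataHandlerSketch):
    def load_distributed_datasets(self):
        # Placeholder data: (sample, label) pairs.
        return [([0.0], "cat"), ([1.0], "dog")]

    def get_classes(self):
        return ["cat", "dog"]


handler = ToyDataHandler()
classes = handler.get_classes()

# The abstract base itself cannot be instantiated.
try:
    DataHandlerSketch()
    instantiated = True
except TypeError:
    instantiated = False
```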

split_and_transform_data

def split_and_transform_data(testset, trainset)

Split the data into partitions and create DataLoaders

Arguments:

  • testset: test dataset
  • trainset: training dataset

Returns:

testloader, trainloaders, valloaders

distribute_data

def distribute_data(label_distribution, partition_sizes, trainset)

Distribute the data according to the label distribution and partition sizes

Arguments:

  • label_distribution: np.array of shape (NUM_CLIENTS, NUM_CLASSES)
  • partition_sizes: np.array of shape (NUM_CLIENTS)
  • trainset: torch.utils.data.Dataset

Returns:

list of torch.utils.data.Subset
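One way to read the two array arguments: row `i` of `label_distribution` gives the class proportions for client `i`, and `partition_sizes[i]` gives how many samples that client receives. The sketch below works on a plain label array and returns index arrays rather than `Subset` objects. The function name `distribute_indices`, the rounding rule, and the sampling without replacement are assumptions, not the package's actual implementation.

```python
import numpy as np


def distribute_indices(labels, label_distribution, partition_sizes, seed=0):
    """Return one array of sample indices per client (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    num_clients, num_classes = label_distribution.shape
    # One shuffled pool of sample indices per class.
    pools = [rng.permutation(np.flatnonzero(labels == c)) for c in range(num_classes)]
    partitions = []
    for client in range(num_clients):
        # Target number of samples per class for this client.
        counts = np.rint(label_distribution[client] * partition_sizes[client]).astype(int)
        chosen = []
        for c, count in enumerate(counts):
            take = min(count, len(pools[c]))  # never draw more than is left
            chosen.append(pools[c][:take])
            pools[c] = pools[c][take:]        # drawn samples are not reused
        partitions.append(np.concatenate(chosen))
    return partitions


labels = np.repeat(np.arange(2), 50)  # 50 samples of class 0, 50 of class 1
label_distribution = np.array([[0.8, 0.2], [0.2, 0.8]])
partition_sizes = np.array([40, 40])
parts = distribute_indices(labels, label_distribution, partition_sizes)
```

Each index array could then be wrapped as `torch.utils.data.Subset(trainset, indices)` to obtain the documented return type.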

load_existing_distribution

def load_existing_distribution(trainset)

Load an existing data distribution from a file

Arguments:

  • trainset: torch.utils.data.Dataset

Returns:

List of torch.utils.data.Subset

generate_transforms

def generate_transforms(custom_transforms=None)

Generate the transforms for the dataset

Custom transforms are applied after the data has been converted to a tensor and before normalization and feature skewing

Arguments:

  • custom_transforms: List of custom transforms

Returns:

Composed transforms
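The ordering described above can be sketched with plain callables. Everything here is a placeholder: `compose`, `generate_transforms_sketch` and the individual steps are hypothetical, lists stand in for tensors, and placing normalization before feature skewing is an assumption, since the text only states that custom transforms run before both.

```python
def compose(steps):
    """Chain callables left to right (a tiny stand-in for a Compose utility)."""
    def apply(x):
        for step in steps:
            x = step(x)
        return x
    return apply


def to_tensor(x):
    return [float(v) for v in x]          # placeholder for tensor conversion


def normalize(x):
    return [(v - 0.5) / 0.5 for v in x]   # placeholder normalization


def feature_skew(x):
    return x                              # placeholder: no skew applied


def generate_transforms_sketch(custom_transforms=None):
    # Custom transforms sit between tensor conversion and normalization.
    steps = [to_tensor, *(custom_transforms or []), normalize, feature_skew]
    return compose(steps)


def double(x):
    return [2 * v for v in x]             # hypothetical custom transform


transform = generate_transforms_sketch([double])
result = transform([0, 1])
```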

datahandler.data_label_distribution.uniform

Uniform distribution of labels

Uniform Objects

class Uniform(DataLabelDistribution)

Uniform distribution of labels

get_label_distribution

def get_label_distribution()

Returns the label distribution as an array of dimension (no_clients, no_classes)

Uses a uniform distribution, so the data label distribution is not skewed

Returns:

label_distribution
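For orientation, a uniform label distribution of the documented shape could be built as below. The helper name `uniform_label_distribution` and the convention that each row sums to 1 are assumptions for this sketch.

```python
import numpy as np


def uniform_label_distribution(no_clients, no_classes):
    # Every client gets the same share of every class.
    return np.full((no_clients, no_classes), 1.0 / no_classes)


label_distribution = uniform_label_distribution(no_clients=3, no_classes=10)
```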

datahandler.data_label_distribution.data_label_distribution

DataLabelDistribution is an abstract class that defines the interface for any implemented data label distributions

DataLabelDistribution Objects

class DataLabelDistribution(ABC)

DataLabelDistribution is an abstract class that defines the interface for any implemented data label distributions

get_label_distribution

def get_label_distribution()

Returns the label distribution as an array of dimension (no_clients, no_classes)

datahandler.data_label_distribution

This module contains methods of skewing data labels

datahandler.data_label_distribution.discrete

Discrete data label distribution

Discrete Objects

class Discrete(DataLabelDistribution)

Discrete data label distribution

get_label_distribution

def get_label_distribution()

Returns the label distribution as an array of dimension (no_clients, no_classes)

Allows each client to have only a subset of the classes

Returns:

label_distribution
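A distribution in which each client sees only a subset of the classes might look like the following sketch. The helper name, the `classes_per_client` parameter, the random choice of classes, and the equal weighting of the chosen classes are assumptions; how `Discrete` actually selects classes is not described here.

```python
import numpy as np


def discrete_label_distribution(no_clients, no_classes, classes_per_client, seed=0):
    rng = np.random.default_rng(seed)
    distribution = np.zeros((no_clients, no_classes))
    for client in range(no_clients):
        # Pick a subset of classes for this client, without repeats.
        chosen = rng.choice(no_classes, size=classes_per_client, replace=False)
        distribution[client, chosen] = 1.0 / classes_per_client
    return distribution


label_distribution = discrete_label_distribution(
    no_clients=4, no_classes=10, classes_per_client=2
)
```

All other entries stay at zero, so a client never receives samples of a class outside its subset.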

datahandler.data_label_distribution.dirichlet

Dirichlet distribution for data label distribution

Dirichlet Objects

class Dirichlet(DataLabelDistribution)

Dirichlet distribution for data label distribution

get_label_distribution

def get_label_distribution()

Returns the label distribution as an array of dimension (no_clients, no_classes)

Uses a Dirichlet distribution to skew the data label distribution

Returns:

label_distribution
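A Dirichlet-skewed label distribution can be sketched by drawing one sample per client. The helper name and the `alpha` concentration parameter (with a symmetric prior across classes) are assumptions for illustration; smaller `alpha` values give more strongly skewed rows.

```python
import numpy as np


def dirichlet_label_distribution(no_clients, no_classes, alpha=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # One draw per client; each row is a probability vector over the classes.
    return rng.dirichlet(np.full(no_classes, alpha), size=no_clients)


label_distribution = dirichlet_label_distribution(no_clients=5, no_classes=10)
```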

datahandler.mnist

MNIST data handler. LeCun, Yann, Corinna Cortes, and C. J. Burges. n.d. “MNIST Handwritten Digit Database.” ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist.

MNISTDataHandler Objects

class MNISTDataHandler(DataHandler)

load_distributed_datasets

def load_distributed_datasets()

Load the MNIST dataset and divide it into partitions

get_classes

def get_classes()

Returns the classes of the dataset

Returns:

List of classes

datahandler.cifar10

CIFAR-10 data handler. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Deep Residual Learning for Image Recognition.” arXiv [cs.CV]. arXiv. http://arxiv.org/abs/1512.03385.

Cifar10DataHandler Objects

class Cifar10DataHandler(DataHandler)

Data handler for CIFAR-10

load_distributed_datasets

def load_distributed_datasets()

Load the CIFAR-10 dataset and divide it into partitions

Returns:

Train, validation and test data loaders

get_classes

def get_classes()

Get the classes of the CIFAR-10 dataset

Returns:

List of classes

datahandler.data_quantity_distribution.uniform

Uniform data quantity distribution

Uniform Objects

class Uniform(DataQuantityDistribution)

Uniform data quantity distribution

get_partition_sizes

def get_partition_sizes(testset, trainset)

Returns the partition sizes as an array of dimension (no_clients)

Uses a uniform distribution, so the data quantities are not skewed

Arguments:

  • testset: test dataset
  • trainset: train dataset

datahandler.data_quantity_distribution.data_quantity_distribution

This module contains the abstract class DataQuantityDistribution, which is the base for all implemented data quantity distributions

DataQuantityDistribution Objects

class DataQuantityDistribution(ABC)

DataQuantityDistribution is an abstract class that defines the interface for any implemented data quantity distributions

get_partition_sizes

def get_partition_sizes(testset, trainset)

Returns the number of samples to be allocated to every client

Arguments:

  • testset: test dataset
  • trainset: training dataset

datahandler.data_quantity_distribution

This module contains the classes for skewing data quantity distributions

datahandler.data_quantity_distribution.dirichlet

Dirichlet distribution for data quantity distribution

Dirichlet Objects

class Dirichlet(DataQuantityDistribution)

get_partition_sizes

def get_partition_sizes(testset, trainset)

Returns the number of samples to be allocated to every client

Arguments:

  • testset: test dataset
  • trainset: training dataset

Returns:

Array of size (no_clients) containing the number of samples for every client
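As an illustration, Dirichlet-skewed partition sizes could be derived from a single draw over the clients. The helper name, the `alpha` parameter, and taking the sample count directly instead of the `testset` and `trainset` arguments are assumptions made for this sketch.

```python
import numpy as np


def dirichlet_partition_sizes(num_samples, no_clients, alpha=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Share of the training data that each client receives.
    proportions = rng.dirichlet(np.full(no_clients, alpha))
    # Round down so the partitions never exceed the available samples.
    return np.floor(proportions * num_samples).astype(int)


sizes = dirichlet_partition_sizes(num_samples=50000, no_clients=5)
```

Rounding down can leave a few samples unassigned; a real implementation may choose to hand the remainder to one of the clients.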