afnio.utils.data

class afnio.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, drop_last=False, seed=None)

Bases: Generic[T_co]

Data loader combines a dataset and a sampler, and provides an iterable over the given dataset.

The DataLoader supports both map-style and iterable-style datasets with single-process loading, a customizable loading order, optional automatic batching (collation), and memory pinning.

See the afnio.utils.data documentation page for more details.

Parameters:
  • dataset (Dataset) – dataset from which to load the data.

  • batch_size (int, optional) – how many samples per batch to load (default: 1).

  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).

  • sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.

  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)

  • seed (int, optional) – If not None, this seed will be used by RandomSampler to generate random indexes. (default: None)
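The interaction between batch_size and drop_last comes down to simple arithmetic. The following is a minimal, standalone sketch in plain Python of how a sequence of indices splits into batches under each setting. It does not call afnio, and the helper name batch_indices is hypothetical, introduced only for illustration.

```python
from typing import Iterable, Iterator, List


def batch_indices(
    indices: Iterable[int], batch_size: int, drop_last: bool
) -> Iterator[List[int]]:
    """Group indices into lists of batch_size (hypothetical helper, not afnio's code)."""
    batch: List[int] = []
    for idx in indices:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing partial batch is kept only when drop_last is False.
    if batch and not drop_last:
        yield batch


# 10 samples with batch_size=4: the last batch holds only 2 samples.
kept = list(batch_indices(range(10), batch_size=4, drop_last=False))
dropped = list(batch_indices(range(10), batch_size=4, drop_last=True))
print(kept)     # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(dropped)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

When the dataset size is divisible by the batch size, both settings yield the same batches.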

batch_size: Optional[int]
dataset: Dataset[TypeVar(T_co, covariant=True)]
drop_last: bool
sampler: Union[Sampler, Iterable]

class afnio.utils.data.Dataset

Bases: Generic[T_co]

An abstract class representing a Dataset.

All datasets that represent a map from keys to data samples should subclass it. All subclasses should override __getitem__(), which fetches the data sample for a given key, and __len__(), which the default options of DataLoader expect to return the size of the dataset. Subclasses may also implement __getitems__() to speed up batched sample loading. This method accepts a list of the sample indices in a batch and returns a list of samples.
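As an illustration of the three methods described above, here is a minimal sketch of a map-style dataset. The class name PairDataset and its fields are hypothetical. In real use the class would subclass afnio.utils.data.Dataset; a plain class is used here so the sketch runs without afnio installed.

```python
from typing import List, Sequence, Tuple


class PairDataset:
    """Hypothetical map-style dataset of (question, answer) pairs.

    In real use this would subclass afnio.utils.data.Dataset.
    """

    def __init__(self, questions: Sequence[str], answers: Sequence[str]) -> None:
        if len(questions) != len(answers):
            raise ValueError("questions and answers must have the same length")
        self.questions = list(questions)
        self.answers = list(answers)

    def __getitem__(self, index: int) -> Tuple[str, str]:
        # Fetch a single sample for the given key.
        return self.questions[index], self.answers[index]

    def __len__(self) -> int:
        # Size of the dataset.
        return len(self.questions)

    def __getitems__(self, indices: List[int]) -> List[Tuple[str, str]]:
        # Optional batched fetch: one call returns every sample of a batch.
        return [self[i] for i in indices]


dataset = PairDataset(["2+2?", "3*3?", "10-7?"], ["4", "9", "3"])
print(len(dataset))                  # 3
print(dataset[1])                    # ('3*3?', '9')
print(dataset.__getitems__([0, 2]))  # [('2+2?', '4'), ('10-7?', '3')]
```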

class afnio.utils.data.RandomSampler(data_source, replacement=False, num_samples=None, seed=None)

Bases: Sampler[int]

Samples elements randomly. Without replacement, samples are drawn from a shuffled dataset.

With replacement, the user can specify num_samples to draw.

Parameters:
  • data_source (Dataset) – dataset to sample from

  • replacement (bool) – samples are drawn on-demand with replacement if True (default: False)

  • num_samples (int) – number of samples to draw (default: len(dataset)).

  • seed (int) – A number to set the seed for the random draws.
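The difference between the two modes can be sketched in plain Python. The helper below is hypothetical and only mirrors the parameters listed above; it is not afnio's implementation. As an assumption of this sketch, num_samples is honoured only when replacement is True.

```python
import random
from typing import List, Optional


def random_indices(
    n: int,
    replacement: bool = False,
    num_samples: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[int]:
    """Hypothetical helper mirroring the parameters above; not afnio's code."""
    rng = random.Random(seed)  # a local generator keeps the draws reproducible
    if replacement:
        count = n if num_samples is None else num_samples
        # Each draw is independent, so an index may repeat.
        return [rng.randrange(n) for _ in range(count)]
    # Without replacement: a shuffled permutation of all indices.
    permutation = list(range(n))
    rng.shuffle(permutation)
    return permutation


without = random_indices(5, seed=0)
with_repl = random_indices(5, replacement=True, num_samples=8, seed=0)
print(sorted(without))  # [0, 1, 2, 3, 4]
print(len(with_repl))   # 8
```

Passing the same seed twice reproduces the same order, which is the behaviour the seed parameter is meant to provide.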

data_source: Sized
property num_samples: int
replacement: bool

class afnio.utils.data.Sampler

Bases: Generic[T_co]

Base class for all Samplers.

Every Sampler subclass has to provide an __iter__() method, providing a way to iterate over indices or lists of indices (batches) of dataset elements, and may provide a __len__() method that returns the length of the returned iterators.
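A custom sampler following this contract might look like the sketch below. ReverseSampler is a hypothetical name. In real use it would subclass afnio.utils.data.Sampler; a plain class keeps the sketch runnable without afnio installed.

```python
from typing import Iterator, Sized


class ReverseSampler:
    """Hypothetical sampler yielding indices from last to first.

    In real use this would subclass afnio.utils.data.Sampler.
    """

    def __init__(self, data_source: Sized) -> None:
        self.data_source = data_source

    def __iter__(self) -> Iterator[int]:
        # Required: iterate over dataset indices.
        return iter(range(len(self.data_source) - 1, -1, -1))

    def __len__(self) -> int:
        # Optional: length of the iterator returned by __iter__.
        return len(self.data_source)


sampler = ReverseSampler(["a", "b", "c"])
print(list(sampler))  # [2, 1, 0]
print(len(sampler))   # 3
```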

class afnio.utils.data.SequentialSampler(data_source)

Bases: Sampler[int]

Samples elements sequentially, always in the same order.

Parameters:
  • data_source (Dataset) – dataset to sample from

data_source: Sized

class afnio.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True, seed=None)

Bases: Sampler[int]

Samples elements from [0,..,len(weights)-1] with given probabilities (weights).

Parameters:
  • weights (sequence) – a sequence of weights, not necessarily summing to one

  • num_samples (int) – number of samples to draw

  • replacement (bool) – if True, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.

  • seed (int) – A number to set the seed for the random draws.

Example

>>> from afnio.utils.data import WeightedRandomSampler
>>> list(WeightedRandomSampler([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 5, replacement=True))
[4, 4, 1, 4, 5]
>>> list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False))
[0, 1, 4, 3, 2]

num_samples: int
replacement: bool
weights: Sequence[float]
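Because the weights need not sum to one, each index is effectively drawn with probability proportional to its weight. The sketch below illustrates these semantics with the standard library only. The helper weighted_indices is hypothetical and is not afnio's implementation.

```python
import random
from typing import List, Optional, Sequence


def weighted_indices(
    weights: Sequence[float],
    num_samples: int,
    replacement: bool = True,
    seed: Optional[int] = None,
) -> List[int]:
    """Hypothetical helper; illustrates the semantics, not afnio's code."""
    rng = random.Random(seed)
    if replacement:
        # random.choices accepts unnormalized relative weights.
        return rng.choices(range(len(weights)), weights=weights, k=num_samples)
    if num_samples > len(weights):
        raise ValueError("cannot draw more unique indices than there are weights")
    remaining = list(range(len(weights)))
    remaining_weights = list(weights)
    drawn: List[int] = []
    for _ in range(num_samples):
        # Draw one index, then remove it so it cannot be drawn again.
        pos = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        drawn.append(remaining.pop(pos))
        remaining_weights.pop(pos)
    return drawn


weights = [0.1, 0.9, 0.4, 0.7, 3.0, 0.6]
# Normalizing shows the per-draw probability of each index.
probabilities = [w / sum(weights) for w in weights]
print(round(probabilities[4], 3))  # 0.526

with_repl = weighted_indices(weights, 5, seed=0)
without = weighted_indices(weights, 5, replacement=False, seed=0)
print(len(with_repl))     # 5
print(len(set(without)))  # 5
```

Index 4 carries a weight of 3.0 out of a total of 5.7, so it is drawn a little over half the time when sampling with replacement.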
