• I'd like to thank both Thomas Capelle (https://github.com/tcapelle) and Xander Dunn (https://github.com/xanderdunn) for their contributions to this code.
• This functionality lets you use multiple numpy arrays instead of a single one, which is useful whenever your data is split across many separate arrays. I've tested it with more than 10,000 datasets and it works well.

class TSMetaDataset

TSMetaDataset(dataset_list, **kwargs)

A dataset capable of indexing multiple datasets at the same time!
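To picture what indexing several datasets at the same time involves, here is a minimal NumPy sketch of a lookup table that maps each global index of the combined dataset to a (dataset index, local index) pair. It assumes the three dataset sizes that appear in the example below (51, 89 and 116); `build_mapping` is a hypothetical helper, not part of tsai's API.

```python
import numpy as np

def build_mapping(sizes):
    """Hypothetical helper: row g holds (dataset_idx, local_idx) for global index g."""
    # Dataset id repeated once per sample in that dataset
    ds_idx = np.concatenate([np.full(n, i, dtype=np.int32) for i, n in enumerate(sizes)])
    # Position of each sample inside its own dataset
    local_idx = np.concatenate([np.arange(n, dtype=np.int32) for n in sizes])
    return np.stack([ds_idx, local_idx], axis=1)

mapping = build_mapping([51, 89, 116])
print(mapping.shape)  # (256, 2)
print(mapping[50], mapping[51], mapping[255])
```

Global index 51 is the first sample of the second dataset, so its row is `[1, 0]`, and the last global index (255) is sample 115 of the third dataset.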

class TSMetaDatasets

TSMetaDatasets(metadataset, splits) :: FilteredBase

Base class for lists with subsets
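TSMetaDatasets takes the combined dataset plus `splits` and exposes `train` and `valid` subsets. The general subset-view idea can be illustrated in plain Python as below. `SplitView` and `make_subsets` are hypothetical names, and this sketch is not how fastai's `FilteredBase` is actually implemented; it only shows how a subset translates its own indices into indices of the full collection.

```python
class SplitView:
    """Hypothetical read-only view exposing only the indices listed in `split`."""

    def __init__(self, items, split):
        self.items = items
        self.split = list(split)

    def __len__(self):
        return len(self.split)

    def __getitem__(self, i):
        # Translate the subset-local index into an index of the full collection
        return self.items[self.split[i]]


def make_subsets(items, splits):
    """Return one SplitView per split, e.g. (train, valid)."""
    return tuple(SplitView(items, s) for s in splits)


items = list(range(256))  # stand-in for a combined dataset of 256 samples
train, valid = make_subsets(items, (range(0, 205), range(205, 256)))
print(len(train), len(valid))  # 205 51
print(valid[0])                # 205
```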

Let's create 3 datasets. In this case they will have different sizes.

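from tsai.all import *
import numpy as np
import torch

# Create 3 datasets with the same sample shape (5 variables x 50 time steps)
# but a random number of samples each
dsets = []
for i in range(3):
    size = np.random.randint(50, 150)
    X = torch.rand(size, 5, 50)
    y = torch.randint(0, 10, (size,))
    tfms = [None, TSClassification()]
    dset = TSDatasets(X, y, tfms=tfms)
    dsets.append(dset)
dsets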
[(#51) [(TSTensor(vars:5, len:50), TensorCategory(39)),(TSTensor(vars:5, len:50), TensorCategory(45)),(TSTensor(vars:5, len:50), TensorCategory(14)),(TSTensor(vars:5, len:50), TensorCategory(6)),(TSTensor(vars:5, len:50), TensorCategory(40)),(TSTensor(vars:5, len:50), TensorCategory(11)),(TSTensor(vars:5, len:50), TensorCategory(41)),(TSTensor(vars:5, len:50), TensorCategory(7)),(TSTensor(vars:5, len:50), TensorCategory(48)),(TSTensor(vars:5, len:50), TensorCategory(19))...],
 (#89) [(TSTensor(vars:5, len:50), TensorCategory(48)),(TSTensor(vars:5, len:50), TensorCategory(81)),(TSTensor(vars:5, len:50), TensorCategory(32)),(TSTensor(vars:5, len:50), TensorCategory(74)),(TSTensor(vars:5, len:50), TensorCategory(75)),(TSTensor(vars:5, len:50), TensorCategory(82)),(TSTensor(vars:5, len:50), TensorCategory(49)),(TSTensor(vars:5, len:50), TensorCategory(21)),(TSTensor(vars:5, len:50), TensorCategory(0)),(TSTensor(vars:5, len:50), TensorCategory(22))...],
 (#116) [(TSTensor(vars:5, len:50), TensorCategory(106)),(TSTensor(vars:5, len:50), TensorCategory(96)),(TSTensor(vars:5, len:50), TensorCategory(27)),(TSTensor(vars:5, len:50), TensorCategory(37)),(TSTensor(vars:5, len:50), TensorCategory(48)),(TSTensor(vars:5, len:50), TensorCategory(0)),(TSTensor(vars:5, len:50), TensorCategory(28)),(TSTensor(vars:5, len:50), TensorCategory(107)),(TSTensor(vars:5, len:50), TensorCategory(11)),(TSTensor(vars:5, len:50), TensorCategory(60))...]]
metadataset = TSMetaDataset(dsets)
metadataset, metadataset.vars, metadataset.len
(<__main__.TSMetaDataset at 0x7fdc46e1b290>, 5, 50)

Now we'll apply splits to create train and valid metadatasets:

splits = TimeSplitter()(metadataset)
splits
((#205) [0,1,2,3,4,5,6,7,8,9...],
 (#51) [205,206,207,208,209,210,211,212,213,214...])
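The splits are contiguous and unshuffled: the first 205 of the 256 combined samples (51 + 89 + 116) go to training and the last 51 to validation. A minimal sketch of such a time-ordered split follows, assuming a 20% validation fraction, which reproduces the sizes above. `time_split` is a hypothetical helper, not tsai's `TimeSplitter` implementation.

```python
def time_split(n, valid_size=0.2):
    """Hypothetical chronological split: earliest samples train, the last fraction validates."""
    n_valid = int(round(n * valid_size))
    n_train = n - n_valid
    return list(range(n_train)), list(range(n_train, n))

train_idx, valid_idx = time_split(51 + 89 + 116)
print(len(train_idx), len(valid_idx))  # 205 51
print(valid_idx[:3])                   # [205, 206, 207]
```

Because the split follows the global index order, every validation index in this example falls inside the third dataset, which occupies global indices 140 to 255.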
metadatasets = TSMetaDatasets(metadataset, splits=splits)
metadatasets.train, metadatasets.valid
(<__main__.TSMetaDataset at 0x7fdc46d14110>,
 <__main__.TSMetaDataset at 0x7fdc46d14c90>)
dls = TSDataLoaders.from_dsets(metadatasets.train, metadatasets.valid)
xb, yb = first(dls.train)
xb, yb
(TSTensor(samples:64, vars:5, len:50),
 TensorCategory([  9,  25,  45,  19,  40,  39,  23,  48,  44,   8,  47,  14,  41,  21,
           6,   3,  22,  54,  75,  21,  42,  64,   0,  44,  36,  33,  85,  84,
           7,   5,  59,  28,   6,  58,  76,  24,   8,  22,  20,  10,  53,  29,
          87,  32,  15,  69,  66,  64, 100,  62,   1,  31,  13,  37,  60,  78,
          52,  49,  79, 110,  81,  41, 108,  80]))
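A batch drawn from the metadataset can combine samples from several source datasets. One way to assemble such a batch is sketched below, with random NumPy arrays standing in for the datasets: look up each global index in a mapping table, group the rows by source dataset, and concatenate. The helpers `build_mapping` and `get_batch` are hypothetical and this is not tsai's `TSMetaDataset.__getitem__`, although it returns (dataset, index) pairs in the same spirit as the `mapping_idxs` attribute used in the next example.

```python
import numpy as np

def build_mapping(sizes):
    # Hypothetical helper: row g holds (dataset_idx, local_idx) for global index g
    ds_idx = np.repeat(np.arange(len(sizes)), sizes)
    local_idx = np.concatenate([np.arange(n) for n in sizes])
    return np.stack([ds_idx, local_idx], axis=1)

def get_batch(datasets, mapping, global_idxs):
    # Hypothetical helper: look up each requested global index in the mapping table
    rows = mapping[np.asarray(global_idxs)]
    # Stable sort by dataset index so each source array is read with a single fancy index
    rows = rows[np.argsort(rows[:, 0], kind="stable")]
    parts = [datasets[d][rows[rows[:, 0] == d, 1]] for d in np.unique(rows[:, 0])]
    # Concatenating in dataset order keeps batch rows aligned with `rows`
    return np.concatenate(parts), rows

rng = np.random.default_rng(0)
sizes = [51, 89, 116]
datasets = [rng.random((n, 5, 50)) for n in sizes]  # stand-ins for the three datasets
mapping = build_mapping(sizes)

xb, rows = get_batch(datasets, mapping, rng.permutation(sum(sizes))[:64])
print(xb.shape)  # (64, 5, 50)

# Trace every sample in the batch back to its source dataset and index
for x, (d, i) in zip(xb, rows):
    assert np.array_equal(x, datasets[d][i])
```

Sorting by dataset means each source array is touched once per batch, at the cost of the batch no longer being in the order the indices were requested; returning `rows` alongside the batch is what keeps each sample traceable.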

There's also an easy way to map any sample in a batch back to its original dataset and index:

dls = TSDataLoaders.from_dsets(metadatasets.train, metadatasets.valid)
xb, yb = first(dls.train)
mappings = dls.train.dataset.mapping_idxs
for i, (xbi, ybi) in enumerate(zip(xb, yb)):
    ds, idx = mappings[i]
    test_close(dsets[ds][idx][0].data, xbi)
    test_close(dsets[ds][idx][1].data, ybi)

For example, the 3rd sample in this batch maps to dataset 0, sample 26:

dls.train.dataset.mapping_idxs[2]
array([ 0, 26], dtype=int32)