- I'd like to thank both Thomas Capelle (https://github.com/tcapelle) and Xander Dunn (https://github.com/xanderdunn) for their contributions, which made this code possible.
- This functionality allows you to use multiple numpy arrays instead of a single one, which can be very useful in many practical settings. I've tested it with 10k+ datasets and it works well.
Let's create 3 datasets. In this case they will have different sizes.
import numpy as np
import torch
from tsai.all import *

dsets = []
for i in range(3):
    # Each dataset gets a random number of samples, 5 variables and 50 time steps
    size = np.random.randint(50, 150)
    X = torch.rand(size, 5, 50)
    y = torch.randint(0, 10, (size,))
    tfms = [None, TSClassification()]
    dset = TSDatasets(X, y, tfms=tfms)
    dsets.append(dset)
dsets
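You can quickly confirm that each dataset ended up with a different number of samples (a minimal check, assuming `len` on a `TSDatasets` returns the sample count, as with fastai's `Datasets`):

# Print the number of samples in each of the 3 datasets;
# the sizes were drawn randomly between 50 and 150.
for i, dset in enumerate(dsets):
    print(f'dset {i}: {len(dset)} samples')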
metadataset = TSMetaDataset(dsets)
metadataset, metadataset.vars, metadataset.len
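Conceptually, a metadataset behaves like the concatenation of its child datasets: a single global index is mapped to a (dataset, local index) pair. Here's a minimal, illustrative sketch of that idea (this is not tsai's actual implementation; `SimpleMetaDataset` is a hypothetical name):

# Illustrative sketch only - not tsai's actual code.
class SimpleMetaDataset:
    def __init__(self, datasets):
        self.datasets = datasets
        # Precompute a (dataset idx, local idx) pair for every global idx
        self.mapping_idxs = [(d, i) for d, ds in enumerate(datasets)
                             for i in range(len(ds))]
    def __len__(self):
        return len(self.mapping_idxs)
    def __getitem__(self, idx):
        d, i = self.mapping_idxs[idx]
        return self.datasets[d][i]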
We'll now apply splits to create the train and valid metadatasets:
splits = TimeSplitter()(metadataset)
splits
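TimeSplitter keeps the samples in order and reserves the last portion for validation, which is usually what you want with time series. If you need a different validation fraction, you can pass it explicitly (a sketch, assuming tsai's `TimeSplitter` exposes a `valid_size` argument):

# Reserve the last 20% of samples for validation
splits = TimeSplitter(valid_size=0.2)(metadataset)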
metadatasets = TSMetaDatasets(metadataset, splits=splits)
metadatasets.train, metadatasets.valid
dls = TSDataLoaders.from_dsets(metadatasets.train, metadatasets.valid)
xb, yb = first(dls.train)
xb, yb
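All samples share the same number of variables (5) and sequence length (50), so they can be collated into regular batch tensors even though they come from different datasets. A quick shape check (the batch size defaults to 64 in fastai dataloaders, assuming you didn't pass `bs`):

# Expected shapes: xb -> [batch_size, 5, 50], yb -> [batch_size]
print(xb.shape, yb.shape)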
There's also an easy way to map any particular sample in a batch to the original dataset and id:
# Each entry in mapping_idxs is a (dataset idx, local sample idx) pair
mappings = dls.train.dataset.mapping_idxs
for i, (xbi, ybi) in enumerate(zip(xb, yb)):
    ds, idx = mappings[i]
    test_close(dsets[ds][idx][0].data, xbi)
    test_close(dsets[ds][idx][1].data, ybi)
For example, the 3rd sample in this batch would be:
dls.train.dataset.mapping_idxs[2]
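You can then use that (dataset, index) pair to retrieve the original sample directly:

# Unpack the (dataset idx, local sample idx) pair for the 3rd sample
ds, idx = dls.train.dataset.mapping_idxs[2]
x2, y2 = dsets[ds][idx]
x2, y2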