Core functions

Core functionality for preparing sequential data for PyTorch and fastai models

Application Structure

The data will be extracted and prepared via transforms. These are grouped into:

- Type Transforms: extract the needed components from the source items, such as input sequences or target scalar values. They work on single tensors.
- Item Transforms: may work on tuple level and can therefore process relationships between input and output.
- Batch Transforms: work on batch level. They receive batched tensors and can apply lazy transforms such as normalization very efficiently.

An application example may look like the following (a sketch follows the list):

- source items:
  - path extraction with hdf5 file endings
  - create a pandas dataframe with information for the type transforms, such as slices
  - filter items in the pandas dataframe
- type transforms:
  - extract hdf5 input and output sequences
  - create windows
- item transforms:
  - filter sequence by value
  - shift output sequence by 1 element
- batch transforms:
  - noise injection
  - normalization
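A minimal sketch of how these stages compose, using the transforms documented below (fastai's Datasets/DataLoaders API is assumed; item and batch transforms are omitted for brevity):

files = get_hdf_files('test_data/WienerHammerstein')            # 1. extract source items
src = CreateDict([ValidClmContains(['valid']),                  #    mark validation files
                  DfHDFCreateWindows(win_sz=101, stp_sz=10, clm='u')])(files)
tfms = [[HDF2Sequence(['u']), toTensorSequencesInput],          # 2. type transforms per element
        [HDF2Sequence(['y']), toTensorSequencesOutput]]
dls = Datasets(src, tfms=tfms).dataloaders(bs=8)                # item/batch tfms would be added here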


source

obj_in_lst

 obj_in_lst (lst, cls)

retrieve first object of type cls from a list
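A small usage example (hypothetical values; behavior per the docstring):

lst = [1, 'a', 2.0, 'b']
obj_in_lst(lst, str)   # -> 'a', the first object of type str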


source

count_parameters

 count_parameters (model)

retrieve number of trainable parameters of a model
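For example, a linear layer with 10 inputs and 2 outputs has 10*2 weights plus 2 biases:

model = torch.nn.Linear(10, 2)
count_parameters(model)   # -> 22 trainable parameters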

1. Extract Source Items

The file paths may be extracted with get_files from fastai. get_hdf_files removes the need to specify the hdf5 file extension.

Then a pandas dataframe may be created in case further information needs to be stored for the source items, such as slices for the windowing function.

1.1 Extract File Paths

from nbdev.config import get_config
project_root = get_config().config_file.parent
f_path = project_root / 'test_data/WienerHammerstein'
hdf_files = get_files(f_path,extensions='.hdf5',recurse=True)
len(hdf_files),hdf_files[0]
(3,
 Path('/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5'))

source

get_hdf_files

 get_hdf_files (path, recurse=True, folders=None)

Get hdf5 files in path recursively, only in folders, if specified.

hdf_files = get_hdf_files(f_path)
len(hdf_files),hdf_files[0]
(3,
 Path('/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5'))

1.2 Create Source Dictionaries

In order to extract multiple realizations of one file with different modifications, we create a list of properties. Pandas dataframes are too slow for iteration but very fast and convenient for creation. So after creating the pandas dataframe, we convert it to a list of dictionaries.
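The conversion itself is standard pandas; presumably CreateDict relies on the same mechanism internally:

import pandas as pd
df = pd.DataFrame({'path': ['a.hdf5', 'b.hdf5']})   # convenient creation and filtering
df.to_dict('records')                               # fast iteration: a list of dicts
# -> [{'path': 'a.hdf5'}, {'path': 'b.hdf5'}]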


source

apply_df_tfms

 apply_df_tfms (src, pd_tfms=None)

Create Pandas Dataframe out of a list of items, with a list of df transforms applied

df = apply_df_tfms(hdf_files)
df.head()
path
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5
test_eq(apply_df_tfms(hdf_files),apply_df_tfms(apply_df_tfms(hdf_files)))

source

CreateDict

 CreateDict (pd_tfms=None)

Create a list of dictionaries out of a list of items, with a list of df transforms applied

l_dict =CreateDict()(hdf_files)
l_dict
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5'},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5'},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5'}]

source

ValidClmContains

 ValidClmContains (lst_valid)

adds a validation column using a list of strings that are contained in the paths of validation items

lst_valid = ['valid']
CreateDict([ValidClmContains(lst_valid)])(hdf_files)
CPU times: user 432 μs, sys: 4 μs, total: 436 μs
Wall time: 449 μs
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
  'valid': True},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5',
  'valid': False},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5',
  'valid': False}]

source

ValidClmIs

 ValidClmIs (lst_valid)

adds a validation column using a list of validation filenames

lst_valid = ['test_data/battery/train/Sim_RealisticCycle2.hdf5',
'test_data/battery/valid/Sim_RealisticCycle3.hdf5']
CreateDict([ValidClmIs(lst_valid)])(hdf_files)
CPU times: user 275 μs, sys: 7 μs, total: 282 μs
Wall time: 284 μs
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
  'valid': False},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5',
  'valid': False},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5',
  'valid': False}]

source

FilterClm

 FilterClm (clm_name, func=<function <lambda>>)

filters the dataframe, keeping only rows where func applied to the column clm_name evaluates to True

CreateDict([ValidClmIs(lst_valid),FilterClm('valid')])(hdf_files)
[]

source

get_hdf_seq_len

 get_hdf_seq_len (df, clm, ds=None)

extracts the sequence length of the dataset with name ‘clm’ from the file at path ‘f_path’


source

df_get_hdf_seq_len

 df_get_hdf_seq_len (df, clm, ds=None)

extracts the sequence length of every file in advance to prepare repeated window extractions with ‘DfHDFCreateWindows’


source

DfHDFGetSeqLen

 DfHDFGetSeqLen (clm)
df_get_hdf_seq_len(df,'u')
path seq_len
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000
DfHDFGetSeqLen('u')(df)
path seq_len
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000

source

DfResamplingFactor

 DfResamplingFactor (src_fs, lst_targ_fs)
targ_fs = [50,100,300]
test_eq(len(DfResamplingFactor(100,targ_fs)(df)),9)   
df['src_fs'] = 200.
test_eq(len(DfResamplingFactor('src_fs',targ_fs)(df)),9)

source

DfHDFCreateWindows

 DfHDFCreateWindows (win_sz, stp_sz, clm, fixed_start=False,
                     fixed_end=False)

creates windows of sequences, splitting each sequence into multiple items

create_win = DfHDFCreateWindows(win_sz=100.2,stp_sz=100,clm='u')
win_df = create_win(df)
win_df
CPU times: user 710 μs, sys: 496 μs, total: 1.21 ms
Wall time: 807 μs
path seq_len src_fs l_slc r_slc
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 0 100.2
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 100 200.2
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 200 300.2
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 300 400.2
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 400 500.2
... ... ... ... ... ...
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 79400 79500.2
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 79500 79600.2
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 79600 79700.2
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 79700 79800.2
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 79800 79900.2

1877 rows × 5 columns

win_df = DfHDFCreateWindows(win_sz=20_000,stp_sz=1000,clm='u')(df)
test_eq(len(win_df),131)
win_df
path seq_len src_fs l_slc r_slc
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 0 20000
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 0 20000
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 1000 21000
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 2000 22000
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 3000 23000
... ... ... ... ... ...
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 56000 76000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 57000 77000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 58000 78000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 59000 79000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 60000 80000

131 rows × 5 columns

test_eq(create_win(df_get_hdf_seq_len(df,'u')) , create_win(df))
res_win_df = create_win(DfResamplingFactor(20,[0.1])(df))
res_win_df
path seq_len src_fs targ_fs resampling_factor l_slc r_slc
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 0.1 0.005 0 100.2
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 0.1 0.005 100 200.2
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 0.1 0.005 200 300.2
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 0.1 0.005 300 400.2
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 0.1 0.005 0 100.2
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 0.1 0.005 100 200.2
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 0.1 0.005 200 300.2
test_eq(len(res_win_df),7)
query_expr = 'l_slc <= 200'
filt_df = win_df.query(query_expr)
filt_df
path seq_len src_fs l_slc r_slc
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 0 20000
1 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 88000 200.0 0 20000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 0 20000

source

DfApplyFuncSplit

 DfApplyFuncSplit (split_func, func1, func2)

applies two different functions to the dataframe: func1 to the first set of indices returned by split_func, func2 to the second. split_func is a training/validation split function

create_win_split = DfApplyFuncSplit(
    IndexSplitter([1,2]),
    DfHDFCreateWindows(win_sz=10000,stp_sz=1,clm='u'),
    DfHDFCreateWindows(win_sz=10000,stp_sz=10000,clm='u')
)
create_win_split(df)
path seq_len src_fs l_slc r_slc
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 0 10000
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 1 10001
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 2 10002
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 3 10003
0 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 20000 200.0 4 10004
... ... ... ... ... ...
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 30000 40000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 40000 50000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 50000 60000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 60000 70000
2 /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 80000 200.0 70000 80000

10017 rows × 5 columns


source

DfFilterQuery

 DfFilterQuery (query)
test_eq(DfFilterQuery(query_expr)(win_df),filt_df)
tfm_src = CreateDict([ValidClmContains(['valid']),DfHDFCreateWindows(win_sz=100+1,stp_sz=10,clm='u')])
src_dicts = tfm_src(hdf_files)
src_dicts[:5]
CPU times: user 7.65 ms, sys: 1.21 ms, total: 8.86 ms
Wall time: 8.36 ms
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
  'valid': True,
  'l_slc': 0,
  'r_slc': 101},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
  'valid': True,
  'l_slc': 10,
  'r_slc': 111},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
  'valid': True,
  'l_slc': 20,
  'r_slc': 121},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
  'valid': True,
  'l_slc': 30,
  'r_slc': 131},
 {'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
  'valid': True,
  'l_slc': 40,
  'r_slc': 141}]

source

DfDropClmExcept

 DfDropClmExcept (clms=['path', 'l_slc', 'r_slc', 'p_sample',
                  'resampling_factor'])

drop unused dataframe columns as a last optional step to accelerate dictionary conversion
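A hypothetical usage sketch on the windowed dataframe from above:

# drops e.g. 'seq_len' and 'src_fs', keeping only the whitelisted columns
DfDropClmExcept()(win_df).columns   # expected: ['path', 'l_slc', 'r_slc']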

2. Convert Paths to Sequence Objects

Given the column names, the path is converted into sequences and scalar values, so that we finally obtain a 3-tuple:

- (Sequence, Scalar, Sequence) <-> (input, input, output)
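As a sketch, such a 3-tuple could be assembled from the transforms defined below; the scalar dataset name 'T' is hypothetical:

tfms = [[HDF2Sequence(['u'], to_cls=TensorSequencesInput)],   # input sequence
        [HDF2Scalars(['T'], to_cls=TensorScalarsInput)],      # input scalar ('T' is illustrative)
        [HDF2Sequence(['y'], to_cls=TensorSequencesOutput)]]  # output sequence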

2.1 Extract sequential data from hdf5-files

Two different functions are provided: one based on a pandas dataframe and one based on lists.

2.1.1 Shift time Series

Sometimes we need to shift the columns of a sequence by a specific offset. Then we can't simply slice the array but have to handle each column individually. First, a performance test has to be made.


source

calc_shift_offsets

 calc_shift_offsets (clm_shift)
shft = [0,0,-1,1]
calc_shift_offsets(shft)
(array([1, 1, 0, 2]), array([-1, -1, -2,  0]), np.int64(2))

Both shifting methods have their own performance characteristics: vstack needs double the time on short sequences, while creating a separate array and copying into it becomes worse starting at around 5000 elements.

# ta = array([[1,2,3]*2]*10000)
# %%timeit
# y = np.vstack([ta[i:-ta.shape[1]+i,i] for i in range(ta.shape[1])]).T
# %%timeit
# x = np.zeros((ta.shape[0]-ta.shape[1],ta.shape[1]))
# for i in range(ta.shape[1]):
#     x[:,i] = ta[i:-ta.shape[1]+i,i]

2.1.2 HDF2Sequence

HDF5 performance is massively affected by the dtype of the signals. f4 (32-bit floating point) numbers are faster to load and lead to smaller files than f8 numbers.
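With h5py, for example, the dtype is fixed when a dataset is written (file and dataset names here are illustrative):

import h5py, numpy as np
x = np.random.normal(size=20000)
with h5py.File('example.hdf5', 'w') as f:
    f.create_dataset('u', data=x, dtype='f4')   # 32-bit float: smaller file, faster loads
    f.create_dataset('y', data=x, dtype='f8')   # 64-bit float: twice the storage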


source

running_mean

 running_mean (x, N)
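No docstring is exported; a common cumsum-based formulation of a running mean over windows of N samples looks like this (a sketch, not necessarily the implementation used here):

import numpy as np

def running_mean_sketch(x, N):
    c = np.cumsum(np.insert(x, 0, 0.0))
    return (c[N:] - c[:-N]) / N         # mean over each window of N consecutive samples

running_mean_sketch(np.arange(5.0), 2)  # -> array([0.5, 1.5, 2.5, 3.5])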

source

downsample_mean

 downsample_mean (x, N)

source

resample_interp

 resample_interp (x, resampling_factor, sequence_first=True,
                  lowpass_cut=1.0, upsample_cubic_cut=None)

*signal resampling using linear or cubic interpolation

x: signal to resample, with shape features x resampling_dimension, or resampling_dimension x features if sequence_first=True
resampling_factor: factor > 0 that scales the signal length
lowpass_cut: upper boundary for resampling_factor that activates the lowpass filter; low values exchange accuracy for performance
upsample_cubic_cut: lower boundary for resampling_factor that activates cubic interpolation at high upsampling factors; improves signal dynamics at the expense of performance; None deactivates cubic interpolation*

x = np.random.normal(size=(100000,9))
test_eq(resample_interp(x,0.3).shape[0],30000)

source

hdf_extract_sequence

 hdf_extract_sequence (hdf_path, clms, dataset=None, l_slc=None,
                       r_slc=None, resampling_factor=None, fs_idx=None,
                       dt_idx=False, fast_resample=True)

*extracts a sequence with the shape [seq_len x num_features]

hdf_path: file path of the hdf file; may be a string or Path
clms: list of dataset names of sequences in the hdf file
dataset: dataset root for clms; useful for multiple sequences stored in one file
l_slc: left boundary for extracting a window of the whole sequence
r_slc: right boundary for extracting a window of the whole sequence
resampling_factor: scaling factor for the sequence length; uses ‘resample_interp’ for resampling
fs_idx: clms list index of the fs entry in the sequence; will be scaled by resampling_factor after resampling
dt_idx: clms list index of the dt entry in the sequence; will be scaled by resampling_factor after resampling
fast_resample: if True, uses linear interpolation with an anti-aliasing filter for faster resampling; less accurate than fft-based resampling*
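A usage sketch based on this signature and the test files from above:

seq = hdf_extract_sequence(hdf_files[0], ['u', 'y'], l_slc=0, r_slc=100)
seq.shape   # expected: (100, 2), i.e. [seq_len x num_features]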


source

Memoize

 Memoize (fn)

Initialize self. See help(type(self)) for accurate signature.
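The exported docstring is only Python's default; judging from the `cached` option of HDF2Sequence below, Memoize presumably wraps a function and caches its results per argument. A hypothetical sketch:

@Memoize
def load(path):
    print('loading', path)   # assumption: printed only on the first call per path
    return path

load('a.hdf5')
load('a.hdf5')               # expected to be served from the cache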


source

MemoizeMP

 MemoizeMP (fn)

Initialize self. See help(type(self)) for accurate signature.


source

HDF2Sequence

 HDF2Sequence (clm_names, clm_shift=None, truncate_sz=None,
               to_cls=<function noop>, cached=True, fs_idx=None,
               dt_idx=None, fast_resample=True)

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches

# %%timeit
hdf2seq = HDF2Sequence(['u','y'],cached=False)
hdf2seq(hdf_files[0])
array([[ 0.27021001,  0.24308601],
       [ 0.14557693,  0.23587583],
       [ 0.05459135,  0.23415912],
       ...,
       [ 0.78831281, -0.1781944 ],
       [ 0.61458185, -0.14866701],
       [ 0.55758711, -0.11364614]], shape=(20000, 2))
hdf2seq = HDF2Sequence(['u','y'],clm_shift=[1,1])
hdf2seq(hdf_files[0])
array([[ 0.14557693,  0.23587583],
       [ 0.05459135,  0.23415912],
       [-0.0295274 ,  0.23656251],
       ...,
       [ 0.78831281, -0.1781944 ],
       [ 0.61458185, -0.14866701],
       [ 0.55758711, -0.11364614]], shape=(19999, 2))
hdf2seq = HDF2Sequence(['u','y'],cached='shared')
print(hdf2seq(hdf_files[0]))
[[ 0.27021001  0.24308601]
 [ 0.14557693  0.23587583]
 [ 0.05459135  0.23415912]
 ...
 [ 0.78831281 -0.1781944 ]
 [ 0.61458185 -0.14866701]
 [ 0.55758711 -0.11364614]]
# %%timeit
hdf2seq(hdf_files[0])
array([[ 0.27021001,  0.24308601],
       [ 0.14557693,  0.23587583],
       [ 0.05459135,  0.23415912],
       ...,
       [ 0.78831281, -0.1781944 ],
       [ 0.61458185, -0.14866701],
       [ 0.55758711, -0.11364614]], shape=(20000, 2))
hdf2seq = HDF2Sequence(['u','y'],cached=True)
# %%timeit
hdf2seq(hdf_files[0])
array([[ 0.27021001,  0.24308601],
       [ 0.14557693,  0.23587583],
       [ 0.05459135,  0.23415912],
       ...,
       [ 0.78831281, -0.1781944 ],
       [ 0.61458185, -0.14866701],
       [ 0.55758711, -0.11364614]], shape=(20000, 2))

The function can be applied to a list of source objects (here: paths) via a Pipeline.

hdf2seq = HDF2Sequence(['u'])
hdf2seq(hdf_files[0]).shape
(20000, 1)
pipe = Pipeline(HDF2Sequence(['u','y']))
# res_pipe = pipe(hdf_files)
# len(res_pipe), res_pipe[0][0]

Performance Test

Caching stores the arrays for future use at every function call. This is very useful, especially for windowing. It should always be turned on; only turn it off explicitly when there is not enough memory for your data.

tfms=[  [HDF2Sequence(['u','y'],cached=None)],
        [HDF2Sequence(['y'],cached=None)]]
dsrc = Datasets(src_dicts[:1000],tfms=tfms)
len(dsrc)
1000
# %%time
# for x in dsrc:
#     x
tfms=[  [HDF2Sequence(['u','y'],cached=True,clm_shift=[1,2])],
        [HDF2Sequence(['y'],cached=True)]]
dsrc = Datasets(src_dicts[:1000],tfms=tfms)
# # %%timeit
# for x in dsrc:
#     x

Caching is much faster because otherwise every file gets loaded multiple times.

Extract Scalar data from hdf5-files


source

hdf2scalars

 hdf2scalars (hdf_path, c_names, dataset=None)
# hdf2scalars('/mnt/data/sicwell/hdf5/Cycles/ch3/cycle00568.hdf5',['soc','temperature1'],dataset='measurement_00000')

source

HDF2Scalars

 HDF2Scalars (clm_names, to_cls=<function noop>)

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches

# HDF2Scalars(['soc','temperature1'])({'path':'/mnt/data/sicwell/hdf5/Cycles/ch3/cycle00568.hdf5','dataset':'measurement_00000'})

Extract Scalar from sequence


source

ScalarSequenceElement

 ScalarSequenceElement (idx, to_cls=<function noop>)

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches

ScalarSequenceElement(-1)(hdf2seq(hdf_files[0]))
array([0.55758711])

2.2 Datatypes for Sequences and Scalars


source

TensorSequencesOutput

 TensorSequencesOutput (x, **kwargs)

A Tensor which supports subclass pickling, and maintains metadata when casting or after methods


source

TensorSequencesInput

 TensorSequencesInput (x, **kwargs)

A Tensor which supports subclass pickling, and maintains metadata when casting or after methods


source

TensorSequences

 TensorSequences (x, **kwargs)

A Tensor which supports subclass pickling, and maintains metadata when casting or after methods

f = TensorSequencesInput.from_hdf(['u'])
type(f(hdf_files[0]))
numpy.ndarray
# TensorSequences(np.ones((30,2))).show()

toTensorSequencesOutput

 toTensorSequencesOutput (*args, split_idx=None, **kwargs)

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches


toTensorSequencesInput

 toTensorSequencesInput (*args, split_idx=None, **kwargs)

Delegates (__call__,decode,setup) to (encodes,decodes,setups) if split_idx matches


source

TensorScalarsOutput

 TensorScalarsOutput (x, **kwargs)

A Tensor which supports subclass pickling, and maintains metadata when casting or after methods


source

TensorScalarsInput

 TensorScalarsInput (x, **kwargs)

A Tensor which supports subclass pickling, and maintains metadata when casting or after methods


source

TensorScalars

 TensorScalars (x, **kwargs)

A Tensor which supports subclass pickling, and maintains metadata when casting or after methods

The tensor subclassing mechanism introduced in PyTorch 1.7 keeps the tensor type in tensor operations. Operations between different branches of tensor subclasses require an implementation of ‘__torch_function__’. fastai provides ‘TensorBase.register_func’ to mark functions that, for the given types, behave like the default torch operation.

https://pytorch.org/docs/stable/notes/extending.html#extending-torch
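fastai's registration pattern looks roughly like this (a sketch; whether tsfast registers mse_loss exactly this way is an assumption):

# declare that mse_loss may mix these two subclasses without a __torch_function__ error
TensorBase.register_func(torch.nn.functional.mse_loss,
                         TensorSequencesInput, TensorSequencesOutput)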

x1 = TensorSequencesInput(torch.rand((10,10)))
x2 = TensorSequencesOutput(torch.rand((10,10)))
torch.nn.functional.mse_loss(x1,x2)
TensorSequencesInput(0.1615)

5.1 Low-Level with Transforms

tfms=[  [HDF2Sequence(['u']),toTensorSequencesInput],
        [HDF2Sequence(['y']),toTensorSequencesOutput]]
ds = Datasets(get_hdf_files(f_path),tfms=tfms)
dls = ds.dataloaders(bs=1)
dls.one_batch()[0].shape
torch.Size([1, 88000, 1])

6. Show Batches and Results


source

plot_sequence

 plot_sequence (axs, in_sig, targ_sig, out_sig=None, **kwargs)

source

plot_seqs_single_figure

 plot_seqs_single_figure (n_samples, n_targ, samples, plot_func,
                          outs=None, **kwargs)

source

plot_seqs_multi_figures

 plot_seqs_multi_figures (n_samples, n_targ, samples, plot_func,
                          outs=None, **kwargs)
dls.show_batch()