```python
from nbdev.config import get_config
```
Core Functions
Application Structure
The data is extracted and prepared via transforms, which are grouped into:

- Type Transforms: extract the needed components from the source items, like input sequences or target scalar values. They work on single tensors.
- Item Transforms: may work on tuple level and can therefore process relationships between input and output.
- Batch Transforms: work on batch level. They receive batched tensors and can apply lazy transforms like normalization very efficiently.
An application example may look like the following (a rough sketch in code follows this list):

- source items:
  - path extraction with hdf5 file endings
  - create a pandas dataframe with information for the type transforms, like slices
  - filter items in the pandas dataframe
- type transforms:
  - extract hdf5 input and output sequence
  - create windows
- item transforms:
  - filter sequence by value
  - shift output sequence by 1 element
- batch transforms:
  - noise injection
  - normalization
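As a rough sketch only, assembled from the helpers documented further down this page (window and batch sizes are illustrative, and `hdf_files` is assumed to hold the source paths):

```python
# Source items: build dicts with a validation flag and window slices
src_dicts = CreateDict([ValidClmContains(['valid']),
                        DfHDFCreateWindows(win_sz=101, stp_sz=10, clm='u')])(hdf_files)

# Type transforms: extract sequences and convert them to tensor types
tfms = [[HDF2Sequence(['u']), toTensorSequencesInput],
        [HDF2Sequence(['y']), toTensorSequencesOutput]]

# Item and batch transforms would be passed on to the dataloaders call
dls = Datasets(src_dicts, tfms=tfms).dataloaders(bs=8)
```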
obj_in_lst
obj_in_lst (lst, cls)
Retrieve the first object of type cls from a list
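A hedged usage sketch (the return value is inferred from the docstring):

```python
items = [3, 'alpha', 4.2, 'beta']
obj_in_lst(items, str)  # expected: 'alpha', the first str in the list
```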
count_parameters
count_parameters (model)
Retrieve the number of trainable parameters of a model
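A hedged usage sketch, assuming the usual count over trainable parameter elements:

```python
import torch.nn as nn

model = nn.Linear(3, 2)  # 3*2 weights + 2 biases
count_parameters(model)  # expected: 8
```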
1. Extract Source Items
The file paths may be extracted with get_files of fastai2. get_hdf_files removes the need to write the hdf5 file extension.
Then a pandas dataframe may be created in case further information needs to be stored for the source items, like slices for the windowing function.
1.1 Extract File Paths
```python
project_root = get_config().config_file.parent
f_path = project_root / 'test_data/WienerHammerstein'
hdf_files = get_files(f_path, extensions='.hdf5', recurse=True)
len(hdf_files), hdf_files[0]
```
(3,
Path('/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5'))
get_hdf_files
get_hdf_files (path, recurse=True, folders=None)
Get hdf5 files in path recursively, only in folders, if specified.
```python
hdf_files = get_hdf_files(f_path)
len(hdf_files), hdf_files[0]
```
(3,
Path('/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5'))
1.2 Create Source Dictionaries
In order to extract multiple realizations of one file with different modifications, we create a list of properties. Pandas Dataframes are too slow for iteration but very fast and convenient for creation. So after creating the pandas Dataframe we convert it to a list of dictionaries.
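The underlying idea in plain pandas (a sketch of the conversion step, not the library code):

```python
import pandas as pd

# Build conveniently with pandas, then iterate over plain dicts
df = pd.DataFrame({'path': ['a.hdf5', 'b.hdf5'], 'valid': [True, False]})
records = df.to_dict('records')
# [{'path': 'a.hdf5', 'valid': True}, {'path': 'b.hdf5', 'valid': False}]
```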
apply_df_tfms
apply_df_tfms (src, pd_tfms=None)
Create Pandas Dataframe out of a list of items, with a list of df transforms applied
```python
df = apply_df_tfms(hdf_files)
df.head()
```
| | path |
|---|---|
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 |
```python
test_eq(apply_df_tfms(hdf_files), apply_df_tfms(apply_df_tfms(hdf_files)))
```
CreateDict
CreateDict (pd_tfms=None)
Create a list of dictionaries out of a list of items, with a list of df transforms applied
```python
l_dict = CreateDict()(hdf_files)
l_dict
```
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5'},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5'},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5'}]
ValidClmContains
ValidClmContains (lst_valid)
Add a validation column using a list of strings that are contained in the paths of the validation files
```python
%%time
lst_valid = ['valid']
CreateDict([ValidClmContains(lst_valid)])(hdf_files)
```
CPU times: user 432 μs, sys: 4 μs, total: 436 μs
Wall time: 449 μs
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
'valid': True},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5',
'valid': False},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5',
'valid': False}]
ValidClmIs
ValidClmIs (lst_valid)
adds validation column using a list of validation filenames
```python
%%time
lst_valid = ['test_data/battery/train/Sim_RealisticCycle2.hdf5',
             'test_data/battery/valid/Sim_RealisticCycle3.hdf5']
CreateDict([ValidClmIs(lst_valid)])(hdf_files)
```
CPU times: user 275 μs, sys: 7 μs, total: 282 μs
Wall time: 284 μs
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
'valid': False},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5',
'valid': False},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5',
'valid': False}]
FilterClm
FilterClm (clm_name, func=<function <lambda>>)
Filter the dataframe rows by applying func to the column clm_name (the default keeps rows where the column is truthy)
```python
CreateDict([ValidClmIs(lst_valid), FilterClm('valid')])(hdf_files)
```
[]
get_hdf_seq_len
get_hdf_seq_len (df, clm, ds=None)
extract the sequence length of the dataset with the ‘clm’ name and ‘f_path’ path
df_get_hdf_seq_len
df_get_hdf_seq_len (df, clm, ds=None)
extracts the sequence length of every file in advance to prepare repeated window extractions with ‘DfHDFCreateWindows’
DfHDFGetSeqLen
DfHDFGetSeqLen (clm)
```python
df_get_hdf_seq_len(df, 'u')
```
| | path | seq_len |
|---|---|---|
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 |
```python
DfHDFGetSeqLen('u')(df)
```
| | path | seq_len |
|---|---|---|
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 |
DfResamplingFactor
DfResamplingFactor (src_fs, lst_targ_fs)
```python
targ_fs = [50,100,300]
test_eq(len(DfResamplingFactor(100,targ_fs)(df)),9)
df['src_fs'] = 200.
test_eq(len(DfResamplingFactor('src_fs',targ_fs)(df)),9)
```
DfHDFCreateWindows
DfHDFCreateWindows (win_sz, stp_sz, clm, fixed_start=False, fixed_end=False)
Create windows of sequences, splitting each sequence into multiple items
```python
%%time
create_win = DfHDFCreateWindows(win_sz=100.2,stp_sz=100,clm='u')
win_df = create_win(df)
win_df
```
CPU times: user 710 μs, sys: 496 μs, total: 1.21 ms
Wall time: 807 μs
| | path | seq_len | src_fs | l_slc | r_slc |
|---|---|---|---|---|---|
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 0 | 100.2 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 100 | 200.2 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 200 | 300.2 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 300 | 400.2 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 400 | 500.2 |
... | ... | ... | ... | ... | ... |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 79400 | 79500.2 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 79500 | 79600.2 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 79600 | 79700.2 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 79700 | 79800.2 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 79800 | 79900.2 |
1877 rows × 5 columns
```python
win_df = DfHDFCreateWindows(win_sz=20_000,stp_sz=1000,clm='u')(df)
test_eq(len(win_df),131)
win_df
```
| | path | seq_len | src_fs | l_slc | r_slc |
|---|---|---|---|---|---|
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 0 | 20000 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 0 | 20000 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 1000 | 21000 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 2000 | 22000 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 3000 | 23000 |
... | ... | ... | ... | ... | ... |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 56000 | 76000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 57000 | 77000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 58000 | 78000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 59000 | 79000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 60000 | 80000 |
131 rows × 5 columns
```python
test_eq(create_win(df_get_hdf_seq_len(df,'u')), create_win(df))
```
```python
res_win_df = create_win(DfResamplingFactor(20,[0.1])(df))
res_win_df
```
| | path | seq_len | src_fs | targ_fs | resampling_factor | l_slc | r_slc |
|---|---|---|---|---|---|---|---|
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 0.1 | 0.005 | 0 | 100.2 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 0.1 | 0.005 | 100 | 200.2 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 0.1 | 0.005 | 200 | 300.2 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 0.1 | 0.005 | 300 | 400.2 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 0.1 | 0.005 | 0 | 100.2 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 0.1 | 0.005 | 100 | 200.2 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 0.1 | 0.005 | 200 | 300.2 |
```python
test_eq(len(res_win_df),7)
```
```python
query_expr = 'l_slc <= 200'
filt_df = win_df.query(query_expr)
filt_df
```
| | path | seq_len | src_fs | l_slc | r_slc |
|---|---|---|---|---|---|
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 0 | 20000 |
1 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/test/WienerHammerstein_test.hdf5 | 88000 | 200.0 | 0 | 20000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 0 | 20000 |
DfApplyFuncSplit
DfApplyFuncSplit (split_func, func1, func2)
Apply two different functions to the dataframe: func1 on the first indices of split_func, func2 on the second indices. split_func is a training/validation split function.
```python
create_win_split = DfApplyFuncSplit(
    IndexSplitter([1,2]),
    DfHDFCreateWindows(win_sz=10000,stp_sz=1,clm='u'),
    DfHDFCreateWindows(win_sz=10000,stp_sz=10000,clm='u')
)
create_win_split(df)
```
| | path | seq_len | src_fs | l_slc | r_slc |
|---|---|---|---|---|---|
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 0 | 10000 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 1 | 10001 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 2 | 10002 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 3 | 10003 |
0 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5 | 20000 | 200.0 | 4 | 10004 |
... | ... | ... | ... | ... | ... |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 30000 | 40000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 40000 | 50000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 50000 | 60000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 60000 | 70000 |
2 | /Users/daniel/Development/tsfast/test_data/WienerHammerstein/train/WienerHammerstein_train.hdf5 | 80000 | 200.0 | 70000 | 80000 |
10017 rows × 5 columns
DfFilterQuery
DfFilterQuery (query)
```python
test_eq(DfFilterQuery(query_expr)(win_df), filt_df)
```
```python
%%time
tfm_src = CreateDict([ValidClmContains(['valid']),
                      DfHDFCreateWindows(win_sz=100+1,stp_sz=10,clm='u')])
src_dicts = tfm_src(hdf_files)
src_dicts[:5]
```
CPU times: user 7.65 ms, sys: 1.21 ms, total: 8.86 ms
Wall time: 8.36 ms
[{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
'valid': True,
'l_slc': 0,
'r_slc': 101},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
'valid': True,
'l_slc': 10,
'r_slc': 111},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
'valid': True,
'l_slc': 20,
'r_slc': 121},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
'valid': True,
'l_slc': 30,
'r_slc': 131},
{'path': '/Users/daniel/Development/tsfast/test_data/WienerHammerstein/valid/WienerHammerstein_valid.hdf5',
'valid': True,
'l_slc': 40,
'r_slc': 141}]
DfDropClmExcept
DfDropClmExcept (clms=['path', 'l_slc', 'r_slc', 'p_sample', 'resampling_factor'])
drop unused dataframe columns as a last optional step to accelerate dictionary conversion
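There is no usage example on this page; a plausible sketch, assuming DfDropClmExcept is appended like the other df transforms inside CreateDict:

```python
# Hypothetical: drop bookkeeping columns (e.g. seq_len) as the last df step
# so the dataframe-to-dictionary conversion has less to copy
src_dicts = CreateDict([DfHDFCreateWindows(win_sz=101, stp_sz=10, clm='u'),
                        DfDropClmExcept()])(hdf_files)
```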
2. Convert Paths to Sequence Objects
Given the column names, the path is converted into sequences and scalar values, so that in the end we obtain a 3-tuple of: (Sequence, Scalar, Sequence) <-> (input, input, output)
2.1 Extract sequential data from hdf5-files
Two different functions are provided, one based on a pandas df and one based on lists.
2.1.1 Shift Time Series
Sometimes we need to shift the columns of a sequence by a specific offset. Then we can't simply slice the array but have to handle each column individually. First, a performance test has to be made.
calc_shift_offsets
calc_shift_offsets (clm_shift)
```python
shft = [0,0,-1,1]
calc_shift_offsets(shft)
```
(array([1, 1, 0, 2]), array([-1, -1, -2, 0]), np.int64(2))
Both shifting methods have their own performance characteristics: vstack needs double the time on short sequences, while creating a separate array with copy becomes worse starting at around 5000 elements.
```python
# ta = array([[1,2,3]*2]*10000)

# %%timeit
# y = np.vstack([ta[i:-ta.shape[1]+i,i] for i in range(ta.shape[1])]).T

# %%timeit
# x = np.zeros((ta.shape[0]-ta.shape[1],ta.shape[1]))
# for i in range(ta.shape[1]):
#     x[:,i] = ta[i:-ta.shape[1]+i,i]
```
2.1.2 HDF2Sequence
HDF5 performance is massively affected by the dtype of the signals. f4 (32-bit floating point) numbers are faster to load and lead to smaller files than f8 numbers.
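For illustration, writing a dataset as f4 with h5py might look like this (hypothetical file and dataset names):

```python
import h5py
import numpy as np

# Store the signal as 32-bit floats: smaller file, faster loads than f8
with h5py.File('example.hdf5', 'w') as f:
    f.create_dataset('u', data=np.random.randn(20000), dtype='f4')
```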
running_mean
running_mean (x, N)
downsample_mean
downsample_mean (x, N)
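running_mean and downsample_mean ship without docstrings here; as a loose, hypothetical illustration of the operations their names suggest (plain numpy, not the library's implementations):

```python
import numpy as np

x = np.arange(10, dtype=float)

# Running mean with window N=3 (convolution with a box kernel)
np.convolve(x, np.ones(3) / 3, mode='valid')

# Downsampling by averaging non-overlapping blocks of N=2
x[: len(x) // 2 * 2].reshape(-1, 2).mean(axis=1)
```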
resample_interp
resample_interp (x, resampling_factor, sequence_first=True, lowpass_cut=1.0, upsample_cubic_cut=None)
signal resampling using linear or cubic interpolation

- x: signal to resample with shape features x resampling_dimension, or resampling_dimension x features if sequence_first=True
- resampling_factor: factor > 0 that scales the signal
- lowpass_cut: upper boundary for resampling_factor that activates the lowpass filter; low values exchange accuracy for performance, default is 0.7
- upsample_cubic_cut: lower boundary for resampling_factor that activates cubic interpolation at high upsampling values; improves signal dynamics in exchange for performance. None deactivates cubic interpolation
```python
x = np.random.normal(size=(100000,9))
test_eq(resample_interp(x,0.3).shape[0],30000)
```
hdf_extract_sequence
hdf_extract_sequence (hdf_path, clms, dataset=None, l_slc=None, r_slc=None, resampling_factor=None, fs_idx=None, dt_idx=False, fast_resample=True)
extracts a sequence with the shape [seq_len x num_features]

- hdf_path: file path of hdf file, may be a string or path type
- clms: list of dataset names of sequences in hdf file
- dataset: dataset root for clms; useful for multiple sequences stored in one file
- l_slc: left boundary for extraction of a window of the whole sequence
- r_slc: right boundary for extraction of a window of the whole sequence
- resampling_factor: scaling factor for the sequence length, uses 'resample_interp' for resampling
- fs_idx: clms list idx of the fs entry in the sequence; will be scaled by resampling_factor after resampling
- dt_idx: clms list idx of the dt entry in the sequence; will be scaled by resampling_factor after resampling
- fast_resample: if True, uses linear interpolation with an anti-aliasing filter for faster resampling; less accurate than fft-based resampling
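A hedged usage sketch based on the parameter description above (the expected shape follows from the docstring):

```python
# Read a 101-step window of the 'u' and 'y' signals from the first file
seq = hdf_extract_sequence(hdf_files[0], ['u', 'y'], l_slc=0, r_slc=101)
seq.shape  # expected: (101, 2)
```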
Memoize
Memoize (fn)
Initialize self. See help(type(self)) for accurate signature.
MemoizeMP
MemoizeMP (fn)
Initialize self. See help(type(self)) for accurate signature.
HDF2Sequence
HDF2Sequence (clm_names, clm_shift=None, truncate_sz=None, to_cls=<function noop>, cached=True, fs_idx=None, dt_idx=None, fast_resample=True)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
```python
# %%timeit
hdf2seq = HDF2Sequence(['u','y'],cached=False)
hdf2seq(hdf_files[0])
```
array([[ 0.27021001, 0.24308601],
[ 0.14557693, 0.23587583],
[ 0.05459135, 0.23415912],
...,
[ 0.78831281, -0.1781944 ],
[ 0.61458185, -0.14866701],
[ 0.55758711, -0.11364614]], shape=(20000, 2))
```python
hdf2seq = HDF2Sequence(['u','y'],clm_shift=[1,1])
hdf2seq(hdf_files[0])
```
array([[ 0.14557693, 0.23587583],
[ 0.05459135, 0.23415912],
[-0.0295274 , 0.23656251],
...,
[ 0.78831281, -0.1781944 ],
[ 0.61458185, -0.14866701],
[ 0.55758711, -0.11364614]], shape=(19999, 2))
```python
hdf2seq = HDF2Sequence(['u','y'],cached='shared')
print(hdf2seq(hdf_files[0]))
```
[[ 0.27021001 0.24308601]
[ 0.14557693 0.23587583]
[ 0.05459135 0.23415912]
...
[ 0.78831281 -0.1781944 ]
[ 0.61458185 -0.14866701]
[ 0.55758711 -0.11364614]]
```python
# %%timeit
hdf2seq(hdf_files[0])
```
array([[ 0.27021001, 0.24308601],
[ 0.14557693, 0.23587583],
[ 0.05459135, 0.23415912],
...,
[ 0.78831281, -0.1781944 ],
[ 0.61458185, -0.14866701],
[ 0.55758711, -0.11364614]], shape=(20000, 2))
```python
hdf2seq = HDF2Sequence(['u','y'],cached=True)
```

```python
# %%timeit
hdf2seq(hdf_files[0])
```
array([[ 0.27021001, 0.24308601],
[ 0.14557693, 0.23587583],
[ 0.05459135, 0.23415912],
...,
[ 0.78831281, -0.1781944 ],
[ 0.61458185, -0.14866701],
[ 0.55758711, -0.11364614]], shape=(20000, 2))
The function can be applied to a list of source objects (here: paths) by means of a Pipeline.
```python
hdf2seq = HDF2Sequence(['u'])
hdf2seq(hdf_files[0]).shape
```
(20000, 1)
```python
pipe = Pipeline(HDF2Sequence(['u','y']))
# res_pipe = pipe(hdf_files)
# len(res_pipe), res_pipe[0][0]
```
Performance Test
Caching stores the arrays for future use at every function call. It is very useful, especially for windows, and should always be turned on. Only turn it off explicitly when there is not enough memory for your data.
```python
tfms = [[HDF2Sequence(['u','y'],cached=None)],
        [HDF2Sequence(['y'],cached=None)]]
dsrc = Datasets(src_dicts[:1000],tfms=tfms)
len(dsrc)
```
1000
```python
# %%time
# for x in dsrc:
#     x
```
```python
tfms = [[HDF2Sequence(['u','y'],cached=True,clm_shift=[1,2])],
        [HDF2Sequence(['y'],cached=True)]]
dsrc = Datasets(src_dicts[:1000],tfms=tfms)
```
```python
# %%timeit
# for x in dsrc:
#     x
```
Caching is much faster, because every file gets loaded multiple times.
Extract Scalar data from hdf5-files
hdf2scalars
hdf2scalars (hdf_path, c_names, dataset=None)
```python
# hdf2scalars('/mnt/data/sicwell/hdf5/Cycles/ch3/cycle00568.hdf5',['soc','temperature1'],dataset='measurement_00000')
```
HDF2Scalars
HDF2Scalars (clm_names, to_cls=<function noop>)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
```python
# HDF2Scalars(['soc','temperature1'])({'path':'/mnt/data/sicwell/hdf5/Cycles/ch3/cycle00568.hdf5','dataset':'measurement_00000'})
```
Extract Scalar from sequence
ScalarSequenceElement
ScalarSequenceElement (idx, to_cls=<function noop>)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
```python
ScalarSequenceElement(-1)(hdf2seq(hdf_files[0]))
```
array([0.55758711])
2.2 Datatypes for Sequences and Scalars
TensorSequencesOutput
TensorSequencesOutput (x, **kwargs)
A Tensor which supports subclass pickling, and maintains metadata when casting or after methods
TensorSequencesInput
TensorSequencesInput (x, **kwargs)
A Tensor which supports subclass pickling, and maintains metadata when casting or after methods
TensorSequences
TensorSequences (x, **kwargs)
A Tensor which supports subclass pickling, and maintains metadata when casting or after methods
```python
f = TensorSequencesInput.from_hdf(['u'])
type(f(hdf_files[0]))
```
numpy.ndarray
```python
# TensorSequences(np.ones((30,2))).show()
```
toTensorSequencesOutput
toTensorSequencesOutput (*args, split_idx=None, **kwargs)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
toTensorSequencesInput
toTensorSequencesInput (*args, split_idx=None, **kwargs)
Delegates (__call__, decode, setup) to (encodes, decodes, setups) if split_idx matches
TensorScalarsOutput
TensorScalarsOutput (x, **kwargs)
A Tensor which supports subclass pickling, and maintains metadata when casting or after methods
TensorScalarsInput
TensorScalarsInput (x, **kwargs)
A Tensor which supports subclass pickling, and maintains metadata when casting or after methods
TensorScalars
TensorScalars (x, **kwargs)
A Tensor which supports subclass pickling, and maintains metadata when casting or after methods
The tensor subclassing mechanism since pytorch 1.7 keeps the tensor type in tensor operations. Operations between different branches of tensor subclasses require an implementation of '__torch_function__'. Fastai implements 'TensorBase.register_func' to mark functions that, for the given types, behave like the default torch operation.
https://pytorch.org/docs/stable/notes/extending.html#extending-torch
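As a hedged sketch of that registration mechanism (the exact call pattern should be verified against fastai's torch_core; TensorBase comes from fastai):

```python
# Mark mse_loss as a plain torch op for this pair of subclasses,
# so mixing input and output sequence tensors does not raise
TensorBase.register_func(torch.nn.functional.mse_loss,
                         TensorSequencesInput, TensorSequencesOutput)
```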
```python
x1 = TensorSequencesInput(torch.rand((10,10)))
x2 = TensorSequencesOutput(torch.rand((10,10)))
torch.nn.functional.mse_loss(x1,x2)
```
TensorSequencesInput(0.1615)
5.1 Low-Level with Transforms
```python
tfms = [[HDF2Sequence(['u']),toTensorSequencesInput],
        [HDF2Sequence(['y']),toTensorSequencesOutput]]
ds = Datasets(get_hdf_files(f_path),tfms=tfms)
dls = ds.dataloaders(bs=1)
dls.one_batch()[0].shape
```
torch.Size([1, 88000, 1])
6. Show Batches and Results
plot_sequence
plot_sequence (axs, in_sig, targ_sig, out_sig=None, **kwargs)
plot_seqs_single_figure
plot_seqs_single_figure (n_samples, n_targ, samples, plot_func, outs=None, **kwargs)
plot_seqs_multi_figures
plot_seqs_multi_figures (n_samples, n_targ, samples, plot_func, outs=None, **kwargs)
```python
dls.show_batch()
```