opinf.pre
#
Tools for preprocessing snapshot data after (optional) lifting but prior to compression.
Classes
Template class for transformers. |
|
Identity transformation \(\q\mapsto\q\). |
|
Shift snapshots by a given reference snapshot \(\bar{\q}\). |
|
Scale (nondimensionalize) snapshots as a whole or by row. |
|
Process snapshots by vector centering and/or affine scaling (in that order). |
|
Chain multiple transformers. |
|
Join transformers together for states with multiple variables. |
Functions
Shift the columns of a snapshot matrix by a vector. |
|
Scale the entries of a snapshot matrix to a specified interval. |
Overview
Operator Inference performance often improves when the training data are standardized. Multivariable data in particular benefits from preprocessing.
opinf.pre
classes define invertible transformations for data standardization.
Example Data
The examples on this page use data downsampled from the combustion problem described in [SKHW20].
State Variables
The data consists of nine variables recorded at 100 points in time.
Pressure \(p\)
\(x\)-velocity \(v_{x}\)
\(y\)-velocity \(v_{y}\)
Temperature \(T\)
Specific volume (inverse density) \(\xi = 1/\rho\)
Chemical species molar concentrations for CH\(_{4}\), O\(_{2}\), CO\(_{2}\), and H\(_{2}\)O.
The dimension of the spatial discretization in the full example in [SKHW20] is \(n_x = 38{,}523\) for each of the \(n_q = 9\) variables, so the total state dimension is \(n_q n_x = 9 \times 38{,}523 = 346{,}707\). For demonstration purposes, we have downsampled the state dimension to \(n_x' = 535\), hence \(n = n_q n_x' = 9 \times 535 = 4{,}815\) is the total state dimension of the example data.
You can download the data here to repeat the experiments. The full dataset is available here.
import numpy as np
import matplotlib.pyplot as plt
import opinf
opinf.utils.mpl_config()
Preprocessing Data#
Raw dynamical systems data often need to be lightly preprocessed before use in Operator Inference. This module includes tools for centering/shifting and scaling/nondimensionalization of snapshot data after lifting (when applicable) and prior to dimensionality reduction.
Notation
On this page,
\(\q \in \RR^n\) denotes the unprocessed state variable for which we have \(k\) snapshots \(\q_0,\ldots,\q_{k-1}\in\RR^n\),
\(\q'\in\RR^n\) denotes state variable after being shifted (centered), and
\(\q''\in\RR^n\) denotes the state variable after being shifted and scaled (non-dimensionalized).
The tools demonstrated here define a mapping \(\mathcal{T}:\RR^n\to\RR^n\) with \(\q'' = \mathcal{T}(\q)\).
Lifting and Preprocessing
A lifting map can be viewed as a type of preprocessing map, \(\mathcal{L}:\RR^{n_1}\to\RR^{n_2}\). However, the preprocessing transformations defined in this module map from a vector space back to itself (\(n_1 = n_2\)) while lifting maps may augment the state with additional variables (\(n_2 \ge n_1\)).
Fit-and-Transform versus Transform
Pre-processing transformation classes are calibrated through user-provided hyperparameters in the constructor and/or training snapshots passed to fit()
or fit_transform()
.
The transform()
method applies but does not alter the transformation.
Some transformations are designed so that the transformed training data has certain properties, but those properties are not guaranteed to hold for transformed data that was not used for training.
Example
Consider a set of training snapshots \(\{\q_{j}\}_{j=0}^{k-1}\subset\RR^n\).
The ShiftScaleTransformer
can shift data by the mean training snapshot, meaning it can represent the transformation \(\mathcal{T}:\RR^{n}\to\RR^{n}\) given by
The key property of this transformation is that the transformed training snapshots have zero mean. That is,
However, for any other collection \(\{\mathbf{x}_j\}_{j=0}^{k'-1}\subset\RR^{n}\) of snapshots, the set of transformed snapshots \(\{\mathcal{T}(\mathbf{x}_j)\}_{j=0}^{k'-1}\) is not guaranteed to have zero mean because \(\mathcal{T}\) shifts by the mean of the \(\q_j\)’s, not the mean of the \(\mathbf{x}_j\)’s. That is,
Shifting / Centering#
A common first preprocessing step is to shift the training snapshots by some reference snapshot \(\bar{\q}\in\RR^n\), i.e.,
The ShiftTransformer
receives a reference snapshot \(\bar{\q}\) and applies this transformation.
This is useful for scenarios where a specific \(\bar{\q}\) can result in desirable properties in the shifted data, such as homogeneous boundary conditions.
# Load the example snapshot data.
snapshots = np.load("pre_example.npy")
snapshots.shape
(4815, 100)
# Extract the pressure variable from the snapshot data.
pressure = np.split(snapshots, 9, axis=0)[0]
# Initialize a ShiftTransformer for shifting the pressure so that
# each row has a minimum of 0.
pressure_shifter = opinf.pre.ShiftTransformer(
pressure.min(axis=1),
name="pressure",
)
print(pressure_shifter)
ShiftTransformer for variable 'pressure'
state_dimension: 535
pressure_shifted = pressure_shifter.fit_transform(pressure)
pressure_shifted.shape
(535, 100)
print(f"minimum pressure before shift: {pressure.min():.2e}")
print(f"minimum pressure after shift: {pressure_shifted.min():.2e}")
minimum pressure before shift: 9.14e+05
minimum pressure after shift: 0.00e+00
One strategy that is often effective for Operator Inference is to set the reference snapshot to be the average of the training snapshots:
In this case, the transformed snapshots \(\q_j' = \q_j - \bar{\q}\) are centered around \(\0\).
This type of transformation can be accomplished using a ShiftScaleTransformer
with centering=True
.
# Initialize a ShiftScaleTransformer for centering the pressure.
pressure_transformer = opinf.pre.ShiftScaleTransformer(
centering=True,
name="pressure",
verbose=True,
)
print(pressure_transformer)
ShiftScaleTransformer for variable 'pressure'
centering: True
scaling: None
# Shift the pressure snapshots by the average pressure snapshot.
pressure_shifted = pressure_transformer.fit_transform(pressure)
<pressure>
Learned mean centering Q -> Q'
| min | mean | max | std
----|------------|------------|------------|------------
Q | 9.141e+05 | 1.143e+06 | 1.432e+06 | 6.470e+04
Q' | -2.332e+05 | -3.388e-12 | 2.743e+05 | 6.461e+04
# Plot the distribution of the entries of the raw and processed states.
fig, axes = plt.subplots(1, 2, sharey=True)
axes[0].hist(pressure.flatten(), bins=40)
axes[1].hist(pressure_shifted.flatten(), bins=40)
axes[0].set_ylabel("Frequency")
axes[0].set_xlabel("Pressure")
axes[1].set_xlabel("Shifted pressure")
fig.tight_layout()
plt.show()
Shifting Affects Model Form
Introducing a shift can cause a structural change in the governing dynamics. When shifting state variables, the structure of a reduced-order model should be determined based on the dynamics of the shifted variable, not the original variable.
Example 1: Linear System
Consider the linear system
The dynamics of the shifted variable \(\q'(t) = \q(t) - \bar{\q}\) are given by
which has a new constant term \(\A\bar{\q}\) in addition to a linear term \(\A\q'(t)\). If the variable \(\q\) is used for Operator Inference, the reduced-order model should take on the linear form \(\ddt\qhat(t) = \Ahat\qhat(t)\), while if \(\q'\) is the state variable, the reduced-order model should be \(\ddt\qhat(t) = \chat + \Ahat\qhat(t)\).
Example 2: Quadratic System
Consider the purely quadratic system
where \(\otimes\) denotes the Kronecker product. An appropriate reduced-order model for this system is also quadratic, \(\ddt\qhat(t) = \Hhat[\qhat(t)\otimes\qhat(t)]\). However, the dynamics of the shifted variable \(\q'(t) = \q(t) - \bar{\q}\) includes lower-order terms:
The terms \(\H[\bar{\q}\otimes\q'(t)] + \H[\q'(t)\otimes\bar{\q}]\) can be interpreted as a linear transformation of \(\q'(t)\), hence an appropriate reduced-order model for \(\q'(t)\) has the fully quadratic form \(\ddt\qhat(t) = \chat + \Ahat\qhat(t) + \Hhat[\qhat(t)\otimes\qhat(t)]\).
Scaling / Non-dimensionalization#
Many engineering problems feature multiple variables with ranges across different scales. For such cases, it is often beneficial to scale the variables to similar ranges so that one variable does not overwhelm the other during operator learning. In other words, training data should be nondimensionalized when possible.
A scaling operation for a single variable is given by
where \(\alpha \neq 0\) and \(\q'\) is a training snapshot after shifting (when desired).
The ScaleTransformer
class receives a scaler \(\alpha\) and implements this transformation.
# Initialize a ScaleTransformer for scaling the pressure to [0, 1].
pressure_scaler = opinf.pre.ScaleTransformer(
1 / pressure.max(), name="pressure"
)
print(pressure_scaler)
ScaleTransformer for variable 'pressure'
scaler: 6.9816e-07
# Apply the scaling.
pressure_scaled = pressure_scaler.fit_transform(pressure)
pressure_scaled.shape
(535, 100)
print(f"min pressure before scaling: {pressure.min():.2e}")
print(f"max pressure before scaling: {pressure.max():.2e}")
print(f"min pressure after scaling: {pressure_scaled.min():.2e}")
print(f"max pressure after scaling: {pressure_scaled.max():.2e}")
min pressure before scaling: 9.14e+05
max pressure before scaling: 1.43e+06
min pressure after scaling: 6.38e-01
max pressure after scaling: 1.00e+00
The entries of the state can be scaled individually by passing a vector to ScaleTransformer
.
# Scale the pressure so the maximum of each row is 1.
pressure_scaler = opinf.pre.ScaleTransformer(
1 / pressure.max(axis=1), name="pressure"
)
print(pressure_scaler)
ScaleTransformer for variable 'pressure'
state_dimension: 535
scaling by row
# Apply the scaling.
pressure_scaled2 = pressure_scaler.fit_transform(pressure)
pressure_scaled2.shape
(535, 100)
print(
"number of rows whose maximum is 1 (whole scaling): "
f"{np.count_nonzero(np.isclose(pressure_scaled.max(axis=1), 1))}"
)
print(
"number of rows whose maximum is 1 (row scaling): "
f"{np.count_nonzero(np.isclose(pressure_scaled2.max(axis=1), 1))}"
)
number of rows whose maximum is 1 (whole scaling): 1
number of rows whose maximum is 1 (row scaling): 535
The ShiftScaleTransformer
class implements several types of scalings that are calibrated from data.
For example, setting scaling="maxabs"
scales the training data by the inverse of its absolute maximum entry so that the resulting data lies in the interval \([-1, 1]\).
# Extract the velocity in the x direction.
xvelocity = np.split(snapshots, 9, axis=0)[1]
# Initialize a ShiftScaleTransformer for scaling the velocity to [-1, 1].
xvelocity_scaler = opinf.pre.ShiftScaleTransformer(
centering=False,
scaling="maxabs",
name="x velocity",
)
print(xvelocity_scaler)
ShiftScaleTransformer for variable 'x velocity'
centering: False
scaling: 'maxabs'
byrow: False
# Apply the scaling.
xvelocity_scaled = xvelocity_scaler.fit_transform(xvelocity)
xvelocity_scaled.shape
(535, 100)
print(f"min x-velocity before scaling: {xvelocity.min():.2e}")
print(f"max x-velocity before scaling: {xvelocity.max():.2e}")
print(f"min x-velocity after scaling: {xvelocity_scaled.min():.2e}")
print(f"max x-velocity after scaling: {xvelocity_scaled.max():.2e}")
min x-velocity before scaling: -3.19e+02
max x-velocity before scaling: 3.13e+02
min x-velocity after scaling: -1.00e+00
max x-velocity after scaling: 9.79e-01
The ShiftScaleTransformer
class can perform a mean-centering shift, followed by a data-driven scaling.
To link a custom shift with a custom scaling, instantiate a ShiftTransformer
and a ScaleTransformer
and pass them to a TransformerPipeline
.
# Combine the shift to zero from before with a custom scaling.
pressure_scaler = opinf.pre.ScaleTransformer(1e-6, "pressure")
pressure_transformer2 = opinf.pre.TransformerPipeline(
[pressure_shifter, pressure_scaler], name="pressure"
)
print(pressure_transformer2)
TransformerPipeline for variable 'pressure'
state_dimension: 535
num_transformers: 2
transformers
ShiftTransformer for variable 'pressure'
state_dimension: 535
ScaleTransformer for variable 'pressure'
scaler: 1.0000e-06
# Apply the scaling.
pressure_transformed = pressure_transformer2.fit_transform(pressure)
pressure_transformed.shape
(535, 100)
print(f"min pressure before shifting/scaling: {pressure.min():.2e}")
print(f"max pressure before shifting/scaling: {pressure.max():.2e}")
print(f"min pressure after shifting/scaling: {pressure_transformed.min():.2e}")
print(f"max pressure after shifting/scaling: {pressure_transformed.max():.2e}")
min pressure before shifting/scaling: 9.14e+05
max pressure before shifting/scaling: 1.43e+06
min pressure after shifting/scaling: 0.00e+00
max pressure after shifting/scaling: 5.05e-01
No Free Lunch
Choosing an advantageous preprocessing strategy is highly problem dependent, and the tools in this module are not the only ways to preprocess snapshot data. See, for example, [IK23] for a compelling application of Operator Inference to solar wind streams in which preprocessing plays a vital role.
Multivariable Data#
For systems where the full state consists of several variables (pressure, velocity, temperature, etc.), it may not be appropriate to apply the same scaling to each variable.
The TransformerMulti
class joins individual transformers together to handle multi-state data.
Below, we construct the following transformation for the nine state variables.
Pressure: center, then scale to \([-1, 1]\).
\(x\)-velocity: Scale to \([-1, 1]\).
\(y\)-velocity: Scale to \([-1, 1]\).
Temperature: center, then scale to \([-1, 1]\).
Specific volume: scale to \([0, 1]\).
Chemical species: scale to \([0, 1]\).
combustion_transformer = opinf.pre.TransformerMulti(
transformers=[
opinf.pre.ShiftScaleTransformer(
name="pressure", centering=True, scaling="maxabs", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="x-velocity", scaling="maxabs", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="y-velocity", scaling="maxabs", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="temperature", centering=True, scaling="maxabs", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="specific volume", scaling="minmax", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="methane", scaling="minmax", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="oxygen", scaling="minmax", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="carbon dioxide", scaling="minmax", verbose=True
),
opinf.pre.ShiftScaleTransformer(
name="water", scaling="minmax", verbose=True
),
]
)
snapshots_preprocessed = combustion_transformer.fit_transform(snapshots)
<pressure>
Learned mean centering Q -> Q' and maxabs scaling Q' -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | 9.141e+05 | 1.143e+06 | 1.432e+06 | 6.470e+04
Q' | -2.332e+05 | -3.388e-12 | 2.743e+05 | 6.461e+04
Q'' | -8.503e-01 | -1.243e-17 | 1.000e+00 | 2.355e-01
<x-velocity>
Learned maxabs scaling Q -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | -3.194e+02 | 7.419e+01 | 3.127e+02 | 5.663e+01
Q'' | -1.000e+00 | 2.323e-01 | 9.791e-01 | 1.773e-01
<y-velocity>
Learned maxabs scaling Q -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | -2.048e+02 | 1.487e+00 | 1.341e+02 | 1.970e+01
Q'' | -1.000e+00 | 7.259e-03 | 6.545e-01 | 9.618e-02
<temperature>
Learned mean centering Q -> Q' and maxabs scaling Q' -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | 2.670e+02 | 9.724e+02 | 2.742e+03 | 5.765e+02
Q' | -1.306e+03 | -2.380e-15 | 1.722e+03 | 4.329e+02
Q'' | -7.586e-01 | -1.378e-18 | 1.000e+00 | 2.513e-01
<specific volume>
Learned minmax scaling Q -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | 1.064e-01 | 3.350e-01 | 1.017e+00 | 2.018e-01
Q'' | 0.000e+00 | 2.510e-01 | 1.000e+00 | 2.216e-01
<methane>
Learned minmax scaling Q -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | 0.000e+00 | 3.511e-02 | 5.861e-01 | 9.908e-02
Q'' | 0.000e+00 | 5.991e-02 | 1.000e+00 | 1.691e-01
<oxygen>
Learned minmax scaling Q -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | 0.000e+00 | 3.845e-02 | 6.603e-02 | 2.267e-02
Q'' | 0.000e+00 | 5.823e-01 | 1.000e+00 | 3.433e-01
<carbon dioxide>
Learned minmax scaling Q -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | 0.000e+00 | 1.043e-01 | 1.598e-01 | 4.223e-02
Q'' | 0.000e+00 | 6.528e-01 | 1.000e+00 | 2.643e-01
<water>
Learned minmax scaling Q -> Q''
| min | mean | max | std
----|------------|------------|------------|------------
Q | 0.000e+00 | 1.649e-03 | 7.705e-03 | 2.339e-03
Q'' | 0.000e+00 | 2.141e-01 | 1.000e+00 | 3.035e-01
print(combustion_transformer)
TransformerMulti
state_dimension: 4815
num_variables: 9
transformers
ShiftScaleTransformer for variable 'pressure'
state_dimension: 535
centering: True
scaling: 'maxabs'
byrow: False
ShiftScaleTransformer for variable 'x-velocity'
state_dimension: 535
centering: False
scaling: 'maxabs'
byrow: False
ShiftScaleTransformer for variable 'y-velocity'
state_dimension: 535
centering: False
scaling: 'maxabs'
byrow: False
ShiftScaleTransformer for variable 'temperature'
state_dimension: 535
centering: True
scaling: 'maxabs'
byrow: False
ShiftScaleTransformer for variable 'specific volume'
state_dimension: 535
centering: False
scaling: 'minmax'
byrow: False
ShiftScaleTransformer for variable 'methane'
state_dimension: 535
centering: False
scaling: 'minmax'
byrow: False
ShiftScaleTransformer for variable 'oxygen'
state_dimension: 535
centering: False
scaling: 'minmax'
byrow: False
ShiftScaleTransformer for variable 'carbon dioxide'
state_dimension: 535
centering: False
scaling: 'minmax'
byrow: False
ShiftScaleTransformer for variable 'water'
state_dimension: 535
centering: False
scaling: 'minmax'
byrow: False
# Extract a single variable from the processed snapshots.
oxygen_processed = combustion_transformer.get_var(
"oxygen",
snapshots_preprocessed,
)
oxygen_processed.shape
(535, 100)
Custom Transformers#
New transformers can be defined by inheriting from the TransformerTemplate
.
Once implemented, the verify()
method may be used to test for consistency between the required methods.
class MyTransformer(opinf.pre.TransformerTemplate):
"""Custom pre-processing transformation."""
def __init__(self, hyperparameters, name=None):
"""Set any transformation hyperparameters.
If there are no hyperparameters, __init__() may be omitted.
"""
super().__init__(name)
# Process/store 'hyperparameters' here.
# Required methods --------------------------------------------------------
def fit_transform(self, states, inplace=False):
"""Learn and apply the transformation."""
# Set self.state_dimension in this method, e.g.,
self.state_dimension = len(states)
raise NotImplementedError
def transform(self, states, inplace=False):
"""Apply the learned transformation."""
raise NotImplementedError
def inverse_transform(self, states_transformed, inplace=False, locs=None):
"""Apply the inverse of the learned transformation."""
raise NotImplementedError
# Optional methods --------------------------------------------------------
# These may be deleted if not implemented.
def transform_ddts(self, ddts, inplace=False):
"""Apply the learned transformation to snapshot time derivatives."""
return NotImplemented
def save(self, savefile, overwrite=False):
"""Save the transformer to an HDF5 file."""
return NotImplemented
@classmethod
def load(cls, loadfile):
"""Load a transformer from an HDF5 file."""
return NotImplemented
See the TransformerTemplate
page for details on the arguments for each method.
Example: Hadamard Scaling#
The following class implements the transformation \(\mathcal{T}(\q) = \q \ast \w\) where \(\ast\) is the Hadamard (elementwise) product and \(\s\in\RR^{n}\) is a given vector with all nonzero entries.
The inverse of this transform is \(\mathcal{T}^{-1}(\q) = \q \ast \w'\) where the entries of \(\w'\in\RR^{n}\) are the inverse of the entries of \(\w\).
This transformation is equivalent to ScaleTransformer
with scaler
set to \(\w\) and can be interpreted as applying a diagonal weighting matrix to the state snapshots.
class HadamardTransformer(opinf.pre.TransformerTemplate):
"""Hadamard product transformer (weighting)."""
def __init__(self, w, name=None):
"""Set the product vector."""
super().__init__(name)
self.w = w
self.winv = 1 / w
# Required methods --------------------------------------------------------
def fit_transform(self, states, inplace=False):
"""Learn and apply the transformation."""
self.state_dimension = self.w.size
return self.transform(states, inplace=inplace)
def transform(self, states, inplace=False):
"""Apply the learned transformation."""
out = states if inplace else np.empty_like(states)
w = self.w
if states.ndim == 2:
w = w.reshape((self.state_dimension, 1))
out[:] = states * w
return out
def inverse_transform(self, states_transformed, inplace=False, locs=None):
"""Apply the inverse of the learned transformation."""
winv = self.winv
if locs is not None:
winv = winv[locs]
if states_transformed.ndim == 2:
winv = winv.reshape((-1, 1))
states = (
states_transformed
if inplace
else np.empty_like(states_transformed)
)
states[:] = states_transformed * winv
return states
def transform_ddts(self, ddts, inplace=False):
"""Apply the learned transformation to snapshot time derivatives."""
return self.transform(ddts, inplace=inplace)
def save(self, savefile, overwrite=False):
"""Save the transformer to an HDF5 file."""
with opinf.utils.hdf5_savehandle(savefile, overwrite) as hf:
hf.create_dataset("w", data=self.w)
if self.name is not None:
meta = hf.create_dataset("meta", shape=(0,))
meta.attrs["name"] = self.name
@classmethod
def load(cls, loadfile):
"""Load a transformer from an HDF5 file."""
name = None
with opinf.utils.hdf5_loadhandle(loadfile) as hf:
w = hf["w"][:]
if "meta" in hf:
name = str(hf["meta"].attrs["name"])
return cls(w, name=name)
w = np.random.uniform(size=pressure.shape[0])
ht = HadamardTransformer(w, name="Pressure weighter")
pressure_weighted = ht.fit_transform(pressure)
ht.verify()
transform() and inverse_transform() are consistent
transform() and transform_ddts() are consistent
Developer Notes
In this example, the
state_dimension
could be set in the constructor because thew
argument is a vector of length \(n\). However, thestate_dimension
is not required to be set untilfit_transform()
.Because the transformation is dictated by the choice of
w
and not calibrated from data,fit_transform()
simply callstransform()
.When
locs
is provided ininverse_transform()
, it is assumed that thestates_transformed
are the elements of the state vector at the given locations. That is,inverse_transform(transform(states)[locs], locs) == states[locs]
.