opinf.pre#

Tools for preprocessing snapshot data prior to compression.

Functions

shift

Shift the columns of a snapshot matrix by a vector.

scale

Scale the entries of a snapshot matrix to a specified interval.

Classes

ShiftScaleTransformer

Process snapshots by centering and/or scaling (in that order).

TransformerMulti

Join transformers together for states with multiple variables.

TransformerTemplate

Template class for transformers.

Preprocessing Data#

Raw dynamical systems data often need to be lightly preprocessed before use in Operator Inference. The tools in this module enable centering/shifting and scaling/nondimensionalization of snapshot data after lifting (when applicable) and prior to dimensionality reduction.

Notation

On this page,

  • \(\q \in \RR^n\) denotes the unprocessed state variable for which we have \(k\) snapshots \(\q_0,\ldots,\q_{k-1}\in\RR^n\),

  • \(\q'\in\RR^n\) denotes state variable after being shifted (centered), and

  • \(\q''\in\RR^n\) denotes the state variable after being shifted and scaled (non-dimensionalized).

The tools demonstrated here define a mapping \(\mathcal{T}:\RR^n\to\RR^n\) with \(\q'' = \mathcal{T}(\q)\).

Example Data

The examples on this page use data from the combustion problem described in [SKHW20]. The data consists of nine variables recorded at 100 points in time.

State Variables
  • Pressure \(p\)

  • \(x\)-velocity \(v_{x}\)

  • \(y\)-velocity \(v_{y}\)

  • Temperature \(T\)

  • Specific volume (inverse density) \(\xi = 1/\rho\)

  • Chemical species molar concentrations for CH\(_{4}\), O\(_{2}\), CO\(_{2}\), and H\(_{2}\)O.

The dimension of the spatial discretization in the full example in [SKHW20] is \(38{,}523\) per variable, so \(n = 9 \times 38{,}523 = 346{,}707\). Here we have downsampled the state dimension to \(535\) for each variable for demonstration purposes, i.e., \(n = 9 \times 535 = 4{,}815\).

You can download the data here to repeat the experiments.

import numpy as np
import matplotlib.pyplot as plt

import opinf

opinf.utils.mpl_config()
# Load the example snapshot data.
snapshots = np.load("pre_example.npy")

snapshots.shape
(4815, 100)

Shifting / Centering#

A common first preprocessing step is to shift the training snapshots by some reference snapshot \(\bar{\q}\in\RR^n\), i.e.,

\[ \q' = \q - \bar{\q}. \]

For example, the reference snapshot could be chosen to be the average of the training snapshots:

\[ \bar{\q} := \frac{1}{k}\sum_{j=0}^{k-1}\q_{j}. \]

In this case, the transformed snapshots \(\q_j' = \q_j - \bar{\q}\) are centered around \(\0\). This type of transformation can be accomplished using a ShiftScaleTransformer with centering=True or the shift() function.

# Extract the pressure variable from the snapshot data.
pressure = np.split(snapshots, 9, axis=0)[0]

# Initialize a ShiftScaleTransformer for centering the pressure variable.
pressure_transformer = opinf.pre.ShiftScaleTransformer(
    centering=True,
    verbose=True,
)

# Shift the pressure snapshots by the average pressure snapshot.
pressure_shifted = pressure_transformer.fit_transform(pressure)
Learned mean centering Q -> Q'
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  9.141e+05 |  1.143e+06 |  1.432e+06 |  6.470e+04
Q'  | -2.332e+05 | -3.388e-12 |  2.743e+05 |  6.461e+04
# Average pressure value.
np.mean(pressure)
1143160.4356617983
# Average shifted pressure value (zero).
np.mean(pressure_shifted)
-3.3880122632623834e-12
# Plot the distribution of the entries of the raw and processed states.
fig, axes = plt.subplots(1, 2, sharey=True)
axes[0].hist(pressure.flatten(), bins=40)
axes[1].hist(pressure_shifted.flatten(), bins=40)
axes[0].set_ylabel("Frequency")
axes[0].set_xlabel("Pressure")
axes[1].set_xlabel("Shifted pressure")
fig.tight_layout()
plt.show()
../../_images/8869b872b2eb371b397388699aa7888cb694e4d9231f5b5df27ef94f5c56659f.png

Shifting Affects Model Form

Introducing a shift can cause a structural change in the governing dynamics. When shifting state variables, the structure of a reduced-order model should be determined based on the dynamics of the shifted variable, not the original variable.

Example 1: Linear System

Consider the linear system

\[ \begin{align*} \ddt\q(t) = \A\q(t). \end{align*} \]

The dynamics of the shifted variable \(\q'(t) = \q(t) - \bar{\q}\) are given by

\[ \begin{align*} \ddt\q'(t) = \ddt[\q(t) - \bar{\q}] = \ddt\q(t) = \A\q(t) = \A[\bar{\q} + \q'(t)] = \A\bar{\q} + \A\q'(t), \end{align*} \]

which has a new constant term \(\A\bar{\q}\) in addition to a linear term \(\A\q'(t)\). If the variable \(\q\) is used for Operator Inference, the reduced-order model should take on the linear form \(\ddt\qhat(t) = \Ahat\qhat(t)\), while if \(\q'\) is the state variable, the reduced-order model should be \(\ddt\qhat(t) = \chat + \Ahat\qhat(t)\).

Example 2: Quadratic System

Consider the purely quadratic system

\[ \begin{align*} \ddt\q(t) = \H[\q(t)\otimes\q(t)], \end{align*} \]

where \(\otimes\) denotes the Kronecker product. An appropriate reduced-order model for this system is also quadratic, \(\ddt\qhat(t) = \Hhat[\qhat(t)\otimes\qhat(t)]\). However, the dynamics of the shifted variable \(\q'(t) = \q(t) - \bar{\q}\) includes lower-order terms:

\[\begin{split} \begin{align*} \ddt\q'(t) &= \ddt[\q(t) - \bar{\q}] \\ &= \H[\q(t)\otimes\q(t)] \\ &= \H[(\bar{\q} + \q'(t))\otimes(\bar{\q} + \q'(t))] \\ &= \H[\bar{\q}\otimes\bar{\q}] + \H[\bar{\q}\otimes\q'(t)] + \H[\q'(t)\otimes\bar{\q}] + \H[\q'(t)\otimes\q'(t)]. \end{align*} \end{split}\]

The terms \(\H[\bar{\q}\otimes\q'(t)] + \H[\q'(t)\otimes\bar{\q}]\) can be interpreted as a linear transformation of \(\q'(t)\), hence an appropriate reduced-order model for \(\q'(t)\) has the fully quadratic form \(\ddt\qhat(t) = \chat + \Ahat\qhat(t) + \Hhat[\qhat(t)\otimes\qhat(t)]\).

Scaling / Non-dimensionalization#

Many engineering problems feature multiple variables with ranges across different scales. For such cases, it is often beneficial to scale the variables to similar ranges so that one variable does not overwhelm the other in the operator learning.

A simple scaling is given by

\[ \q'' = \frac{1}{\alpha}\q', \]

where \(\alpha\) is chosen by examining the range of the training data. For example, after centering the data, a scaling to \([-1, 1]\) is given by

\[ \q'' = \frac{1}{\alpha}\big(\q - \bar{\q}\big) = \frac{1}{\alpha}\q', \qquad \alpha = \max_{i,j}|\tilde{q}_{ij}'| \]

where \(\tilde{q}_{ij}'\) is the \(i\)th entry of \(\q_j' = \q_j - \bar{\q}\).

The scaling argument of the ShiftScaleTransformer determines the type of scaling transformation; see also scale().

# Extract the H2O molar concentration.
water = np.split(snapshots, 9, axis=0)[-1]

# Compare the scales of the variables.
print(
    "Pressure range (raw):",
    f"[{pressure.min():.2e}, {pressure.max():.2e}]",
    sep="\t\t",
)
print(
    "Pressure range (shifted):",
    f"[{pressure_shifted.min():.2e}, {pressure_shifted.max():.2e}]",
    sep="\t",
)
print(
    "Water range:",
    f"[{water.min():.2e}, {water.max():.2e}]",
    sep="\t\t\t",
)
Pressure range (raw):		[9.14e+05, 1.43e+06]
Pressure range (shifted):	[-2.33e+05, 2.74e+05]
Water range:			[0.00e+00, 7.70e-03]
# Apply a min-max scaling to [0, .01] on the shifted pressure snapshots.
pressure_scaled, pscale1, pscale2 = opinf.pre.scale(
    pressure_shifted,
    scale_to=(0, 1e-2),
)
# Compare the scales of the variables.
print(
    "Pressure range (raw):",
    f"[{pressure.min():.2e}, {pressure.max():.2e}]",
    sep="\t\t",
)
print(
    "Pressure range (shifted):",
    f"[{pressure_shifted.min():.2e}, {pressure_shifted.max():.2e}]",
    sep="\t",
)
print(
    "Pressure range (scaled):",
    f"[{pressure_scaled.min():.2e}, {pressure_scaled.max():.2e}]",
    sep="\t",
)
print(
    "Water range:",
    f"[{water.min():.2e}, {water.max():.2e}]",
    sep="\t\t\t",
)
Pressure range (raw):		[9.14e+05, 1.43e+06]
Pressure range (shifted):	[-2.33e+05, 2.74e+05]
Pressure range (scaled):	[0.00e+00, 1.00e-02]
Water range:			[0.00e+00, 7.70e-03]

Note

Choosing an advantageous preprocessing strategy is highly problem dependent, and the tools in this module are not the only ways to preprocess snapshot data. See, for example, [IK23] for a compelling application of Operator Inference to solar wind streams in which preprocessing plays a vital role.

Multivariable Data#

For systems where the full state consists of several variables (pressure, velocity, temperature, etc.), it may not be appropriate to apply the same scaling to each variable. The TransformerMulti class joins individual transformers together to handle multi-state data.

Below, we construct the following transformation for the nine state variables.

  • Pressure: center, then scale to \([-1, 1]\).

  • \(x\)-velocity: Scale to \([-1, 1]\).

  • \(y\)-velocity: Scale to \([-1, 1]\).

  • Temperature: center, then scale to \([-1, 1]\).

  • Specific volume: scale to \([0, 1]\).

  • Chemical species: scale to \([0, 1]\).

combustion_transformer = opinf.pre.TransformerMulti(
    transformers=[
        opinf.pre.ShiftScaleTransformer(
            name="pressure", centering=True, scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="x-velocity", scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="y-velocity", scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="temperature", centering=True, scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="specific volume", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="methane", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="oxygen", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="carbon dioxide", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="water", scaling="minmax", verbose=True
        ),
    ]
)

snapshots_preprocessed = combustion_transformer.fit_transform(snapshots)
<pressure>
Learned mean centering Q -> Q' and maxabs scaling Q' -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  9.141e+05 |  1.143e+06 |  1.432e+06 |  6.470e+04
Q'  | -2.332e+05 | -3.388e-12 |  2.743e+05 |  6.461e+04
Q'' | -8.503e-01 | -1.243e-17 |  1.000e+00 |  2.355e-01

<x-velocity>
Learned maxabs scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   | -3.194e+02 |  7.419e+01 |  3.127e+02 |  5.663e+01
Q'' | -1.000e+00 |  2.323e-01 |  9.791e-01 |  1.773e-01

<y-velocity>
Learned maxabs scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   | -2.048e+02 |  1.487e+00 |  1.341e+02 |  1.970e+01
Q'' | -1.000e+00 |  7.259e-03 |  6.545e-01 |  9.618e-02

<temperature>
Learned mean centering Q -> Q' and maxabs scaling Q' -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  2.670e+02 |  9.724e+02 |  2.742e+03 |  5.765e+02
Q'  | -1.306e+03 | -2.380e-15 |  1.722e+03 |  4.329e+02
Q'' | -7.586e-01 | -1.378e-18 |  1.000e+00 |  2.513e-01

<specific volume>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  1.064e-01 |  3.350e-01 |  1.017e+00 |  2.018e-01
Q'' |  0.000e+00 |  2.510e-01 |  1.000e+00 |  2.216e-01

<methane>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  3.511e-02 |  5.861e-01 |  9.908e-02
Q'' |  0.000e+00 |  5.991e-02 |  1.000e+00 |  1.691e-01

<oxygen>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  3.845e-02 |  6.603e-02 |  2.267e-02
Q'' |  0.000e+00 |  5.823e-01 |  1.000e+00 |  3.433e-01

<carbon dioxide>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  1.043e-01 |  1.598e-01 |  4.223e-02
Q'' |  0.000e+00 |  6.528e-01 |  1.000e+00 |  2.643e-01

<water>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  1.649e-03 |  7.705e-03 |  2.339e-03
Q'' |  0.000e+00 |  2.141e-01 |  1.000e+00 |  3.035e-01
# Extract a single variable from the processed snapshots.
oxygen_processed = combustion_transformer.get_var(
    "oxygen",
    snapshots_preprocessed,
)

oxygen_processed.shape
(535, 100)

Custom Transformers#

New transformers can be defined by inheriting from the TransformerTemplate. Once implemented, the verify() method may be used to test for consistency between the required methods.

class MyTransformer(opinf.pre.TransformerTemplate):
    """Custom pre-processing transformation."""

    def __init__(self, hyperparameters, name=None):
        """Set any transformation hyperparameters.
        If there are no hyperparameters, __init__() may be omitted.
        """
        super().__init__(name)
        # Process/store 'hyperparameters' here.

    # Required methods --------------------------------------------------------
    def fit_transform(self, states, inplace=False):
        """Learn and apply the transformation."""
        # Set self.state_dimension in this method, e.g.,
        self.state_dimension = len(states)
        raise NotImplementedError

    def transform(self, states, inplace=False):
        """Apply the learned transformation."""
        raise NotImplementedError

    def inverse_transform(self, states_transformed, inplace=False, locs=None):
        """Apply the inverse of the learned transformation."""
        raise NotImplementedError

    # Optional methods --------------------------------------------------------
    # These may be deleted if not implemented.
    def transform_ddts(self, ddts, inplace=False):
        """Apply the learned transformation to snapshot time derivatives."""
        return NotImplemented

    def save(self, savefile, overwrite=False):
        """Save the transformer to an HDF5 file."""
        return NotImplemented

    @classmethod
    def load(cls, loadfile):
        """Load a transformer from an HDF5 file."""
        return NotImplemented

See the TransformerTemplate page for details on the arguments for each method.

Developer Note

In order for a custom transformer to interface correctly with TransformerMulti, the save() and load() methods should be implemented using opinf.utils.hdf5_savehandle() and opinf.utils.hdf5_loadhandle(), respectively.