opinf.pre

`opinf.pre`#

Tools for preprocessing snapshot data after (optional) lifting but prior to compression.

Classes

`TransformerTemplate`	Template class for transformers.
`NullTransformer`	Identity transformation \(\q\mapsto\q\).
`ShiftTransformer`	Shift snapshots by a given reference snapshot \(\bar{\q}\).
`ScaleTransformer`	Scale (nondimensionalize) snapshots as a whole or by row.
`ShiftScaleTransformer`	Process snapshots by vector centering and/or affine scaling (in that order).
`TransformerPipeline`	Chain multiple transformers.
`TransformerMulti`	Join transformers together for states with multiple variables.

Functions

`shift`	Shift the columns of a snapshot matrix by a vector.
`scale`	Scale the entries of a snapshot matrix to a specified interval.

Overview

Operator Inference performance often improves when the training data are standardized. Multivariable data in particular benefits from preprocessing.
opinf.pre classes define invertible transformations for data standardization.

Example Data

The examples on this page use data downsampled from the combustion problem described in [SKHW20].

You can download the data here to repeat the experiments. The full dataset is available here.

import numpy as np
import matplotlib.pyplot as plt

import opinf

opinf.utils.mpl_config()

Preprocessing Data#

Raw dynamical systems data often need to be lightly preprocessed before use in Operator Inference. This module includes tools for centering/shifting and scaling/nondimensionalization of snapshot data after lifting (when applicable) and prior to dimensionality reduction.

Notation

On this page,

\(\q \in \RR^n\) denotes the unprocessed state variable for which we have \(k\) snapshots \(\q_0,\ldots,\q_{k-1}\in\RR^n\),
\(\q'\in\RR^n\) denotes state variable after being shifted (centered), and
\(\q''\in\RR^n\) denotes the state variable after being shifted and scaled (non-dimensionalized).

The tools demonstrated here define a mapping \(\mathcal{T}:\RR^n\to\RR^n\) with \(\q'' = \mathcal{T}(\q)\).

Lifting and Preprocessing

A lifting map can be viewed as a type of preprocessing map, \(\mathcal{L}:\RR^{n_1}\to\RR^{n_2}\). However, the preprocessing transformations defined in this module map from a vector space back to itself (\(n_1 = n_2\)) while lifting maps may augment the state with additional variables (\(n_2 \ge n_1\)).

Fit-and-Transform versus Transform

Pre-processing transformation classes are calibrated through user-provided hyperparameters in the constructor and/or training snapshots passed to fit() or fit_transform(). The transform() method applies but does not alter the transformation. Some transformations are designed so that the transformed training data has certain properties, but those properties are not guaranteed to hold for transformed data that was not used for training.

Shifting / Centering#

A common first preprocessing step is to shift the training snapshots by some reference snapshot \(\bar{\q}\in\RR^n\), i.e.,

\[ \q' = \q - \bar{\q}. \]

The ShiftTransformer receives a reference snapshot \(\bar{\q}\) and applies this transformation. This is useful for scenarios where a specific \(\bar{\q}\) can result in desirable properties in the shifted data, such as homogeneous boundary conditions.

# Load the example snapshot data.
snapshots = np.load("pre_example.npy")

snapshots.shape

(4815, 100)

# Extract the pressure variable from the snapshot data.
pressure = np.split(snapshots, 9, axis=0)[0]

# Initialize a ShiftTransformer for shifting the pressure so that
# each row has a minimum of 0.
pressure_shifter = opinf.pre.ShiftTransformer(
    pressure.min(axis=1),
    name="pressure",
)
print(pressure_shifter)

ShiftTransformer for variable 'pressure'
  state_dimension: 535

pressure_shifted = pressure_shifter.fit_transform(pressure)
pressure_shifted.shape

(535, 100)

print(f"minimum pressure before shift: {pressure.min():.2e}")
print(f"minimum pressure after shift:  {pressure_shifted.min():.2e}")

minimum pressure before shift: 9.14e+05
minimum pressure after shift:  0.00e+00

One strategy that is often effective for Operator Inference is to set the reference snapshot to be the average of the training snapshots:

\[ \bar{\q} := \frac{1}{k}\sum_{j=0}^{k-1}\q_{j}. \]

In this case, the transformed snapshots \(\q_j' = \q_j - \bar{\q}\) are centered around \(\0\). This type of transformation can be accomplished using a ShiftScaleTransformer with centering=True.

# Initialize a ShiftScaleTransformer for centering the pressure.
pressure_transformer = opinf.pre.ShiftScaleTransformer(
    centering=True,
    name="pressure",
    verbose=True,
)
print(pressure_transformer)

ShiftScaleTransformer for variable 'pressure'
  centering: True
  scaling:   None

# Shift the pressure snapshots by the average pressure snapshot.
pressure_shifted = pressure_transformer.fit_transform(pressure)

<pressure>
Learned mean centering Q -> Q'
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  9.141e+05 |  1.143e+06 |  1.432e+06 |  6.470e+04
Q'  | -2.332e+05 | -3.388e-12 |  2.743e+05 |  6.461e+04

# Plot the distribution of the entries of the raw and processed states.
fig, axes = plt.subplots(1, 2, sharey=True)
axes[0].hist(pressure.flatten(), bins=40)
axes[1].hist(pressure_shifted.flatten(), bins=40)

axes[0].set_ylabel("Frequency")
axes[0].set_xlabel("Pressure")
axes[1].set_xlabel("Shifted pressure")

fig.tight_layout()
plt.show()

../../_images/2227d8447c1da78e0092b3c5aadacf5f22e1a343c185afba01460290aa26c22f.png

Shifting Affects Model Form

Introducing a shift can cause a structural change in the governing dynamics. When shifting state variables, the structure of a reduced-order model should be determined based on the dynamics of the shifted variable, not the original variable.

Scaling / Non-dimensionalization#

Many engineering problems feature multiple variables with ranges across different scales. For such cases, it is often beneficial to scale the variables to similar ranges so that one variable does not overwhelm the other during operator learning. In other words, training data should be nondimensionalized when possible.

A scaling operation for a single variable is given by

\[ \q'' = \alpha\q', \]

where \(\alpha \neq 0\) and \(\q'\) is a training snapshot after shifting (when desired). The ScaleTransformer class receives a scaler \(\alpha\) and implements this transformation.

# Initialize a ScaleTransformer for scaling the pressure to [0, 1].
pressure_scaler = opinf.pre.ScaleTransformer(
    1 / pressure.max(), name="pressure"
)

print(pressure_scaler)

ScaleTransformer for variable 'pressure'
  scaler: 6.9816e-07

# Apply the scaling.
pressure_scaled = pressure_scaler.fit_transform(pressure)
pressure_scaled.shape

(535, 100)

print(f"min pressure before scaling: {pressure.min():.2e}")
print(f"max pressure before scaling: {pressure.max():.2e}")
print(f"min pressure after scaling:  {pressure_scaled.min():.2e}")
print(f"max pressure after scaling:  {pressure_scaled.max():.2e}")

min pressure before scaling: 9.14e+05
max pressure before scaling: 1.43e+06
min pressure after scaling:  6.38e-01
max pressure after scaling:  1.00e+00

The entries of the state can be scaled individually by passing a vector to ScaleTransformer.

# Scale the pressure so the maximum of each row is 1.
pressure_scaler = opinf.pre.ScaleTransformer(
    1 / pressure.max(axis=1), name="pressure"
)

print(pressure_scaler)

ScaleTransformer for variable 'pressure'
  state_dimension: 535
  scaling by row

# Apply the scaling.
pressure_scaled2 = pressure_scaler.fit_transform(pressure)
pressure_scaled2.shape

(535, 100)

print(
    "number of rows whose maximum is 1 (whole scaling): "
    f"{np.count_nonzero(np.isclose(pressure_scaled.max(axis=1), 1))}"
)
print(
    "number of rows whose maximum is 1 (row scaling): "
    f"{np.count_nonzero(np.isclose(pressure_scaled2.max(axis=1), 1))}"
)

number of rows whose maximum is 1 (whole scaling): 1
number of rows whose maximum is 1 (row scaling): 535

The ShiftScaleTransformer class implements several types of scalings that are calibrated from data. For example, setting scaling="maxabs" scales the training data by the inverse of its absolute maximum entry so that the resulting data lies in the interval \([-1, 1]\).

# Extract the velocity in the x direction.
xvelocity = np.split(snapshots, 9, axis=0)[1]

# Initialize a ShiftScaleTransformer for scaling the velocity to [-1, 1].
xvelocity_scaler = opinf.pre.ShiftScaleTransformer(
    centering=False,
    scaling="maxabs",
    name="x velocity",
)

print(xvelocity_scaler)

ShiftScaleTransformer for variable 'x velocity'
  centering: False
  scaling:  'maxabs'
  byrow:     False

# Apply the scaling.
xvelocity_scaled = xvelocity_scaler.fit_transform(xvelocity)
xvelocity_scaled.shape

(535, 100)

print(f"min x-velocity before scaling: {xvelocity.min():.2e}")
print(f"max x-velocity before scaling: {xvelocity.max():.2e}")
print(f"min x-velocity after scaling:  {xvelocity_scaled.min():.2e}")
print(f"max x-velocity after scaling:  {xvelocity_scaled.max():.2e}")

min x-velocity before scaling: -3.19e+02
max x-velocity before scaling: 3.13e+02
min x-velocity after scaling:  -1.00e+00
max x-velocity after scaling:  9.79e-01

The ShiftScaleTransformer class can perform a mean-centering shift, followed by a data-driven scaling. To link a custom shift with a custom scaling, instantiate a ShiftTransformer and a ScaleTransformer and pass them to a TransformerPipeline.

# Combine the shift to zero from before with a custom scaling.
pressure_scaler = opinf.pre.ScaleTransformer(1e-6, "pressure")
pressure_transformer2 = opinf.pre.TransformerPipeline(
    [pressure_shifter, pressure_scaler], name="pressure"
)

print(pressure_transformer2)

TransformerPipeline for variable 'pressure'
  state_dimension: 535
  num_transformers: 2
  transformers
    ShiftTransformer for variable 'pressure'
      state_dimension: 535
    ScaleTransformer for variable 'pressure'
      scaler: 1.0000e-06

# Apply the scaling.
pressure_transformed = pressure_transformer2.fit_transform(pressure)
pressure_transformed.shape

(535, 100)

print(f"min pressure before shifting/scaling: {pressure.min():.2e}")
print(f"max pressure before shifting/scaling: {pressure.max():.2e}")
print(f"min pressure after shifting/scaling: {pressure_transformed.min():.2e}")
print(f"max pressure after shifting/scaling: {pressure_transformed.max():.2e}")

min pressure before shifting/scaling: 9.14e+05
max pressure before shifting/scaling: 1.43e+06
min pressure after shifting/scaling: 0.00e+00
max pressure after shifting/scaling: 5.05e-01

No Free Lunch

Choosing an advantageous preprocessing strategy is highly problem dependent, and the tools in this module are not the only ways to preprocess snapshot data. See, for example, [IK23] for a compelling application of Operator Inference to solar wind streams in which preprocessing plays a vital role.

Multivariable Data#

For systems where the full state consists of several variables (pressure, velocity, temperature, etc.), it may not be appropriate to apply the same scaling to each variable. The TransformerMulti class joins individual transformers together to handle multi-state data.

Below, we construct the following transformation for the nine state variables.

Pressure: center, then scale to \([-1, 1]\).
\(x\)-velocity: Scale to \([-1, 1]\).
\(y\)-velocity: Scale to \([-1, 1]\).
Temperature: center, then scale to \([-1, 1]\).
Specific volume: scale to \([0, 1]\).
Chemical species: scale to \([0, 1]\).

combustion_transformer = opinf.pre.TransformerMulti(
    transformers=[
        opinf.pre.ShiftScaleTransformer(
            name="pressure", centering=True, scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="x-velocity", scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="y-velocity", scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="temperature", centering=True, scaling="maxabs", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="specific volume", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="methane", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="oxygen", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="carbon dioxide", scaling="minmax", verbose=True
        ),
        opinf.pre.ShiftScaleTransformer(
            name="water", scaling="minmax", verbose=True
        ),
    ]
)

snapshots_preprocessed = combustion_transformer.fit_transform(snapshots)

<pressure>
Learned mean centering Q -> Q' and maxabs scaling Q' -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  9.141e+05 |  1.143e+06 |  1.432e+06 |  6.470e+04
Q'  | -2.332e+05 | -3.388e-12 |  2.743e+05 |  6.461e+04
Q'' | -8.503e-01 | -1.243e-17 |  1.000e+00 |  2.355e-01

<x-velocity>
Learned maxabs scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   | -3.194e+02 |  7.419e+01 |  3.127e+02 |  5.663e+01
Q'' | -1.000e+00 |  2.323e-01 |  9.791e-01 |  1.773e-01

<y-velocity>
Learned maxabs scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   | -2.048e+02 |  1.487e+00 |  1.341e+02 |  1.970e+01
Q'' | -1.000e+00 |  7.259e-03 |  6.545e-01 |  9.618e-02

<temperature>
Learned mean centering Q -> Q' and maxabs scaling Q' -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  2.670e+02 |  9.724e+02 |  2.742e+03 |  5.765e+02
Q'  | -1.306e+03 | -2.380e-15 |  1.722e+03 |  4.329e+02
Q'' | -7.586e-01 | -1.378e-18 |  1.000e+00 |  2.513e-01

<specific volume>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  1.064e-01 |  3.350e-01 |  1.017e+00 |  2.018e-01
Q'' |  0.000e+00 |  2.510e-01 |  1.000e+00 |  2.216e-01

<methane>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  3.511e-02 |  5.861e-01 |  9.908e-02
Q'' |  0.000e+00 |  5.991e-02 |  1.000e+00 |  1.691e-01

<oxygen>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  3.845e-02 |  6.603e-02 |  2.267e-02
Q'' |  0.000e+00 |  5.823e-01 |  1.000e+00 |  3.433e-01

<carbon dioxide>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  1.043e-01 |  1.598e-01 |  4.223e-02
Q'' |  0.000e+00 |  6.528e-01 |  1.000e+00 |  2.643e-01

<water>
Learned minmax scaling Q -> Q''
    |     min    |    mean    |     max    |    std
----|------------|------------|------------|------------
Q   |  0.000e+00 |  1.649e-03 |  7.705e-03 |  2.339e-03
Q'' |  0.000e+00 |  2.141e-01 |  1.000e+00 |  3.035e-01

print(combustion_transformer)

TransformerMulti
  state_dimension: 4815
  num_variables:   9
  transformers
    ShiftScaleTransformer for variable 'pressure'
      state_dimension: 535
      centering: True
      scaling:  'maxabs'
      byrow:     False
    ShiftScaleTransformer for variable 'x-velocity'
      state_dimension: 535
      centering: False
      scaling:  'maxabs'
      byrow:     False
    ShiftScaleTransformer for variable 'y-velocity'
      state_dimension: 535
      centering: False
      scaling:  'maxabs'
      byrow:     False
    ShiftScaleTransformer for variable 'temperature'
      state_dimension: 535
      centering: True
      scaling:  'maxabs'
      byrow:     False
    ShiftScaleTransformer for variable 'specific volume'
      state_dimension: 535
      centering: False
      scaling:  'minmax'
      byrow:     False
    ShiftScaleTransformer for variable 'methane'
      state_dimension: 535
      centering: False
      scaling:  'minmax'
      byrow:     False
    ShiftScaleTransformer for variable 'oxygen'
      state_dimension: 535
      centering: False
      scaling:  'minmax'
      byrow:     False
    ShiftScaleTransformer for variable 'carbon dioxide'
      state_dimension: 535
      centering: False
      scaling:  'minmax'
      byrow:     False
    ShiftScaleTransformer for variable 'water'
      state_dimension: 535
      centering: False
      scaling:  'minmax'
      byrow:     False

# Extract a single variable from the processed snapshots.
oxygen_processed = combustion_transformer.get_var(
    "oxygen",
    snapshots_preprocessed,
)

oxygen_processed.shape

(535, 100)

Custom Transformers#

New transformers can be defined by inheriting from the TransformerTemplate. Once implemented, the verify() method may be used to test for consistency between the required methods.

class MyTransformer(opinf.pre.TransformerTemplate):
    """Custom pre-processing transformation."""

    def __init__(self, hyperparameters, name=None):
        """Set any transformation hyperparameters.
        If there are no hyperparameters, __init__() may be omitted.
        """
        super().__init__(name)
        # Process/store 'hyperparameters' here.

    # Required methods --------------------------------------------------------
    def fit_transform(self, states, inplace=False):
        """Learn and apply the transformation."""
        # Set self.state_dimension in this method, e.g.,
        self.state_dimension = len(states)
        raise NotImplementedError

    def transform(self, states, inplace=False):
        """Apply the learned transformation."""
        raise NotImplementedError

    def inverse_transform(self, states_transformed, inplace=False, locs=None):
        """Apply the inverse of the learned transformation."""
        raise NotImplementedError

    # Optional methods --------------------------------------------------------
    # These may be deleted if not implemented.
    def transform_ddts(self, ddts, inplace=False):
        """Apply the learned transformation to snapshot time derivatives."""
        return NotImplemented

    def save(self, savefile, overwrite=False):
        """Save the transformer to an HDF5 file."""
        return NotImplemented

    @classmethod
    def load(cls, loadfile):
        """Load a transformer from an HDF5 file."""
        return NotImplemented

See the TransformerTemplate page for details on the arguments for each method.

Example: Hadamard Scaling#

The following class implements the transformation \(\mathcal{T}(\q) = \q \ast \w\) where \(\ast\) is the Hadamard (elementwise) product and \(\s\in\RR^{n}\) is a given vector with all nonzero entries. The inverse of this transform is \(\mathcal{T}^{-1}(\q) = \q \ast \w'\) where the entries of \(\w'\in\RR^{n}\) are the inverse of the entries of \(\w\). This transformation is equivalent to ScaleTransformer with scaler set to \(\w\) and can be interpreted as applying a diagonal weighting matrix to the state snapshots.

class HadamardTransformer(opinf.pre.TransformerTemplate):
    """Hadamard product transformer (weighting)."""

    def __init__(self, w, name=None):
        """Set the product vector."""
        super().__init__(name)
        self.w = w
        self.winv = 1 / w

    # Required methods --------------------------------------------------------
    def fit_transform(self, states, inplace=False):
        """Learn and apply the transformation."""
        self.state_dimension = self.w.size
        return self.transform(states, inplace=inplace)

    def transform(self, states, inplace=False):
        """Apply the learned transformation."""
        out = states if inplace else np.empty_like(states)
        w = self.w
        if states.ndim == 2:
            w = w.reshape((self.state_dimension, 1))
        out[:] = states * w
        return out

    def inverse_transform(self, states_transformed, inplace=False, locs=None):
        """Apply the inverse of the learned transformation."""
        winv = self.winv
        if locs is not None:
            winv = winv[locs]
        if states_transformed.ndim == 2:
            winv = winv.reshape((-1, 1))
        states = (
            states_transformed
            if inplace
            else np.empty_like(states_transformed)
        )
        states[:] = states_transformed * winv
        return states

    def transform_ddts(self, ddts, inplace=False):
        """Apply the learned transformation to snapshot time derivatives."""
        return self.transform(ddts, inplace=inplace)

    def save(self, savefile, overwrite=False):
        """Save the transformer to an HDF5 file."""
        with opinf.utils.hdf5_savehandle(savefile, overwrite) as hf:
            hf.create_dataset("w", data=self.w)
            if self.name is not None:
                meta = hf.create_dataset("meta", shape=(0,))
                meta.attrs["name"] = self.name

    @classmethod
    def load(cls, loadfile):
        """Load a transformer from an HDF5 file."""
        name = None
        with opinf.utils.hdf5_loadhandle(loadfile) as hf:
            w = hf["w"][:]
            if "meta" in hf:
                name = str(hf["meta"].attrs["name"])
        return cls(w, name=name)

w = np.random.uniform(size=pressure.shape[0])
ht = HadamardTransformer(w, name="Pressure weighter")
pressure_weighted = ht.fit_transform(pressure)

ht.verify()

transform() and inverse_transform() are consistent
transform() and transform_ddts() are consistent

Developer Notes

In this example, the state_dimension could be set in the constructor because the w argument is a vector of length \(n\). However, the state_dimension is not required to be set until fit_transform().
Because the transformation is dictated by the choice of w and not calibrated from data, fit_transform() simply calls transform().
When locs is provided in inverse_transform(), it is assumed that the states_transformed are the elements of the state vector at the given locations. That is,inverse_transform(transform(states)[locs], locs) == states[locs].