confusius.extract

extract

Signal extraction from fUSI data.

Modules:

  • labels

    Extraction of region-aggregated signals using integer label maps.

  • mask

    Extraction of signals using boolean masks.

  • reconstruction

    Reconstruction of fUSI DataArrays from N-D signals using masks.

Functions:

  • extract_with_labels

    Extract region-aggregated signals from fUSI data using an integer label map.

  • extract_with_mask

    Extract signals from fUSI data using a binary mask.

  • unmask

    Reconstruct a fUSI DataArray from N-D signals using a mask.

extract_with_labels

extract_with_labels(
    data: DataArray,
    labels: DataArray,
    reduction: Literal[
        "mean", "sum", "median", "min", "max", "var", "std"
    ] = "mean",
) -> DataArray

Extract region-aggregated signals from fUSI data using an integer label map.

For each unique non-zero label in labels, applies reduction across all voxels belonging to that region. The spatial dimensions are collapsed into a single regions dimension.

Parameters:

  • data

    (DataArray) –

    Input array whose spatial dimensions match those of labels. Can have any number of non-spatial dimensions (e.g., time, pose).

  • labels

    (DataArray) –

    Integer label map in one of two formats:

    • Flat label map: Spatial dims only, e.g. (z, y, x). Background voxels labeled 0; each unique non-zero integer identifies a distinct, non-overlapping region. The region coordinate of the output holds the integer label values.
    • Stacked mask format: Has a leading mask dimension followed by spatial dims, e.g. (mask, z, y, x). Each layer has values in {0, region_id} and regions may overlap. The region coordinate of the output holds the mask coordinate values (e.g., region acronyms such as "VISp").
  • reduction

    ({"mean", "sum", "median", "min", "max", "var", "std"}, default: "mean" ) –

    Aggregation function applied across voxels in each region.
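The difference between the two label formats can be illustrated with plain NumPy. This is only a sketch of the semantics described above, not the library's flox-based implementation; the array names and toy values are made up for illustration.

```python
import numpy as np

# Toy data: 4 time points on a 2 x 3 grid (illustrative values only).
data = np.arange(24, dtype=float).reshape(4, 2, 3)

# Flat label map: each voxel belongs to at most one region (0 = background).
flat = np.array([[1, 1, 0],
                 [2, 2, 0]])

# Stacked masks: one layer per region, values in {0, region_id}.
# Layers may overlap; here voxel (0, 1) belongs to both regions.
stacked = np.array([
    [[1, 1, 0], [0, 0, 0]],  # region 1
    [[0, 2, 0], [2, 2, 0]],  # region 2
])

# Flat format: one mean time course per unique non-zero label.
flat_ids = np.unique(flat[flat != 0])
flat_signals = np.stack(
    [data[:, flat == i].mean(axis=1) for i in flat_ids], axis=1
)

# Stacked format: one mean time course per layer, so a voxel shared by
# two layers contributes to both regions.
stacked_signals = np.stack(
    [data[:, layer != 0].mean(axis=1) for layer in stacked], axis=1
)
```

Both results have shape (time, region). They differ only for region 2, whose stacked layer includes the overlapping voxel.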

Returns:

  • DataArray

    Array with spatial dimensions replaced by a region dimension. All non-spatial dimensions are preserved.

    For example (flat label map):

    • (time, z, y, x) → (time, region)
    • (time, pose, z, y, x) → (time, pose, region)
    • (z, y, x) → (region,)
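The shape changes listed above can be reproduced with a small NumPy analogue. The helper name `reduce_by_label` is hypothetical and the code is a sketch of the described behaviour for flat label maps, not the function's actual implementation.

```python
import numpy as np

def reduce_by_label(data, labels, reduction="mean"):
    """Hypothetical helper: collapse trailing spatial axes into a region axis."""
    reducers = {
        "mean": np.mean, "sum": np.sum, "median": np.median,
        "min": np.min, "max": np.max, "var": np.var, "std": np.std,
    }
    if reduction not in reducers:
        raise ValueError(f"Unknown reduction: {reduction!r}")
    region_ids = np.unique(labels[labels != 0])
    # Boolean indexing over the trailing spatial axes yields
    # (..., n_voxels_in_region); reducing the last axis leaves (...).
    columns = [
        reducers[reduction](data[..., labels == i], axis=-1)
        for i in region_ids
    ]
    return region_ids, np.stack(columns, axis=-1)

rng = np.random.default_rng(0)
labels = np.zeros((4, 5, 6), dtype=int)  # (z, y, x)
labels[0] = 1  # Region 1: first z-slice.
labels[1] = 2  # Region 2: second z-slice.

ids, sig = reduce_by_label(rng.standard_normal((8, 4, 5, 6)), labels)
_, sig_pose = reduce_by_label(rng.standard_normal((8, 3, 4, 5, 6)), labels, "max")
_, sig_static = reduce_by_label(rng.standard_normal((4, 5, 6)), labels, "median")
```

Here `sig` has shape (time, region), `sig_pose` has shape (time, pose, region), and `sig_static` has shape (region,).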

Raises:

  • ValueError

    If labels dimensions don't match data's spatial dimensions, if coordinates don't match, if reduction is not a valid option, or if labels contains no non-zero values.

  • TypeError

    If labels is not integer dtype.

Notes

Uses flox for efficient, lazy groupby reductions on Dask-backed arrays. Data can be chunked along any dimension without restriction.

Examples:

>>> import xarray as xr
>>> import numpy as np
>>> from confusius.extract import extract_with_labels
>>>
>>> # 3D+t data: (time, z, y, x)
>>> data = xr.DataArray(
...     np.random.randn(100, 10, 20, 30),
...     dims=["time", "z", "y", "x"],
... )
>>> labels = xr.DataArray(
...     np.zeros((10, 20, 30), dtype=int),
...     dims=["z", "y", "x"],
... )
>>> labels[0, :, :] = 1  # Region 1: first z-slice.
>>> labels[1, :, :] = 2  # Region 2: second z-slice.
>>> signals = extract_with_labels(data, labels)
>>> signals.dims
('time', 'region')
>>> signals.coords["region"].values
array([1, 2])
>>>
>>> # Stacked mask format from Atlas.get_masks.
>>> # `atlas_fusi` is assumed to be an existing Atlas instance aligned with `data`.
>>> mask = atlas_fusi.get_masks(["VISp", "AUDp"])
>>> signals = extract_with_labels(data, mask)
>>> signals.coords["region"].values
array(['VISp', 'AUDp'], dtype=object)

extract_with_mask

extract_with_mask(
    data: DataArray, mask: DataArray
) -> DataArray

Extract signals from fUSI data using a binary mask.

This function flattens the spatial dimensions specified by the mask into a single space dimension, while preserving all other dimensions (e.g., time, pose).

Parameters:

  • data

    (DataArray) –

    Input array whose spatial dimensions match those of the mask. Can have any number of non-spatial dimensions (e.g., time, pose).

  • mask

    (DataArray) –

    Mask defining which voxels to extract. Its dimensions define the spatial dimensions that will be flattened. Must have boolean dtype, or integer dtype with exactly one non-zero value (0 = background, one region id = foreground). The latter format is produced by Atlas.get_masks. Coordinates must match data.
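The accepted mask formats can be summarised in a short NumPy sketch. The helper `as_boolean_mask` is hypothetical, and the exception types it raises are assumptions made for illustration; they may not match what the library raises in each case.

```python
import numpy as np

def as_boolean_mask(mask):
    """Hypothetical helper: normalise an accepted mask format to boolean."""
    mask = np.asarray(mask)
    if mask.dtype == bool:
        return mask
    if np.issubdtype(mask.dtype, np.integer):
        # Integer masks: 0 is background, a single region id is foreground.
        nonzero_values = np.unique(mask[mask != 0])
        if nonzero_values.size != 1:
            raise ValueError(
                "Integer masks must contain exactly one non-zero value, "
                f"found {nonzero_values.size}."
            )
        return mask != 0
    raise TypeError(f"Unsupported mask dtype: {mask.dtype}")

# A single-region integer mask (region id 7) becomes a boolean mask.
region_mask = as_boolean_mask(np.array([[0, 7], [7, 0]]))
```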

Returns:

  • DataArray

    Array with spatial dimensions flattened into a space dimension. All non-spatial dimensions are preserved. The space dimension has a MultiIndex storing spatial coordinates.

    For example:

    • (time, z, y, x) → (time, space)
    • (time, pose, z, y, x) → (time, pose, space)
    • (z, y, x) → (space,)

    For simple round-trip reconstruction, use .unstack("space"), which rebuilds a DataArray cropped to the smallest bounding box containing the masked voxels. To reconstruct the full mask shape, use confusius.extract.unmask.
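To make the flattening and the bounding-box distinction concrete, here is a NumPy sketch. It only mirrors the shapes involved; the toy arrays are illustrative, and the MultiIndex that the real space dimension carries is not modelled.

```python
import numpy as np

data = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)  # (time, y, x)
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 2:4] = True  # 2 x 2 block of foreground voxels

# Flattening: boolean indexing collapses the spatial axes into one axis
# holding only the masked voxels.
signals = data[:, mask]  # (time, space)

# Smallest bounding box containing the masked voxels: the spatial extent
# that a round trip restricted to the extracted voxels can recover.
ys, xs = np.nonzero(mask)
bbox_shape = (int(ys.max() - ys.min() + 1), int(xs.max() - xs.min() + 1))

# Full mask shape: the spatial extent recovered when the mask itself is
# used for reconstruction.
full_shape = mask.shape
```

In this toy case the bounding box is (2, 2) while the full mask shape is (4, 5).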

Raises:

  • ValueError

    If mask dimensions don't match data's spatial dimensions, or if data has fewer than 2 spatial dimensions.

  • TypeError

    If mask is neither boolean dtype nor an integer mask in the format described above.

Examples:

>>> import xarray as xr
>>> import numpy as np
>>> from confusius.extract import extract_with_mask
>>>
>>> # 3D+t data: (time, z, y, x)
>>> data = xr.DataArray(
...     np.random.randn(100, 10, 20, 30),
...     dims=["time", "z", "y", "x"],
... )
>>> mask = xr.DataArray(
...     np.random.rand(10, 20, 30) > 0.5,
...     dims=["z", "y", "x"],
... )
>>> signals = extract_with_mask(data, mask)
>>> signals.dims
('time', 'space')
>>>
>>> # 3D+t data with extra dim: (time, pose, z, y, x)
>>> pose_data = xr.DataArray(
...     np.random.randn(100, 5, 10, 20, 30),
...     dims=["time", "pose", "z", "y", "x"],
... )
>>> pose_signals = extract_with_mask(pose_data, mask)
>>> pose_signals.dims
('time', 'pose', 'space')

unmask

unmask(
    signals: ndarray | DataArray,
    mask: DataArray,
    new_dims: list[str] | None = None,
    new_dims_coords: dict[str, ndarray] | None = None,
    attrs: dict | None = None,
    fill_value: float = 0.0,
) -> DataArray

Reconstruct a fUSI DataArray from N-D signals using a mask.

Parameters:

  • signals

    (ndarray or DataArray) –

    Array with shape (..., space) where ... can be any number of dimensions. The last dimension must correspond to masked voxels.

    • If signals is a DataArray, it must have a space dimension as the last dimension. All other dimensions and their coordinates are preserved.
    • If signals is a NumPy array, you can specify names and coordinates for the leading dimensions using new_dims and new_dims_coords. If not provided, dimensions are named ["dim_0", "dim_1", ...] with integer coordinates.
  • mask

    (DataArray) –

    Boolean mask used for the original extraction. Provides spatial dimensions and coordinates for reconstruction. Must have the same spatial dimensions and coordinates as the original data.

  • new_dims

    (list of str, default: None ) –

    Names for leading dimensions when signals is a NumPy array. Must match the number of leading dimensions (ndim - 1). If not provided, uses ["dim_0", "dim_1", ...]. Ignored if signals is a DataArray.

  • new_dims_coords

    (dict[str, ndarray], default: None ) –

    Coordinates for leading dimensions when signals is a NumPy array. Keys must match dimension names in new_dims. If not provided, uses integer indices for all dimensions. Ignored if signals is a DataArray.

  • attrs

    (dict, default: None ) –

    Attributes to attach to the output DataArray.

  • fill_value

    (float, default: 0.0 ) –

    Value to fill in non-masked voxels.
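The default naming of leading dimensions described for new_dims and new_dims_coords can be sketched as follows. The helper `default_leading_dims` is hypothetical and only illustrates the documented defaults.

```python
import numpy as np

def default_leading_dims(signals):
    """Hypothetical helper: default names and integer coordinates."""
    n_leading = signals.ndim - 1  # the last axis is the space axis
    dims = [f"dim_{i}" for i in range(n_leading)]
    coords = {
        name: np.arange(size)
        for name, size in zip(dims, signals.shape[:-1])
    }
    return dims, coords

# Signals with shape (5, 3, n_voxels) get two default leading dimensions.
dims, coords = default_leading_dims(np.zeros((5, 3, 100)))
```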

Returns:

  • DataArray

    Reconstructed DataArray with shape (..., z, y, x) where spatial coordinates come from the mask.
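Conceptually, the reconstruction scatters the last axis of signals back onto the mask grid and fills every other voxel with fill_value. The function `unmask_sketch` below is a hypothetical NumPy analogue that ignores dimension names, coordinates, and attributes; it is not the library's implementation.

```python
import numpy as np

def unmask_sketch(signals, mask, fill_value=0.0):
    """Hypothetical analogue: scatter (..., space) values onto the mask grid."""
    signals = np.asarray(signals)
    n_voxels = int(mask.sum())
    if signals.shape[-1] != n_voxels:
        raise ValueError(
            f"Last axis has {signals.shape[-1]} entries, "
            f"but the mask selects {n_voxels} voxels."
        )
    # Leading axes are kept; the space axis is replaced by the mask's axes.
    out = np.full(signals.shape[:-1] + mask.shape, fill_value, dtype=float)
    out[..., mask] = signals
    return out

mask = np.zeros((2, 3, 4), dtype=bool)  # (z, y, x)
mask[0, 1, :2] = True
mask[1, 2, 3] = True

components = np.arange(15, dtype=float).reshape(5, 3)  # (component, space)
volume = unmask_sketch(components, mask, fill_value=np.nan)
```

Using NaN as the fill value, as in this usage example, keeps voxels outside the mask distinguishable from masked voxels whose value happens to be zero.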

Raises:

  • ValueError

    If signals shape doesn't match mask, or if new_dims/new_dims_coords are inconsistent with signals shape.

Examples:

>>> import xarray as xr
>>> import numpy as np
>>> from confusius.extract import extract_with_mask, unmask
>>> from sklearn.decomposition import PCA
>>>
>>> # Load data and mask
>>> data = xr.open_zarr("recording.zarr")["power_doppler"]
>>> mask = xr.open_zarr("brain_mask.zarr")["mask"]
>>>
>>> # Extract signals
>>> signals = extract_with_mask(data, mask)
>>>
>>> # Apply PCA
>>> pca = PCA(n_components=5)
>>> scores = pca.fit_transform(signals.values)  # (time, 5)
>>>
>>> # Unmask the spatial components - 2D case
>>> spatial_pca = unmask(
...     pca.components_,  # (5, n_voxels)
...     mask,
...     new_dims=["component"],
... )
>>> spatial_pca.dims
('component', 'z', 'y', 'x')
>>>
>>> # Unmask - 3D case with custom coords
>>> n_voxels = signals.sizes["space"]
>>> pose_data = np.random.randn(5, 3, n_voxels)  # (component, pose, space)
>>> spatial_pose = unmask(
...     pose_data,
...     mask,
...     new_dims=["component", "pose"],
...     new_dims_coords={"component": [1, 2, 3, 4, 5], "pose": [0, 1, 2]},
... )
>>> spatial_pose.dims
('component', 'pose', 'z', 'y', 'x')