confusius.extract
extract
Signal extraction from fUSI data.
Modules:
- labels – Extraction of region-aggregated signals using integer label maps.
- mask – Extraction of signals using boolean masks.
- reconstruction – Reconstruction of fUSI DataArrays from N-D signals using masks.
Functions:
- extract_with_labels – Extract region-aggregated signals from fUSI data using an integer label map.
- extract_with_mask – Extract signals from fUSI data using a binary mask.
- unmask – Reconstruct a fUSI DataArray from N-D signals using a mask.
extract_with_labels
extract_with_labels(
    data: DataArray,
    labels: DataArray,
    reduction: Literal[
        "mean", "sum", "median", "min", "max", "var", "std"
    ] = "mean",
) -> DataArray
Extract region-aggregated signals from fUSI data using an integer label map.
For each unique non-zero label in labels, applies reduction across all voxels
belonging to that region. The spatial dimensions are collapsed into a single
region dimension.
Parameters:
- data (DataArray) – Input array whose spatial dimensions match those in labels. Can have any number of non-spatial dimensions (e.g., time, pose).
- labels (DataArray) – Integer label map in one of two formats:
    - Flat label map: Spatial dims only, e.g. (z, y, x). Background voxels are labeled 0; each unique non-zero integer identifies a distinct, non-overlapping region. The region coordinate of the output holds the integer label values.
    - Stacked mask format: Has a leading mask dimension followed by spatial dims, e.g. (mask, z, y, x). Each layer has values in {0, region_id} and regions may overlap. The region coordinate of the output holds the mask coordinate values (e.g., region labels).
- reduction ("mean", "sum", "median", "min", "max", "var", "std"; default: "mean") – Aggregation function applied across voxels in each region.
Returns:
- DataArray – Array with spatial dimensions replaced by a region dimension. All non-spatial dimensions are preserved. For example (flat label map):
    - (time, z, y, x) → (time, region)
    - (time, pose, z, y, x) → (time, pose, region)
    - (z, y, x) → (region,)
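The shape change described above can be illustrated with plain NumPy. This is only a sketch of the semantics for a flat label map with the "mean" reduction, not how confusius implements it (the library works on DataArrays and uses flox); the array sizes and region layout are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((4, 2, 3, 3))  # (time, z, y, x)

# Illustrative flat label map: 0 is background, 1 and 2 are regions.
labels = np.zeros((2, 3, 3), dtype=int)   # (z, y, x)
labels[0] = 1                             # region 1: first z-slice
labels[1, :, :2] = 2                      # region 2: part of the second z-slice

# Unique non-zero labels play the role of the output "region" coordinate.
region_ids = np.unique(labels[labels != 0])

# Boolean indexing on the trailing spatial axes keeps the leading (time)
# axis and collapses the selected voxels into one axis, which is then reduced.
signals = np.stack(
    [data[..., labels == rid].mean(axis=-1) for rid in region_ids],
    axis=-1,
)

print(signals.shape)  # (time, region)
```

Voxels labeled 0 never enter any reduction, which is why a label map without non-zero values has nothing to extract.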
Raises:
- ValueError – If labels dimensions don't match data's spatial dimensions, if coordinates don't match, if reduction is not a valid option, or if labels contains no non-zero values.
- TypeError – If labels is not integer dtype.
Notes
Uses flox for efficient, lazy groupby reductions on Dask-backed arrays. Data can be chunked along any dimension without restriction.
Examples:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.extract import extract_with_labels
>>>
>>> # 3D+t data: (time, z, y, x)
>>> data = xr.DataArray(
...     np.random.randn(100, 10, 20, 30),
...     dims=["time", "z", "y", "x"],
... )
>>> labels = xr.DataArray(
...     np.zeros((10, 20, 30), dtype=int),
...     dims=["z", "y", "x"],
... )
>>> labels[0, :, :] = 1 # Region 1: first z-slice.
>>> labels[1, :, :] = 2 # Region 2: second z-slice.
>>> signals = extract_with_labels(data, labels)
>>> signals.dims
('time', 'region')
>>> signals.coords["region"].values
array([1, 2])
>>>
>>> # Stacked mask format from Atlas.get_masks.
>>> mask = atlas_fusi.get_masks(["VISp", "AUDp"])
>>> signals = extract_with_labels(data, mask)
>>> signals.coords["region"].values
array(['VISp', 'AUDp'], dtype=object)
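To make the difference between the two label formats concrete, here is a NumPy-only sketch of the stacked mask case, where regions are allowed to overlap. The region names, ids, and shapes are hypothetical, and the code mirrors the documented semantics rather than the library's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((4, 2, 3, 3))    # (time, z, y, x)

# Hypothetical stacked masks: one layer per region, values in {0, region_id}.
names = ["A", "B"]                          # stands in for the mask coordinate
stacked = np.zeros((2, 2, 3, 3), dtype=int) # (mask, z, y, x)
stacked[0, 0] = 7                           # region "A": whole first z-slice
stacked[1, :, 0, :] = 9                     # region "B": first row of every slice

# Each layer is reduced independently, so a voxel may contribute to
# several regions. A flat label map cannot express that.
signals = np.stack(
    [data[..., layer != 0].mean(axis=-1) for layer in stacked],
    axis=-1,
)

# Number of voxels shared by the two regions.
overlap = int(((stacked[0] != 0) & (stacked[1] != 0)).sum())

print(signals.shape, overlap)
```

Here the output would be indexed by the names ("A", "B") rather than by the integer ids 7 and 9, matching the description of the stacked format above.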
extract_with_mask
Extract signals from fUSI data using a binary mask.
This function flattens the spatial dimensions specified by the mask into a
single space dimension, while preserving all other dimensions (e.g., time,
pose).
Parameters:
- data (DataArray) – Input array whose spatial dimensions match those in the mask. Can have any number of non-spatial dimensions (e.g., time, pose).
- mask (DataArray) – Mask defining which voxels to extract. Its dimensions define the spatial dimensions that will be flattened. Must have boolean dtype, or integer dtype with exactly one non-zero value (0 = background, one region id = foreground). The latter format is produced by Atlas.get_masks. Coordinates must match data.
Returns:
- DataArray – Array with spatial dimensions flattened into a space dimension. All non-spatial dimensions are preserved. The space dimension has a MultiIndex storing spatial coordinates. For example:
    - (time, z, y, x) → (time, space)
    - (time, pose, z, y, x) → (time, pose, space)
    - (z, y, x) → (space,)
  For simple round-trip reconstruction, use .unstack("space"), which re-creates the original DataArray using the smallest bounding box containing the masked voxels. For full mask shape reconstruction, use confusius.extract.unmask.
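The flattening, and the difference between a bounding-box reconstruction and a full-shape one, can be sketched with plain NumPy. This is an illustration under simplifying assumptions (a 2D mask, row-major voxel order, no coordinates), not the library's code.

```python
import numpy as np

data = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)  # (time, y, x)
mask = np.zeros((3, 4), dtype=bool)                         # (y, x)
mask[1:, 1:3] = True                                        # 4 foreground voxels

# Boolean indexing on the trailing spatial axes flattens them into one
# axis that plays the role of the "space" dimension.
signals = data[..., mask]                                   # (time, space)

# Spatial indices of the kept voxels (what a MultiIndex would store).
ys, xs = np.nonzero(mask)

# The smallest box containing the masked voxels is smaller than the mask
# itself, which is why a bounding-box reconstruction loses the full shape.
bbox_shape = (int(ys.max() - ys.min() + 1), int(xs.max() - xs.min() + 1))

print(signals.shape, bbox_shape, mask.shape)
```

In this toy case the bounding box is (2, 2) while the mask is (3, 4); recovering the (3, 4) grid requires the mask itself.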
Raises:
- ValueError – If mask dimensions don't match data's spatial dimensions, or if data has fewer than 2 spatial dimensions.
- TypeError – If mask is not boolean dtype.
Examples:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.extract import extract_with_mask
>>>
>>> # 3D+t data: (time, z, y, x)
>>> data = xr.DataArray(
...     np.random.randn(100, 10, 20, 30),
...     dims=["time", "z", "y", "x"],
... )
>>> mask = xr.DataArray(
...     np.random.rand(10, 20, 30) > 0.5,
...     dims=["z", "y", "x"],
... )
>>> signals = extract_with_mask(data, mask)
>>> signals.dims
('time', 'space')
>>>
>>> # 3D+t data with extra dim: (time, pose, z, y, x)
>>> pose_data = xr.DataArray(
...     np.random.randn(100, 5, 10, 20, 30),
...     dims=["time", "pose", "z", "y", "x"],
... )
>>> pose_signals = extract_with_mask(pose_data, mask)
>>> pose_signals.dims
('time', 'pose', 'space')
unmask
unmask(
    signals: ndarray | DataArray,
    mask: DataArray,
    new_dims: list[str] | None = None,
    new_dims_coords: dict[str, ndarray] | None = None,
    attrs: dict | None = None,
    fill_value: float = 0.0,
) -> DataArray
Reconstruct a fUSI DataArray from N-D signals using a mask.
Parameters:
- signals (ndarray or DataArray) – Array with shape (..., space) where ... can be any number of dimensions. The last dimension must correspond to masked voxels.
    - If signals is a DataArray, it must have a space dimension as the last dimension. All other dimensions and their coordinates are preserved.
    - If signals is a NumPy array, you can specify names and coordinates for the leading dimensions using new_dims and new_dims_coords. If not provided, dimensions are named ["dim_0", "dim_1", ...] with integer coordinates.
- mask (DataArray) – Boolean mask used for the original extraction. Provides spatial dimensions and coordinates for reconstruction. Must have the same spatial dimensions and coordinates as the original data.
- new_dims (list of str, default: None) – Names for leading dimensions when signals is a NumPy array. Must match the number of leading dimensions (ndim - 1). If not provided, uses ["dim_0", "dim_1", ...]. Ignored if signals is a DataArray.
- new_dims_coords (dict[str, ndarray], default: None) – Coordinates for leading dimensions when signals is a NumPy array. Keys must match dimension names in new_dims. If not provided, uses integer indices for all dimensions. Ignored if signals is a DataArray.
- attrs (dict, default: None) – Attributes to attach to the output DataArray.
- fill_value (float, default: 0.0) – Value to fill in non-masked voxels.
Returns:
- DataArray – Reconstructed DataArray with shape (..., z, y, x) where spatial coordinates come from the mask.
Raises:
- ValueError – If signals shape doesn't match mask, or if new_dims/new_dims_coords are inconsistent with signals shape.
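As a rough picture of what the reconstruction does, the NumPy sketch below scatters the last axis of signals back into the mask's shape and fills everything else with fill_value. The helper name unmask_sketch is hypothetical and the code ignores dimension names, coordinates, and attributes, so treat it as an illustration of the shape contract rather than the library's implementation.

```python
import numpy as np


def unmask_sketch(signals, mask, fill_value=0.0):
    """Scatter (..., space) signals into a (..., *mask.shape) array (illustrative)."""
    n_voxels = int(mask.sum())
    if signals.shape[-1] != n_voxels:
        # Mirrors the documented shape check: last axis must match the mask.
        raise ValueError(
            f"signals has {signals.shape[-1]} voxels, mask has {n_voxels}"
        )
    # Start from an array full of fill_value, then write the masked voxels.
    out = np.full(signals.shape[:-1] + mask.shape, fill_value, dtype=float)
    out[..., mask] = signals
    return out


mask = np.zeros((2, 3, 4), dtype=bool)   # (z, y, x)
mask[0, 1:, :2] = True                   # 4 foreground voxels
signals = np.arange(12, dtype=float).reshape(3, 4) + 1.0  # (component, space)

volume = unmask_sketch(signals, mask)
print(volume.shape)  # (component, z, y, x)
```

Passing fill_value=np.nan instead of 0.0 makes it easier to tell background from voxels whose signal happens to be zero.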
Examples:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.extract import extract_with_mask, unmask
>>> from sklearn.decomposition import PCA
>>>
>>> # Load data and mask
>>> data = xr.open_zarr("recording.zarr")["power_doppler"]
>>> mask = xr.open_zarr("brain_mask.zarr")["mask"]
>>>
>>> # Extract signals
>>> signals = extract_with_mask(data, mask)
>>>
>>> # Apply PCA
>>> pca = PCA(n_components=5)
>>> scores = pca.fit_transform(signals.values) # (time, 5)
>>>
>>> # Unmask the spatial components - 2D case
>>> spatial_pca = unmask(
...     pca.components_, # (5, n_voxels)
...     mask,
...     new_dims=["component"],
... )
>>> spatial_pca.dims
('component', 'z', 'y', 'x')
>>>
>>> # Unmask - 3D case with custom coords
>>> n_voxels = signals.sizes["space"] # number of masked voxels
>>> pose_data = np.random.randn(5, 3, n_voxels) # (component, pose, space)
>>> spatial_pose = unmask(
...     pose_data,
...     mask,
...     new_dims=["component", "pose"],
...     new_dims_coords={"component": [1, 2, 3, 4, 5], "pose": [0, 1, 2]},
... )
>>> spatial_pose.dims
('component', 'pose', 'z', 'y', 'x')
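Since unmask expects the last axis of signals to run over masked voxels, it helps to keep track of which PCA output has that shape. The sketch below uses a NumPy SVD as a stand-in for PCA (an assumption made to stay dependency-free; the sizes are arbitrary) to show that the per-time scores have shape (time, component), while the spatial maps have shape (component, space) and are the ones that can be unmasked.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_voxels, n_components = 20, 6, 3
signals = rng.standard_normal((n_time, n_voxels))  # (time, space)

# PCA via SVD of the column-centered data.
centered = signals - signals.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

scores = u[:, :n_components] * s[:n_components]    # (time, component)
spatial_maps = vt[:n_components]                   # (component, space)

# Only spatial_maps has one value per voxel on its last axis.
print(scores.shape, spatial_maps.shape)
```

With scikit-learn, these two arrays correspond to the return value of fit_transform and to the components_ attribute, respectively.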