confusius.signal¶
signal ¶
Signal processing module for fUSI time series.
Modules:
-
censor–Sample censoring and interpolation for motion scrubbing.
-
confounds–Confound regression functions for signal preprocessing.
-
core–Signal cleaning pipeline for fUSI time series.
-
detrending–Detrending functions for signal preprocessing.
-
filters–Filtering functions for signal preprocessing.
-
standardization–Standardization functions for signal preprocessing.
Functions:
-
censor_samples–Remove censored samples from signals.
-
clean–Clean signals with detrending, filtering, confound regression, and scrubbing.
-
compute_compcor_confounds–Extract noise components using the CompCor method.
-
detrend–Remove trends from signals across time.
-
filter_butterworth–Apply a low-pass, high-pass, or band-pass Butterworth digital filter to signals.
-
interpolate_samples–Interpolate censored samples from signals.
-
regress_confounds–Remove confounds from signals via linear regression.
-
standardize–Standardize signals across time.
censor_samples ¶
censor_samples(
signals: DataArray, sample_mask: DataArray
) -> DataArray
Remove censored samples from signals.
This function removes timepoints marked as censored (bad) from the signals. After censoring, the time series becomes irregular (non-uniform time steps). Be cautious with subsequent time-domain analyses that assume uniform sampling.
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Array to censor. Must have a
timedimension. Can be any shape, e.g., extracted signals(time, space), full 3D+t imaging data(time, z, y, x), or confounds(time, n_confounds). -
(sample_mask¶(time,) xarray.DataArray) –Boolean sample mask indicating which timepoints to keep (
True) vs. remove (False). Must have atimedimension matchingsignals. If bothsignalsandsample_maskhavetimecoordinates, they must match exactly.
Returns:
-
DataArray–Signals with censored timepoints removed. Shape is
(n_kept, ...)wheren_keptis the number ofTruevalues insample_mask. Time coordinates are subsetted to kept samples.
Raises:
-
ValueError–If
signalsdoesn't have atimedimension ortimecoordinates. -
TypeError–If
sample_maskis not an xarray.DataArray. -
ValueError–If
sample_maskhas wrong dtype, length, missing time dimension, or mismatched coordinates. -
ValueError–If all samples are censored (cannot interpolate).
Warns:
-
UserWarning–If all samples are kept (no censoring performed).
Examples:
Remove high-motion volumes:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.signal import censor_samples
>>> # Create signals.
>>> signals = xr.DataArray(
... np.random.randn(100, 50),
... dims=["time", "space"],
... coords={"time": np.arange(100) / 500}
... )
>>> # Mark frames to keep (False = remove).
>>> mask_values = np.ones(100, dtype=bool)
>>> mask_values[[10, 25, 60]] = False # Remove these frames.
>>> sample_mask = xr.DataArray(
... mask_values, dims=["time"], coords={"time": signals.coords["time"]}
... )
>>> # Remove censored samples.
>>> censored = censor_samples(signals, sample_mask)
>>> censored.sizes["time"] # 97 timepoints (3 removed).
97
Complete pre-scrubbing workflow:
>>> from confusius.signal import interpolate_samples, filter_butterworth
>>> # 1. Interpolate censored samples.
>>> interpolated = interpolate_samples(signals, sample_mask)
>>> # 2. Apply temporal filter.
>>> filtered = filter_butterworth(interpolated, high_cutoff=0.1)
>>> # 3. Remove censored samples.
>>> cleaned = censor_samples(filtered, sample_mask)
clean ¶
clean(
signals: DataArray,
*,
detrend_order: int | None = None,
standardize_method: Literal["zscore", "psc"]
| None = None,
low_cutoff: float | None = None,
high_cutoff: float | None = None,
filter_butterworth_kwargs: dict | None = None,
confounds: DataArray | None = None,
standardize_confounds: bool = True,
sample_mask: DataArray | None = None,
interpolate_method: InterpOptions = "linear",
) -> DataArray
Clean signals with detrending, filtering, confound regression, and scrubbing.
Cleaning steps are applied in the following order, according to recommendations by Lindquist et al. (2019):
- Interpolate censored samples (pre-scrubbing).
- Detrend.
- Butterworth filter.
- Censor samples.
- Regress confounds.
- Standardize.
This function is inspired by nilearn.signal.clean.
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Signals to clean. Must have a
timedimension. Can be any shape, e.g., extracted signals(time, voxels), full 3D+t imaging data(time, z, y, x), or regional signals(time, regions).Chunking along time is not supported
The
timedimension must NOT be chunked, except when using standardization or scrubbing only. Chunk only spatial dimensions:data.chunk({'time': -1}). -
(detrend_order¶int, default:None) –Polynomial order for detrending:
0: Remove mean (constant detrending).1: Remove linear trend using least squares regression (default).2+: Remove polynomial trend of specified order.
If not provided, no detrending is applied.
-
(standardize_method¶(zscore, psc), default:"zscore") –Standardization method. If not provided, no standardization is applied.
-
(low_cutoff¶float, default:None) –Low cutoff frequency in Hz. Frequencies below this are attenuated (acts as high-pass filter). If not provided, no high-pass filtering is applied.
-
(high_cutoff¶float, default:None) –High cutoff frequency in Hz. Frequencies above this are attenuated (acts as low-pass filter). If not provided, no low-pass filtering is applied.
-
(filter_butterworth_kwargs¶dict, default:None) –Extra keyword arguments passed to
confusius.signal.filter_butterworth. -
(confounds¶DataArray, default:None) –Confound regressors with shape
(time, n_confounds). When provided, confounds are detrended and filtered along with signals and then removed via regression. -
(confounds¶(time, n_confounds) xarray.DataArray, default:None) –Confound regressors to remove. Can have shape
(time,)for a single confound. The time dimension and coordinates must match the signals exactly. If not provided, no confound regression is applied. -
(standardize_confounds¶bool, default:True) –Whether to standardize confounds by their maximum absolute value before regression. This improves numerical stability while preserving constant terms.
-
(sample_mask¶(time,) xarray.DataArray, default:None) –Boolean sample mask indicating which timepoints to keep (
True) vs. remove (False). Must have atimedimension matchingsignals. If bothsignalsandsample_maskhavetimecoordinates, they must match exactly. If not provided, no scrubbing is applied. -
(interpolate_method¶(linear, nearest, zero, slinear, quadratic, cubic, quintic, polynomial, pchip, barycentric, krogh, akima, makima), default:"linear") –Interpolation method passed to
confusius.signal.interpolate_sampleswhen using pre-scrubbing interpolation. Ignored ifsample_maskis not provided or if no detrending or filtering is applied. DataArray.interp`. Common options:"nearest": Nearest-neighbor interpolation (fastest, least smooth)."linear": Linear interpolation (faster, less smooth)."cubic": Cubic spline interpolation (slower, smooth).
See
xarray.DataArray.interpfor all available methods.
Returns:
-
DataArray–Cleaned signals.
References
-
Lindquist, Martin A., et al. “Modular Preprocessing Pipelines Can Reintroduce Artifacts into fMRI Data.” Human Brain Mapping, vol. 40, no. 8, June 2019, pp. 2358–76. DOI.org (Crossref), https://doi.org/10.1002/hbm.24528. ↩
compute_compcor_confounds ¶
compute_compcor_confounds(
signals: DataArray,
noise_mask: DataArray | None = None,
variance_threshold: float | None = None,
n_components: int = 5,
detrend: bool = False,
skipna: bool = False,
) -> DataArray
Extract noise components using the CompCor method.
CompCor (Component-based Noise Correction) extracts principal components from noise regions (aCompCor) or high-variance voxels (tCompCor) to use as confound regressors [^1].
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Signals from which to extract components. Must have a
timedimension. For extracted signals, shape is typically(time, space). For full imaging data, shape is typically(time, z, y, x).Chunking along time is not supported
The
timedimension must NOT be chunked. Chunk only spatial dimensions:data.chunk({'time': -1}). -
(noise_mask¶DataArray, default:None) –Binary mask indicating voxels to consider. Must have the same spatial dimensions and coordinates as
signals(excluding time). Can be combined withvariance_thresholdfor hybrid selection. -
(variance_threshold¶float, default:None) –Variance percentile threshold (0-1) for selecting high-variance voxels. For example, 0.02 selects the top 2% highest-variance voxels from the voxels selected by
noise_mask(if provided) or all voxels. Can be combined withnoise_maskfor hybrid selection. -
(n_components¶int, default:5) –Number of principal components to extract.
-
(detrend¶bool, default:False) –Whether to linearly detrend the selected voxels before SVD. Can improve component quality by removing slow drifts.
-
(skipna¶bool, default:False) –Whether to skip NaN values when computing variance quantiles for tCompCor. If
False, uses fast quantile calculation. IfTrue, uses slower NaN-aware quantile calculation. Set toTrueonly if your data contains NaN values.
Returns:
-
(time, component) xarray.DataArray–Extracted CompCor components. Each column (component) is a principal component that can be used as a confound regressor. The DataArray includes:
timedimension with coordinates matching the input signals.componentdimension (0 ton_components - 1).explained_variance_ratiocoordinate oncomponentdimension, containing the proportion of total variance explained by each component.
Raises:
-
ValueError–If
signalsdoes not have atimedimension. -
ValueError–If mask doesn't have the right dtype or its dimensions/coordinates don't match signal spatial dimensions.
-
ValueError–If both
noise_maskandvariance_thresholdareNone(must specify at least one). -
ValueError–If
variance_thresholdis not in range(0, 1). -
ValueError–If
n_componentsis not positive. -
ValueError–If no voxels are selected (empty mask or threshold too high).
Notes
- aCompCor: Specify only
noise_maskto extract components from anatomically-defined noise regions (e.g., white matter, CSF). Useful when anatomical segmentation is available. - tCompCor: Specify only
variance_thresholdto extract components from high-variance voxels. Useful when no anatomical masks are available. - Hybrid: Specify both
noise_maskandvariance_thresholdto extract components from high-variance voxels within a specific anatomical region (e.g., high-variance white matter voxels).
References
-
Behzadi, Yashar, et al. “A Component Based Noise Correction Method (CompCor) for BOLD and Perfusion Based fMRI.” NeuroImage, vol. 37, no. 1, Aug. 2007, pp. 90–101. DOI.org (Crossref), https://doi.org/10.1016/j.neuroimage.2007.04.042. ↩
Examples:
Extract aCompCor components from white matter:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.signal import compute_compcor_confounds, regress_confounds
>>> signals = xr.DataArray(
... np.random.randn(100, 50),
... dims=["time", "space"],
... coords={"time": np.arange(100) * 0.1}
... )
>>> wm_mask = xr.DataArray(
... np.zeros(50, dtype=bool),
... dims=["space"]
... )
>>> wm_mask.values[:10] = True
>>> a_compcor = compute_compcor_confounds(
... signals,
... noise_mask=wm_mask,
... n_components=5,
... detrend=True
... )
>>> a_compcor.shape
(100, 5)
Extract tCompCor from high-variance voxels:
>>> t_compcor = compute_compcor_confounds(
... signals,
... variance_threshold=0.2,
... n_components=5,
... detrend=True
... )
Hybrid mode - high-variance WM voxels only:
>>> hybrid_compcor = compute_compcor_confounds(
... signals,
... noise_mask=wm_mask,
... variance_threshold=0.5,
... n_components=5
... )
Combine different CompCor variants for cleaning:
detrend ¶
Remove trends from signals across time.
This function operates along the time dimension and works with arrays of any
shape, making it flexible for both extracted signals and full fUSI data.
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Array to detrend. Must have a
timedimension. Can be any shape, e.g., extracted signals(time, space), full 3D+t imaging data(time, z, y, x), or regional signals(time, region).Chunking along time is not supported
The
timedimension must NOT be chunked. Chunk only spatial dimensions:data.chunk({'time': -1}). -
(order¶int, default:1) –Polynomial order for detrending:
0: Remove mean (constant detrending).1: Remove linear trend using least squares regression (default).2+: Remove polynomial trend of specified order.
Returns:
-
DataArray–Detrended signals with same shape and coordinates as input.
Raises:
-
ValueError–If
signalsdoes not have atimedimension. Iforderis negative.
Warns:
-
UserWarning–If
signalshave only one timepoint (detrending is skipped).
Notes
- For
order=0andorder=1, usesscipy.signal.detrend. - For
order >= 2, fits a polynomial of the specified order and subtracts it.
Examples:
Remove mean (constant detrending):
>>> import xarray as xr
>>> import numpy as np
>>> signals = xr.DataArray(
... np.random.randn(100, 50) + 10, # Noise with offset.
... dims=["time", "space"]
... )
>>> detrended = detrend(signals, order=0) # Remove mean.
Remove linear trend (default):
>>> time = np.arange(100)
>>> drift = time[:, None] * 0.5 # Linear drift.
>>> noise = np.random.randn(100, 50)
>>> signals = xr.DataArray(drift + noise, dims=["time", "space"])
>>> detrended = detrend(signals) # order=1 by default.
Remove quadratic trend:
>>> quadratic_drift = (time[:, None] ** 2) * 0.01
>>> signals_quad = xr.DataArray(quadratic_drift + noise, dims=["time", "space"])
>>> detrended_quad = detrend(signals_quad, order=2)
Works on 3D+t data:
filter_butterworth ¶
filter_butterworth(
signals: DataArray,
low_cutoff: float | None = None,
high_cutoff: float | None = None,
order: int = 5,
padtype: str = "odd",
padlen: int | None = None,
uniformity_tolerance: float = 0.01,
) -> DataArray
Apply a low-pass, high-pass, or band-pass Butterworth digital filter to signals.
This function filters along the time dimension and works with arrays of any
shape, making it flexible for both extracted signals and full fUSI data.
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Array to filter. Must have a
timedimension. Can be any shape, e.g., extracted signals(time, space), full 3D+t imaging data(time, z, y, x), or regional signals(time, region).Chunking along time is not supported
The
timedimension must NOT be chunked. Chunk only spatial dimensions:data.chunk({'time': -1}). -
(low_cutoff¶float, default:None) –Low cutoff frequency in Hz. Frequencies below this are attenuated (acts as high-pass filter). If not provided, no high-pass filtering is applied.
-
(high_cutoff¶float, default:None) –High cutoff frequency in Hz. Frequencies above this are attenuated (acts as low-pass filter). If not provided, no low-pass filtering is applied.
-
(order¶int, default:5) –Filter order. Higher orders give steeper roll-off but may be less stable.
-
(padtype¶(odd, even, constant, None), default:"odd") –Type of padding to use. See
scipy.signal.sosfiltfiltfor details."odd"pads with odd-extension (reduces edge effects),"even"pads with even-extension,"constant"pads with zeros. If not provided, no padding is applied. -
(padlen¶int, default:None) –Number of elements to pad at each end. If not provided, uses scipy's default:
3 * (2 * n_sections + 1 - compensation)wheren_sections = ceil(order/2). -
(uniformity_tolerance¶float, default:1e-2) –Maximum allowed relative range of consecutive time intervals, defined as
(max_interval - min_interval) / median_interval. Raise aValueErrorif the time coordinate exceeds this threshold. Increase this value to tolerate slight timestamp jitter (e.g. from acquisition clocks or dropped volumes).
Returns:
-
DataArray–Filtered signals with same shape and coordinates as input.
Raises:
-
ValueError–If
signalsdoes not have atimedimension ortimecoordinates, if time coordinates are not uniformly sampled within the specified tolerance, or if insufficient timepoints for the filter order (needs more than3 * (2 * ceil(order / 2) + 1)). -
ValueError–If both
low_cutoffandhigh_cutoffareNone(no filtering), or if cutoff frequencies are invalid (negative, above Nyquist, or if high_cutoff <= low_cutoff for band-pass). -
ValueError–If
orderis not positive.
Notes
- Uses
scipy.signal.butterwith second-order sections (SOS) for numerical stability. - Uses
scipy.signal.sosfiltfiltfor zero-phase filtering (forward-backward). - Edge effects are handled automatically using padding.
Examples:
Low-pass filter to remove high-frequency noise:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.signal import filter_butterworth
>>> # Create signals with time coordinates (500 Hz sampling).
>>> signals = xr.DataArray(
... np.random.randn(1000, 50),
... dims=["time", "space"],
... coords={"time": np.arange(1000) / 500}
... )
>>> # Keep frequencies below 0.1 Hz (sampling rate computed from time coords).
>>> filtered = filter_butterworth(signals, high_cutoff=0.1)
High-pass filter to remove slow drift:
>>> # Keep frequencies above 0.01 Hz (remove lower frequencies).
>>> filtered = filter_butterworth(signals, low_cutoff=0.01)
Band-pass filter:
interpolate_samples ¶
interpolate_samples(
signals: DataArray,
sample_mask: DataArray,
method: InterpOptions = "linear",
**kwargs,
) -> DataArray
Interpolate censored samples from signals.
This function interpolates values at censored (bad) timepoints using samples marked
as good. The typical use case is to fill in high-motion volumes before temporal
filtering, then remove them afterward with censor_samples. This allows retaining
regular time sampling during filtering.
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Array to interpolate. Must have a
timedimension andtimecoordinates. Can be any shape, e.g., extracted signals(time, voxels), full 3D+t imaging data(time, z, y, x), or confounds(time, n_confounds). -
(sample_mask¶(time,) xarray.DataArray) –Boolean sample mask indicating which timepoints to keep (
True) vs. interpolate (False). Must have atimedimension matchingsignals. If bothsignalsandsample_maskhavetimecoordinates, they must match exactly. -
(method¶(linear, nearest, zero, slinear, quadratic, cubic, quintic, polynomial, pchip, barycentric, krogh, akima, makima), default:"linear") –Interpolation method passed to
xarray.DataArray.interp. Common options:"nearest": Nearest-neighbor interpolation (fastest, least smooth)."linear": Linear interpolation (faster, less smooth)."cubic": Cubic spline interpolation (slower, smooth).
See
xarray.DataArray.interpfor all available methods. -
–**kwargs¶Additional keyword arguments passed to
xarray.DataArray.interp.
Returns:
-
DataArray–Signals with interpolated values at censored positions. Same shape and coordinates as input.
Raises:
-
ValueError–If
signalsdoesn't have atimedimension ortimecoordinates. -
TypeError–If
sample_maskis not an xarray.DataArray. -
ValueError–If
sample_maskhas wrong dtype, length, missing time dimension, or mismatched coordinates. -
ValueError–If all samples are censored (cannot interpolate).
Warns:
-
UserWarning–If all samples are marked as good (no interpolation needed).
Notes
- Kept samples (
sample_mask=True) are unchanged; only censored samples (sample_mask=False) are replaced with interpolated values. - Uses
xarray.DataArray.interpwhich handles coordinates and Dask arrays automatically.
Examples:
Interpolate high-motion volumes before filtering:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.signal import interpolate_samples, filter_butterworth, censor_samples
>>> # Create signals with time coordinates.
>>> signals = xr.DataArray(
... np.random.randn(100, 50),
... dims=["time", "space"],
... coords={"time": np.arange(100) / 500} # 500 Hz.
... )
>>> # Mark high-motion frames (e.g., frames 10, 25, 60 are bad).
>>> motion_outliers = np.array([10, 25, 60])
>>> mask_values = np.ones(100, dtype=bool)
>>> mask_values[motion_outliers] = False # False = censor.
>>> sample_mask = xr.DataArray(
... mask_values, dims=["time"], coords={"time": signals.coords["time"]}
... )
Pre-scrubbing workflow (recommended):
>>> # 1. Interpolate censored samples.
>>> interpolated = interpolate_samples(signals, sample_mask, method="cubic")
>>> # 2. Apply temporal filter to complete signal.
>>> filtered = filter_butterworth(interpolated, high_cutoff=0.1)
>>> # 3. Remove censored samples after filtering.
>>> cleaned = censor_samples(filtered, sample_mask)
Control boundary behavior:
regress_confounds ¶
regress_confounds(
signals: DataArray,
confounds: DataArray,
standardize_confounds: bool = True,
) -> DataArray
Remove confounds from signals via linear regression.
This function performs confound regression by projecting the signals onto the orthogonal complement of the confound space. This removes variance in the signals that can be explained by the confounds.
This function was adapted from nilearn.signal.clean.
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Signals to clean. Must have a
timedimension. Can be any shape, e.g., extracted signals(time, space), full 3D+t imaging data(time, z, y, x), or regional signals(time, region).Chunking along time is not supported
The
timedimension must NOT be chunked. Chunk only spatial dimensions:data.chunk({'time': -1}). -
(confounds¶(time, n_confounds) xarray.DataArray) –Confound regressors to remove. Can have shape
(time,)for a single confound. The time dimension and coordinates must match the signals exactly. -
(standardize_confounds¶bool, default:True) –Whether to standardize confounds by their maximum absolute value before regression. This improves numerical stability while preserving constant terms.
Returns:
-
DataArray–Residuals after confound regression, same shape and coordinates as input signals.
Raises:
-
ValueError–If
signalsdoes not have atimedimension, or ifconfoundshave mismatched time dimension or invalid shape. -
TypeError–If
confoundsis not an xarray DataArray.
Notes
- Uses QR decomposition with column pivoting for numerical stability.
- Handles rank-deficient confound matrices (e.g., collinear confounds) by removing redundant columns.
- Based on the projection method from Friston et al. (1994).
References
-
Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. P., Frith, C. D., & Frackowiak, R. S. (1994). Statistical parametric maps in functional imaging: a general linear approach. Human brain mapping, 2(4), 189-210. ↩
Examples:
Remove motion parameters from extracted signals:
>>> import xarray as xr
>>> import numpy as np
>>> from confusius.signal import regress_confounds
>>> # Create signals (100 timepoints, 50 voxels)
>>> signals = xr.DataArray(
... np.random.randn(100, 50),
... dims=["time", "space"],
... coords={"time": np.arange(100) * 0.1}
... )
>>> # Create motion confounds (6 motion parameters)
>>> motion_params = xr.DataArray(
... np.random.randn(100, 6),
... dims=["time", "confound"],
... coords={"time": signals.coords["time"]},
... )
>>> # Remove motion effects
>>> cleaned = regress_confounds(signals, motion_params)
Works on 3D+t imaging data:
standardize ¶
Standardize signals across time.
This function operates along the time dimension and works with arrays of any
shape, making it flexible for both extracted signals and full fUSI data.
Parameters:
-
(signals¶(time, ...) xarray.DataArray) –Array to standardize. Must have a
timedimension. Can be any shape, e.g., extracted signals(time, space), full 3D+t imaging data(time, z, y, x), or regional signals(time, region). -
(method¶(zscore, psc), default:"zscore") –Standardization method:
"zscore":(x - mean) / stdusing sample standard deviation with Bessel's correction. Elements with zero or near-zero variance will be set tonumpy.nan."psc": Percent signal change:(x - mean) / |mean| * 100. Elements with zero or near-zero mean will be set tonumpy.nan.
Returns:
-
DataArray–Standardized array with same shape and coordinates as input.
Raises:
-
ValueError–If
signalsdoes not have atimedimension. -
ValueError–If
methodis not"zscore"or"psc".
Warns:
-
UserWarning–If
signalshave only one timepoint (standardization is skipped).
Examples:
Standardize extracted signals:
>>> import xarray as xr
>>> import numpy as np
>>> signals = xr.DataArray(
... np.random.randn(100, 50),
... dims=["time", "space"]
... )
>>> standardized = standardize(signals, method="zscore")
>>> standardized.mean(dim="time").values # Should be close to 0.
>>> standardized.std(dim="time", ddof=1).values # Should be close to 1.
Standardize 3D+t imaging data directly: