Skip to content

Input/Output

Overview

ConfUSIus is designed to handle large-scale fUSI datasets efficiently. This guide explains how to work with different data formats and convert them to work with ConfUSIus.

Working with Xarray

ConfUSIus uses Xarray as its core data structure for representing multi-dimensional fUSI data. Xarray provides several advantages over raw NumPy arrays:

  • Named dimensions: Access data using meaningful names (e.g., time, z, y, x) instead of remembering axis indices.
  • Coordinates: Associate physical coordinates with each dimension (e.g., time in seconds, depth in millimeters).
  • Metadata storage: Keep acquisition parameters, units, and other metadata alongside your data.
  • Unified API: Use the same operations regardless of the underlying storage format.

Xarray-Compatible Formats

Xarray can read and write data from multiple storage formats, including:

  • Zarr: Chunked, compressed, cloud-native format (recommended for large datasets).
  • HDF5: Hierarchical format widely used in research.
  • netCDF: Self-describing format common in scientific computing (e.g., basis of the MINC 1.0 format).

Additionally, ConfUSIus provides utilities to read and write NIfTI files (the standard neuroimaging format for BIDS) as Xarray DataArrays, enabling seamless integration with BIDS-compliant fUSI datasets.

ConfUSIus works with all Xarray-compatible formats, but two formats are particularly well-suited for fUSI workflows:

NIfTI is the standard format for:

  • fUSI-BIDS compliance: Required for sharing datasets following the BIDS specification.
  • Neuroimaging pipelines: Compatible with tools like Nilearn, ANTsPy, and FSL, which have been used in some fUSI studies.
  • Derived acquisitions: Power Doppler, velocity, and other processed signals.

Use NIfTI when you need to share data, ensure BIDS compliance, or integrate with existing neuroimaging analysis tools.

Zarr excels at handling massive datasets through:

  • Out-of-core processing: Work with datasets larger than memory by loading only needed chunks.
  • Compression: Reduce storage footprint without sacrificing access speed.
  • Parallel I/O: Read and write data concurrently from multiple processes.
  • Cloud compatibility: Store and access data on remote object storage (S3, GCS, etc.).

Use Zarr for beamformed IQ data, large-scale analyses, and cloud-based processing workflows. ConfUSIus converts beamformed IQ data to Zarr by default for optimal performance with large-scale datasets.

fUSI Data Types

fUSI workflows involve two main categories of data:

Complex-valued signals resulting from ultrasound beamforming of RF signals. These signals are typically further processed to extract derived signals such as power Doppler or velocity. Beamformed IQ datasets are typically very large (10s to 100s of GB per acquisition session), and stored in proprietary formats depending on the acquisition system (e.g., Iconeus, AUTC, EchoFrame, etc.).

Processed data products such as power Doppler, velocity, or other signals derived from beamformed IQ data. These datasets are generally much smaller than the original IQ data (often 10-100x smaller) and are frequently stored in standardized formats like NIfTI for interoperability and BIDS compliance.

Zarr for large derived datasets

Large-scale derived acquisitions (e.g., long power Doppler recordings) can also benefit from storage in Zarr for efficient processing.

Converting Beamformed IQ Data

Most fUSI acquisition systems output beamformed IQ data in proprietary binary formats. ConfUSIus currently provides built-in conversion utilities for AUTC and EchoFrame systems to transform their data into Zarr for efficient processing.

This format consists of a series of binary .dat files (often split into parts), where each file contains multiple acquisition blocks.

To convert a folder of AUTC DAT files to Zarr, use the convert_autc_dats_to_zarr function.

from confusius.io import convert_autc_dats_to_zarr

convert_autc_dats_to_zarr(
    dats_root="path/to/data_folder",
    output_path="sub-01_task-awake_iq.zarr",
    # Optional: specify block start times, transmit frequency, axis coordinates, and
    # other metadata via keyword arguments (see API for details).
    block_times=block_times,
    compound_sampling_frequency=500.0,
    transmit_frequency=15.625e6,
    sound_velocity=1510.0,
)

This will create a Zarr group containing:

  • iq: Beamformed IQ data with dimensions (time, z, y, x).
  • time, z, y, x: Coordinate arrays.
  • Voxel sizes (voxdim) as per-coordinate attributes on z, y, and x.
  • Metadata attributes (e.g., transmit_frequency, plane_wave_angles) as provided via keyword arguments.

This format consists of a binary .dat file containing the beamformed data and a .mat file containing metadata (sequence parameters).

To convert EchoFrame data to Zarr, use the convert_echoframe_dat_to_zarr function.

from confusius.io import convert_echoframe_dat_to_zarr

convert_echoframe_dat_to_zarr(
    dat_path="path/to/data.dat",
    meta_path="path/to/metadata.mat",
    output_path="sub-01_task-awake_iq.zarr",
    # Optional: specify block start times. Other metadata (e.g., transmit frequency,
    # axis coordinates) will be automatically extracted from the metadata file.
    block_times=block_times,
)

This will create a Zarr group containing:

  • iq: Beamformed IQ data with dimensions (time, z, y, x).
  • time, z, y, x: Coordinate arrays.
  • Voxel sizes (voxdim) as per-coordinate attributes on z, y, and x.
  • Metadata attributes (e.g., transmit_frequency, plane_wave_angles) as extracted from the metadata file.

Other Systems

If you're working with IQ data from a system other than AUTC or EchoFrame, load it using your own loader and use validate_iq to ensure it meets ConfUSIus requirements before processing:

import xarray as xr
from confusius.validation import validate_iq

# Load IQ data from an unsupported format (example using your own loading function).
iq = your_loader_function("path/to/iq_data")

# Validate the IQ data structure and required attributes.
try:
    iq_validated = validate_iq(iq)
    print("✓ IQ data is valid and ready for processing")
except ValueError as e:
    print(f"✗ Validation failed: {e}")

The validate_iq function checks that your data:

  • Has the correct dimensions: (time, z, y, x).
  • Is complex-valued (numpy.complex64 or numpy.complex128).
  • Contains required attributes:
    • compound_sampling_frequency: Effective IQ sampling frequency in Hz.
    • transmit_frequency: Ultrasound transmit frequency in Hz.
    • sound_velocity: Speed of sound in m/s.

Finding attribute values

If you're unsure about the correct values for these attributes:

  • compound_sampling_frequency: Check your acquisition software settings. The compound sampling frequency is generally around 500-1000 Hz for fUSI acquisitions, but can vary based on the system and settings used.
  • transmit_frequency: Found in your probe specifications or acquisition settings. Generally around 5-10 MHz for clinical probes, and 12-20 MHz for high-frequency probes used in small animal imaging.
  • sound_velocity: Typically 1540 m/s for brain tissues, but may vary with temperature and tissue type.

Loading Data

All ConfUSIus loaders return lazy DataArrays backed by Dask—data stays on disk until an operation requires it.

Load into memory when it fits

Lazy loading is essential for datasets larger than available RAM, but it introduces Dask scheduling overhead on every operation. If your data fits comfortably in memory (leaving enough headroom for intermediate results), load it eagerly with .compute() for better performance:

da = cf.load("sub-01_task-awake_pwd.nii.gz").compute()

Loading Zarr Files

Once your data is in Zarr format, load it with confusius.load:

import confusius as cf

# Load beamformed IQ data (returns the first variable as a DataArray by default).
iq_data = cf.load("sub-01_task-awake_iq.zarr")

# Or specify the variable name if there are multiple variables in the Zarr store.
iq_data = cf.load("sub-01_task-awake_iq.zarr", variable="iq")

print(iq_data)

Loading a full Dataset

confusius.load always returns a single DataArray. To load all variables in a Zarr store as a Dataset, use xarray.open_zarr directly:

import xarray as xr

ds = xr.open_zarr("sub-01_task-awake_iq.zarr")

This returns an DataArray lazily loaded from the Zarr store, ready for processing:

<xarray.DataArray 'iq' (time: 1168500, z: 1, y: 118, x: 52)> Size: 57GB
dask.array<open_dataset-iq, shape=(1168500, 1, 118, 52), dtype=complex64, chunksize=(300, 1, 118, 52), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) float64 9MB 5.551 5.553 5.555 ... 2.355e+03 2.355e+03
  * z        (z) float64 8B 0.0
  * y        (y) float64 944B 4.656 4.705 4.753 4.802 ... 10.23 10.28 10.33
  * x        (x) float64 416B -2.671 -2.57 -2.469 -2.369 ... 2.268 2.369 2.469
Attributes:
    transmit_frequency:           15625000.0
    probe_n_elements:             128
    probe_pitch:                  0.0001
    sound_velocity:               1510.0
    plane_wave_angles:            [-10.0, -9.310344696044922, -8.620689392089...
    compound_sampling_frequency:  500.0
    pulse_repetition_frequency:   15000.0
    beamforming_method:           Fourier

Notice that the data remains on disk (shown by dask.array<...>) until you explicitly compute operations on it.

Loading Iconeus SCAN Files

Use load_scan to load Iconeus .scan files (HDF5 files produced by IcoScan) as lazy Xarray DataArrays. Three acquisition modes are supported, each yielding a DataArray with different dimensions:

Mode Dimensions Description
2Dscan (time, z, y, x) 2D+t fUSI
3Dscan (pose, z, y, x) Multi-pose anatomical volume
4Dscan (time, pose, z, y, x) Multi-pose time-series (3D+t fUSI)

All spatial coordinates are in millimeters; the time coordinate is in seconds.

import confusius as cf

da = cf.load("sub-01_task-awake_pwd.source.scan")

print(da.dims)
# Output: ('time', 'z', 'y', 'x')
import confusius as cf

da = cf.load("sub-01_acq-anat_pwd.source.scan")

print(da.dims)
# Output: ('pose', 'z', 'y', 'x')

The pose dimension indexes each probe position in the multi-pose acquisition. Each pose has its own physical_to_lab affine (shape (npose, 4, 4)) stored in da.attrs["affines"]["physical_to_lab"].

import confusius as cf

da = cf.load("sub-01_task-awake_pwd.source.scan")

print(da.dims)
# Output: ('time', 'pose', 'z', 'y', 'x')

In addition to the time coordinate (earliest timestamp per block), a pose_time non-dimension coordinate of shape (time, pose) stores the exact per-pose acquisition timestamps.

The DataArray is loaded lazily: data remains on disk until explicitly computed. SCAN files stay open while the Dask graph is un-computed, so keep the DataArray in scope or call .compute() before discarding it.

Provenance metadata from the file is stored in da.attrs: scan_mode, subject, session, scan, project, date, neuroscan_version, and machine_sn.

Converting SCAN Data to NIfTI

Since load_scan returns a standard Xarray DataArray with ConfUSIus-compatible dimensions and coordinates, you can save it directly to NIfTI using save_nifti or the Xarray accessor.

For 2Dscan data, save it directly:

import confusius as cf

da = cf.load("sub-01_task-awake_pwd.scan")
cf.save(da, "sub-01_task-awake_pwd.nii.gz")
# Or equivalently:
da.fusi.save("sub-01_task-awake_pwd.nii.gz")

For 3Dscan and 4Dscan data, consolidate the poses into a single volume before saving, or save each pose separately if you want to retain the multi-pose structure:

import confusius as cf

anat = cf.load("sub-01_acq-anat_pwd.scan")
volume = cf.multipose.consolidate_poses(anat)
cf.save(volume, "sub-01_acq-anat_pwd.nii.gz")
import confusius as cf

anat = cf.load("sub-01_acq-anat_pwd.scan")
for pose in anat.pose:
    pose_da = anat.sel(pose=pose)
    cf.save(pose_da, f"sub-01_acq-anat_pose-{pose.values}.nii.gz")

Loading NIfTI Files

Use confusius.load to load NIfTI files as lazy Xarray DataArrays:

import confusius as cf

# Load with automatic BIDS sidecar metadata.
da = cf.load("sub-01_task-awake_pwd.nii.gz")
print(da.dims)
# Output: ('time', 'z', 'y', 'x')

ConfUSIus automatically loads a JSON sidecar file with the same basename (e.g., sub-01_task-awake_pwd.json) if present, following the BIDS specification for fUSI metadata.

Saving Data

You can save DataArrays to NIfTI and Zarr using confusius.save or the Xarray accessor:

import confusius as cf

# Save to NIfTI with automatic JSON sidecar creation.
cf.save(data_array, "output.nii.gz")

# Save to Zarr.
cf.save(data_array, "output.zarr")
import confusius

# Save to NIfTI with automatic JSON sidecar creation.
data_array.fusi.save("output.nii.gz")

# Save to Zarr.
data_array.fusi.save("output.zarr")

When saving to NIfTI, a JSON sidecar file will be automatically created. Spatial coordinates and units are encoded in the NIfTI header itself; the sidecar stores custom attributes and BIDS timing fields (e.g., RepetitionTime, DelayAfterTrigger, VolumeTiming).

Format Conversion Reference

Quick reference for converting between formats:

From To Function
AUTC DATs Zarr confusius.io.convert_autc_dats_to_zarr
EchoFrame DAT Zarr confusius.io.convert_echoframe_dat_to_zarr
Iconeus SCAN Xarray DataArray confusius.load
NIfTI Xarray DataArray confusius.load
Zarr Xarray DataArray confusius.load / xarray.open_zarr (Dataset)
Xarray DataArray NIfTI confusius.save / .fusi.save
Xarray DataArray Zarr confusius.save / .fusi.save / .to_zarr