
C-PAC & DOCKER
 

Author: Julia Wind

This blog post focuses on the basics of analyzing a BIDS-formatted dataset on macOS using Docker and the cpac Python package. For alternative methods of using C-PAC, please refer to the C-PAC User Documentation.

What is C-PAC? 

The main goal of the Configurable Pipeline for the Analysis of Connectomes (C-PAC) is to build on the Nipype platform to form an automated processing pipeline for resting-state functional MRI (R-fMRI) data that doesn’t require extensive programming knowledge.

nitrc.org/projects/cpac

C-PAC is:

  • configurable 

  • open-source 

  • Nipype- and Python-based


C-PAC uses Nipype to provide a uniform interface to existing software packages including AFNI, FSL, and ANTS. Users can combine neuroimaging tools to form analysis pipelines that are easily scalable to large datasets, with many choices for processing tools leading to robust results. “C-PAC makes it easy to explore the impact of particular processing decisions by allowing users to run a factorial number of analysis pipelines, each with a different set of preprocessing and analysis options.” fcp-indi.github.io/

 

fcp-indi.github.io/docs/nightly/user/quick

“C-PAC comes pre-packaged with a default pipeline, as well as a growing library of pre-configured pipelines.

  1. However, you can edit any of these pipelines, or build your own from scratch, using our pipeline builder.

  2. C-PAC can also pull input data from the cloud (AWS S3 buckets), so you can pull public data from any of the open-data releases immediately.”

Ways to Run C-PAC

C-PAC can be run with or without installation. A tutorial for installing C-PAC can be found here. 

  • Note: C-PAC requires a *nix-like environment and thus does not support Windows. 

 

Running C-PAC without installing: 

Container engines like Singularity and Docker offer an alternative to C-PAC’s lengthy installation process. C-PAC has many dependencies whose installation may lead to compatibility issues with existing software. Differences in software versions are especially headache-inducing when trying to install neuroimaging tools. Perhaps more importantly, they can lead to a lack of reproducibility in analysis results, as different research teams may have differing setups. In recent years, many neuroimaging researchers have turned to container engines for a solution. In this blog post I will focus on Docker, the most widely used container engine.

What is Docker?

Docker is an open-source platform for creating, deploying, and running applications across many systems through the use of containers. A container is a “lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.” docker.com/resources/what-container/  With containers, you only need to download one software package that includes C-PAC and all of its dependencies.

Containers are similar to virtual machines in that both allow apps to run isolated in a controlled environment. The key difference is that a virtual machine copies an entire operating system to isolate each individual app, while containers share the same OS kernel. For this reason, containers can handle more applications and use less storage.

Docker container images of neuroimaging software are isolated environments, are easily shareable, and offer standardization, making them ideal for reproducibility!

More resources: 

xenonstack.com/blog/docker-container

Installing Docker Desktop

Docker Desktop is free; the installation tutorial and requirements can be found here: docker.com/products/docker-desktop/ 

Docker installation tutorial

Docker Logs and Troubleshooting

Verify that Docker has installed correctly by entering docker in a terminal or command prompt, or by opening the Docker app.
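
For example, either of the following should confirm a working installation; the first prints the installed Docker version and the second pulls and runs Docker's small hello-world test image:

                docker --version

                docker run hello-world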

Downloading cpac

Information from fcp-indi.github.io/docs/latest/user/quick#cpac-python-package

 

cpac is a Python package that acts as a command-line interface between the user and the container C-PAC runs in, simplifying the interaction. cpac requires pip and Python 3.6 or greater.

To check if pip and python are already installed:

                pip3 --version

                python3 --version

 

To get cpac:

                pip3 install cpac

To download C-PAC:

                cpac pull

To upgrade C-PAC:

                cpac upgrade
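
At this point you can also check that the cpac command-line interface is working by printing its help text, which should list the available subcommands (pull, upgrade, run, and utils all appear in this post):

                cpac --help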

 

Tags for pull and upgrade:

                --platform {docker, singularity}

                       If neither platform nor image is specified, cpac will try Docker first, then try Singularity if Docker fails.

                --image IMAGE         

                         path to Singularity image file OR name of Docker image (eg, "fcpindi/c-pac"). Will attempt to pull from Singularity Hub or Docker Hub if not provided. If image is specified but platform is not, platform is assumed to be Singularity if image is a path or Docker if image is an image name.

                --tag TAG             

                         tag of the Docker image to use (eg, "latest" or "nightly").

 

For example, if you wanted to get the latest C-PAC docker container:

                cpac --platform docker --tag latest pull
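
Similarly, using the tags described above, you could pull the nightly Docker image instead, or pull a Singularity image if you are running Singularity rather than Docker:

                cpac --platform docker --tag nightly pull

                cpac --platform singularity pull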


Tip: the latest C-PAC Docker image is around 10 GB, and to avoid running out of application memory I had to quit all other applications while downloading.


If your machine runs out of application memory, Docker may stop responding. In some cases, especially with the latest version of Docker on macOS, you may need to purge data, reset to factory defaults, or uninstall and reinstall Docker. See this discussion for more details.


Running C-PAC

Information and examples from fcp-indi.github.io/docs/latest/user/quick#run-c-pac-with-cpac

C-PAC requires at least one pipeline configuration file and one data configuration file in order to run an analysis. 

 

However, if your data is in BIDS format no data configuration file is needed, and if no pipeline configuration is specified the default pipeline configuration is used. For example, the most basic configuration of C-PAC would be:

        cpac run /Users/You/local_bids_data /Users/You/some_folder_for_outputs participant

Where only the three required arguments are used. 

 

The three required positional arguments:

         bids_dir              

                The directory with the input dataset formatted according to the BIDS standard. 

         output_dir      

                The directory where the output files should be stored. If you are running group level analysis this folder should be prepopulated with the results of the participant level analysis. 

         {participant, group, test_config, cli}

                Level of the analysis that will be performed. Multiple participant level analyses can be run independently (in parallel) using the same output_dir. test_config will run through the entire configuration process but will not execute the pipeline (see the example below).
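
For example, to walk through the configuration process on your own data without executing the pipeline, reuse the basic command from above with the test_config level:

         cpac run /Users/You/local_bids_data /Users/You/some_folder_for_outputs test_config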

Common optional arguments include:

        --pipeline_file PIPELINE_FILE

                Path for the pipeline configuration file to use. 

        --group_file GROUP_FILE

                Path for the group analysis configuration file to use.

        --data_config_file DATA_CONFIG_FILE

                Yaml file containing the location of the data that is to be processed. This file is not necessary if the data in bids_dir is organized according to the BIDS format. This enables support for legacy data organization and cloud based storage. A bids_dir must still be specified when using this option, but its value will be ignored.

        --preconfig PRECONFIG

                Name of the pre-configured pipeline to run.

     As well as tags specifying a portion of your dataset to process.

 

To run C-PAC with a specific pipeline configuration file:

        cpac run /Users/You/local_bids_data /Users/You/some_folder_for_outputs participant --pipeline_file /Users/You/Documents/pipeline_config.yml

To run C-PAC with a specific data configuration file:

        cpac run /Users/You/any_directory /Users/You/some_folder_for_outputs participant --data_config_file /Users/You/Documents/data_config.yml
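
To run C-PAC with one of the pre-configured pipelines (the name used here, anat-only, is just an example; the C-PAC documentation lists the preconfigured pipelines available in your version):

        cpac run /Users/You/local_bids_data /Users/You/some_folder_for_outputs participant --preconfig anat-only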

A list of all flags and arguments can be found by running 

         cpac run --help

Configuration Files

Information from fcp-indi.github.io/docs/nightly/user/subject_list_config

 

“C-PAC requires at least one pipeline configuration file and one data configuration file in order to run an analysis. These configuration files are in the YAML file format, which matches contents in a key: value relationship much like a dictionary.”

 

Data Configuration Files (Participant List)

Information from fcp-indi.github.io/docs/nightly/user/subject_list_config

“The data configuration file is essentially a list of file paths to anatomical and functional MRI scans keyed by their unique IDs, and listed with any additional information as necessary.” The data configuration file is required so that C-PAC can locate the files to be processed.

If the data is in BIDS format, C-PAC will already know how the dataset is structured (yay standardization!), so a data configuration file is not required. However, one can still be useful for specifying which portions of the dataset should be processed by C-PAC. A data configuration file can be created manually, or generated from a data_settings.yml template file.
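
As a rough sketch of what such a file contains, a data configuration is a YAML list with one entry per scan, keyed by IDs and pointing to the anatomical and functional files. The entry below is illustrative only (the field names and layout are not guaranteed to match your C-PAC version); generate a real data configuration from the data_settings.yml template or consult the linked documentation for the exact format.

        - subject_id: sub-01
          unique_id: ses-1
          anat: /Users/You/local_bids_data/sub-01/ses-1/anat/sub-01_ses-1_T1w.nii.gz
          func:
            rest_run-1:
              scan: /Users/You/local_bids_data/sub-01/ses-1/func/sub-01_ses-1_task-rest_run-1_bold.nii.gz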

 

Pipeline Configuration Files

Information from fcp-indi.github.io/docs/latest/user/pipelines/pipeline_config

C-PAC pipeline configurations can be made either with the browser-based pipeline builder or with a text editor.

With the pipeline builder:

        Edit the pipeline through the browser and then save the pipeline configuration YAML file.

With a text editor:

        To generate a default pipeline configuration YAML from terminal:

                cpac utils pipe_config new_template

        You can then edit the file as needed by changing the values in the key:value pairs.

“If you want to base a pipeline on another pipeline configuration YAML file, you can specify

        FROM: /path/to/pipeline.yml

in your pipeline configuration file. You can use the name of a preconfigured pipeline instead of a filepath if you want to base a configuration file on a preconfigured pipeline. If FROM is not specified, the pipeline will be based on the default pipeline.”
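
For example, a minimal custom pipeline that is based on an existing configuration file and overrides just one setting might look like the sketch below. The FROM line comes straight from the documentation quoted above; the pipeline_setup / pipeline_name keys are meant only as an illustration of overriding a key: value pair, so verify the exact key names against the template you generated with cpac utils pipe_config new_template.

        FROM: /Users/You/Documents/pipeline_config.yml

        pipeline_setup:
          pipeline_name: my_custom_pipeline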


For a complete example, see the default pipeline configuration YAML file in the C-PAC documentation.

Workflow Options

fcp-indi.github.io/docs/latest/user/pipelines/pipeline_config#definitions 

Common Definitions:

  • “Workflow

    • A workflow accomplishes a particular processing task (e.g. functional preprocessing, scrubbing, nuisance correction). 

  • Pipeline

    • A pipeline is a combination of workflows.

  • Derivative

    • Derivatives are the results of processing a participant’s raw data (i.e., connectivity measures).”

Selecting a Configuration:

 

Configurable Settings:

Below is an overview of the workflows available with C-PAC and instructions on their configuration.
