Analysis of single-cell data obtained from CyTOF at the Stockholm node

As mass cytometry significantly expands the number of phenotypic and functional characteristics that can be measured at the single-cell level, it gives researchers a tool for studying biological complexity. The high-dimensional data generated by CyTOF has required the development of new data analysis approaches.

The Analysis Framework

  • Many different algorithms have been developed, but the lack of consensus on how to analyze the data and on which algorithm is best for a given dataset poses a challenge to the researcher. It is also very important to consider data analysis when designing an experiment. Because CyTOF data is delivered as FCS (Flow Cytometry Standard) files, the same format used in flow cytometry, FlowJo software can be used to QC staining profiles (using regular biaxial plots) and to gate cell populations hierarchically, followed by export of single-cell CSV tables for further algorithm-guided unsupervised analysis.

Data Processing Steps

  1. Parameter harmonization - When dealing with files from multiple runs, instruments, or staining panels, the channel names need to be harmonized so that samples can be compared.
  2. Bead normalization - As CyTOF instrument performance may vary over time (within a single run, but more prominently between runs), it is important to normalize data in a way that limits the impact of this technical variation. We use the MATLAB normalization method (Finck et al., Cytometry Part A, 2013), which normalizes using median bead intensities calculated across the given experimental data files (e.g. files from different runs on the same machine).
  3. Data randomization - The raw data generated by the machine consists of a large set of mass spectra that are integrated into more easily interpretable FCS files. The result of this processing is a data set with discrete rather than continuous values. Discrete values work fine for statistical summaries and analysis but can interfere with plotting, because many events land on exactly the same coordinates (overplotting). To facilitate plotting, randomization can be performed. During randomization, each discrete measurement is slightly "fuzzed", that is, replaced by a random value close to itself. The result is a continuous data set with much the same properties as the discrete one.
  4. ArcSinh transformation - Mass cytometry ion counts are commonly ArcSinh transformed. This transformation is approximately linear at the low end of the range and resembles a log transformation at the high end. Dividing the counts by a co-factor (typically 5) before the transformation adjusts the range of ion counts over which linearity is retained. Transformations are done primarily for visualization, and co-factors are determined empirically.
  5. Pre-gating - This is done in 4-5 steps:
    • Batch correction - Batch effects are corrected channel by channel, using either a range-based approach (essentially a linear alignment of densities) or a warping normalization that allows non-linear adjustments to individual peaks in the data.
    • Gating for singlets (event length vs DNA) - To gate out doublets, a biaxial plot of event length vs a DNA channel is made, and the events whose length falls within the range of most events are gated.
    • Gating for cells (beads vs DNA) - This step gates out beads, cell/bead doublets, and debris. It is done by generating a biaxial plot with a DNA channel (e.g. 191Ir or 193Ir) on one axis and a bead-only isotope (140Ce) on the other.
    • Gating for intact cells (DNA1 vs DNA2) - To gate for intact cells, a biaxial plot of the two DNA channels (191Ir and 193Ir) is used, gating for events that are equally positive for both. Events with very little DNA signal are likely debris, whereas events with too high a DNA signal are likely cell doublets.
    • Gating for live cells (live/dead stain vs DNA) - This step removes dead cells. Using a biaxial plot of the viability stain vs a DNA channel, cells with a high level of viability staining are gated out.
  6. Debarcoding - Debarcoding (deconvolution) of barcoded FCS files is done to produce a separate FCS file for each barcoded sample.
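
As an illustration of steps 3 to 5, the following Python sketch shows randomization, ArcSinh transformation, and a simple rectangular gate on a biaxial plot. It is a minimal sketch under stated assumptions, not the pipeline used at the facility: the function names, the noise interval used for randomization (a uniform value in [0, 1) subtracted from each count), the channel keys, and the gate boundaries are all hypothetical and chosen only for the example.

```python
import math
import random


def randomize(counts, rng):
    """Replace each discrete count with a nearby continuous value.

    Assumption: a uniform random number in [0, 1) is subtracted from each
    count. The scheme used by a given instrument or software may differ.
    """
    return [c - rng.random() for c in counts]


def arcsinh_transform(values, cofactor=5.0):
    """Divide each value by the co-factor, then apply the inverse hyperbolic sine."""
    return [math.asinh(v / cofactor) for v in values]


def rectangle_gate(events, x, y, x_range, y_range):
    """Keep events whose x and y channel values fall inside both ranges."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [
        e for e in events
        if x_lo <= e[x] <= x_hi and y_lo <= e[y] <= y_hi
    ]


# Example with made-up ion counts for a single channel.
rng = random.Random(0)  # fixed seed so the example is reproducible
counts = [0, 0, 1, 5, 50, 500]
fuzzed = randomize(counts, rng)
transformed = arcsinh_transform(counts, cofactor=5.0)

# Example singlet gate: event length vs a DNA channel (hypothetical
# channel keys and gate boundaries).
events = [
    {"Event_length": 20, "Ir191": 300.0},  # inside the gate
    {"Event_length": 60, "Ir191": 650.0},  # long event, high DNA: likely doublet
    {"Event_length": 22, "Ir191": 2.0},    # very low DNA: likely debris
]
singlets = rectangle_gate(
    events, "Event_length", "Ir191", x_range=(10, 40), y_range=(100, 500)
)
```

For values well below the co-factor, `arcsinh_transform` returns approximately `v / cofactor` (the linear regime), while for values far above it the result approaches `ln(2 * v / cofactor)`, which is why a larger co-factor widens the range of counts treated linearly. In practice gates are usually drawn interactively (for example in FlowJo) and are rarely simple rectangles; the rectangular gate here only illustrates the idea of selecting events on two channels.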