
Getting Started

The fastest way to familiarize yourself with Fly-QMA is to start from a working example. We recommend beginning with the Fly-QMA Tutorial.

We also recommend reading the sections below before working with your own microscopy data.

Pipeline Overview

[Figure: overview of the Fly-QMA analysis pipeline]

Preparing Images

Fly-QMA uses a hierarchical file structure that is organized into three levels:

  1. EXPERIMENT: One or more tissue samples imaged under the same conditions.

  2. STACK: All images of a particular tissue sample, such as an individual z-stack.

  3. LAYER: All analysis relevant to a single 2-D image, such as an individual layer of a z-stack.

Before using Fly-QMA, microscopy data should be manually arranged into a collection of STACK directories that reside within a particular EXPERIMENT directory. Note that the actual names of these directories don’t matter, but their hierarchical positions do:

EXPERIMENT
│
├── STACK 0         # First STACK directory
├── STACK 1
└── ... STACK N     # Nth STACK directory

Each STACK directory should contain one or more 2-D images of a unique tissue sample. Images must be supplied in .tif format with ZXYC orientation. Each image file may depict a single layer, a regularly-spaced z-stack, or an irregularly-spaced collection of layers.

EXPERIMENT
│
├── STACK 0
│   └── image.tif   # 3-D image with ZXYC orientation
│
├── STACK 1
│   └── image.tif
│
└── ... STACK N
        └── image.tif
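
Fly-QMA will read these files when the corresponding stack is loaded, but it can be worth confirming the orientation beforehand. A minimal check, assuming the tifffile package is available:

>>> import tifffile
>>> im = tifffile.imread('./EXPERIMENT/STACK 0/image.tif')
>>> im.shape   # expect (Z, X, Y, C) for a 3-D multichannel image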

Warning

Image segmentation is performed on a layer-by-layer basis. Because cells often span several adjacent layers in a confocal z-stack, individual layers must be spaced far enough apart to avoid measuring the same cells twice. Overlapping layers may also be manually excluded using the provided ROI Selector.

Loading Images

Next, instantiate an Experiment using the EXPERIMENT directory path:

>>> from flyqma.data import Experiment
>>> experiment = Experiment('./EXPERIMENT')

This instance will serve as the entry-point for managing all of the data in the EXPERIMENT directory. Lower levels of the data hierarchy may then be accessed in a top-down manner. To access an individual stack:

# load specific stack
stack = experiment.load_stack(stack_id)

# alternatively, by sequential iteration
for stack in experiment:
  stack.do_stuff()

The experiment.load_stack() method includes a full keyword argument that may be set to False in order to skip loading the stack’s .tif file into memory. This offers some performance benefit when only saved measurement data are needed. Of course, loading the image data is necessary if any segmentation, measurement, ROI definition, or bleedthrough correction operations are to be performed.
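
For example, to access a stack's saved measurements without reading its image data:

# load stack without reading the image into memory
stack = experiment.load_stack(stack_id, full=False)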

To begin analyzing an image stack, layers must be added to the corresponding stack directory. Calling stack.initialize() creates a layers subdirectory containing an additional subdirectory for each 2-D layer in the 3-D image stack. A stack metadata file is also added to the STACK directory at this time, resulting in:

EXPERIMENT
│
├── STACK 0
│   ├── image.tif
│   ├── metadata.json   # stack metadata (number of layers, image bit depth, etc.)
│   └── layers
│       ├── 0           # first LAYER directory
│       ├── 1
│       └── ... M       # Mth LAYER directory
│
├── STACK 1
└── ... STACK N
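
Initialization is a single call per stack, and may be applied across an entire experiment by iteration:

# initialize each stack, creating the layers subdirectories shown above
for stack in experiment:
  stack.initialize()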

Image layers may now be analyzed individually:

# load specific layer
layer = stack.load_layer(layer_id)

# alternatively, by sequential iteration
for layer in stack:
  layer.do_stuff()

Methods acting upon lower-level Stack or Layer instances are executed in place, meaning you won’t lose progress by iterating across instances or by returning to a given instance at a later time. This persistence is possible because new subdirectories and files are automatically added to the appropriate STACK or LAYER directory each time a segmentation, measurement, annotation, bleedthrough correction, or ROI selection is saved, overwriting any existing files of the same type.

Segmenting Images

See the measurement documentation for a list of the specific parameters needed to customize the segmentation routine to suit your data. At a minimum, users must specify the background channel: the index of the fluorescence channel used to identify cells or nuclei.

To segment an image layer, measure the segment properties, and save the results:

>>> channel = 2
>>> layer.segment(channel)
>>> layer.save()

Alternatively, to segment all layers within an image stack:

>>> channel = 2
>>> stack.segment(channel, save=True)

In both cases, measurement data are generated on a layer-by-layer basis. To ensure that the segmentation results and corresponding measurement data will remain available after the session is terminated, specify save=True or call layer.save(). This will save the segmentation parameters within a layer metadata file and create a segmentation subdirectory containing a segment labels mask. It will also create a measurements subdirectory containing the corresponding raw expression measurement data (measurements.hdf), as well as a duplicate version that is subject to all subsequent processing operations (processed.hdf). The raw measurements will remain the same until a new segmentation is executed and saved, while the processed measurements are updated each time a new operation is applied and saved. Following segmentation, each LAYER directory will resemble:

EXPERIMENT
│
├── STACK 0
│   ├── image.tif
│   ├── metadata.json
│   └── layers
│       ├── 0
│       │   ├── metadata.json          # layer metadata (background channel, parameter values, etc.)
│       │   ├── segmentation
│       │   │   ├── labels.npy         # segmentation mask (np.ndarray[int])
│       │   │   └── segmentation.png   # layer image overlaid with segment contours (optional)
│       │   └── measurements
│       │       ├── measurements.hdf   # raw expression measurements
│       │       └── processed.hdf      # processed expression measurements
│       ├── 1
│       └── ... M
├── STACK 1
└── ... STACK N
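
If needed, the saved segmentation mask may also be inspected directly with NumPy (path shown for the first layer of STACK 0):

>>> import numpy as np
>>> labels = np.load('./EXPERIMENT/STACK 0/layers/0/segmentation/labels.npy')
>>> labels.shape   # mask dimensions
>>> labels.max()   # highest segment label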

Measurement Data

Raw and processed measurement data are accessed via the Layer.measurements and Layer.data attributes, respectively. Both are stored in Pandas DataFrames in which each sample (row) reflects an individual segment. Columns depict a mixture of continuous and categorical features, including:

  • segment_id: unique integer identifier assigned to the segment

  • pixel_count: total number of pixels within the segment

  • centroid_x: mean x-coordinate of all pixels

  • centroid_y: mean y-coordinate of all pixels

  • chN: mean intensity of the Nth channel across all pixels

  • chN_std: standard deviation of the Nth channel across all pixels

  • chN_normalized: normalized mean intensity of the Nth channel
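
Both attributes behave like standard pandas DataFrames and can be previewed directly:

>>> layer.measurements.head()   # raw measurements
>>> layer.data.head()           # processed measurements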

To aggregate processed measurement data across all layers in an image stack:

>>> stack_data = stack.aggregate_measurements()

Similarly, to aggregate across an entire experiment:

>>> experiment_data = experiment.aggregate_measurements()

Each of these operations returns measurement data in the same DataFrame format. However, in order to preserve the unique identity of each measurement, the index is replaced by a hierarchical index indicating the layer and/or stack from which each segment was derived.
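
The hierarchical index can be inspected like any other pandas MultiIndex; for example, assuming the outermost level identifies the stack:

>>> experiment_data.index.names                              # e.g. stack and layer identifiers
>>> experiment_data.groupby(level=0).mean(numeric_only=True) # per-stack averages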

Analysis

The measurement data stored in the layer.measurements attribute and measurements.hdf file reflect raw measurements of mean pixel intensity for each segment. These measurements may then be subject to one or more processing operations such as:

  • ROI definition

  • Bleedthrough correction

  • Automated annotation

  • Manual annotation

The objects that perform these operations all behave in a similar manner. They are manually defined for each disc (see the Tutorial for examples), but may then be saved for repeated use. When saved, each object creates its own subdirectory within the corresponding LAYER directory:

EXPERIMENT
│
├── STACK 0
│   ├── image.tif
│   ├── metadata.json
│   └── layers
│       ├── 0
│       │   ├── metadata.json
│       │   ├── segmentation
│       │   │   └── ...
│       │   ├── measurements
│       │   │   └── ...
│       │   ├── annotation
│       │   │   └── ...
│       │   ├── correction
│       │   │   └── ...
│       │   └── selection
│       │       └── ...
│       ├── 1
│       └── ... M
├── STACK 1
└── ... STACK N

The added subdirectories include all the files and metadata necessary to load and execute the data processing operations performed by the respective object. Saved operations are automatically applied to the raw measurement data each time a layer is loaded, appending a number of additional features to the layer.data DataFrame:

  • chN_predicted: estimated bleedthrough contribution into the Nth channel

  • chNc: bleedthrough-corrected intensity of the Nth channel

  • chNc_normalized: normalized bleedthrough-corrected intensity of the Nth channel

  • selected: boolean flag indicating whether the segment falls within the ROI

  • boundary: boolean flag indicating whether the segment lies within a boundary region

  • manual_label: segment label manually assigned using FlyEye Silhouette

Furthermore, the annotation module may be used to assign one or more labels to each segment. Users are free to specify the names of these additional features as they please.
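
Once these operations have been saved, the appended flags make it easy to filter the processed data. For example, to restrict analysis to segments that fall inside the ROI and outside boundary regions:

>>> roi_data = layer.data[layer.data.selected & ~layer.data.boundary]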