Welcome to LineageOT’s documentation!

LineageOT is a package for analyzing lineage-traced single-cell sequencing time series. It extends Waddington-OT to compute temporal couplings using measurements of both gene expression and lineage trees. The LineageOT couplings can be used directly by the downstream analysis tools of the Waddington-OT package, which we do not duplicate here. For full details, see our paper.

All of the functionality required for running LineageOT is in the core module. The remaining modules contain implementation functions and code for reproducing the analyses in the paper.

The source code, with installation instructions and examples, is available at https://github.com/aforr/LineageOT.

Core pipeline

lineageot.core.fit_lineage_coupling(adata, time_1, time_2, lineage_tree_t2, time_key='time', state_key=None, epsilon=0.05, normalize_cost=True, ot_method='sinkhorn', marginal_1=[], marginal_2=[], balance_reg=inf)

Fits a LineageOT coupling between the cells in adata at time_1 and time_2. In the process, annotates the lineage tree with observed and estimated cell states.

Parameters
  • adata (AnnData) – Annotated data matrix

  • time_1 (Number) – The earlier time point in adata. All times are relative to the root of the tree.

  • time_2 (Number) – The later time point in adata. All times are relative to the root of the tree.

  • lineage_tree_t2 (Networkx DiGraph) – The lineage tree fitted to cells at time_2. Nodes should already be annotated with times. Annotations related to cell state will be added.

  • time_key (str (default 'time')) – Key in adata.obs and lineage_tree_t2 containing cells’ time labels.

  • state_key (str (default None)) – Key in adata.obsm containing cell states. If None, uses adata.X.

  • epsilon (float (default 0.05)) – Entropic regularization parameter for optimal transport.

  • normalize_cost (bool (default True)) – Whether to rescale the cost matrix by its median before fitting a coupling. Normalizing this way makes a single default epsilon reasonable for data of any scale.

  • ot_method (str (default 'sinkhorn')) – Method used for the optimal transport solver. Choose from ‘sinkhorn’, ‘greenkhorn’, ‘sinkhorn_stabilized’ and ‘sinkhorn_epsilon_scaling’ for balanced transport and ‘sinkhorn’, ‘sinkhorn_stabilized’, and ‘sinkhorn_reg_scaling’ for unbalanced transport. ‘sinkhorn’ is recommended unless you encounter numerical problems. See PythonOT docs for more details.

  • marginal_1 (Vector (default [])) – Marginal distribution (relative growth rates) for cells at time 1. If empty, assumed uniform.

  • marginal_2 (Vector (default [])) – Marginal distribution (relative growth rates) for cells at time 2. If empty, assumed uniform.

  • balance_reg (Number (default inf)) – Regularization parameter for unbalanced transport. Smaller values allow more flexibility in growth rates. If infinite, marginals are treated as hard constraints.

Returns

coupling – AnnData containing the lineage coupling. Cells from time_1 are in coupling.obs, cells from time_2 are in coupling.var, and the coupling matrix is coupling.X.

Return type

AnnData
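The roles of epsilon and normalize_cost can be illustrated with a small, dependency-free Sinkhorn loop. This is a minimal sketch of generic entropic optimal transport between uniform marginals, not LineageOT's implementation (which, per the ot_method description, relies on PythonOT solvers). The function name sinkhorn_coupling, the iteration count, and the toy cost matrix are assumptions made for illustration.

```python
import math
from statistics import median


def sinkhorn_coupling(cost, epsilon=0.05, normalize_cost=True, n_iter=500):
    """Entropic OT between uniform marginals via plain Sinkhorn iterations.

    Illustrative only: a generic solver, not the LineageOT code path.
    """
    n, m = len(cost), len(cost[0])
    if normalize_cost:
        # Rescale by the median entry so the same epsilon works at any data
        # scale. Assumes the median cost is nonzero.
        med = median(c for row in cost for c in row)
        cost = [[c / med for c in row] for row in cost]
    # Gibbs kernel K = exp(-C / epsilon).
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a = [1.0 / n] * n  # uniform marginal over time_1 cells (rows)
    b = [1.0 / m] * m  # uniform marginal over time_2 cells (columns)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Hypothetical cost matrix: 3 cells at time_1 (rows), 2 cells at time_2 (columns).
toy_cost = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
toy_coupling = sinkhorn_coupling(toy_cost)

# With median normalization, rescaling every cost leaves the coupling unchanged.
scaled_cost = [[1000.0 * c for c in row] for row in toy_cost]
scaled_coupling = sinkhorn_coupling(scaled_cost)
```

Rows play the role of coupling.obs (time_1 cells) and columns the role of coupling.var (time_2 cells), matching the layout of the returned AnnData described above.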

lineageot.core.fit_tree(adata, time, barcodes_key='barcodes', clones_key='X_clone', clone_times=None, method='neighbor join')

Fits a lineage tree to lineage barcodes of all cells in adata. To compute the lineage tree for a specific time point, filter adata before calling fit_tree. The fitted tree is annotated with node times but not states.

Parameters
  • adata (AnnData) – Annotated data matrix with lineage-traced cells

  • time (Number) – Time of sampling of the cells of adata relative to most recent common ancestor (for dynamic lineage tracing) or labeling time (for static lineage tracing).

  • barcodes_key (str, default 'barcodes') – Key in adata.obsm containing cell barcodes. Ignored if using clonal data. If using barcode data, each row of adata.obsm[barcodes_key] should be a barcode where each entry corresponds to a possibly-mutated site. A positive number indicates an observed mutation, zero indicates no mutation, and -1 indicates the site was not observed.

  • clones_key (str, default 'X_clone') – Key in adata.obsm containing clonal data. Ignored if using barcodes directly. If using clonal data, adata.obsm[clones_key] should be a num_cells x num_clones boolean matrix. Each entry is 1 if the corresponding cell belongs to the corresponding clone and 0 otherwise.

  • clone_times (Vector of length num_clones, default None) – Ignored unless method is ‘clones’. Each entry contains the time of labeling of the corresponding column of adata.obsm[clones_key].

  • method (str, default 'neighbor join') – Inference method used to fit the tree. Current options are ‘neighbor join’ (for barcodes from dynamic lineage tracing), ‘non-nested clones’ (for non-nested clones from static lineage tracing), or ‘clones’ (for possibly-nested clones from static lineage tracing).

Returns

tree – A fitted lineage tree. Each node is annotated with ‘time_to_parent’ and ‘time’. For observed cells, ‘time’ is the time of sampling; for unobserved ancestors, it is the time of division. Edges are directed from parent to child and are annotated with ‘time’ equal to the child node’s ‘time_to_parent’. Observed node indices correspond to their row in adata.

Return type

Networkx DiGraph
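The two input encodings described for fit_tree can be made concrete with plain Python lists. The sketch below builds a num_cells x num_clones membership matrix of the shape expected under clones_key, and shows the barcode convention (positive for an observed mutation, zero for no mutation, -1 for an unobserved site). The helper names and the toy data are hypothetical, and the mismatch count is only an illustrative way to compare barcodes; it is not claimed to be the distance LineageOT uses internally.

```python
def clone_membership_matrix(cell_clones, num_clones):
    """Return a num_cells x num_clones list-of-lists of booleans.

    cell_clones[i] lists the clone indices that cell i belongs to.
    """
    matrix = []
    for clones in cell_clones:
        row = [False] * num_clones
        for clone in clones:
            row[clone] = True
        matrix.append(row)
    return matrix


def observed_mismatches(barcode_a, barcode_b):
    """Count sites observed in both barcodes (no -1) whose entries differ.

    Illustrative comparison only, not LineageOT's internal distance.
    """
    return sum(
        1
        for site_a, site_b in zip(barcode_a, barcode_b)
        if site_a != -1 and site_b != -1 and site_a != site_b
    )


# Hypothetical clonal data: 4 cells, 3 clones; cell 1 is in two (nested) clones.
cell_clones = [[0], [0, 2], [1], []]
X_clone = clone_membership_matrix(cell_clones, num_clones=3)

# Hypothetical barcodes: 3 cells, 4 sites each.
barcodes = [
    [0, 3, -1, 0],  # site 1 mutated to state 3, site 2 not observed
    [0, 3, 2, 0],   # sites 1 and 2 mutated
    [1, 0, 2, -1],  # sites 0 and 2 mutated, site 3 not observed
]
```

In practice these arrays would be stored in adata.obsm under the keys passed as clones_key and barcodes_key.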

lineageot.core.read_newick(filename, leaf_labels, leaf_time=None)

Loads a tree saved in Newick format and adds annotations required for LineageOT.

Parameters
  • filename (str) – The name of the file to load from.

  • leaf_labels (list) – The label of each leaf in the Newick tree, sorted to align with the gene expression AnnData object filtered to cells at the corresponding time.

  • leaf_time (float (default None)) – The time of sampling of the leaves. If unspecified, the root of the tree is assigned time 0.

Returns

tree – The saved tree, in LineageOT’s format. Each node is annotated with ‘time_to_parent’ and ‘time’. For observed cells, ‘time’ is the time of sampling; for unobserved ancestors, it is the time of division. Edges are directed from parent to child and are annotated with ‘time’ equal to the child node’s ‘time_to_parent’. Observed node indices correspond to their index in leaf_labels, which should match their row in the gene expression AnnData object filtered to cells at the corresponding time.

Return type

Networkx DiGraph
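The node and edge annotations described above can be sketched without any dependencies. LineageOT stores trees as Networkx DiGraphs; the example below uses plain dictionaries instead, and assumes that a node's ‘time’ equals its parent's ‘time’ plus its own ‘time_to_parent’, with the root at time 0. The function names, the choice to give the root a ‘time_to_parent’ of 0, and the toy tree are all assumptions made for illustration.

```python
def annotate_times(children, time_to_parent, root, root_time=0.0):
    """Return {node: {'time_to_parent': ..., 'time': ...}} for a rooted tree.

    children maps each internal node to its list of child nodes;
    time_to_parent maps each non-root node to its branch length.
    """
    # Assumption: the root has no parent, so its branch length is set to 0.
    annotations = {root: {"time_to_parent": 0.0, "time": root_time}}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            branch = time_to_parent[child]
            annotations[child] = {
                "time_to_parent": branch,
                "time": annotations[parent]["time"] + branch,
            }
            stack.append(child)
    return annotations


def annotate_edges(children, annotations):
    """Annotate each parent-to-child edge with 'time' = child's time_to_parent."""
    return {
        (parent, child): {"time": annotations[child]["time_to_parent"]}
        for parent, kids in children.items()
        for child in kids
    }


# Hypothetical tree, Newick-style: ((0:1.0,1:1.0)a:0.5,2:1.5)root;
# Leaves 0, 1, 2 are observed cells; 'a' and 'root' are unobserved ancestors.
children = {"root": ["a", 2], "a": [0, 1]}
time_to_parent = {"a": 0.5, 0: 1.0, 1: 1.0, 2: 1.5}

node_annotations = annotate_times(children, time_to_parent, root="root")
edge_annotations = annotate_edges(children, node_annotations)
```

Here all three leaves end up with ‘time’ 1.5, which is what one would expect when every observed cell was sampled at the same time point.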

lineageot.core.save_coupling_as_tmap(coupling, time_1, time_2, tmap_out)

Saves a LineageOT coupling for downstream analysis with Waddington-OT. A sequence of saved couplings can be loaded in wot with wot.tmap.TransportMapModel.from_directory(tmap_out).

Parameters
  • coupling (AnnData) – The coupling to save.

  • time_1 (Number) – The earlier time point of the coupling. All times are relative to the root of the tree.

  • time_2 (Number) – The later time point of the coupling. All times are relative to the root of the tree.

  • tmap_out (str) – The path and prefix to the save file name.
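
The functions documented above can be chained into a single pipeline. The sketch below is a hedged outline built only from the signatures on this page: the wrapper name run_lineageot_pipeline is hypothetical, and it assumes adata holds barcoded cells from both time points, with time labels in adata.obs[time_key] and barcodes under the default barcodes_key. The import is placed inside the function so the sketch can be defined even where LineageOT is not installed.

```python
def run_lineageot_pipeline(adata, time_1, time_2, tmap_out, time_key="time"):
    """Fit a tree at time_2, fit the coupling, and save it for Waddington-OT.

    Hypothetical wrapper; argument order follows the signatures documented above.
    """
    # Lazy import: requires the lineageot package when the function is called.
    import lineageot.core as lot_core

    # fit_tree uses all cells it is given, so filter to the later time point first.
    adata_t2 = adata[adata.obs[time_key] == time_2]
    tree_t2 = lot_core.fit_tree(adata_t2, time_2)

    # Fit the coupling between time_1 and time_2 cells using the fitted tree.
    coupling = lot_core.fit_lineage_coupling(
        adata, time_1, time_2, tree_t2, time_key=time_key
    )

    # Save in a format that Waddington-OT can load from tmap_out.
    lot_core.save_coupling_as_tmap(coupling, time_1, time_2, tmap_out)
    return coupling
```

For data with a precomputed tree, the fit_tree call could be replaced by read_newick, keeping the remaining steps unchanged.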