Inference

This module contains the implementation of LineageOT used by the core functions.

class lineageot.inference.NeighborJoinNode(subtree, subtree_root, has_global_root)

Bases: object

lineageot.inference.OT_cost(coupling, cost)
lineageot.inference.add_conditional_means_and_variances(tree, observed_nodes)

Adds the mean and variance of the posterior on ‘x’ for each of the unobserved nodes, conditional on the observed values of ‘x’ in observed_nodes, assuming that differences along edges are Gaussian with variance equal to the length of the edge.

In doing so, also adds inverse time annotations to edges.

If no nodes in tree are observed, inverse time annotations are added but conditional means and variances are not (as there is nothing to condition on).

lineageot.inference.add_division_times_from_vertex_times(tree, current_node='root')

Adds ‘time_to_parent’ variables to nodes, based on ‘time’ annotations

lineageot.inference.add_inverse_times_to_edges(tree)

Labels each edge of the tree with ‘inverse time’ equal to 1/edge[‘time’]

lineageot.inference.add_leaf_barcodes(tree, barcode_array)

Adds barcodes from barcode_array to the corresponding leaves of the tree

lineageot.inference.add_leaf_times(tree, final_time)

Adds the known final time to all leaves and 0 as the root time

lineageot.inference.add_leaf_x(tree, x_array)

Adds expression vectors from x_array to the corresponding leaves of the tree

lineageot.inference.add_node_times_from_dict(tree, current_node, time_dict)

Adds times from time_dict to current_node and its descendants

lineageot.inference.add_node_times_from_division_times(tree, current_node='root', overwrite=False)

Adds ‘time’ variable to all descendants of current_node based on the ‘time_to_parent’ variable

lineageot.inference.add_nodes_at_time(tree, time_to_add, current_node='root', num_nodes_added=0)

Splits every edge (u,v) where u[‘time’] < time_to_add < v[‘time’]

into (u, w) and (w, v) with w[‘time’] = time_to_add

Newly added nodes {w} are labeled as tuples (time_to_add, i)

The input tree should be annotated with node times already

lineageot.inference.add_samples_to_clone_tree(clone_matrix, clone_times, clone_reference_tree, sampling_time)

Adds a leaf for each row in clone_matrix to clone_reference_tree. The parent is set as the clone that the cell is a member of with the latest labeling time.

clone_reference_tree is edited in place rather than returned.

Parameters
  • clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.

  • clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.

  • clone_reference_tree – The tree of lineage relationships among clones.

  • sampling_time (Number) – The time of sampling of the cells. Should be greater than all clone labeling times.

lineageot.inference.add_times(tree, mutation_rates, time_inference_method, overwrite=False)

Adds estimated division times/edge lengths to a tree

The tree should already have all node barcodes estimated

lineageot.inference.add_times_to_edges(tree)

Labels each edge of tree with ‘time’ taken from ‘time_to_parent’ of its endpoint

lineageot.inference.annotate_tree(tree, mutation_rates, time_inference_method='independent', overwrite_times=False)

Adds barcodes and times to internal (ancestor) nodes so likelihoods can be computed

Barcodes are inferred by putting minimizing the number of mutation events required, assuming a model with no back mutations and a known initial barcode

lineageot.inference.barcode_distances(barcode_array)

Computes all pairwise lineage distances between barcodes

lineageot.inference.compute_leaf_times(tree, num_leaves)

Computes the list of times of the leaves by adding ‘time_to_parent’ along the path to ‘root’

lineageot.inference.compute_new_distances(distance_matrix, nodes_to_join)
lineageot.inference.compute_q_matrix(distance_matrix)

Computes the Q-matrix for neighbor joining

lineageot.inference.compute_tree_distances(tree)

Computes the matrix of pairwise distances between leaves of the tree

lineageot.inference.convert_newick_to_networkx(newick_tree, leaf_labels, leaf_time=None, root_label='root', unlabeled_nodes_added=0, at_global_root=True)

Converts a tree from the Newick package’s format to LineageOT’s annotated NetworkX DiGraph. Ignores existing annotations, except edge lengths.

Parameters
  • newick_tree (newick.Node or [newick.Node]) – A tree loaded by the Newick package.

  • leaf_labels (list) – The label of each leaf in the Newick tree, sorted to align with the gene expression AnnData object filtered to cells at the corresponding time

  • leaf_time (float (default None)) – The time of sampling of the leaves. If unspecified, the root of the tree is assigned time 0.

  • root_label (str (default 'root')) – The label of the root node of the tree

  • unlabeled_nodes_added (int (default 0)) – The number of previously-unlabeled nodes that have already been added to the tree. Leave as 0 for any top-level use of the function.

  • at_global_root (bool (default True)) – Whether the function is being called to convert a full tree or a subtree.

Returns

tree – The saved tree, in LineageOT’s format. Each node is annotated with ‘time_to_parent’ and ‘time’ (which indicates either the time of sampling (for observed cells) or the time of division (for unobserved ancestors)). Edges are directed from parent to child and are annotated with ‘time’ equal to the child node’s ‘time_to_parent’. Observed node indices correspond to their index in leaf_labels.

Return type

Networkx DiGraph

lineageot.inference.cvxopt_qp_from_numpy(P, q, G, h)

Converts arguments to cvxopt matrices and runs cvxopt’s quadratic programming solver

lineageot.inference.distances_to_joined_node(distance_matrix, nodes_to_join)
lineageot.inference.estimate_division_time(child, parent, mutation_rates)

Estimates the lifetime of child, i.e. the time between when parent divided to make child and when child divided

Input arguments are nodes in a lineage tree, i.e. dicts

lineageot.inference.extract_ancestor_data_arrays(late_tree, time, params)

Returns arrays of the RNA expression and barcodes for ancestors of leaves of the tree

Each row of each array is a leaf node

lineageot.inference.extract_data_arrays(tree)

Returns arrays of the RNA expression and barcodes from leaves of the tree

Each row of each array is a cell

lineageot.inference.find_parent_clone(clone, clone_matrix, clone_times)

Returns the parent of a subclone, assuming this is uniquely defined as the clone from an earlier time point whose barcode was observed in a cell from the subclone.

Parameters
  • clone (int) – Index of clone whose parent will be returned

  • clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.

  • clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.

Returns

parent – Index of parent clone.

Return type

int

lineageot.inference.get_ancestor_data(tree, time, leaf=None)
lineageot.inference.get_components(graph, edge_length_key='time')

Returns subgraph views corresponding to connected components of the graph if edges of infinite length are removed

Parameters
  • graph (NetworkX graph) –

  • edge_length_key (default 'time') –

Returns

subgraphs

Return type

List of NetworkX subgraph views

lineageot.inference.get_internal_nodes(tree)

Returns a list of the non-leaf nodes of a tree

lineageot.inference.get_leaf_descendants(tree, node)

Returns a list of the leaf nodes of the tree that are descendants of node

lineageot.inference.get_leaves(tree, include_root=True)

Returns a list of the leaf nodes of a tree including the root

lineageot.inference.get_lineage_distances_across_time(early_tree, late_tree)

Returns the matrix of lineage distances between leaves of early_tree and leaves in late_tree. Assumes that early_tree is a truncated version of late_tree

lineageot.inference.get_parent_clone_of_leaf(leaf, clone_matrix, clone_times)

Returns the index of the clone that the leaf is a member of with the latest labeling time.

lineageot.inference.get_true_coupling(early_tree, late_tree)

Returns the coupling between leaves of early_tree and their descendants in late_tree. Assumes that early_tree is a truncated version of late_tree

The marginal over the early cells is uniform; if cells have different numbers of descendants, the marginal over late cells will not be uniform.

lineageot.inference.join_nodes(node1, node2, new_root, distances)
lineageot.inference.list_tree_to_digraph(list_tree)

Converts a tree stored as nested lists to a networkx DiGraph

Internal nodes are indexed by negative integers, leaves by nonnegative integers

lineageot.inference.make_clone_reference_tree(clone_matrix, clone_times, root_time=- inf)

Makes a tree with nodes for each clone.

Parameters
  • clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.

  • clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.

  • root_time (Number, default -np.inf) – The time of the most recent common ancestor of all clones. If -np.inf, clone subtrees are effectively treated independently.

Returns

clone_reference_tree – A tree of clones (not sampled cells), annotated with edge and node times

Return type

NetworkX DiGraph

lineageot.inference.make_tree_from_clones(clone_matrix, time, clone_times, root_time=- inf)

Adds a leaf for each row in clone_matrix to clone_reference_tree. The parent is set as the clone that the cell is a member of with the latest labeling time.

clone_reference_tree is edited in place rather than returned.

Parameters
  • clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.

  • clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.

  • time (Number) – The time of sampling of cells.

  • root_time (Number, default -np.inf) – The time of the most recent common ancestor of all clones. If -np.inf, clones are effectively treated as unrelated

Returns

fitted_tree – A tree annotated with edge and node times

Return type

NetworkX DiGraph

lineageot.inference.make_tree_from_nonnested_clones(clone_matrix, time, root_time_factor=1000)

Creates a forest of stars from clonally-labeled data. The centers of the stars are connected to a root far in the past.

Parameters
  • clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise. Each cell should belong to exactly one clone.

  • time (Number) – The time of sampling of cells relative to initial clonal labelling.

  • root_time_factor (Number, default 1000) – Relative time to root of tree (i.e., most recent common ancestor of all cells). The time of the root is set to -root_time_factor*time. The default is large so minimal information is shared across clones.

Returns

fitted_tree – A tree annotated with edge and node times

Return type

NetworkX DiGraph

lineageot.inference.neighbor_join(distance_matrix)

Creates a tree by neighbor joining with the input distance matrix

Final row/column of distance_matrix assumed to correspond to the root (unmutated) barcode

lineageot.inference.pick_joined_nodes(Q)

In default neighbor joining, returns the indices of the pair of nodes with the lowest Q value

TODO: extend to allow stochastic neighbor joining

lineageot.inference.rate_estimator(barcode_array, time)

Estimates the mutation rate based on the number of unmutated barcodes remaining.

lineageot.inference.recursive_add_barcodes(tree, current_node)

Fills in the barcodes for internal nodes for a tree whose leaves have barcodes

Minimizes the number of mutation events that occur, assuming no backmutations and a known initial barcode

lineageot.inference.recursive_list_tree_to_digraph(list_tree, next_internal_node, next_leaf_node)

Recursive helper function for list_tree_to_digraph

Returns (current_tree, next_internal_node_label, root_of_current_tree)

lineageot.inference.remove_node_and_descendants(tree, node)

Removes a node and all its descendants from the tree

lineageot.inference.remove_times(tree)

Removes time annotations from nodes and edges of a tree

lineageot.inference.resample_cells(tree, params, current_node='root', inplace=False)

Runs a new simulation of the cell evolution on a fixed tree

lineageot.inference.robinson_foulds(tree1, tree2)

Computes the Robinson-Foulds distance between two trees

lineageot.inference.scaled_Hamming_distance(barcode1, barcode2)

Computes the distance between two barcodes, adjusted for

  1. the number of sites where both cells were measured and

  2. distance between two scars is twice the distance from

    scarred to unscarred

lineageot.inference.split_edge(tree, edge, new_node)
lineageot.inference.subtree_to_ete3(tree, current_root)

Converts the subtree from current_root to ete3 format

lineageot.inference.tree_accuracy(tree1, tree2)

Returns the fraction of nontrivial splits appearing in both trees

lineageot.inference.tree_discrepancy(tree1, tree2)

Computes a version of the Robinson-Foulds distance between two trees rescaled to be between 0 and 1

lineageot.inference.tree_to_ete3(tree)

Converts a tree to ete3 format. Useful for calculating Robinson-Foulds distance.

lineageot.inference.truncate_tree(tree, new_end_time, params, inplace=False, current_node='root', next_leaf_to_add=0)

Removes all nodes at times greater than new_end_time and adds new leaves at exactly new_end_time

params: simulation parameters used to create tree