Inference¶
This module contains the implementation of LineageOT used by the core functions.

class
lineageot.inference.
NeighborJoinNode
(subtree, subtree_root, has_global_root)¶ Bases:
object

lineageot.inference.
OT_cost
(coupling, cost)¶

lineageot.inference.
add_conditional_means_and_variances
(tree, observed_nodes)¶ Adds the mean and variance of the posterior on ‘x’ for each of the unobserved nodes, conditional on the observed values of ‘x’ in observed_nodes, assuming that differences along edges are Gaussian with variance equal to the length of the edge.
In doing so, also adds inverse time annotations to edges.
If no nodes in tree are observed, inverse time annotations are added but conditional means and variances are not (as there is nothing to condition on).

lineageot.inference.
add_division_times_from_vertex_times
(tree, current_node='root')¶ Adds ‘time_to_parent’ variables to nodes, based on ‘time’ annotations

lineageot.inference.
add_inverse_times_to_edges
(tree)¶ Labels each edge of the tree with ‘inverse time’ equal to 1/edge[‘time’]

lineageot.inference.
add_leaf_barcodes
(tree, barcode_array)¶ Adds barcodes from barcode_array to the corresponding leaves of the tree

lineageot.inference.
add_leaf_times
(tree, final_time)¶ Adds the known final time to all leaves and 0 as the root time

lineageot.inference.
add_leaf_x
(tree, x_array)¶ Adds expression vectors from x_array to the corresponding leaves of the tree

lineageot.inference.
add_node_times_from_dict
(tree, current_node, time_dict)¶ Adds times from time_dict to current_node and its descendants

lineageot.inference.
add_node_times_from_division_times
(tree, current_node='root', overwrite=False)¶ Adds ‘time’ variable to all descendants of current_node based on the ‘time_to_parent’ variable

lineageot.inference.
add_nodes_at_time
(tree, time_to_add, current_node='root', num_nodes_added=0)¶ Splits every edge (u,v) where u[‘time’] < time_to_add < v[‘time’]
into (u, w) and (w, v) with w[‘time’] = time_to_add
Newly added nodes {w} are labeled as tuples (time_to_add, i)
The input tree should be annotated with node times already

lineageot.inference.
add_samples_to_clone_tree
(clone_matrix, clone_times, clone_reference_tree, sampling_time)¶ Adds a leaf for each row in clone_matrix to clone_reference_tree. The parent is set as the clone that the cell is a member of with the latest labeling time.
clone_reference_tree is edited in place rather than returned.
 Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
clone_reference_tree – The tree of lineage relationships among clones.
sampling_time (Number) – The time of sampling of the cells. Should be greater than all clone labeling times.

lineageot.inference.
add_times
(tree, mutation_rates, time_inference_method, overwrite=False)¶ Adds estimated division times/edge lengths to a tree
The tree should already have all node barcodes estimated

lineageot.inference.
add_times_to_edges
(tree)¶ Labels each edge of tree with ‘time’ taken from ‘time_to_parent’ of its endpoint

lineageot.inference.
annotate_tree
(tree, mutation_rates, time_inference_method='independent', overwrite_times=False)¶ Adds barcodes and times to internal (ancestor) nodes so likelihoods can be computed
Barcodes are inferred by putting minimizing the number of mutation events required, assuming a model with no back mutations and a known initial barcode

lineageot.inference.
barcode_distances
(barcode_array)¶ Computes all pairwise lineage distances between barcodes

lineageot.inference.
compute_leaf_times
(tree, num_leaves)¶ Computes the list of times of the leaves by adding ‘time_to_parent’ along the path to ‘root’

lineageot.inference.
compute_new_distances
(distance_matrix, nodes_to_join)¶

lineageot.inference.
compute_q_matrix
(distance_matrix)¶ Computes the Qmatrix for neighbor joining

lineageot.inference.
compute_tree_distances
(tree)¶ Computes the matrix of pairwise distances between leaves of the tree

lineageot.inference.
convert_newick_to_networkx
(newick_tree, leaf_labels, leaf_time=None, root_label='root', unlabeled_nodes_added=0, at_global_root=True)¶ Converts a tree from the Newick package’s format to LineageOT’s annotated NetworkX DiGraph. Ignores existing annotations, except edge lengths.
 Parameters
newick_tree (newick.Node or [newick.Node]) – A tree loaded by the Newick package.
leaf_labels (list) – The label of each leaf in the Newick tree, sorted to align with the gene expression AnnData object filtered to cells at the corresponding time
leaf_time (float (default None)) – The time of sampling of the leaves. If unspecified, the root of the tree is assigned time 0.
root_label (str (default 'root')) – The label of the root node of the tree
unlabeled_nodes_added (int (default 0)) – The number of previouslyunlabeled nodes that have already been added to the tree. Leave as 0 for any toplevel use of the function.
at_global_root (bool (default True)) – Whether the function is being called to convert a full tree or a subtree.
 Returns
tree – The saved tree, in LineageOT’s format. Each node is annotated with ‘time_to_parent’ and ‘time’ (which indicates either the time of sampling (for observed cells) or the time of division (for unobserved ancestors)). Edges are directed from parent to child and are annotated with ‘time’ equal to the child node’s ‘time_to_parent’. Observed node indices correspond to their index in leaf_labels.
 Return type
Networkx DiGraph

lineageot.inference.
cvxopt_qp_from_numpy
(P, q, G, h)¶ Converts arguments to cvxopt matrices and runs cvxopt’s quadratic programming solver

lineageot.inference.
distances_to_joined_node
(distance_matrix, nodes_to_join)¶

lineageot.inference.
estimate_division_time
(child, parent, mutation_rates)¶ Estimates the lifetime of child, i.e. the time between when parent divided to make child and when child divided
Input arguments are nodes in a lineage tree, i.e. dicts

lineageot.inference.
extract_ancestor_data_arrays
(late_tree, time, params)¶ Returns arrays of the RNA expression and barcodes for ancestors of leaves of the tree
Each row of each array is a leaf node

lineageot.inference.
extract_data_arrays
(tree)¶ Returns arrays of the RNA expression and barcodes from leaves of the tree
Each row of each array is a cell

lineageot.inference.
find_parent_clone
(clone, clone_matrix, clone_times)¶ Returns the parent of a subclone, assuming this is uniquely defined as the clone from an earlier time point whose barcode was observed in a cell from the subclone.
 Parameters
clone (int) – Index of clone whose parent will be returned
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
 Returns
parent – Index of parent clone.
 Return type
int

lineageot.inference.
get_ancestor_data
(tree, time, leaf=None)¶

lineageot.inference.
get_components
(graph, edge_length_key='time')¶ Returns subgraph views corresponding to connected components of the graph if edges of infinite length are removed
 Parameters
graph (NetworkX graph) –
edge_length_key (default 'time') –
 Returns
subgraphs
 Return type
List of NetworkX subgraph views

lineageot.inference.
get_internal_nodes
(tree)¶ Returns a list of the nonleaf nodes of a tree

lineageot.inference.
get_leaf_descendants
(tree, node)¶ Returns a list of the leaf nodes of the tree that are descendants of node

lineageot.inference.
get_leaves
(tree, include_root=True)¶ Returns a list of the leaf nodes of a tree including the root

lineageot.inference.
get_lineage_distances_across_time
(early_tree, late_tree)¶ Returns the matrix of lineage distances between leaves of early_tree and leaves in late_tree. Assumes that early_tree is a truncated version of late_tree

lineageot.inference.
get_parent_clone_of_leaf
(leaf, clone_matrix, clone_times)¶ Returns the index of the clone that the leaf is a member of with the latest labeling time.

lineageot.inference.
get_true_coupling
(early_tree, late_tree)¶ Returns the coupling between leaves of early_tree and their descendants in late_tree. Assumes that early_tree is a truncated version of late_tree
The marginal over the early cells is uniform; if cells have different numbers of descendants, the marginal over late cells will not be uniform.

lineageot.inference.
join_nodes
(node1, node2, new_root, distances)¶

lineageot.inference.
list_tree_to_digraph
(list_tree)¶ Converts a tree stored as nested lists to a networkx DiGraph
Internal nodes are indexed by negative integers, leaves by nonnegative integers

lineageot.inference.
make_clone_reference_tree
(clone_matrix, clone_times, root_time= inf)¶ Makes a tree with nodes for each clone.
 Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
root_time (Number, default np.inf) – The time of the most recent common ancestor of all clones. If np.inf, clone subtrees are effectively treated independently.
 Returns
clone_reference_tree – A tree of clones (not sampled cells), annotated with edge and node times
 Return type
NetworkX DiGraph

lineageot.inference.
make_tree_from_clones
(clone_matrix, time, clone_times, root_time= inf)¶ Adds a leaf for each row in clone_matrix to clone_reference_tree. The parent is set as the clone that the cell is a member of with the latest labeling time.
clone_reference_tree is edited in place rather than returned.
 Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
time (Number) – The time of sampling of cells.
root_time (Number, default np.inf) – The time of the most recent common ancestor of all clones. If np.inf, clones are effectively treated as unrelated
 Returns
fitted_tree – A tree annotated with edge and node times
 Return type
NetworkX DiGraph

lineageot.inference.
make_tree_from_nonnested_clones
(clone_matrix, time, root_time_factor=1000)¶ Creates a forest of stars from clonallylabeled data. The centers of the stars are connected to a root far in the past.
 Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise. Each cell should belong to exactly one clone.
time (Number) – The time of sampling of cells relative to initial clonal labelling.
root_time_factor (Number, default 1000) – Relative time to root of tree (i.e., most recent common ancestor of all cells). The time of the root is set to root_time_factor*time. The default is large so minimal information is shared across clones.
 Returns
fitted_tree – A tree annotated with edge and node times
 Return type
NetworkX DiGraph

lineageot.inference.
neighbor_join
(distance_matrix)¶ Creates a tree by neighbor joining with the input distance matrix
Final row/column of distance_matrix assumed to correspond to the root (unmutated) barcode

lineageot.inference.
pick_joined_nodes
(Q)¶ In default neighbor joining, returns the indices of the pair of nodes with the lowest Q value
TODO: extend to allow stochastic neighbor joining

lineageot.inference.
rate_estimator
(barcode_array, time)¶ Estimates the mutation rate based on the number of unmutated barcodes remaining.

lineageot.inference.
recursive_add_barcodes
(tree, current_node)¶ Fills in the barcodes for internal nodes for a tree whose leaves have barcodes
Minimizes the number of mutation events that occur, assuming no backmutations and a known initial barcode

lineageot.inference.
recursive_list_tree_to_digraph
(list_tree, next_internal_node, next_leaf_node)¶ Recursive helper function for list_tree_to_digraph
Returns (current_tree, next_internal_node_label, root_of_current_tree)

lineageot.inference.
remove_node_and_descendants
(tree, node)¶ Removes a node and all its descendants from the tree

lineageot.inference.
remove_times
(tree)¶ Removes time annotations from nodes and edges of a tree

lineageot.inference.
resample_cells
(tree, params, current_node='root', inplace=False)¶ Runs a new simulation of the cell evolution on a fixed tree

lineageot.inference.
robinson_foulds
(tree1, tree2)¶ Computes the RobinsonFoulds distance between two trees

lineageot.inference.
scaled_Hamming_distance
(barcode1, barcode2)¶ Computes the distance between two barcodes, adjusted for
the number of sites where both cells were measured and
distance between two scars is twice the distance from
scarred to unscarred

lineageot.inference.
split_edge
(tree, edge, new_node)¶

lineageot.inference.
subtree_to_ete3
(tree, current_root)¶ Converts the subtree from current_root to ete3 format

lineageot.inference.
tree_accuracy
(tree1, tree2)¶ Returns the fraction of nontrivial splits appearing in both trees

lineageot.inference.
tree_discrepancy
(tree1, tree2)¶ Computes a version of the RobinsonFoulds distance between two trees rescaled to be between 0 and 1

lineageot.inference.
tree_to_ete3
(tree)¶ Converts a tree to ete3 format. Useful for calculating RobinsonFoulds distance.

lineageot.inference.
truncate_tree
(tree, new_end_time, params, inplace=False, current_node='root', next_leaf_to_add=0)¶ Removes all nodes at times greater than new_end_time and adds new leaves at exactly new_end_time
params: simulation parameters used to create tree