Inference¶
This module contains the implementation of LineageOT used by the core functions.
-
class
lineageot.inference.
NeighborJoinNode
(subtree, subtree_root, has_global_root)¶ Bases:
object
-
lineageot.inference.
OT_cost
(coupling, cost)¶
-
lineageot.inference.
add_conditional_means_and_variances
(tree, observed_nodes)¶ Adds the mean and variance of the posterior on ‘x’ for each of the unobserved nodes, conditional on the observed values of ‘x’ in observed_nodes, assuming that differences along edges are Gaussian with variance equal to the length of the edge.
In doing so, also adds inverse time annotations to edges.
If no nodes in tree are observed, inverse time annotations are added but conditional means and variances are not (as there is nothing to condition on).
-
lineageot.inference.
add_division_times_from_vertex_times
(tree, current_node='root')¶ Adds ‘time_to_parent’ variables to nodes, based on ‘time’ annotations
-
lineageot.inference.
add_inverse_times_to_edges
(tree)¶ Labels each edge of the tree with ‘inverse time’ equal to 1/edge[‘time’]
-
lineageot.inference.
add_leaf_barcodes
(tree, barcode_array)¶ Adds barcodes from barcode_array to the corresponding leaves of the tree
-
lineageot.inference.
add_leaf_times
(tree, final_time)¶ Adds the known final time to all leaves and 0 as the root time
-
lineageot.inference.
add_leaf_x
(tree, x_array)¶ Adds expression vectors from x_array to the corresponding leaves of the tree
-
lineageot.inference.
add_node_times_from_dict
(tree, current_node, time_dict)¶ Adds times from time_dict to current_node and its descendants
-
lineageot.inference.
add_node_times_from_division_times
(tree, current_node='root', overwrite=False)¶ Adds ‘time’ variable to all descendants of current_node based on the ‘time_to_parent’ variable
-
lineageot.inference.
add_nodes_at_time
(tree, time_to_add, current_node='root', num_nodes_added=0)¶ Splits every edge (u,v) where u[‘time’] < time_to_add < v[‘time’]
into (u, w) and (w, v) with w[‘time’] = time_to_add
Newly added nodes {w} are labeled as tuples (time_to_add, i)
The input tree should be annotated with node times already
-
lineageot.inference.
add_samples_to_clone_tree
(clone_matrix, clone_times, clone_reference_tree, sampling_time)¶ Adds a leaf for each row in clone_matrix to clone_reference_tree. The parent is set as the clone that the cell is a member of with the latest labeling time.
clone_reference_tree is edited in place rather than returned.
- Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
clone_reference_tree – The tree of lineage relationships among clones.
sampling_time (Number) – The time of sampling of the cells. Should be greater than all clone labeling times.
-
lineageot.inference.
add_times
(tree, mutation_rates, time_inference_method, overwrite=False)¶ Adds estimated division times/edge lengths to a tree
The tree should already have all node barcodes estimated
-
lineageot.inference.
add_times_to_edges
(tree)¶ Labels each edge of tree with ‘time’ taken from ‘time_to_parent’ of its endpoint
-
lineageot.inference.
annotate_tree
(tree, mutation_rates, time_inference_method='independent', overwrite_times=False)¶ Adds barcodes and times to internal (ancestor) nodes so likelihoods can be computed
Barcodes are inferred by putting minimizing the number of mutation events required, assuming a model with no back mutations and a known initial barcode
-
lineageot.inference.
barcode_distances
(barcode_array)¶ Computes all pairwise lineage distances between barcodes
-
lineageot.inference.
compute_leaf_times
(tree, num_leaves)¶ Computes the list of times of the leaves by adding ‘time_to_parent’ along the path to ‘root’
-
lineageot.inference.
compute_new_distances
(distance_matrix, nodes_to_join)¶
-
lineageot.inference.
compute_q_matrix
(distance_matrix)¶ Computes the Q-matrix for neighbor joining
-
lineageot.inference.
compute_tree_distances
(tree)¶ Computes the matrix of pairwise distances between leaves of the tree
-
lineageot.inference.
convert_newick_to_networkx
(newick_tree, leaf_labels, leaf_time=None, root_label='root', unlabeled_nodes_added=0, at_global_root=True)¶ Converts a tree from the Newick package’s format to LineageOT’s annotated NetworkX DiGraph. Ignores existing annotations, except edge lengths.
- Parameters
newick_tree (newick.Node or [newick.Node]) – A tree loaded by the Newick package.
leaf_labels (list) – The label of each leaf in the Newick tree, sorted to align with the gene expression AnnData object filtered to cells at the corresponding time
leaf_time (float (default None)) – The time of sampling of the leaves. If unspecified, the root of the tree is assigned time 0.
root_label (str (default 'root')) – The label of the root node of the tree
unlabeled_nodes_added (int (default 0)) – The number of previously-unlabeled nodes that have already been added to the tree. Leave as 0 for any top-level use of the function.
at_global_root (bool (default True)) – Whether the function is being called to convert a full tree or a subtree.
- Returns
tree – The saved tree, in LineageOT’s format. Each node is annotated with ‘time_to_parent’ and ‘time’ (which indicates either the time of sampling (for observed cells) or the time of division (for unobserved ancestors)). Edges are directed from parent to child and are annotated with ‘time’ equal to the child node’s ‘time_to_parent’. Observed node indices correspond to their index in leaf_labels.
- Return type
Networkx DiGraph
-
lineageot.inference.
cvxopt_qp_from_numpy
(P, q, G, h)¶ Converts arguments to cvxopt matrices and runs cvxopt’s quadratic programming solver
-
lineageot.inference.
distances_to_joined_node
(distance_matrix, nodes_to_join)¶
-
lineageot.inference.
estimate_division_time
(child, parent, mutation_rates)¶ Estimates the lifetime of child, i.e. the time between when parent divided to make child and when child divided
Input arguments are nodes in a lineage tree, i.e. dicts
-
lineageot.inference.
extract_ancestor_data_arrays
(late_tree, time, params)¶ Returns arrays of the RNA expression and barcodes for ancestors of leaves of the tree
Each row of each array is a leaf node
-
lineageot.inference.
extract_data_arrays
(tree)¶ Returns arrays of the RNA expression and barcodes from leaves of the tree
Each row of each array is a cell
-
lineageot.inference.
find_parent_clone
(clone, clone_matrix, clone_times)¶ Returns the parent of a subclone, assuming this is uniquely defined as the clone from an earlier time point whose barcode was observed in a cell from the subclone.
- Parameters
clone (int) – Index of clone whose parent will be returned
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
- Returns
parent – Index of parent clone.
- Return type
int
-
lineageot.inference.
get_ancestor_data
(tree, time, leaf=None)¶
-
lineageot.inference.
get_components
(graph, edge_length_key='time')¶ Returns subgraph views corresponding to connected components of the graph if edges of infinite length are removed
- Parameters
graph (NetworkX graph) –
edge_length_key (default 'time') –
- Returns
subgraphs
- Return type
List of NetworkX subgraph views
-
lineageot.inference.
get_internal_nodes
(tree)¶ Returns a list of the non-leaf nodes of a tree
-
lineageot.inference.
get_leaf_descendants
(tree, node)¶ Returns a list of the leaf nodes of the tree that are descendants of node
-
lineageot.inference.
get_leaves
(tree, include_root=True)¶ Returns a list of the leaf nodes of a tree including the root
-
lineageot.inference.
get_lineage_distances_across_time
(early_tree, late_tree)¶ Returns the matrix of lineage distances between leaves of early_tree and leaves in late_tree. Assumes that early_tree is a truncated version of late_tree
-
lineageot.inference.
get_parent_clone_of_leaf
(leaf, clone_matrix, clone_times)¶ Returns the index of the clone that the leaf is a member of with the latest labeling time.
-
lineageot.inference.
get_true_coupling
(early_tree, late_tree)¶ Returns the coupling between leaves of early_tree and their descendants in late_tree. Assumes that early_tree is a truncated version of late_tree
The marginal over the early cells is uniform; if cells have different numbers of descendants, the marginal over late cells will not be uniform.
-
lineageot.inference.
join_nodes
(node1, node2, new_root, distances)¶
-
lineageot.inference.
list_tree_to_digraph
(list_tree)¶ Converts a tree stored as nested lists to a networkx DiGraph
Internal nodes are indexed by negative integers, leaves by nonnegative integers
-
lineageot.inference.
make_clone_reference_tree
(clone_matrix, clone_times, root_time=- inf)¶ Makes a tree with nodes for each clone.
- Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
root_time (Number, default -np.inf) – The time of the most recent common ancestor of all clones. If -np.inf, clone subtrees are effectively treated independently.
- Returns
clone_reference_tree – A tree of clones (not sampled cells), annotated with edge and node times
- Return type
NetworkX DiGraph
-
lineageot.inference.
make_tree_from_clones
(clone_matrix, time, clone_times, root_time=- inf)¶ Adds a leaf for each row in clone_matrix to clone_reference_tree. The parent is set as the clone that the cell is a member of with the latest labeling time.
clone_reference_tree is edited in place rather than returned.
- Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise.
clone_times (Vector of length num_clones) – Each entry has the time of labeling of the corresponding clone.
time (Number) – The time of sampling of cells.
root_time (Number, default -np.inf) – The time of the most recent common ancestor of all clones. If -np.inf, clones are effectively treated as unrelated
- Returns
fitted_tree – A tree annotated with edge and node times
- Return type
NetworkX DiGraph
-
lineageot.inference.
make_tree_from_nonnested_clones
(clone_matrix, time, root_time_factor=1000)¶ Creates a forest of stars from clonally-labeled data. The centers of the stars are connected to a root far in the past.
- Parameters
clone_matrix (Boolean array with shape [num_cells, num_clones]) – Each entry is 1 if the corresponding cell belongs to the corresponding clone and zero otherwise. Each cell should belong to exactly one clone.
time (Number) – The time of sampling of cells relative to initial clonal labelling.
root_time_factor (Number, default 1000) – Relative time to root of tree (i.e., most recent common ancestor of all cells). The time of the root is set to -root_time_factor*time. The default is large so minimal information is shared across clones.
- Returns
fitted_tree – A tree annotated with edge and node times
- Return type
NetworkX DiGraph
-
lineageot.inference.
neighbor_join
(distance_matrix)¶ Creates a tree by neighbor joining with the input distance matrix
Final row/column of distance_matrix assumed to correspond to the root (unmutated) barcode
-
lineageot.inference.
pick_joined_nodes
(Q)¶ In default neighbor joining, returns the indices of the pair of nodes with the lowest Q value
TODO: extend to allow stochastic neighbor joining
-
lineageot.inference.
rate_estimator
(barcode_array, time)¶ Estimates the mutation rate based on the number of unmutated barcodes remaining.
-
lineageot.inference.
recursive_add_barcodes
(tree, current_node)¶ Fills in the barcodes for internal nodes for a tree whose leaves have barcodes
Minimizes the number of mutation events that occur, assuming no backmutations and a known initial barcode
-
lineageot.inference.
recursive_list_tree_to_digraph
(list_tree, next_internal_node, next_leaf_node)¶ Recursive helper function for list_tree_to_digraph
Returns (current_tree, next_internal_node_label, root_of_current_tree)
-
lineageot.inference.
remove_node_and_descendants
(tree, node)¶ Removes a node and all its descendants from the tree
-
lineageot.inference.
remove_times
(tree)¶ Removes time annotations from nodes and edges of a tree
-
lineageot.inference.
resample_cells
(tree, params, current_node='root', inplace=False)¶ Runs a new simulation of the cell evolution on a fixed tree
-
lineageot.inference.
robinson_foulds
(tree1, tree2)¶ Computes the Robinson-Foulds distance between two trees
-
lineageot.inference.
scaled_Hamming_distance
(barcode1, barcode2)¶ Computes the distance between two barcodes, adjusted for
the number of sites where both cells were measured and
distance between two scars is twice the distance from
scarred to unscarred
-
lineageot.inference.
split_edge
(tree, edge, new_node)¶
-
lineageot.inference.
subtree_to_ete3
(tree, current_root)¶ Converts the subtree from current_root to ete3 format
-
lineageot.inference.
tree_accuracy
(tree1, tree2)¶ Returns the fraction of nontrivial splits appearing in both trees
-
lineageot.inference.
tree_discrepancy
(tree1, tree2)¶ Computes a version of the Robinson-Foulds distance between two trees rescaled to be between 0 and 1
-
lineageot.inference.
tree_to_ete3
(tree)¶ Converts a tree to ete3 format. Useful for calculating Robinson-Foulds distance.
-
lineageot.inference.
truncate_tree
(tree, new_end_time, params, inplace=False, current_node='root', next_leaf_to_add=0)¶ Removes all nodes at times greater than new_end_time and adds new leaves at exactly new_end_time
params: simulation parameters used to create tree