Statistical Comparison

class pygna.statistical_comparison.StatisticalComparison(comparison_statistic, network: networkx.classes.graph.Graph, n_proc: int = 1, diz: dict = {}, degree_bins=1)

This class implements the statistical comparison between two genesets. See the documentation of each method for its return values.

comparison_empirical_pvalue(genesetA: set, genesetB: set, alternative: str = 'less', max_iter: int = 100, keep: bool = False) → [int, float, float, int, int]

Calculate the empirical p-value between two genesets

Parameters:	genesetA – the first geneset to compare
	genesetB – the second geneset to compare
	alternative – the alternative used to select the p-value for the observed genes (default 'less')
	max_iter – the maximum number of iterations
	keep – if the geneset B should not be kept
Returns:	observed, pvalue, null_distribution, len(mapped_genesetA), len(mapped_genesetB) – the list with the data calculated
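
As an illustration of how an empirical p-value can be derived from an observed statistic and a null distribution, here is a minimal sketch. The function name `empirical_pvalue` and the +1 correction are assumptions made for this example. They are not taken from pygna's source and may differ from what the method does internally.

```python
def empirical_pvalue(observed, null_distribution, alternative="less"):
    """Hypothetical helper: fraction of null values at least as extreme as `observed`.

    Uses the common (k + 1) / (n + 1) correction so the p-value is never 0.
    """
    n = len(null_distribution)
    if alternative == "less":
        # Count null values that are as small as, or smaller than, the observation.
        extreme = sum(1 for value in null_distribution if value <= observed)
    elif alternative == "greater":
        # Count null values that are as large as, or larger than, the observation.
        extreme = sum(1 for value in null_distribution if value >= observed)
    else:
        raise ValueError("alternative must be 'less' or 'greater'")
    return (extreme + 1) / (n + 1)


null = [3.0, 4.0, 5.0, 6.0]
p_less = empirical_pvalue(2.5, null, alternative="less")        # (0 + 1) / (4 + 1)
p_greater = empirical_pvalue(2.5, null, alternative="greater")  # (4 + 1) / (4 + 1)
print(p_less, p_greater)
```

With `alternative="less"`, a small p-value suggests the observed statistic is lower than expected under the null. With `"greater"`, it suggests the opposite.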

get_comparison_null_distribution(genesetA: list, genesetB: list, n_samples: int, keep: bool, sampling_p_a=None, sampling_p_b=None) → list

Calculate the null distribution between two genesets using a single CPU

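Parameters:	genesetA – the first geneset to compare
	genesetB – the second geneset to compare
	n_samples – the number of samples to be taken
	keep – if the geneset B should not be kept
	sampling_p_a – random sampling probability for geneset a
	sampling_p_b – random sampling probability for geneset b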
Returns:	the random distribution calculated
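
The idea of a null distribution built by resampling can be sketched as follows. This is a simplified illustration in plain Python, not pygna's implementation. The names `null_distribution`, `overlap` and `universe` are hypothetical, sampling is uniform, and the `keep` and `sampling_p_*` options are not modelled.

```python
import random


def null_distribution(statistic, universe, size_a, size_b, n_samples, seed=0):
    """Hypothetical helper: evaluate `statistic` on random geneset pairs.

    Each sample draws two random genesets of the same sizes as the originals.
    """
    rng = random.Random(seed)      # fixed seed so the sketch is reproducible
    universe = sorted(universe)    # rng.sample needs a sequence
    values = []
    for _ in range(n_samples):
        random_a = set(rng.sample(universe, size_a))
        random_b = set(rng.sample(universe, size_b))
        values.append(statistic(random_a, random_b))
    return values


def overlap(geneset_a, geneset_b):
    """Toy comparison statistic: number of shared genes."""
    return len(geneset_a & geneset_b)


null = null_distribution(overlap, range(20), size_a=5, size_b=5, n_samples=100)
print(len(null), min(null), max(null))
```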

get_comparison_null_distribution_mp(genesetA: list, genesetB: list, max_iter: int = 100, keep: bool = False, sampling_p_a=None, sampling_p_b=None) → numpy.ndarray

Calculate the null distribution between two genesets using multiple CPUs

Parameters:	genesetA – the first geneset to compare
	genesetB – the second geneset to compare
	max_iter – the maximum number of iterations to perform
	keep – if the geneset B should not be kept
	sampling_p_a – random sampling probability for geneset a
	sampling_p_b – random sampling probability for geneset b
Returns:	the array with null distribution
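
One way to picture a multi-CPU null distribution is to split the iterations into per-worker chunks and concatenate the partial results. The sketch below shows only that bookkeeping. The helpers `split_iterations` and `merge_chunks` are hypothetical, and how pygna actually distributes the work is not documented here.

```python
def split_iterations(max_iter, n_proc):
    """Hypothetical helper: divide `max_iter` iterations across `n_proc` workers."""
    base, extra = divmod(max_iter, n_proc)
    # The first `extra` workers take one additional iteration each.
    return [base + (1 if i < extra else 0) for i in range(n_proc)]


def merge_chunks(chunks):
    """Concatenate the per-worker partial null distributions into one list."""
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged


sizes = split_iterations(100, 3)
# Stand-in for the per-worker results; a real run would compute statistics here.
partial = [[0.0] * size for size in sizes]
print(sizes, len(merge_chunks(partial)))
```

In a real run, each chunk size could be handed to a worker process, for example through `multiprocessing.Pool.map`, and the returned lists merged in the same way.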

pygna.statistical_comparison.comparison_shortest_path(network: networkx.classes.graph.Graph, genesetA: set, genesetB: set, diz: dict) → float

Evaluate the shortest path between two genesets

Parameters:	network – the graph representing the network
	genesetA – the first geneset
	genesetB – the second geneset
	diz – the dictionary containing the node names and indices
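
A shortest-path comparison between two genesets can be read in several ways. The sketch below takes one plausible reading, the mean shortest-path length over all reachable pairs (a, b) with a in the first geneset and b in the second. It uses a plain adjacency dictionary and breadth-first search rather than networkx or the `diz` structure. The aggregation and the helper names are assumptions, not pygna's documented behaviour.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Unweighted shortest-path lengths from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def mean_shortest_path(adjacency, geneset_a, geneset_b):
    """Hypothetical statistic: mean distance over reachable (a, b) pairs."""
    total, count = 0, 0
    for a in geneset_a:
        distances = bfs_distances(adjacency, a)
        for b in geneset_b:
            if b in distances:  # skip pairs in different components
                total += distances[b]
                count += 1
    if count == 0:
        raise ValueError("no reachable pair between the two genesets")
    return total / count


# Path graph: g1 - g2 - g3 - g4
adjacency = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
print(mean_shortest_path(adjacency, {"g1"}, {"g3", "g4"}))  # (2 + 3) / 2
```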

pygna.statistical_comparison.calculate_sum(n: numpy.ndarray, m: numpy.ndarray, diz: dict) → numpy.ndarray

Evaluate the sum of the columns of two matrices

Parameters:	n – the first column
	m – the second column
	diz – the dictionary containing the data

pygna.statistical_comparison.comparison_random_walk(network: networkx.classes.graph.Graph, genesetA: list, genesetB: list, diz: dict = {}) → float

Evaluate the random walk on two genesets

Parameters:	network – the graph representing the network
	genesetA – the first geneset list
	genesetB – the second geneset list
	diz – the dictionary containing the node names and indices
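
To give a feel for a random-walk comparison, the sketch below runs a random walk with restart from each gene by simple iteration and sums the probability that walks started in one geneset land in the other, in both directions. The choice of random walk with restart, the restart value, the aggregation and the function names are all assumptions made for illustration. The documentation above does not specify how pygna computes this statistic.

```python
def rwr(adjacency, seed, restart=0.25, n_steps=100):
    """Random walk with restart from `seed`; returns visit probabilities.

    Assumes every node has at least one neighbour.
    """
    nodes = list(adjacency)
    p = {node: 1.0 if node == seed else 0.0 for node in nodes}
    for _ in range(n_steps):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {node: (restart if node == seed else 0.0) for node in nodes}
        for node in nodes:
            # Otherwise it moves to a uniformly chosen neighbour.
            share = (1.0 - restart) * p[node] / len(adjacency[node])
            for neighbour in adjacency[node]:
                nxt[neighbour] += share
        p = nxt
    return p


def random_walk_score(adjacency, geneset_a, geneset_b, restart=0.25):
    """Hypothetical statistic: probability mass flowing A -> B plus B -> A."""
    score = 0.0
    for a in geneset_a:
        p = rwr(adjacency, a, restart)
        score += sum(p[b] for b in geneset_b)
    for b in geneset_b:
        p = rwr(adjacency, b, restart)
        score += sum(p[a] for a in geneset_a)
    return score


# Path graph: g1 - g2 - g3 - g4
adjacency = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
probabilities = rwr(adjacency, "g1")
print(random_walk_score(adjacency, ["g1"], ["g2"]))
```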