Statistical Comparison

class pygna.statistical_comparison.StatisticalComparison(comparison_statistic, network: networkx.classes.graph.Graph, n_proc: int = 1, diz: dict = {}, degree_bins=1)

This class implements the statistical comparison between two genesets. Refer to the documentation of each method for its return values.

comparison_empirical_pvalue(genesetA: set, genesetB: set, alternative: str = 'less', max_iter: int = 100, keep: bool = False) → [int, float, float, int, int]

Calculate the empirical p-value of the comparison between two genesets

Parameters:
  • genesetA – the first geneset to compare
  • genesetB – the second geneset to compare
  • alternative – which tail of the null distribution the observed value is tested against (default 'less')
  • max_iter – the maximum number of iterations
  • keep – whether geneset B is kept fixed (not resampled) when building the null distribution
Returns:

the list [observed, pvalue, null_distribution, len(mapped_genesetA), len(mapped_genesetB)]
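The relationship between `observed`, `null_distribution` and `alternative` can be illustrated with a minimal sketch. It assumes a common permutation-test convention (a +1 pseudo-count so the p-value is never zero) and two accepted values, 'less' and 'greater'; the helper name `empirical_pvalue` is hypothetical, and this is not necessarily how the method computes its result.

```python
def empirical_pvalue(observed, null_distribution, alternative="less"):
    """Hypothetical helper: one-sided empirical p-value of `observed`.

    Assumption: the +1 pseudo-count convention for permutation tests.
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    if alternative == "less":
        # Count null values at least as small as the observed one.
        extreme = sum(1 for x in null_distribution if x <= observed)
    else:
        # Count null values at least as large as the observed one.
        extreme = sum(1 for x in null_distribution if x >= observed)
    return (extreme + 1) / (len(null_distribution) + 1)


null = [3.0, 4.0, 5.0, 6.0]
p_less = empirical_pvalue(2.5, null, alternative="less")        # no null value is <= 2.5
p_greater = empirical_pvalue(2.5, null, alternative="greater")  # all four are >= 2.5
```

With this convention the smallest attainable p-value is 1 / (len(null_distribution) + 1), so a larger `max_iter` gives finer resolution.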

get_comparison_null_distribution(genesetA: list, genesetB: list, n_samples: int, keep: bool, sampling_p_a=None, sampling_p_b=None) → list

Calculate the null distribution between two genesets using a single CPU

Parameters:
  • genesetA – the first geneset to compare
  • genesetB – the second geneset to compare
  • n_samples – the number of samples to be taken
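  • keep – whether geneset B is kept fixed (not resampled) when building the null distribution
  • sampling_p_a – random sampling probability for geneset a
  • sampling_p_b – random sampling probability for geneset b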
Returns:

the list with the null distribution
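How a null distribution of this kind can be built is sketched below. This is an illustration under stated assumptions, not the library's implementation: random genesets are drawn uniformly without replacement from a `universe` of nodes, they match the sizes of the original genesets, and `keep=True` reuses geneset B unchanged. The names `null_distribution`, `overlap`, `universe` and `seed` are hypothetical, and the sampling probabilities `sampling_p_a` / `sampling_p_b` are not modelled.

```python
import random


def null_distribution(statistic, genesetA, genesetB, universe, n_samples,
                      keep=False, seed=None):
    """Hypothetical sketch: resample genesets and re-evaluate `statistic`.

    Assumptions: uniform sampling without replacement from `universe`,
    random sets have the same size as the originals, and geneset B is
    reused unchanged in every iteration when `keep` is True.
    """
    rng = random.Random(seed)
    nodes = sorted(universe)  # fixed order so a seed gives reproducible draws
    values = []
    for _ in range(n_samples):
        sample_a = set(rng.sample(nodes, len(genesetA)))
        if keep:
            sample_b = set(genesetB)
        else:
            sample_b = set(rng.sample(nodes, len(genesetB)))
        values.append(statistic(sample_a, sample_b))
    return values


def overlap(a, b):
    """Toy statistic for illustration only: number of shared genes."""
    return len(a & b)


universe = {"g%d" % i for i in range(20)}
null = null_distribution(overlap, {"g1", "g2", "g3"}, {"g2", "g3", "g4", "g5"},
                         universe, n_samples=50, keep=True, seed=0)
```

The resulting list has one value per iteration and can be compared against the statistic observed on the real genesets.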

get_comparison_null_distribution_mp(genesetA: list, genesetB: list, max_iter: int = 100, keep: bool = False, sampling_p_a=None, sampling_p_b=None) → numpy.ndarray

Calculate the null distribution between two genesets with multiple CPUs

Parameters:
  • genesetA – the first geneset to compare
  • genesetB – the second geneset to compare
  • max_iter – the maximum number of iterations to perform
  • keep – whether geneset B is kept fixed (not resampled) when building the null distribution
  • sampling_p_a – random sampling probability for geneset a
  • sampling_p_b – random sampling probability for geneset b
Returns:

the array with null distribution

pygna.statistical_comparison.comparison_shortest_path(network: networkx.classes.graph.Graph, genesetA: set, genesetB: set, diz: dict) → float

Evaluate the shortest-path comparison statistic between two genesets

Parameters:
  • network – the graph representing the network
  • genesetA – the first geneset
  • genesetB – the second geneset
  • diz – the dictionary containing the node names and indices
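A shortest-path comparison between two sets of nodes can be sketched as follows. The documentation above does not state the exact formula, so the statistic here (the mean shortest-path length over all reachable pairs, on an unweighted graph stored as an adjacency dictionary) is an assumption chosen for illustration, and the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def mean_shortest_path(adjacency, genesetA, genesetB):
    """Hypothetical statistic: mean distance over all reachable (a, b) pairs."""
    lengths = []
    for a in sorted(genesetA):
        dist = bfs_distances(adjacency, a)
        lengths.extend(dist[b] for b in sorted(genesetB) if b in dist)
    if not lengths:
        raise ValueError("no reachable pairs between the two genesets")
    return sum(lengths) / len(lengths)


# Path graph: a - b - c - d
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
score = mean_shortest_path(adjacency, {"a", "b"}, {"c", "d"})
```

On the path graph the four pair distances are 2, 3, 1 and 2, so the mean is 2.0; smaller values indicate genesets that sit closer together in the network.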

pygna.statistical_comparison.calculate_sum(n: numpy.ndarray, m: numpy.ndarray, diz: dict) → numpy.ndarray

Evaluate the sum of the columns of two matrices

Parameters:
  • n – the first column
  • m – the second column
  • diz – the dictionary containing the data

pygna.statistical_comparison.comparison_random_walk(network: networkx.classes.graph.Graph, genesetA: list, genesetB: list, diz: dict = {}) → float

Evaluate the random-walk comparison statistic between two genesets

Parameters:
  • network – the graph representing the network
  • genesetA – the first geneset list
  • genesetB – the second geneset list
  • diz – the dictionary containing the node names and indices
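The role of a node-name-to-index dictionary in a matrix-based comparison can be sketched as below. The layout of `diz` is not described above, so the keys "nodes" and "matrix", the averaging formula, and the function name `mean_submatrix` are all assumptions made for illustration; the matrix values are made-up numbers standing in for precomputed random-walk probabilities.

```python
def mean_submatrix(matrix, nodes, genesetA, genesetB):
    """Hypothetical: average matrix[i][j] for i in geneset A and j in geneset B.

    Genes that do not appear in `nodes` are ignored (treated as unmapped).
    """
    index = {name: i for i, name in enumerate(nodes)}
    rows = [index[g] for g in sorted(genesetA) if g in index]
    cols = [index[g] for g in sorted(genesetB) if g in index]
    if not rows or not cols:
        return 0.0
    total = sum(matrix[i][j] for i in rows for j in cols)
    return total / (len(rows) * len(cols))


# Assumed layout: node names plus a square matrix indexed in the same order.
diz = {
    "nodes": ["a", "b", "c"],
    "matrix": [
        [0.5, 0.3, 0.2],
        [0.3, 0.4, 0.3],
        [0.2, 0.3, 0.5],
    ],
}
# "zz" is not in the network, so it is dropped before averaging.
score = mean_submatrix(diz["matrix"], diz["nodes"], ["a"], ["b", "c", "zz"])
```

Keeping the names and the matrix in one dictionary lets the lookup be done once per geneset instead of searching the network for every pair.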