Converters

class pygna.converters.Converters

This class wraps static methods that can be used to convert data from one format to another. Please refer to each class method for its specific function.

classmethod convert_e2s(geneset: pandas.core.frame.DataFrame, tsv_data: pandas.core.frame.DataFrame, entrez_col: str = 'NCBI Gene ID', symbol_col: str = 'Approved symbol') → list

Method to convert Entrez gene IDs to gene symbols.

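Parameters:
  • geneset – the Entrez IDs to convert (for example, the "genes" entry of a GMT set)
  • tsv_data – the conversion table, containing one column of Entrez IDs and one of symbols
  • entrez_col – the column of tsv_data containing the Entrez IDs
  • symbol_col – the column of tsv_data containing the symbols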
Returns:

list containing the converted gene symbols (strings)

Example

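>>> import pygna.reading_class as rc
>>> from pygna.converters import Converters
>>> gmt_data = rc.ReadGmt(".gmt", True).get_data()
>>> converted = {}  # dict keyed by set name
>>> for k, d in gmt_data.items():
...     converted[k] = Converters.convert_e2s(d["genes"], tsv_data, entrez_col, symbol_col)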
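
Conceptually, the Entrez-to-symbol conversion is a lookup in the mapping table. The following is a minimal, stdlib-only sketch of that idea under stated assumptions, not pygna's own implementation: the mapping table is modelled as a list of row dicts instead of a DataFrame, the helper name `entrez_to_symbol` is hypothetical, and IDs missing from the table are simply skipped (this reference does not say how pygna treats them).

```python
from typing import Dict, Iterable, List


def entrez_to_symbol(
    genes: Iterable[str],
    table: List[Dict[str, str]],
    entrez_col: str = "NCBI Gene ID",
    symbol_col: str = "Approved symbol",
) -> List[str]:
    """Hypothetical helper: map Entrez IDs to symbols using a row-based table."""
    # Build an Entrez ID -> symbol lookup from the table rows.
    lookup = {str(row[entrez_col]): row[symbol_col] for row in table}
    # Keep only the IDs found in the table, preserving input order.
    return [lookup[str(g)] for g in genes if str(g) in lookup]


table = [
    {"NCBI Gene ID": "7157", "Approved symbol": "TP53"},
    {"NCBI Gene ID": "672", "Approved symbol": "BRCA1"},
]
print(entrez_to_symbol(["7157", "672", "999999"], table))  # ['TP53', 'BRCA1']
```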

classmethod convert_s2e(geneset: pandas.core.frame.DataFrame, tsv_data: pandas.core.frame.DataFrame, entrez_col: str = 'NCBI Gene ID', symbol_col: str = 'Approved symbol') → list

Method to convert gene symbols to Entrez IDs.

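Parameters:
  • geneset – the gene symbols to convert (for example, the "genes" entry of a GMT set)
  • tsv_data – the conversion table, containing one column of Entrez IDs and one of symbols
  • entrez_col – the column of tsv_data containing the Entrez IDs
  • symbol_col – the column of tsv_data containing the symbols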
Returns:

list containing the converted Entrez IDs

Example

>>> import pygna.reading_class as rc
>>> from pygna.converters import Converters
>>> gmt_data = rc.ReadGmt(".gmt", True).get_data()
>>> converted = {}  # dict keyed by set name
>>> for k, d in gmt_data.items():
...     converted[k] = Converters.convert_s2e(d["genes"], tsv_data, entrez_col, symbol_col)
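
The reverse direction can be sketched the same way by inverting the lookup. This hypothetical `symbol_to_entrez` helper (not part of pygna) also returns the symbols it could not match, which can help check how much of a gene set was lost in the conversion. Modelling the table as a list of row dicts is again an assumption made to keep the example self-contained.

```python
from typing import Dict, Iterable, List, Tuple


def symbol_to_entrez(
    genes: Iterable[str],
    table: List[Dict[str, str]],
    entrez_col: str = "NCBI Gene ID",
    symbol_col: str = "Approved symbol",
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: map symbols to Entrez IDs, also returning unmatched symbols."""
    # Invert the direction of the lookup: symbol -> Entrez ID.
    lookup = {row[symbol_col]: str(row[entrez_col]) for row in table}
    converted: List[str] = []
    missing: List[str] = []
    for symbol in genes:
        if symbol in lookup:
            converted.append(lookup[symbol])
        else:
            missing.append(symbol)
    return converted, missing


table = [
    {"NCBI Gene ID": "7157", "Approved symbol": "TP53"},
    {"NCBI Gene ID": "672", "Approved symbol": "BRCA1"},
]
ids, unmatched = symbol_to_entrez(["BRCA1", "TP53", "NOTAGENE"], table)
print(ids, unmatched)  # ['672', '7157'] ['NOTAGENE']
```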

class pygna.converters.CsvToCsvEnriched(csv_file: pandas.core.frame.DataFrame, conversion: str, original_name_col: str, new_name_col: str, geneset: str, entrez_col: str, symbol_col: str, converter_map_filename: str = 'entrez_name.tsv', output_file: str = None)

Class that adds a column with the Entrez IDs or the gene symbols to a CSV file.

get_data() → pandas.core.frame.DataFrame

Return the conversion result

Returns:

dataframe with the converted names (e2s or s2e) added as a new column
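
As a rough illustration of what adding a converted column means, the sketch below appends a column of converted names to CSV text using only the standard library. The function `add_converted_column` and the choice to leave unmatched names as empty cells are assumptions for illustration; the real class takes a `conversion` mode and a mapping file rather than a ready-made dictionary.

```python
import csv
import io
from typing import Dict


def add_converted_column(
    csv_text: str,
    lookup: Dict[str, str],
    original_name_col: str,
    new_name_col: str,
) -> str:
    """Hypothetical sketch: append a column with converted gene names to CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [new_name_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Unmatched names get an empty cell rather than being dropped.
        row[new_name_col] = lookup.get(row[original_name_col], "")
        writer.writerow(row)
    return out.getvalue()


csv_text = "gene,pvalue\nTP53,0.01\nFOO,0.5\n"
print(add_converted_column(csv_text, {"TP53": "7157"}, "gene", "entrez"), end="")
```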

class pygna.converters.CsvToGmt(input_file: str, setname: str, filter_column: str, alternative: str, threshold: float, output_gmt: str = None, output_csv: str = None, name_column: str = 'Unnamed: 0', descriptor: str = None)

This class converts a CSV file to a GMT file, filtering the elements by the values of one of the columns. The user can specify the column used to retrieve the names of the objects and the filter condition. The output can be a GMT file with the names of the genes that pass the filter, a CSV file with the whole filtered table, or both.
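
The filtering step can be pictured as follows. This is a minimal sketch, not the class's implementation: `csv_to_gmt_line` is a hypothetical helper, and it assumes `alternative` takes the values `"less"` or `"greater"` with a strict comparison against `threshold`, which this reference does not confirm. The returned line follows the usual GMT layout: set name, description, then one gene per tab-separated field.

```python
import csv
import io


def csv_to_gmt_line(
    csv_text: str,
    setname: str,
    filter_column: str,
    alternative: str,
    threshold: float,
    name_column: str,
    descriptor: str = "",
) -> str:
    """Hypothetical sketch: keep rows passing a threshold and emit one GMT line."""
    # Assumed values for `alternative` in this sketch only.
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater' in this sketch")
    genes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row[filter_column])
        passes = value < threshold if alternative == "less" else value > threshold
        if passes:
            genes.append(row[name_column])
    # GMT line: set name, description, then one gene per tab-separated field.
    return "\t".join([setname, descriptor] + genes)


csv_text = "gene,padj\nTP53,0.001\nBRCA1,0.2\nEGFR,0.03\n"
print(csv_to_gmt_line(csv_text, "degs", "padj", "less", 0.05, "gene", "padj<0.05"))
```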

class pygna.converters.GmtToGmtEnriched(gmt_file: str, output_gmt_file: str, conversion: str, entrez_col: str, symbol_col: str, converter_map_filename: str = 'entrez_name.tsv')

This class converts a GMT file, adding information about the Entrez IDs or the gene symbols.
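
A GMT conversion of this kind can be sketched as a line-by-line rewrite. In this hypothetical `convert_gmt` helper, each set keeps its name and description while its genes are replaced by their converted identifiers, and unmatched genes are dropped. Whether pygna replaces, appends, or keeps unmatched identifiers is not stated here, so treat these as assumptions.

```python
from typing import Dict


def convert_gmt(gmt_text: str, lookup: Dict[str, str]) -> str:
    """Hypothetical sketch: rewrite every GMT set with converted gene identifiers."""
    out_lines = []
    for line in gmt_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # GMT fields: set name, description, then the genes.
        name, description, *genes = line.split("\t")
        # Unmatched genes are dropped in this sketch.
        converted = [lookup[g] for g in genes if g in lookup]
        out_lines.append("\t".join([name, description] + converted))
    return "\n".join(out_lines) + "\n"


gmt_text = "set_a\tdesc\t7157\t672\nset_b\tdesc\t999999\n"
print(convert_gmt(gmt_text, {"7157": "TP53", "672": "BRCA1"}), end="")
```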

class pygna.converters.GroupGmt(input_table: str, output_gmt: str, name_col: str = 'Gene', group_col: str = 'Cancer', descriptor: str = 'cancer_genes')

This class generates a GMT file containing multiple sets. It groups the rows of the input table by the values in group_col (the column you want to use to group them) and, for each group, writes the genes listed in name_col. Set the descriptor according to your needs.
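
The grouping can be sketched with a dictionary keyed by the values of `group_col`. The helper `group_to_gmt` is hypothetical: it assumes a tab-separated table with a header row, writes one GMT set per group with `descriptor` as the description field, and removes duplicate genes within a set. These are illustrative choices rather than documented behavior; only the defaults mirror the class signature.

```python
import csv
import io
from typing import Dict, List


def group_to_gmt(
    table_text: str,
    name_col: str = "Gene",
    group_col: str = "Cancer",
    descriptor: str = "cancer_genes",
    delimiter: str = "\t",
) -> str:
    """Hypothetical sketch: one GMT set per distinct value of group_col."""
    groups: Dict[str, List[str]] = {}
    for row in csv.DictReader(io.StringIO(table_text), delimiter=delimiter):
        genes = groups.setdefault(row[group_col], [])
        if row[name_col] not in genes:  # avoid duplicate genes within a set
            genes.append(row[name_col])
    # dicts keep insertion order, so sets appear in first-seen order.
    return "".join(
        "\t".join([group, descriptor] + genes) + "\n" for group, genes in groups.items()
    )


table_text = "Gene\tCancer\nTP53\tbreast\nBRCA1\tbreast\nEGFR\tlung\nTP53\tlung\n"
print(group_to_gmt(table_text), end="")
```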