Converters
-
class pygna.converters.Converters
This class wraps static methods that convert data from one format to another. Refer to each class method for its specific function.
-
classmethod convert_e2s(geneset: pandas.core.frame.DataFrame, tsv_data: pandas.core.frame.DataFrame, entrez_col: str = 'NCBI Gene ID', symbol_col: str = 'Approved symbol') → list
Method to convert Entrez gene IDs to gene symbols.
Parameters: - tsv_data – the dataframe to work on
- symbol_col – the column containing the symbols
- entrez_col – the column containing the entrez ID
- geneset – column containing the Entrez IDs to convert
Returns: list containing the gene symbols
Example
>>> gmt_data = rc.ReadGmt(".gmt", True).get_data()
>>> converted = {}
>>> for k, d in gmt_data.items():
...     converted[k] = Converters.convert_e2s(d["genes"], tsv_data, entrez_col, symbol_col)
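To make the lookup described above concrete, here is a minimal pandas-only sketch of an Entrez-to-symbol conversion. It is not pygna's implementation: the helper `entrez_to_symbols`, the toy mapping table, and the choice to silently drop IDs that have no match are all assumptions made for illustration. Only the default column names come from the signature above.

```python
import pandas as pd

# Hypothetical mapping table, using the default column names from the signature.
tsv_data = pd.DataFrame({
    "NCBI Gene ID": ["7157", "672", "1956"],
    "Approved symbol": ["TP53", "BRCA1", "EGFR"],
})


def entrez_to_symbols(geneset, table,
                      entrez_col="NCBI Gene ID",
                      symbol_col="Approved symbol"):
    """Illustrative stand-in for an Entrez-to-symbol lookup (not pygna code)."""
    # Build an ID -> symbol dictionary once, comparing IDs as strings.
    lookup = dict(zip(table[entrez_col].astype(str), table[symbol_col]))
    # Assumption: IDs missing from the table are dropped rather than raising.
    return [lookup[str(g)] for g in geneset if str(g) in lookup]


symbols = entrez_to_symbols(["672", "7157", "0000"], tsv_data)
print(symbols)
```

The reverse direction (symbols to Entrez IDs) would follow the same pattern with the two columns swapped.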
-
classmethod convert_s2e(geneset: pandas.core.frame.DataFrame, tsv_data: pandas.core.frame.DataFrame, entrez_col: str = 'NCBI Gene ID', symbol_col: str = 'Approved symbol') → list
Method to convert gene symbols to Entrez IDs.
Parameters: - tsv_data – the dataframe to work on
- symbol_col – the column containing the symbols
- entrez_col – the column containing the entrez ID
- geneset – column containing the gene symbols to convert
Returns: list containing the Entrez IDs
Example
>>> gmt_data = rc.ReadGmt(".gmt", True).get_data()
>>> converted = {}
>>> for k, d in gmt_data.items():
...     converted[k] = gmt_data[k]["genes"] = Converters.convert_s2e(d["genes"], tsv_data, entrez_col, symbol_col)
-
class pygna.converters.CsvToCsvEnriched(csv_file: pandas.core.frame.DataFrame, conversion: str, original_name_col: str, new_name_col: str, geneset: str, entrez_col: str, symbol_col: str, converter_map_filename: str = 'entrez_name.tsv', output_file: str = None)
Class used to add a column with the Entrez IDs or the gene symbols to a CSV file.
-
class pygna.converters.CsvToGmt(input_file: str, setname: str, filter_column: str, alternative: str, threshold: float, output_gmt: str = None, output_csv: str = None, name_column: str = 'Unnamed: 0', descriptor: str = None)
This class converts a CSV file to a GMT file and can filter the rows using the values of one of the columns. The user specifies the column that holds the names of the objects and the filter condition. The output can be a GMT file with the names of the genes that pass the filter, a CSV file with the whole filtered table, or both.
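The filter-then-export behaviour described above can be sketched with plain pandas. This is an illustration under stated assumptions, not the class's actual code: the helper `filter_to_gmt_line` is hypothetical, and the values `"less"` and `"greater"` for `alternative` are assumed here because the accepted values are not listed in this reference. The GMT line layout (set name, descriptor, then members, tab-separated) follows the usual GMT convention.

```python
import io

import pandas as pd

# Toy CSV whose first header cell is empty, so pandas names it 'Unnamed: 0',
# matching the default of name_column in the signature above.
csv_text = ",padj\nTP53,0.001\nBRCA1,0.2\nEGFR,0.03\n"
table = pd.read_csv(io.StringIO(csv_text))


def filter_to_gmt_line(table, setname, filter_column, alternative, threshold,
                       name_column="Unnamed: 0", descriptor=""):
    """Hypothetical helper: filter rows, then build one GMT line."""
    # Assumed semantics of `alternative`; the real accepted values may differ.
    if alternative == "less":
        kept = table[table[filter_column] < threshold]
    elif alternative == "greater":
        kept = table[table[filter_column] > threshold]
    else:
        raise ValueError(f"unsupported alternative: {alternative!r}")
    genes = kept[name_column].astype(str).tolist()
    # GMT layout: set name, descriptor, then the members, tab-separated.
    return kept, "\t".join([setname, descriptor, *genes])


kept, gmt_line = filter_to_gmt_line(
    table, "demo", "padj", "less", 0.05, descriptor="significant"
)
print(gmt_line)
```

Here `kept` corresponds to the filtered table that could be written as CSV, and `gmt_line` to the single gene set that could be written as GMT.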
-
class pygna.converters.GmtToGmtEnriched(gmt_file: str, output_gmt_file: str, conversion: str, entrez_col: str, symbol_col: str, converter_map_filename: str = 'entrez_name.tsv')
This class converts a GMT file by adding the Entrez IDs or the gene symbols.
-
class pygna.converters.GroupGmt(input_table: str, output_gmt: str, name_col: str = 'Gene', group_col: str = 'Cancer', descriptor: str = 'cancer_genes')
This class generates a GMT file containing multiple gene sets. It groups the rows of the input table by the values in group_col (the column you want to group by) and writes the genes listed in name_col for each group. Set the descriptor according to your needs.
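The grouping step described above can be sketched with `pandas.DataFrame.groupby`. This is an illustration only: the helper `group_to_gmt_lines` and the toy table are hypothetical, and pygna may order or format the sets differently. The default column names and descriptor are taken from the signature above.

```python
import pandas as pd

# Toy input table using the default column names from the signature.
table = pd.DataFrame({
    "Gene": ["TP53", "BRCA1", "EGFR", "BRCA2"],
    "Cancer": ["lung", "breast", "lung", "breast"],
})


def group_to_gmt_lines(table, name_col="Gene", group_col="Cancer",
                       descriptor="cancer_genes"):
    """Hypothetical helper: one GMT line per distinct value of group_col."""
    lines = []
    # groupby sorts the group keys by default; row order inside a group is kept.
    for setname, group in table.groupby(group_col):
        genes = group[name_col].astype(str).tolist()
        lines.append("\t".join([str(setname), descriptor, *genes]))
    return lines


lines = group_to_gmt_lines(table)
print("\n".join(lines))
```

Each resulting line is one gene set: the group value serves as the set name, followed by the shared descriptor and the grouped gene names.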