A table of gene IDs, symbols, aliases, previous aliases, and names from the HUGO Gene Nomenclature Committee (HGNC), corresponding to genome build GRCh38. The data come from the HGNC gene groups and protein-coding genes tables. Aliases, previous aliases, and names have been split so that each row contains a single entry, with an additional "symbol_type" column giving the source of that entry (e.g. "HGNC_SYMBOL", "alias_symbol").
The HGNC table has been filtered by BIOTYPE to remove pseudogenes, read-through genes, RNA genes, mitochondrial genes and genes of unknown biotype.
Format
A data frame with 109313 rows and 9 variables:
- HGNC_ID
HGNC gene ID
- ENSEMBL_ID
Ensembl gene ID, from HGNC
- ENTREZ_ID
Entrez (NCBI Gene) ID, from HGNC
- UNIPROT_ID
UniProt ID, from HGNC
- BIOTYPE
Type of gene (biotype); the field used to filter the HGNC table
- HGNC_SYMBOL
HGNC gene symbol
- symbol_type
Source of the "value" column, e.g. "HGNC_SYMBOL", "HGNC_NAME"
- value
A gene symbol, symbol alias, or name
- SOURCE
Source of the data; always "HGNC" in this table
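
The one-entry-per-row layout described above can be illustrated with a minimal Python sketch. This is not the code used to build the table. The record values are invented, the "prev_symbol" column name and the "|" separator are assumptions about the input, and only a subset of the nine columns is shown.

```python
# Toy wide-format record; every value here is invented for illustration.
wide_rows = [
    {
        "HGNC_ID": "HGNC:0000",
        "HGNC_SYMBOL": "GENE1",
        "alias_symbol": "G1|GN1",       # assumed "|"-separated
        "prev_symbol": "OLDGENE1",      # assumed column name
        "HGNC_NAME": "example gene 1",
    },
]

# Columns whose entries become individual rows in the long table.
VALUE_COLUMNS = ["HGNC_SYMBOL", "alias_symbol", "prev_symbol", "HGNC_NAME"]


def to_long(rows, sep="|"):
    """Return one output row per symbol, alias, previous alias, or name."""
    long_rows = []
    for row in rows:
        for column in VALUE_COLUMNS:
            for value in row.get(column, "").split(sep):
                value = value.strip()
                if not value:
                    continue  # skip empty fields
                long_rows.append({
                    "HGNC_ID": row["HGNC_ID"],
                    "HGNC_SYMBOL": row["HGNC_SYMBOL"],
                    "symbol_type": column,  # where the value came from
                    "value": value,
                    "SOURCE": "HGNC",
                })
    return long_rows


long_rows = to_long(wide_rows)

# Look up the approved symbol for an alias (hypothetical values).
matches = [r["HGNC_SYMBOL"] for r in long_rows if r["value"] == "GN1"]
```

With this layout, resolving any alias, previous alias, or name to its approved symbol is a single filter on the "value" column, and "symbol_type" records which kind of match was found.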