
Separate names of antibodies against multi-subunit proteins, e.g. CD235ab or CD66ace, into one subunit per row.

Two subunit patterns are considered. In the first, subunits are lower case letters and the gene name has no separator, e.g. CD66ace is composed of subunits CD66a, CD66c and CD66e. In the second, subunits are uppercase letters and are separated from the gene name by a "-", e.g. HLA-A/C/E is composed of subunits HLA-A, HLA-C and HLA-E. Both patterns require at least 2 capital letters or numbers followed by at least 2 possible subunits. There may be a separator between the groups and/or between the lower case letters. At present, the between group separators are -, . and space, and the between subunit separators are / and .

Subunits should be converted from Greek symbols before applying this function.

At present, user-supplied regex patterns are not supported.
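The two patterns described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper names split_subunits_sketch and expand_subunits_sketch are hypothetical, the regular expressions cover only the simplest form of each pattern (no optional separators), and names that match neither pattern are passed through unchanged, which may differ from what separateSubunits does.

```r
# Hypothetical helper (not part of the package): expand one antibody name
# into its guessed subunit names.
split_subunits_sketch <- function(x) {
  # Pattern 1: >= 2 capitals/digits then >= 2 lower case letters, e.g. CD66ace
  p1 <- "^([A-Z0-9]{2,})([a-z]{2,})$"
  # Pattern 2: >= 2 capitals/digits, "-", then >= 2 capitals joined by "/",
  # e.g. HLA-A/C/E
  p2 <- "^([A-Z0-9]{2,})-([A-Z](/[A-Z])+)$"

  if (grepl(p1, x, perl = TRUE)) {
    stem  <- sub(p1, "\\1", x, perl = TRUE)
    units <- strsplit(sub(p1, "\\2", x, perl = TRUE), "")[[1]]
    paste0(stem, units)
  } else if (grepl(p2, x, perl = TRUE)) {
    stem  <- sub(p2, "\\1", x, perl = TRUE)
    units <- strsplit(sub(p2, "\\2", x, perl = TRUE), "/", fixed = TRUE)[[1]]
    paste0(stem, "-", units)
  } else {
    x  # assumption: names matching neither pattern are returned as-is
  }
}

# Hypothetical helper: repeat each row once per guessed subunit and store
# the guesses in a new column, giving one subunit per row.
expand_subunits_sketch <- function(df, ab = "Antigen", new_col = "subunit") {
  parts <- lapply(as.character(df[[ab]]), split_subunits_sketch)
  out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  out[[new_col]] <- unlist(parts)
  rownames(out) <- NULL
  out
}

df <- data.frame(Antigen = c("CD66ace", "HLA-A/C/E", "CD3"),
                 stringsAsFactors = FALSE)
res <- expand_subunits_sketch(df)
```

In this sketch, res has seven rows: three each for CD66ace and HLA-A/C/E, and one for CD3, which matches neither pattern.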

Usage

separateSubunits(df, ab = "Antigen", new_col = "subunit")

Arguments

df

A data.frame or tibble

ab

(character(1), default "Antigen") Name of the column containing antibody names

new_col

(character(1), default "subunit") Name of the new column containing guesses for single subunit names