Clonal relatedness — clonalRelatedness • LymphoSeqTest

Calculates the clonal relatedness for each repertoire_id in a study.

Usage

clonalRelatedness(study_table, editDistance = 10)

Arguments

study_table: A tibble of productive or unproductive junction sequences. junction and duplicate_count are required columns.
editDistance: An integer giving the maximum edit distance at which sequences are considered related; sequences whose edit distance is less than or equal to this value count as related. See details below.

Value

Returns a tibble with the calculated clonal relatedness for each repertoire_id.

Details

Clonal relatedness is the proportion of junction sequences that are related by a defined edit distance threshold. The value ranges from 0 to 1 where 0 indicates no sequences are related and 1 indicates all sequences are related.
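As a rough illustration of this proportion, the sketch below uses a made-up four-sequence repertoire and base R's utils::adist() to measure edit distance. It assumes that "related" means within the threshold of the most abundant sequence; that reference point, the toy data, and the names toy, threshold, and toy_relatedness are illustrative assumptions, and the package's own implementation may differ.

```r
# Toy repertoire: junction sequences with read counts (made-up values).
toy <- data.frame(
  junction = c("TGTGCCAGC", "TGTGCCAGT", "TGTGCAAGT", "GGGGGGGGGGGG"),
  duplicate_count = c(50, 20, 10, 5),
  stringsAsFactors = FALSE
)

# Assumption: a sequence is "related" if it lies within `threshold` edits
# of the most abundant sequence in the repertoire.
threshold <- 2
top_junction <- toy$junction[which.max(toy$duplicate_count)]

# utils::adist() returns generalized Levenshtein (edit) distances.
distances <- as.vector(utils::adist(top_junction, toy$junction))

# Proportion of sequences at or below the threshold; always between 0 and 1.
toy_relatedness <- mean(distances <= threshold)
toy_relatedness
#> [1] 0.75
```

Three of the four toy sequences fall within two edits of the top sequence, so the proportion is 0.75. A stricter threshold can only lower this value and a looser one can only raise it.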

Edit distance quantifies how dissimilar two sequences are by counting the minimum number of operations (insertions, deletions, or substitutions) required to transform one sequence into the other. For example, an edit distance of 0 means the sequences are identical, and an edit distance of 1 indicates that the sequences differ by a single amino acid or nucleotide.
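To make the edit distance values concrete, here is a minimal sketch using utils::adist() from base R, which computes Levenshtein distance. The sequences are invented for illustration, and this page does not confirm whether clonalRelatedness() uses adist() internally.

```r
# utils::adist() ships with base R and returns a matrix of edit distances;
# [1, 1] extracts the single value for one pair of strings.
identical_pair   <- utils::adist("CASSLGTDTQYF", "CASSLGTDTQYF")[1, 1]
one_substitution <- utils::adist("CASSLGTDTQYF", "CASSLGADTQYF")[1, 1]
one_deletion     <- utils::adist("CASSLGTDTQYF", "CASSLGDTQYF")[1, 1]

c(identical_pair, one_substitution, one_deletion)
#> [1] 0 1 1
```

A substitution and a deletion each count as one operation, so sequences of different lengths can still be compared.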

Examples

file_path <- system.file("extdata", "IGH_sequencing", package = "LymphoSeqTest")

stable <- readImmunoSeq(path = file_path)
#> Rows: 1 Columns: 144
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (69): sequence_id, sequence, sequence_aa, locus, v_call, d_call, d2_call...
#> dbl (70): v_score, v_identity, v_support, d_score, d_identity, d_support, d2...
#> lgl  (5): rev_comp, productive, vj_in_frame, stop_codon, complete_vdj
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 694 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): vFamilyTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFun...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 41 rows [14, 15, 33, 36, 48, 78, 119, 123, 130, 135, 149, 167, 176, 190, 198, 210, 245, 247, 250, 262, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 1000 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (8): vFamilyTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFun...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 58 rows [31, 33, 40, 41, 90, 96, 109, 117, 146, 154, 178, 189, 238, 252, 255, 260, 270, 278, 315, 320, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 694 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): vFamilyTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFun...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 41 rows [14, 15, 33, 36, 48, 78, 119, 123, 130, 135, 149, 167, 176, 190, 198, 210, 245, 247, 250, 262, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 694 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (26): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (9): vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFunction, fracti...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 10 rows [204, 206, 265, 347, 410, 411, 419, 512, 582, 608].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 492 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (18): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (9): jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, ...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 3 rows [134, 143, 251].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 209 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, ...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 20 rows [4, 27, 34, 37, 52, 53, 55, 69, 81, 87, 88, 90, 95, 108, 111, 131, 151, 158, 160, 200].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 436 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, ...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 47 rows [21, 22, 28, 59, 63, 69, 78, 79, 82, 87, 90, 91, 116, 121, 149, 170, 182, 188, 216, 237, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 1000 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (26): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (9): vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFunction, fracti...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 27 rows [117, 121, 146, 157, 178, 199, 296, 310, 322, 323, 324, 325, 349, 351, 363, 420, 421, 467, 468, 484, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 1000 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (26): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (18): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (8): vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFunction, vAlign...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 85 rows [38, 58, 79, 83, 92, 119, 127, 145, 149, 161, 162, 169, 187, 191, 199, 237, 250, 272, 275, 283, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 275 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (24): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (18): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): vFamilyTies, jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 24 rows [9, 29, 40, 42, 61, 84, 87, 101, 104, 106, 108, 119, 146, 170, 177, 192, 201, 206, 214, 248, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")

clonal_relatedness <- clonalRelatedness(stable, editDistance = 10)

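# Merge results with the clonality table, joining on repertoire_id
# (stored as clonality_table to avoid shadowing the clonality() function)
clonality_table <- clonality(stable)
merged <- dplyr::full_join(clonality_table, clonal_relatedness, by = "repertoire_id")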