Align multiple sequences

Perform multiple sequence alignment using one of three methods and output the results to the console or as a PDF file. All amino acid or junction sequences in a single repertoire_id can be aligned. Alternatively, a given sequence can be searched for within a list of samples using an edit distance threshold.

Usage

alignSeq(
  study_table,
  repertoire_ids = NULL,
  sequence_list = NULL,
  edit_distance = 15,
  type = "junction",
  method = "ClustalOmega",
  top = 150
)

Arguments

study_table: A tibble consisting of antigen receptor sequences imported by the LymphoSeq function readImmunoSeq.
sequence_list: A character vector of one or more amino acid or junction CDR3 sequences to search.
edit_distance: An integer giving the maximum edit distance allowed: a sequence is retained only if its edit distance to the searched sequence is less than or equal to this value. See details below.
type: A character vector indicating whether "junction_aa" or "junction" sequences should be aligned. If "junction_aa" is specified, then run productiveSeq first.
method: A character vector indicating the multiple sequence alignment method to be used. Refer to the Bioconductor msa package for more details. Options include "ClustalW", "ClustalOmega", and "Muscle".
top: The number of top sequences to align.
repertoire_ids: A character vector indicating the name of the repertoire_id in the productive sequence list.

Value

Returns a multiple sequence alignment object.

Details

Edit distance quantifies how dissimilar two sequences are by counting the minimum number of operations (insertions, deletions, or substitutions) required to transform one sequence into the other. For example, an edit distance of 0 means the sequences are identical, and an edit distance of 1 indicates that the sequences differ by a single amino acid or nucleotide.
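
The idea of an edit distance threshold can be illustrated with base R alone. The sketch below uses adist() from the utils package, which computes the generalized Levenshtein distance; it is not necessarily the routine alignSeq uses internally, and the sequences and the threshold of 1 are made up for illustration.

```r
# Hypothetical CDR3 amino acid sequences, invented for this example.
query <- "CASSLGQGYEQYF"
candidates <- c(
  "CASSLGQGYEQYF",  # identical to the query
  "CASSLGQGYEQFF",  # one substitution (Y -> F)
  "CASSLGGYEQYF",   # one deletion (Q removed)
  "CAWSVGNTIYF"     # unrelated sequence
)

# adist() returns a matrix of Levenshtein distances (rows: query,
# columns: candidates); drop() flattens the 1 x 4 matrix to a vector.
distances <- drop(adist(query, candidates))
names(distances) <- candidates
print(distances)

# Keep only candidates whose distance to the query is at or below
# the chosen threshold (an assumed value, analogous to edit_distance).
threshold <- 1
matches <- candidates[distances <= threshold]
print(matches)
```

A larger threshold admits more distant sequences into the result, so the default of 15 shown in the usage section is considerably more permissive than the value used in this sketch.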

Examples

file_path <- system.file("extdata", "IGH_sequencing", package = "LymphoSeqTest")

stable <- readImmunoSeq(path = file_path)
#> Rows: 1 Columns: 144
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (69): sequence_id, sequence, sequence_aa, locus, v_call, d_call, d2_call...
#> dbl (70): v_score, v_identity, v_support, d_score, d_identity, d_support, d2...
#> lgl  (5): rev_comp, productive, vj_in_frame, stop_codon, complete_vdj
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 694 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): vFamilyTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFun...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 41 rows [14, 15, 33, 36, 48, 78, 119, 123, 130, 135, 149, 167, 176, 190, 198, 210, 245, 247, 250, 262, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 1000 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (8): vFamilyTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFun...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 58 rows [31, 33, 40, 41, 90, 96, 109, 117, 146, 154, 178, 189, 238, 252, 255, 260, 270, 278, 315, 320, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 694 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): vFamilyTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFun...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 41 rows [14, 15, 33, 36, 48, 78, 119, 123, 130, 135, 149, 167, 176, 190, 198, 210, 245, 247, 250, 262, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 694 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (26): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (9): vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFunction, fracti...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 10 rows [204, 206, 265, 347, 410, 411, 419, 512, 582, 608].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 492 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (18): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (9): jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, ...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 3 rows [134, 143, 251].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 209 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, ...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 20 rows [4, 27, 34, 37, 52, 53, 55, 69, 81, 87, 88, 90, 95, 108, 111, 131, 151, 158, 160, 200].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 436 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (25): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction, dFunction, ...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 47 rows [21, 22, 28, 59, 63, 69, 78, 79, 82, 87, 90, 91, 116, 121, 149, 170, 182, 188, 216, 237, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 1000 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (26): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (17): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (9): vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFunction, fracti...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 27 rows [117, 121, 146, 157, 178, 199, 296, 310, 322, 323, 324, 325, 349, 351, 363, 420, 421, 467, 468, 484, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 1000 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (26): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (18): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl  (8): vOrphon, dOrphon, jOrphon, vFunction, dFunction, jFunction, vAlign...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 85 rows [38, 58, 79, 83, 92, 119, 127, 145, 149, 161, 162, 169, 187, 191, 199, 237, 250, 272, 275, 283, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")
#> Rows: 275 Columns: 52
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (24): nucleotide, aminoAcid, vMaxResolved, vFamilyName, vGeneName, vGene...
#> dbl (18): count (reads), frequencyCount (%), cdr3Length, vDeletion, n1Insert...
#> lgl (10): vFamilyTies, jGeneAlleleTies, vOrphon, dOrphon, jOrphon, vFunction...
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Warning: Expected 2 pieces. Additional pieces discarded in 24 rows [9, 29, 40, 42, 61, 84, 87, 101, 104, 106, 108, 119, 146, 170, 177, 192, 201, 206, 214, 248, ...].
#> Joining, by = c("sequence", "sequence_aa", "v_call", "d_call", "d2_call",
#> "j_call", "junction", "junction_aa", "duplicate_count", "clone_id",
#> "repertoire_id")

ntable <- productiveSeq(stable, aggregate = "junction")

alignSeq(ntable, repertoire_ids = "IGH_MVQ92552A_BL", type = "junction",
         method = "ClustalW")
#> use default substitution matrix
#> CLUSTAL 2.1  
#> 
#> Call:
#>    msa::msa(string_list, method = method)
#> 
#> MsaDNAMultipleAlignment with 65 rows and 186 columns
#>      aln                                                   names
#>  [1] -------------------------...CTTTTGATATCTGGGGCCAAGGG-- IGH_MVQ92552A_BL
#>  [2] ------------------------G...GTATGGACGTCTGGGGCCAAGGG-- IGH_MVQ92552A_BL
#>  [3] ------------------GACAACA...CTTTTGATTTTTGGGGCCAAGGG-- IGH_MVQ92552A_BL
#>  [4] ---------------------CGCG...ACATGGACGTCTGGGGCAAAGGG-- IGH_MVQ92552A_BL
#>  [5] ------------------------A...CTTTTGATGTTTGGGGCCAAGGG-- IGH_MVQ92552A_BL
#>  [6] -------------------------...ACATGGACGTCTGGGGCAAAGGG-- IGH_MVQ92552A_BL
#>  [7] -------------------------...CTATGGACGTCTGGGGCCAAGGG-- IGH_MVQ92552A_BL
#>  [8] -------------------------...ACATGGACGTCTGGGGCAAAGGG-- IGH_MVQ92552A_BL
#>  [9] -------------------------...ACATGGACGTCTGGGGCAAAGGG-- IGH_MVQ92552A_BL 
#>  ... ...
#> [58] ------CAGGGTCACCATGACCAGG...----CCTTA-CTGGGGCCAGGGA-- IGH_MVQ92552A_BL
#> [59] ---------------CATGACCAGG...----TGACTACTGGGGCCAGGGA-- IGH_MVQ92552A_BL
#> [60] ---------------------CGCG...--TTGACCTACTGGGGCCAGGGA-- IGH_MVQ92552A_BL
#> [61] -------------------------...GGTTCAGATACTGGGGCCAGGGA-- IGH_MVQ92552A_BL
#> [62] ---------------CATCTCCAGA...A--TTGACTACTGGGGCCAGGGA-- IGH_MVQ92552A_BL
#> [63] ------------------CTCCAGA...ACATGGACGTCTGGGGCAAAGGG-- IGH_MVQ92552A_BL
#> [64] ----------------------GCC...ACATGGACGTCTGGGGCAAAGGG-- IGH_MVQ92552A_BL
#> [65] -------------------------...TGTGCCCCCTAGGAGGCCACACTGG IGH_MVQ92552A_BL
#>  Con ------------------------?...?T????AC??CTGGGGCCAGGGA-- Consensus
