Perform multiple sequence alignment using one of three methods and output the results to the console or as a PDF file. One may align all amino acid or nucleotide sequences in a single repertoire_id. Alternatively, one may search for a given sequence within a list of samples using an edit distance threshold.
Usage
alignSeq(
study_table,
repertoire_ids = NULL,
sequence_list = NULL,
edit_distance = 15,
type = "junction",
method = "ClustalOmega",
top = 150
)
Arguments
- study_table
A tibble consisting of antigen receptor sequences imported by the LymphoSeq2 function readImmunoSeq().
- repertoire_ids
A character vector indicating the name of the repertoire_id in the productive sequence list.
- sequence_list
A character vector of one or more amino acid or nucleotide CDR3 sequences to search.
- edit_distance
An integer giving the edit distance threshold: a sequence is retained only if its edit distance to a searched sequence is less than or equal to this value. See details below.
- type
A character vector indicating whether "junction_aa" or "junction" sequences should be aligned. If "junction_aa" is specified, then run productiveSeq() first.
- method
A character vector indicating the multiple sequence alignment method to be used. Refer to the Bioconductor "msa" package for more details. Options include "ClustalW", "ClustalOmega", and "Muscle".
- top
The number of top sequences on which to perform alignment.
Details
Edit distance is a way of quantifying how dissimilar two sequences are to one another by counting the minimum number of operations required to transform one sequence into the other. For example, an edit distance of 0 means the sequences are identical, and an edit distance of 1 indicates that the sequences differ by a single amino acid or nucleotide.
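The idea can be illustrated with a minimal sketch in base R. It uses utils::adist(), which computes the generalized Levenshtein distance, as a stand-in; this is an assumption for illustration and not necessarily how alignSeq() computes distances internally. The sequences and the names query, candidates, and threshold are hypothetical.

```r
# Base R only: utils::adist() computes the generalized Levenshtein (edit) distance.
# All sequences below are hypothetical examples, not taken from the package data.
query <- "CASSLGQAYEQYF"
candidates <- c(
  "CASSLGQAYEQYF",  # identical to the query
  "CASSLGQGYEQYF",  # one substitution (A -> G)
  "CASSPDRGNTEAFF"  # a clearly different sequence
)

# adist() returns a 1 x 3 matrix here; flatten it to a plain vector.
distances <- as.vector(utils::adist(query, candidates))

# Keep candidates whose distance is at or below a threshold,
# analogous to the role the edit_distance argument plays.
threshold <- 1
matches <- candidates[distances <= threshold]

print(distances)
print(matches)
```

With a threshold of 1, the identical sequence (distance 0) and the single-substitution sequence (distance 1) are kept, while the third sequence is dropped. Raising the threshold admits progressively more dissimilar sequences.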
See also
If you have trouble saving PDF files, refer to the installation instructions for the Bioconductor package msa: http://bioconductor.org/packages/release/bioc/vignettes/msa/inst/doc/msa.pdf
Examples
file_path <- system.file("extdata", "IGH_sequencing", package = "LymphoSeq2")
study_table <- LymphoSeq2::readImmunoSeq(path = file_path, threads = 1)
#> Registered S3 methods overwritten by 'readr':
#> method from
#> as.data.frame.spec_tbl_df vroom
#> as_tibble.spec_tbl_df vroom
#> format.col_spec vroom
#> print.col_spec vroom
#> print.collector vroom
#> print.date_names vroom
#> print.locale vroom
#> str.col_spec vroom
study_table <- LymphoSeq2::topSeqs(study_table, top = 100)
nucleotide_table <- LymphoSeq2::productiveSeq(study_table, aggregate = "junction")
LymphoSeq2::alignSeq(nucleotide_table,
repertoire_ids = "IGH_MVQ92552A_BL", type = "junction",
method = "ClustalW"
)
#> use default substitution matrix
#> CLUSTAL 2.1
#>
#> Call:
#> msa::msa(string_list, method = method)
#>
#> MsaDNAMultipleAlignment with 42 rows and 179 columns
#> aln names
#> [1] -------------------------...CCACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_23
#> [2] -------------------------...CGCTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_36
#> [3] -------------------------...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_40
#> [4] -------------------------...CCACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_6
#> [5] ------------------------G...-GGTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_38
#> [6] ---------------------CGCG...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_24
#> [7] ------------------------A...TGCTTTTGATGTTTGGGGCCAAGGG IGH_MVQ92552A_BL_3
#> [8] -------------------------...CGCTATGGACGTCTGGGGCCAAGGG IGH_MVQ92552A_BL_11
#> [9] -------------------------...CTACTTTGACGACTGGGGCCAGGGA IGH_MVQ92552A_BL_8
#> ... ...
#> [35] ------------CACCATCTCCAGA...--TCTTTGAATACTGGGGCCAGGGA IGH_MVQ92552A_BL_12
#> [36] ---------AGTCACGATTACCGCG...--GTTCGGGGAATTGGGGCCAGGGA IGH_MVQ92552A_BL_5
#> [37] ------------------GACAACA...--CTTTTGATTTTTGGGGCCAAGGG IGH_MVQ92552A_BL_34
#> [38] ---------------CATGACCAGG...--ACTTTGACTACTGGGGCCAGGGA IGH_MVQ92552A_BL_21
#> [39] ---------------------CGCG...--GCTTTGACCAGTGGGGCCAGGGA IGH_MVQ92552A_BL_25
#> [40] ------------------CTCCAGA...--TCCTCGACTATTGGGGCCAGGGA IGH_MVQ92552A_BL_29
#> [41] ------------------CTCCAGA...--ACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_28
#> [42] ----------------------GCC...CTACATGGACGTCTGGGGCAAAGGG IGH_MVQ92552A_BL_37
#> Con ----------------------???...--?CTT?GAC?ACTGGGGCCAGGGA Consensus