Analyze high-throughput sequencing of T and B cell receptors • LymphoSeq2

Adaptive Immune Receptor Repertoire Sequencing (AIRR-Seq) provides a unique opportunity to interrogate the adaptive immune repertoire under various clinical conditions. The technology has quickly attracted interest from clinicians and researchers investigating the immunological landscape across a wide range of health and disease states. LymphoSeq2 is a toolkit for importing, manipulating, and visualizing AIRR-Seq data from assays such as Adaptive ImmunoSEQ and BGI-IRSeq, with support for 10X VDJ sequencing coming soon. It can also import AIRR-Seq data processed with the MiXCR pipeline. This vignette highlights some of the key features of LymphoSeq2.

Installation

To install the latest version of LymphoSeq2, use the devtools package to install it from GitHub:

# install.packages("devtools")
devtools::install_github("shashidhar22/LymphoSeq2", build_vignettes = TRUE)

Getting started

To import AIRR-Seq data with LymphoSeq2, use the readImmunoSeq function. Currently, the function can import data from MiXCR, Adaptive ImmunoSEQ, BGI IR-SEQ, and 10X Genomics single-cell VDJ rearrangements.

library(LymphoSeq2)
#> Loading required package: data.table
#> Registered S3 methods overwritten by 'ggalt':
#>   method                  from   
#>   grid.draw.absoluteGrob  ggplot2
#>   grobHeight.absoluteGrob ggplot2
#>   grobWidth.absoluteGrob  ggplot2
#>   grobX.absoluteGrob      ggplot2
#>   grobY.absoluteGrob      ggplot2
study_files <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq2") 
study_table <- LymphoSeq2::readImmunoSeq(study_files)
#> Registered S3 methods overwritten by 'readr':
#>   method                    from 
#>   as.data.frame.spec_tbl_df vroom
#>   as_tibble.spec_tbl_df     vroom
#>   format.col_spec           vroom
#>   print.col_spec            vroom
#>   print.collector           vroom
#>   print.date_names          vroom
#>   print.locale              vroom
#>   str.col_spec              vroom

To get a quick summary of repertoire characteristics, use the clonality function. It calculates several standard repertoire diversity metrics, such as clonality, Gini coefficient, convergence, and the number of unique productive sequences, for each repertoire in the input dataset.

summary_table <- LymphoSeq2::clonality(study_table)
summary_table
#> # A tibble: 10 × 8
#>    repertoire_id     total_seq…¹ uniqu…² total…³ clona…⁴ gini_…⁵ top_p…⁶ conve…⁷
#>    <chr>                   <int>   <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 TRB_CD4_949              1000     845   25769   0.443   0.867   30.1     1   
#>  2 TRB_CD8_949              1000     794   26239   0.431   0.903   19.3     1.01
#>  3 TRB_CD8_CMV_369           414     281    1794   0.332   0.761   16.5     1.12
#>  4 TRB_Unsorted_0           1000     838   18161   0.281   0.818    5.77    1   
#>  5 TRB_Unsorted_1320        1000     838  178190   0.422   0.902   14.6     1   
#>  6 TRB_Unsorted_1496        1000     832   33669   0.389   0.881   14.2     1   
#>  7 TRB_Unsorted_32           920     767   31078   0.134   0.601    4.87    1.01
#>  8 TRB_Unsorted_369         1000     830  339413   0.426   0.845   29.7     1   
#>  9 TRB_Unsorted_83          1000     823  236732   0.338   0.777   23.6     1   
#> 10 TRB_Unsorted_949         1000     831    6549   0.306   0.765   13.8     1   
#> # … with abbreviated variable names ¹total_sequences,
#> #   ²unique_productive_sequences, ³total_count, ⁴clonality, ⁵gini_coefficient,
#> #   ⁶top_productive_sequence, ⁷convergence
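
To give a feel for what two of these metrics measure, here is a minimal base-R sketch. It assumes the commonly used definitions: clonality as one minus the normalized Shannon entropy of the clone frequencies, and the Gini coefficient of the clone-count distribution. The helpers `toy_clonality` and `toy_gini` are hypothetical names made up for this illustration. They are not part of LymphoSeq2, and the package's own implementation may differ in details such as which sequences are counted.

```r
# Illustrative helpers, NOT LymphoSeq2 functions.
# `counts` is a vector of read counts, one entry per productive sequence.

# Clonality as 1 - normalized Shannon entropy
# (0 = perfectly even repertoire, close to 1 = one dominant clone).
toy_clonality <- function(counts) {
  counts <- counts[counts > 0]
  # Assumption for this sketch: a single-clone repertoire is fully clonal.
  if (length(counts) < 2) return(1)
  p <- counts / sum(counts)
  entropy <- -sum(p * log(p))
  1 - entropy / log(length(p))
}

# Gini coefficient of the count distribution
# (0 = all clones have the same size).
toy_gini <- function(counts) {
  x <- sort(counts)
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Two made-up repertoires: one even, one dominated by a single clone.
even_repertoire   <- rep(10, 5)
skewed_repertoire <- c(960, 10, 10, 10, 10)

toy_clonality(even_repertoire)
toy_clonality(skewed_repertoire)
toy_gini(even_repertoire)
toy_gini(skewed_repertoire)
```

Both metrics are zero for the even repertoire and increase as a single clone takes over, which is why they are often reported together.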

To compare samples sequenced at different depths, you can use the clonality function to downsample all repertoires to a common minimum number of sequences. Because sequences are sampled at random from each repertoire, in this mode the clonality function repeats the operation for a user-specified number of iterations and calculates the average value of each diversity metric.

sampled_summary <- LymphoSeq2::clonality(study_table, rarefy = TRUE, iterations = 5, min_count = 1000)
sampled_summary
#> # A tibble: 10 × 8
#>    repertoire_id     total_seq…¹ uniqu…² total…³ clona…⁴ gini_…⁵ top_p…⁶ conve…⁷
#>    <chr>                   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1 TRB_CD4_949              156.    129     1000  0.319    0.735   31.3     1   
#>  2 TRB_CD8_949              192.    154.    1000  0.293    0.734   18.6     1.01
#>  3 TRB_CD8_CMV_369          266     183.    1000  0.303    0.723   17.1     1.09
#>  4 TRB_Unsorted_0           254.    212.    1000  0.158    0.616    6.11    1.00
#>  5 TRB_Unsorted_1320        191.    156.    1000  0.277    0.726   14.6     1.02
#>  6 TRB_Unsorted_1496        206     168.    1000  0.260    0.710   14.0     1.00
#>  7 TRB_Unsorted_32          417.    349.    1000  0.0952   0.466    5.54    1.01
#>  8 TRB_Unsorted_369         245.    203     1000  0.336    0.711   30.2     1   
#>  9 TRB_Unsorted_83          320.    264.    1000  0.261    0.639   23.4     1   
#> 10 TRB_Unsorted_949         301.    249.    1000  0.222    0.630   13.0     1.00
#> # … with abbreviated variable names ¹total_sequences,
#> #   ²unique_productive_sequences, ³total_count, ⁴clonality, ⁵gini_coefficient,
#> #   ⁶top_productive_sequence, ⁷convergence
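
The idea behind this mode can be sketched in base R as follows. The sketch assumes that downsampling means drawing reads without replacement and that per-iteration metrics are averaged with a plain mean. The helpers `toy_downsample` and `toy_rarefied_mean` and the count vector are hypothetical and only illustrate the procedure described above; they are not how LymphoSeq2 itself is implemented.

```r
# Illustrative helpers, NOT the LymphoSeq2 implementation.

# Downsample a vector of per-clone counts to `depth` reads,
# drawing reads without replacement.
toy_downsample <- function(counts, depth) {
  # One entry per read, labelled by the index of its clone.
  reads <- rep(seq_along(counts), times = counts)
  kept  <- reads[sample.int(length(reads), size = depth)]
  # Per-clone counts after sampling (zeros for clones that dropped out).
  tabulate(kept, nbins = length(counts))
}

# Average a metric over several independent random downsamples.
toy_rarefied_mean <- function(counts, depth, iterations, metric) {
  values <- vapply(
    seq_len(iterations),
    function(i) as.numeric(metric(toy_downsample(counts, depth))),
    numeric(1)
  )
  mean(values)
}

set.seed(42)  # make the random draws reproducible
counts <- c(5000, 1200, 300, 40, 10, 5, 1, 1)  # made-up clone sizes

one_draw <- toy_downsample(counts, depth = 1000)
sum(one_draw)  # always equals the requested depth

# Example metric: number of clones still observed after downsampling.
toy_rarefied_mean(counts, depth = 1000, iterations = 5,
                  metric = function(x) sum(x > 0))
```

Averaging over iterations reduces the noise introduced by any single random draw, which matters most for rare clones that may or may not survive downsampling.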