8 changes: 5 additions & 3 deletions 01-intro.Rmd
@@ -5,7 +5,7 @@ ottrpal::set_knitr_image_path()

# Introduction

This course was developed in Summer 2023 and updated in Fall 2025. We welcome any feedback at help@pvactools.org or by submission of [GitHub issues](https://github.com/griffithlab/pVACtools_Intro_Course/issues).
This course was developed in Summer 2023 and last updated in Summer 2026. We welcome any feedback at help@pvactools.org or by submission of [GitHub issues](https://github.com/griffithlab/pVACtools_Intro_Course/issues).

## Motivation

@@ -24,7 +24,7 @@ prioritization, and selection using a graphical Web-based interface (pVACview),
vaccines. pVACtools is available at [http://www.pvactools.org](http://www.pvactools.org).

```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "pVACtools is a cancer immunotherapy tools suite"}
ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g3a37485c18b_1_0")
ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit?slide=id.g3e342b543ab_0_0#slide=id.g3e342b543ab_0_0")
```

## Background
@@ -43,7 +43,9 @@ stability and recognition by cytotoxic T cells [@Richters2019].
pVACtools can be used as the final step in a well-established variant calling pipeline. It leverages existing tools with functionality related to variant annotation
(Ensembl VEP [@McLaren2016]), identifying neoantigens from specific sources (e.g. fusions via star-fusion [@Haas2019], AGFusion [@Murphy2016], and Arriba [@Uhrig2021]),
HLA typing (OptiType [@Szolek2014], PHLAT [@Bai2018]), peptide-MHC binding prediction (IEDB [@Vita2018], NetMHCpan [@Jurtz2017], MHCflurry [@ODonnell2018],
MHCnuggets [@Shao2020]), peptide-MHC stability (NetMHCstabpan [@Rasmussen2016]), peptide processing (NetChop [@Nielsen2005]), manufacturability
MHCnuggets [@Shao2020], MixMHCpred [@Gfeller2023]), presentation (IEDB [@Vita2018], BigMHC [@Albert2023], MHCflurry [@ODonnell2018], MixMHC2pred [@Racle2023]),
immunogenicity (BigMHC [@Albert2023], DeepImmuno [@Li2021], ImmuScope [@Shen2025], PRIME [@Gfeller2023]), peptide-MHC stability (NetMHCstabpan [@Rasmussen2016]),
peptide processing (NetChop [@Nielsen2005]), manufacturability
metrics (vaxrank [@Rubinsteyn2017]), and reference proteome similarity (BLAST [@Altschul1990]). Each of these tools tackles specific tasks within the broader goal of
antigen analysis and is utilized by pVACtools to provide an end-to-end integration of novel algorithms and established tools needed to discover, characterize, prioritize,
and utilize tumor-specific neoantigens in basic research and clinical applications. Combining pVACtools with existing variant calling pipelines provides an end-to-end
12 changes: 6 additions & 6 deletions 02-prerequisites.Rmd
@@ -70,27 +70,27 @@ install.packages("colourpicker", dependencies=TRUE)

## Data

For this course, we have put together a set of input data generated from the breast
For this course, we have put together a set of input data generated from the breast
cancer cell line HCC1395 and a matched normal lymphoblastoid cell line HCC1395BL.
Data from this cell line is commonly used as test data in bioinformatics applications.
For more information on these lines and the generation of test data, please refer to
Data from this cell line is commonly used as test data in bioinformatics applications.
For more information on these lines and the generation of test data, please refer to
the [data section of our precision medicine bioinformatics course](https://pmbio.org/module-02-inputs/0002/05/01/Data/).

The input data consists of the following files:

For pVACseq:

- `annotated.expression.vcf.gz`: A somatic (tumor-normal) VCF and its tbi index file. The VCF has been
annotated with VEP and has coverage and expression information added. It has also been annotated with
custom VEP plugins that provide wild type and mutant versions of the full length protein sequences
annotated with VEP and has coverage and expression information added. It has also been annotated with
custom VEP plugins that provide wild type and mutant versions of the full length protein sequences
predicted to arise from each transcript annotated with each variant.
- `phased.vcf.gz`: A phased tumor-germline VCF and its tbi index file to provide information about
in-phase proximal variants that might alter the predicted peptide sequence around a somatic
mutation of interest.
- `optitype_normal_result.tsv`: An OptiType file with HLA allele typing predictions.

For more detailed information on how the variant input file is created, please refer to the
[input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html)
[input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html)
section of the pVACtools docs.

For pVACfuse:
87 changes: 48 additions & 39 deletions 03-running_pvactools.Rmd
@@ -23,19 +23,19 @@ mkdir pVACtools_outputs
docker run \
-v ${PWD}/HCC1395_inputs:/HCC1395_inputs \
-v ${PWD}/pVACtools_outputs:/pVACtools_outputs \
-it griffithlab/pvactools:6.0.3 \
-it griffithlab/pvactools:7.0.0 \
/bin/bash
```

This will pull the 6.0.3 version of the griffithlab/pvactools Docker image and
This will pull the 7.0.0 version of the griffithlab/pvactools Docker image and
start an interactive session (`-it`) of that Docker image using the bash shell (`/bin/bash`).
The `-v ${PWD}/HCC1395_inputs:/HCC1395_inputs`
part of the command will mount the
`HCC1395_inputs` folder at `/HCC1395_inputs` inside of the Docker container
so that you will have access to the input data from inside the Docker
container. The `-v ${PWD}/pVACtools_outputs:/pVACtools_outputs` part of the command
will mount the `pVACtools_outputs` folder you just created. We will write the
outputs from pVACseq and pVACfuse to that folder so that you will have access
outputs from pVACseq, pVACfuse, and pVACsplice to that folder so that you will have access
to it once you exit the Docker image.
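
The `-v` mapping described above follows the pattern `host_path:container_path`. As a minimal sketch, the snippet below prints the command rather than running it, so the two mounts can be inspected first. The `mount_arg` helper is hypothetical and exists only for this illustration; it is not part of Docker or pVACtools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker or pVACtools): build the -v argument
# that mounts a folder from the current directory at /<folder> in the container.
mount_arg() {
  local folder="$1"
  # The leading "--" stops printf from treating "-v" as one of its own options.
  printf -- '-v %s/%s:/%s' "$PWD" "$folder" "$folder"
}

# Print the full command instead of executing it (a dry run).
echo docker run \
  "$(mount_arg HCC1395_inputs)" \
  "$(mount_arg pVACtools_outputs)" \
  -it griffithlab/pvactools:7.0.0 /bin/bash
```

Each printed `-v` argument should show an absolute host path on the left of the colon and the in-container path on the right.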

## Running pVACseq
@@ -120,17 +120,14 @@ your run. Here are a list of parameters we generally recommend:
are considered by pVACseq. This flag will lead pVACseq to skip variants that
have a FILTER applied in the VCF to, e.g., exclude variants that were marked
as low quality by the variant caller.
- `--percentile-threshold`: When considering the peptide-MHC binding affinity
for filtering and prioritizing neoantigen candidates, by default only the
IC50 value is being used. Setting this parameter will additionally also filter
on the predicted percentile. We recommend a value of 2 (2%) for this
threshold.
- `--percentile-threshold-strategy`: When running pVACseq with a
`--percentile-threshold` set, this parameter will influence how both the
IC50 cutoff and the percentile cutoff are applied. The default,
`conservative`, will require a candidate to pass both the binding and the
percentile threshold, while the `exploratory` option will require a candidate
to only pass either the binding or the percentile threshold.
- `--use-normalized-percentiles`: Not all prediction algorithms supported by
pVACseq output a percentile rank. This option calculates normalized percentiles
for class I epitopes of length 8-11, covering all class I algorithms and the 1,000
most common human class I MHC alleles, based on a shared set of 100,000 reference
peptides. These percentiles are used in place of the percentiles that some
algorithms calculate natively. This ensures that every class I algorithm
returns a percentile score and that the percentiles are calculated
consistently across all algorithms.
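
To make the idea of a percentile computed against a shared reference set concrete, here is a generic sketch of a percentile rank. It is not pVACtools' implementation, which the text above does not show. The `percentile_rank` function, the "less than or equal" convention, and the ten reference IC50 values are all assumptions made up for this illustration.

```shell
#!/usr/bin/env bash
# Illustrative only: percentile rank of a candidate score within a reference set.
# Assumption: lower IC50 means stronger binding, so the percentile is the share
# of reference scores that are less than or equal to the candidate's score.
percentile_rank() {
  local candidate="$1"; shift
  # Remaining arguments are the reference scores, one per line for awk.
  printf '%s\n' "$@" | awk -v c="$candidate" '
    $1 <= c { better++ }
    END { printf "%.1f\n", 100 * better / NR }'
}

# Made-up reference IC50 values in nM (the option described above uses
# 100,000 reference peptides; ten are enough to show the arithmetic).
percentile_rank 120 35 80 150 400 900 2500 5000 12000 20000 41000
```

Because every algorithm would be scored against the same reference peptides, percentiles computed this way are comparable across algorithms, which is the consistency benefit the option description mentions.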

Additionally there are a number of parameters that might be useful depending
on your specific analysis needs:
@@ -147,6 +144,12 @@ on your specific analysis needs:
unstable. This parameter allows users to set their own rules as to which
peptides are considered problematic and peptides meeting those rules will be marked in the
pVACseq results and deprioritized.
- `--percentile-threshold-strategy`: By default, pVACseq will
filter and prioritize neoantigen candidates on the binding, presentation,
and immunogenicity percentiles in addition to the raw IC50 binding affinity.
A candidate will need to pass all thresholds. However, setting this parameter
to `exploratory` will relax this behavior and only require a candidate to
pass one of the thresholds.
- `--transcript-prioritization-strategy` and
`--maximum-transcript-support-level`: Generally, multiple transcripts of a
gene may code for a neoantigen candidate. When picking the best transcript
@@ -177,8 +180,8 @@ Given the considerations outlined above, let's run pVACseq on our sample data.

From the `optitype_normal_result.tsv` we know that the patient's class I alleles are
HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02 (indicating that two of three class I
alleles are homozygous in this sample). We also have clinical typing information that confirms
these class I alleles as well as identifying DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the
alleles are homozygous in this sample). We also have clinical typing information that confirms
these class I alleles as well as identifying DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the
patient's class II alleles.

Note that where needed pVACseq will automatically create HLA class II dimer combinations using
@@ -274,17 +277,14 @@ usually apply. Here are a list of parameters we generally recommend:
neoantigen candidate in the reference proteome and report any hits found.
By default this is done using BLASTp but we recommend using a proteome FASTA
file via the `--peptide-fasta` parameter to speed up this step.
- `--percentile-threshold`: When considering the peptide-MHC binding affinity
for filtering and prioritizing neoantigen candidates, by default only the
IC50 value is being used. Setting this parameter will additionally also filter
on the predicted percentile. We recommend a value of 2 (2%) for this
threshold.
- `--percentile-threshold-strategy`: When running pVACfuse with a
`--percentile-threshold` set, this parameter will influence how both the
IC50 cutoff and the percentile cutoff are applied. The default,
`conservative`, will require a candidate to pass both the binding and the
percentile threshold, while the `exploratory` option will require a candidate
to only pass either the binding or the percentile threshold.
- `--use-normalized-percentiles`: Not all prediction algorithms supported by
pVACfuse output a percentile rank. This option calculates normalized percentiles
for class I epitopes of length 8-11, covering all class I algorithms and the 1,000
most common human class I MHC alleles, based on a shared set of 100,000 reference
peptides. These percentiles are used in place of the percentiles that some
algorithms calculate natively. This ensures that every class I algorithm
returns a percentile score and that the percentiles are calculated
consistently across all algorithms.

Additionally there are a number of parameters that might be useful depending
on your specific analysis needs:
@@ -298,6 +298,12 @@ on your specific analysis needs:
unstable. This parameter allows users to set their own rules as to which
peptides are considered problematic and peptides meeting those rules will be marked in the
pVACfuse results and deprioritized.
- `--percentile-threshold-strategy`: By default, pVACfuse will
filter and prioritize neoantigen candidates on the binding, presentation,
and immunogenicity percentiles in addition to the raw IC50 binding affinity.
A candidate will need to pass all thresholds. However, setting this parameter
to `exploratory` will relax this behavior and only require a candidate to
pass one of the thresholds.
- `--threads`: This argument will allow pVACfuse to run in multi-processing
mode.
- `--keep-tmp-files`: Setting this flag will save intermediate files created by pVACfuse.
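
The difference between the default and the `exploratory` setting of `--percentile-threshold-strategy` can be sketched as an all-versus-any check. This is only an illustration of the behavior described in the bullet above, not pVACtools code. The `passes_filters` function, its argument order, and the cutoffs (500 nM for IC50, 2% for each percentile) are assumptions chosen for the example.

```shell
#!/usr/bin/env bash
# Illustrative only; not pVACtools code.
# Usage: passes_filters STRATEGY IC50 BINDING_PCT PRESENTATION_PCT IMMUNOGENICITY_PCT
# Assumed cutoffs: IC50 <= 500 nM and each percentile <= 2.
passes_filters() {
  awk -v strategy="$1" -v ic50="$2" -v b="$3" -v p="$4" -v i="$5" \
      -v max_ic50=500 -v max_pct=2 'BEGIN {
    # Count how many of the four thresholds the candidate passes.
    n = (ic50 <= max_ic50) + (b <= max_pct) + (p <= max_pct) + (i <= max_pct)
    if (strategy == "exploratory") ok = (n >= 1)   # any one threshold suffices
    else                           ok = (n == 4)   # default: all must pass
    print (ok ? "pass" : "fail")
  }'
}

# A candidate that misses only the immunogenicity percentile cutoff:
passes_filters conservative 300 1.5 1.0 3.0
passes_filters exploratory  300 1.5 1.0 3.0
```

Under these assumed cutoffs the same candidate is rejected by the default strategy and retained by `exploratory`, which is why the exploratory setting yields a larger candidate list.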
@@ -312,7 +318,7 @@ Given the considerations outlined above, let's run pVACfuse on our sample data.

As with pVACseq, we can use the `optitype_normal_result.tsv` file to identify the patient's
class I HLA alleles. These are HLA-A\*29:02, HLA-B\*45:01, HLA-B\*82:02, and HLA-C\*06:02.
We also have clinical typing information that confirms these class I alleles as well as
We also have clinical typing information that confirms these class I alleles as well as
identifying DQA1\*03:03, DQB1\*03:02, and DRB1\*04:05 as the patient's class II alleles.

For pVACfuse the sample name is not used for any parsing so it doesn't need to
@@ -398,17 +404,14 @@ usually apply. Here is a list of parameters we generally recommend:
neoantigen candidate in the reference proteome and report any hits found.
By default this is done using BLASTp, but we recommend using a proteome FASTA
file via the `--peptide-fasta` parameter to speed up this step.
- `--percentile-threshold`: When considering the peptide-MHC binding affinity
for filtering and prioritizing neoantigen candidates, by default only the
IC50 value is being used. Setting this parameter will additionally filter
on the predicted percentile. We recommend a value of 2 (2%) for this
threshold.
- `--percentile-threshold-strategy`: When running pVACsplice with a
`--percentile-threshold` set, this parameter will influence how both the
IC50 cutoff and the percentile cutoff are applied. The default,
`conservative`, will require a candidate to pass both the binding and the
percentile threshold, while the `exploratory` option will require a candidate
to only pass either the binding or the percentile threshold.
- `--use-normalized-percentiles`: Not all prediction algorithms supported by
pVACsplice output a percentile rank. This option calculates normalized percentiles
for class I epitopes of length 8-11, covering all class I algorithms and the 1,000
most common human class I MHC alleles, based on a shared set of 100,000 reference
peptides. These percentiles are used in place of the percentiles that some
algorithms calculate natively. This ensures that every class I algorithm
returns a percentile score and that the percentiles are calculated
consistently across all algorithms.

Additionally there are a number of parameters that might be useful depending
on your specific analysis needs:
@@ -422,6 +425,12 @@ on your specific analysis needs:
unstable. This parameter allows users to set their own rules as to which
peptides are considered problematic and peptides meeting those rules will be marked in the
pVACsplice results and deprioritized.
- `--percentile-threshold-strategy`: By default, pVACsplice will
filter and prioritize neoantigen candidates on the binding, presentation,
and immunogenicity percentiles in addition to the raw IC50 binding affinity.
A candidate will need to pass all thresholds. However, setting this parameter
to `exploratory` will relax this behavior and only require a candidate to
pass one of the thresholds.
- `--transcript-prioritization-strategy` and
`--maximum-transcript-support-level`: Generally, multiple transcripts of a
gene may code for a neoantigen candidate. When picking the best transcript