Results from WGS

After the whole-genome sequencing (WGS) analysis pipeline finishes, it writes its results to a structured set of output folders. These cover read preprocessing, variant calling, variant annotation, and summary reports, all of which are needed for downstream interpretation and for reproducing the analysis.

This section outlines the purpose of each output folder and what you should expect to find inside it. The results folder is organized as follows:

results/
├── annotation/
├── csv/
├── multiqc/
├── pipeline_info/
├── preprocessing/
├── reference/
├── reports/
└── variant_calling/
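As a quick sanity check after a run, you can confirm that every top-level folder in this tree exists. The Python sketch below is a minimal example, assuming the results live in a directory named `results` with exactly the folder names shown; `EXPECTED_FOLDERS` and `missing_result_folders` are hypothetical names introduced here, not part of the pipeline. A folder may legitimately be absent if you skipped the corresponding step.

```python
from pathlib import Path

# Top-level folders from the results tree. This list is an assumption based on
# the layout above; remove entries for steps your run intentionally skipped.
EXPECTED_FOLDERS = [
    "annotation",
    "csv",
    "multiqc",
    "pipeline_info",
    "preprocessing",
    "reference",
    "reports",
    "variant_calling",
]


def missing_result_folders(results_dir: str) -> list[str]:
    """Return the expected top-level folders that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_result_folders("results")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All expected result folders are present.")
```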

The table below describes the contents of each folder:

Output folders

| Folder | Contents | Description |
|---|---|---|
| preprocessing/ | FASTQ, BAM, trimming reports | Cleaned, aligned, and sorted reads. May include outputs from tools such as BWA, SAMtools, and GATK MarkDuplicates (which flags duplicate reads). |
| variant_calling/ | .vcf, .bcf, .g.vcf | Raw and filtered variant call files from GATK or other callers. May include SNPs, INDELs, and structural variants. |
| annotation/ | .vcf, .txt, .csv | Functionally annotated variants (e.g., from VEP or SnpEff), including gene impacts, allele frequencies, and clinical significance. |
| csv/ | Summarized tables | Processed or summarized tabular results, often used for downstream statistics, visualization, or integration with other data. |
| multiqc/ | multiqc_report.html | Unified quality control report covering FastQC results, alignment statistics, duplicate rates, coverage, and more. |
| pipeline_info/ | .yaml, .txt, .log | Workflow configuration, software versions, and run metadata for reproducibility. |
| reference/ | Reference genome, index files | Reference FASTA and prebuilt index files (e.g., for BWA, GATK, VEP) used during alignment and variant calling. |
| reports/ | PDF/HTML/Markdown summaries | High-level summaries such as coverage reports, alignment metrics, variant statistics, or project-level documentation. |


Key output files

| File type | Description |
|---|---|
| *.bam, *.bai | Aligned reads and their index files (for viewing in IGV or as input to variant callers) |
| *.vcf, *.bcf | Variant calls, raw and filtered |
| annotated.vcf | Variants with gene annotations and clinical tags |
| coverage.txt, stats.csv | Coverage summaries per chromosome or per sample |
| multiqc_report.html | Visual dashboard of QC metrics |
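To gather these key files for inspection (for example, before loading BAMs into IGV), a recursive search of the results folder is usually enough. The sketch below is illustrative only: `KEY_PATTERNS` and `find_key_outputs` are hypothetical names, the patterns mirror the table, and the extra `*.vcf.gz` pattern is an assumption that your pipeline writes bgzip-compressed VCFs. Exact file names may differ between pipeline versions.

```python
from pathlib import Path

# Patterns mirror the key output files table. "*.vcf.gz" is an added
# assumption for pipelines that write bgzip-compressed VCFs.
KEY_PATTERNS = {
    "alignments": ["*.bam", "*.bai"],
    "variant_calls": ["*.vcf", "*.vcf.gz", "*.bcf"],
    "coverage_stats": ["coverage.txt", "stats.csv"],
    "qc_report": ["multiqc_report.html"],
}


def find_key_outputs(results_dir: str) -> dict[str, list[Path]]:
    """Recursively collect files under results_dir for each pattern group."""
    root = Path(results_dir)
    found: dict[str, list[Path]] = {}
    for group, patterns in KEY_PATTERNS.items():
        matches: set[Path] = set()
        for pattern in patterns:
            # rglob searches every subfolder, so files are found wherever they live.
            matches.update(root.rglob(pattern))
        found[group] = sorted(matches)
    return found


if __name__ == "__main__":
    for group, paths in find_key_outputs("results").items():
        print(f"{group}: {len(paths)} file(s)")
```

Grouping by file role rather than by folder keeps the check independent of which subfolder a given pipeline version writes each file to.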


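Output validation checklist

| Step | Validation |
|---|---|
| Read quality (FastQC) | Mean Phred quality score (Q) > 30, evenly distributed across read positions |
| Alignment (BAM files) | > 95% of reads mapped; read pairs properly paired |
| Duplicate removal | Duplicates marked or removed |
| Variant calling success | Plausible numbers of SNPs and INDELs for the sample type (for a human germline genome, on the order of 4 to 5 million variant sites relative to the reference) |
| Variant annotation | Annotated VCFs contain gene and predicted-effect information |
| QC overview | All tools summarized successfully in MultiQC |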
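Some of these checks can be scripted. The sketch below computes the mean Phred quality from FASTQ quality lines, the mapped-read fraction from read counts (for example, the mapped and total counts reported by `samtools flagstat`), and rough SNP/INDEL counts from VCF records, then compares the first two against the thresholds in the table. It assumes Phred+33 quality encoding, standard 4-line FASTQ records, and uncompressed tab-separated VCF input; all function names are hypothetical, and the SNP/INDEL classification is deliberately simplified (multi-nucleotide and symbolic alleles are not counted).

```python
from typing import Iterable, Iterator


def fastq_quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def mean_phred_quality(quality_lines: Iterable[str], offset: int = 33) -> float:
    """Mean Phred score over all bases, assuming Phred+33 encoding by default."""
    total = 0
    bases = 0
    for qual in quality_lines:
        total += sum(ord(ch) - offset for ch in qual)
        bases += len(qual)
    return total / bases if bases else 0.0


def mapped_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of reads that aligned to the reference."""
    return mapped_reads / total_reads if total_reads else 0.0


def count_snps_indels(vcf_lines: Iterable[str]) -> tuple[int, int]:
    """Count SNP records (single-base REF and ALTs) and INDEL records
    (at least one ALT differs in length from REF). Headers are skipped."""
    snps = indels = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3]
        # Ignore missing ('.'), spanning-deletion ('*') and symbolic (<DEL>) alleles.
        alts = [a for a in fields[4].split(",") if a not in (".", "*") and not a.startswith("<")]
        if not alts:
            continue
        if len(ref) == 1 and all(len(a) == 1 for a in alts):
            snps += 1
        elif any(len(a) != len(ref) for a in alts):
            indels += 1
    return snps, indels


def checklist(mean_q: float, mapped_frac: float) -> dict[str, bool]:
    """Apply the read-quality (Q > 30) and alignment (> 95% mapped) thresholds."""
    return {"read_quality": mean_q > 30, "alignment": mapped_frac > 0.95}


if __name__ == "__main__":
    fastq = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]  # 'I' encodes Q40 in Phred+33
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
        "chr1\t200\t.\tAT\tA\t50\tPASS\t.\n",
    ]
    mean_q = mean_phred_quality(fastq_quality_lines(fastq))
    print(checklist(mean_q, mapped_fraction(970, 1000)))  # both True
    print(count_snps_indels(vcf))  # (1, 1)
```

For real files, pass an open file handle instead of a list, e.g. `open("sample.fastq")`, or `gzip.open("sample.fastq.gz", "rt")` for compressed input. These numbers complement, rather than replace, the FastQC and MultiQC reports.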

For more in-depth guidance on interpreting your FastQC and MultiQC reports, see this video link for MultiQC and this manual for FastQC.