Hi, I’m Shraddha Piparia — Computational Biologist & ML Scientist 👋
I build machine learning systems and scalable pipelines to understand disease biology from large-scale genomics, proteomics, and single-cell data. My work focuses on disease stratification, treatment-response heterogeneity, and reproducible analysis of high-dimensional biomedical data.
Ph.D. in Computer Science | Computational Biology + ML | Genomics, Proteomics & Single-Cell Multi-omics
Machine Learning: PyTorch · Representation Learning · VAEs · Transformers · SHAP · Clinical NLP
Computational Biology: GWAS · Disease heterogeneity · Single-cell · Multi-omics · UK Biobank · Olink
Engineering: Nextflow · Docker · Spark · SLURM · GitHub Actions
Genotype Representation Learning (Ongoing)
Discovered latent asthma-related genomic structure without phenotype labels using an LD-aware VAE + transformer framework.
- Technical Highlight: Built VAE-derived LD block embeddings, transformer-based cross-block contextualization, and leave-one-block-out perturbation attribution to identify influential genomic regions.
- Impact: Linked unsupervised latent structure to interpretable asthma biology, including HLA class II and PDE4D signals.
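As a rough illustration of the leave-one-block-out idea, the sketch below masks one block embedding at a time and records how far a model's output moves. The `toy_model` function, its weights, the zero baseline, and the L2 distance are assumptions made for this example only; it is not the VAE + transformer framework itself.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def leave_one_block_out(
    blocks: Sequence[Vector],
    model: Callable[[Sequence[Vector]], Vector],
    baseline: float = 0.0,
) -> List[float]:
    """Return, per block, the L2 shift in the model output when that block is masked."""
    reference = model(blocks)
    scores = []
    for i in range(len(blocks)):
        # Copy all blocks, then replace block i with a constant baseline vector.
        perturbed = [list(b) for b in blocks]
        perturbed[i] = [baseline] * len(blocks[i])
        output = model(perturbed)
        shift = sum((r - o) ** 2 for r, o in zip(reference, output)) ** 0.5
        scores.append(shift)
    return scores


def toy_model(blocks: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in for a trained model: weighted sum of block means."""
    weights = [0.1, 2.0, 0.5]  # illustrative weights, not learned
    return [sum(w * sum(b) / len(b) for w, b in zip(weights, blocks))]


# Three hypothetical LD-block embeddings of dimension 2.
blocks = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
scores = leave_one_block_out(blocks, toy_model)
most_influential = scores.index(max(scores))  # block 1 carries the largest weight
```

Blocks whose removal moves the output the most are treated as the most influential; with a real model, the choice of baseline (zeros, a mean embedding, or a resampled block) can change the ranking.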
proteomics_npx_analysis (Ongoing)
Developed scalable pipelines for disease subtype discovery using Olink NPX proteomics.
- Technical Highlight: Built Spark + SQL workflows to process population-scale data (~50K UK Biobank participants).
- Impact: Defined neurocognitive vs. non-neurocognitive Long COVID subtypes using WHO-aligned symptom profiles.
Reproducible single-cell multiome workflow integrating scRNA-seq and scATAC-seq to study cell-type-specific regulatory activity.
- Technical Highlight: Built an RNA + ATAC preprocessing and integration workflow covering RNA QC, ATAC QC, dimensionality reduction, WNN-style multiome integration, and regulatory activity analysis.
- Impact: Demonstrates end-to-end handling of multimodal single-cell data, from quality control to integrated cell-state and regulatory interpretation.
Interpretable ML to predict pediatric COVID-19 status from radiology text.
- Technical Highlight: Extracted structured features from 2,500+ chest X-ray (CXR) impressions using radiology-specific NLP, and used SHAP values to explain model predictions.
- Impact: Surfaced clinically meaningful patterns from text alone with full model transparency.
- Publication: Piparia S, Defante A, Tantisira K, Ryu J. Using machine learning to improve our understanding of COVID-19 infection in children. PLOS ONE 18(2): e0281666 (2023). DOI
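SHAP attributions are built on Shapley values, which can be computed exactly for a very small model. The sketch below does that by brute force over feature subsets. The feature names (`term_a`, `term_b`, `term_c`), the `predict` function, and the all-zero baseline are hypothetical and are not taken from the published model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(
    features: Features,
    predict: Callable[[Features], float],
    baseline: Features,
) -> Features:
    """Exact Shapley values; absent features are set to their baseline value."""
    names = list(features)
    n = len(names)
    values = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                present = set(subset)
                without = {
                    f: (features[f] if f in present else baseline[f])
                    for f in names
                }
                with_feature = dict(without)
                with_feature[name] = features[name]
                total += weight * (predict(with_feature) - predict(without))
        values[name] = total
    return values


def predict(x: Features) -> float:
    """Hypothetical scoring function with one interaction term."""
    return 0.1 + 0.4 * x["term_a"] + 0.2 * x["term_b"] + 0.3 * x["term_a"] * x["term_b"]


# Hypothetical binary text features for one report.
report = {"term_a": 1.0, "term_b": 1.0, "term_c": 0.0}
zeros = {"term_a": 0.0, "term_b": 0.0, "term_c": 0.0}
attributions = shapley_values(report, predict, zeros)
```

The attributions sum to the difference between the prediction for the report and the prediction for the baseline, which is the property that makes per-feature explanations additive. Exact enumeration grows exponentially with the number of features, so practical tools approximate it.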
A reproducible GWAS workflow to identify variants associated with clinically defined asthma endotypes. Published in Respiratory Research (2025).
- Technical Highlight: Nextflow + R pipeline designed for HPC/SLURM execution with ANOVA-style modeling.
- Impact: Enabled detection of subtype-specific genetic signals beyond traditional case-control designs.
- Publication: Piparia S, Kho A, Desai B, Wong R, Sharma R, Celedon JC, Weiss ST, McGeachie M, Tantisira K. A principal component analysis-based endophenotype definition for change in lung function and inhaled corticosteroid treatment response in childhood asthma. Respiratory Research 26:351 (2025). DOI
- Publication: Piparia S, Hadikhani P, Ziniti J, Hecker J, Kho A, Sharma R, Celedon JC, Weiss ST, McGeachie M, Tantisira K. A Categorical ANCOVA Approach to Severity Endophenotype-Specific GWAS Childhood Asthma. Journal of Personalized Medicine 16.1 (2026). DOI
Analysis of miRNA signatures associated with treatment response heterogeneity.
- Technical Highlight: Integrated multi-omic datasets to identify miR-584-5p as a key modulator of corticosteroid resistance.
- Impact: Provides a candidate molecular marker for variability in inhaled corticosteroid response.
- Publication: Piparia S, Kho A, Hadikhani P, Ban G-Y, Sharma R, Celedon JC, Weiss ST, McGeachie M, Tantisira K. MicroRNA-584-5p As a Key Modulator of Inhaled Corticosteroid Resistance in Asthma. AJRCCM (2025). DOI
Production-style genomics template from raw sequencing data to interpretable genetic analysis.
- Technical Highlight: Modular workflow covering FASTQ QC, alignment, variant calling, GWAS, PRS, and eQTL interpretation.
- Impact: Provides a portable, reproducible blueprint for moving from raw sequencing files to genetic insight.
I enjoy building methods that move from raw data to interpretable biological insight through versioned environments, automated workflows, and reusable code.

