shraddhapiparia/README.md

Hi, I’m Shraddha Piparia — Computational Biologist & ML Scientist 👋

I build machine learning systems and scalable pipelines to understand disease biology from large-scale genomics, proteomics, and single-cell data. My work focuses on disease stratification, treatment-response heterogeneity, and reproducible analysis of high-dimensional biomedical data.

Ph.D. in Computer Science | Computational Biology + ML | Genomics, Proteomics & Single-Cell Multi-omics

🌐 Personal site


Tech Stack

Machine Learning: PyTorch · Representation Learning · VAEs · Transformers · SHAP · Clinical NLP

Computational Biology: GWAS · Disease heterogeneity · Single-cell · Multi-omics · UK Biobank · Olink

Engineering: Nextflow · Docker · Spark · SLURM · GitHub Actions


Featured Projects

Discovered latent asthma-related genomic structure without phenotype labels using an LD-aware VAE + transformer framework.

  • Technical Highlight: Built VAE-derived LD block embeddings, transformer-based cross-block contextualization, and leave-one-block-out perturbation attribution to identify influential genomic regions.
  • Impact: Linked unsupervised latent structure to interpretable asthma biology, including HLA class II and PDE4D signals.

Developed scalable pipelines for disease subtype discovery using Olink NPX proteomics.

  • Technical Highlight: Built Spark + SQL workflows to process population-scale data (~50K UK Biobank participants).
  • Impact: Defined neurocognitive vs. non-neurocognitive Long COVID subtypes using WHO-aligned symptom profiles.

Reproducible single-cell multiome workflow integrating scRNA-seq and scATAC-seq to study cell-type-specific regulatory activity.

  • Technical Highlight: Built an RNA + ATAC preprocessing and integration workflow covering RNA QC, ATAC QC, dimensionality reduction, WNN-style multiome integration, and regulatory activity analysis.
  • Impact: Demonstrates end-to-end handling of multimodal single-cell data, from quality control to integrated cell-state and regulatory interpretation.

Interpretable ML to predict pediatric COVID-19 status from radiology text.

  • Technical Highlight: Extracted structured features from 2,500+ chest X-ray (CXR) impressions using radiology-specific NLP, and explained model predictions with SHAP values.
  • Impact: Surfaced clinically meaningful patterns from text alone with full model transparency.
  • Publication: Piparia S, Defante A, Tantisira K, Ryu J. Using machine learning to improve our understanding of COVID-19 infection in children. PLOS ONE 18(2): e0281666 (2023). DOI

A reproducible GWAS workflow to identify variants associated with clinically defined asthma endotypes. Published in Respiratory Research (2025).

  • Technical Highlight: Nextflow + R pipeline designed for HPC/SLURM execution with ANOVA-style modeling.
  • Impact: Enabled detection of subtype-specific genetic signals beyond traditional case-control designs.
  • Publication: Piparia S, Kho A, Desai B, Wong R, Sharma R, Celedon JC, Weiss ST, McGeachie M, Tantisira K. A principal component analysis-based endophenotype definition for change in lung function and inhaled corticosteroid treatment response in childhood asthma. Respiratory Research 26:351 (2025). DOI
  • Publication: Piparia S, Hadikhani P, Ziniti J, Hecker J, Kho A, Sharma R, Celedon JC, Weiss ST, McGeachie M, Tantisira K. A Categorical ANCOVA Approach to Severity Endophenotype-Specific GWAS Childhood Asthma. Journal of Personalized Medicine 16.1 (2026). DOI

Analysis of miRNA signatures associated with treatment response heterogeneity.

  • Technical Highlight: Integrated multi-omic datasets to test miRNAs as modifiers of inhaled corticosteroid response.
  • Impact: Identified miR-584-5p as a candidate molecular marker linked to variability in corticosteroid response.
  • Publication: Piparia S, Kho A, Hadikhani P, Ban G-Y, Sharma R, Celedon JC, Weiss ST, McGeachie M, Tantisira K. MicroRNA-584-5p As a Key Modulator of Inhaled Corticosteroid Resistance in Asthma. AJRCCM (2025). DOI

Additional Project

Production-style genomics template from raw sequencing data to interpretable genetic analysis.

  • Technical Highlight: Modular workflow covering FASTQ QC, alignment, variant calling, GWAS, PRS, and eQTL interpretation.
  • Impact: Provides a portable, reproducible blueprint for moving from raw sequencing files to genetic insight.

Research and Writing


Focus & Values

I enjoy building methods that move from raw data to interpretable biological insight through versioned environments, automated workflows, and reusable code.

Pinned

  1. blockbased-genotype-embedding-analysis

    Two-phase deep learning framework for asthma genotype representation: per-block β-VAE embeddings followed by a cross-block Transformer with attention.

    Python

  2. sc-rna-atac-regulon-benchmark

    Extends the RNA-only sc-cell-state-benchmark into paired RNA+ATAC analysis using PBMC multiome data. Includes RNA baseline import, multimodal integration (WNN), TF/regulon inference, and optional p…

    R

  3. ANOVA-like-GWAS-NextFlow

    Endotype-aware GWAS pipeline using ANCOVA to detect subtype-specific genetic variants. Discovery in CAMP and validation in GACRS childhood asthma cohorts.

    Python

  4. COVID-Radiology-Study

    Machine learning on pediatric CXR radiology impressions (NLP features + Random Forest + SHAP).

    Python

  5. miRNA_ics_interaction

    End-to-end pharmacogenomic workflow identifying miRNA modifiers of ICS response in childhood asthma.

    R

  6. proteomics_npx_analysis

    Scalable Olink NPX proteomics workflow for identifying neurocognitive Long COVID signatures in pediatric and UK Biobank cohorts using logistic regression, PySpark, and protein interaction analysis.

    Python