Christine Hou

Projects

Christine’s academic work lies at the intersection of genomics and data science. She focuses on developing computational methods and software that address concrete problems in genomics data, and she has extensive experience conducting genomics data analyses.

Computational Omics

Identification of biased features in SRT and snRNA-seq data with batch effects

August 2024 – Current, Baltimore, MD

We developed BiasDetect, a computational tool within the Bioconductor framework. It identifies biased genes in spatially-resolved transcriptomics (SRT) and single-nucleus RNA sequencing (snRNA-seq) data that have a batch variable, which improves the accuracy of downstream clustering.

Specifically, we evaluated and selected the binomial deviance model from scry to calculate per-gene deviance ranks and the difference in deviance with and without the batch variable. We then derived a data-driven thresholding approach to identify the biased features, based on the number of standard deviations (nSD) in the relative change of deviance and in the rank difference. Finally, we validated the method's effect on clustering performance using PRECAST.

  • R/biasDetect | Identifies biased features among spatially variable genes in spatially-resolved transcriptomics data and among highly variable genes in single-nucleus RNA sequencing data with a batch variable.
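The thresholding idea above can be illustrated with a minimal Python/NumPy sketch. This is not BiasDetect's code, which is an R/Bioconductor package. The sketch makes several assumptions:

- It uses a scry-style per-gene binomial deviance.
- It fits the batch-aware model by estimating a separate per-gene proportion within each batch and summing the per-batch deviances.
- The function names (`binomial_deviance`, `flag_biased_genes`) are hypothetical.
- The definitions of relative change, (deviance without batch − deviance with batch) / deviance without batch, and of rank difference are illustrative.
- A gene is flagged when either quantity exceeds mean + nSD × SD. This rule is also illustrative.

```python
import numpy as np


def binomial_deviance(counts, batch=None):
    """Per-gene binomial deviance for a genes x cells count matrix.

    If `batch` is given, each batch gets its own per-gene proportion and the
    per-batch deviances are summed (an assumed batch-aware variant).
    """
    counts = np.asarray(counts, dtype=float)
    if batch is None:
        batch = np.zeros(counts.shape[1], dtype=int)
    batch = np.asarray(batch)
    dev = np.zeros(counts.shape[0])
    for b in np.unique(batch):
        y = counts[:, batch == b]          # genes x cells in this batch
        n = y.sum(axis=0)                  # library size of each cell
        pi = y.sum(axis=1) / n.sum()       # fitted per-gene proportion
        mu = np.outer(pi, n)               # expected counts
        with np.errstate(divide="ignore", invalid="ignore"):
            # 0 * log(0) terms are defined as 0
            t1 = np.where(y > 0, y * np.log(y / mu), 0.0)
            t2 = np.where(n - y > 0, (n - y) * np.log((n - y) / (n - mu)), 0.0)
        dev += 2.0 * (t1 + t2).sum(axis=1)
    return dev


def _rank_desc(x):
    """Rank 1 = largest value; ties broken by gene order."""
    order = np.argsort(-x, kind="stable")
    ranks = np.empty(x.size, dtype=int)
    ranks[order] = np.arange(1, x.size + 1)
    return ranks


def flag_biased_genes(counts, batch, nsd=3.0):
    """Flag genes whose deviance depends strongly on the batch variable."""
    dev_without = binomial_deviance(counts)
    dev_with = binomial_deviance(counts, batch)
    # Hypothetical relative change: share of deviance removed by modeling batch
    rel_change = np.divide(dev_without - dev_with, dev_without,
                           out=np.zeros_like(dev_without),
                           where=dev_without > 0)
    # Positive values mean the gene drops in the ranking once batch is modeled
    rank_diff = (_rank_desc(dev_with) - _rank_desc(dev_without)).astype(float)

    def above(x):
        # Data-driven cutoff: mean plus nsd standard deviations
        return x > x.mean() + nsd * x.std()

    flagged = above(rel_change) | above(rank_diff)
    return flagged, rel_change, rank_diff
```

The batch-aware fit has more free parameters, so its deviance can only be equal or lower. A gene whose high deviance is driven mostly by batch differences therefore shows a large relative drop and falls down the ranking, while genes with consistent behavior across batches barely change.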

Software engineering

Spatial Human Hippocampus

September – October 2024, Baltimore, MD

I developed humanHippocampus2024, an ExperimentHub package within the Bioconductor framework. It stores data from the 10x Genomics Visium Spatial (U01) Human Hippocampus (HPC) project, generated by researchers and scientists at the Lieber Institute for Brain Development (LIBD):

  • spatially-resolved transcriptomics data, as a SpatialExperiment object
  • single-nucleus RNA sequencing data, as a SingleCellExperiment object

  • R/humanHippocampus2024 | ExperimentHub package from Nelson et al. (under review by Bioconductor). It contains a spatially-resolved transcriptomics dataset stored as a SpatialExperiment object and a single-nucleus RNA sequencing dataset stored as a SingleCellExperiment object.
  • Nelson ED, Tippani M, Ramnauth AD, Divecha HR, Miller RA, Eagles NJ, Pattie EA, Kwon SH, Bach SV, Kaipa UM, Yao J, Hou C, Kleinman JE, Collado-Torres L, Han S, Maynard KR, Hyde TM, Martinowich K, Page SC, Hicks SC. (2024). An integrated single-nucleus and spatial transcriptomics atlas reveals the molecular landscape of the human hippocampus. bioRxiv.

R Client for the Human BioMolecular Atlas Program (HuBMAP) Data Portal

March – September 2024, Baltimore, MD

The Human BioMolecular Atlas Program (HuBMAP) is a comprehensive, open, global atlas of the human body at cellular resolution. It hosts numerous datasets generated by more than 20 assay types from various donors and samples. These data are available on the HuBMAP Data Portal, but there was no programmatic interface in R to access, explore, retrieve, and download them.

I therefore developed and published HuBMAPR, an efficient R client built within the Bioconductor framework. As an R/Bioconductor package, HuBMAPR makes data retrieval and bulk data transfer from the HuBMAP Data Portal easier and faster for researchers worldwide. On December 2nd, I was invited to give a 10-minute package demo presentation of HuBMAPR.

  • Hou C, Ghazanfar S, Marini F, Morgan M, Hicks SC. (2024). HuBMAPR: an R Client for the HuBMAP Data Portal. bioRxiv.
  • R/HuBMAPR | R/Bioconductor package that serves as an R client for the HuBMAP Data Portal. (Hou et al., 2024. bioRxiv)

Data analysis

Long-read RNA sequencing data

October 2024 – Current, Baltimore, MD

Long-read sequencing was named Method of the Year 2022 by Nature Methods (https://www.nature.com/articles/s41592-022-01759-x). Long-read RNA sequencing (LR RNA-seq) opens tremendous research opportunities and introduces new analytical challenges in genomics. Our long-term goal is to write an organized, systematic e-book on LR RNA-seq data analysis. The e-book will provide instructions and build workflows based on a variety of LR RNA-seq datasets, as a resource for the broader genomics community.

Exploring the use of Artificial Intelligence to Assess the Exposome

October 2023 – August 2024, Baltimore, MD

As a part-time data analyst, I helped with quality control of the raw sequencing data and with batch effect removal. I then applied Weighted Gene Co-expression Network Analysis (WGCNA) and differential expression analysis to 100 samples from the Baltimore community to explore associations between exposure factors and health outcomes.

Single Cell RNA Sequencing Analysis

February – May 2023, Madison, WI

I conducted quality control using both per-cell count filters and gene-level filters to ensure data integrity. After checking cell cycle scores, I normalized the data with SCTransform, then used PCA and UMAP for dimensionality reduction. I integrated the normalized samples using canonical correlation analysis (CCA) and annotated clusters and cell types using SingleR.

Analyzing Farm and Illness-related Modules in Child Nasal Gene Expression

August – December 2022, Madison, WI

Farm exposures in early life reduce the risk of childhood allergic diseases and asthma, but less is known about how farm exposures relate to respiratory illnesses and mucosal immune development.

As an undergraduate research assistant, I helped screen the 100 samples submitted for sequencing. I removed samples that failed quality control checks or lacked corresponding respiratory illness data, leaving 22 farm and 42 non-farm samples for the differential gene expression analyses.

I performed differential expression analysis using DESeq2 and applied gene set enrichment analysis to compare gene expression trends by farm status and by high respiratory illness count. I also used Weighted Gene Co-expression Network Analysis (WGCNA) to identify co-expressed gene modules in the transcriptomics data and quantified the eigengene of each module. Finally, I tested whether any module eigengenes were differentially expressed by farm status or respiratory illness frequency.

  • Brownell J, Lee KE, Chasman D, Gangnon R, Bendixsen CG, Barnes K, Grindle K, Pappas T, Bochkov YA, Dresen A, Hou C, Haslam DB, Seroogy CM, Ong IM, Gern JE. (2024). Farm animal exposure, respiratory illnesses, and nasal cell gene expression. J Allergy Clin Immunol. Jun; 153(6):1647-1654. PMID: 38309597; PMCID: PMC11162314.
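The module eigengene step described in this project can be sketched in NumPy. This is an illustrative version, not the WGCNA R code used in the study. It follows the common WGCNA definition of a module eigengene: the first principal component of the standardized (z-scored) expression of the module's genes, with its sign aligned to the module's average expression. The function names `module_eigengene` and `trait_correlation` are hypothetical. The correlation with a 0/1 group indicator (a point-biserial correlation) is a simplified stand-in for the differential expression testing actually performed.

```python
import numpy as np


def module_eigengene(expr):
    """First principal component of a module's standardized expression.

    `expr` is a samples x genes array restricted to one module's genes.
    """
    expr = np.asarray(expr, dtype=float)
    # Standardize each gene (column) across samples
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # Leading left singular vector = PC1 sample scores, up to scaling
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # SVD signs are arbitrary; align with the module's average expression
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def trait_correlation(eigengene, trait):
    """Pearson correlation with a trait (point-biserial for a 0/1 group)."""
    return np.corrcoef(eigengene, np.asarray(trait, dtype=float))[0, 1]
```

Summarizing each module by a single eigengene reduces many correlated genes to one value per sample. This keeps the number of association tests small (one per module rather than one per gene).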