Projects
Christine’s academic work sits at the intersection of genomics and data science, with a focus on developing computational methods and software that solve concrete problems in genomics data. She also has extensive experience in genomics data analysis.
Computational Omics
Identification of biased features in SRT and snRNA-seq data with batch effects
August 2024 - Current, Baltimore, MD
We developed BiasDetect, a computational tool within the Bioconductor framework, to identify biased genes in spatially-resolved transcriptomics (SRT) and single-nucleus RNA sequencing (snRNA-seq) data with a batch variable, with the goal of improving the accuracy of downstream clustering. Specifically, we evaluated and selected the binomial deviance model from scry to calculate per-gene ranks and the difference in deviance with and without the batch variable. We then derived a data-driven thresholding approach, based on the number of standard deviations (nSD) of the relative change in deviance and of the rank difference, to identify biased features. Finally, we validated the method's effect on clustering performance using PRECAST.
- R/biasDetect | Identifies biased features among spatially variable genes in spatially-resolved transcriptomics data and among highly variable genes in single-nucleus RNA sequencing data with a batch variable.
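The nSD thresholding idea described above can be illustrated with a minimal Python sketch. This is not the BiasDetect implementation or its API: the function names, the default cutoff of 2, the sign conventions, and the rule of flagging a gene when either statistic exceeds the cutoff are all assumptions made for illustration, and the deviance values are made up.

```python
from statistics import mean, stdev


def n_sd(values):
    """Express each value as a number of sample standard deviations from the mean."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def ranks_desc(values):
    """Rank values so that the largest gets rank 1 (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def flag_biased(genes, dev_no_batch, dev_batch, cutoff=2.0):
    """Flag genes whose deviance or rank shifts unusually when batch is modeled.

    Assumption: a gene is flagged if either nSD statistic reaches `cutoff`.
    """
    # Relative change in deviance between the two models.
    rel_change = [(a - b) / b for a, b in zip(dev_no_batch, dev_batch)]
    # Positive rank difference: the gene ranks higher when batch is ignored.
    rank_diff = [
        rb - ra
        for ra, rb in zip(ranks_desc(dev_no_batch), ranks_desc(dev_batch))
    ]
    return [
        g
        for g, d, r in zip(genes, n_sd(rel_change), n_sd(rank_diff))
        if d >= cutoff or r >= cutoff
    ]


# Toy example: only gene5 has a much larger deviance when batch is ignored.
genes = [f"gene{i}" for i in range(1, 11)]
dev_batch = [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
dev_no_batch = [100, 90, 80, 70, 300, 50, 40, 30, 20, 10]
flagged = flag_biased(genes, dev_no_batch, dev_batch)
print(flagged)
```

With very few genes the largest attainable nSD is bounded by the sample size, so a fixed cutoff is only meaningful on realistically sized gene sets.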
Software engineering
Spatial Human Hippocampus
September - October 2024, Baltimore, MD
I developed an ExperimentHub package named humanHippocampus2024 within the Bioconductor framework to store spatially-resolved transcriptomics data in the SpatialExperiment class and single-nucleus RNA sequencing data in the SingleCellExperiment class. The data were generated by researchers and scientists at the Lieber Institute for Brain Development (LIBD) for the 10x Genomics Visium Spatial (U01) Human Hippocampus (HPC) project.
- R/humanHippocampus2024 | ExperimentHub package that contains a spatially-resolved transcriptomics dataset in the SpatialExperiment class and a single-nucleus RNA sequencing dataset in the SingleCellExperiment class from Nelson et al. (under review by Bioconductor).
- Nelson ED, Tippani M, Ramnauth AD, Divecha HR, Miller RA, Eagles NJ, Pattie EA, Kwon SH, Bach SV, Kaipa UM, Yao J, Hou C, Kleinman JE, Collado-Torres L, Han S, Maynard KR, Hyde TM, Martinowich K, Page SC, Hicks SC. (2024). An integrated single-nucleus and spatial transcriptomics atlas reveals the molecular landscape of the human hippocampus. bioRxiv.
R Client for the Human BioMolecular Atlas Program (HuBMAP) Data Portal
March – September 2024, Baltimore, MD
The Human BioMolecular Atlas Program (HuBMAP) is a comprehensive, open, global atlas of the human body at cellular resolution. It provides many datasets generated by more than 20 assays from various donors and samples. Although these data are available on the HuBMAP Data Portal, there was no programmatic interface in R to access, explore, retrieve, and download them. To fill this gap, I developed and published HuBMAPR, an efficient programmatic R client built within the Bioconductor framework. As an R/Bioconductor package, HuBMAPR makes data retrieval and bulk data transfer from the HuBMAP Data Portal easier and faster for researchers worldwide. On December 2nd, I was invited to give a 10-minute package demo of HuBMAPR.
- Hou C, Ghazanfar S, Marini F, Morgan M, Hicks SC. (2024). HuBMAPR: an R Client for the HuBMAP Data Portal. bioRxiv.
- R/HuBMAPR | R/Bioconductor package that serves as an R Client for the HuBMAP Data Portal. (Hou et al., 2024. bioRxiv)
Data analysis
Long-read RNA sequencing data
October 2024 - Current, Baltimore, MD
Long-read sequencing was named Method of the Year 2022 by Nature Methods (https://www.nature.com/articles/s41592-022-01759-x), and long-read RNA sequencing (LS RNA-seq) opens up substantial research opportunities while raising new challenges in genomics. Our current long-term goal is to write an organized, systematic e-book on LS RNA-seq data analysis that provides instructions and builds workflows based on various LS RNA-seq datasets, as a resource for the wider genomics community.
Exploring the use of Artificial Intelligence to Assess the Exposome
October 2023 – August 2024, Baltimore, MD
As a part-time data analyst, I helped perform quality control and batch effect removal on raw sequencing data. I applied Weighted Gene Co-expression Network Analysis (WGCNA) and differential expression analysis to 100 samples from the Baltimore community to explore associations between exposure factors and health outcomes.
Single Cell RNA Sequencing Analysis
February – May 2023, Madison, WI
I performed quality control using both cell-level count filters and gene-level filters to ensure data integrity. After checking cell cycle scores, I normalized the data with SCTransform and then ran PCA and UMAP for dimensionality reduction. I used canonical correlation analysis (CCA) to integrate the normalized samples and annotated clusters and cell types using SingleR.
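As an illustration of the first step only, the two kinds of quality control filter can be sketched in plain Python. This is a simplified stand-in, not the code used in the project: the function name, the thresholds, and the exact filter definitions (total counts and detected genes per cell, then number of retained cells expressing each gene) are assumptions.

```python
def qc_filter(counts, min_cell_counts, min_cell_genes, min_gene_cells):
    """Filter a genes-by-cells count matrix given as a list of rows (one per gene).

    Returns (kept_gene_indices, kept_cell_indices).
    """
    n_cells = len(counts[0])

    # Cell-level filter: enough total counts and enough detected genes.
    keep_cells = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        total = sum(column)
        detected = sum(1 for c in column if c > 0)
        if total >= min_cell_counts and detected >= min_cell_genes:
            keep_cells.append(j)

    # Gene-level filter: detected in enough of the retained cells.
    keep_genes = [
        i
        for i, row in enumerate(counts)
        if sum(1 for j in keep_cells if row[j] > 0) >= min_gene_cells
    ]
    return keep_genes, keep_cells


# Toy matrix: 4 genes (rows) by 4 cells (columns); values are made up.
toy_counts = [
    [5, 0, 3, 0],
    [2, 0, 4, 1],
    [0, 0, 0, 1],
    [1, 0, 2, 0],
]
kept_genes, kept_cells = qc_filter(
    toy_counts, min_cell_counts=5, min_cell_genes=2, min_gene_cells=2
)
print(kept_genes, kept_cells)
```

Filtering cells before genes means a gene is judged only on cells that passed quality control; the opposite order is also used in practice and can give different results.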