Completed

  • Lung cancer classification from CT scan images using deep convolutional neural networks - 2017 Data Science Bowl Kaggle Competition
    Project report
    • Implemented U-Net for nodule segmentation in Python and TensorFlow, following the literature
    • Developed deep convolutional neural networks for lung cancer classification
    • Applied batch normalization and dropout layers, and tuned parameters to improve prediction accuracy
    • Augmented training data by random cropping and merging the training set with images from public databases of CT scans
    • Achieved performance comparable to the top 50 of 394 participating teams
  • A multivariate Gaussian Network for detection of multiple perturbations in gene regulatory networks
    Published journal article
    • Completed mathematical proofs of properties for a multivariate Gaussian Network model for detection of perturbations in gene regulatory networks
    • Created simulations to evaluate the model in networks with a hub gene
  • Reference-batch ComBat: a linear model with empirical Bayes shrinkage for batch correction
    Published journal article
    • Built R software based on empirical Bayes hierarchical linear regression for batch effect adjustment
  • Dataset heterogeneity in the validation of prediction models across studies
    Published journal article
    • Evaluated prediction models across studies on bootstrap-simulated independent genomic datasets
    • Used Cox and mixed-effects models to predict breast cancer patient survival
    • Developed the simulatorZ Bioconductor package
  • Predicting Circadian Gene Expression in Neurospora crassa - BU Bioinformatics Challenge Project
    • Manipulated and exported RNA-Seq datasets in SQL databases
    • Collaboratively designed a MATLAB pipeline, using discrete difference approximation and EM optimization, to predict regulatory effects in the circadian clock gene network

Ongoing

  • ComBat-Seq: batch correction algorithm for RNA-Seq count data
    Project repo
    • Developed a negative binomial generalized linear model (GLM) for batch correction in count data
    • Designed and implemented simulations to evaluate the performance of ComBat-Seq and compare it with other available methods
    • Submitted as a contributed paper to JSM 2019
  • Evaluating batch effect from a multi-study perspective
    Project repo
    • Implemented ensemble methods that integrate multiple statistical learners, including lasso logistic regression, random forests, neural networks, and SVMs, to predict tuberculosis progression