Projects
Completed
- Lung cancer classification from CT scan images using deep convolutional neural networks - 2017 Data Science Bowl Kaggle Competition
Project report
- Implemented U-Net in Python and TensorFlow, following the literature, for nodule segmentation
- Developed deep convolutional neural networks for lung cancer classification
- Applied batch normalization and dropout layers, and tuned parameters to improve prediction accuracy
- Augmented training data by random cropping and merging the training set with images from public databases of CT scans
- Achieved performance comparable to the top 50 of 394 participating teams
- A multivariate Gaussian Network for detection of multiple perturbations in gene regulatory networks
Published journal article
- Completed mathematical proofs of properties for a multivariate Gaussian Network model for detection of perturbations in gene regulatory networks
- Created simulations to evaluate the model in networks with a hub gene
- Reference-batch ComBat: a linear model with empirical Bayes shrinkage for batch correction
Published journal article
- Built an R software package based on empirical Bayes hierarchical linear regression for batch effect adjustment
- Dataset heterogeneity in the validation of prediction models across studies
Published journal article
- Evaluated prediction models across studies on bootstrap simulated independent genomic datasets
- Used the Cox model and mixed-effects models to predict breast cancer patient survival
- Developed the simulatorZ Bioconductor package
- Predicting Circadian Gene Expression in Neurospora crassa - BU Bioinformatics Challenge Project
- Manipulated and exported RNA-Seq datasets in SQL databases
- Collaboratively designed a MATLAB pipeline, using discrete difference approximation and EM optimization, to predict regulatory effects in the circadian clock gene network
Ongoing
- ComBat-Seq: batch correction algorithm for RNA-Seq count data
Project repo
- Developed a negative binomial generalized linear model (GLM) for batch correction in count data
- Designed and implemented simulations to evaluate the performance of ComBat-Seq and compare it with other available methods
- Submitted as a contributed paper to JSM 2019
- Evaluating batch effect from a multi-study perspective
Project repo
- Implemented ensemble methods to integrate multiple statistical learners, including lasso logistic regression, random forest, neural networks, and SVM, to predict tuberculosis progression