PROJECTS

Below are a few selected projects I have worked on.

PROCLIVITY PROPAGATION

Developed a predictor (Proclivity Propagation) to impute missing attribute values and detect outliers in friendship networks.
Proclivity Propagation is capable of capturing homophily, heterophily, self and cross proclivities in attributed networks.
The predictor is distinctive in that it reports confidence intervals on its prediction accuracies.
Used a state-of-the-art network correlation matrix called PROclivity index for attributed Networks (PRONE) to identify the attributes relevant for predicting missing attribute values.
Tests of Proclivity Propagation on the Facebook100 dataset give prediction accuracies as high as 85%, 78%, 75% and 69% for the attributes year, dormitory, status and gender, respectively.
This new method gives better prediction accuracies than standard machine learning and statistical techniques like Support Vector Machine classifiers and Low Rank Matrix Completion.

PREDICTION OF CELLS USING GENETIC DATA

Used gene, protein and DNA data from 33 different labs and research publications, comprising 17,000 cell expression samples.
Designed a multi-stage machine learning pipeline based on principal component analysis to accurately predict cell types.
Reduced the number of features from ~80,000 to ~350 through the use of multi-stage dimensionality reduction techniques.
Removed outliers in data using interquartile range-based selection. Scaled features using standardization.
Achieved 94% accuracy in determining cell types.
Designed a similarity metric to determine correspondence between single-cell RNA expression profiles.
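
The outlier-removal and scaling steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names, the row-wise filtering rule (drop a sample if any feature falls outside its fences) and the conventional 1.5 x IQR multiplier are all assumptions.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR for one feature."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outlier_rows(rows, k=1.5):
    """Keep only the rows whose every feature lies inside that feature's fences.

    The fences are computed once, from the unfiltered data (an assumption).
    """
    columns = list(zip(*rows))
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        row for row in rows
        if all(lo <= x <= hi for x, (lo, hi) in zip(row, bounds))
    ]


def standardize(rows):
    """Rescale each feature to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead of 0.0.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical toy data: five samples, two features, one obvious outlier.
samples = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0], [4.0, 13.0], [100.0, 14.0]]
cleaned = remove_outlier_rows(samples)
scaled = standardize(cleaned)
```

Filtering before scaling keeps extreme values from distorting the mean and standard deviation used for standardization.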

WEAK LENSING MEASUREMENTS

Designed software for weak lensing measurements from the COSMOS2015 dataset and other cosmological surveys. The software combines C and Python code, with Cython as the bridge between them, to generate convergence and lensing maps from the COSMOS2015 galaxy catalog. The code is highly optimized and analyzes half a million galaxies in about one minute.

FACIAL ACTION UNIT DETECTION

Automatic recognition of facial expressions and emotions using 46 unique action units around the eyes and lips.
Built a pipeline implementing a hierarchical Support Vector Machine classifier to detect and classify facial expressions in videos of human faces.
Achieved accuracies of 95%, 95% and 75% in detecting the facial expressions open mouth, neutral face and smile, respectively (YouTube video).

IMMIGRATION DATA TREND

In this project, I created a data engineering pipeline to analyze trends in past years' H-1B (H-1B, H-1B1, E-3) visa application data. The data are available from the US Department of Labor's Office of Foreign Labor Certification Performance Data. The pipeline writes output files with two metrics: the top 10 occupations and the top 10 states for certified visa applications.
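
The core aggregation behind the two metrics can be sketched as below. This is an illustrative outline rather than the pipeline itself: the column names (`CASE_STATUS`, `SOC_NAME`, `WORKSITE_STATE`), the semicolon delimiter, the tie-breaking rule and the sample rows are all assumptions, and real column names should be checked against the header of each yearly file.

```python
import csv
import io
from collections import Counter


def top_certified(lines, key_column, status_column="CASE_STATUS",
                  n=10, delimiter=";"):
    """Count certified applications per value of key_column; return the top n.

    `lines` is any iterable of text lines, such as an open file.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    counts = Counter(
        row[key_column].strip()
        for row in reader
        if row[status_column].strip().upper() == "CERTIFIED"
    )
    # Sort by count (descending), then by name so that ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical sample standing in for one yearly disclosure file.
SAMPLE = (
    "CASE_STATUS;SOC_NAME;WORKSITE_STATE\n"
    "CERTIFIED;SOFTWARE DEVELOPERS;CA\n"
    "CERTIFIED;SOFTWARE DEVELOPERS;WA\n"
    "DENIED;ACCOUNTANTS;NY\n"
    "CERTIFIED;ACCOUNTANTS;CA\n"
    "WITHDRAWN;STATISTICIANS;TX\n"
)

top_occupations = top_certified(io.StringIO(SAMPLE), "SOC_NAME")
top_states = top_certified(io.StringIO(SAMPLE), "WORKSITE_STATE")
```

Reading rows one at a time through `csv.DictReader` keeps memory use proportional to the number of distinct occupations or states rather than to the file size.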