Machine Learning Projects

Below is a compilation of the most recent machine learning projects I have worked on. They vary in scope and cover healthcare analytics, risk factor analysis, artificial intelligence, deep learning, and reinforcement learning. Following each project, I have added a link to the corresponding project report or paper.

Discovery of Disease Risk Factors on Individuals Assessed by the National Health and Nutrition Examination Surveys 2013-2014

In this project, we train a gradient boosting classifier on publicly available data to identify risk predictors for diabetes and cardiovascular disease among individuals residing in the United States. Our results suggest that researchers and clinicians could use machine learning analyses to obtain valuable health assessments of patients at much lower cost and in much less time.
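
As a rough illustration of how a gradient boosting classifier can surface candidate risk predictors, the sketch below fits scikit-learn's GradientBoostingClassifier on synthetic data and ranks the features by importance. The column names and the data are hypothetical stand-ins, not the survey variables or the code used in the project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for survey data. The column names are hypothetical,
# not the actual survey variables used in the project.
rng = np.random.default_rng(0)
feature_names = ["age", "bmi", "systolic_bp", "noise"]
X = rng.normal(size=(600, 4))
# The binary label depends on the first three columns only.
signal = X[:, 0] + 0.8 * X[:, 1] + 0.5 * X[:, 2]
y = (signal + rng.normal(scale=0.5, size=600) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Rank features by impurity-based importance (the scores sum to 1).
ranked = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
print("test accuracy:", clf.score(X_test, y_test))
```

Impurity-based importances are only one way to rank predictors; permutation importance on held-out data is a common alternative when features are correlated.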

Predicting House Prices: A Supervised Learning Approach

In this project, I developed a decision tree regressor, tuned with grid search, that is trained and tested on data collected from homes in the suburbs of Boston. We use the fitted model and its feature importances to estimate the price of homes with varying characteristics. The model has practical application in the real estate industry of major urban areas, in particular for an agent who could make use of this information on a daily basis.
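
A minimal sketch of grid-search tuning for a decision tree regressor is shown below. It assumes scikit-learn and uses synthetic data; the parameter grid, the scoring metric, and the number of folds are illustrative choices, not the settings used in the project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic housing-like data: three numeric features and a noisy price.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 3))
y = 20 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=1.0, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Try tree depths 1..10 and keep the depth with the best
# cross-validated R^2 on the training split.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": list(range(1, 11))},
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

best_depth = search.best_params_["max_depth"]
test_r2 = search.score(X_test, y_test)  # R^2 of the refit best model
print("best max_depth:", best_depth)
print("test R^2:", round(test_r2, 3))
print("feature importances:", search.best_estimator_.feature_importances_)
```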

Convolutional Neural Networks for Dog Breed Identification

In this project, I take a series of steps toward developing an image classifier for dog breed identification. To this end, we design and train a convolutional neural network (CNN) classifier using Keras with a TensorFlow backend and GPU support. In addition, we use transfer learning to benchmark CNN architectures of different depths, obtaining an image classifier with 83% accuracy. The classifier can also distinguish dogs from humans and other related animals.

Income Prediction via Multimodel Machine Learning

In this project, I employ several ML algorithms (Gradient Boosting, Logistic Regression, K-Nearest Neighbors, Support Vector Machines, etc.) to accurately predict the income of a person using Census data. To evaluate the performance of each model, we create a training and predicting pipeline that allows us to quickly and effectively train models on training sets of various sizes and make predictions on the testing data. This sort of task can arise in a non-profit setting, where organizations survive on donations.
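
One possible shape for such a training and predicting pipeline is sketched below with scikit-learn. The helper name train_predict, the training-set fractions, and the synthetic binary label (standing in for an income bracket) are assumptions made for illustration, not details taken from the project.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def train_predict(model, n_samples, X_train, y_train, X_test, y_test):
    """Fit on the first n_samples training rows and score on the test set."""
    start = time.perf_counter()
    model.fit(X_train[:n_samples], y_train[:n_samples])
    train_time = time.perf_counter() - start
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return {
        "model": type(model).__name__,
        "n_train": n_samples,
        "train_time_s": train_time,
        "test_accuracy": accuracy,
    }


# Synthetic binary target standing in for an income bracket (an assumption).
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = [
    LogisticRegression(max_iter=1000),
    KNeighborsClassifier(),
    GradientBoostingClassifier(random_state=0),
]
results = []
for model in models:
    for fraction in (0.05, 0.25, 1.0):
        n = int(len(X_train) * fraction)
        results.append(train_predict(model, n, X_train, y_train, X_test, y_test))

for row in results:
    print(row)
```

Collecting one record per model and training size makes it easy to compare how accuracy and training time scale as more data becomes available.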

Training an Autonomous Vehicle using Reinforcement Learning

I implement an optimized Q-Learning model to train an agent to drive a Smartcab through a city grid environment, starting and ending at arbitrary locations within the grid. The Smartcab agent simulates transporting passengers from one location to another and is evaluated on two key metrics: Safety and Reliability.
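
The core of tabular Q-learning can be sketched in a few lines of plain Python. The grid below is a deliberately simplified stand-in for the Smartcab environment: there is no traffic or traffic lights, and the grid size, rewards, learning rate, discount factor, and exploration schedule are all assumed values chosen for illustration.

```python
import random

ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # up, down, right, left
SIZE = 4          # hypothetical 4x4 grid
GOAL = (3, 3)     # hypothetical destination


def step(state, action):
    """Move within the grid; reward 10 at the goal, -1 for every other move."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (x, y)
    done = nxt == GOAL
    return nxt, (10.0 if done else -1.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(x, y): [0.0] * len(ACTIONS) for x in range(SIZE) for y in range(SIZE)}
    for episode in range(episodes):
        # Linearly decaying exploration rate with a floor of 0.05.
        epsilon = max(0.05, 1.0 - episode / episodes)
        state = (rng.randrange(SIZE), rng.randrange(SIZE))
        done = state == GOAL
        steps = 0
        while not done and steps < 50:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            steps += 1
    return q


def greedy_path(q, start, max_steps=20):
    """Follow the highest-valued action from start until the goal or a step cap."""
    path = [start]
    state = start
    while state != GOAL and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path


q_table = train()
path = greedy_path(q_table, (0, 0))
print(path)
```

Random start states per episode mirror the idea of arbitrary start and end locations, although here only the start varies.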

Customer Segmentation Analysis via Unsupervised Learning

I analyze monetary data on customer spending across diverse product categories. One goal of this project is to describe the variation among the different types of customers that a wholesale distributor interacts with. Our results equip the distributor with insight into how best to structure their delivery service to meet the needs of each customer. To perform the segmentation, we apply a machine learning pipeline involving principal component analysis (PCA), K-means, and Gaussian mixture models. The cluster analysis reveals clusters whose data points can be classified as purely 'Retailers', 'Hotels', and 'Restaurants/Cafes', with some degree of overlap between the 'Hotel' and 'Restaurant' clusters.
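
A segmentation pipeline of this kind might look like the sketch below, which assumes scikit-learn and synthetic spending data. The log transform, the two principal components, and the choice of three clusters are illustrative assumptions, not the project's actual settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Synthetic annual spending for 300 customers across 6 product categories,
# drawn around three hypothetical customer profiles.
rng = np.random.default_rng(0)
centers = rng.uniform(1000, 20000, size=(3, 6))
spending = np.vstack(
    [rng.lognormal(mean=np.log(c), sigma=0.3, size=(100, 6)) for c in centers]
)

# Spending data is typically right-skewed, so work on a log scale (assumption).
log_spending = np.log(spending)

# Project onto the two directions of largest variance.
pca = PCA(n_components=2, random_state=0)
reduced = pca.fit_transform(log_spending)

# Hard assignments from K-means.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(reduced)

# Soft assignments from a Gaussian mixture: each row gives the probability
# that a customer belongs to each cluster, which exposes overlap.
gmm = GaussianMixture(n_components=3, random_state=0).fit(reduced)
gmm_probs = gmm.predict_proba(reduced)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("K-means cluster sizes:", np.bincount(kmeans_labels))
print("least certain GMM assignment:", gmm_probs.max(axis=1).min())
```

Customers whose highest mixture probability is well below 1 sit between clusters, which is one way to quantify the kind of overlap described above.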