Predicting the onset of Diabetes Mellitus using supervised machine learning classification:

Executive Summary

Diabetes mellitus (DM) is a chronic metabolic disease that results either from the pancreas's inability to produce insulin or from the body's cells failing to respond to the insulin that is produced (World Health Organisation, 2016). Early diagnosis of DM is a key concern for medical professionals, nations and societies around the world, because it helps them cope better with the later complications associated with the disease. The research problem at the centre of this study is therefore: “The prolific upward trend of diabetes mellitus in the US, and how supervised machine learning classification can be used to aid prediction of future occurrences with the PIMA Indians dataset”

Below are some images from the report that illustrate findings from my research on the PIMA Indians dataset.

Correlation matrix of the diabetes outcome against lifestyle variables:


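The correlation matrix summarises how strongly each variable moves with the diabetes outcome. The sketch below shows the underlying calculation in plain Python. It assumes the data is loaded as a list of dictionaries and that the outcome column is called `Outcome`. That is the name used in common CSV copies of the PIMA Indians dataset, but treat it as an assumption here. The rows are toy values, not records from the dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and both variances cancel, so they are omitted.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlations_with_outcome(rows, outcome="Outcome"):
    """Correlate every non-outcome column with the (binary) outcome column."""
    features = [name for name in rows[0] if name != outcome]
    y = [row[outcome] for row in rows]
    return {name: pearson([row[name] for row in rows], y) for name in features}


# Toy rows for illustration only (hypothetical values, not PIMA records).
toy_rows = [
    {"Glucose": 1, "BMI": 30.0, "Outcome": 0},
    {"Glucose": 2, "BMI": 25.0, "Outcome": 0},
    {"Glucose": 3, "BMI": 31.0, "Outcome": 1},
    {"Glucose": 4, "BMI": 28.0, "Outcome": 1},
]
print(correlations_with_outcome(toy_rows))
```

Because the outcome is binary, each of these Pearson values is a point-biserial correlation. A constant column would make `var_x` zero and raise `ZeroDivisionError`, so such columns would need to be dropped first.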
Confusion matrix of the best-performing Support Vector Classification model:


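A confusion matrix counts how a classifier's predictions line up with the true labels. The sketch below is a plain-Python illustration, not the code behind the figure. It returns the counts in the `(tn, fp, fn, tp)` order that scikit-learn's `confusion_matrix(...).ravel()` produces for labels 0 and 1. The labels are hypothetical toy values.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for a binary classification."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive and predicted == positive:
            tp += 1  # diabetic, predicted diabetic
        elif actual == positive:
            fn += 1  # diabetic, missed by the model
        elif predicted == positive:
            fp += 1  # non-diabetic, flagged as diabetic
        else:
            tn += 1  # non-diabetic, correctly cleared
    return tn, fp, fn, tp


def summarise(tn, fp, fn, tp):
    """Accuracy, precision and recall derived from the four counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels (hypothetical): 1 = diabetes onset, 0 = no onset.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # (2, 1, 1, 2)
print(summarise(*counts))
```

For a screening task like this one, recall is often worth reading alongside accuracy. Recall is the share of actual diabetes cases the model catches, and each false negative is a missed case.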

Visual Analytics – Individual Assignment – 4 Data Analytics visualisations:

Panama Papers – Investigating the scale of offshore funds/entities from countries around the world:

Top 150 composers on Wikipedia:

A force-directed radial layout showing the 150 most influential composers in Wikipedia data, ranked by centrality. The colours represent modularity classes: composers of the same colour belong to the same community, and composers of different colours sit in separate communities (e.g. Stravinsky and Bach fall into different communities).

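The figure relies on two ideas, centrality and modularity, and both can be illustrated on a small graph. The description does not say which centrality measure was used. The sketch below therefore assumes degree centrality, which is a node's degree divided by n - 1. It also computes Newman's modularity Q for a given split into communities, using Q = Σ_c [L_c / m - (d_c / 2m)²], where m is the number of edges, L_c the edges inside community c, and d_c the total degree of c's nodes. The six-node graph is hypothetical and stands in for the composer network.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by (n - 1), for a simple undirected graph."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


def modularity(edges, community_of):
    """Newman modularity Q = sum over communities c of [L_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: number of edges with both ends in c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical graph: two triangles joined by a single bridge edge C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
two_communities = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

print(degree_centrality(edges)["C"])       # 0.6
print(modularity(edges, two_communities))  # 5/14, about 0.357
```

A higher Q means more edges fall inside communities than a random graph with the same degrees would predict. Community-detection methods such as the Louvain algorithm search for a partition with high Q. Putting every node in one community gives Q = 0.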

Graph class – planar graph:


Visual Analytics – VAST Mini Challenge 2 – Using Visual Analytics to analyse chemical releases from factories

In this project I led a group, co-authored the initial and final reports, and co-authored the presentation that we later delivered to our uni cohort.

As a group we received 95% for the initial report, 100% for the presentation (which Cameron Wasilewsky and I delivered) and 90% for the final report.

All of the work shown below was a group effort.

Problem definition

In March 2017, the Institute of Electrical and Electronics Engineers (‘IEEE’) Visual Analytics Science and Technology (‘VAST’) conference announced that year's edition of its annual challenge, which invites the Visual Analytics community to design interactive systems that help solve hypothetical environmental problems.
This project, code-named ‘Gaia’, focused on one such fictional environmental problem, ‘Mini Challenge 2’ (‘MC2’).
The project required the design and development of a Visual Analytics (VA) system that would enable a new user to visually manipulate the data and gain the insights needed to answer the challenge questions.

Example visualisations developed for the project:

Example Tableau dashboards from Gaia, developed by Cameron Wasilewsky:


GAIA’s fully interactive D3 site, developed by Kris Lopez



GAIA Animation Sample Video developed in Unity:

This simulation environment was developed from scratch by Matt Burgess. It shows the factories' emissions, the strength of the readings and how the chemicals are advected by the wind over time:

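Advection here means the chemical plume being carried along by the wind. The sketch below is not how the Unity simulation was built. It is a minimal plain-Python illustration that moves plume particles downwind one time step at a time and ignores diffusion. It assumes each wind reading is a speed plus a meteorological direction, meaning the bearing the wind blows from, in degrees clockwise from north. The particle positions and wind values are hypothetical.

```python
import math


def wind_components(speed, direction_from_deg):
    """Convert speed and 'blowing from' direction into (east, north) velocity.

    The air moves toward the opposite bearing, hence the minus signs.
    """
    theta = math.radians(direction_from_deg)
    return -speed * math.sin(theta), -speed * math.cos(theta)


def advect(particles, speed, direction_from_deg, dt):
    """Shift every (x, y) particle by wind velocity * dt (pure advection, no diffusion)."""
    u, v = wind_components(speed, direction_from_deg)
    return [(x + u * dt, y + v * dt) for x, y in particles]


# Hypothetical plume: three particles released near a factory at the origin.
plume = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
for _ in range(3):
    # A westerly wind (blowing from 270 degrees) at 2 units/hour pushes the plume east.
    plume = advect(plume, speed=2.0, direction_from_deg=270.0, dt=1.0)
print(plume[0])  # roughly (6.0, 0.0)
```

Adding a small random displacement at each step would turn this into a simple particle-based dispersion model. That would let the plume spread as well as drift.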
Social Good:

Interactive dashboard for ReachOut