All reports are stored privately. Please email email@example.com for a copy of the work or ask to become a collaborator on Dan’s repo here: Master of Data Science – Papers & Reports
Deep Learning vs. Linear Classifiers on the
This report explores the use of three image classification models on Zalando Research’s Fashion-MNIST dataset, a benchmarking dataset for machine learning models. A Random Forest Classifier, a Support Vector Machine and a Convolutional Neural Network (CNN) were built and compared, and their results discussed. Of the models constructed by the team, the CNN models were the most accurate, with a 2-hidden-layer and a 3-hidden-layer CNN achieving 90.05% and 91.17% accuracy on the test set respectively. In addition to comparing the accuracy of their models, the team also set three additional goals: 1) comparing model performance against the literature, 2) keeping run-time under 10 minutes and 3) exploring the effect of graphics processing units (GPUs) in comparison to central processing units (CPUs) on training time. The team achieved all of these goals, and their CNN models outperformed some of the top-performing deep-learning classifiers in the literature.
Building a Multi-Class Logistic Regression Classifier from Scratch:
This report on image classification explores the use of a Logistic Regression One-Versus-All image classifier to classify a training and test set of 30,000 and 5,000 clothing images respectively. The classification algorithm was compared against various benchmarks, including algorithms from sklearn libraries and the performance of algorithms on a similar dataset: Zalando Research’s Fashion-MNIST image dataset. The scratch-coded logistic regression model was built without the use of external libraries; it outperformed the sklearn models and achieved an accuracy within 5% of the top-performing non-deep-learning models on the Fashion-MNIST dataset.
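To illustrate the One-Versus-All idea, here is a minimal sketch using only the Python standard library. It is not the code from the report: the toy 2-D data, the learning rate, the epoch count and all function names (`train_binary`, `train_one_vs_all`, `predict`) are assumptions made for illustration. One binary logistic regression is trained per class, and a sample is assigned to the class whose classifier gives the highest score.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(X, y, lr=1.0, epochs=2000):
    """Fit one binary logistic regression with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(y=1|x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_one_vs_all(X, labels):
    """Train one 'this class vs. the rest' classifier per class."""
    return {
        c: train_binary(X, [1.0 if label == c else 0.0 for label in labels])
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose binary classifier scores x highest."""
    def score(c):
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Hypothetical toy data: three well-separated clusters in 2-D.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
     [1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
     [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_all(X, labels)
predictions = [predict(models, x) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

For image data, each row of `X` would be a flattened pixel vector and there would be one classifier per clothing category. Plain Python loops like these would be slow at that scale, so this sketch is only meant to show the structure of the method.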
Predicting the onset of Diabetes Mellitus using supervised machine learning classification:
Diabetes mellitus (DM) is a chronic disease that affects the metabolic system of the body and results from the pancreas’ inability to produce insulin or from the body’s cells failing to respond to the insulin produced (World Health Organisation, 2016). Early diagnosis of DM is one of the key considerations for medical professionals, nations and societies around the world, as it helps them better cope with the later complications and problems associated with the disease. The research problem that this study centers around is therefore: “The prolific upward trend of diabetes mellitus in the US, and how supervised machine learning classification can be used to aid prediction of future occurrences with the PIMA Indians dataset”
Below are some images from the report to show some of the findings of my research on the PIMA Indians dataset.
Correlation matrix of outcome of Diabetes with varying lifestyle variables:
Confusion matrix of the most performant Support Vector Classification model:
Visual Analytics – Individual Assignment – 4 Data Analytics visualisations:
Panama Papers – Investigating the scale of offshore funds/entities from countries around the world:
Top 150 composers on Wikipedia:
This visualisation uses a force-directed radial layout to show the top 150 most influential composers from Wikipedia data, based on the concept of centrality. The colours represent the modularity class of each composer: some composers sit in a community with other composers while others do not (e.g. Stravinsky and Bach are in separate communities).
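As a rough illustration of the two graph measures mentioned above, the sketch below computes degree centrality and the modularity of a given community assignment using only the Python standard library. The six-node graph, the node names and the community labels are hypothetical and are not taken from the Wikipedia data. Degree centrality is also an assumption, since the description does not say which centrality measure was used.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}


def modularity(edges, community):
    """Modularity Q = sum over communities of L_c / m - (D_c / (2m))^2.

    L_c is the number of edges inside community c, D_c is the total
    degree of its nodes and m is the number of edges in the graph.
    """
    m = len(edges)
    internal = defaultdict(int)
    total_degree = defaultdict(int)
    for u, v in edges:
        total_degree[community[u]] += 1
        total_degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical undirected graph: two triangles joined by one bridge edge.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("D", "E"), ("D", "F"), ("E", "F"),
         ("C", "D")]
community = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

centrality = degree_centrality(edges)
q = modularity(edges, community)
print(centrality)
print(f"modularity: {q:.4f}")
```

In this toy graph the two bridge nodes have the highest degree centrality, and splitting the graph into its two triangles gives a positive modularity, which is the kind of community structure the colours in the visualisation encode.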
Graph class planar graph:
Visual Analytics – VAST Mini Challenge 2 – Using Visual Analytics to analyse chemical release of factories
In this project I led the group, co-authored the initial and final reports, and co-authored the presentation that was later delivered to our university cohort.
As a group we received 95% for the initial report, 100% for the presentation (which Cameron Wasilewsky and I presented) and 90% for the final report.
All of the work shown below is part of a group effort.
In March 2017, the Institute of Electrical and Electronics Engineers (‘IEEE’) for Visual Analytics Science and Technology (‘VAST’) announced an annual competition for the Visual Analytics community to design interactive systems to help solve conceptual environmental problems.
A fictional environmental problem, called ‘Mini Challenge 2’ (‘MC2’), was the focus of this project, code named ‘Gaia’.
The project requires the design and development of a Visual Analytics (VA) system. The system is designed to enable a new user to visually manipulate data and gain the insights needed to solve the problem questions.
Example visualisations developed for the project:
Some example Tableau dashboards from Gaia developed by Cameron Wasilewsky
GAIA’s fully interactive D3 site, developed by Kris Lopez
GAIA Animation Sample Video developed in Unity:
This simulation environment was developed from scratch by Matt Burgess. It shows the emissions of the factories, the strength of the readings and the way that the chemicals are advected by the wind over time: