Blog

Why sometimes it’s good to move on professionally, and what it can do for you (if the move is truly authentic)…

Hello, good afternoon, good morning and g’day! It’s been some time since I’ve had the time or energy to write a blog, what with switching jobs, purchasing and moving into a new house, finishing another module in my Master of Data Science, and all of the other things life can throw at you!

This blog post is another ‘non-Data Science’ post, but one that I hope you’ll find equally interesting and informative to read. I know that I am not the only professional to have had doubts or questions about the direction their career was moving in at one point or another, and I’m hoping that sharing my experiences will help others make the right decision for the betterment of their careers, now and in the future!

The decision to move on and leave a Big 4


Midway through last year I was fortunate enough to be offered a promotion within PwC to Senior Manager/Associate Director within the Consulting line of service.

Whilst I had already been working closely with one of the teams in Consulting, it was a fantastic opportunity for my career to progress further from a management perspective. I was also able to gain access to some very high-profile – and high-pressure – projects in the firm that would in turn offer a multitude of opportunities to develop a variety of skills, my professional profile and my professional network. The people I worked with were simply some of the most interesting, talented and passionate people I encountered throughout my time at PwC, and I am forever thankful to the Health Analytics and Actuarial department in PwC (you know who you are if you’re reading this 😉 )

With all this being said, I found over the course of a couple of months that, after the shine of the promotion had worn off, the work I was carrying out was not quite in line with what I actually wanted to be doing. Over the previous 2-3 years I had been pivoting my professional day job towards more advanced analytics and data science, and as a Senior Manager I wasn’t in a position to carry out the technical elements quite as much. I imagine many of you reading this will think ‘Well, no s**t! You were a senior manager, not a technical gun consultant!’ That is a fair observation, but not exactly the point of the article. The much more important lesson I have learned over the years is that:

You and you alone are best qualified to tell yourself what you will enjoy or gain the most fulfilment or satisfaction from doing; no one else can tell you this. If you don’t know what that is, you need to find it out as soon as you can. Listen to others, but trust yourself first.

After realising that the work I was carrying out wasn’t quite technical enough for me, I had a decision to make: should I stay, lean in and change myself, or should I pivot out and away to something else? I chose the latter…

Entering no man’s land and joining a Post-IPO startup

It was with an extremely heavy heart that I handed in my resignation to the partner I reported to at PwC; however, I knew it was the right choice. This was even considering that:

  1. I had no job to move into when I handed my resignation in
  2. I had literally just purchased a house, with the first mortgage payment due imminently
  3. I was getting married the following year, with all the prep and financial saving that entails!

Everyone I spoke to said it was a ‘brave’ or ‘ballsy’ decision; however, to me it was the only right decision to make. I also think that you, before anyone else, have to have confidence in yourself, your abilities and your potential. If you don’t, you’ll crumble at the first sign of challenge or resistance from others.

So, with my resignation handed in, I started the merry dance of speaking with recruiters, friends, ex-colleagues and direct company contacts, and if I have some things to pass on they are that:

  1. Networking is unbelievably important; if you don’t already have a network in the city you work in, build one! Probably 50-60% of my interviews came through friends or previous colleagues/acquaintances.
  2. Sometimes HR or recruiters are less than ‘optimal’ when it comes to communication. Don’t take it personally; some people are better than others.
  3. Applying for jobs is hard work; don’t think of it as anything less than that…

For the first time in my life, prior to accepting an offer for a new job, I had no role to move into and no guarantee of an income stream for the future. This frankly scared the bejeezus out of me for a week or two whilst I was starting the job application process. It is something you should be aware of if you’re looking at leaving a job without a new role lined up… I certainly can’t and won’t speak for others, but it affected what I saw as my ‘professional identity’. I felt quite lost without one, which at first felt weird.


Fast forward 3-4 weeks, and after a number of conversations, enough coffee to drown a hippo, and interviews and ‘chats’ with a variety of companies, I was fortunate enough to be offered a role as a ‘Senior Business and Technical Consultant’ at Domo. Again, I trusted my gut instinct and leaned on some industry experience before I gladly accepted the offer. As of writing this post I’ve been with Domo for around 3 months, and whilst the company is hugely different to a Big 4, it’s an absolutely fantastic place to work where I’m able to flex technical and business muscles as required to help our clients in a consulting capacity!

Making the right choice for you

Throughout my career I’ve always tried to make the right choice for myself first. This may be seen as selfish, but as I’ve always seen it, you need to take care of yourself before you can take care of others (including your family and loved ones!), and if you can’t do this you don’t have the right foundation for everything else that life throws at you.

It’s often the toughest choice to make a change from a position of professional comfort. BUT, if you are feeling twinges of doubt or uncertainty, or your gut is telling you that things aren’t quite right: LISTEN to it. It might be the best choice you ever make (as long as it’s genuinely you making that choice, and not the persuasion or opinion of others).


Visualising and exploiting Network Structures with Python and Networkx

How would you visualise a complex social system, or the evolution of the influenza virus? Possibly with a histogram, or a very large number of pie charts? Unlikely, at least not with any great elegance or efficiency…

Network theory and the analysis of network data structures allow us to better understand measures of influence, popularity and centrality in a network powered by mature statistical approaches.
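To make that concrete, here’s a minimal sketch using networkx (the people and friendships are made up purely for illustration) that computes a couple of those centrality measures on a toy social graph:

# A toy 'social system': nodes are people, edges are friendships
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
    ("Carol", "Dave"), ("Dave", "Erin"),
])

# Degree centrality: normalised count of direct connections (popularity)
print(nx.degree_centrality(G))

# Betweenness centrality: how often a node sits on shortest paths (influence)
print(nx.betweenness_centrality(G))

Carol scores highest on betweenness here, as she bridges the two halves of the graph – exactly the kind of structural insight a histogram can’t show you.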

Sharing for the win

This morning I was fortunate enough to be invited to talk at the Data Science Breakfast Sydney Meetup: ‘Dan Bridgman on Visualising and Exploiting Network Structures with Python’ (thanks to Geoff Pidcock!).

It’s always great to network and talk with others, and I’ve attached my slide deck from the talk to this blog post. I think the session was also recorded, so as soon as that is available I’ll try and upload it here too!

Have a read and comment on what you think – I’m always keen to hear what others think!

Visualising and Exploiting Network Structures with Python and Networkx

Applying Data Science to Social Good causes

Before I begin, I need to liberally link the two social good groups that are closest to my heart. I can only comment on the Sydney community of meetups; however, the two that stand out for me are:

  • The Minerva Collective – A fantastic bunch of people from a variety of professions in Sydney who meet to discuss, brainstorm and tackle social problems at large by applying Data Science in Australia. Typically The Minerva Collective partners with organisations who share data on the Minerva Collective – Data Republic platform to achieve a number of objectives. Some great work happening here.
  • Data for Democracy – Sydney – An amazing, talented and open-minded group of data scientists, analysts, engineers & data nerds who all relate to: D4D – Origin Story. If you have an evening spare you should certainly go along!
  • Both Minerva & D4D – Sydney work because they have people like you willing to spend time and energy on the problems the community faces. Give them a try!

Gratuitous plug over

For more years than I can count I’ve enjoyed the satisfaction of helping others, solving problems and coming up with new ideas, whether to improve a technical process or to rethink how business processes work.

It’s only in recent years that I’ve actually found like-minded individuals and groups in the Sydney scene for applying Data Science/Analytics to social good causes.

It’s not all totally selfless altruism and that’s OK

Now, I’d be lying if I said that the work I carry out for not-for-profits/NGOs or charities is driven purely by full-blooded altruism:

For me, it’s a combination of altruism and an inherent feel-good factor from trying to help others who are in need.

And that’s OK, as far as I see it. I’m still relatively new to applying my time and passion to solving these kinds of problems, and what I now recognise is that you have to find a balance that keeps you happy.

As far as I see it, if you want to give back to society with your skills (whether in Data Science, Accounting or anything else), it takes time. But the time is unequivocally worth it, regardless of the effort.

Success is all about persistence and doing the right thing for the long term

So, yep, it’s a rather clichéd title, and I use it because working in your own time to apply your skills for the good of others is TOUGH and unrelenting – and generally not because of technical difficulties. In my experience the difficulty has been a lack of maturity in the use of data/technology, a lack of time/resources, or a lack of substantive expertise in the area you’re focusing on. By the time you actually get to focus on the classifier, visualisation or clustering algorithm, you’ve hopefully got a lot of the hard work done for you. Hopefully.

So, if it’s hard and you’re not doing it purely for altruistic reasons, why should you devote your skills and time to helping others?

Penny drop


As mentioned earlier, I’m relatively new to the journey of attempting to help others by applying my skills in Data Science & Analytics; however, I can categorically recall the feeling I came away with from my first meetup with The Minerva Collective.

I left the meetup buzzing with excitement and passion for what was possible when like-minded, talented and curious people come together in a group setting to discuss approaches to a business problem, or problems that a particular NFP/NGO or organisation may face. It was like the penny dropped in front of my eyes, and even better than this is seeing the penny drop in others’ eyes… here we were talking about problems like childhood obesity, mental health and domestic violence in a setting that simply set the neurons firing and got prototypes and hypotheses going.

Needless to say, this doesn’t even include the new contacts and Data Science knowledge you’ll certainly pick up by attending these events. Whether you’re mentoring someone or listening to a guru, you’ll learn, and by a process of osmosis your skills will improve.

As always: 01110100 01101000 01100001 01101110 01101011 01110011 00100000 01100110 01101111 01110010 00100000 01110010 01100101 01100001 01100100 01101001 01101110 01100111

 

Markov chain models are so { random | hilarious | odd }

There’s something about Markov models that, to me, is cool yet very weird. They were one of the first examples of probability theory I stumbled on, around 2013/2014, before I even knew what probability theory or a stochastic model meant or actually did. It’s only when I dived a bit further into their possible applications that I found them to be so much fun.

Yes, that’s right: Markov chain models are a guilty pleasure of mine, and as shown later in this post I sometimes tinker with them to create Frankenstein-esque applications in Python. Maybe it’s their – albeit very limited – capability to generate text and predict the future that keeps me entertained; who knows.

This blog post covers the basics of the Markov chain process. I’ll readily state upfront that I certainly will not be able to cover over one hundred years of its history and application in this one post. Instead, my intention is to explain what Markov chains are, how they can be used, and how you might be able to have some fun with them 🙂

So, what the Markov?

A Markov chain is “a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.”[1]

Markov Chains, Wikipedia

The Markov process itself was named after the Russian mathematician Andrey Markov, who published his first paper on the topic in 1906.

What I find particularly interesting about Markov chains is that they essentially do not hold any kind of ‘attention’ or memory: the next step is based only on the state attained in the previous one.

As an example, consider the below table:

[Image: weather state transition table – Markov Chains Explained, Tech Effigy Tutorials]

If it is cloudy, there is a 10% probability of the next state being cloudy, a 50% probability of rain and a 40% probability of sun. It is therefore most likely that the next day will be rainy, and we move to the rainy state. The day after that is also likely to be rainy, based on a 60% probability of rain following rain.

Because this is a stochastic model, and because of how probability theory works, there is no way to say that this is what will happen. It is just highly likely, based on this table, that rainy days will follow cloudy days, and that rainy days are in turn more likely to follow rainy days. One of the key things to remember here is that the prediction of a day’s state depends only on the previous day; there are no other dependencies and no other form of memory/attention.
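To make this concrete, here’s a minimal sketch in Python. The cloudy row (10%/50%/40%) and the 60% rain-to-rain figure come from the example above; the remaining probabilities are assumptions purely for illustration:

import random

# Transition probabilities per state: (next states, their probabilities).
# Cloudy row and rain-to-rain are from the example; the rest are made up.
transitions = {
    "cloudy": (["cloudy", "rain", "sunny"], [0.1, 0.5, 0.4]),
    "rain":   (["cloudy", "rain", "sunny"], [0.2, 0.6, 0.2]),
    "sunny":  (["cloudy", "rain", "sunny"], [0.3, 0.2, 0.5]),
}

def simulate(start, days):
    state, path = start, [start]
    for _ in range(days):
        states, probs = transitions[state]
        # The next state depends ONLY on the current state: no memory
        state = random.choices(states, weights=probs)[0]
        path.append(state)
    return path

print(simulate("cloudy", 7))  # e.g. ['cloudy', 'rain', 'rain', ...]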

A good way to represent this table of data is as a state diagram or a transition matrix, which Joseph Armstrong has explained so well on the Tech Effigy pages:

[Image: state diagram and transition matrix – Markov Chains Explained, Tech Effigy Tutorials]

Applications of a Markov Chain Model

The applications of Markov chain models are varied, and there are also a number of derivatives of them. For example, did you know that Google’s PageRank algorithm uses Markov chains to model what a random surfer of the web will do? Other examples of applications include:

  • Chemical reactions and engineering – Out of scope for this post, but you only have to Google this to find a great number of papers on the subject
  • Text prediction – Yes, your iPhone or Android (other phones are available) most likely uses Markov-chain-style models in its text prediction, based on what you have typed before.
  • Financial applications – Used for credit risk and predicting market trends
  • Text generation – Lots of applications here; some include:
    • Subreddit simulation – An automated process that uses Markov chains to generate random submissions, titles and comments.
    • Quote generation – You’ll see how this can be achieved below
  • Google PageRank

Admittedly, some of the above examples have limited scope (text generation, for instance), but others are quite powerful and widely used.

In a later post I’ll possibly also look to cover derivatives such as Markov chain Monte Carlo (MCMC) methods & Hidden Markov Models (HMMs).

MCMC methods are widely used to calculate numerical approximations of multi-dimensional integrals, and come from a large class of algorithms that focus on sampling from a probability distribution. HMMs, meanwhile, are a derivative of the standard Markov chain in which some states are hidden or unobserved.

Having fun with Markov chains

So, the fun bit. At least in my eyes!

It’s relatively easy (< 50-60 lines of code) to generate sentences using Markov chains in Python from a given corpus of text. Shout out to Digitators.com, whose code I have used and modified for my purposes.

In the example below I’m looking to generate some new inspirational quotes based on around 350 existing classic quotes. Admittedly, some of the generated quotes are a bit of a train crash, but many others are hilarious and some are even quite profound! I have selected and included a prize few below the code. I DID NOT WRITE THESE MYSELF!

The code to generate new quotes

The overall process to create quotes revolves around:

  1. Get the data (in this case existing quotes) from a source file.
  2. Ensure the data is in the correct format to create the pseudo-Markov model – in this case a Python dictionary keyed by word, where each value lists the words that can follow it.
  3. Generate a list of words up to a maximum size: start with a random START word, append it to the list, then look up in the dictionary the potential next words for the word just added.
  4. The loop finishes when either the max length is reached or an END word is selected.
 
# The only Python library we need for this work!
import random

Now to create the dictionary of START, END and associated words:

 
# Create a dict to store our key words and associated values
model = {}
for line in open('markovdata.txt', 'r', encoding="utf8"):
    line = line.lower().split()
    for i, word in enumerate(line):
        if i == len(line)-1:
            # Last word of the line: record it as a possible END word
            model['END'] = model.get('END', []) + [word]
        else:
            if i == 0:
                # First word of the line: record it as a possible START word
                model['START'] = model.get('START', []) + [word]
            # Map each word to the words that can follow it
            model[word] = model.get(word, []) + [line[i+1]]

And finally, a function that will generate a quote up to the max length, or finishing with an END word:

def quotegen():
    generated = []

    while len(generated) < 30:
        if not generated:
            # Start with one of the recorded START words
            words = model['START']
        elif generated[-1] in model['END']:
            # Stop once the last word chosen ended a quote in the corpus
            break
        else:
            # Otherwise pick from the words that can follow the last word
            words = model[generated[-1]]
        generated.append(random.choice(words))

    print(' '.join(generated))
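With the model and function in place, generating a batch of new ‘quotes’ is then just:

# Print five freshly generated quotes
for _ in range(5):
    quotegen()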
 

That’s it! I’m sure the Python could be made more Pythonic, but it shows how you can easily use ANY text data to generate new sentences.

Example quotes:

“hope is not something ready made. it cannot be silenced.”

“fall seven times and that you go of my success is showing up.”

“challenges are empty.”

“each monday is during our possibilities become limitless.”

“to love what you feel inferior without your heart.”

“dream big and methods. third, adjust all your children, but people give up creativity. the dark, the light.”

“dream big and stand up someone else will forget what we must do.”

“change your whole life”

“a voice within yourself–the invisible battles inside all success.”

“the mind is afraid of success should be happy monday!”

I could go on, but hopefully you get an impression of some of the zany and sometimes profound quotes that can be generated with fewer than 60 lines of code and approximately 350 quotes!

Until next time, thanks for reading 🙂


AI will take all of our jobs and that’s OK. Or is it?

Artificial Intelligence (AI) will take all of our jobs, and that’s ok

This initially damning sentence was also the title of a Data Science Breakfast meetup I attended at General Assembly in Sydney this morning. Organised by Geoff Pidcock and Anthony Tockar, the roughly 50-100 people present were given a fantastic talk from Tomer Garzberg about a range of topics centred on AI, including the current use of AI to replace jobs in Chinese factories that run ‘dark’: black screens of nothingness, because AI doesn’t and won’t need the same amenities as us humans, such as running water, light and heat. The machines can run in darkness, 24/7, for long periods with little to no supervision. This is already happening!

So, unless you’ve been living under a rock (or are continuing your mission to destroy textile machines), you’ll be aware that the uses of AI in employment are becoming ever more present and ubiquitous in our everyday lives and workplaces. Examples include the Chinese factories mentioned above, AI lawyers and driverless cars, as well as scores of other blue-collar work that follows a structured approach and so is more easily automated.

The questions you are probably asking yourself are: Is my job safe? Which jobs are likely to be automated first? Well, really, as Tomer and the panel this morning pointed out: no job is ultimately safe. This doesn’t mean this kind of change will happen overnight; instead we can expect it to take generations. However, it is going to happen.

Tomer also spoke to a number of variables that propose, in percentage terms, how likely a job area is to be automated when you look at the variance, predictability and structure of the job in question. A good comparison is a stocktaker vs an animal specialist: one has a low-variance, highly predictable and structured job, the other doesn’t, and applying AI & machine learning to one is much more difficult than to the other. Again, this doesn’t mean it can’t or won’t happen, just that it is less likely.

Related, but not the same, is the recent work of a data scientist who used a neural network (later posts to come explaining these) to code a basic HTML and CSS website from a picture of a design mockup. This shows that even web development can’t escape automation 😉 Turning Design Mockups Into Code With Deep Learning

Is anybody actually ready for this change?

After the enlightening talk from Tomer this morning, a panel of 4 experts in the field of Data Science & AI answered questions regarding:

  • The deployment of AI in a business setting – e.g. an AI junior lawyer being developed for a legal firm!
  • Political & economic issues – What will governments do when people’s jobs change, and where will those people go? What is the government doing now? In the current capitalist market, what regulation is required to protect those whose jobs are lost and who can’t be re-skilled?
  • Psychological & philosophical issues – Should we really be targeting the higher areas of Maslow’s Hierarchy of Needs?
  • Educational & governmental changes – Similar to the above but more impactful: how will educational systems need to change to prepare tomorrow’s students for a world changed by AI?

Whilst I still have (very cautious) optimism about the use of AI in the world and the workplace, I feel that no one is yet ready to fully embrace the change it will bring. More importantly, I do not think the Australian government or educational system is ready for the economic changes that the rise of automation will bring, and is already starting to bring. I say this not for impact or sensationalism, but because it’s only a matter of time until workplaces see the effects!

Further links for review about this topic:

A slightly different blog post today; I hope you’ve enjoyed reading it as much as I enjoyed attending the event and learning more about the advance of AI in the workplace and the world. If you’re interested in the Data Science Breakfast meetup Sydney, it can be found here: Data Science Breakfast Meetup


Jupyter Notebooks – What? How? Why?

I’ve been using Jupyter notebooks for a few years now, and whilst I don’t consider myself a ‘god-like’ user of them, I do feel pretty comfortable with their innards.

First, there was a why

I find that asking why something is worth using is more powerful than just blathering on about how you can use it and what it does.

So, with that in mind, why should you look at using Jupyter notebooks? And what for?

  1. They are fantastic for exploratory and prototype development
  2. They are easy to share and collaborate on with others to show research, findings, analysis etc
  3. They can be a relatively common medium for data science work in teams that carry out data analysis and science

And as the reverse of this, what shouldn’t you really look to use them for?

  1. Developing production ready Python or R application systems – Bad idea!
  2. Front end applications or web facing applications
  3. As a replacement for an IDE like Visual Studio/PyCharm etc.

Before Jupyter, we have to deal with snakes…

So, before we start talking exclusively about what Jupyter notebooks are and how you can use them, I want to introduce a common Data Science software bundle: Anaconda. Anaconda essentially simplifies the use of Python and R for Data Science by bundling the common scientific packages you will most likely use whilst carrying out Data Science work.

It’s available on Windows, macOS and Linux, and I would strongly recommend you check it out rather than having to install a Python engine, R engine, Jupyter server and DB connectors separately. In addition, with the Anaconda distribution you get the Conda package & environment manager, which makes downloading more esoteric packages a piece of cake!
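As a quick sketch (the environment name and packages here are purely illustrative, and on older conda versions the second command is ‘source activate’ instead of ‘conda activate’), creating a fresh environment and pulling a package from the conda-forge channel looks like this:

conda create -n datasci python=3.6 pandas scikit-learn jupyter
conda activate datasci
conda install -c conda-forge wordcloud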

That’s awesome, but I came here for Jupyter notebooks

Ok! Rather than paraphrase what Jupyter notebooks are, I’ve taken the liberty of using Jupyter.org’s description of them:

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

http://jupyter.org/

Essentially, think of a Jupyter notebook as an interactive place to write Python, R or Julia code (Scala too, I think), write instructions in markdown, and display visualisations and results from machine learning, exploratory data analysis and everything else mentioned above. If you’ve used IPython notebooks you’ll definitely see the similarities.

As an example they look something like this:

[Image: an example Jupyter notebook]

Jupyter notebooks are built around a number of ‘cells’ that you can run individually, in tandem or as a selection. This is generally great, but as you develop bigger notebooks you’ll also need to consider things like effective pipeline design and the right ordering of these cells. As is mentioned in Reproducible Analysis Through Automated Jupyter Notebook Pipelines, they are also not great for subsequent analysis runs; in other words, you can’t easily automate the re-running of your analysis, which becomes more of a problem as the notebook begins to bloat.

So, how do I go about setting up & using Jupyter?

To use Jupyter you’ve got a few options:

  1. https://try.jupyter.org/ – A place to trial Jupyter
  2. Download and install your own Jupyter notebook server/machine – A bit more involved, but not that difficult, and it gives greater flexibility.
  3. https://mybinder.org/ – I saw Binder for the first time the other day (shout out to Data for Democracy – Sydney and the guys!) and I have to say the concept is just awesome. Instead of running Jupyter locally on your own machine or using a remote notebook server, Binder builds a Docker image of your GitHub repo and serves it up as a live Jupyter notebook! One point to note about this offering is that it’s still in beta; however, if you’re not after a bullet-proof, production-ready way of hosting Jupyter notebooks, I’d check it out.

The trial and Binder options need little extra explanation; however, I’ll now show you how easy it is to install and set up Jupyter on your own machine to carry out your own data analysis, cleaning, visualisation, whatever you want Data Science-wise.

Also, from this point onwards I’ll be assuming you’re using Python 2 or 3, are on Windows (only because it’s what I am using right now; I actually prefer macOS) and have Python already installed via Anaconda or some other means. Ok, first:

Python 2:

python -m pip install --upgrade pip
python -m pip install jupyter

Python 3:

python3 -m pip install --upgrade pip
python3 -m pip install jupyter

Assuming no error messages you can now start the Jupyter notebook with:

jupyter notebook

Easy, huh? As an addition, I’d recommend you create some form of batch file/executable you can easily run or schedule, so that it starts the notebook server without you having to run the above commands manually each time. Here is an example:

cd "C:\Locationtostorenotebooks"
jupyter notebook
start chrome "http:\\localhost:8888"

Gotchas

Now, Jupyter is pretty great; however, like any product it does have its gotchas. One is that sometimes you’ll find the server won’t let you ‘in’ without supplying a token. To get around this, you can list the running servers and their tokens:

$ jupyter notebook list
Currently running servers:
http://localhost:8888/?token=abc... :: /home/you/notebooks
https://0.0.0.0:9999/?token=123... :: /tmp/public
http://localhost:8889/ :: /tmp/has-password

Security in the Jupyter notebook server

You can then copy the token into the box that it prompts you with.
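Alternatively, on reasonably recent notebook versions you can set a password once with the built-in command below, after which the server prompts for that password instead of a token:

$ jupyter notebook password
Enter password:
Verify password: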

Another, more trivial issue that I’ve heard of (but not yet seen) is that checking Jupyter notebooks into source-code repos like Git can screw with the formatting of the notebook. This is something I think is easily avoided by using something like Binder, above.

Other funky Jupyter stuff you can do

So, you’ve got your Jupyter on and you’ve started writing your own notebooks to harness the power of Data Science and all the cool stuff. What’s next? Well, the cell magics and other tips shown in 28 Jupyter Notebook tips, tricks, and shortcuts are worth checking out. A couple of examples I use commonly are:

%matplotlib inline -- renders matplotlib plots inline in your notebook
%%time             -- useful for finding out how long a cell takes to run
%%writefile        -- exports the contents of the cell to an external file

I don’t use this one currently but I definitely will be in future academic projects:

When you write LaTeX in a Markdown cell, it will be rendered as a formula using MathJax.
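For example, putting the following into a Markdown cell renders Bayes’ theorem (just an illustration) as a typeset formula:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$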

This should hopefully give you enough to get started!

Good luck and thanks for reading 🙂

Pt.2 of So, you want to be a Data Scientist?!

Hello!

In part one of this blog post I covered some starting areas, considerations and general resources (MOOCs, YouTube channels) for people generally interested in Data Science. There are heaps of Data Science resources and blogs out there to learn from, and I’d encourage you to read as many as you can, starting with sites such as: Top 50 Data Science Blogs (where you’ll find that some of the top sites are blogs I follow too 🙂 )

That aside, let’s look at podcasts, meetups and further steps for the people who want to go deep with Data Science, and then look at how you can start tackling Data Science problems on your own machine today!

Podcasts

Podcasts should hopefully need no introduction; however, for those of you who are less audio-attuned, they are an episodic series of audio or video files, about pretty much anything, which you can choose to download and/or stream – Podcast, Wikipedia.

Personally, I feel they are a brilliant way to hear new ideas and approaches, or learn something new, whilst you do something else. For me, I’ll usually listen to Data Science or Freakonomics podcasts on the way to work with a coffee in hand ☕ 😀 as listening and walking go together so seamlessly.

Here are some of my favourite and recommended Data Science podcasts:

  1. Partially Derivative – Possibly one of my favourite series of all time. It has unfortunately stopped; however, there are close to 3 years of podcasts. Often funny, often sweary, always informative, with a healthy dose of alcohol during episodes.
  2. Linear Digressions – Running for around 2 years, Linear Digressions is a slightly more sober version of PD above that has buckets of decent content on varying DS topics. Katie & Ben are a great duo and come from some interesting backgrounds (Katie is a physicist and has worked on the Udacity Machine Learning MOOC).
  3. Talking Machines – I’ve only listened to a few episodes of this one, but they are pretty content-packed and can go quite deep. Worth a listen!
  4. The Data Skeptic – One of the first Data Science podcasts I listened to; a good start for beginners, although the style of the podcast was not to my taste back in 2014/15.

The above should give you a good breadth of listening material for some time. Give them a shot when you’re next walking/commuting to work; hopefully you’ll enjoy them as much as I do 🙂

Meetups

If I were to give a younger Dan some tips on how to develop a strong network of contacts to aid his Data Science or other tech experience, I wouldn’t think twice about recommending Meetup.com! In the past five years in Sydney, Australia, I’ve been to a variety of tech & DS meetups, and it’s an absolutely fantastic way to not only build out your network but also meet other like-minded individuals who share the passion you’re there for too. Some good Data Science meetups in the Sydney area are, for example:

  1. Data Science Sydney
  2. Data Science Breakfast Meetup
  3. Minerva Collective – I’m biased but a great bunch of brainy people doing good with Data
  4. Data for Democracy – Sydney
  5. Sydney Users of R (SURF)

I’m sure there are others but these are all ones I’ve had the pleasure of attending.

‘Actual’ Data Science Areas

So, we’ve been through a myriad of podcasts, meetups, resources etc., but if you had no deadlines and simply wanted to go deep with Data Science/Machine Learning et al., what would you need to know to practice it effectively? I’m still learning this (and probably will be for a few decades to come!) but I would strongly recommend researching the following (in no particular order):

[Image: Linear Algebra, Khan Academy]

  • Linear Algebra – Read this blog by Jason Brownlee to understand the relative importance of Linear Algebra in Machine Learning/Data Science. It’s not crucial to know linear algebra to run an ML model, but it will certainly help when choosing the right algorithm for your problem and/or tuning your model when things go south. Techniques like Singular Value Decomposition and Principal Component Analysis are heavily rooted in Linear Algebra, as is the understanding of higher dimensions and how to perform operations on them. A book worth checking out here is by the legendary Gilbert Strang.
  • Probability Theory – Also integral to portions of Data Science & Machine Learning is Probability Theory. Towards Data Science – Probability Theory summarises this meaty topic pretty well; I won’t spend any more time badly explaining it!

[Image: a conditional probability related to weather – the probability of rain occurring, given a sunny day]

  • Software Engineering – At some point as a Data Scientist you’ll need to write code. Whether this is Python, R or something else, it’s key to have a good grasp of software engineering areas such as:
    • Version control
    • Unit testing
    • Modular programming
    • API design/creation
    • Taking an idea from Development to Production (this is something a decent Data Science team would have Data Engineers assist with but not everyone is this lucky)
  • Statistics & the Scientific Method – Ok, saying Statistics is a bit of a catchall I’ll admit but knowing when to perform a Student’s t-test over an Analysis of Variance (ANOVA) for samples of data is just one example of when knowledge of Statistics is useful. What about if we were looking to determine the value of a parameter of a population? Would you know that bootstraping/resampling is one way to achieve this? What about how cross-validation works? I could go on..

[Image: the scientific method – Wikipedia]

  • The scientific method is, in my opinion, absolutely KEY to performing robust Data Science. I’ve purposely put this in caps because taking a business/research problem and formulating a testable hypothesis is something that is generally missed when you learn Data Science through MOOCs or Stack Overflow. By doing this you’re actually doing the ‘Science’ in Data Science!
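To make the bootstrapping idea mentioned above concrete, here’s a minimal sketch (the sample values are made up purely for illustration) that estimates a 95% confidence interval for a population mean by resampling with replacement:

import random

# A small, made-up sample from some population of interest
sample = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4]

# Recompute the mean over many resamples drawn with replacement
means = []
for _ in range(10000):
    resample = [random.choice(sample) for _ in sample]
    means.append(sum(resample) / len(resample))

# The 2.5th and 97.5th percentiles bound the 95% confidence interval
means.sort()
print("Bootstrap 95% CI for the mean: ({:.2f}, {:.2f})".format(means[249], means[9749]))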

This blog post has gone on for a bit longer than I originally intended, so next time we’ll cover how to get Jupyter & Python running on your machine, possibly even using Binder with Git! I’m also keen to get your thoughts on the length of these posts, so please drop a comment if you like!

Until next time, thanks for reading 🙂