Pt.2 of So, you want to be a Data Scientist?!

Hello!

In part one of this blog post I covered some starting areas, considerations and general resources (MOOCs, YouTube channels) for people interested in Data Science. There are heaps of Data Science resources and blogs to learn from out there, and I'd encourage you to read as many as you can by looking at sites such as Top 50 Data Science Blogs (you'll find that some of the top sites listed there are blogs I follow too 🙂).

That aside, let's look at podcasts, meetups and further steps for people who want to go deep with Data Science, and then at how you can start tackling Data Science problems on your own machine today!

Podcasts

Podcasts hopefully need no introduction; however, for those of you who are less audio-attuned, a podcast is an episodic series of audio or video files, about pretty much anything, that you can download and/or stream (Podcast, Wikipedia).

Personally, I feel they're a brilliant way to hear new ideas and approaches, or to learn something new, whilst you do something else. I'll usually listen to Data Science/Freakonomics podcasts on the way to work with a coffee in hand ☕ 😀, as listening and walking fit together so seamlessly.

Here are some of my favourite and recommended Data Science podcasts:

  1. Partially Derivative – Possibly one of my favourite series of all time. It's unfortunately ended, but there are close to 3 years of episodes to catch up on. Often funny, often sweary, always informative, with a healthy dose of alcohol during episodes.
  2. Linear Digressions – Running for around 2 years, Linear Digressions is a slightly more sober version of PD above, with buckets of decent content on various DS topics. Katie & Ben are a great duo and come from some interesting backgrounds (Katie is a Physicist and has worked on the Udacity Machine Learning MOOC).
  3. Talking Machines – I've only listened to a few episodes of this one, but they're pretty content-packed and can go quite deep. Worth a listen!
  4. The Data Skeptic – One of the first Data Science podcasts I listened to and a good start for beginners, although the style of the podcast wasn't to my taste back in 2014/5.

The above should give you a good breadth of listening material for some time. Give it a shot the next time you're walking or commuting to work; hopefully you'll enjoy them as much as I did 🙂

Meetups

If I were to give a younger Dan some tips on building a strong network of contacts to aid my Data Science or other tech experience, I wouldn't think twice about recommending Meetup.com! In the past five years in Sydney, Australia, I've been to a variety of tech & DS meetups, and they're an absolutely fantastic way not only to build out your network but also to meet like-minded individuals who share the passion you're there for. Some good Data Science meetups in the Sydney area include:

  1. Data Science Sydney
  2. Data Science Breakfast Meetup
  3. Minerva Collective – I'm biased, but they're a great bunch of brainy people doing good with Data
  4. Data for Democracy – Sydney
  5. Sydney Users of R (SURF)

I’m sure there are others but these are all ones I’ve had the pleasure of attending.

‘Actual’ Data Science Areas

So, we've been through a myriad of podcasts, meetups, resources etc., but if you had no deadlines and simply wanted to go deep with Data Science/Machine Learning et al., what would you need to know to practise it effectively? I'm still learning this (and probably will be for a few decades to come!), but I would strongly recommend researching the following areas (in no particular order):


Linear Algebra, Khan Academy

  • Linear Algebra – Read this blog post by Jason Brownlee to understand the relative importance of Linear Algebra in Machine Learning/Data Science. It's not crucial to know linear algebra to run an ML model, but it will certainly help when choosing the right algorithm for your problem and/or tuning your model when things go south. Techniques like Singular Value Decomposition and Principal Component Analysis are heavily rooted in Linear Algebra, as is the understanding of higher dimensions and how to perform operations in them. A book worth checking out for this is one by the legendary Gilbert Strang.
  • Probability Theory – Also integral to portions of Data Science & Machine Learning is Probability Theory. Towards Data Science – Probability Theory summarises this meaty topic pretty well. I won’t spend any more time badly explaining it!

A conditional probability related to weather: in this example, the probability of rain occurring given a sunny day.
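The link between Singular Value Decomposition and Principal Component Analysis mentioned in the Linear Algebra bullet can be made concrete with a short NumPy sketch. It assumes NumPy is installed, and `pca_via_svd` is a hypothetical helper written for illustration rather than a library function (in practice scikit-learn's `PCA` class does this for you). The idea: centre the data, take its SVD, and the right singular vectors are the principal directions.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Project the rows of X onto their top principal components using SVD."""
    # Centre each feature (column) so the components describe variance, not the mean
    X_centred = X - X.mean(axis=0)
    # Thin SVD: X_centred = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    components = Vt[:n_components]
    # Each squared singular value / (n - 1) is the variance along that direction
    explained_variance = (S ** 2 / (X.shape[0] - 1))[:n_components]
    # Coordinates of every sample in the reduced space
    projected = X_centred @ components.T
    return projected, components, explained_variance


# Hypothetical toy data: 200 points lying close to the line y = 2x
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])

projected, components, variance = pca_via_svd(X, n_components=1)
print(projected.shape)  # (200, 1)
print(components)       # close to +/-[0.447, 0.894], i.e. the direction (1, 2) / sqrt(5)
```

The 2-D points collapse onto a single coordinate along the diagonal they were generated on, which is the dimensionality reduction PCA is usually used for.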
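The weather caption above boils down to the definition P(rain | sunny) = P(rain and sunny) / P(sunny). Here's a small standard-library sketch of that calculation; the day counts are invented purely for illustration, and `prob` is a hypothetical helper.

```python
from fractions import Fraction

# Made-up tally of 100 days, keyed by (sky, weather); purely illustrative
days = {
    ("sunny", "rain"): 5,
    ("sunny", "no rain"): 45,
    ("cloudy", "rain"): 30,
    ("cloudy", "no rain"): 20,
}
total = sum(days.values())


def prob(event):
    """Probability of an event, given as a predicate on (sky, weather)."""
    hits = sum(n for (sky, weather), n in days.items() if event(sky, weather))
    return Fraction(hits, total)


p_rain = prob(lambda sky, weather: weather == "rain")
p_sunny = prob(lambda sky, weather: sky == "sunny")
p_rain_and_sunny = prob(lambda sky, weather: sky == "sunny" and weather == "rain")

# Conditional probability: P(rain | sunny) = P(rain and sunny) / P(sunny)
p_rain_given_sunny = p_rain_and_sunny / p_sunny

print(p_rain)              # 7/20
print(p_rain_given_sunny)  # 1/10
```

In this toy data, knowing the day is sunny drops the chance of rain from 35% to 10%, which is exactly the kind of update conditional probability captures.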

  • Software Engineering – At some point as a Data Scientist you'll need to write code. Whether it's in Python, R or something else, it's key to have a good grasp of Software Engineering areas such as:
    • Version control
    • Unit testing
    • Modular programming
    • API design/creation
    • Taking an idea from Development to Production (a decent Data Science team would have Data Engineers to assist with this, but not everyone is that lucky)
  • Statistics & the Scientific Method – OK, saying Statistics is a bit of a catch-all, I'll admit, but knowing when to perform a Student's t-test rather than an Analysis of Variance (ANOVA) on samples of data is just one example of where knowledge of Statistics is useful. What if we were looking to estimate a parameter of a population? Would you know that bootstrapping/resampling is one way to achieve this? What about how cross-validation works? I could go on...

Wikipedia, Scientific method

  • The scientific method is, in my opinion, absolutely key to performing robust Data Science. I've purposely emphasised this because having a business/research problem and formulating a testable hypothesis is something that is generally missed when you learn Data Science through MOOCs or Stack Overflow. By doing this you're actually doing the 'Science' in Data Science!
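To make the unit testing and modular programming points from the Software Engineering bullet concrete, here's a minimal sketch using Python's built-in `unittest` module. `min_max_scale` is a hypothetical helper, chosen only because it's small enough to test in a few lines; the point is that a small, self-contained function is easy to pin down with tests.

```python
import unittest


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1].

    Hypothetical helper used only to illustrate unit testing.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        # Avoid division by zero when every value is identical
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class TestMinMaxScale(unittest.TestCase):
    def test_endpoints_map_to_zero_and_one(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_does_not_divide_by_zero(self):
        self.assertEqual(min_max_scale([3, 3, 3]), [0.0, 0.0, 0.0])

    def test_empty_input_raises(self):
        # min() of an empty list raises ValueError; this test documents that behaviour
        with self.assertRaises(ValueError):
            min_max_scale([])
```

Saved as, say, `test_scaling.py` (a hypothetical filename), the tests can be run with `python -m unittest test_scaling`.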
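On the t-test vs ANOVA question: both test the null hypothesis that group means are equal, which is a nice example of a testable hypothesis. A Student's t-test compares two groups, while a one-way ANOVA generalises to three or more, and for exactly two groups the ANOVA F statistic equals the square of the pooled-variance t statistic. The standard-library sketch below computes both statistics by hand to show that link; the measurements are made up and the helper names are hypothetical. In practice you'd likely reach for `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`, which also return p-values.

```python
import math
import statistics


def students_t(a, b):
    """Two-sample Student's t statistic, assuming equal variances (pooled)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    standard_error = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across two or more groups."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up measurements for three hypothetical groups
a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [6.2, 6.8, 7.1, 6.5, 6.9]
c = [5.9, 6.1, 6.4, 5.7, 6.3]

t = students_t(a, b)
print(math.isclose(t ** 2, one_way_anova_f(a, b)))  # True: with two groups, F == t**2
print(one_way_anova_f(a, b, c))  # three groups: use ANOVA rather than a single t-test
```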
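For the bootstrapping question, here's a minimal percentile-bootstrap sketch in plain Python that estimates a population mean along with a 95% confidence interval. The sample is simulated and `bootstrap_ci` is a hypothetical helper; the percentile method shown is just one of several ways to build a bootstrap interval.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=5000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for a population statistic.

    Resamples the observed data with replacement, recomputes the statistic
    each time, and reads the interval off the sorted bootstrap estimates.
    """
    rng = random.Random(seed)
    estimates = sorted(
        stat(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    # Nearest-rank percentiles of the bootstrap distribution
    lower = estimates[round(alpha / 2 * (n_resamples - 1))]
    upper = estimates[round((1 - alpha / 2) * (n_resamples - 1))]
    return stat(sample), (lower, upper)


# Hypothetical sample: 100 draws from a population whose true mean is 10
data_rng = random.Random(0)
sample = [data_rng.gauss(10, 2) for _ in range(100)]

estimate, (low, high) = bootstrap_ci(sample)
print(f"mean = {estimate:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The appeal is that the same function works for statistics without a tidy textbook formula, e.g. passing `stat=statistics.median`.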
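And for cross-validation, this sketch shows the mechanics of k-fold cross-validation: shuffle the data, split it into k folds, then k times train on k - 1 folds and score on the held-out one. The straight-line model, the simulated data and the helper names are all assumptions for illustration; for real projects, scikit-learn's `KFold` and `cross_val_score` in `sklearn.model_selection` handle this for you.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_val_mse(xs, ys, k=5):
    """Mean squared error on each held-out fold, training on the other k - 1 folds."""
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in fold]
        scores.append(sum(errors) / len(errors))
    return scores


# Simulated data: y = 3x + 2 plus Gaussian noise with standard deviation 1
noise = random.Random(1)
xs = [float(x) for x in range(50)]
ys = [3 * x + 2 + noise.gauss(0, 1) for x in xs]

scores = cross_val_mse(xs, ys, k=5)
print([round(s, 2) for s in scores], "mean:", round(sum(scores) / len(scores), 2))
```

Looking at the spread of the per-fold scores, not just their mean, gives a rough sense of how stable a model's performance is on unseen data.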

This blog post has run a bit longer than I originally intended, so next time we'll cover how to get Jupyter & Python running on your machine, possibly even using Binder with Git! I'm also keen to get your thoughts on the length of these posts, so please drop a comment if you like!

Until next time, thanks for reading 🙂
