What I learned from my Data Science Codecademy course
I’m taking a career break to spend some time with family and decided to learn some new skills. This article summarises my newfound knowledge of Data Science and discusses my experience of using Codecademy for the Data Scientist path.
What is Data Science and what’s it like to learn on Codecademy?
Data Science supports decision making with data analytics. The Codecademy course teaches you the basic skills to manipulate large data sets, analyse them using various Python libraries and visualise your findings in easy-to-create graphs and charts.
You are also introduced to Python in a good level of detail, enough to create programs in Python 3 and start working away from the Codecademy platform.
The platform itself has an excellent framework to guide you through the syllabus. As well as plenty of reading on Codecademy and links to useful articles, it shines through its interactive programming module, which lets you try your code out live in the web browser. It can also check your code, provide guidance and give you access to the solution if you need it. Codecademy encourages you to program in groups and sets you up in a cohort of other coders who are doing the same course as you. You can connect and work with each other over the Codecademy forum or on their Discord server.
However, although folks were reasonably responsive on the Discord server and forums, I wasn’t able to find anyone on the same part of the syllabus as me, so I ended up doing a lot of the projects on my own.
What did you learn?
- Python programming: aside from the basic syntax, I learned to code with loops, dictionaries, data frames and classes, and tried my skills out on various challenges.
- Data acquisition: I learned about different data sources and how it’s possible to create your own data sets using publicly available APIs. I also used a cool Python library called BeautifulSoup, which lets you take data that’s displayed on websites and pull it into your code.
- Data manipulation and wrangling: I became confident in aggregating and manipulating data using SQL and a Python library called Pandas, and in making data more uniform with Python’s regular expression module. Consistent, well-organised data is a prerequisite for the more complex analytics you run later.
- Basic statistical analysis: I revisited some statistics from my school days (averages, variance, standard deviation, quartiles etc.), first by writing my own functions in Python and then by using the NumPy and SciPy stats libraries, which provide these statistical functions ready-made and are very easy to use.
- Hypothesis testing: I learned about hypothesis tests and how to implement them in Python, including 1- and 2-sample t-tests, ANOVA, Tukey’s range test, a binomial test and a chi-square test. Essentially, you test an assumption about a data set by calculating a p-value (the probability of seeing a result at least as extreme if the assumption of no effect were true) and comparing it with a significance threshold.
- Data visualisation: one of my favourite parts of the course was using Python’s Matplotlib and seaborn libraries to create beautiful charts and graphs showing how data was distributed in data sets available from Kaggle.com.
- Natural Language Processing: playing with Python’s NLTK library to analyse pieces of text for their meaning was eye-opening. So was learning about bias in search engines’ use of these tools and the ethical responsibility coders have to ensure inclusivity.
- Machine Learning: I learnt about Python’s powerful machine learning tools and how they can be used to build predictive models. Essentially, you train a classifier (e.g. K-Nearest Neighbors or Naive Bayes) on a set of data, say Tweets from Twitter, and then use the model to predict the behaviour of a new Tweet, e.g. whether it will go viral or not.
- Deep Learning: this was just a basic introduction to some of the concepts of Deep Learning.
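
To give a flavour of the data wrangling and basic statistics items in the list above, here is a minimal sketch. It is not taken from the course: the price strings are invented for illustration, `parse_price` is a hypothetical helper, and it uses Python's standard-library `re` and `statistics` modules rather than Pandas or NumPy so that it runs without installing anything.

```python
import re
import statistics

# Hypothetical raw values, invented for illustration: the same kind of
# number written in several inconsistent ways, plus one missing entry.
raw_prices = ["£1,200", "950", " 1,050.50 ", "£800.00", "n/a"]


def parse_price(text):
    """Return the price as a float, or None if the text contains no digits."""
    # Keep only digits and the decimal point (drops currency symbols,
    # thousands separators, spaces and letters).
    cleaned = re.sub(r"[^\d.]", "", text)
    return float(cleaned) if cleaned else None


# Clean every value and discard the ones that could not be parsed.
prices = [p for p in (parse_price(t) for t in raw_prices) if p is not None]

# The school-day statistics, computed on the cleaned data.
summary = {
    "mean": statistics.mean(prices),
    "variance": statistics.pvariance(prices),
    "std_dev": statistics.pstdev(prices),
    "quartiles": statistics.quantiles(prices, n=4),
}

print(prices)
print(summary)
```

Note that `pvariance` and `pstdev` treat the list as the whole population. If the values were a sample from a larger population, `statistics.variance` and `statistics.stdev` would be the ones to reach for.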
Overall, I feel I succeeded in learning a new programming language and picked up some new skills for working with large data sets to drive business decision making. I hope to keep using them in the future and aim to share some of my progress on Medium too.