Data Science sits at the intersection of domain knowledge, technical expertise, and statistics. It gives us the power to evaluate existing data and to perform tasks such as visualization and manipulation, which in turn support decision making.
In 30DaysOfLearning, we cover the different Data Science techniques that equip you, as a Data Scientist, with the knowledge to analyze and visualize data. The full curriculum can be found at: https://aka.ms/30DL-DSMLPage. In this blog, we summarize the concepts covered and link to the resources.
Setting up your local environment
Starting out is like going on a journey: you must plan, prepare, and have everything ready before you leave. To continue, you will need to set up your local environment by downloading the necessary resources and tools. In our case, you will need to create a GitHub account, redeem your Azure for Students credits, set up Visual Studio, and finally learn how to link your local code from Visual Studio to GitHub.
Get going with Python
Once your local environment is set up, we start with basic programming. Python is one of the main programming languages used in Data Science. A major advantage of the language is its abundance of libraries, which let you analyze your data with ease. In our first session, we covered how to build a simple BMI calculator application. While building it, we explored data types, variables, operators, conditional logic, and more. You can review the concepts at https://aka.ms/py4beginners or watch the video below for the full demo.
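A minimal sketch of what such a calculator can look like. This is not the session's exact code, just an illustration of the same concepts: a function, arithmetic operators, and conditional logic.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Compute BMI from weight and height, then classify it."""
    bmi = weight_kg / height_m ** 2  # arithmetic with variables and operators
    # Conditional logic: pick a category based on the computed value
    if bmi < 18.5:
        return "Underweight"
    elif bmi < 25:
        return "Normal"
    elif bmi < 30:
        return "Overweight"
    return "Obese"

print(bmi_category(70, 1.75))  # 70 / 1.75**2 ≈ 22.9 -> "Normal"
```

From here you could extend the script with `input()` calls to read the user's weight and height from the console.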
Prepare your data
The Data Science process begins with getting your data. Real-world data is often messy: you might encounter missing values, duplicated records, or data in the wrong format. For example, if someone fills in the same form twice, how do you handle that? In the data preparation class, we learnt which libraries we needed, how to install them locally, and how to use them to import our data. Using the pumpkins dataset, we dealt with missing and duplicated data and learnt to work with data inside data frames. A follow-up session on manipulating data goes deeper, including exploratory statistics and joining data.
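The cleaning steps above can be sketched with pandas. The tiny table and its column names here are made up for illustration; they are not the pumpkins dataset's actual schema.

```python
import pandas as pd

# A toy table with the kinds of problems real data has:
# a duplicated row, a missing price, and a missing key field.
df = pd.DataFrame({
    "city": ["Boston", "Boston", "Denver", None],
    "price": [12.0, 12.0, None, 9.5],
})

df = df.drop_duplicates()                             # remove the repeated form submission
df["price"] = df["price"].fillna(df["price"].mean())  # fill missing values with the mean
df = df.dropna(subset=["city"])                       # drop rows still missing a key field

print(df)
```

The same three calls (`drop_duplicates`, `fillna`, `dropna`) cover most of the basic cleanup you will need on a first pass through a new dataset.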
Share your data findings
We have learnt a new language and worked on our data; now we might want to share the journey with the world, and the first step is telling your data story. In this section we cover data visualization: which charts to use where, and how to create meaningful charts. For this session we used the birds dataset and tried various visualizations; you can watch the session video here.
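As a small taste of the visualization workflow, here is a bar chart sketch using matplotlib. The categories and counts are hypothetical placeholders, not figures from the birds dataset.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical sighting counts, standing in for the real dataset.
categories = ["Ducks", "Geese", "Swans"]
counts = [40, 25, 10]

fig, ax = plt.subplots()
ax.bar(categories, counts)           # bar charts suit categorical comparisons
ax.set_title("Bird sightings by category")
ax.set_ylabel("Count")
fig.savefig("bird_counts.png")       # save the chart to share it
```

Choosing the chart type to match the question (bars for comparing categories, lines for trends over time, scatter plots for relationships) is the core skill this session builds.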
Bonus: bring your data to the cloud
What does data science have to do with the cloud? Microsoft Azure lets you easily scale your resources, offers high availability, and keeps your data secure. On top of that, services such as AutoML and the ML Designer let you prepare data, then train and publish your models without writing a single line of code. Learn how below: Data Science in the Cloud
Your turn, try out Data Science
We have gone through all the major Data Science processes, and now it is your turn. Using the spam dataset, go ahead and apply the different data science techniques we have covered.
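To get you started, here is a skeleton of the end-to-end workflow. The messages below are toy placeholders; swap in the actual spam dataset (for example via `pd.read_csv`) once you have downloaded it.

```python
import pandas as pd

# Toy stand-in data; replace with the real spam dataset.
df = pd.DataFrame({
    "label": ["spam", "ham", "ham", "spam"],
    "message": ["WIN a prize now", "See you at 5", "Lunch tomorrow?", "FREE entry!!"],
})

df = df.drop_duplicates().dropna()              # prepare: clean the data
df["length"] = df["message"].str.len()          # explore: engineer a simple feature
summary = df.groupby("label")["length"].mean()  # analyze: compare the two groups

print(summary)
```

From this skeleton you can go further: visualize the message-length distributions per label, and explore which words appear most often in spam versus ham.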