Microsoft Certified Data Scientist
Early this month I received my Microsoft Professional Program Data Science Certificate. It was an exciting journey, but I am equally excited that it has ended and look forward to working as a data scientist. Allow me to share my experience, especially with those who want to follow in my footsteps.
Introduction
In early 2017 I started my company blog, hoping that it would be propelled to the top of the search rankings. That didn’t happen, but in October the blog refocused on data visualization. That’s how my interest in data science started. I felt certification would help me stand out from the crowd, so I enrolled in the Data Science Track of the Microsoft Professional Program (MPP).
In consultation with the industry, Microsoft identified the 8 most important skills that needed to be taught. The curriculum is built around these skills and delivered as massive open online courses (MOOCs) on the edX learning platform. That worked just fine for me, since I could take the lessons from the comfort of my home office. Without a regular job, I spent a few hours every day learning something new and practicing what I had learned. I accelerated my learning in the second quarter and completed the program in 7 months with the submission of my capstone project.
Program Structure
The Data Science Certification program is made up of 3 units and a final project taught over 10 courses. The MPP Data Science Certificate will be awarded when you achieve a 70% pass rate and obtain a verified certificate for the 10 courses. You could first enroll in the free audit track and upgrade later, since a verified certificate costs $99 per course. In total the MPP Data Science Certificate will cost you $990.
Some courses give a choice between different technologies. For instance, analyzing and visualizing data is taught with Excel and Power BI. For the programming courses one has a choice between R and Python, but nothing prevents you from learning both. The choice between R and Python will impact your learning experience, so you need to put some thought into it.
A synopsis of the curriculum is presented below, and further details can be obtained on the Microsoft and edX pages.
Review of Courses
Course 1: Data Science Orientation
This course gets started with an explanation of the curriculum and an encounter with a variety of data scientists. The following modules teach data science fundamentals and provide a basic introduction to statistics. For many of us this will be a refresher of what we already know. The course can be completed in a single day and is a good primer for anyone with an interest in data science. I really enjoyed this course, and it became a motivator for the remainder of the track.
Course 2: Querying Data with Transact-SQL
The content of this course is rather intricate and geared towards database professionals. Various aspects of Transact-SQL are taught in 11 modules, so be prepared for a long ride. The combination of lectures, demonstrations and hands-on lab exercises kept me engaged, but at times I wondered whether the presenters knew I was watching. The course taught me the essentials of SQL, but I’m not sure how much of it I will use. An added benefit of this course is that it taught me how to create an Azure SQL Server Database.
Course 3b: Analyzing and Visualizing Data with Power BI
I learned this skill with Power BI, since I was keen to increase my understanding and proficiency. The first modules teach the fundamental BI workflow of data transformation, modeling, visualization, and sharing. The other modules cover various topics and ensure that you obtain an all-round view of the Power BI product. Will Thomson and his team delivered the course with infectious enthusiasm, and I viewed the lessons repeatedly. Read my article Interactive Reports with Power BI to see how I put what I learned into practice.
Course 4: Essential Statistics for Data Analysis using Excel
The modules in this course help you gain a good understanding of descriptive statistics, basic probability, random variables, sampling, confidence intervals, and hypothesis testing. The lectures are excellent, and the demonstrations use lots of real-world examples. I enjoyed the lessons and even felt like a statistician, but the formulas fade quickly when you don’t use them! Overall, it is an excellent foundation course for data scientists that equips them with essential skills in statistics.
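Since the formulas fade quickly, here is a minimal sketch of one of them: a confidence interval for a sample mean. The course itself uses Excel; this version uses Python's standard library instead, the sample values are made up for illustration, and it relies on a normal approximation (for a sample this small, a t-distribution would usually be the better choice).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Return (low, high) for the sample mean, using a normal approximation."""
    n = len(sample)
    sample_mean = mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical sample, not taken from the course material
daily_sales = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
low, high = mean_confidence_interval(daily_sales)
print(f"95% CI for the mean: {low:.2f} to {high:.2f}")
```

A wider interval means less certainty about the true mean; collecting more observations shrinks the standard error and narrows the interval.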
Course 5b: Introduction to Python for Data Science
Programming courses in the data science track offer a choice between R and Python. I had some knowledge of Python and prefer its natural language, so I chose Python, also because of its increasing popularity within and beyond data science. The modules cover basic Python and the NumPy, Matplotlib, and Pandas packages, and don’t require any previous knowledge. The videos are concise and to the point and the exercises are easy, but the final exam is timed and I struggled a bit with the manipulation of Pandas data frames. The exercises and final exam introduced me to DataCamp’s learning platform, so I know where I can get more practice.
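For readers who wonder what manipulating Pandas data frames looks like in practice, the following is a small sketch of three common operations: filling a missing value, filtering rows, and grouping with an aggregate. The data frame and its column names are invented for illustration and do not come from the course or the exam.

```python
import pandas as pd

# Hypothetical data, made up for this example
df = pd.DataFrame({
    "city": ["Nairobi", "Nairobi", "Kampala", "Kampala", "Kigali"],
    "year": [2016, 2017, 2016, 2017, 2017],
    "sales": [120.0, 150.0, 80.0, None, 60.0],
})

# Replace the missing sales figure with the mean of the known values
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Keep only the rows for 2017 using a boolean mask
recent = df[df["year"] == 2017]

# Total sales per city, largest first
totals = df.groupby("city")["sales"].sum().sort_values(ascending=False)

print(recent)
print(totals)
```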
Course 6: Data Science Essentials
This course starts with an introduction to data science, recaps statistics and data visualization, and ends with an introduction to data munging and machine learning. The demonstrations and lab exercises are conducted with Azure ML and Jupyter Notebooks. Both tools are user-friendly and provide an excellent learning environment for machine learning and programming. I enjoyed this course because it reinforced earlier learning and made me appreciate the entire data science workflow. The lab exercises are easy, and you should be able to attain a morale-boosting score.
Course 7: Principles of Machine Learning
This course builds on the previous one and offers a more in-depth overview of classification, regression, and clustering models. There’s a module that focuses on model improvement, while other modules cover tree and ensemble methods and optimization-based methods like neural networks and support vector machines. The lectures are excellent and could act as a future reference, but the calculus can be challenging. The demonstrations and exercises use practical real-world examples, and doing the exercises provided great fun and excellent learning. I recommend that you do this course well, since it prepared me for a successful completion of the capstone project.
Course 8b: Programming with Python for Data Science
This course is delivered by Coding Dojo, the industry’s premier coding bootcamp, and is taught by Authman Apatira, one of their lead instructors. The lectures are excellent, but you will spend most of your time writing Python code and learning packages like scikit-learn. The course is extensive and covers topics like data preparation, feature engineering, dimensionality reduction, data modeling and evaluation. Even though I was closely engaged with this course for a month, I almost freaked out during the final exam. Applying freshly acquired programming and problem-solving skills under the pressure of time proved a challenge, but that’s what you sign up for as a data scientist.
Course 9a: Implementing Predictive Analytics with Spark in Azure HDInsight
I had intended to take Applied Machine Learning to learn more about the use of location data and satellite imagery, but that course was removed as an option by the end of March 2018. The course Implementing Predictive Analytics with Spark in Azure HDInsight feels like a let-down after the intensive learning in the previous course. One learns to provision an HDInsight Spark cluster in Microsoft Azure and gets some exposure to Spark Python, but apart from that there’s limited new learning. A break in between storms doesn’t harm, and I can add the use of Spark and Spark Python to my CV.
Course 10: Capstone Project
The Capstone Project runs for 6 weeks, starting in the first month of every quarter, and our cohort had to build a model that could predict earthquake damage in Nepal. The project assumes that you have completed the other nine courses, since you need to apply what you’ve learned. The 3-part challenge of my project consisted of data exploration, a data model competition, and the submission of a final report that was reviewed by fellow students. The model competition provided the fun part, where one had to develop a multiclass classifier from an imbalanced dataset. I struggled to improve on a multiclass logistic regression algorithm with default parameters, but feature selection and tuning of the hyperparameters got the job done. Limited time and visiting relatives added pressure to the assignment, but competing and working with other students was stimulating.
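The general approach of combining feature selection with hyperparameter tuning for a multiclass logistic regression can be sketched as follows. This is not my capstone code: the data is synthetic, and the scoring metric, the parameter grid, and the use of class weights are assumptions chosen for illustration with scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced 3-class data standing in for the real dataset
X, y = make_classification(
    n_samples=1500, n_features=20, n_informative=6, n_redundant=2,
    n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale, keep the k most informative features, then fit the classifier
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("model", LogisticRegression(max_iter=1000, class_weight="balanced")),
])

# Assumed search grid: number of features kept and regularization strength
param_grid = {
    "select__k": [5, 10, 20],
    "model__C": [0.01, 0.1, 1.0, 10.0],
}

# Cross-validated grid search, scored with micro-averaged F1 (an assumption)
search = GridSearchCV(pipeline, param_grid, scoring="f1_micro", cv=3)
search.fit(X_train, y_train)

test_score = f1_score(y_test, search.predict(X_test), average="micro")
print("Best parameters:", search.best_params_)
print(f"Micro-averaged F1 on held-out data: {test_score:.3f}")
```

Putting the feature selection inside the pipeline means it is refitted on each cross-validation fold, so the held-out folds never influence which features are kept.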
What I Learned
The Data Science program claims to teach 8 fundamental skills that a data scientist needs. I completed the curriculum and obtained my data science certificate, but what did I really learn?
- Query relational data – I had worked with relational databases and SQL for many years, but my proficiency in the use of SQL has certainly improved.
- Analyze and visualize data – I had used pie and bar charts extensively, but learned to use other chart types and obtained new skills in data modeling and creating dashboards.
- Understand statistics – I had learned statistics, but this course made me understand statistics and why it is so important when you work with data.
- Explore data with code – I knew a bit of Python, but this course fully taught the basics and introduced me to relevant packages like Pandas, Matplotlib and scikit-learn.
- Understand core data science concepts – data science is an evolving discipline, but I am now familiar with the core concepts and know how to apply them.
- Understand machine learning – I knew little or nothing about machine learning, but I now have a firm grasp of the principles and methods and know some of the key applications.
- Use code to manipulate and model data – I learned to manipulate data and apply machine learning with Python, but there’s more to be learned and constant practice is needed.
- Develop intelligent solutions – I learned to deploy and use an intelligent machine learning solution with Spark on Azure HDInsight, but not how to build it.
It’s debatable whether all these learned skills qualify me as a data scientist, and some even argue that you need to be in practice. I know a few people who practice badly, so I don’t hesitate to call myself a data scientist. I am ready to work hard and open to new learning.
Learning the functional data science skills also enhanced my technical skills in tools like Excel, Power BI and Azure ML. A qualified data scientist should not be constrained to one technology, but skills in these products form an adequate starting point. In addition, they form a good reference for exploring and using other products and technologies.
Wrapping Up
Completing the MPP Data Science track and earning my certificate has been a wonderful experience and I have no regrets about the time and money spent. I got particularly excited about machine learning and have started thinking about its application in various industries. Even so I obtained all-round data science skills that can add value to almost any organization.
The MPP Data Science track is a good choice if you have the time, the money, good internet, and are ready to work with Microsoft products. There are many alternatives for learning data science, so you need to find out what’s on the market. Consider also your career stage and objectives. I had a long career in business and the geospatial industry, so for me this was a good choice. But there could be better options if you are looking for a first job as a Python programmer.
What struck me in this course is that there is an easy and a hard way to do data science. Azure ML offers Machine Learning as a Service (MLaaS) and I see this becoming a self-service tool for business executives and professionals. Python programming on the other hand gives greater control, but it’s tedious and a lot harder. Let’s see who wins this battle, or will they continue to live happily ever after?
Demand for skilled data scientists continues to be sky-high, with IBM recently predicting that there will be a 28% increase in the number of employed data scientists in the next two years.
Businesses in all industries are beginning to capitalize on the vast increase in data and the new big data technologies becoming available for analyzing and gaining value from it.
This makes it a great prospect for anyone looking for a well-paid career in an exciting and cutting-edge field.
But it isn’t just those following a traditional academic path – such as by studying for one of the best US data science masters degree courses I covered in a recent article – who can benefit.
There are also a large number of free online courses and tutorials which a motivated individual could use as a springboard into a rewarding and lucrative career.
Who could benefit from a free online data science course?
Employers are waking up to the fact that employees with the ability to use data and analytics to solve business problems are increasingly valuable, whatever their background or position in an organization.
A lot of this is because of the proliferation of self-service infrastructure and tools designed to automate many of the technical but repetitive tasks involved with data cleaning, preparation and analytics. This means workers are increasingly able to carry out complex data-driven operations such as predictive modelling and automation without getting their hands dirty coding complex algorithms from scratch.
However, someone with an understanding of the principles will often be in a better position to use these tools productively than someone without! So, if you are looking to enhance your own CV with analytics skills you could do far worse than look at some of these courses. It’s worth noting however that while you can educate yourself with these courses without spending a penny, some of them charge for certification when you’ve finished.
Coursera – Data Science Specialization
Coursera provides one of the longest-established online data science educations, through Johns Hopkins University. It isn’t completely free – if you can afford it, you are expected to pay a course and certification fee – but this is waived for students who don’t have the financial resources available.
Comprised of 10 courses, the specialization covers statistical programming in R, cluster analysis, natural language processing and practical applications of machine learning. To complete the program, students create a data product which can be used to solve a real-world problem.
Coursera – Data-Driven Decision Making
Also from Coursera, this course is provided by PwC, so unsurprisingly it focuses more on business applications than theory. It covers the spectrum of tools and techniques which are being adopted by businesses today to tackle data challenges, and the different roles that data specialists can fill in modern organizations. Students are also tutored on selecting the best tools and frameworks for solving problems with data. The four-week course concludes with a task involving deploying a data solution in a simulated business environment.
edX – Data Science Essentials
This course is provided by Microsoft and forms part of their Professional Program Certificate in Data Science, although it can also be taken as a stand-alone course through edX. Students are expected to have an “introductory” knowledge of R or Python – the two most popular languages for data science programming at the moment. Subjects covered include probability and statistics, data exploration, visualization, and an introduction to machine learning, using the Microsoft Azure framework. Although all of the course material is free, students can pay ($90 in this case) for an official certificate on completion.
Udacity – Intro to Machine Learning
Machine learning is undoubtedly one of the hot topics in data science right now, and this course aims to give a full overview, from theory to practical application. As well as offering an introduction to selecting data sources and choosing which algorithms best fit a particular problem, the course also forms a part of Udacity’s paid-for “nanodegree” in data analysis.
IBM – Data Science Fundamentals
IBM provides a number of free online courses through its portal, formerly known as Big Data University and now rebranded as Cognitive Class. This program covers data science 101, methodology, hands-on applications, programming in R and open source tools. Collectively, the courses should take around 20 hours to complete, although those with prior experience of computer science will probably progress more quickly, whereas complete beginners may take a little bit longer.
California Institute of Technology – Learning from Data
This course focuses on machine learning and is delivered as a series of video lectures along with homework assignments and a final exam. As well as an overview of how computers “learn”, it goes into depth with the mathematics (students are expected to have a working knowledge of matrices and calculus, so this one isn’t for complete maths newbies).
Dataquest – Become a Data Scientist
Dataquest is an independent online training provider rather than being affiliated with a university like most of the others here. It offers free access to much of its course materials although you can also pay for premium services which include tutored projects. It offers three paths – data analyst, data scientist and data engineer, and with endorsements from Uber, Amazon and Spotify it looks like a good way to get a feel for whether or not you will enjoy studying data science, without spending money.
KDNuggets – Data Mining Course
KDNuggets is a well-known business and data science website and it has compiled its own free data mining syllabus. There are modules on machine learning, statistical concepts such as decision trees, regression, clustering and classification (see my data science glossary for an introduction to these terms) as well as an introduction to practical implementations of the technology.
The Open Source Data Science Masters
Rather than being offered by an organization or institution, this course is comprised of a collection of open-source materials and resources, available freely online. Subjects covered include natural language processing of the Twitter API using Python, Hadoop MapReduce, SQL and NoSQL databases and data visualization. It also includes a grounding in the algebra and statistics needed to understand the fundamentals of data science. Of course there is no certification, but the program can be completed at your own speed and works well as a gateway to the wealth of information on data science available online.