Learn Big Data For Free
Hadoop is an open-source framework that lets you store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
Big data is a growing field, and there is a lot to learn if you want to get into it. I will try to describe the path I took, starting with the first step: learn a programming language. If you want to tackle big data, you should know Python or Java.
This brief tutorial provides a quick introduction to big data, the MapReduce algorithm, and the Hadoop Distributed File System (HDFS).
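To give a feel for what the MapReduce idea looks like, here is a minimal sketch of a word count in plain Python. It is not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and everything runs in a single process, whereas Hadoop would spread the map and reduce work across the machines of a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    # Assumption: on a real cluster each document (or block of a file)
    # would be mapped on a different machine; here we simply loop.
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
```

The point of the split is that the map step looks at each document in isolation and the reduce step looks at each word in isolation, which is what makes it possible to run both in parallel on many machines.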
This tutorial has been prepared for professionals aspiring to learn the basics of Big Data Analytics using Hadoop Framework and become a Hadoop Developer. Software Professionals, Analytics Professionals, and ETL developers are the key beneficiaries of this course.
Before you proceed with this tutorial, we assume that you have prior exposure to Core Java, database concepts, and at least one flavor of the Linux operating system.
There are expected to be 4.4 million big data jobs by 2015, in governments and every sector of industry. Combine this with a shortage of people trained to carry out the analysis needed (predicted to be nearly 200,000 by 2018) and, depending on your point of view, you have either a lot of unfilled vacancies or a lucrative career ahead of you.
But won’t you need a degree and relevant experience? Well, possibly. Not everyone can afford to spend years going back to college and retraining, but there are alternatives.
Increasingly, colleges and universities are putting courses online where they can be studied for free. You may not get a degree at the end, but that might not be important. As IBM big data evangelist James Kobielus said in 2013, “academic credentials are important but not necessary for high-quality data science. The core aptitudes – curiosity, intellectual agility, statistical fluency, research stamina, scientific rigor, skeptical nature – that distinguish the best data scientists are widely distributed throughout the population.”
Some of the courses do offer certificates of completion or other forms of accreditation, which can certainly go on your CV to impress potential employers.
Of course, if you’re not in the employment market (say you run your own business), then these courses are valuable purely for the knowledge they can give you. There’s no reason that a reasonably competent person couldn’t use that knowledge to launch their own data strategy and reap insights, whatever their business. I would love to hear in the comments section if anyone has done this.
Here’s an overview of what’s available online from various schools, colleges and universities:
The University of Washington’s Introduction to Data Science is available online at Coursera – a huge repository of online learning.
The course can be completed in 8 weeks if you put in 10 to 12 hours’ study per week, and covers the history of data science, key techniques and technologies such as MapReduce and Hadoop as well as traditional relational databases, designing experiments using statistical modeling, and visualizing results.
Some basic programming knowledge is needed, but don’t worry: there are plenty of free courses where you can pick that up if you don’t already have it (see below).
Coursera’s courses usually run between set dates. If you want accreditation or certificates, you have to register before a set date and complete the course before a final deadline. However, if you are just interested in the knowledge, you can download all the course materials, which come as videos and reading material, to browse at your leisure.
Harvard also makes its Data Science course available for free online. All lectures are uploaded as videos shortly after they take place, and materials and homework assignments are uploaded to the code hosting site GitHub.
This course covers what it calls the key facets of a big data investigation: data wrangling, management, exploratory analysis, prediction and communication of results. Some basic Python knowledge is required.
A familiarity with the basic concepts of statistics is fundamental to big data analysis. You can learn them from Princeton’s course Statistics One, also on Coursera.
The course assumes very little background knowledge and describes itself as “a comprehensive yet friendly” introduction to the subject. It is also designed to work as a refresher for anyone who may have studied it at school or college in the past but let themselves get a bit rusty on the fundamentals!
Those looking for slightly more in-depth or specialist knowledge may be interested in Stanford’s Algorithms: Design and Analysis course. Programming knowledge is essential: you will be expected to know at least one language, e.g. C, Java or Python.
The course covers the fundamental principles behind algorithmic design – design paradigms, randomized algorithms and probability, graph algorithms and data structures.
Speaking of programming, a basic level of familiarity with at least one language is recommended for anyone interested in data. Python is a good choice, as it is easy to pick up, has a large ecosystem of data libraries, and is widely used in big data work in the enterprise. Codecademy.com, Coursera.org and MIT all offer free courses in Python designed for absolute beginners with no programming experience.
If you’re interested in machine learning, the fast-growing field of creating self-learning algorithms that can adapt themselves based on data with no human input, there are courses for that too.
The California Institute of Technology’s Learning from Data course makes all of its lectures available on YouTube and iTunes for convenience. It’s one for those who already have some academic background in computer science and are looking to move into a field where a lot of exciting breakthroughs are being made.
Visualization is key to gaining insights from data. Graphs, charts and other far more creative techniques are employed to help us spot patterns hidden in mountains of numbers or unstructured data. UC Berkeley makes its Visualization course available for free online, which can teach you techniques and algorithms used to create effective and well-designed graphical representations of data. You will need some familiarity with one popular graphics API (such as OpenGL or GDI+) as well as one data application (Excel will do). Whichever you choose is up to you as the assignments can be submitted in any format.
There you have it: you can learn all about big data for free, so no more excuses! I hope this post was useful. As always, if you know of any other free big data learning resources, please share them in the comments below.
About: Bernard Marr is a globally recognized expert in business data. His new book is 'Big Data: Using Smart Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance'.