Hi all, we get a lot of calls and questions from talented people wondering how they can become data scientists. After talking for a while, we usually arrive at the question “What is data science really about?”, which is a good question, and the most important one if you ask me.
Well, when asked what data science is, I find myself giving a somewhat different answer each time, depending mostly on the project and research I am currently working on. So imagine the variance when talking to other practitioners. When interviewing for data scientist positions, I often meet people who are very talented and skilled, but who nevertheless lack the requirements for the position. Having said that, and in the quest for a clearer answer, I will try to gather information from a number of practitioners I highly respect and answer the age-old question: how do I become a data scientist, and what is it all about?!
All the information regarding first and second degrees (bachelor’s and master’s) is written according to my experience in the Israeli educational system.
We will mention a lot of topics here. If you are interested in learning any of them, you are in luck: we are working on posts for the near future that explain where to start learning these subjects and disciplines.
What is data science
And how is it related to machine learning and AI, if at all?
All three fields are somewhat related, and separating them fully is artificial. Generally speaking, though, data science is the science (or art) of using data to extract meaningful knowledge, information, insights, or anything else you might want from it. For instance, when calling in a data scientist, you might ask them to take the data from your stock market bot and use it to help you decide when to buy a stock. They may use any kind of tool to try to answer this question, including, but not limited to, statistical models, AI (rule-based or machine-learning-based), and manual analysis of the information.
The obstacle here is that people often miss the fact that data science is all about data, and that it is a science, meaning you have to deal with a lot of data and build a scientific process. If you feel that you don’t like either of those things, you might want to reconsider…
AI is the field of making the computer seem (or be, depending on your personal belief) smarter, so that it can support or make decisions. For instance: using graph search algorithms to predict how long it will take to drive from A to B, using rule-based AI to build a simple chatbot, or using machine learning to create a very complex decision-support engine.
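To make the "drive from A to B" example concrete, here is a minimal sketch of one classic graph search technique, Dijkstra's shortest-path algorithm. The road map, the place names, and the travel times are all made up for illustration; a real navigation system would use far richer data and more elaborate methods.

```python
import heapq


def shortest_drive_time(roads, start, goal):
    """Return the minimum travel time from start to goal, or None if unreachable.

    roads maps each place to a dict of {neighbor: minutes}.
    This is Dijkstra's algorithm with a priority queue.
    """
    best = {start: 0}        # best known travel time to each place
    queue = [(0, start)]     # (minutes so far, place), smallest first
    while queue:
        minutes, place = heapq.heappop(queue)
        if place == goal:
            return minutes
        if minutes > best.get(place, float("inf")):
            continue         # stale queue entry, a faster route was already found
        for neighbor, cost in roads.get(place, {}).items():
            candidate = minutes + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


# Hypothetical road map: travel times in minutes between junctions.
ROADS = {
    "A": {"C": 10, "D": 25},
    "C": {"D": 10, "B": 30},
    "D": {"B": 12},
    "B": {},
}

print(shortest_drive_time(ROADS, "A", "B"))  # 32, via A -> C -> D -> B
```

Note that nothing here is learned from data: the algorithm follows fixed rules over a known graph, which is why it counts as AI without being machine learning.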
Machine learning is the tool mainly used to process data that is too big (or too complex) for manual processing, and to extract information and decisions from it. Examples include analyzing the stock market in real time; large-scale image recognition, which automatically analyzes images that humans could analyze themselves, only in far greater numbers and more efficiently; and high-performance image recognition, which analyzes images a human can’t.
They are all integrated: data science is the research process, AI is the main tool, and machine learning is the main methodology for building an AI engine today. You can be a pure machine learning or AI researcher, but that path is more academic; chances are you will find yourself doing data science and using AI and machine learning as tools.
What do you need to know in order to practice data science
The field is so big that you just can’t learn everything you need ahead of time. You will have to select a large set of tools and methodologies to be acquainted with, and a smaller set of things to learn in depth. Then, when practicing in real life, you will need to add many more tools and much more knowledge to your arsenal, according to the domain you find yourself in.
Knowledge and tools to start from are:
Calculus, discrete math, and linear algebra. You definitely don’t have to be a math genius, but some basic knowledge is good; if you want to do more research, it’s a must.
Programming is the main tool. A few years ago you could get by without programming skills, but today it’s not a good idea to try that.
Some graph search and optimization techniques require a bit of theoretical understanding.
You must have a basic knowledge of computer science algorithms; a few courses from MIT OpenCourseWare should be enough.
Machine learning, both classic and deep learning.
NLP and image processing with deep learning are very important, as is familiarity with active learning and reinforcement learning.
Data science is the science of data: data is our main resource, and our task is to extract insights from it. Having said that, we put a lot of time into manipulating the data into a form the machine can handle. That involves a lot of technical terms, such as vectorization, normalization, tokenization, and feature engineering. We will explain these in more depth in a post in the near future. Generally speaking, this is one of the lengthiest tasks you will tackle when working as a data scientist, and it also requires domain expertise.
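As a small taste of what those terms mean in practice, here is a minimal sketch that tokenizes two short texts, vectorizes them as word counts (a bag of words), and normalizes the resulting vectors to unit length. The sample sentences and function names are invented for illustration, and real projects usually rely on a library rather than hand-written code like this. Feature engineering is much broader than this and is not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Tokenization: split raw text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Collect every distinct token, in a fixed (sorted) order."""
    return sorted({token for doc in documents for token in tokenize(doc)})


def vectorize(text, vocabulary):
    """Vectorization: turn a text into a list of word counts, one per vocabulary word."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def normalize(vector):
    """Normalization: scale a vector to unit (L2) length; all-zero vectors stay as they are."""
    length = math.sqrt(sum(value * value for value in vector))
    return [value / length for value in vector] if length else list(vector)


# Hypothetical sample texts.
DOCS = ["Buy the stock now", "Sell the stock, sell now!"]

vocabulary = build_vocabulary(DOCS)   # ['buy', 'now', 'sell', 'stock', 'the']
for doc in DOCS:
    counts = vectorize(doc, vocabulary)
    print(counts, normalize(counts))
```

After these steps every text is a list of numbers of the same length, which is the kind of input most machine learning models expect.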
What kind of a job will I get as a data scientist
Well, you can have a very large variety of jobs, ranging from jobs that are very much oriented towards programming and development to pure research positions. Don’t worry: when you like something, you will find yourself putting time into it, becoming better at it, and finding jobs that suit your preferences more. The only string attached is that you have to be honest with yourself and put in the time to get better at what you like, even if it’s not a part of your first job.
Where do I learn it
You have to learn a lot of things, and I will try to write a more in-depth blog post for each field later on. Generally speaking, the internet is the main source of knowledge for everyone today, but try to have someone to consult with. Having the right person to ask a quick question in a time of need is a very good idea, and it helped me a lot.
Also subscribe to Stack Overflow, Quora and Coursera.
Thenewboston is great for basic programming.
MIT and the Israeli Technion offer great math courses online.
There are plenty of resources in our field, and we will go over them in depth in another post. Coursera, Stanford University online, Bar Ilan University online, community courses, and bootcamps are all good options; each has its pros and cons.
Is school important?
I found my master’s degree a great place to get exposed to a lot of fields and tools. Generally speaking, if I wanted to know something in depth, I had to put the time into it myself, but I learned many things there that proved useful later, such as math, general AI, and theoretical ML and DL, and I also met a lot of people to consult with in a time of need. Your fellow students will soon be your colleagues, and having good connections is very useful. The professors are also usually very strong researchers in multiple areas, and being able to consult with them is very helpful.
Do you have to have a Master’s degree
I found that my second degree was very useful both for knowledge and for job interviews.
I don’t have a Master’s degree in data science. However, I did attend master’s-level courses in the field, and these broadened my knowledge and gave me the opportunity to consult with great professors.
Is the thesis important
If you want to be a researcher, then yes. You will learn how to conduct well-formed research: how to ask a question and provide an answer. Of course, academic and industrial research are very different, but having this experience is very good for you.
Do you have to have a relevant bachelor’s degree
No, but it helps. I found myself learning on my own a lot of things my fellow students already knew, and it was sometimes hard (yet possible).
University vs college
I have no solid opinion here. If you want to do research, then university is better for you (to the best of my knowledge, it is the only place to write a thesis), and if you want industrial experience, then college will leave you more time for it.
What programming languages are important
Python is an absolute must; R is nice, and Java is nice too. Golang is starting to look like a rising star.
How did I get here?
I did not come in through the front door. I first got my BA in accounting and economics, then was a CPA intern, did some other accounting-related things, and decided to go back to school and learn AI. I took many of the BSc courses and started my MSc while working as an intern at an infrastructure company that needed some data science, learning it all on the fly. I then moved to an algorithm-centered company as a researcher, and afterwards to my current position as a machine learning tech lead at a big consulting company. I think the MSc definitely helped me a lot, mostly thanks to the thesis, though I can’t really say that I did the same things in my job as I did in my research. I did learn a lot and expanded my horizons, and I encountered many tools that I later had to learn in depth in order to use. Most of all, I got my head wrapped around the depths of machine learning.
I graduated with an MEng degree in Biomedical Engineering with a specialization in Electronics, before there was a Data Science track. During my degree I learned the required calculus and, most importantly, gained substantial programming experience (C++ and MATLAB), which was enough for me to be able to learn new programming languages very quickly. In my first roles I worked as an engineer in the medical field, performing signal processing and data analysis. These roles led me to learn Python (Pandas) and R within a few days, on the job. Later, I started studying data science by taking a few online courses and trying out a number of projects, and then obtained my position as a data scientist while attending advanced classes at Bar Ilan University.
We are looking for an additional study buddy with a significant background in data science and machine learning, as well as time to put into research. Please contact us if you are interested in putting a few hours a week into reviewing new papers, with the goal of learning how to do things that few people in our field are able to do.