5 Data Science Myths Busted

Mavis Loh

Transitioning into data science is tough, even scary! And it is not only because you need to learn maths, statistics, and programming. You do need to do that, but you also need to battle the myths you hear from people around you and find your own path through them. These myths often make you feel like only geniuses can work in data science, which is just not true. Whether you’re a recent graduate, an experienced professional, or a leader, it’s important to understand how data science really works so that you can find your place in the industry. Here are 5 data science myths, busted!

1. A Ph.D. is MANDATORY if you want to become a data scientist

Don’t get us wrong, earning a Ph.D. is an amazing achievement that you should be proud of! It requires years of hard work and dedication. However, is it really compulsory to have a Ph.D. in order to become a data scientist? That depends on which data science role you see yourself in: an applied data science role, or a research role. It’s important to understand the distinction between the two. The former is primarily about working with existing algorithms and understanding how they work. In simpler terms, it’s about applying these techniques in your project, and you DO NOT need a Ph.D. for this role. This category covers most of the job openings you see on LinkedIn or Glassdoor.

But if you’re interested in the latter, then you might well need a Ph.D., as this role calls for a Ph.D. candidate’s mindset: researching and creating new algorithms from scratch and writing scientific papers. It also helps if the Ph.D. is relevant to the domain you want to work in.

2. You need a full-time Data Science degree to make the transition

Much like the Ph.D. dilemma, this is another myth that many aspiring data scientists fall for. With the rising interest in data science, you may wonder how you can stand out from your peers. Splashing out money on a degree seems like a good starting point, and it is an understandable reaction. Fortunately, this is a myth perpetuated predominantly by institutions.

In a vast and complex field like data science, practical experience is king. There are numerous projects you can pick up and work on right now. Or find a problem you are passionate about solving and see if data science techniques can be applied to it. Furthermore, there are plenty of resources available online for learning. Alternatively, you could get yourself certified with Hackwagon’s Data Science 101. Not only is the certification recognised by the industry, it also takes just 7 lessons of 3 hours each! Because formal education in this field is still scarce, transitioning into data science boils down to sheer hard work, discipline and practical experience. Those are the differentiating factors your next recruiting manager will look at.

3. Universe of AI Jobs

An artificial intelligence project isn’t limited to the role of a data scientist. Instead, it has a universe of jobs attached to it, and it requires working with different disciplines across the length and breadth of the project. If you aren’t aware by now, there is a plethora of interdisciplinary roles, ranging from data engineer, data analyst, statistician and IoT specialist to software engineer, UX designer and more.

What we’re trying to get at is that there’s no fixed format or template for how AI works. It is especially important for leaders to understand each of these roles in order to create a successful project.

4. Data Science is only about building predictive models

Being able to predict an event is a powerful thing, and that’s what stands out to newcomers in data science. Building models that can predict what a customer will buy next sounds like a must-have skill, right? Wrong. In fact, there are multiple layers in a data science project. The model building part is just a speck in the overall data science lifecycle. To give you a general idea, the steps involved in a typical data science lifecycle include:

  • Understanding the problem statement
  • Hypothesis building
  • Data collection
  • Verifying the data
  • Data cleaning
  • Exploratory analysis
  • Designing the model
  • Testing/Verifying the model
    • If an error is found, head back to the verification or cleaning stage
  • Putting it into production (deploying the model)
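
To make the lifecycle a little more concrete, here is a minimal sketch of a few of these steps (cleaning, designing the model, testing it) in plain Python. Everything in it is an assumption made for illustration: the tiny dataset of advertising spend versus units sold is made up, and a straight-line fit stands in for whatever model a real project would use.

```python
from statistics import mean

# Hypothetical raw records: (advertising spend, units sold).
# None marks a missing value, as often happens in collected data.
raw = [
    (1.0, 3.1), (2.0, 4.9), (None, 7.0), (3.0, 7.2), (4.0, 8.8),
    (5.0, 11.1), (6.0, None), (7.0, 15.0), (8.0, 17.1),
]

def clean(records):
    """Data cleaning: drop rows that have a missing value."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def fit_line(points):
    """Designing the model: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in points)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

def mean_abs_error(model, points):
    """Testing the model: average absolute prediction error on unseen rows."""
    slope, intercept = model
    return mean(abs(y - (slope * x + intercept)) for x, y in points)

cleaned = clean(raw)
train, test = cleaned[:5], cleaned[5:]   # hold back the last rows for testing
model = fit_line(train)
error = mean_abs_error(model, test)
print(f"kept {len(cleaned)} of {len(raw)} rows, test error {error:.3f}")
```

Even in this toy version, only one of the three functions is the model itself. The rest is preparation and checking, which is the point of the list above.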

Nothing is as straightforward as they teach you in a classroom or a course. Experience is the best way to learn how a project works. Try talking to someone who has seen the end-to-end process. Even better, get an internship and get a first-hand account of what makes a data science project tick.

Additionally, data science isn’t limited to simply making predictions. I’m sure you’ve come across the concept of market-basket analysis. It is built on association rules, sometimes combined with clustering techniques. Or how about anomaly detection, the ability to pick out outliers in the data? There’s a never-ending list of things to learn!
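
As a small illustration of the association rules behind market-basket analysis, the sketch below computes the three standard rule metrics (support, confidence and lift) by hand. The baskets are hypothetical, and a real project would more likely use a dedicated library than these helper functions.

```python
# Hypothetical shopping baskets; each set is one customer transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated chance of seeing consequent given antecedent is in the basket.

    Assumes the antecedent appears in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """How much more often the items co-occur than if they were independent."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )

rule = ({"bread"}, {"butter"})
print("support:   ", round(support(rule[0] | rule[1], baskets), 2))
print("confidence:", round(confidence(*rule, baskets), 2))
print("lift:      ", round(lift(*rule, baskets), 2))
```

A lift above 1 suggests that customers who buy bread pick up butter more often than chance alone would explain, which is the kind of pattern market-basket analysis looks for.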

5. Data collection is a breeze, the focus should be on building models

Brace yourself before I burst that bubble of yours. Unfortunately, the data you get in real life is far less perfect than you’d expect. Most experienced data science professionals are well aware of this, so expect to be tested thoroughly on the subject in an interview.

Data is being generated at an unprecedented pace, but collecting and cleaning it isn’t getting any easier. Without a pipeline to collect the data, your data science project is going nowhere. Building that pipeline typically falls within the job scope of a data engineer, but as a data scientist you should be aware of how it works as well. We cannot emphasise enough the importance of the data collection step. Collecting honest and accurate data is imperative to your final model working well.

There are countless sources of data available. How do you connect to each? What data format do you receive from each? What’s the cost of data collection from each of these sources? This is a microcosm of the kind of questions you’ll need to ask in a real-world setting. Roles like database manager, database architect and data engineer have taken on a new level of importance. Maintaining the integrity of the data and the aforementioned pipeline is as important as any task that follows it.
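
To give a feel for the "what format do you receive from each?" question, here is a minimal sketch that pulls records from two differently shaped sources into one common schema. The two payloads, their field names and the target schema are all hypothetical stand-ins; a real pipeline would read from files, databases or APIs and would need far more validation.

```python
import csv
import io
import json

# Hypothetical payloads standing in for two real sources:
# a CSV export and the JSON response of an API.
CSV_EXPORT = "customer_id,amount\n101,19.90\n102,5.00\n"
JSON_API = '[{"customerId": 103, "total": "42.50"}, {"customerId": 101, "total": "7.25"}]'

def from_csv(text):
    """Map CSV rows onto the common schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer_id": int(row["customer_id"]),
            "amount": float(row["amount"]),
            "source": "csv",
        }

def from_json(text):
    """Map JSON items, which use different field names, onto the same schema."""
    for item in json.loads(text):
        yield {
            "customer_id": int(item["customerId"]),
            "amount": float(item["total"]),
            "source": "json",
        }

# One list, one schema, regardless of where each record came from.
records = list(from_csv(CSV_EXPORT)) + list(from_json(JSON_API))
for record in records:
    print(record)
```

Each new source needs its own small adapter like these, which is one reason the cost of data collection grows with the number of sources you connect to.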