1. You’ll likely be using SQL a lot more than you’d expect
tl;dr: Like it or not, SQL will always be here to haunt you, so make sure you take the time to become proficient in it.
It’s a myth that SQL is a skill used only by data engineers and data analysts. If you work in any data-related role, whether it’s a data science role or not, you’ll be exposed to SQL. As a data scientist, you’ll need data to build machine learning models. This means you’ll either have to query it from existing databases or build pipelines if the data doesn’t exist yet. That’s why it’s so important to know SQL well: it’s what keeps your queries and pipelines robust and scalable.
2. Data in the real world is a lot messier than you’d imagine it to be
If you’ve ever worked with data on Kaggle, be warned: real-world data is nothing like it. On Kaggle, the data is typically clean, each table comes with a description, and column and feature names are easy to navigate. Unfortunately, this is rarely the case in the real world. Not only are you unlikely to have any of these luxuries, but you will probably face one or more of the following problems:
- Differently spelled data entries, e.g. United States, USA, US, United States of America
- Incomplete or missing data entries
- Inconsistent data, where the numbers don’t add up or the logic doesn’t hold
To manage your expectations, bear in mind that the majority of your time is going to be spent cleaning your data. It’s very unlikely that you’ll be able to jump straight into modelling.
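To make the list of problems above more concrete, here is a minimal sketch in plain Python (standard library only) of how you might detect all three. The alias table `COUNTRY_ALIASES`, the record fields (`country`, `items`, `total`) and the rule that line items must sum to the stated total are all hypothetical assumptions for illustration; on a real project you would more likely express these checks in SQL or pandas against your actual schema.

```python
# A minimal sketch of detecting messy real-world data.
# COUNTRY_ALIASES, the field names and the sample rows are hypothetical.

COUNTRY_ALIASES = {
    "united states": "United States",
    "usa": "United States",
    "us": "United States",
    "united states of america": "United States",
}


def normalise_country(raw):
    """Map differently spelled entries to one canonical name; None if missing."""
    if raw is None or not raw.strip():
        return None
    # Trim whitespace, lowercase and drop dots so "U.S." matches "us".
    key = raw.strip().lower().replace(".", "")
    return COUNTRY_ALIASES.get(key, raw.strip())


def clean(records):
    """Return (cleaned_rows, issues), where issues lists rows needing attention."""
    cleaned, issues = [], []
    for i, row in enumerate(records):
        country = normalise_country(row.get("country"))
        if country is None:
            issues.append((i, "missing country"))
        # Inconsistent data: the line items should add up to the stated total.
        if sum(row.get("items", [])) != row.get("total"):
            issues.append((i, "total does not tally"))
        cleaned.append({**row, "country": country})
    return cleaned, issues


rows = [
    {"country": "USA", "items": [10, 5], "total": 15},
    {"country": " United States of America ", "items": [3], "total": 4},
    {"country": "", "items": [], "total": 0},
]
cleaned, issues = clean(rows)
print([r["country"] for r in cleaned])  # ['United States', 'United States', None]
print(issues)  # [(1, 'total does not tally'), (2, 'missing country')]
```

Note that the sketch only flags problems rather than silently fixing them: deciding whether a missing country should be imputed, dropped or chased up with the data owner is usually a business question. If your totals are floating-point amounts, compare with a tolerance (or use `decimal.Decimal`) instead of `!=`.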
3. A vague term like “data science” equates to vague responsibilities
The more you read about data science, the more you’ll realise how broad it is. In fact, it’s so broad that there are many different types of data science jobs out there: data scientist, data analyst, decision scientist, research scientist, applied scientist, data engineer, data specialist, and the list goes on. Additionally, because it’s a multi-disciplinary field, the term “data science” covers such a wide variety of skills that it’s unlikely you’ll be able to master all of them. So keep an open mind and try not to fixate on the glamorous parts of data science. For example, if you find yourself querying tables or working on data architecture instead of building machine learning models, don’t be discouraged. Any data-related skill is valuable and will most likely come in handy in the future!
4. Communication skills are essential
Working in a data science-related role doesn’t mean you simply build models all day long. Rather, you’ll be required to collaborate and communicate with cross-functional stakeholders. Even if you’re a team of one, you’ll have to communicate with leadership about the work you’re doing and its tangible business impact. You’ll also likely have to work with other teams and business analysts to build domain knowledge. So yes, communication skills are instrumental to a successful data science career!