What is a recovering data scientist?

“What is a recovering data scientist?” This question arrives in my LinkedIn messages at least once a week. It’s easy to see why, since my LinkedIn title says “recovering data scientist”. While “recovering data scientist” is admittedly a bit clickbaity, there’s also considerable truth to the moniker.

The “recovering data scientist” schtick started several years ago as an inside joke with some data scientist friends. Working as data scientists, we’d grown tired of seeing data science projects (mostly machine learning related) fail over and over again. At the time, data science was touted as “The Sexiest Job of the 21st Century”. It seemed like every company wanted to “do data science”, and people were jumping headfirst into the field. Today, the field of data science has matured somewhat, but it’s much the same – lots of interest, and a white-hot euphoria about how data’s going to change the world.

Why don’t I feel the same level of euphoria about data science right now? Simply put, reality and expectations don’t often align. There’s a belief that data science will magically, instantly, and painlessly transform a business. While there are some success stories, most data science projects fail. Failure is good if you can learn from your mistakes. My problem is that several years later, the same mistakes still happen over and over again.

What’s going on here? There’s a lot of cargo cult data science. Simpler (and proven) approaches are often ignored because they’re “not machine learning”. Many data scientists I’ve seen get the order of operations wrong. All too often, a data science project starts without an understanding of the data or the domain, and without the proper infrastructure to support production machine learning. Instead, data scientists simply leap headfirst into machine learning. The result is usually a giant graveyard of data science projects stuck on someone’s laptop, never seeing the light of day in the broader business.

What’s the solution? In short – get back to basics with data. Be realistic. Set proper expectations. Develop a plan for data science, and take the time to build the right foundation for success. Good things take time. I firmly believe that data science can yield amazing benefits for companies when it’s executed correctly. And personally, I’d love to stop being a “recovering data scientist”.
