How to start a data science business?
Never start an ambitious project alone. It is important to know how customers are obtained, how ruthless marketing is done, and how you attract experts to hire. You need someone to manage these issues while you continue working on your core idea, whether that is building your data processing framework, your product, or whatever else.
You need to decide on a business model, such as:
1. Product that Uses Data
A social network that recommends new pubs in the city, an Uber for bicycle riders, a dating service for geeks, or something equally crazy. Here data is not the visible part but the hidden core of your product. Your target customers will be ordinary people on the street.
2. Product that Crunches Data
If you are developing ML or NLP tools, or writing and improving algorithms, your target customers will be those who need to mine massive amounts of data. Think of it as a B2B product.
3. Service in Analytics, Prediction and Visualization
Rather than building a generic tool, you’ll work on the specific requirements of your clients. If you are good at networking, marketing and finding new people, and decently good at the technical work, this can be a good model to start with. I’ve been engaged in small freelance projects within this model. Later you can slowly form a team and bring in people with deeper expertise in Visualization or Optimization. This is a B2B service.
Once you’ve decided on a model and are clear about it, move on to build a team, create a framework and write a business plan.
You need the following skills on your team to start out in data science:
(1) Communication, Contacts and Networks.
This is going to be your sales, marketing or business development guy. He can be a college fresher who attends too many hackathons for the sake of free pizza and t-shirts. He is smart and has some great contacts – he personally knows people at the top level of your prospective clients. Or it might be a non-tech arts student who’s been active on social media platforms. She’ll get you in touch with those who need your service.
☐ Translates tech-speak to normal English
☐ Sees the larger picture and knows what will be happening 5 years ahead
☐ Preferably, knows how to advertise or sell. Or write business plans.
(a) Theoretical and State of the Art expertise
This can be a Masters or PhD graduate who’s done some cool work in Machine Learning, Artificial Intelligence or Statistics. These people know which new ideas and concepts are coming up, and which of them will be widely used five years down the line. The field is growing rapidly and you have to keep up with the times.
☐ Good theoretical knowledge of statistics, pattern recognition and
☐ Ability to quickly grasp the core concepts in research papers and other scientific literature
☐ Ability to translate mathematical concepts into algorithms or programs
(b) Development Ninja
The core operations of any company that works with computers include the ability to develop and deploy a complete application.
☐ Full stack development capability for your planned architecture
☐ Databases, NoSQL or whatever.
☐ DevOps. Maybe you’ll need a dedicated person later.
☐ Big Data tools. There are several options here. The Hadoop + Mahout stack is good. You can also work with Spark + MLlib.
(c) Domain Specific Knowledge
If you’re working on a product of a certain kind, you need a person with a good amount of expertise and experience with the kind of data you expect. It can be Natural Language Processing if you’re working on textual data from the web or social media. It can be Computer Vision if you’ll be working with images. Or Bioinformatics, if you’re going after such clients. Or simply marketing or sales knowledge if you want to improve marketing processes using data.
☐ Domain Expertise (define it your own way)
☐ Figuring out the best and most interesting solution to the problem
☐ Thinking out of the box. In the past 2 years, there has been a sudden rise in new companies as well as new problems that need to be solved.
☐ Presentation skills. How to present results to the users? Visualization, graphics, or just a cool app which shows the analysis results in an intuitive manner.
You definitely need a good team. If you are great with tech, you need someone to manage the business aspects. You’ll also need more tech guys with you. However, 2-3 people is the best number. Define the roles and responsibilities from the start. Do not define titles – there’s no use in calling yourself a CEO when you don’t even have a product to offer.
Then work on building a framework. You should have ready-to-use internal tools for common tasks. Train the tools yourself, starting with data from the domains you’re interested in. Build complete pipelines for NLP work: preprocessing, feature generation, vectorization. Make sure every component is a building block that can easily be replaced with a newer version or a different block. If it’s a product, start worrying about the UI before it’s too late.
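To make the building-block idea concrete, here is a minimal sketch in plain Python of an NLP pipeline with the three stages mentioned above. Every function name in it (preprocess, generate_features, build_vocabulary, vectorize, run_pipeline) is made up for illustration, and the bag-of-words counting is just one simple assumption about what "vectorization" could mean. A real framework would likely sit on an established library instead.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def preprocess(text: str) -> str:
    """Lowercase the text and replace non-alphanumeric characters with spaces."""
    return "".join(ch if ch.isalnum() else " " for ch in text.lower())


def generate_features(text: str) -> List[str]:
    """Split into word tokens and add adjacent-word bigrams."""
    tokens = text.split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def build_vocabulary(docs: Sequence[List[str]]) -> List[str]:
    """Sorted list of every distinct feature seen in the corpus."""
    return sorted({feature for doc in docs for feature in doc})


def vectorize(features: List[str], vocabulary: List[str]) -> List[int]:
    """Count vector whose positions follow the vocabulary order."""
    counts = Counter(features)
    return [counts[term] for term in vocabulary]


def run_pipeline(
    texts: Sequence[str],
    preprocess_fn: Callable[[str], str] = preprocess,
    feature_fn: Callable[[str], List[str]] = generate_features,
) -> Tuple[List[str], List[List[int]]]:
    """Run preprocessing, feature generation and vectorization in order.

    Each stage is passed in as a plain function, so any block can be
    replaced without touching the others.
    """
    docs = [feature_fn(preprocess_fn(text)) for text in texts]
    vocabulary = build_vocabulary(docs)
    return vocabulary, [vectorize(doc, vocabulary) for doc in docs]


if __name__ == "__main__":
    texts = ["Good pub!", "good, good beer"]
    vocabulary, vectors = run_pipeline(texts)
    print(vocabulary)
    print(vectors)
    # Swap the feature block for a unigram-only version.
    print(run_pipeline(texts, feature_fn=str.split))
```

Because the stages only agree on their inputs and outputs (text in, feature list out, count vector out), replacing the bigram feature generator with `str.split`, or later with something smarter, does not require touching the rest of the pipeline.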
Do not be overly eager to market before you have a demo to showcase. Get a website, online pages, start writing white papers, find contacts among top positions of your target companies, attend tech-cons and buy a stall, find more contacts.
Lastly, be slow and steady. If you are an undergrad, you still have a lot of time on your hands. Find someone with a good amount of industry experience who can mentor you in both business and technical aspects. There are many things we don’t see as important in academia: you may use word embeddings everywhere and get five research papers out of your work, while many big companies still use rule-based systems that produce a result in 200 milliseconds for something your approach takes five seconds to process.
Slow, and yes, steady.