Words: Ali Kokaz
November 28th 2022 / 8 min read
In the past decade, data has become impossible to ignore. Buzzwords like ‘big data’, ‘AI’, and ‘machine learning’ are plastered across pitch decks and company strategy documents.
Search data science online, and you’ll find an endless trove of tutorials and articles. But making data science work at your company takes much more than building a fantastic, complex algorithm, and material on that is few and far between.
Across a decade as a data scientist at investment banks and early-stage startups, I’ve gathered a good idea of what good data science looks like, and how, used appropriately, it can help any company regardless of size. In this guide, I’ll share my three first steps to building a data-driven startup environment:
Applying appropriate methods
Evaluating what level you’re operating at
Understanding how to take steps up the ‘data pyramid’
AI or machine learning is often used as a catch-all term to describe a range of data science capabilities. While it would be tedious to create a definitive list of data science use cases, here are eight of the main areas in data science that have a solid track record of creating impact across a business:
Predictive Analytics: using pattern-learning algorithms to predict future outcomes (usually what people mean when they say ‘Machine Learning’)
In property, regression algorithms are used to drive pricing strategies
Adaptive machine learning fraud management systems can reduce ‘genuine transactions declined’ by 70%, and undetected real fraud by 25%: a saving of roughly $12 billion annually
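The pricing example above can be sketched with ordinary least squares, the simplest regression approach. The floor-area and price figures below are entirely hypothetical, chosen only to illustrate the mechanics.

```python
# Minimal sketch: least-squares regression for property pricing.
# Data is hypothetical, for illustration only.

def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical training data: floor area (m^2) vs sale price (£k)
areas = [50, 70, 90, 110, 130]
prices = [150, 190, 230, 270, 310]

slope, intercept = fit_line(areas, prices)
predicted = slope * 100 + intercept  # predicted price for a 100 m^2 property
```

In practice you’d use a library such as scikit-learn and many more features than floor area, but the core idea of learning a mapping from historic examples is the same.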
Recommender Systems: recommending relevant products by understanding product characteristics or previous user patterns. Popular in any type of marketplace where you want to optimise choice for users
Around 35% of what customers spend on Amazon is due to their recommendation engine
Netflix’s recommendation engine is estimated to drive around 80% of what viewers choose, saving the company over $1 billion a year
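At its simplest, a recommender of this kind can be built from co-occurrence: “users who bought X also bought Y”. A minimal sketch, with invented purchase baskets:

```python
# Co-occurrence recommender sketch: rank items that appear most often
# alongside a given item. Baskets are hypothetical.
from collections import Counter

baskets = [
    {"book", "lamp"},
    {"book", "lamp", "desk"},
    {"book", "desk"},
    {"lamp", "mug"},
]

def recommend(item, baskets, k=2):
    co = Counter()
    for basket in baskets:
        if item in basket:
            co.update(basket - {item})  # count items bought together
    return [other for other, _ in co.most_common(k)]

recommend("book", baskets)
```

Production systems at Amazon or Netflix are vastly more sophisticated (collaborative filtering, deep learning), but co-occurrence captures the intuition behind them.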
Computer Vision: algorithms/methods that can understand, process, or generate visual content. This can include object detection, counting, classification, recognition, image/video processing, tuning and creation.
In commerce, Amazon Fresh stores use computer vision to remove the need for checkouts
Tesla has chosen to adopt computer vision for its autonomous vehicles, instead of the lidar that most driverless car manufacturers use
Natural Language Processing (NLP): understanding meaning and patterns in free-form text from users or documents. Used to process data from text, as well as to generate output in different mediums (speech or text).
Search engines like Google Search, where NLP is used to decipher meaning and better understand queries in ways that humans would understand them. Similarly, Alexa or Siri (or any of their counterparts) use a form of NLP
Many hedge funds use ‘sentiment analysis’ (a type of NLP that deciphers opinion) to make better predictions about financial markets
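The simplest form of sentiment analysis is lexicon-based: count positive and negative words. A toy sketch, with word lists that are illustrative only (real systems use learned models, not hand-picked vocabularies):

```python
# Toy lexicon-based sentiment scorer. Word lists are illustrative only;
# production systems learn sentiment from labelled data.
POSITIVE = {"beat", "strong", "growth", "upgrade"}
NEGATIVE = {"miss", "weak", "decline", "downgrade"}

def sentiment(text):
    """Positive score means positive tone; negative means negative tone."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

sentiment("Strong growth after earnings beat")   # positive headline
sentiment("downgrade after weak quarter")        # negative headline
```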
Clustering & Segmentation: discovering natural patterns and groupings in data, such as groups of customers based on their behaviour or demographics. At the heart of any market research or customer understanding: finding broader patterns across large groups of people to inform your decisions.
‘Moneyball’: using data to drive recruitment decisions in sport. Liverpool FC used clustering as an integral part of building their squad that went on to win their first ever Premier League title in 2020.
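K-means is the workhorse clustering algorithm. A minimal one-dimensional sketch, segmenting hypothetical customer spend into low and high spenders:

```python
# Minimal 1-D k-means sketch. Customer spend figures are hypothetical.

def kmeans_1d(points, centroids, iters=10):
    """Iteratively assign points to the nearest centroid, then recompute."""
    for _ in range(iters):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

monthly_spend = [10, 12, 11, 95, 100, 102]
kmeans_1d(monthly_spend, centroids=[0, 50])
```

Real segmentation runs over many dimensions at once (frequency, recency, demographics), typically via scikit-learn’s `KMeans`, but the assign-and-update loop is the same.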
Time Series: algorithms/methods that specialise in analysing/forecasting data that evolves over time, or is inherently time-focussed
COVID pandemic responses used time series forecasting to build models of how waves of infection might pan out.
Retailers like Tesco use time series approaches to build better stock forecasting to drastically reduce wastage (by around £50 million a year)
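The simplest time series forecast, and a baseline any stock-forecasting model should beat, is a moving average over recent demand. A sketch with invented weekly figures:

```python
# Moving-average forecast: predict the next value as the mean of the
# last `window` observations. Demand figures are hypothetical.

def moving_average_forecast(series, window=3):
    return sum(series[-window:]) / window

weekly_demand = [120, 130, 125, 140, 135]
moving_average_forecast(weekly_demand)  # mean of the last three weeks
```

Serious forecasting adds seasonality, trend, and promotions (e.g. ARIMA or Prophet-style models), but always measured against a simple baseline like this.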
Geospatial: methods for analysing objects, events, or other features on the earth’s surface, using imagery, GPS, satellite photography, and historic data. Locations can be described explicitly in terms of geographic coordinates, or implicitly in terms of a street address, postal code, or forest stand identifier, as they are applied to geographic models.
In delivery services, Royal Mail achieves 50% lower van usage through geospatial analysis. Food apps, speedy groceries, and last-mile delivery all use similar approaches.
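A building block of almost all geospatial work is computing distances between coordinates. The haversine formula gives the great-circle distance between two latitude/longitude points:

```python
# Haversine great-circle distance, a staple of geospatial analysis.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# London to Edinburgh: roughly 530 km as the crow flies
haversine_km(51.5074, -0.1278, 55.9533, -3.1883)
```

Routing problems (like minimising van usage) then build optimisation on top of distances like these, usually road distances from a mapping API rather than straight lines.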
Visualisation: showing trends and findings in data more easily. This is a popular way of helping laypersons understand the data you are presenting, or of highlighting particular findings. Here are some great examples of visualisations.
More often than not, companies are not ready for data science. Some companies don’t have the right infrastructure to maintain simple models, let alone Machine Learning or AI. Many even struggle to identify what data they collect and where it's stored. Before you hire a team of stellar data scientists to transform your organisation, you need to understand where you sit as an organisation.
I use this hierarchy to guide businesses to understand which stage they are at, and what they need to do to achieve the holy grail of AI transformation.
Be honest about your company's status, judging where the majority of the company or projects are at, not just your most advanced project. Aim to work your way up the pyramid, setting strong foundations to help you achieve the next step. Going through this pyramid exercise not only allows you to set realistic expectations, but gives you a rough roadmap of what you need to achieve to maximise value from your data.
For example, for the first stage, you have data collection and storage. What data do you want/need to collect, and where is it stored? Have you created steps to help you automate collection processes? Have you logged what data is held where? If you haven’t done the majority of that, then any steps you build above that stage will likely be unreliable, and not show the true value.
One common challenge I see is that, after establishing what level they’re at, companies struggle to create momentum to move through the hierarchy. It’s easy to get stuck in a slow loop where projects fade into “Proof of Concept purgatory”: a world where POCs either struggle to prove value, or move too slowly for the needs of the business and become outdated, sometimes before they are even finished.
There are three pieces of advice that I would give to a business in order to avoid POC purgatory:
1. Outline/create a shared vocabulary
When I land in a new role or team, one of my first tasks is aligning on terminology. For one, this helps develop a two-sided understanding. It helps me understand the business, the terminology used within and how certain metrics are defined. It also allows me to clarify and explain DS terms, which ones are important, and educate on how to view and interpret them. This process will also help define key metrics that you want to use.
This will also give you transparency. Many view data science as a ‘black box’ environment, so explaining these terms and vocabulary helps teams appreciate and understand how DS works, which more broadly will build up trust with what you’re doing.
2. Make sure your department goals are aligned with the overall business
This is pivotal to the success of a data department within an organisation. There are a few steps you should take to ensure this happens:
Define critical business KPIs to target—key to tracking what is important for the business. For product-led businesses, this comes from your North Star Metric or OKRs.
Prioritise projects based on these KPIs—rank the proposed impact of all projects based on agreed KPIs. This allows you to focus on projects & workstreams with the best and most important return.
Agree on which areas the data team should focus on—this prevents the team from being stretched too thin and going in all directions. As the team grows in size and maturity, this scope can be expanded/altered accordingly.
Create a roadmap with the business—merge all of the above to help create a roadmap. Depending on how mature the objectives are, you could agree on actual projects, or more broadly themes that the data team will tackle. Regularly revisit and update. Ensure that you get agreement from the business on this, so everyone has clear expectations and a shared understanding of what the data science team will deliver.
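The prioritisation step above can be made mechanical once KPIs and weights are agreed. A sketch, where the project names, impact scores, and KPI weights are all hypothetical placeholders you would agree with the business:

```python
# Hypothetical prioritisation: score each proposed project against
# agreed KPIs, then rank. All names and numbers are placeholders.
projects = {
    "churn model": {"retention": 3, "revenue": 2},
    "pricing test": {"retention": 0, "revenue": 4},
    "ops dashboard": {"retention": 1, "revenue": 1},
}
kpi_weights = {"retention": 2.0, "revenue": 1.0}  # agreed with the business

def priority(impact):
    """Weighted sum of estimated KPI impact."""
    return sum(kpi_weights[kpi] * score for kpi, score in impact.items())

ranked = sorted(projects, key=lambda p: priority(projects[p]), reverse=True)
```

Even a rough scoring like this forces the conversation about which KPIs matter, which is most of the value.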
3. Create simple, consistent, repeatable processes
Consistent innovation rarely happens through genius moments in brainstorming sessions, but rather through data-driven experimentation and fast iteration. To make that happen, data departments should create simple, understandable and repeatable ways of working, allowing them to test hypotheses and place winners into production rapidly.
To give a real-life example of this, Meta has around 60 different versions of its Facebook software running at any one point, and is able to place the best-performing version into production within a few weeks. It is only through having a consistent testing and implementation approach that Meta engineers are able to validate and promote the best-performing iterations so quickly.
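The core of that comparison step can be sketched as picking the variant with the best conversion rate. The figures below are invented, and a real decision would also require a statistical significance test, not just the raw comparison:

```python
# Sketch of the heart of an A/B comparison. Figures are invented;
# a real test also needs a significance check before promoting a winner.
results = {
    "A": {"users": 1000, "conversions": 50},
    "B": {"users": 1000, "conversions": 65},
}

def conversion_rate(r):
    return r["conversions"] / r["users"]

winner = max(results, key=lambda v: conversion_rate(results[v]))
```

The repeatable part is the process around this: consistent logging, a fixed decision rule agreed in advance, and a standard path from winning variant to production.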
To put this into practice, here are the set of questions I ask before undertaking any data science project:
Why are you doing the project—i.e. what value does the project bring and how does it contribute to the wider data science team and business goals?
Who are the main stakeholders of the project?
How will the project be used?
What are the success criteria for this project?
What is the current solution to the problem?
Is there a simple and effective solution to the problem that can be performed quickly?
Have you made an effort to involve the right people with enough notice and information?
How will you make sure that the project can be easily understood and handed over to someone else?
How will you deploy your solution?
How will you validate your work in production?
How will you gather feedback for the solution once implemented?
Another effective practice is to have a predefined workflow for tackling data science projects. A well-defined workflow is a great way to ensure that various teams in the organisation remain in sync, while helping your data scientists know what to work on next without much overhead.
One thing to note is that there isn’t a “one size fits all” solution to all data science projects. Components should be tuned and added depending on the company and team objectives. But for me, there are certain steps that should be ubiquitous in all data science teams, and some common approaches should accompany those. My steps are:
Understand the business problem or question, gather requirements, and define scope. Define and reach out to the stakeholders you need for this project
Acquire the required data
Clean & explore data. Understand what the data shows (and its limits), along with cleaning the data and handling outliers, unclear business logic etc. You will be heavily involved with the SMEs at this point, and you may have to iterate between steps 1-3 for a while.
Model. This is where the actual analysis happens: e.g. mathematical modelling, graphical analysis, ML model creation, etc.
Evaluate how well your models perform. This can take different forms, ranging from ML model performance testing to A/B testing uplift.
Deploy your analysis/model to drive decisions. This delivery can take different forms, the most common being an ML model API, a dashboard, or a regular email.
De-brief by communicating the results and impact, identifying what went well and what didn’t. Use this as a teaching opportunity for members of the team that weren’t involved, and to fine tune and improve processes.
Monitor. Build the required maintenance parts of the project: how do you update the model? How do you keep track of actions or outputs? How do you collect feedback from the business?
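The workflow above can be sketched as a chain of small, composable stages. Every function body here is a hypothetical placeholder to be replaced per project; the point is the shape, not the contents:

```python
# Skeleton of the workflow as composable stages. All bodies are
# hypothetical placeholders; only the structure is the point.

def acquire():
    # stage 2: pull raw records (hypothetical data)
    return [{"amount": 10}, {"amount": None}, {"amount": 30}]

def clean(rows):
    # stage 3: drop records with missing values
    return [r for r in rows if r["amount"] is not None]

def model(rows):
    # stage 4: stand-in "model" — the mean of cleaned values
    return sum(r["amount"] for r in rows) / len(rows)

def evaluate(prediction, baseline=15):
    # stage 5: compare against an agreed success criterion
    return prediction >= baseline

prediction = model(clean(acquire()))
passed = evaluate(prediction)
```

Keeping each stage as a separate, named step makes the project easy to hand over (step 7 of the checklist) and gives monitoring (step 8) obvious hooks.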
Once you’ve delivered on the previous steps, you may be in a position to start building out your data science function and hiring a team of expert data people. We’ll cover this in greater detail in a future blog post—so stay tuned.
You should now have a starting point for getting the most out of data science within your startup. Below are some of my favourite guides and articles to help you go even further on your journey.
How to structure your data science team
Tips on building a data science team at a small company
awesome-datascience - A highly in-depth GitHub page with material ranging from books, podcasts and training material to datasets and packages.
Data-Science-Cheatsheet - A two-page data science cheat sheet with great coverage on major Data Science topics
Data science project management
Data science weekly newsletter
A host of free data courses on Google
Ali Kokaz is the Head of Data Science at Founders Factory. He’s also the founder of Buzzing Stocks, using social media algorithms to identify trending stocks. He has previously worked in senior data science roles at Credit Suisse, proptech investor Bricklane, and clothing startup Vintage Threads.