We produce quality training data for computer vision and natural language processing, which means we have a lot of experience with machine learning projects throughout their lifecycle. Working with first-time ML project teams has given us valuable insight into the approaches that foster project success and the obstacles that cause costly delays or even project failure. Data scientists are well aware of the best approaches, the ugly obstacles, and what it takes to be AI-ready. Is the rest of the organization?
Here are seven tips for becoming AI-ready:
- Make sure your budget reflects the importance of AI to your corporate strategy.
- Your budget needs to match the board's enthusiasm for AI. Research has shown that organizations rank AI among their top three business priorities, yet the same research shows their spending forecasts place it far lower on the list of budget priorities. AI projects have high upfront costs. Your project team (data scientists, machine learning engineers, project managers, and data specialists) and your data infrastructure are high-value, high-cost investments. Your budget needs to reflect that.
- Make sure your data house is in order first.
- A data team is more than just data scientists. Data scientists are an important and expensive hire, and they should not be expected to create or clean up the company's data infrastructure. That is both a technical and a political challenge they aren't trained for, and being handed it has driven data scientists to leave their jobs.
- Hire well.
- Which brings us to hiring practices. We've seen some big companies stand up a small lab to support their first AI projects and try to staff it with junior data scientists. This strategy often ends in failure. It's important to hire a mix of people covering an ever-broadening list of titles. Enterprises that have already learned this lesson are building well-rounded teams that include data scientists, machine learning engineers, project managers, data specialists, and others. The problem most of them encounter is that demand for these roles is huge, while supply is shockingly low.
- Protect the lab.
- As mentioned above, data scientists leave their jobs when they are tasked with data cleanup. Data scientists who joined an organization hoping to do disruptive, innovative work are likely to follow suit if they are called on to fix departmental database problems. A constant barrage of ad hoc requests for reports or spreadsheets, made simply because they have "data" in their title, will drive them away just as surely.
- Get agile.
- Don't relearn the lessons software development teams have learned over the last 30 years. Software engineers no longer build large systems using the waterfall method. Instead, they follow the agile method, building, testing, and releasing small pieces of functionality early and often. Yet AI projects invariably revert to waterfall. An agile approach to AI can deliver ROI from parts of the project quickly, whereas waterfall pushes any benefits to the end of the entire project. Agile also surfaces issues earlier in the project, saving time and money in the long run.
- Understand the scope of the training data challenge.
- Machine learning algorithms require immense volumes of training data, and people outside data science often have no idea how much. As a result, data scientists are expected to label and annotate training datasets themselves, in-house. They become mired in a bottomless task that wastes their talents, and the wasted effort often leads to an evaporated project budget and missed deadlines. Either plan to bring in all the resources required to produce your own training data (specialized technology, a large pool of labor, and specialized project management skills) or turn to an outside provider.
- Understand that training never ends.
- Some models operate in environments that are constantly evolving, which exposes them to new things they haven't been trained on. It's a rare algorithm that can be trained to anticipate and understand every edge case. Your training apparatus needs to stay in place, and human judgment will be required even after your model is in production.