The success of AI is perceived differently by various groups within organizations.
- AI experts typically define success as the creation of sophisticated AI solutions and high-accuracy models.
- In contrast, executives prioritize financial gains and cost savings from AI initiatives.
- Innovation managers, such as product managers, often find themselves caught between these two perspectives.
- Additionally, technical terms like "precision," "recall," and "accuracy," which assess model success, can confuse managers.
Poorly framed AI initiatives further complicate the ability to envision impact and articulate measurable metrics, adding to the overall confusion.
What Do Successful AI Initiatives Look Like?
MBU: end users genuinely adopt the solution and are satisfied with it (good UI/UX), on top of two fundamentals: model performance + business value.
- Pillar 1: Model Success (whether your AI model is performing at an acceptable level in development and production)
- Pillar 2: Business Success (whether the AI is meeting your organizational objectives)
- Pillar 3: User Success (whether users are satisfied with the AI solution and perceive it to be a valid solution)
Ask three questions, one per pillar; a "yes" to all three means the AI has achieved real success.
How to measure each pillar:
Pillar 1: Model Success
1. Define metrics that fit the specific task
Model success refers to how well your AI solution performs on a specific task.
For example, how accurate is your model in correctly diagnosing patients with
lung cancer? Or what’s the model’s false-positive rate in detecting duplicate
content? For ReviewCrunch’s task of complaint detection from user reviews,
they would likely be considering accuracy or precision and recall. The metrics
are always task dependent.
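As a minimal sketch (with hypothetical labels, using scikit-learn), this is roughly how accuracy, precision, and recall might be computed for a complaint-detection task like ReviewCrunch's:

```python
# Hypothetical ground-truth and predicted labels for a complaint detector
# (1 = review contains a complaint, 0 = it does not).
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # labels from human annotators
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # labels predicted by the model

print("Accuracy :", accuracy_score(y_true, y_pred))   # share of correct predictions
print("Precision:", precision_score(y_true, y_pred))  # of predicted complaints, how many are real
print("Recall   :", recall_score(y_true, y_pred))     # of real complaints, how many were caught
```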
2. Track performance in both development and production
To measure model success, collaborate with your AI experts to establish the best
metrics to assess model performance—both in development (DevPerform) and
production (ProdPerform). The reason I make this distinction is because a
successful model in development does not automatically translate to a successful
model in production. So, you need to be sure you’re tracking model performance
in both scenarios. Plus, depending on the application, the way you measure
success in development can be different from how you assess model success in
production, though not always.
In the technical world, DevPerform is often referred to as offline performance, and ProdPerform is often known as online performance.
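A hedged sketch of this distinction (all function names and data sources are hypothetical): DevPerform is scored offline against a held-out test set, while ProdPerform is scored later from production predictions that humans have spot-checked.

```python
from sklearn.metrics import f1_score

def dev_perform(model, X_test, y_test):
    """Offline performance: score the model on a held-out test set."""
    return f1_score(y_test, model.predict(X_test))

def prod_perform(spot_checked):
    """Online performance: score logged production predictions that humans
    have reviewed. `spot_checked` is a list of (true_label, predicted_label) pairs."""
    y_true = [t for t, _ in spot_checked]
    y_pred = [p for _, p in spot_checked]
    return f1_score(y_true, y_pred)
```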
Pillar 2: Business Success (ROAIs)
Recall that ROAI stands for return on AI investment; it measures change over
a baseline. For example, if the goal is to decrease the number of incoming
support questions, then measuring this reduction is an example of ROAI. It ties
in closely to the pain points you’re looking to ease with AI, along with the
overall benefits the automation provides. You can use one or more ROAIs for a
single problem.
Here are several things you must do to measure business success once you’ve
determined the business metrics of interest:
- Establish a baseline for each metric (your starting point)
- Set your expected ROAIs (your targets)
- Continuously track ROAIs (the current improvement)
- Continuously track the percentage of expected ROAI reached (the intended
improvement attained)
An example:
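A minimal sketch with hypothetical numbers, assuming the goal is to reduce incoming support questions: a baseline of 1,000 tickets per week, an expected ROAI of a 20% reduction, and 880 tickets per week currently observed.

```python
baseline = 1000             # support tickets per week before the AI went live
expected_reduction = 0.20   # target: 20% fewer tickets (the expected ROAI)
current = 880               # tickets per week observed after deployment

actual_reduction = (baseline - current) / baseline            # current ROAI: 12% reduction
pct_of_target = actual_reduction / expected_reduction * 100   # 60% of the expected ROAI reached

print(f"Current ROAI: {actual_reduction:.0%} reduction")
print(f"Share of expected ROAI reached: {pct_of_target:.0f}%")
```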
The Relationship between Business Success and Model Success
Diminishing returns: as you improve the model, ROAI will also improve, but only up to a point. Return on investment ultimately depends on business value; blindly chasing model accuracy can demand an enormous investment while yielding very little additional output. Once the point of diminishing returns is reached, the model has effectively hit its ceiling; if ROAI still isn't improving at that point, re-examine the model and the problem framing.
Bottom line: don't wait for a perfect model to test business success, but of course, models must pass a minimum quality threshold.
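A hypothetical illustration of the diminishing-returns point: each step of model improvement costs roughly the same effort, but the ROAI gain it buys keeps shrinking (all numbers below are made up).

```python
# Hypothetical pairs of (model accuracy, ROAI), where ROAI is e.g. the
# fractional reduction in weekly support tickets at that accuracy level.
history = [(0.80, 0.10), (0.85, 0.16), (0.90, 0.19), (0.95, 0.20)]

for (prev_acc, prev_roai), (acc, roai) in zip(history, history[1:]):
    gain = roai - prev_roai
    print(f"{prev_acc:.0%} -> {acc:.0%} accuracy: ROAI gain {gain:+.0%}")
# Prints +6%, then +3%, then +1%: each extra step of accuracy buys less business value.
```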
Pillar 3: User Success
Regardless of how well your model performs or how much of the expected ROAI is achieved, if the AI users are not satisfied with the solution, there'll be user adoption risks. Plus, by evaluating user success, you can surface a range of unknown model and non-model issues just by talking to users (customer interviews).
Qualitative user feedback:
- Unknown Model Issues
- Non-Model Issues
- User Satisfaction & Adoption Risks
Example questions:
- Question 1: Are you able to easily review the automatically extracted complaints? (This question surfaces issues in the workflow.)
- Question 2: What would you say about the quality of the AI output that you're seeing? (This question surfaces "hidden" model issues.)
- Question 3: Do you feel that the AI solution is making you more productive? (This question surfaces user satisfaction issues.)
- Question 4: Do you see yourself continuing to use the AI solution three years from now? (This question surfaces user adoption risks.)
Non-Model Factors
User complaints may not be model problems at all; perhaps the problem definition and the goals themselves don't make sense. In addition, UI/UX issues and other non-AI factors can also trigger complaints.
There is a tendency for companies to blame models when users are unhappy or
the ROAIs are miserable. That’s an easy way out, as models don’t get offended.
But this is wrong. More often than not, if models are well developed and
tested, poor ROAIs are caused by non-model factors. Also, although
certain complaints from the AI users would require model improvements, many
will be non-model related. Some common non-model factors that impact AI
success include:
- A poor user interface
- The lack of user training in consuming the AI output
- Wrong metrics to measure ROAI
- Network latency that causes delays in accessing the AI output
The success of an AI initiative consists of three core pillars: model success,
business success, and user success. Non-model factors can also affect the
success of AI initiatives, but such problems can be discovered while evaluating the
three success pillars.
Putting the Three Pillars into Action
Stage 1: Establish Metrics
- Identify Business Metrics: Determine relevant metrics for measuring Return on AI Investment (ROAI).
- Collaborate with AI Experts: Work with AI specialists to define metrics for Development Performance (DevPerform) and Production Performance (ProdPerform).
- Understand the Metrics: Ensure you comprehend what each metric signifies and what it measures.
- Set Targets: Establish all necessary metrics and targets before proceeding to the Post Development Testing (PDT) phase.
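As a sketch of what Stage 1 might produce, assuming the hypothetical metrics and numbers below, everything can be recorded in one place before PDT begins:

```python
# Hypothetical metric definitions agreed on before Post Development Testing (PDT).
metrics = {
    "dev_perform":   {"metric": "F1 on held-out test set",         "target": 0.85},
    "prod_perform":  {"metric": "F1 on spot-checked predictions",  "target": 0.80},
    "roai_tickets":  {"metric": "Weekly support tickets",  "baseline": 1000, "target": 800},
    "roai_handling": {"metric": "Avg. handling time (min)", "baseline": 12,  "target": 9},
}
```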
Stage 2: Prepare for Evaluation
As development is underway, you should lay the groundwork for PDT. The goal
here is to set up the foundation so that later, when the AI model is ready for use,
you can promptly start measuring ProdPerform (model success in production) as
well as your ROAIs. Some of the preparation work for PDT can include:
- Hire Evaluators: Recruit human evaluators for assessing the AI model.
- Engineering Setup: Prepare the necessary engineering infrastructure for AI result consumption, workflow integration, and data collection (e.g., click-through rates).
- Dashboard Setup: Create dashboards for monitoring metrics.
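For the engineering setup item above, a minimal sketch of collecting a usage signal such as click-through (the event names and file-based storage are hypothetical):

```python
import json, time

def log_event(user_id: str, event: str, path: str = "ai_events.jsonl") -> None:
    """Append a usage event (e.g., 'ai_result_shown', 'ai_result_clicked')
    so click-through rates can be computed later for the dashboards."""
    record = {"ts": time.time(), "user": user_id, "event": event}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: an AI result was shown to a user and then clicked.
log_event("u123", "ai_result_shown")
log_event("u123", "ai_result_clicked")
```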
Stage 3a: Integrate Model and Measure Performance
- Track ProdPerform: Monitor the model's performance in production.
- Track ROAIs: Measure the return on AI investments and compare them against expected values.
- Wait for Data: Accumulate enough data over time to make informed assessments.
- Iterate on Model: If ProdPerform is suboptimal, refine the model until acceptable performance is achieved.
- Observe ROAI Trends: Look for positive changes in ROAIs and decide whether to continue iterating or to stop if targets are met.
- Investigate Issues: If ROAI is not improving or is declining, thoroughly investigate both model and non-model factors without jumping to conclusions.
Stage 3b: Collect and Evaluate User Feedback
- Gather User Insights: After initial iterations, collect user feedback to evaluate user success.
- Fair Assessment: Assess feedback objectively; communicate any model issues to the development team and resolve non-model issues with the relevant teams.
- Impact Tracking: Monitor the effects of changes on ROAI and ProdPerform.
Stage 4: Stop and Integrate?
- Evaluate Success: Based on ProdPerform, ROAI, and user feedback, decide whether to stop iterating and deploy the model or to document lessons learned and shelve the initiative.
- Consider Deployment: If all three success pillars are strong, proceed with deployment while possibly continuing model improvements in the background.
- Use Scoring (Optional): Assign scores to each success pillar to aid in decision-making, with a score of 3 or above indicating readiness for deployment (see the sketch below).
The following table provides some examples of when you may choose to iterate versus integrate (i.e., deploy fully).
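Alongside such a table, here is a minimal sketch of the optional scoring idea from Stage 4, assuming a hypothetical 1-to-5 scale where every pillar must score 3 or above before integrating:

```python
def deployment_decision(scores: dict) -> str:
    """Hypothetical rule of thumb: if every pillar scores 3 or above
    (on a 1-to-5 scale), integrate; otherwise keep iterating."""
    return "integrate" if all(s >= 3 for s in scores.values()) else "iterate"

print(deployment_decision({"model": 4, "business": 3, "user": 3}))  # integrate
print(deployment_decision({"model": 4, "business": 2, "user": 3}))  # iterate
```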
Stage 5: Continue Tracking ROAIs and ProdPerform
- Monitor Post-Deployment: Keep an eye on ProdPerform and ROAIs after deployment to ensure they remain stable or improve.
- Iterate if Necessary: If a new model version is developed, test it and replace the existing one if it significantly enhances ROAI.
Additional Considerations
- Existing AI Solutions: If evaluating an already deployed solution, adjust Stages 3 and 4 to focus on evaluation in production and deciding on further actions.
- Off-the-Shelf Solutions: The process is similar to building from scratch, minus the development phase; iterations may involve testing different vendor solutions or customizing with your data.