
Chapter 14. Measure Success


The success of AI is perceived differently by various groups within organizations.

  • AI experts typically define success as the creation of sophisticated AI solutions and high-accuracy models.
  • In contrast, executives prioritize financial gains and cost savings from AI initiatives.
  • Innovation managers, such as product managers, often find themselves caught between these two perspectives.
  • Additionally, technical terms like "precision," "recall," and "accuracy," which assess model success, can confuse managers.

Poorly framed AI initiatives further complicate the ability to envision impact and articulate measurable metrics, contributing to the overall confusion.

What Do Successful AI Initiatives Look Like?

MBU: end users genuinely adopt the solution and are satisfied with it (good UI/UX), on top of two fundamentals: model performance + business value.

  1. Pillar 1: Model Success (whether your AI model is performing at an acceptable level in development and production)
  2. Pillar 2: Business Success (whether the AI is meeting your organizational objectives)
  3. Pillar 3: User Success (whether users are satisfied with the AI solution and perceive it to be a valid solution)

Ask three questions; only when all three get a "yes" has the AI achieved genuine success:

How to measure each pillar:

Pillar 1: Model Success

1. Define metrics specific to the task
Model success refers to how well your AI solution performs on a specific task.
For example, how accurate is your model in correctly diagnosing patients with
lung cancer? Or what’s the model’s false-positive rate in detecting duplicate
content? For ReviewCrunch’s task of complaint detection from user reviews,
they would likely be considering accuracy or precision and recall. The metrics
are always task dependent.
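As a minimal sketch of what such task-dependent metrics look like in code (the labels, predictions, and use of scikit-learn below are hypothetical, not ReviewCrunch's actual setup):

```python
# Hypothetical sketch: task-dependent metrics for a binary complaint detector
# (1 = complaint, 0 = not a complaint), computed with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Placeholder ground-truth labels and model predictions; in practice these
# would come from a labeled evaluation set of user reviews.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))   # share of all predictions that are correct
print("precision:", precision_score(y_true, y_pred))  # of reviews flagged as complaints, how many truly are
print("recall   :", recall_score(y_true, y_pred))     # of true complaints, how many were flagged
```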

2. Track performance in both development and production environments
To measure model success, collaborate with your AI experts to establish the best
metrics to assess model performance—both in development (DevPerform) and
production (ProdPerform). The reason I make this distinction is because a
successful model in development does not automatically translate to a successful
model in production. So, you need to be sure you’re tracking model performance
in both scenarios. Plus, depending on the application, the way you measure
success in development can be different from how you assess model success in
production—not always, but in many cases.

DevPerform vs. ProdPerform
In the technical world, DevPerform is often referred to as offline performance, and ProdPerform is often known as online performance.
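A minimal sketch of why the two can differ, assuming a hypothetical setup in which offline evaluation uses a labeled test set while the online proxy is how often human reviewers override the model (neither measure is prescribed by the chapter):

```python
# Hypothetical sketch: DevPerform (offline) vs. ProdPerform (online) for the same model.

def dev_perform(y_true, y_pred):
    """Offline evaluation on a labeled held-out test set (precision/recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

def prod_perform(overridden, total):
    """Online proxy: fraction of model decisions that human reviewers override,
    useful when true labels arrive too slowly in production."""
    return {"override_rate": overridden / total}

print(dev_perform(y_true=[1, 0, 1, 1, 0], y_pred=[1, 0, 0, 1, 1]))
print(prod_perform(overridden=12, total=400))
```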

Pillar 2: Business Success (ROAIs)

To recall, ROAI stands for return on AI investment, and it measures change over
a baseline. For example, if the goal is to decrease the number of incoming
support questions, then measuring this reduction is an example of ROAI. It ties
in closely to the pain points you’re looking to ease with AI, along with the
overall benefits the automation provides. You can use one or more ROAIs for a
single problem.

Here are several things you must do to measure business success once you’ve
determined the business metrics of interest:

  1. Establish a baseline for each metric (your starting point)
  2. Set your expected ROAIs (your targets)
  3. Continuously track ROAIs (the current improvement)
  4. Continuously track the percentage of expected ROAI reached (the intended
    improvement attained)

An example:
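Assuming made-up numbers for the "reduce incoming support questions" goal (the baseline, target, and current volumes below are purely illustrative), the four measurements might be sketched as:

```python
# Hypothetical ROAI tracking for the "reduce incoming support questions" example.
baseline = 1000        # support questions per week before AI (the starting point)
expected_roai = 0.30   # target: a 30% reduction over the baseline
current = 820          # support questions per week after introducing the AI solution

current_roai = (baseline - current) / baseline   # current improvement over the baseline
pct_of_expected = current_roai / expected_roai   # share of the intended improvement attained

print(f"Current ROAI: {current_roai:.0%}")             # 18%
print(f"Expected ROAI reached: {pct_of_expected:.0%}") # 60%
```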

The Relationship between Business Success and Model Success

Diminishing returns
Bottom line: don’t wait for a perfect model to test business success, but of
course, models must pass a minimum quality threshold.

The return on investment depends on business value: chasing model accuracy alone can demand a huge investment while yielding very little. Once the point of diminishing returns is reached, the model has hit its ceiling; if the ROAI still cannot be improved, it is time to re-examine the model.

As you improve models, the ROAI will also improve. Yet, there’ll be a point of diminishing returns.
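A toy illustration of this pattern (the accuracy-to-ROAI numbers below are invented purely to show the shrinking gains, not drawn from the chapter):

```python
# Hypothetical illustration of diminishing returns: each extra slice of model
# accuracy buys a smaller ROAI gain, while development cost keeps rising.
accuracy_to_roai = [(0.80, 0.12), (0.85, 0.20), (0.90, 0.25), (0.95, 0.27)]

previous = None
for accuracy, roai in accuracy_to_roai:
    gain = "" if previous is None else f" (gain {roai - previous:+.0%})"
    print(f"accuracy={accuracy:.0%} -> ROAI={roai:.0%}{gain}")
    previous = roai
```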

Pillar 3: User Success

Regardless of how well your model performs or how much of the expected ROAI is achieved,
if the AI users are not satisfied with the solution, there will be user adoption risks.
Plus, by evaluating user success, you can surface a range of unknown model and
non-model issues just by talking to users (user interviews).
Qualitative user feedback:

  • Unknown Model Issues
  • Non-Model Issues
  • User Satisfaction & Adoption Risks
Example questions:
  1. Question 1: Are you able to easily review the automatically extracted
    complaints? (This question surfaces issues in the workflow.)
  2. Question 2: What would you say about the quality of the AI output that
    you’re seeing? (This question surfaces “hidden” model issues.)
  3. Question 3: Do you feel that the AI solution is making you more productive?
    (This question surfaces user satisfaction issues.)
  4. Question 4: Do you see yourself continuing to use the AI solution three years
    from now? (This question surfaces user adoption risks.)

Non-Model Factors

Users' complaints may not be model problems; the problem definition and the goal itself may not make sense. In addition, UI/UX and other non-AI issues can also trigger user complaints.

There is a tendency for companies to blame models when users are unhappy or
the ROAIs are miserable. That’s an easy way out, as models don’t get offended.
But this is wrong. Most of the time, if models are well developed and
tested, poor ROAIs are caused by non-model factors. Also, although
certain complaints from the AI users would require model improvements, many
will be non-model related. Some common non-model factors that impact AI
success include:

  • A poor user interface
  • The lack of user training in consuming the AI output
  • Wrong metrics to measure ROAI
  • Network latency that causes delays in accessing the AI output

The success of an AI initiative consists of three core pillars: model success,
business success, and user success. Non-model factors can also affect the
success of AI initiatives, but problems can be discovered while evaluating the
three success pillars.

Putting the Three Pillars into Action

Stage 1: Establish Metrics

  1. Identify Business Metrics: Determine relevant metrics for measuring Return on AI Investment (ROAI).
  2. Collaborate with AI Experts: Work with AI specialists to define metrics for Development Performance (DevPerform) and Production Performance (ProdPerform).
  3. Understand the Metrics: Ensure you comprehend what each metric signifies and what it measures.
  4. Set Targets: Establish all necessary metrics and targets before proceeding to the Post Development Testing (PDT) phase (see the sketch below).
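One way to record these metrics and targets, sketched here with a hypothetical Metric dataclass and made-up baselines and targets (the chapter does not prescribe any specific format):

```python
# Hypothetical sketch of a metrics-and-targets inventory prepared before PDT.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str        # what is measured
    pillar: str      # "ROAI" (business), "DevPerform", or "ProdPerform" (model)
    baseline: float  # the starting point
    target: float    # the expected value to reach

metrics = [
    Metric("weekly incoming support questions", "ROAI", baseline=1000, target=700),
    Metric("complaint-detection precision (test set)", "DevPerform", baseline=0.0, target=0.85),
    Metric("reviewer override rate in production", "ProdPerform", baseline=1.0, target=0.10),
]

for m in metrics:
    print(f"[{m.pillar}] {m.name}: baseline={m.baseline}, target={m.target}")
```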

Stage 2: Prepare for Evaluation

As development is underway, you should lay the groundwork for PDT. The goal
here is to set up the foundation so that later, when the AI model is ready for use,
you can promptly start measuring ProdPerform (model success in production) as
well as your ROAIs. Some of the preparation work for PDT can include:

  1. Hire Evaluators: Recruit human evaluators for assessing the AI model.
  2. Engineering Setup: Prepare the necessary engineering infrastructure for AI result consumption, workflow integration, and data collection such as click-through rates (see the logging sketch below).
  3. Dashboard Setup: Create dashboards for monitoring metrics.
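As one small, hypothetical piece of that groundwork, the data-collection step might start with logging whether users click on each AI result (the event format and file-based storage here are assumptions for illustration, not the chapter's design):

```python
# Hypothetical sketch: logging AI-result consumption events (e.g., clicks) so
# that click-through rates can later be aggregated into a monitoring dashboard.
import json
import time

def log_event(user_id: str, item_id: str, clicked: bool, path: str = "ai_events.jsonl") -> None:
    """Append one consumption event as a JSON line for later aggregation."""
    event = {"ts": time.time(), "user": user_id, "item": item_id, "clicked": clicked}
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

log_event("analyst_42", "complaint_0017", clicked=True)
```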

Stage 3a: Integrate Model and Measure Performance

  1. Track ProdPerform: Monitor the model's performance in production.
  2. Track ROAIs: Measure the Return on AI Investments and compare them against expected values.
  3. Wait for Data: Accumulate enough data over time to make informed assessments.
  4. Iterate on Model: If ProdPerform is suboptimal, refine the model until acceptable performance is achieved.
  5. Observe ROAI Trends: Look for positive changes in ROAIs and decide whether to continue iterating or to stop if targets are met.
  6. Investigate Issues: If ROAI is not improving or is declining, thoroughly investigate both model and non-model factors without jumping to conclusions.

Stage 3b: Collect and Evaluate User Feedback

  1. Gather User Insights: After initial iterations, collect user feedback to evaluate user success.
  2. Fair Assessment: Assess feedback objectively and communicate any model issues to the development team, or resolve non-model issues with the relevant teams.
  3. Impact Tracking: Monitor the effects of changes on ROAI and ProdPerform.

Stage 4: Stop and Integrate?

  1. Evaluate Success: Based on ProdPerform, ROAI, and user feedback, decide whether to stop iterating and deploy the model, or to document lessons learned and shelve the initiative.
  2. Consider Deployment: If all three success pillars are strong, proceed with deployment while possibly continuing model improvements in the background.
  3. Use Scoring (Optional): Assign scores to each success pillar to aid in decision-making, with a score of 3 or above indicating readiness for deployment (a rough scoring sketch follows below).
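A rough sketch of such scoring, assuming a 1-to-5 score per pillar and a simple average compared against the "3 or above" threshold (both the scale and the averaging are illustrative assumptions, not the chapter's exact scheme):

```python
# Hypothetical stop-or-iterate helper: score each pillar from 1 (poor) to 5
# (excellent) and treat an average of 3 or above as ready to integrate.
def ready_to_deploy(model_success: int, business_success: int, user_success: int,
                    threshold: float = 3.0) -> bool:
    average = (model_success + business_success + user_success) / 3
    return average >= threshold

print(ready_to_deploy(model_success=4, business_success=3, user_success=4))  # True  -> integrate
print(ready_to_deploy(model_success=4, business_success=2, user_success=2))  # False -> keep iterating
```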

The following table provides some examples of when you may choose to iterate versus integrate (i.e., deploy fully).

Stage 5: Continue Tracking ROAIs and ProdPerform

  1. Monitor Post-Deployment: Keep an eye on ProdPerform and ROAIs after deployment to ensure they remain stable or improve.
  2. Iterate if Necessary: If a new model version is developed, test it and replace the existing one if it significantly enhances ROAI.

Additional Considerations

  • Existing AI Solutions: If evaluating an already deployed solution, adjust Stages 3 and 4 to focus on evaluation in production and deciding on further actions.
  • Off-the-Shelf Solutions: The process is similar to building from scratch, minus the development phase; iterations may involve testing different vendor solutions or customizing with your data.

