
Recommender Systems: Collaborative Filtering

Date: 2024-01-21 18:23:21
Tags: collaborative, people, may, recommendation, filtering, systems, recommendations, evaluations

Preface

Continuing my paper reading: today's article is a paper published in Communications of the ACM in 1997, titled "Recommender Systems".

Full text

It is often necessary to make choices without sufficient personal experience of the alternatives. In everyday life, we rely on recommendations from other people either by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys such as Zagat’s restaurant guides.


Recommender systems assist and augment this natural social process. In a typical recommender system people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients. In some cases the primary transformation is in the aggregation; in others the system’s value lies in its ability to make good matches between the recommenders and those seeking recommendations.


The developers of the first recommender system, Tapestry, coined the phrase “collaborative filtering” and several others have adopted it. We prefer the more general term “recommender system” for two reasons. First, recommenders may not explicitly collaborate with recipients, who may be unknown to each other. Second, recommendations may suggest particularly interesting items, in addition to indicating those that should be filtered out.


This special section includes descriptions of five recommender systems. A sixth article analyzes incentives for provision of recommendations.


Figure 1 places the systems in a technical design space defined by five dimensions. First, the contents of an evaluation can be anything from a single bit (recommended or not) to unstructured textual annotations. Second, recommendations may be entered explicitly, but several systems gather implicit evaluations: GroupLens monitors users’ reading times; PHOAKS mines Usenet articles for mentions of URLs; and Siteseer mines personal bookmark lists. Third, recommendations may be anonymous, tagged with the source’s identity, or tagged with a pseudonym. The fourth dimension, and one of the richest areas for exploration, is how to aggregate evaluations. GroupLens, PHOAKS, and Siteseer employ variants on weighted voting. Fab takes that one step further to combine evaluations with content analysis. ReferralWeb combines suggested links between people to form longer referral chains. Finally, the (perhaps aggregated) evaluations may be used in several ways: negative recommendations may be filtered out, the items may be sorted according to numeric evaluations, or evaluations may accompany items in a display.

Figures 2 and 3 identify dimensions of the domain space: the kinds of items being recommended and the people among whom evaluations are shared. Consider, first, the domain of items. The sheer volume is an important variable: detailed textual reviews of restaurants or movies may be practical, but applying the same approach to thousands of daily Netnews messages would not. Ephemeral media such as netnews (most news servers throw away articles after one or two weeks) place a premium on gathering and distributing evaluations quickly, while evaluations for 19th century books can be gathered at a more leisurely pace. The last dimension describes the cost structure of choices people make about the items. Is it very costly to miss a good item or sample a bad one? How do those costs compare to the benefits of hitting a good one? This cost structure is likely to interact with technical design choices. For example, when the costs of incorrect decisions are high, as they would be, say, with evaluations of medical treatments, evaluations that convey more nuances are likely to be more useful.
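The aggregation and usage dimensions can be illustrated with a minimal sketch. It is not the algorithm of GroupLens, PHOAKS, or Siteseer, whose details the article does not give; it only shows the general idea of weighted voting followed by filtering and sorting. The function names, the score range of -1 to 1, and the per-recommender weights are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_vote(evaluations, weights):
    """Aggregate evaluations into one score per item (weighted average).

    evaluations: list of (recommender, item, score) tuples; scores are
                 assumed to lie in [-1, 1] (hypothetical convention).
    weights:     dict mapping recommender -> non-negative weight
                 (hypothetical trust values, not from the article).
    """
    totals = defaultdict(float)
    weight_sums = defaultdict(float)
    for recommender, item, score in evaluations:
        w = weights.get(recommender, 0.0)  # unknown recommenders get no vote
        totals[item] += w * score
        weight_sums[item] += w
    return {item: totals[item] / weight_sums[item]
            for item in totals if weight_sums[item] > 0}


def present(aggregated, threshold=0.0):
    """Filter out items at or below the threshold, then sort best first."""
    kept = [(item, s) for item, s in aggregated.items() if s > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


evaluations = [
    ("ann", "article-1", 1.0),
    ("bob", "article-1", -1.0),
    ("ann", "article-2", -1.0),
    ("bob", "article-2", -1.0),
    ("cat", "article-3", 1.0),
]
weights = {"ann": 3.0, "bob": 1.0, "cat": 1.0}

scores = weighted_vote(evaluations, weights)
ranking = present(scores)
print(ranking)
```

In this toy data, ann's vote counts three times as much as bob's, so article-1 ends up with a positive score despite the disagreement, while article-2 is filtered out because both evaluations are negative.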


Next, consider the set of recommendations and the people providing and consuming them. Who provides recommendations? Do they tend to evaluate many items in common, leading to a dense set of recommendations? How many consumers are there, and do their tastes vary? These factors also will interact with technical choices. For example, matching people by tastes automatically is far more valuable in a larger set of people who may not know each other. Personalized aggregation of recommendations will be more valuable when people’s tastes differ than when there are a few experts.
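The two questions raised here, how dense the set of recommendations is and how people can be matched by taste, can be sketched as follows. The article does not prescribe a similarity measure; Pearson correlation over co-rated items is used below only as one common choice, and the data, the 1 to 5 rating scale, and all function names are hypothetical.

```python
from math import sqrt


def density(ratings, num_items):
    """Fraction of (person, item) pairs that actually carry an evaluation."""
    filled = sum(len(items) for items in ratings.values())
    return filled / (len(ratings) * num_items)


def taste_similarity(a, b):
    """Pearson correlation over the items both people evaluated.

    Returns 0.0 when fewer than two items overlap or either person's
    ratings on the overlap have no variance.
    """
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_match(person, ratings):
    """Return the other person whose tastes correlate most with `person`."""
    others = [p for p in ratings if p != person]
    return max(others,
               key=lambda p: taste_similarity(ratings[person], ratings[p]))


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"m1": 5, "m2": 1, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5},
    "cat": {"m1": 1, "m2": 5},
}

print(density(ratings, num_items=3))
print(best_match("ann", ratings))
```

Note that the similarity is only as reliable as the overlap it is computed from: with a sparse set of recommendations, few pairs of people have enough items in common to be compared at all, which is one way density interacts with the technical design.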


Social Implications

Recommender systems introduce two interesting incentive problems. First, once one has established a profile of interests, it is easy to free ride by consuming evaluations provided by others. Moreover, as Avery and Zeckhauser argue, this problem is not entirely solved even if evaluations are gathered implicitly from existing resources or from monitoring user behavior. Future systems will likely need to offer some incentive for the provision of recommendations by making it a prerequisite for receiving recommendations or by offering monetary compensation. Second, if anyone can provide recommendations, content owners may generate mountains of positive recommendations for their own materials and negative recommendations for their competitors. Future systems are likely to introduce precautions that discourage the “vote early and often” phenomenon.


Recommender systems also raise concerns about personal privacy. In general, the more information individuals have about the recommendations, the better they will be able to evaluate those recommendations. However, people may not want their habits or views widely known. Some recommender systems permit anonymous participation or participation under a pseudonym, but this is not a complete solution since some people may desire an intermediate blend of privacy and attributed credit for their efforts. Both incentive and privacy problems arise in an evaluation-sharing system familiar to our readers: the peer review system used in academia. With respect to incentives, every editor knows the best source for a prompt and careful review is an author who currently has an article under consideration. With respect to privacy, blind and double-blind refereeing are common practices. These practices evolved to solve problems inherent to the refereeing process, and it may be worthwhile to consider ways to incorporate such practices into automated systems.



Business Models

Maintenance of a recommender system is costly, and it is worth thinking about what business models might be used to generate revenues sufficient to cover those costs. One model is to charge recipients of recommendations either through subscriptions or pay-per-use. A second model for cost recovery is advertiser support, as Firefly (http://www.firefly.com) seems to provide. Presumably advertisers would find such systems very useful since they generate detailed marketing information about consumers. If a user revealed a taste for, say, cyberpunk books, publishers could make sure the users saw ads targeted to that market. A third model is to charge a fee to the owners of the items being evaluated. For example, filmmakers pay a fee for official ratings of their movies.


The latter two business models both carry a danger of corruption. Mass market computer magazines that carry ads and reviews are often accused of biasing reviews toward companies that are heavy advertisers. In this case, the perception of bias is almost as bad as the reality. Recommender systems that collect fees from advertisers or others who may have a vested interest in the contents of the recommendations must be very careful to make sure that users recognize the difference between unbiased recommendations and advertisements in order to maintain credibility with their readers.


There are economies of scale in recommender systems: The bigger the set of users, the more likely I am to find someone like me. Hence, other things being equal, I would prefer to use the biggest system. When several recommender systems start to compete in a given market, we should expect to see very intense competition since there is likely to be only one eventual survivor. This argument suggests that a possible market structure will be one or two big players in each medium or subject area who then subcontract with sellers of products to provide recommendations as a value-added service. For example, a book rating/review service might operate autonomously and sell its recommendation services to a number of independent online bookstores. It should be noted the independence of the rating/review service may also help to solve the problem of credibility.
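The economies-of-scale argument can be made concrete with a toy model. This is an assumption for illustration, not something the article states: suppose each other user, independently, shares my tastes with probability p. The chance of finding at least one like-minded person among n other users is then

```latex
P(\text{at least one match among } n \text{ users}) = 1 - (1 - p)^{n}
```

Under this assumption with p = 0.001, a system with 100 other users gives a match probability of about 0.095, while one with 10,000 other users gives a probability above 0.9999, which is why, other things being equal, the largest system is the most attractive.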


A flurry of commercial ventures have recently introduced recommender systems for products ranging from Web URLs to music, videos, and books. In the coming years, we can look forward to continued technical innovation, and a better understanding of which technical features are best suited to various characteristics of the items evaluated and the people who participate in the process.


From: https://www.cnblogs.com/wephiles/p/17978043
