Recommender Systems: Collaborative Filtering

Tags: collaborative, people, may, recommendation, filtering, systems, recommendations, evaluations

Preface

Continuing my paper reading: today's article is “Recommender Systems,” a paper published in Communications of the ACM in 1997.

Full Text

It is often necessary to make choices without sufficient personal experience of the alternatives. In everyday life, we rely on recommendations from other people either by word of mouth, recommendation letters, movie and book reviews printed in newspapers, or general surveys such as Zagat’s restaurant guides.

Recommender systems assist and augment this natural social process. In a typical recommender system people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients. In some cases the primary transformation is in the aggregation; in others the system’s value lies in its ability to make good matches between the recommenders and those seeking recommendations.

The developers of the first recommender system, Tapestry, coined the phrase “collaborative filtering” and several others have adopted it. We prefer the more general term “recommender system” for two reasons. First, recommenders may not explicitly collaborate with recipients, who may be unknown to each other. Second, recommendations may suggest particularly interesting items, in addition to indicating those that should be filtered out.

This special section includes descriptions of five recommender systems. A sixth article analyzes incentives for provision of recommendations.

Figure 1 places the systems in a technical design space defined by five dimensions. First, the contents of an evaluation can be anything from a single bit (recommended or not) to unstructured textual annotations. Second, recommendations may be entered explicitly, but several systems gather implicit evaluations: GroupLens monitors users’ reading times; PHOAKS mines Usenet articles for mentions of URLs; and Siteseer mines personal bookmark lists. Third, recommendations may be anonymous, tagged with the source’s identity, or tagged with a pseudonym. The fourth dimension, and one of the richest areas for exploration, is how to aggregate evaluations. GroupLens, PHOAKS, and Siteseer employ variants on weighted voting. Fab takes that one step further to combine evaluations with content analysis. ReferralWeb combines suggested links between people to form longer referral chains. Finally, the (perhaps aggregated) evaluations may be used in several ways: negative recommendations may be filtered out, the items may be sorted according to numeric evaluations, or evaluations may accompany items in a display.

Figures 2 and 3 identify dimensions of the domain space: the kinds of items being recommended and the people among whom evaluations are shared. Consider, first, the domain of items. The sheer volume is an important variable: Detailed textual reviews of restaurants or movies may be practical, but applying the same approach to thousands of daily Netnews messages would not. Ephemeral media such as Netnews (most news servers throw away articles after one or two weeks) place a premium on gathering and distributing evaluations quickly, while evaluations for 19th century books can be gathered at a more leisurely pace. The last dimension describes the cost structure of choices people make about the items. Is it very costly to miss a good item or sample a bad one? How do those costs compare to the benefits of hitting a good one? This cost structure is likely to interact with technical design choices. For example, when the costs of incorrect decisions are high, as they would be, say, with evaluations of medical treatments, evaluations that convey more nuances are likely to be more useful.
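The fourth and fifth design dimensions (aggregating evaluations, then using them) can be made concrete with a small sketch. The code below is only an illustration of weighted voting in general, not the method of GroupLens, PHOAKS, or Siteseer; the function names, the per-recommender trust weights, the score range of -1 to 1, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate_weighted(evaluations, weights, default_weight=1.0):
    """Combine evaluations into one weighted-average score per item.

    evaluations: iterable of (recommender, item, score) triples.
    weights: dict mapping recommender -> non-negative weight (hypothetical
             "trust" values; recommenders not listed get default_weight).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for recommender, item, score in evaluations:
        w = weights.get(recommender, default_weight)
        weighted_sum[item] += w * score
        weight_total[item] += w
    # Skip items whose total weight is zero to avoid dividing by zero.
    return {
        item: weighted_sum[item] / weight_total[item]
        for item in weighted_sum
        if weight_total[item] > 0
    }


def present(aggregated, threshold=0.0):
    """Filter out negatively rated items, then sort the rest best-first."""
    kept = [(item, s) for item, s in aggregated.items() if s >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Illustrative data: scores are +1 (recommended) or -1 (not recommended).
evaluations = [
    ("alice", "article-1", 1.0),
    ("bob", "article-1", -1.0),
    ("alice", "article-2", -1.0),
    ("bob", "article-2", -1.0),
    ("carol", "article-3", 1.0),
]
weights = {"alice": 3.0, "bob": 1.0}  # carol falls back to default_weight

ranked = present(aggregate_weighted(evaluations, weights))
print(ranked)
```

Here alice's vote outweighs bob's on article-1, so it survives the filter, while article-2 is dropped because both votes are negative. A personalized system would use a different `weights` table for each recipient.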

Next, consider the set of recommendations and the people providing and consuming them. Who provides recommendations? Do they tend to evaluate many items in common, leading to a dense set of recommendations? How many consumers are there, and do their tastes vary? These factors also will interact with technical choices. For example, matching people by tastes automatically is far more valuable in a larger set of people who may not know each other. Personalized aggregation of recommendations will be more valuable when people’s tastes differ than when there are a few experts.
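To make "matching people by tastes" and a "dense set of recommendations" more tangible, here is a minimal sketch. It assumes numeric ratings and uses Pearson correlation over co-rated items as the similarity measure, which is one common choice rather than something this article prescribes; the helper names and the sample ratings are hypothetical.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over items both people rated; None if undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None  # too little overlap to compare tastes
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # one person gave identical scores; correlation undefined
    return cov / sqrt(var_x * var_y)


def best_match(person, ratings):
    """Return the other person whose tastes correlate most with `person`."""
    scored = []
    for other, their_ratings in ratings.items():
        if other == person:
            continue
        r = pearson(ratings[person], their_ratings)
        if r is not None:
            scored.append((r, other))
    return max(scored)[1] if scored else None


def density(ratings, n_items):
    """Fraction of all (person, item) pairs that carry an evaluation."""
    filled = sum(len(r) for r in ratings.values())
    return filled / (len(ratings) * n_items)


# Illustrative ratings on a 1-5 scale over four items: a, b, c, d.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 1},
    "ben": {"a": 4, "b": 3, "c": 2},
    "cat": {"a": 1, "b": 3, "c": 5},
    "dan": {"d": 4},
}

print(best_match("ann", ratings))
print(density(ratings, n_items=4))
```

In this toy data ann and ben agree on ordering, ann and cat disagree, and dan shares no items with anyone, so he cannot be matched at all. That last case is the sparsity problem the paragraph alludes to: without items evaluated in common, automatic matching has nothing to work with.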

Social Implications

Recommender systems introduce two interesting incentive problems. First, once one has established a profile of interests, it is easy to free ride by consuming evaluations provided by others. Moreover, as Avery and Zeckhauser argue, this problem is not entirely solved even if evaluations are gathered implicitly from existing resources or from monitoring user behavior. Future systems will likely need to offer some incentive for the provision of recommendations by making it a prerequisite for receiving recommendations or by offering monetary compensation. Second, if anyone can provide recommendations, content owners may generate mountains of positive recommendations for their own materials and negative recommendations for their competitors. Future systems are likely to introduce precautions that discourage the “vote early and often” phenomenon.

Recommender systems also raise concerns about personal privacy. In general, the more information individuals have about the recommendations, the better they will be able to evaluate those recommendations. However, people may not want their habits or views widely known. Some recommender systems permit anonymous participation or participation under a pseudonym, but this is not a complete solution since some people may desire an intermediate blend of privacy and attributed credit for their efforts. Both incentive and privacy problems arise in an evaluation-sharing system familiar to our readers: the peer review system used in academia. With respect to incentives, every editor knows the best source for a prompt and careful review is an author who currently has an article under consideration. With respect to privacy, blind and double-blind refereeing are common practices. These practices evolved to solve problems inherent to the refereeing process, and it may be worthwhile to consider ways to incorporate such practices into automated systems.

Business Models

Maintenance of a recommender system is costly, and it is worth thinking about what business models might be used to generate revenues sufficient to cover those costs. One model is to charge recipients of recommendations either through subscriptions or pay-per-use. A second model for cost recovery is advertiser support, as Firefly (http://www.firefly.com) seems to provide. Presumably advertisers would find such systems very useful since they generate detailed marketing information about consumers. If a user revealed a taste for, say, cyberpunk books, publishers could make sure the users saw ads targeted to that market. A third model is to charge a fee to the owners of the items being evaluated. For example, filmmakers pay a fee for official ratings of their movies.

The latter two business models both carry a danger of corruption. Mass market computer magazines that carry ads and reviews are often accused of biasing reviews toward companies that are heavy advertisers. In this case, the perception of bias is almost as bad as the reality. Recommender systems that collect fees from advertisers or others who may have a vested interest in the contents of the recommendations must be very careful to make sure that users recognize the difference between unbiased recommendations and advertisements in order to maintain credibility with their readers.

There are economies of scale in recommender systems: The bigger the set of users, the more likely I am to find someone like me. Hence, other things being equal, I would prefer to use the biggest system. When several recommender systems start to compete in a given market, we should expect to see very intense competition since there is likely to be only one eventual survivor. This argument suggests that a possible market structure will be one or two big players in each medium or subject area who then subcontract with sellers of products to provide recommendations as a value-added service. For example, a book rating/review service might operate autonomously and sell its recommendation services to a number of independent online bookstores. It should be noted the independence of the rating/review service may also help to solve the problem of credibility.
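The economies-of-scale claim can be illustrated with a back-of-the-envelope model. This model is an assumption added here, not part of the article: suppose each other user independently shares my tastes with probability `p_similar`. Then the chance that at least one of `n_users` is "someone like me" is 1 - (1 - p_similar)^n_users, which grows with the size of the system.

```python
def chance_of_match(p_similar, n_users):
    """Probability that at least one of n_users other people shares my tastes.

    Assumes (for illustration only) that each person matches independently
    with the same probability p_similar.
    """
    if not 0.0 <= p_similar <= 1.0:
        raise ValueError("p_similar must be between 0 and 1")
    if n_users < 0:
        raise ValueError("n_users must be non-negative")
    return 1.0 - (1.0 - p_similar) ** n_users


# With a hypothetical 1-in-1000 chance per person, compare system sizes.
for n in (100, 1_000, 10_000):
    print(n, round(chance_of_match(0.001, n), 3))
```

Under these assumptions a small system rarely contains a good match while a large one almost always does, which is one way to read the argument that users will prefer the biggest system, other things being equal.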

A flurry of commercial ventures have recently introduced recommender systems for products ranging from Web URLs to music, videos, and books. In the coming years, we can look forward to continued technical innovation, and a better understanding of which technical features are best suited to various characteristics of the items evaluated and the people who participate in the process.

From: https://www.cnblogs.com/wephiles/p/17978043