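Preface
Collaborative filtering recommender systems come in several flavors, including user-based and item-based collaborative filtering. Today we read a paper on the item-based variant.
The paper is "Item-Based Collaborative Filtering Recommendation Algorithms".
Abstract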
Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information, products or services during a live interaction. These systems, especially the k-nearest neighbor collaborative filtering based ones, are achieving widespread success on the Web. The tremendous growth in the amount of available information and the number of visitors to Web sites in recent years poses some key challenges for recommender systems. These are: producing high quality recommendations, performing many recommendations per second for millions of users and items and achieving high coverage in the face of data sparsity. In traditional collaborative filtering systems the amount of work increases with the number of participants in the system. New recommender system technologies are needed that can quickly produce high quality recommendations, even for very large-scale problems. To address these issues we have explored item-based collaborative filtering techniques. Item-based techniques first analyze the user-item matrix to identify relationships between different items, and then use these relationships to indirectly compute recommendations for users.
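In short: recommender systems, especially those built on k-nearest-neighbor collaborative filtering, are widely successful on the Web but now face three challenges: recommendation quality, throughput (many recommendations per second for millions of users and items), and coverage when the data is sparse. Because the work in traditional collaborative filtering grows with the number of participants, the authors turn to item-based techniques, which first mine item-to-item relationships from the user-item matrix and then use those relationships to compute recommendations indirectly.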
In this paper we analyze different item-based recommendation generation algorithms. We look into different techniques for computing item-item similarities (e.g., item-item correlation vs. cosine similarities between item vectors) and different techniques for obtaining recommendations from them (e.g., weighted sum vs. regression model). Finally, we experimentally evaluate our results and compare them to the basic k-nearest neighbor approach. Our experiments suggest that item-based algorithms provide dramatically better performance than user-based algorithms, while at the same time providing better quality than the best available user-based algorithms.
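In short: the paper compares ways of computing item-item similarity (correlation versus cosine between item vectors) and ways of turning those similarities into recommendations (weighted sum versus a regression model), then benchmarks the results against the basic k-nearest-neighbor approach. The experiments suggest that item-based algorithms deliver dramatically better performance than user-based ones while also giving better quality than the best available user-based algorithms.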
Sarwar B, Karypis G, Konstan J, et al. Item-based collaborative filtering recommendation algorithms[C]//Proceedings of the 10th international conference on World Wide Web. 2001: 285-295.
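To make the two similarity measures named in the abstract concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the paper's implementation: the toy `ratings` table, the helper names, and the choice to compare items only over users who rated both are all made up for this note.

```python
import math

# Hypothetical toy data: user -> {item: rating}. All values are invented.
ratings = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 2.0},
    "u3": {"A": 1.0, "B": 2.0, "C": 5.0},
}


def co_rated(ratings, i, j):
    """Collect (rating of i, rating of j) from users who rated both items."""
    return [(r[i], r[j]) for r in ratings.values() if i in r and j in r]


def cosine_sim(ratings, i, j):
    """Cosine of the angle between the two item vectors (co-rated users only)."""
    pairs = co_rated(ratings, i, j)
    dot = sum(a * b for a, b in pairs)
    norm_i = math.sqrt(sum(a * a for a, _ in pairs))
    norm_j = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def correlation_sim(ratings, i, j):
    """Pearson correlation between the two items' co-rated ratings."""
    pairs = co_rated(ratings, i, j)
    if len(pairs) < 2:
        return 0.0
    mean_i = sum(a for a, _ in pairs) / len(pairs)
    mean_j = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_i) * (b - mean_j) for a, b in pairs)
    dev_i = math.sqrt(sum((a - mean_i) ** 2 for a, _ in pairs))
    dev_j = math.sqrt(sum((b - mean_j) ** 2 for _, b in pairs))
    return cov / (dev_i * dev_j) if dev_i and dev_j else 0.0


for i, j in [("A", "B"), ("A", "C")]:
    print(i, j, round(cosine_sim(ratings, i, j), 3),
          round(correlation_sim(ratings, i, j), 3))
```

On this toy data items A and C are rated in exactly opposite ways (C = 6 - A), so their correlation is -1 while their cosine similarity is still about 0.51. Cosine on raw ratings cannot go negative when all ratings are positive, which is one reason the choice of measure matters.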
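Main points of the abstract
The abstract first points out the weakness of traditional k-nearest-neighbor collaborative filtering: its workload grows with the number of participants, which the rapid growth of the Web makes increasingly costly. The paper then examines techniques for computing item-item similarity and for deriving recommendations from those similarities. Finally, it evaluates them experimentally against the basic k-nearest-neighbor approach and reports that item-based algorithms beat user-based ones on both performance and quality.
Introduction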
The amount of information in the world is increasing far more quickly than our ability to process it. All of us have known the feeling of being overwhelmed by the number of new books, journal articles, and conference proceedings coming out each year. Technology has dramatically reduced the barriers to publishing and distributing information. Now it is time to create the technologies that can help us sift through all the available information to find that which is most valuable to us.
In short: information is growing faster than anyone can process it, publishing has become nearly frictionless, and what is needed now are technologies that help readers sift out what is most valuable to them.
One of the most promising such technologies is collaborative filtering [19,27,14,16]. Collaborative filtering works by building a database of preferences for items by users. A new user, Neo, is matched against the database to discover neighbors, which are other users who have historically had similar taste to Neo. Items that the neighbors like are then recommended to Neo, as he will probably also like them. Collaborative filtering has been very successful in both research and practice, and in both information filtering applications and E-commerce applications. However, there remain important research questions in overcoming two fundamental challenges for collaborative filtering recommender systems.
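In short: collaborative filtering keeps a database of users' preferences for items. A new user (Neo in the paper's example) is matched against it to find neighbors, meaning users with historically similar taste, and the items those neighbors like are recommended to him. The approach has succeeded in research and in practice, in both information filtering and e-commerce, yet two fundamental challenges remain open.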
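The neighbor-matching idea described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the preference database `db`, the user names, the movie titles, the cosine similarity over shared items, and the "liked" threshold of 4 are all hypothetical choices made for this note, not details from the paper.

```python
import math

# Hypothetical preference database: user -> {item: rating}.
db = {
    "alice": {"Matrix": 5, "Inception": 4, "Titanic": 1, "Memento": 5},
    "bob": {"Matrix": 4, "Inception": 5, "Titanic": 2, "Alien": 4},
    "carol": {"Matrix": 1, "Inception": 2, "Titanic": 5, "Notebook": 5},
}
# The new user to be matched against the database.
neo = {"Matrix": 5, "Inception": 5, "Titanic": 1}


def user_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def nearest_neighbors(target, db, k):
    """Scan every user in the database and keep the k most similar."""
    scored = [(user_similarity(target, prefs), name) for name, prefs in db.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


def recommend(target, db, k=2, like_threshold=4):
    """Suggest items the neighbors rated highly that the target has not rated."""
    suggestions = set()
    for name in nearest_neighbors(target, db, k):
        for item, rating in db[name].items():
            if item not in target and rating >= like_threshold:
                suggestions.add(item)
    return sorted(suggestions)


print(nearest_neighbors(neo, db, 2))
print(recommend(neo, db))
```

Note that `nearest_neighbors` compares the target against every user in the database, so the work grows with the user population.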
The first challenge is to improve the scalability of the collaborative filtering algorithms. These algorithms are able to search tens of thousands of potential neighbors in real-time, but the demands of modern systems are to search tens of millions of potential neighbors. Further, existing algorithms have performance problems with individual users for whom the site has large amounts of information. For instance, if a site is using browsing patterns as indications of content preference, it may have thousands of data points for its most frequent visitors. These "long user rows" slow down the number of neighbors that can be searched per second, further reducing scalability.
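In short: the first challenge is scalability. Existing algorithms can search tens of thousands of potential neighbors in real time, while modern systems need to search tens of millions. Users about whom the site holds a lot of data make things worse: a site that infers preference from browsing patterns may hold thousands of data points for its most frequent visitors, and these "long user rows" reduce the number of neighbors that can be searched per second.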
The second challenge is to improve the quality of the recommendations for the users. Users need recommendations they can trust to help them find items they will like. Users will "vote with their feet" by refusing to use recommender systems that are not consistently accurate for them.
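In short: the second challenge is recommendation quality. Users need recommendations they can trust, and they will "vote with their feet" by abandoning a recommender system that is not consistently accurate for them.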
In some ways these two challenges are in conflict, since the less time an algorithm spends searching for neighbors, the more scalable it will be, and the worse its quality. For this reason, it is important to treat the two challenges simultaneously so the solutions discovered are both useful and practical.
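In short: the two challenges pull against each other. The less time an algorithm spends searching for neighbors, the better it scales and the worse its recommendations become, so both have to be addressed together for a solution to be useful and practical.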
In this paper, we address these issues of recommender systems by applying a different approach: item-based algorithm. The bottleneck in conventional collaborative filtering algorithms is the search for neighbors among a large user population of potential neighbors [12]. Item-based algorithms avoid this bottleneck by exploring the relationships between items first, rather than the relationships between users. Recommendations for users are computed by finding items that are similar to other items the user has liked. Because the relationships between items are relatively static, item-based algorithms may be able to provide the same quality as the user-based algorithms with less online computation.
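In short: the paper's answer is the item-based algorithm. Conventional collaborative filtering is bottlenecked by the neighbor search over a large user population; item-based algorithms sidestep it by exploring relationships between items first, then recommending items similar to ones the user has already liked. Because item relationships are relatively static, such algorithms may be able to match the quality of user-based ones with less online computation.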
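The split between a precomputed item model and a cheap online step can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the toy `ratings`, the function names `build_item_model` and `recommend`, the use of cosine similarity over co-rated users, and the weighted-sum scoring are choices made for this note.

```python
import math
from itertools import combinations

# Hypothetical toy data: user -> {item: rating}. All values are invented.
ratings = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2, "D": 1},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 5},
    "u4": {"A": 5, "C": 1},  # the user we will recommend for
}


def build_item_model(ratings):
    """Offline step: precompute a similarity score for every item pair."""
    items = sorted({item for prefs in ratings.values() for item in prefs})
    model = {}
    for i, j in combinations(items, 2):
        pairs = [(r[i], r[j]) for r in ratings.values() if i in r and j in r]
        if not pairs:
            continue
        dot = sum(a * b for a, b in pairs)
        norm = (math.sqrt(sum(a * a for a, _ in pairs))
                * math.sqrt(sum(b * b for _, b in pairs)))
        model[(i, j)] = model[(j, i)] = dot / norm if norm else 0.0
    return model


def recommend(user_ratings, model, top_n=2):
    """Online step: score unseen items using only lookups in the item model."""
    all_items = {item for pair in model for item in pair}
    scored = []
    for candidate in all_items - set(user_ratings):
        num = den = 0.0
        for item, rating in user_ratings.items():
            sim = model.get((candidate, item), 0.0)
            num += sim * rating  # similarity-weighted sum of the user's ratings
            den += abs(sim)
        if den:
            scored.append((candidate, num / den))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


model = build_item_model(ratings)          # done ahead of time
print(recommend(ratings["u4"], model))     # done per request
```

In this sketch the online step never touches other users' rows: it only reads the user's own ratings and the precomputed table, which is the sense in which relatively static item relationships reduce online computation. Here u4 liked A and disliked C, so B (close to A) is ranked above D (close to C).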
Closing
That is all for today's reading. Today was mainly about the relevant concepts and background; I will fill in the details next time.
From: https://www.cnblogs.com/wephiles/p/17980545