
A Quick Read: Multi-Task Recommendations with Reinforcement Learning

Posted: 2023-07-17 20:34:14
Tags: Multi Task wise RMTL models Reinforcement task MTL recommendation

Paper:

Multi-Task Recommendations with Reinforcement Learning

Link:

https://arxiv.org/abs/2302.03328

Abstract

In recent years, Multi-task Learning (MTL) has achieved great success in Recommender System (RS) applications [41]. However, current MTL-based recommendation models tend to disregard the session-wise patterns of user-item interactions because they are predominantly built on item-wise datasets. Moreover, balancing multiple objectives has always been a challenge in this field, and existing works typically sidestep it with a linear scalarization of the task losses. To address these issues, we propose a Reinforcement Learning (RL) enhanced MTL framework, RMTL, which combines the losses of different recommendation tasks using dynamic weights. Specifically, RMTL addresses the two issues above in three steps:

- (i) it constructs an MTL environment from session-wise interactions;
- (ii) it trains a multi-task actor-critic network structure that is compatible with most existing MTL-based recommendation models;
- (iii) it optimizes and fine-tunes the MTL loss function using weights generated by the critic networks.

Experiments on two real-world public datasets demonstrate the effectiveness of RMTL, which achieves higher AUC than state-of-the-art MTL-based recommendation models. We also evaluate and validate RMTL's compatibility and transferability across various MTL models.

Introduction

The growth of the Internet industry has led to a tremendous increase in the volume of information on online services [1], such as social media and online shopping. In this setting, Recommender Systems (RS) distribute various types of items to match users' interests. They have contributed significantly to improving users' online experience in many application fields, such as product recommendation on e-commerce platforms and short-video recommendation in social media applications [31, 46]. In recent years, researchers have proposed numerous recommendation techniques, including collaborative filtering [35], matrix factorization based approaches [23], and deep learning powered recommendations [9, 51]. The primary objective of an RS is to optimize a specific recommendation target, such as click-through rate or user conversion rate. However, users usually interact with a single item in several different ways. In short-video recommendation services, for instance, users exhibit a wide range of behavior signals, such as clicks, likes (thumbs-up), and continuous dwelling time [18]. On e-commerce platforms, developers care not only about users' clicks but also about final purchases, which guarantee profits. These multi-behavior requirements prompted the development of Multi-Task Learning (MTL) techniques for recommender systems in both research and industry [30, 46, 47].

MTL-based recommendation models learn multiple recommendation tasks simultaneously by training a shared representation and transferring information among tasks [5]. MTL has been developed for a wide range of machine learning applications, including:

- computer vision [39];
- natural language processing [15];
- click-through rate (CTR) and click-through & conversion rate (CTCVR) prediction [30].

The objective functions of most existing MTL works are linear scalarizations of the per-task loss functions [30, 31, 46], in which each task weight is fixed as a constant. Such an item-wise multi-objective loss cannot guarantee convergence to the global optimum and typically yields limited prediction performance.

At the representation level, most existing MTL models assume that the input consists of feature embeddings and individual user-item interactions (called item-wise input). Yet sequentially organized data (i.e., session-wise input) are more prevalent in real-world RS applications. For example, the click and conversion behaviors of short-video users typically occur within a specific session, so their inputs are time-dependent. Ignoring this session structure can degrade MTL model performance, since some tasks may have conflicting session-wise and item-wise labels [5]. Existing MTL models concentrate on designing network structures that improve the model's generalization ability. Methods that adjust the multi-task loss weights according to session-wise patterns have not received sufficient attention.
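To make the constant-weight linear scalarization concrete, here is a minimal pure-Python sketch under stated assumptions:

- there are two tasks (CTR and CTCVR);
- each task's loss is the mean per-sample binary cross-entropy (BCE);
- the weights are illustrative values of 0.5 each.

The names `bce` and `fixed_weight_mtl_loss` are hypothetical and do not come from the paper's code.

```python
import math


def bce(label: int, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample, clipping prob away from 0 and 1."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def fixed_weight_mtl_loss(
    ctr_labels, ctr_probs, ctcvr_labels, ctcvr_probs,
    w_ctr: float = 0.5, w_ctcvr: float = 0.5,
) -> float:
    """Linear scalarization: each task's mean BCE times a constant weight."""
    l_ctr = sum(bce(y, p) for y, p in zip(ctr_labels, ctr_probs)) / len(ctr_labels)
    l_ctcvr = sum(bce(y, p) for y, p in zip(ctcvr_labels, ctcvr_probs)) / len(ctcvr_labels)
    # The weights never change during training: the task balance is fixed by hand.
    return w_ctr * l_ctr + w_ctcvr * l_ctcvr
```

Because `w_ctr` and `w_ctcvr` stay constant throughout training, the trade-off between tasks is decided once, up front. That fixed trade-off is the limitation RMTL targets with dynamic weights.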

To address these two problems, we propose an RL-enhanced multi-task recommendation framework, RMTL. It incorporates the sequential property of user-item interactions into MTL recommendation and automatically updates the task-wise weights in the overall loss function. Reinforcement Learning (RL) algorithms have recently been applied in RS research. There, sequential user behaviors are modeled as a Markov Decision Process (MDP), and RL generates recommendations at each decision step [32, 58]. RL-based recommender systems can handle sequential user-item interactions and optimize long-term user engagement [2]. Accordingly, RMTL converts session-wise RS data into an MDP and trains an actor-critic framework to generate dynamic weights for optimizing the MTL loss function. To produce multi-task outputs, we employ a two-tower MTL backbone model as the actor network, optimized by two distinct critic networks, one for each task. Existing MTL models use item-wise input and constant loss weights. In contrast, RMTL extracts sequential patterns from session-wise MDP input and updates the loss weights automatically for each batch of data instances. In this paper, we focus on CTR/CTCVR prediction, which are crucial metrics for e-commerce and short-video platforms [26]. Experiments against state-of-the-art MTL-based recommendation models on two real-world datasets demonstrate the effectiveness of the proposed model.
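The session-to-MDP conversion can be sketched as follows. This is a hypothetical design for illustration, not the authors' exact definition. It assumes the following:

- the state is the feature vector of the t-th interaction in a session;
- the action is the actor's pair of predicted probabilities (CTR, CTCVR);
- the per-task reward is the negative BCE of that prediction;
- the episode ends at the session's last interaction.

`Transition`, `session_to_transitions`, and the `actor` callable are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


def bce(label: int, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample, clipping prob away from 0 and 1."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


@dataclass
class Transition:
    state: Sequence[float]
    action: Tuple[float, float]  # (predicted CTR, predicted CTCVR) from the actor
    reward: Tuple[float, float]  # assumed reward: negative BCE per task
    next_state: Sequence[float]
    done: bool                   # True at the last interaction of the session


def session_to_transitions(
    features: List[Sequence[float]],
    ctr_labels: List[int],
    ctcvr_labels: List[int],
    actor: Callable[[Sequence[float]], Tuple[float, float]],
) -> List[Transition]:
    """Turn one session (time-ordered interactions) into MDP transitions."""
    transitions = []
    last = len(features) - 1
    for t, state in enumerate(features):
        action = actor(state)
        reward = (-bce(ctr_labels[t], action[0]), -bce(ctcvr_labels[t], action[1]))
        done = t == last
        # The terminal step has no successor; reuse the current state as a
        # placeholder that a TD target would mask out via `done`.
        next_state = state if done else features[t + 1]
        transitions.append(Transition(state, action, reward, next_state, done))
    return transitions
```

In RMTL the actor would be the two-tower MTL backbone. For this sketch, any callable that maps a state to two probabilities works.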

We summarize our contributions as follows:

- (i) We convert the multi-task recommendation problem into an actor-critic reinforcement learning scheme capable of session-wise multi-task prediction.
- (ii) We propose an RL-enhanced multi-task learning framework, RMTL, which generates adaptively adjusted weights for the loss function and is compatible with most existing MTL-based recommendation models.
- (iii) Extensive experiments on two real-world datasets demonstrate that RMTL outperforms SOTA MTL models, and we also verify RMTL's transferability across various MTL models.

Conclusion

In this paper, we propose a novel multi-task learning framework, RMTL, which improves multi-task prediction performance by generating dynamic weights for the total loss in an RL manner. RMTL adaptively modifies the weight of each prediction task's binary cross-entropy (BCE) loss using the Q-value output by the critic network. By constructing a session-wise MDP environment, we estimate the multi-actor-critic networks using a specific MTL agent. We then refine the optimization of the overall MTL loss function with dynamic weights, each a linear transformation of the critic network output. We conduct experiments on two real-world commercial datasets against five baseline MTL-based recommendation models to verify the effectiveness of the proposed method. The results demonstrate that RMTL is compatible with most existing MTL-based recommendation models and improves multi-task prediction performance with excellent transferability.
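The dynamic weighting described above can be sketched as follows. The excerpt only states that each weight is a linear transformation of the critic's output. Several details below are assumptions made for illustration:

- the specific form `alpha - beta * Q`;
- the default values of `alpha` and `beta`;
- per-sample weighting averaged within each batch.

All function names are hypothetical.

```python
import math
from typing import List


def bce(label: int, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one sample, clipping prob away from 0 and 1."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def dynamic_weight(q_value: float, alpha: float = 1.0, beta: float = 0.1) -> float:
    """Hypothetical linear map from a critic Q-value to a loss weight.

    If rewards are negative BCE, a lower Q-value signals a task that is doing
    worse, so subtracting beta * Q gives that task a larger weight.
    """
    return alpha - beta * q_value


def dynamically_weighted_loss(
    labels_per_task: List[List[int]],
    probs_per_task: List[List[float]],
    q_per_task: List[List[float]],
    alpha: float = 1.0,
    beta: float = 0.1,
) -> float:
    """Sum over tasks of the batch-mean, Q-weighted BCE."""
    total = 0.0
    for labels, probs, qs in zip(labels_per_task, probs_per_task, q_per_task):
        weighted = [
            dynamic_weight(q, alpha, beta) * bce(y, p)
            for y, p, q in zip(labels, probs, qs)
        ]
        total += sum(weighted) / len(weighted)
    return total
```

Unlike the fixed-weight scalarization, the weights here change with every batch as the critic's Q-values change. In an autodiff framework one would likely treat these weights as constants (no gradient through the critic) when updating the actor. That, too, is an assumption rather than a detail stated in this excerpt.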

Method

From: https://www.cnblogs.com/tuyuge/p/17561120.html
