This review covers reinforcement-learning-based traffic signal control for road networks of two or more intersections, spanning more than 160 articles from 20 countries published between 1994 and 2022. Its highlights are:
- A review on Reinforcement Learning in the network-scale Traffic Signal Control area.
- Presents a comprehensive systematic literature review of 160 included articles.
- Consolidates and characterizes the existing research on the defined area.
- Explores the methods, applications, domains, and first events in the defined scope.
- Identifies past and present trends and directions for further research in the area.
ABSTRACT
The abstract lists the categories in which data were collected from the included articles:
- (i) publication and authors’ data,
- (ii) method identification and analysis,
- (iii) environment attributes and traffic simulation,
- (iv) application domains of RL-NTSC,
- (v) major first events of RL-NTSC and authors’ key statements,
- (vi) code availability, and
- (vii) evaluation.
Authors:
1. Introduction
The schemes proposed to address the shortcomings of existing signal control approaches fall into three categories:
These methods include
- (i) traffic theory based (The cell transmission model, part II: Network traffic, Daganzo, 1995; Analytical derivation of the optimal traffic signal timing: Minimizing delay variability and spillback probability for undersaturated intersections, Mohajerpoor, Saberi, & Ramezani, 2019; Real-time decentralized traffic signal control for congested urban networks considering queue spillbacks, Noaeen, Mohajerpoor, Far, & Ramezani, 2021; Traffic Signal Timing Optimization by Modelling the Lost Time Effect in the Shock Wave Delay Model, Noaeen, Rassafi, & Far, 2016; Max pressure control of a network of signalized intersections, Varaiya, 2013),
- (ii) simulation based (A Simulation-Based Optimization Algorithm for Dynamic Large-Scale Urban Transportation Problems, Chong & Osorio, 2017; Osorio & Selvam, 2017), and
- (iii) data-driven methods (Balaji, German, & Srinivasan, 2010; The efficacy of using social media data for designing traffic management systems, Noaeen & Far, 2019, 2020).
Next, the review gives the advantages of RL for traffic signal control compared with other methods:
An advantage of RL over conventional methods, e.g. traffic theory based and heuristic methods, is that RL can learn from the interaction with the environment via trial and error to take appropriate actions based on the feedback it receives from the environment, rather than relying on pre-defined rules which are often used in conventional methods.
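To make this contrast concrete, here is a minimal sketch, assuming a simplified two-phase intersection, of a fixed-rule controller versus a feedback-driven controller; the class names and the reward convention are illustrative and not taken from the survey.

```python
import random

PHASES = ["NS_green", "EW_green"]  # simplified two-phase intersection

class FixedTimeController:
    """Conventional rule-based control: cycle phases on a fixed schedule."""
    def __init__(self, green_time=30):
        self.green_time, self.t = green_time, 0

    def act(self, _observation):
        # The rule is pre-defined and ignores the observed traffic state.
        phase = PHASES[(self.t // self.green_time) % len(PHASES)]
        self.t += 1
        return phase

class FeedbackController:
    """RL-style control: phase preferences updated from environment feedback."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.value = {p: 0.0 for p in PHASES}  # learned, not pre-defined

    def act(self, _observation):
        # Trial and error: explore occasionally, otherwise pick the best estimate.
        if random.random() < self.epsilon:
            return random.choice(PHASES)
        return max(PHASES, key=lambda p: self.value[p])

    def learn(self, phase, reward, lr=0.1):
        # Feedback, e.g. negative total queue length, adjusts the estimate.
        self.value[phase] += lr * (reward - self.value[phase])
```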
The review then emphasizes the work it sets out to do:
Due to the rising popularity of RL in TSC recently, specifically in NTSC, we aim to thoroughly characterize the existing research in the area of urban traffic networks where RL is applied and to provide a complete account of what has already been explored.
Finally, the criteria for including articles are explained, and the coverage of the review is given.
2. Background
In TSC, a single agent typically controls a single intersection, which naturally leads to Multi-Agent Reinforcement Learning (MARL). The section also explains TSC concepts such as signal phases.
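As a rough sketch of this one-agent-per-intersection formulation, assuming each agent observes its local queues and chooses the next signal phase (all names here are hypothetical, not from the paper):

```python
import random

PHASES = ["NS_green", "EW_green"]  # simplified phase set per intersection

class IntersectionAgent:
    """One agent per intersection: the usual MARL decomposition in NTSC."""
    def __init__(self, intersection_id, epsilon=0.1):
        self.intersection_id = intersection_id
        self.epsilon = epsilon
        self.values = {p: 0.0 for p in PHASES}  # learned phase preferences

    def choose_phase(self, local_queues):
        # local_queues would define the agent's state in a full formulation;
        # the sketch keeps a flat value table to stay short.
        if random.random() < self.epsilon:
            return random.choice(PHASES)
        return max(PHASES, key=lambda p: self.values[p])

# A network-scale controller is then simply a set of per-intersection agents,
# each acting on its own local observation (decentralized MARL).
agents = {i: IntersectionAgent(i) for i in ["A1", "A2", "B1", "B2"]}
actions = {i: a.choose_phase(local_queues=[]) for i, a in agents.items()}
```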
In the reinforcement learning fundamentals in traffic signal control subsection, the paper first reviews the relevant RL basics and describes the complete RL workflow in TSC. It then argues, based on the cited literature, that Q-learning is the most frequently used and most successful method in TSC (One of the most frequently used and successful RL methods in traffic signal control is Q-learning (Reinforcement Learning: An Introduction, Sutton & Barto, 2018), which was first investigated in 1989.), and gives a complete introduction to the Q-learning algorithm.
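As a reference point for that introduction, a minimal tabular Q-learning sketch for one signal-control agent follows; the state encoding (e.g. discretized queue lengths) and the reward (negative total queue or delay) are assumptions for illustration, not the paper's exact formulation.

```python
from collections import defaultdict
import random

class QLearningSignalAgent:
    """Tabular Q-learning for a signalized intersection.

    Update rule: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    """
    def __init__(self, phases, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.phases = phases
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)  # keyed by (state, phase)

    def act(self, state):
        # Epsilon-greedy selection over the available signal phases.
        if random.random() < self.epsilon:
            return random.choice(self.phases)
        return max(self.phases, key=lambda p: self.q[(state, p)])

    def update(self, state, phase, reward, next_state):
        # One-step temporal-difference (Q-learning) update.
        best_next = max(self.q[(next_state, p)] for p in self.phases)
        td_target = reward + self.gamma * best_next
        self.q[(state, phase)] += self.alpha * (td_target - self.q[(state, phase)])
```

In a typical TSC loop, `state` would be a hashable encoding such as a tuple of discretized queue lengths, and `update` would be called once per decision interval of the traffic simulator.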
3. Review method
(omitted)