Deep Reinforcement Learning for Combinatorial Optimization: An Approach to Routing and Scheduling Problems

ZAGO, DAVIDE
2021/2022

Abstract

Deep Reinforcement Learning (Deep RL) has undergone rapid development in recent years and is becoming one of the major focuses of Artificial Intelligence research. RL is a Machine Learning (ML) paradigm whose key elements are the agent and the environment. The agent is an entity that can act and sense changes in the environment. More formally, an RL problem is described by a Markov Decision Process (MDP), which defines the agent-environment interactions. In most cases, and unlike supervised learning methods, Deep RL has the advantage of not needing a training dataset: the agent collects data from the environment itself. The underlying structure that models the agent's behavior (i.e., the agent's policy) is a deep neural network trained to maximize the expected return obtained through sequences of actions. Inspired by learning in humans, the reward function provides feedback, indicating whether an action is appropriate in a particular state. Because of its generality as a problem-solving framework, Deep RL can be applied to many different research scenarios, and in particular to building heuristics for Combinatorial Optimization (CO) problems. Unlike traditional algorithms, where designing the heuristic function is a costly process that requires domain experts, Deep RL makes it possible to learn heuristics algorithmically. The first part of this work analyzes how attention-based Deep Learning models, often used in Natural Language Generation, have been successfully applied to routing problems, with a focus on the well-known Travelling Salesman Problem (TSP). The second and main part of this work proposes a novel Deep RL approach to scheduling and applies it to the Job Shop Problem (JSP). Although heuristic methods such as priority dispatching can solve the JSP efficiently, they are difficult to design and their performance is often unexceptional. By contrast, the method proposed in this work learns dispatching rules automatically, showing competitive results against state-of-the-art Deep RL methods when tested on a set of JSP benchmark instances of various sizes. Our method can also be extended to tackle the Open Shop Problem (OSP) and the Flow Shop Problem (FSP) with minimal intervention. In line with previous work, our method shows that Deep RL is a powerful and flexible tool for CO problems.
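For reference, the expected-return objective the abstract alludes to is, in its standard discounted form (a common textbook formulation, assumed here rather than quoted from the thesis), the quantity the policy network parameters \theta are trained to maximize:

    J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \left[ \sum_{t=0}^{T} \gamma^{t} \, r(s_t, a_t) \right]

where \tau = (s_0, a_0, s_1, a_1, \dots) is a trajectory of states and actions generated by the policy \pi_\theta in the MDP, r is the reward function, and \gamma \in [0, 1] is a discount factor.

To make "priority dispatching" concrete, the minimal Python sketch below applies a classic Shortest Processing Time (SPT) rule to a toy job-shop instance. The instance, the variable names, and the SPT rule itself are illustrative assumptions, not material from the thesis, whose contribution is to learn such rules rather than hand-code them.

    # Minimal sketch (illustrative, not from the thesis): greedy scheduling of a
    # toy JSP instance with the Shortest Processing Time (SPT) priority rule.

    # Each job is a sequence of (machine, processing_time) operations.
    jobs = [
        [(0, 3), (1, 2), (2, 2)],  # job 0
        [(0, 2), (2, 1), (1, 4)],  # job 1
        [(1, 4), (2, 3), (0, 1)],  # job 2
    ]

    num_machines = 3
    next_op = [0] * len(jobs)            # index of each job's next operation
    job_ready = [0] * len(jobs)          # time each job's next op may start
    machine_ready = [0] * num_machines   # time each machine becomes free

    # Repeatedly dispatch the pending operation with the shortest processing
    # time (ties broken by job index), respecting job and machine availability.
    while any(next_op[j] < len(jobs[j]) for j in range(len(jobs))):
        candidates = [
            (jobs[j][next_op[j]][1], j)  # (processing_time, job index)
            for j in range(len(jobs))
            if next_op[j] < len(jobs[j])
        ]
        _, j = min(candidates)           # SPT priority
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready[machine])
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1

    print("Makespan:", max(machine_ready))  # completion time of the schedule

Running the sketch greedily dispatches one operation per step and prints the resulting makespan. Hand-tuning such priority rules for good performance is the design burden that, as the abstract states, the proposed Deep RL approach aims to remove by learning dispatching rules automatically.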

Files in this record:
859146_tesi_magistrale_davide_zago.pdf (Adobe PDF, 8.51 MB; not available)
Type: other attached material
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14240/53542