This paper was accepted at the “Reinforcement Learning for Real Life” workshop at NeurIPS 2022.
Advances in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advances have either outpaced their application to network security or have not considered the challenges of implementing them in the real world. To understand these issues, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a high-fidelity network simulator. Our approaches all build on the Proximal Policy Optimization (PPO) family of algorithms, and include hierarchical RL, action masking, custom training, and ensemble RL. We find that the ensemble RL approach performs strongest, outperforming our other models and taking second place in the competition. To understand applicability to real environments, we evaluate each method’s ability to generalize to unseen networks and against an unknown attack strategy. In unseen environments, all of our approaches perform worse, with the degradation varying based on the type of environmental change. Against an unknown attacker strategy, we find that our models have reduced overall performance even though the new strategy is less effective than the ones our models were trained on. Together, these results highlight promising research directions for autonomous network defense in the real world.
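To illustrate one of the techniques named above, the sketch below shows the standard mechanics of action masking for a policy-gradient agent such as PPO: logits of currently invalid actions are pushed to a large negative value before the softmax, so the policy assigns them near-zero probability. All names here (`mask_logits`, `ACTION_DIM`, the specific mask) are illustrative assumptions, not details from the paper.

```python
import numpy as np

ACTION_DIM = 5  # hypothetical number of defender actions

def mask_logits(logits: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Replace logits of invalid actions with a large negative value so
    softmax assigns them (near-)zero probability."""
    return np.where(valid, logits, -1e9)

def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: actions 1 and 4 are unavailable in the current state.
logits = np.array([1.0, 2.0, 0.5, -1.0, 0.0])
valid = np.array([True, False, True, True, False])
probs = softmax(mask_logits(logits, valid))
# Invalid actions receive (near-)zero probability; the rest renormalize.
```

In a full PPO implementation the same mask must also be applied when computing log-probabilities for the surrogate loss, so that gradients never flow through invalid actions.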