This paper was accepted at the Human in the Loop Learning Workshop at NeurIPS 2022.
Specifying reward functions for Reinforcement Learning is a challenging task that is bypassed by the framework of Preference-Based Learning methods, which instead learn from preference labels on trajectory queries. These methods, however, still require a large number of preference labels and often achieve poor reward recovery. We present the PRIOR framework, which alleviates both the impractical number of queries to humans and the poor reward recovery by computing priors over the reward function based on the environment dynamics and a surrogate preference classification model. We find that imposing these priors as soft constraints significantly reduces the number of queries made to the human in the loop and improves overall reward recovery. Additionally, we investigate the use of an abstract state space for the computation of these priors to further improve the agent's performance.
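The core idea can be sketched in miniature. The following is an illustrative toy example, not the authors' PRIOR implementation: a linear reward model is fit to pairwise trajectory preferences with a Bradley-Terry likelihood, and a stand-in prior weight vector (in the paper, such priors are derived from environment dynamics and a surrogate preference classifier) is imposed as a soft L2 constraint. All function names, hyperparameters, and the toy data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def traj_return(w, feats):
    """Sum of predicted per-step rewards along a trajectory (feats: T x d)."""
    return float(np.sum(feats @ w))

def loss(w, pairs, prefs, w_prior, lam=0.1):
    """Bradley-Terry preference NLL plus a soft-constraint penalty
    pulling the reward weights toward a prior (hypothetical form)."""
    nll = 0.0
    for (fa, fb), p in zip(pairs, prefs):
        diff = traj_return(w, fa) - traj_return(w, fb)
        prob_a = 1.0 / (1.0 + np.exp(-diff))        # P(trajectory a preferred)
        prob_a = min(max(prob_a, 1e-8), 1 - 1e-8)   # numerical safety
        nll += -(p * np.log(prob_a) + (1 - p) * np.log(1 - prob_a))
    return nll / len(pairs) + lam * np.sum((w - w_prior) ** 2)

# Toy data: preferences generated by a hidden "true" reward weight vector.
d, T, n_pairs = 3, 5, 40
w_true = np.array([1.0, -0.5, 0.2])
pairs, prefs = [], []
for _ in range(n_pairs):
    fa, fb = rng.normal(size=(T, d)), rng.normal(size=(T, d))
    pairs.append((fa, fb))
    prefs.append(1 if traj_return(w_true, fa) > traj_return(w_true, fb) else 0)

# Stand-in for a dynamics-derived prior over the reward weights.
w_prior = np.array([0.8, -0.4, 0.0])

# Simple finite-difference gradient descent, kept minimal for the sketch.
w = np.zeros(d)
initial = loss(w, pairs, prefs, w_prior)
eye = np.eye(d)
for _ in range(100):
    grad = np.array([
        (loss(w + 1e-4 * eye[i], pairs, prefs, w_prior) -
         loss(w - 1e-4 * eye[i], pairs, prefs, w_prior)) / 2e-4
        for i in range(d)
    ])
    w -= 0.2 * grad
final = loss(w, pairs, prefs, w_prior)
```

In this toy setting the soft prior term plays the role PRIOR's priors play in the abstract: it regularizes the reward model so that fewer preference labels are needed to recover usable reward weights.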