New publication at MODELS’26 on the model-driven development of families of reinforcement learning environments

Our paper “A Model-Driven Approach for Developing Families of Reinforcement Learning Environments”, co-authored with my fantastic PhD student, Xiaoran (Sharon) Liu, has been accepted at the ACM/IEEE 29th International Conference on Model Driven Engineering Languages and Systems (MODELS).

MODELS is a CORE A-ranked conference and the flagship event of the model-driven engineering community. It is always an honor and a special kind of joy to have a manuscript accepted here. I’m especially happy for Sharon’s success, as this is an important piece in her doctoral research roadmap. Sharon has been working on the principled engineering of reinforcement learning, and this is her second CORE A-ranked publication this year: earlier, she published her reference architecture for reinforcement learning environments at ICSA’26. Our approach fosters better generalization of RL agents while keeping the RL lifecycle under rigorous engineering control.

Tentative preprint available here: https://arxiv.org/abs/2606.20324. (Content subject to minor changes for the camera-ready version.)

Abstract. Virtual training environments are software-intensive systems in which reinforcement learning (RL) agents learn, adapt, and demonstrate meaningful behavior. Virtual training environments offer a safe and cost-efficient alternative to training agents in real-world settings. However, to converge, most realistic RL problems require training in multiple, mostly similar but slightly different environments—i.e., families of environment variants. The typical development process of environment families is a labor-intensive and error-prone manual endeavor that does not scale well. To alleviate these issues, in this paper, we propose a model-driven approach for developing families of RL training environments. To obtain the family of environments, we develop an approach and prototype tool. In our approach, a hybrid genetic algorithm—a combination of population-based global search and heuristic local search—generates environment families. Mutations and constraints are expressed as model transformations and are operationalized into a search process by a state-of-the-art model transformation engine. We demonstrate the soundness of our approach in a wildfire mitigation scenario and curriculum learning—a particular learning paradigm that relies on environment families.
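To give a feel for the hybrid genetic algorithm mentioned in the abstract, here is a minimal sketch in plain Python. It is not the implementation from the paper: the paper expresses mutations and constraints as model transformations run by a model transformation engine, whereas this sketch uses ordinary functions as stand-ins. The grid encoding, the `TARGET_TREES` objective, and every function name below are assumptions made up for illustration.

```python
import random

SIZE = 6            # hypothetical grid side length
TARGET_TREES = 18   # hypothetical objective: desired number of tree cells

def random_env(rng):
    """Flat grid: 'T' = tree, '.' = empty, 'F' = single ignition point."""
    cells = [rng.choice("T.") for _ in range(SIZE * SIZE)]
    cells[rng.randrange(len(cells))] = "F"
    return tuple(cells)

def is_valid(env):
    """Constraint (stand-in for a model constraint): exactly one ignition point."""
    return env.count("F") == 1

def mutate(env, rng):
    """Mutation (stand-in for a model transformation): flip one non-ignition cell."""
    cells = list(env)
    i = rng.choice([k for k, c in enumerate(cells) if c != "F"])
    cells[i] = "." if cells[i] == "T" else "T"
    return tuple(cells)

def fitness(env):
    """Higher is better: how close the tree count is to the target."""
    return -abs(env.count("T") - TARGET_TREES)

def crossover(a, b, rng):
    """One-point crossover; the result may violate the constraint."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def local_search(env, rng, steps=5):
    """Heuristic local search: accept a mutation only if fitness does not drop."""
    for _ in range(steps):
        candidate = mutate(env, rng)
        if is_valid(candidate) and fitness(candidate) >= fitness(env):
            env = candidate
    return env

def generate_family(family_size=5, pop_size=20, generations=30, seed=0):
    """Global population-based search, with local search applied to each child."""
    rng = random.Random(seed)
    population = [random_env(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]   # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, rng), rng)
            if not is_valid(child):             # repair: fall back to a mutated parent
                child = mutate(a, rng)
            children.append(local_search(child, rng))
        population = parents + children
    distinct = list(dict.fromkeys(population))  # drop duplicates, keep order
    distinct.sort(key=fitness, reverse=True)
    return distinct[:family_size]

if __name__ == "__main__":
    for env in generate_family():
        print(fitness(env), "".join(env))
```

The returned environments are distinct variants that all satisfy the same constraint, which is the sense of "family" used in the abstract. In a curriculum learning setting, one would presumably order such variants by some notion of difficulty; the tree-count objective here is only a placeholder for that.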