Apprenticeship learning with prior beliefs using inverse optimization

1Universidad de los Andes, Bogotá, Colombia
2University of Southern California, Los Angeles, CA, US

Abstract

The relationship between inverse reinforcement learning (IRL) and inverse optimization (IO) for Markov decision processes (MDPs) has been relatively underexplored, despite the two addressing the same problem. In this work, we revisit the relationship between the IO framework for MDPs, IRL, and apprenticeship learning (AL). We incorporate prior beliefs on the structure of the cost function into the IRL and AL problems, and show that the convex-analytic view of the AL formalism emerges as a relaxation of our framework; in particular, AL is recovered as the special case in which the regularization term is absent. Focusing on the suboptimal-expert setting, we formulate the AL problem as a regularized min-max problem, where the regularizer mitigates the ill-posedness of IRL by guiding the search toward plausible cost functions. To solve the resulting regularized convex-concave min-max problem, we apply stochastic mirror descent (SMD) and establish convergence bounds for the proposed method. Numerical experiments highlight the critical role of regularization in learning cost vectors and apprentice policies.

1. Formulation

We frame apprenticeship learning (AL) as an inverse optimization (IO) problem, incorporating prior beliefs on the cost function to address the ill-posedness of inverse reinforcement learning (IRL).
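An illustrative form of such a regularized objective is sketched below. The symbols are generic placeholders, not necessarily the paper's notation: $\mu$ ranges over occupancy measures of the apprentice, $\mu^E$ is the expert's occupancy measure, $c$ is a cost vector in a set $\mathcal{C}$, and $R$ is a regularizer encoding prior beliefs, weighted by $\lambda \ge 0$:

```latex
\min_{\mu \in \mathcal{M}} \; \max_{c \in \mathcal{C}} \;
  \langle c, \mu - \mu^E \rangle \;-\; \lambda \, R(c)
```

With $\lambda = 0$ this reduces to the convex-analytic AL objective: the adversary searches for a cost under which the apprentice performs worse than the expert, while the regularizer penalizes cost vectors that conflict with the prior beliefs.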

2. Solution method

We solve the unconstrained AL problem via stochastic mirror descent (SMD) using gradient oracles and provide convergence guarantees.
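As a minimal illustration of SMD on a convex-concave min-max problem (not the paper's AL objective), the sketch below runs simultaneous mirror-descent/mirror-ascent updates with entropic mirror maps over probability simplices, which yields multiplicative-weights updates. The stochastic gradient oracle is simulated by sampling a single coordinate of the opponent's strategy; the problem instance and all names here are assumptions for the example.

```python
import numpy as np

def smd_saddle_point(A, steps=5000, seed=0):
    """Stochastic mirror descent for min_x max_y x^T A y over simplices.

    The entropic mirror map turns each mirror step into a
    multiplicative-weights update. Gradients are made stochastic by
    sampling one coordinate of the opponent's mixed strategy, which
    gives an unbiased estimate of A y (resp. A^T x).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.ones(m) / m          # min player, on the m-simplex
    y = np.ones(n) / n          # max player, on the n-simplex
    x_avg = np.zeros(m)
    y_avg = np.zeros(n)
    for t in range(1, steps + 1):
        eta = 1.0 / np.sqrt(t)  # decreasing step size
        # Unbiased stochastic gradient oracles.
        j = rng.choice(n, p=y)
        i = rng.choice(m, p=x)
        g_x = A[:, j]           # estimates A @ y
        g_y = A[i, :]           # estimates A.T @ x
        # Entropic mirror steps (multiplicative weights), then renormalize.
        x = x * np.exp(-eta * g_x)
        x /= x.sum()
        y = y * np.exp(eta * g_y)
        y /= y.sum()
        x_avg += x
        y_avg += y
    # Averaged iterates are the standard SMD output for saddle points.
    return x_avg / steps, y_avg / steps

# Matching-pennies-style game: value 0, equilibrium (1/2, 1/2) for both.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_star, y_star = smd_saddle_point(A)
```

Averaging the iterates, rather than returning the last one, is what gives the usual $O(1/\sqrt{T})$ guarantee for convex-concave saddle points; last iterates of multiplicative weights can cycle.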

3. Experiments

We perform ablation studies within a low-dimensional inventory control setting and assess the convergence behavior of our method in a high-dimensional gridworld environment. The visualization below demonstrates the framework's ability to recover the underlying cost function in the gridworld task.

BibTeX

@article{junca2026apprenticeship,
  title   = {Apprenticeship Learning with Prior Beliefs Using Inverse Optimization},
  author  = {Junca, M. and Leiva, E.},
  journal = {Machine Learning},
  volume  = {115},
  year    = {2026},
  doi     = {10.1007/s10994-026-07019-9},
  url     = {https://doi.org/10.1007/s10994-026-07019-9}
}