SEMINAR: Bandits roaming Hilbert space

JOSEP LUMBRERAS ZARAPICO

Nanyang Technological University

In this talk, I’ll discuss our work on the trade-off between exploration and exploitation in online learning of properties of quantum states using multi-armed bandits. Given streaming access to an unknown quantum state, in each round we select an observable from a set of actions, aiming to maximize its expectation value on the state. Using past measurement outcomes, we refine our choice of actions to minimize the regret: the cumulative gap between the reward obtained in each round and the maximum possible reward. We derive information-theoretic lower bounds and optimal strategies with matching upper bounds, showing that the regret typically scales as the square root of the number of rounds. As an application, we reframe quantum state tomography as the task of both learning the state efficiently and minimizing measurement disturbance. For pure states and continuous action sets, we achieve polylogarithmic regret using a sample-optimal algorithm based on a weighted online least squares estimator. The algorithm relies on the principle of optimism in the face of uncertainty and controls the eigenvalues of the design matrix. We also apply our framework to quantum recommender systems and to thermodynamic work extraction from unknown states. In this last setting, our results demonstrate an exponential advantage in work dissipation over tomography-based protocols.

Hosted by Prof. Dr. Antonio Acín

Seminars

15 December 2025

Time: 12:00 to 13:00

Venue: Seminar Room
