Q-learning is a technique used to compute an optimal policy for a controlled Markov chain based on observations of the system controlled using a non-optimal policy. It ...
Stochastic programming problems appear as mathematical models for optimization problems under stochastic uncertainty. Most computational approaches for solving such models are base...
We introduce a new formal model in which a learning algorithm must combine a collection of potentially poor but statistically independent hypothesis functions in order to approxima...
We propose total subset variation (TSV), a convexity-preserving generalization of the total variation (TV) prior, for higher-order clique MRFs. A proposed differentiable approximat...