Cristin result ID: 2078556
Last modified: 25 November 2022, 09:04
NVI reporting year: 2022
Result
Academic article
2022

A Comprehensive Survey of Estimator Learning Automata and Their Recent Convergence Results

Contributors:
  • John Oommen
  • Xuan Zhang
  • Jiao Lei

Journal

Lecture Notes in Networks and Systems
ISSN 2367-3370
e-ISSN 2367-3389
NVI level 1

About the result

Academic article
Publication year: 2022
Volume: 289
Pages: 33 - 52

Description

Title

A Comprehensive Survey of Estimator Learning Automata and Their Recent Convergence Results

Abstract

The precursor field to Reinforcement Learning is that of Learning Automata (LA). Within this field, Estimator Algorithms (EAs) can be said to be the state-of-the-art. Further, the subset of Pursuit Algorithms (PAs), discovered by Thathachar and Sastry [34, 39], comprised the pioneering schemes. This chapter contains a comprehensive survey of the various EAs, and the most recent convergence results for PAs. Unlike the prior LA, EAs are based on a fundamentally distinct phenomenon. They are also the most accurate LA, converging in the least time. EAs operate on two vectors, namely, the action probability vector, which is updated using responses from the Environment, and a vector of quickly-computed estimates of the reward probabilities of the various actions. The proofs that they are ε-optimal are thus very complex. They have to incorporate two rather non-orthogonal phenomena, namely the convergence of these estimates and the convergence of the probabilities of selecting the various actions. For almost three decades, the reported proofs of PAs possessed an infirmity (or flaw), which we refer to as the claim of the “monotonicity” property. This flaw was discovered by the authors of [37], who also provided an alternate proof for a specific PA where the scheme’s parameter decreased with time. This chapter first records all the reported EAs. It then reports a comprehensive survey of the proofs from a different perspective. These proofs do not require that the sequence of probabilities of selecting the optimal action satisfies the property of monotonicity. Instead, whenever any action probability is close enough to unity, we require that the process jump to an absorbing barrier at the next time instant, i.e., in a single step. By requiring such a constraint, these proofs invoke the weaker property, i.e., the submartingale property of p_m(t), to demonstrate the ε-optimality. We have thus proven the ε-optimality for the Absorbing CPA [49, 50], the Discretized PA [51, 52], and for the family of Bayesian PAs [53], where the estimates are obtained by a Bayesian (rather than a Maximum Likelihood (ML)) process.
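
For orientation only, the sketch below illustrates the kind of scheme the abstract describes: a Continuous Pursuit Algorithm that maintains the two vectors mentioned above, an action probability vector and Maximum-Likelihood estimates of the reward probabilities, and moves the probability vector towards the action with the best current estimate. It is a generic illustration, not the chapter's own pseudocode; the function name continuous_pursuit, the Environment interface env(action), and the parameters learning_rate, steps and seed are illustrative assumptions.

```python
import numpy as np


def continuous_pursuit(env, num_actions, learning_rate=0.01, steps=10_000, seed=0):
    """Minimal Continuous Pursuit Algorithm (CPA) sketch (illustrative only).

    env(action) -> 1 (reward) or 0 (penalty): a stochastic Environment.
    Maintains the action probability vector p and Maximum-Likelihood
    estimates d_hat of the reward probabilities from running counts.
    """
    rng = np.random.default_rng(seed)
    p = np.full(num_actions, 1.0 / num_actions)   # action probability vector
    rewards = np.zeros(num_actions)               # reward count per action
    pulls = np.zeros(num_actions)                 # selection count per action
    d_hat = np.zeros(num_actions)                 # ML reward-probability estimates

    for _ in range(steps):
        a = rng.choice(num_actions, p=p)          # sample an action from p
        r = env(a)                                # Environment's response (0 or 1)
        pulls[a] += 1
        rewards[a] += r
        d_hat = np.divide(rewards, pulls, out=np.zeros(num_actions), where=pulls > 0)
        best = np.zeros(num_actions)
        best[np.argmax(d_hat)] = 1.0              # unit vector at the best current estimate
        p = (1.0 - learning_rate) * p + learning_rate * best   # "pursue" that action
    return p, d_hat


# Usage with a hypothetical two-action Environment (reward probabilities 0.8 and 0.6):
reward_probs = [0.8, 0.6]
env_rng = np.random.default_rng(1)
env = lambda a: int(env_rng.random() < reward_probs[a])
p, d_hat = continuous_pursuit(env, num_actions=2)
print(p, d_hat)   # p should concentrate on action 0, the action with the higher estimate
```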

Contributors

John Oommen

  • Affiliated:
    Author
    at the Department of Information and Communication Technology at the University of Agder
  • Affiliated:
    Author
    at Carleton University

Xuan Zhang

  • Affiliated:
    Author
    at NORCE Energi og teknologi at NORCE Norwegian Research Centre AS

Lei Jiao

The contributor's name is shown on this result as Jiao Lei
  • Affiliated:
    Author
    at the Department of Information and Communication Technology at the University of Agder