
Approachability, regret and calibration: Implications and equivalences

Abstract
  • Blackwell approachability, regret minimization and calibration are three criteria used to evaluate a strategy (or an algorithm) in sequential decision problems, described as repeated games between a player and Nature. Although at first sight they seem to have little in common, links between them have been discovered: for instance, both consistent and calibrated strategies can be constructed by following, in some auxiliary game, an approachability strategy.
        We gather seminal and recent results, and develop and generalize Blackwell's elegant theory in several directions. The final objective is to show how approachability can be used as a basic and powerful tool to exhibit a new class of intuitive algorithms, based on simple geometric properties. For completeness, we also prove that approachability can be seen as a byproduct of the very existence of consistent or calibrated strategies.
    Mathematics Subject Classification: Primary: 91A25, 68W27; Secondary: 91A26.
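
    To fix ideas, here is a minimal sketch of Blackwell's geometric condition, in generic notation that may differ from the paper's. Consider a repeated game with vector-valued payoff function $g(x,y) \in \mathbb{R}^d$, where $x \in \Delta(\mathcal{I})$ and $y \in \Delta(\mathcal{J})$ are mixed actions of the player and of Nature, and let $\pi_{\mathcal{C}}$ denote the Euclidean projection onto a closed convex target set $\mathcal{C} \subset \mathbb{R}^d$. The set $\mathcal{C}$ is approachable if and only if
    \begin{equation}
    \forall\, z \notin \mathcal{C}, \ \exists\, x \in \Delta(\mathcal{I}) : \qquad \big\langle z - \pi_{\mathcal{C}}(z),\ g(x,y) - \pi_{\mathcal{C}}(z) \big\rangle \le 0 \quad \text{for every } y \in \Delta(\mathcal{J}).
    \end{equation}
    The associated strategy is purely geometric: whenever the current average payoff $\bar g_n$ lies outside $\mathcal{C}$, play at stage $n+1$ a mixed action satisfying the condition at $z = \bar g_n$; the distance $d(\bar g_n, \mathcal{C})$ then converges to zero almost surely, at rate $O(n^{-1/2})$ in expectation.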

