# American Institute of Mathematical Sciences

March  2021, 3(1): 1-26. doi: 10.3934/fods.2021002

## The rankability of weighted data from pairwise comparisons

 1 Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo, CA, USA 2 Department of Mathematics and Computer Science, Davidson College, Davidson, NC, USA 3 Mathematics Department, College of Charleston, Charleston, SC, USA

* Corresponding author: Langville

Received  August 2020 Revised  December 2020 Published  March 2021 Early access  January 2021

In prior work [4], Anderson et al. introduced a new problem, the rankability problem, which refers to a dataset's inherent ability to produce a meaningful ranking of its items. Ranking is a fundamental data science task with numerous applications that include web search, data mining, cybersecurity, machine learning, and statistical learning theory. Yet little attention has been paid to the question of whether a dataset is suitable for ranking. As a result, when a ranking method is applied to a dataset with low rankability, the resulting ranking may not be reliable.

Rankability paper [4] and its methods studied unweighted data for which the dominance relations are binary, i.e., an item either dominates or is dominated by another item. In this paper, we extend rankability methods to weighted data for which an item may dominate another by any finite amount. We present combinatorial approaches to a weighted rankability measure and apply our new measure to several weighted datasets.

Citation: Paul E. Anderson, Timothy P. Chartier, Amy N. Langville, Kathryn E. Pedings-Behling. The rankability of weighted data from pairwise comparisons. Foundations of Data Science, 2021, 3 (1) : 1-26. doi: 10.3934/fods.2021002
##### References:
 [1] T. Achterberg, Scip: Solving constraint integer programs, Mathematical Programming Computation, 1 (2009), 1-41.  doi: 10.1007/s12532-008-0001-1. [2] N. Ailon, M. Charikar and A. Newman, Aggregating inconsistent information: Ranking and clustering, Journal of the ACM, 55 (2008), 27 pp. doi: 10.1145/1411509.1411513. [3] I. Ali, W. D. Cook and M. Kress, On the minimum violations ranking of a tournament, Management Science, 32 (1986), 660-672.  doi: 10.1287/mnsc.32.6.660. [4] P. Anderson, T. Chartier and A. Langville, The rankability of data, SIAM Journal on the Mathematics of Data Science, (2019), 121–143. doi: 10.1137/18M1183595. [5] P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, IGARDS technical report, 2019. Available from: https://igards.github.io/research/IGARDS_Technical_Report_November_2019.pdf. [6] P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, Revisiting rankability of unweighted data technical report, 2020. Available from: https://igards.github.io/research/Revisiting_Rankability_Unweighted_Data.pdf. [7] R. D. Armstrong, W. D. Cook and L. M. Seiford, Priority ranking and consensus formation: The case of ties, Management Science, 28 (1982), 638-645.  doi: 10.1287/mnsc.28.6.638. [8] J. Brenner and P. Keating, Chaos kills, ESPN, The Magazine, (2016), 20–23. [9] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems, 33 (1998), 107–117. doi: 10.1016/S0169-7552(98)00110-X. [10] S. Brin, L. Page, R. Motwami and T. Winograd, The PageRank citation ranking: Bringing order to the web, Tech. Report 1999-0120, Computer Science Department, Stanford University, 1999. [11] T. R. Cameron, A. N. Langville and H. C. Smith, On the graph Laplacian and the rankability of data, Linear Algebra and its Applications, 588 (2020), 81-100.  doi: 10.1016/j.laa.2019.11.026. [12] C. R. Cassady, L. M. Maillart and S. Salman, Ranking sports teams: A customizable quadratic assignment approach, INFORMS: Interfaces, 35 (2005), 497-510.  doi: 10.1287/inte.1050.0171. [13] T. P. Chartier, E. Kreutzer, A. N. Langville and K. E. Pedings, Sensitivity of ranking vectors, SIAM Journal on Scientific Computing, 33 (2011), 1077–1102. doi: 10.1137/090772745. [14] B. J. Coleman, Minimizing game score violations in college football rankings, INFORMS: Interfaces, 35 (2005), 483-496.  doi: 10.1287/inte.1050.0172. [15] W. D. Cook and L. M. Seiford, Priority ranking and consensus formation, Management Science, 24 (1978), 1721-1732.  doi: 10.1287/mnsc.24.16.1721. [16] T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10 (1998), 1895-1923.  doi: 10.1162/089976698300017197. [17] S. Dutta, S. H. Jacobson and J. J. Sauppe, Identifying ncaa tournament upsets using balance optimization subset selection, Journal of Quantitative Analysis in Sports, 13 (2017), 79-93. [18] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979. [19] F. M. Harper and J. A. Konstan, The movielens datasets: History and context, Acm Transactions on Interactive Intelligent Systems (TIIS), 5 (2015), 1-19.  doi: 10.1145/2827872. [20] X. Jiang, L.-H. Lim, Y. Yao and Y. Ye, Statistical ranking and combinatorial Hodge theory, Mathematical Programming, 127 (2011), 203-244.  doi: 10.1007/s10107-010-0419-x. [21] P. Keating, personal communication. [22] Y. Kondo, Triangulation of input-output tables based on mixed integer programs for inter-temporal and inter-regional comparison of production structures, Journal of Economic Structures, 3 (2014), 1-19.  doi: 10.1186/2193-2409-3-2. [23] A. N. Langville and C. D. Meyer, Who's #1? The Science of Rating and Ranking Items, Princeton University Press, Princeton, 2012. [24] A. N. Langville, K. Pedings and Y. Yamamoto, A minimum violations ranking method, Optimization and Engineering, (2011), 1–22. doi: 10.1007/s11081-011-9135-5. [25] A. S. Lee and D. R. Shier, A method for ranking teams based on simple paths, UMAP Journal, 39 (2018), 353-371. [26] W. W. Leontief, Input-Output Economics, 2nd edition, Oxford University Press, 1986. doi: 10.1038/scientificamerican1051-15. [27] M. J. Lopez and G. J. Matthews, Building an NCAA men's basketball predictive model and quantifying its success, Journal of Quantitative Analysis in Sports, 11 (2015), 5-12.  doi: 10.1515/jqas-2014-0058. [28] R. Marti and G. Reinelt, The linear ordering problem: Exact and heuristic methods in combinatorial optimization, AMS, 2011. doi: 10.1007/978-3-642-16729-4. [29] S. Mehrotra and Y. Ye, Finding an interior point in the optimal face of linear programs, 62 (1993), 497–515. doi: 10.1007/BF01585180. [30] J. Park, On minimum violations ranking in paired comparisons, 2005. [31] G. Reinelt, The Linear Ordering Problem: Algorithms and Applications, Heldermann Verlag, 1985. [32] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, Berlin, Heidelberg, 1995. doi: 10.1007/978-1-4757-2440-0. [33] S. Wartenberg, How many people will fill out March Madness brackets?, The Columbus Dispatch, (2015).

show all references

##### References:
 [1] T. Achterberg, Scip: Solving constraint integer programs, Mathematical Programming Computation, 1 (2009), 1-41.  doi: 10.1007/s12532-008-0001-1. [2] N. Ailon, M. Charikar and A. Newman, Aggregating inconsistent information: Ranking and clustering, Journal of the ACM, 55 (2008), 27 pp. doi: 10.1145/1411509.1411513. [3] I. Ali, W. D. Cook and M. Kress, On the minimum violations ranking of a tournament, Management Science, 32 (1986), 660-672.  doi: 10.1287/mnsc.32.6.660. [4] P. Anderson, T. Chartier and A. Langville, The rankability of data, SIAM Journal on the Mathematics of Data Science, (2019), 121–143. doi: 10.1137/18M1183595. [5] P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, IGARDS technical report, 2019. Available from: https://igards.github.io/research/IGARDS_Technical_Report_November_2019.pdf. [6] P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, Revisiting rankability of unweighted data technical report, 2020. Available from: https://igards.github.io/research/Revisiting_Rankability_Unweighted_Data.pdf. [7] R. D. Armstrong, W. D. Cook and L. M. Seiford, Priority ranking and consensus formation: The case of ties, Management Science, 28 (1982), 638-645.  doi: 10.1287/mnsc.28.6.638. [8] J. Brenner and P. Keating, Chaos kills, ESPN, The Magazine, (2016), 20–23. [9] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems, 33 (1998), 107–117. doi: 10.1016/S0169-7552(98)00110-X. [10] S. Brin, L. Page, R. Motwami and T. Winograd, The PageRank citation ranking: Bringing order to the web, Tech. Report 1999-0120, Computer Science Department, Stanford University, 1999. [11] T. R. Cameron, A. N. Langville and H. C. Smith, On the graph Laplacian and the rankability of data, Linear Algebra and its Applications, 588 (2020), 81-100.  doi: 10.1016/j.laa.2019.11.026. [12] C. R. Cassady, L. M. Maillart and S. Salman, Ranking sports teams: A customizable quadratic assignment approach, INFORMS: Interfaces, 35 (2005), 497-510.  doi: 10.1287/inte.1050.0171. [13] T. P. Chartier, E. Kreutzer, A. N. Langville and K. E. Pedings, Sensitivity of ranking vectors, SIAM Journal on Scientific Computing, 33 (2011), 1077–1102. doi: 10.1137/090772745. [14] B. J. Coleman, Minimizing game score violations in college football rankings, INFORMS: Interfaces, 35 (2005), 483-496.  doi: 10.1287/inte.1050.0172. [15] W. D. Cook and L. M. Seiford, Priority ranking and consensus formation, Management Science, 24 (1978), 1721-1732.  doi: 10.1287/mnsc.24.16.1721. [16] T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10 (1998), 1895-1923.  doi: 10.1162/089976698300017197. [17] S. Dutta, S. H. Jacobson and J. J. Sauppe, Identifying ncaa tournament upsets using balance optimization subset selection, Journal of Quantitative Analysis in Sports, 13 (2017), 79-93. [18] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979. [19] F. M. Harper and J. A. Konstan, The movielens datasets: History and context, Acm Transactions on Interactive Intelligent Systems (TIIS), 5 (2015), 1-19.  doi: 10.1145/2827872. [20] X. Jiang, L.-H. Lim, Y. Yao and Y. Ye, Statistical ranking and combinatorial Hodge theory, Mathematical Programming, 127 (2011), 203-244.  doi: 10.1007/s10107-010-0419-x. [21] P. Keating, personal communication. [22] Y. Kondo, Triangulation of input-output tables based on mixed integer programs for inter-temporal and inter-regional comparison of production structures, Journal of Economic Structures, 3 (2014), 1-19.  doi: 10.1186/2193-2409-3-2. [23] A. N. Langville and C. D. Meyer, Who's #1? The Science of Rating and Ranking Items, Princeton University Press, Princeton, 2012. [24] A. N. Langville, K. Pedings and Y. Yamamoto, A minimum violations ranking method, Optimization and Engineering, (2011), 1–22. doi: 10.1007/s11081-011-9135-5. [25] A. S. Lee and D. R. Shier, A method for ranking teams based on simple paths, UMAP Journal, 39 (2018), 353-371. [26] W. W. Leontief, Input-Output Economics, 2nd edition, Oxford University Press, 1986. doi: 10.1038/scientificamerican1051-15. [27] M. J. Lopez and G. J. Matthews, Building an NCAA men's basketball predictive model and quantifying its success, Journal of Quantitative Analysis in Sports, 11 (2015), 5-12.  doi: 10.1515/jqas-2014-0058. [28] R. Marti and G. Reinelt, The linear ordering problem: Exact and heuristic methods in combinatorial optimization, AMS, 2011. doi: 10.1007/978-3-642-16729-4. [29] S. Mehrotra and Y. Ye, Finding an interior point in the optimal face of linear programs, 62 (1993), 497–515. doi: 10.1007/BF01585180. [30] J. Park, On minimum violations ranking in paired comparisons, 2005. [31] G. Reinelt, The Linear Ordering Problem: Algorithms and Applications, Heldermann Verlag, 1985. [32] V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, Berlin, Heidelberg, 1995. doi: 10.1007/978-1-4757-2440-0. [33] S. Wartenberg, How many people will fill out March Madness brackets?, The Columbus Dispatch, (2015).
Cityplot of $8 \times 8$ data matrix with original ordering and hillside reordering
Cityplots of $n = 8$ college football data matrices with the original ordering (left) and the optimal hillside reordering (right). The top row is the 2008 season, a less rankable season with hillside $\delta = 155$ and $\rho = 6$. The bottom row is the 2005 season, a more rankable season with hillside $\delta = 92$ and $\rho = 4$
Approximate fractional matrix ${\bf X}^*({\bf r}, {\bf r})$ for Example 1 obtained by the Interior Point solver of the linear programming relaxation
Spaghetti plots and summary of diversity of $P$ sets for Examples 1 and 2
Two maximally discordant optimal solutions for Examples 1 and 2
$X^*$ color-coded visualization of years 2009, 2013, and 2016 NCAA March Madness using the top performing LOP formulation
$X^*$ color-coded visualization of years 2009, 2013, and 2016 NCAA March Madness using the top performing hillside formulation
$X^*$ color-coded visualization for each year of NCAA March Madness using the top performing LOP formulation
$X^*$ color-coded visualization for each year of NCAA March Madness using the top performing hillside formulation
Sample Input/Output economic data based on Japan 2005 [22] (A) and its graphical representation in (B)
Sample of movie rating data based on MovieLens [19] (A) and graphical representation of user rating data transformed into pairwise comparisons (B)
Sample of NCAA Men's Basketball games is shown in (A) and the graphical representation of the aggregate dominance information, i.e., ${\bf D}$ is shown in (B)
5 fold cross-validation results for predicting the upset measure for NCAA Men's March Madness 2002-2018
 Dominance Method Parameters Method MAE 1 Direct+Indirect dt=0, st=1, wi=1 Hillside 3.148221 2 Direct+Indirect dt=0, st=0, wi=1 Hillside 3.148221 3 Direct+Indirect dt=1, st=0, wi=1 LOP 3.172793 4 Direct+Indirect dt=1, st=1, wi=1 LOP 3.172793 5 Direct+Indirect dt=0, st=0, wi=0.25 Hillside 3.213978 6 Direct+Indirect dt=0, st=1, wi=0.25 Hillside 3.213978 7 Direct dt=2 Hillside 3.309455 8 Direct+Indirect dt=2, st=1, wi=1 LOP 3.311588 9 Direct+Indirect dt=2, st=0, wi=1 LOP 3.311588 10 Direct+Indirect dt=2, st=2, wi=0.5 Hillside 3.331698
 Dominance Method Parameters Method MAE 1 Direct+Indirect dt=0, st=1, wi=1 Hillside 3.148221 2 Direct+Indirect dt=0, st=0, wi=1 Hillside 3.148221 3 Direct+Indirect dt=1, st=0, wi=1 LOP 3.172793 4 Direct+Indirect dt=1, st=1, wi=1 LOP 3.172793 5 Direct+Indirect dt=0, st=0, wi=0.25 Hillside 3.213978 6 Direct+Indirect dt=0, st=1, wi=0.25 Hillside 3.213978 7 Direct dt=2 Hillside 3.309455 8 Direct+Indirect dt=2, st=1, wi=1 LOP 3.311588 9 Direct+Indirect dt=2, st=0, wi=1 LOP 3.311588 10 Direct+Indirect dt=2, st=2, wi=0.5 Hillside 3.331698
 [1] Xiantao Xiao, Jian Gu, Liwei Zhang, Shaowu Zhang. A sequential convex program method to DC program with joint chance constraints. Journal of Industrial and Management Optimization, 2012, 8 (3) : 733-747. doi: 10.3934/jimo.2012.8.733 [2] Ted Greenwood. Superstar of the Sloan Minority Ph.D. Program. Mathematical Biosciences & Engineering, 2013, 10 (5&6) : 1539-1540. doi: 10.3934/mbe.2013.10.1539 [3] Xin Zhao, Jinyan Fan. On subspace properties of the quadratically constrained quadratic program. Journal of Industrial and Management Optimization, 2017, 13 (4) : 1625-1640. doi: 10.3934/jimo.2017010 [4] James H. Elder. A new training program in data analytics & visualization. Big Data & Information Analytics, 2016, 1 (1) : i-iii. doi: 10.3934/bdia.2016.1.1i [5] Thomas R. Cameron, Sebastian Charmot, Jonad Pulaj. On the linear ordering problem and the rankability of data. Foundations of Data Science, 2021, 3 (2) : 133-149. doi: 10.3934/fods.2021010 [6] Michal Kočvara, Jiří V. Outrata. Inverse truss design as a conic mathematical program with equilibrium constraints. Discrete and Continuous Dynamical Systems - S, 2017, 10 (6) : 1329-1350. doi: 10.3934/dcdss.2017071 [7] Renato Huzak. Cyclicity of degenerate graphic $DF_{2a}$ of Dumortier-Roussarie-Rousseau program. Communications on Pure and Applied Analysis, 2018, 17 (3) : 1305-1316. doi: 10.3934/cpaa.2018063 [8] Jie Zhang, Shuang Lin, Li-Wei Zhang. A log-exponential regularization method for a mathematical program with general vertical complementarity constraints. Journal of Industrial and Management Optimization, 2013, 9 (3) : 561-577. doi: 10.3934/jimo.2013.9.561 [9] Li Chu, Bo Wang, Jie Zhang, Hong-Wei Zhang. Convergence analysis of a smoothing SAA method for a stochastic mathematical program with second-order cone complementarity constraints. Journal of Industrial and Management Optimization, 2021, 17 (4) : 1863-1886. doi: 10.3934/jimo.2020050 [10] Feng Yang, Kok Lay Teo, Ryan Loxton, Volker Rehbock, Bin Li, Changjun Yu, Leslie Jennings. VISUAL MISER: An efficient user-friendly visual program for solving optimal control problems. Journal of Industrial and Management Optimization, 2016, 12 (2) : 781-810. doi: 10.3934/jimo.2016.12.781 [11] Anatole Katok, Federico Rodriguez Hertz. Rigidity of real-analytic actions of $SL(n,\Z)$ on $\T^n$: A case of realization of Zimmer program. Discrete and Continuous Dynamical Systems, 2010, 27 (2) : 609-615. doi: 10.3934/dcds.2010.27.609 [12] Renato Bruni, Gianpiero Bianchi, Alessandra Reale. A combinatorial optimization approach to the selection of statistical units. Journal of Industrial and Management Optimization, 2016, 12 (2) : 515-527. doi: 10.3934/jimo.2016.12.515 [13] Simone Göttlich, Oliver Kolb, Sebastian Kühn. Optimization for a special class of traffic flow models: Combinatorial and continuous approaches. Networks and Heterogeneous Media, 2014, 9 (2) : 315-334. doi: 10.3934/nhm.2014.9.315 [14] Yasmine Cherfaoui, Mustapha Moulaï. Biobjective optimization over the efficient set of multiobjective integer programming problem. Journal of Industrial and Management Optimization, 2021, 17 (1) : 117-131. doi: 10.3934/jimo.2019102 [15] Yong Xia, Yu-Jun Gong, Sheng-Nan Han. A new semidefinite relaxation for $L_{1}$-constrained quadratic optimization and extensions. Numerical Algebra, Control and Optimization, 2015, 5 (2) : 185-195. doi: 10.3934/naco.2015.5.185 [16] Artyom Nahapetyan, Panos M. Pardalos. A bilinear relaxation based algorithm for concave piecewise linear network flow problems. Journal of Industrial and Management Optimization, 2007, 3 (1) : 71-85. doi: 10.3934/jimo.2007.3.71 [17] Yanjun Wang, Shisen Liu. Relaxation schemes for the joint linear chance constraint based on probability inequalities. Journal of Industrial and Management Optimization, 2021  doi: 10.3934/jimo.2021132 [18] Gabrielle Demange. Collective attention and ranking methods. Journal of Dynamics and Games, 2014, 1 (1) : 17-43. doi: 10.3934/jdg.2014.1.17 [19] Mahmoud Ameri, Armin Jarrahi. An executive model for network-level pavement maintenance and rehabilitation planning based on linear integer programming. Journal of Industrial and Management Optimization, 2020, 16 (2) : 795-811. doi: 10.3934/jimo.2018179 [20] Elham Mardaneh, Ryan Loxton, Qun Lin, Phil Schmidli. A mixed-integer linear programming model for optimal vessel scheduling in offshore oil and gas operations. Journal of Industrial and Management Optimization, 2017, 13 (4) : 1601-1623. doi: 10.3934/jimo.2017009

Impact Factor:

## Tools

Article outline

Figures and Tables