
A combinatorial optimization approach to the selection of statistical units

In some large statistical surveys, the set of units that will constitute the scope of the survey must be selected. We focus on the real case of a Census of Agriculture, where the units are farms. Surveying each unit has a cost and contributes a different portion of the total information. The goal is to determine a subset of units that can be surveyed at minimum total cost while representing at least a given portion of the total information. Uncertainty also arises, because the portion of information corresponding to each unit is not known exactly before the unit is surveyed. The proposed approach is based on combinatorial optimization, and the resulting decision problems are modeled as multidimensional binary knapsack problems. Experimental results show the effectiveness of the proposed approach.
Mathematics Subject Classification: Primary: 90C90, 90C06; Secondary: 05A99.
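
To make the selection problem in the abstract concrete, the following Python sketch states one plausible reading of it: choose binary variables x_i (survey unit i or not) to minimize the total cost, subject to the selected units covering at least a given fraction of the total for each type of information. This is an illustration under stated assumptions, not the authors' model or algorithm. The function name `select_units`, the toy data, and the use of a single common fraction for all information types are hypothetical, and the uncertainty aspect mentioned in the abstract is not modeled.

```python
from itertools import combinations


def select_units(costs, info, fraction):
    """Return (min_cost, chosen_indices) by exhaustive search.

    costs[i]   : survey cost of unit i (hypothetical data)
    info[i][j] : amount of information of type j carried by unit i
    fraction   : required share of each information total, in [0, 1]
    """
    n = len(costs)
    dims = len(info[0])
    # Coverage threshold for every information dimension.
    required = [fraction * sum(info[i][j] for i in range(n)) for j in range(dims)]

    best_cost, best_subset = float("inf"), None
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            # A subset is feasible if it meets the threshold in all dimensions.
            covered = all(
                sum(info[i][j] for i in subset) >= required[j]
                for j in range(dims)
            )
            cost = sum(costs[i] for i in subset)
            if covered and cost < best_cost:
                best_cost, best_subset = cost, subset
    return best_cost, best_subset


# Toy instance (assumed): four farms, two information types.
costs = [4, 3, 2, 1]
info = [[50, 10], [30, 40], [15, 30], [5, 20]]
best_cost, chosen = select_units(costs, info, 0.75)
print(best_cost, chosen)
```

On this toy instance the cheapest feasible choice is units 0, 1 and 2 at a total cost of 9. Exhaustive search takes time exponential in the number of units, so it only serves to show the structure of the constraints; an instance of census size would call for an integer programming solver or a dedicated knapsack method.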

