Advanced Search
Article Contents
Article Contents

Unsupervised robust nonparametric learning of hidden community properties

  • * Corresponding author: Mikhail Langovoy

    * Corresponding author: Mikhail Langovoy 
Abstract Full Text(HTML) Figure(3) / Table(1) Related Papers Cited by
  • We consider learning of fundamental properties of communities in large noisy networks, in the prototypical situation where the nodes or users are split into two classes according to a binary property, e.g., according to their opinions or preferences on a topic. For learning these properties, we propose a nonparametric, unsupervised, and scalable graph scan procedure that is, in addition, robust against a class of powerful adversaries. In our setup, one of the communities can fall under the influence of a knowledgeable adversarial leader, who knows the full network structure, has unlimited computational resources and can completely foresee our planned actions on the network. We prove strong consistency of our results in this setup with minimal assumptions. In particular, the learning procedure estimates the baseline activity of normal users asymptotically correctly with probability 1; the only assumption being the existence of a single implicit community of asymptotically negligible logarithmic size. We provide experiments on real and synthetic data to illustrate the performance of our method, including examples with adversaries.

    Mathematics Subject Classification: Primary: 62G05, 90B15, 68T05; Secondary: 65C60, 62G35.


    \begin{equation} \\ \end{equation}
  • 加载中
  • Figure 1.  Graph scan estimators for the artificial large graph

    Figure 2.  Political blogosphere graph for the 2004 Elections

    Figure 3.  Graph scan estimators for the 2004 Elections graph

    Table 1.  Multi-step adversary

    $ k $ 100 200 500 700 800
    $ \#(N_w=0) $ 98 98 78 56 55
    $ \#(N_w=1) $ 2 2 16 22 17
    $ \#(N_w=2) $ 0 0 3 16 12
    $ \#(N_w = 3) $ 0 0 3 5 9
    $ \#(N_w \geq 4) $ 0 0 0 1 7
    $ \mu_{\hat{a}} $ 1.705 1.815 1.913 1.949 1.961
    $ \max{\hat{a}} $ 1.806 1.903 1.980 2.030 2.031
    $ \min{\hat{a}} $ 1.603 1.686 1.824 1.848 1.879
    $ \sigma_{\hat{a}} $ 0.048 0.040 0.033 0.034 0.029
     | Show Table
    DownLoad: CSV
  • [1] F. M. Aarestrup and M. G. Koopmans, Sharing data for global infectious disease surveillance and outbreak detection, Trends in Microbiology, 24 (2016), 241-245.  doi: 10.1016/j.tim.2016.01.009.
    [2] L. A. Adamic and N. Glance, The political blogosphere and the 2004 us election: Divided they blog, in Proceedings of the 3rd International Workshop on Link Discovery, ACM, 2005, 36–43.
    [3] S. An, P. Peursum, W. Liu and S. Venkatesh, Efficient algorithms for subwindow search in object detection and localization, in 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, 264–271.
    [4] E. Arias-Castro and G. Grimmett, Cluster detection in networks using percolation, Bernoulli, 19 (2013), 676-719.  doi: 10.3150/11-BEJ412.
    [5] S. N. Bernstein, On certain modifications of chebyshev's inequality, Doklady Akademii Nauk SSSR, 17 (1937), 275-277. 
    [6] F. Chen and D. B. Neill, Non-parametric scan statistics for event detection and forecasting in heterogeneous social media graphs, in ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, 1166–1175. doi: 10.1145/2623330.2623619.
    [7] V. CheplyginaM. de Bruijne and J. P. Pluim, Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Medical Image Analysis, 54 (2019), 280-296.  doi: 10.1016/j.media.2019.03.009.
    [8] C. Deng, Y. Han and B. Zhao, High-performance visual tracking with extreme learning machine framework, IEEE Transactions on Cybernetics(Early Access), 2019, 1–12. doi: 10.1109/TCYB.2018.2886580.
    [9] D. Geman and B. Jedynak, An active testing model for tracking roads in satellite images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 18 (1996), 1-14.  doi: 10.1109/34.476006.
    [10] G. Grimmett, Percolation, vol. 321 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 2nd edition, Springer-Verlag, Berlin, 1999. doi: 10.1007/978-3-662-03981-6.
    [11] M. Han and Y. Li, Influence analysis: A survey of the state-of-the-art, Mathematical Foundations of Computing, 1 (2018), 201-253.  doi: 10.3934/mfc.2018010.
    [12] M. Kulldorff, A spatial scan statistic, Communications in Statistics-Theory and Methods, 26 (1997), 1481-1496.  doi: 10.1080/03610929708831995.
    [13] M. Kulldorff, Spatial scan statistics: Models, calculations, and applications, in Scan Statistics and Applications, Springer, 1999, 303–322. doi: 10.1007/978-1-4612-1578-3_14.
    [14] M. KulldorffL. HuangL. Pickle and L. Duczmal, An elliptic spatial scan statistic, Statistics in medicine, 25 (2006), 3929-3943.  doi: 10.1002/sim.2490.
    [15] M. Langovoy, M. Habeck and B. Schoelkopf, Adaptive nonparametric detection in cryo-electron microscopy, in Proceedings of the 58-th World Statistical Congress, Session: High Dimensional Data, 2011, 4456–4461.
    [16] M. Langovoy, M. Habeck and B. Schoelkopf, Spatial statistics, image analysis and percolation theory, in The Joint Statistical Meetings Proceedings, Time Series and Network Section, American Statistical Association, Alexandria, VA, 2011, 5571–5581.
    [17] M. Langovoy and O. Wittich, Detection of Objects in Noisy Images and Site Percolation on Square Lattices, EURANDOM Report No. 2009-035, EURANDOM, Eindhoven, 2009.
    [18] M. Langovoy and O. Wittich, Randomized algorithms for statistical image analysis and site percolation on square lattices, Statistica Neerlandica, 67 (2013), 337-353.  doi: 10.1111/stan.12010.
    [19] M. Langovoy and O. Wittich, Robust nonparametric detection of objects in noisy images, Journal of Nonparametric Statistics, 25 (2013), 409-426.  doi: 10.1080/10485252.2012.759570.
    [20] Y. Liu, B. Zhou, F. Chen and D. W. Cheung, Graph topic scan statistic for spatial event detection, in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, 489–498. doi: 10.1145/2983323.2983744.
    [21] T. McInerney and D. Terzopoulos, Deformable models in medical image analysis: A survey, Medical Image Analysis, 1 (1996), 91-108.  doi: 10.1016/S1361-8415(96)80007-7.
    [22] D. B. Neill, A. W. Moore and G. F. Cooper, A bayesian spatial scan statistic, Advances in Neural Information Processing Systems (NIPS), 18 (2006), 1003.
    [23] G. P. Patil and C. Taillie, Upper level set scan statistic for detecting arbitrarily shaped hotspots, Environmental and Ecological statistics, 11 (2004), 183-197.  doi: 10.1023/B:EEST.0000027208.48919.7e.
    [24] L. D. Rotz and J. M. Hughes, Advances in detecting and responding to threats from bioterrorism and emerging infectious disease, Nature Medicine, 10 (2004), S130–S136. doi: 10.1038/nm1152.
    [25] Y. Ruan, D. Fuhry and S. Parthasarathy, Efficient community detection in large networks using content and links, in Proceedings of the 22nd International Conference on World Wide Web, ACM, 2013, 1089–1098. doi: 10.1145/2488388.2488483.
    [26] J. SharpnackA. Singh and A. Rinaldo, Changepoint detection over graphs with the spectral scan statistic, AISTATS, 13 (2013), 545-553. 
    [27] J. L. Sharpnack, A. Krishnamurthy and A. Singh, Near-optimal anomaly detection in graphs using Lovasz extended scan statistic, in Advances in Neural Information Processing Systems (NIPS), 2013, 1959–1967.
    [28] T. Tango and K. Takahashi, A flexibly shaped spatial scan statistic for detecting clusters, International Journal of Health Geographics, 4 (2005), 11.
    [29] J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, 1937.
    [30] Y. Wang, Y. Feng, J. Luo and X. Zhang, Voting with feet: Who are leaving hillary clinton and donald trump, in 2016 IEEE International Symposium on Multimedia (ISM), IEEE, 2016, 71–76. doi: 10.1109/ISM.2016.0022.
    [31] J. Yang, J. McAuley and J. Leskovec, Community detection in networks with node attributes, in Data Mining (ICDM), 2013 IEEE 13th International Conference on, IEEE, 2013, 1151–1156. doi: 10.1109/ICDM.2013.167.
  • 加载中




Article Metrics

HTML views(1191) PDF downloads(245) Cited by(0)

Access History

Other Articles By Authors



    DownLoad:  Full-Size Img  PowerPoint