Article Contents
Article Contents

# On the number of distinct k-decks: Enumeration and bounds

Parts of this paper were presented at the 19th International Symposium on Communications and Information Technologies (ISCIT), at Ho Chi Minh City, Vietnam [4].

The research of Alexander Vardy and Hanwen Yao was supported in part by the US National Science Foundation under grant CCF-1764104

• The $k$-deck of a sequence is defined as the multiset of all its subsequences of length $k$. Let $D_k(n)$ denote the number of distinct $k$-decks for binary sequences of length $n$. For binary alphabet, we determine the exact value of $D_k(n)$ for small values of $k$ and $n$, and provide asymptotic estimates of $D_k(n)$ when $k$ is fixed.

Specifically, for fixed $k$, we introduce a trellis-based method to compute $D_k(n)$ in time polynomial in $n$. We then compute $D_k(n)$ for $k \in \{3,4,5,6\}$ and $k \leqslant n \leqslant 30$. We also improve the asymptotic upper bound on $D_k(n)$, and provide a lower bound thereupon. In particular, for binary alphabet, we show that $D_k(n) = O\bigl(n^{(k-1)2^{k-1}+1}\bigr)$ and $D_k(n) = \Omega(n^k)$. For $k = 3$, we moreover show that $D_3(n) = \Omega(n^6)$ while the upper bound on $D_3(n)$ is $O(n^9)$.

Mathematics Subject Classification: Primary: 68R05, 68R15.

 Citation:

• Figure 1.  Trellis section for $k = 2$ and levels $i\in\{4,5\}$. Left vertices belong to $\mathbb{D}_2(4)$, while right vertices belong to $\mathbb{D}_2(5)$. Blue solid edges correspond to label '0', while red dashed edges correspond to label '1'

Table 1.  Values of $D_k(n)$ for $3 \leqslant k \leqslant 6$ and $k \leqslant n \leqslant 30$. Values highlighted in bold correspond to $D_k(S(k))$

 $n\backslash k$ 3 4 5 6 3 8 — — — 4 16 16 — — 5 32 32 32 — 6 64 64 64 64 7 126 128 128 128 8 247 256 256 256 9 480 512 512 512 10 926 1024 1024 1024 11 1764 2048 2048 2048 12 3337 4092 4096 4096 13 6208 8176 8192 8192 14 11408 16328 16384 16384 15 20608 32604 32768 32768 16 36649 65075 65534 65536 17 63976 129824 131064 131072 18 109866 258906 262120 262144 19 185012 516168 524212 524288 20 306285 1028448 1048360 1048576 21 497190 2048272 2096586 2097152 22 792920 4077316 4192896 4194304 23 1241936 8111400 8385216 8388608 24 1913566 16124458 16769254 16777216 25 2898574 32034016 33536094 33554432 26 4323980 63579386 67067294 67108864 27 6353060 126076522 134124596 134217728 28 9206137 249736704 268228914 268435456 29 13158574 494124382 536416730 536870912 30 18576644 976302888 1072750464 1073741820
•  [1] J. Acharya, H. Das, O. Milenkovic, A. Orlitsky and S. Pan, String reconstruction from substring compositions, SIAM J. Discrete Math., 29 (2015), 1340-1371.  doi: 10.1137/140962486. [2] G. Birkhoff, Lattice Theory, American Mathematical Society, New York, 1940. [3] Z. Chang, J. Chrisnata, M. F. Ezerman and H. M. Kiah, Rates of DNA sequence profiles for practical values of read lengths, EEE Trans. Inform. Theory, 63 (2017), 7166-7177.  doi: 10.1109/TIT.2017.2747557. [4] J. Chrisnata, H. M. Kiah, S. Rao, A. Vardy, E. Yaakobi and H. Yao, On the number of distinct $k$-Decks: Enumeration and bounds, International Symposium on Communications and Information Technologies (ISCIT), Ho Chi Minh City, Vietnam, 2019,519–524. doi: 10.1109/ISCIT.2019.8905191. [5] C. Choffrut and J. Karhumäki, Combinatorics of words, Handbook of Formal Languages, Springer, Berlin, 1 (1997), 329-438. [6] M. Dudik and L. J. Schulman, Reconstruction from subsequences, J. Combin. Theory Ser. A, 103 (2003), 337-348.  doi: 10.1016/S0097-3165(03)00103-1. [7] D. D. Freydenberger, P. Gawrychowski, J. Karhumäki, F. Manea and W. Rytter, Testing $k$-binomial equivalence, Multidisciplinary Creativity: Homage to Gheorghe Paun on his 65th Birthday, Ed. Spandugino, Bucharest, Romania, (2015), 239–248. [8] P. Fleischmann, M. Lejeune, F. Manea, D. Nowotka and M. Rigo, Reconstructing words from right-bounded-block words, Developments in Language Theory, Lecture Notes in Comput. Sci., 12086, Springer, Cham, (2020), 96–109. doi: 10.1007/978-3-030-48516-0_8. [9] R. Gabrys and O. Milenkovic, The hybrid $k$-deck problem: Reconstructing sequences from short and long traces, IEEE International Symposium on Information Theory (ISIT), (2017), 1306–1310. doi: 10.1109/ISIT.2017.8006740. [10] R. Gabrys and O. Milenkovic, Unique reconstruction of coded strings from multiset substring spectra, IEEE Trans. Inform. Theory, 65 (2019), 7682-7696.  doi: 10.1109/TIT.2019.2935973. [11] L. O. Kalashnik, The reconstruction of a word from fragments, Numerical Mathematics and Computer Technology, Akad. Nauk Ukrain. SSR Fiz.-Tehn-Inst. Nizkih Temperatur, Kharkov, (1973), 56–57. [12] H. M. Kiah, G. J. Puleo and O. Milenkovic, Codes for DNA sequence profiles, IEEE Trans. Inform. Theory, 62 (2016), 3125-3146.  doi: 10.1109/TIT.2016.2555321. [13] I. Krasikov and Y. Roditty, On a reconstruction problem for sequences, J. Combin. Theory Ser. A, 77 (1997), 344-348.  doi: 10.1006/jcta.1997.2732. [14] M. Lejeune, M. Rigo and M. Rosenfeld, The binomial equivalence classes of finite words, Internat. J. Algebra Comput., 30 (2020), 1375-1397.  doi: 10.1142/S0218196720500459. [15] P. Ligeti and P. Sziklai, Reconstruction from subwords, International Conference on Applied Informatics, Jan., (2004), 1–7. [16] J. L. Massey, Foundation and methods of channel encoding, Proc. Int. Conf. Information Theory and Systems, NTG-Fachberichte, 65 (1978). [17] B. Manvel, A. Meyerowitz, A. Schwenk, K. Smith and P. Stockmeyer, Reconstruction of sequences, Discrete Math., 94 (1991), 209-219.  doi: 10.1016/0012-365X(91)90026-X. [18] OEIS Foundation Inc, The On-Line Encyclopedia of Integer Sequences, 2019, available from: http://oeis.org/A258585. [19] M. Rigo and P. Salimov, Another generalization of abelian equivalence: Binomial complexity of infinite words, Theoret. Comput. Sci., 601 (2015), 47-57.  doi: 10.1016/j.tcs.2015.07.025. [20] J. Sakarovitch and I. Simon, Subwords, in Combinatorics on Words, edited by M. Lothaire (2nd ed.), Cambridge University Press, (1997), 105–42 [21] A. D. Scott, Reconstructing sequences, Discrete Math., 175 (1997), 231-238.  doi: 10.1016/S0012-365X(96)00153-7. [22] A. Vardy, Trellis structure of codes, Handbook of Coding Theory, North-Holland, Amsterdam, 1 (1998), 1989-2117. [23] S. M. H. T. Yazdi, H. M. Kiah, E. Garcia-Ruiz, J. Ma, H. Zhao and O. Milenkovic, DNA-based storage: Trends and methods, IEEE Transactions on Molecular, Biological and Multi-Scale Communications, 1 (2015), 230-248.  doi: 10.1109/TMBMC.2016.2537305.

Figures(1)

Tables(1)