C. C. Aggarwal, A framework for clustering massive-domain data streams, Proceedings of the IEEE 25th International Conference on Data Engineering (ICDE'09), pp.102-113, 2009.

R. Agrawal, J. Gehrke, and D. Gunopulos, Automatic subspace clustering of high dimensional data for data mining applications, ACM SIGMOD Record, vol.27, pp.94-105, 1998.

J. Alneberg, B. S. Bjarnason, and I. de Bruijn, Binning metagenomic contigs by coverage and composition, Nat. Methods, vol.11, pp.1144-1146, 2014.

M. Ankerst, M. M. Breunig, and H. Kriegel, OPTICS: Ordering points to identify the clustering structure, ACM SIGMOD Record, vol.28, pp.49-60, 1999.

P. Berkhin, A survey of clustering data mining techniques, Grouping Multidimensional Data, pp.25-71, 2006.

K. Berlin, S. Koren, and C. Chin, Assembling large genomes with single-molecule sequencing and locality-sensitive hashing, Nat. Biotechnol., vol.33, pp.623-630, 2015.

O. Boydell, M. Landowski, and G. Wu, High-throughput continuous clustering of message streams, Real-World Challenge for Data Stream Mining 2, 2013.

A. Z. Broder, On the resemblance and containment of documents, Proceedings of the Compression and Complexity of Sequences, pp.21-29, 1997.

B. Buchfink, C. Xie, and D. H. Huson, Fast and sensitive protein alignment using DIAMOND, Nat. Methods, vol.12, pp.59-60, 2015.

J. Buhler, Efficient large-scale sequence comparison by locality-sensitive hashing, Bioinformatics, vol.17, pp.419-428, 2001.

M. S. Charikar, Similarity estimation techniques from rounding algorithms, Proceedings of the Thirty-Fourth Annual ACM symposium on Theory of Computing, pp.380-388, 2002.

P. Cheeseman and J. Stutz, Bayesian classification (AutoClass): Theory and results, Advances in Knowledge Discovery and Data Mining, AAAI, pp.153-180, 1996.

A. Dasgupta, R. Kumar, and T. Sarlós, Fast locality-sensitive hashing, Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.1073-1081, 2011.

M. Datar, N. Immorlica, and P. Indyk, Locality-sensitive hashing scheme based on p-stable distributions, Proceedings of the Twentieth Annual Symposium on Computational Geometry, pp.253-262, 2004.

A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society. Series B (Methodological), vol.39, pp.1-38, 1977.

L. Ertöz, M. Steinbach, and V. Kumar, Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data, Proceedings of the 2003 SIAM International Conference on Data Mining, SIAM, pp.47-58, 2003.

L. Ertöz, M. Steinbach, and V. Kumar, Finding Topics in Collections of Documents: A Shared Nearest Neighbor Approach, Clustering and Information Retrieval, Network Theory and Applications, vol.11, 2004.

M. Ester, H. Kriegel, and J. Sander, A density-based algorithm for discovering clusters in large spatial databases with noise, Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD'96), pp.226-231, 1996.

S. Girotto, C. Pizzi, and M. Comin, MetaProb: Accurate metagenomic reads binning based on probabilistic sequence signatures, Bioinformatics, vol.32, pp.567-575, 2016.

A. Gkanogiannis, S. Gazut, and M. Salanoubat, A scalable assembly-free variable selection algorithm for biomarker discovery from metagenomes, BMC Bioinformatics, vol.17, p.311, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01868694

K. C. Gowda and G. Krishna, Agglomerative clustering using the concept of mutual nearest neighbourhood, Pattern Recognit., vol.10, pp.105-112, 1978.

S. Guha, R. Rastogi, and K. Shim, CURE: An efficient clustering algorithm for large databases, ACM SIGMOD Record, vol.27, pp.73-84, 1998.

S. Har-Peled, P. Indyk, and R. Motwani, Approximate nearest neighbor: Towards removing the curse of dimensionality, Theory Comput., vol.8, pp.321-350, 2012.

J. A. Hartigan and M. A. Wong, Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society. Series C (Applied Statistics), vol.28, issue.1, pp.100-108, 1979.

T. Hastie, R. Tibshirani, and J. Friedman, Overview of supervised learning, The Elements of Statistical Learning, pp.9-41, 2009.

T. Haveliwala, A. Gionis, and P. Indyk, Scalable techniques for clustering the web, Proceedings of the WebDB Workshop, pp.129-134, 2000.

A. Hinneburg and D. A. Keim, An efficient approach to clustering in large multimedia databases with noise, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, vol.98, pp.58-65, 1998.

M. Holtgrewe, Mason: A Read Simulator for Second Generation Sequencing Data, 2010.

Z. Huang, Clustering large data sets with mixed numeric and categorical values, Proceedings of the 1st Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp.21-34, 1997.

Z. Huang, Extensions to the k-means algorithm for clustering large data sets with categorical values, Data Mining and Knowledge Discovery, vol.2, pp.283-304, 1998.

L. Hubert and P. Arabie, Comparing partitions, J. Classif., vol.2, pp.193-218, 1985.

P. Indyk and R. Motwani, Approximate nearest neighbors: towards removing the curse of dimensionality, Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp.604-613, 1998.

A. K. Jain and R. C. Dubes, Algorithms for Clustering Data, 1988.

R. A. Jarvis and E. A. Patrick, Clustering using a similarity measure based on shared near neighbors, IEEE Trans. Comput., vol.C-22, pp.1025-1034, 1973.

G. Karypis, E. Han, and V. Kumar, Chameleon: Hierarchical clustering using dynamic modeling, Computer, vol.32, pp.68-75, 1999.

N. Katayama and S. Satoh, The SR-tree: An index structure for high-dimensional nearest neighbor queries, ACM SIGMOD Record, vol.26, pp.369-380, 1997.

L. Kaufman and P. J. Rousseeuw, Clustering by Means of Medoids, 1987.

H. Koga, T. Ishibashi, and T. Watanabe, Fast hierarchical clustering algorithm using locality-sensitive hashing, Discovery Science, pp.114-128, 2004.

T. Kohonen, Self-Organizing Maps, Springer Series in Information Sciences, vol.30, 2001.

S. Kotsiantis and P. Pintelas, Recent advances in clustering: A brief survey, WSEAS Transactions on Information Science and Applications, vol.1, pp.73-81, 2004.

H. Kriegel, P. Kröger, and A. Zimek, Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering, ACM Transactions on Knowledge Discovery from Data (TKDD), vol.3, issue.1, 2009.

J. Leskovec, A. Rajaraman, and J. D. Ullman, Mining of Massive Datasets, 2014.

W. Liao, Y. Liu, and A. Choudhary, A grid-based clustering algorithm using adaptive mesh refinement, 7th Workshop on Mining Scientific and Engineering Datasets of SIAM International Conference on Data Mining, pp.61-69, 2004.

Q. Lv, W. Josephson, and Z. Wang, Multi-probe LSH: Efficient indexing for high-dimensional similarity search, Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB Endowment, pp.950-961, 2007.

G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, vol.382, 2007.

P. Moëllic, J. Haugeard, and G. Pitel, Image clustering based on a shared nearest neighbors approach for tagged collections, Proceedings of the 2008 International Conference on Content-Based Image and Video Retrieval, pp.269-278, 2008.

R. Ounit, S. Wanamaker, and T. J. Close, CLARK: Fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers, BMC Genomics, vol.16, p.236, 2015.

A. K. Patidar, J. Agrawal, and N. Mishra, Analysis of different similarity measure functions and their impacts on shared nearest neighbor clustering approach, International Journal of Computer Applications, vol.40, pp.975-8887, 2012.

L. Paulevé, H. Jégou, and L. Amsaleg, Locality sensitive hashing: A comparison of hash function types and querying mechanisms, Pattern Recognit. Lett., vol.31, pp.1348-1358, 2010.

Z. Rasheed, H. Rangwala, and D. Barbará, Efficient clustering of metagenomic sequences using locality sensitive hashing, Proceedings of the 2012 SIAM International Conference on Data Mining, SIAM, pp.1023-1034, 2012.

L. Rokach and O. Maimon, Clustering methods, Data Mining and Knowledge Discovery Handbook, pp.321-352, 2005.

A. Rosenberg and J. Hirschberg, V-measure: A conditional entropy-based external cluster evaluation measure, Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, vol.7, pp.410-420, 2007.

G. Sheikholeslami, S. Chatterjee, and A. Zhang, WaveCluster: A multi-resolution clustering approach for very large spatial databases, VLDB 98, pp.428-439, 1998.
DOI : 10.1007/s007780050009

M. Steinbach, L. Ertöz, and V. Kumar, The challenges of clustering high dimensional data, New Directions in Statistical Physics, pp.273-309, 2004.

O. Tanaseichuk, J. Borneman, and T. Jiang, Separating metagenomic short reads into genomes via clustering, Algorithms Mol. Biol., vol.7, p.27, 2012.
DOI : 10.1186/1748-7188-7-27

URL : https://almob.biomedcentral.com/track/pdf/10.1186/1748-7188-7-27

J. Wang, H. T. Shen, and J. Song, Hashing for similarity search: A survey, 2014.

W. Wang, J. Yang, and R. Muntz, STING: A statistical information grid approach to spatial data mining, VLDB 97, pp.186-195, 1997.

Y. Wang, H. C. Leung, and S. M. Yiu, MetaCluster-TA: Taxonomic annotation for metagenomic data based on assembly-assisted binning, BMC Genomics, vol.15, p.12, 2014.
DOI : 10.1186/1471-2164-15-s1-s12

URL : https://bmcgenomics.biomedcentral.com/track/pdf/10.1186/1471-2164-15-S1-S12

D. E. Wood and S. L. Salzberg, Kraken: Ultrafast metagenomic sequence classification using exact alignments, Genome Biol., vol.15, p.46, 2014.
DOI : 10.1186/gb-2014-15-3-r46

URL : https://genomebiology.biomedcentral.com/track/pdf/10.1186/gb-2014-15-3-r46

J. Wu, The uniform effect of k-means clustering, Advances in K-means Clustering, pp.17-35, 2012.

Y. Wu, Y. Tang, and S. G. Tringe, MaxBin: An automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm, Microbiome, vol.2, p.26, 2014.
DOI : 10.1186/2049-2618-2-26

URL : https://microbiomejournal.biomedcentral.com/track/pdf/10.1186/2049-2618-2-26

B. Yang, Y. Peng, and H. Leung, MetaCluster: Unsupervised binning of environmental genomic fragments and taxonomic annotation, Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology, pp.170-179, 2010.
DOI : 10.1145/1651318.1651322

URL : http://europepmc.org/articles/pmc3165929?pdf=render

K. Y. Yeung, C. Fraley, and A. Murua, Model-based clustering and data transformations for gene expression data, Bioinformatics, vol.17, pp.977-987, 2001.
DOI : 10.1093/bioinformatics/17.10.977

URL : https://academic.oup.com/bioinformatics/article-pdf/17/10/977/698392/170977.pdf

K. Y. Yeung and W. L. Ruzzo, Details of the adjusted rand index and clustering algorithms, supplement to the paper: an empirical study on principal component analysis for clustering gene expression data, Bioinformatics, vol.17, pp.763-774, 2001.

T. Zhang, R. Ramakrishnan, and M. Livny, BIRCH: An efficient data clustering method for very large databases, ACM SIGMOD Record, vol.25, pp.103-114, 1996.