Citation: Ren-zhi Li, Bo-jie Li, Guo-zhen Zhang, Jun Jiang, Yi Luo. A High-Performance and Flexible Chemical Structure & Data Search Engine Built on CouchDB & ElasticSearch[J]. Chinese Journal of Chemical Physics, 2018, 31(3): 341-349. doi: 10.1063/1674-0068/31/cjcp1711202

A High-Performance and Flexible Chemical Structure & Data Search Engine Built on CouchDB & ElasticSearch

doi: 10.1063/1674-0068/31/cjcp1711202
Funds:  This work was supported by the National Natural Science Foundation of China, the Ministry of Science and Technology of China, and the Swedish Research Council.
  • Received Date: 2017-11-06
  • Revised Date: 2017-12-25
  • Abstract: Computer-assisted chemical structure searching plays a critical role in efficient structure screening in cheminformatics. We designed a high-performance chemical structure & data search engine called DCAIKU, built on the CouchDB and ElasticSearch engines. DCAIKU converts the chemical structure similarity search problem into a general text search problem so that it can use off-the-shelf full-text search engines. DCAIKU also supports flexible document structures and heterogeneous datasets with the help of a schema-less document database. Our evaluations show that DCAIKU can handle both keyword search and structural search against millions of records with both high accuracy and low latency. We expect that DCAIKU will lay the foundation for large-scale and cost-effective structural search in materials science and chemistry research.
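
The abstract does not spell out how a structure becomes text, so the following is only a minimal sketch of the general idea, not DCAIKU's actual encoding. It assumes, purely for illustration, that a molecule is given as a list of element symbols plus bond index pairs, and that each pair of atoms is turned into a word-like token of the form `element_distance_element`, where the distance is the number of bonds on the shortest path. The function names `atom_pair_tokens` and `tanimoto` are hypothetical. A full-text engine would index such tokens like ordinary words; here the ranking step is imitated with a plain set-overlap score.

```python
from collections import deque


def atom_pair_tokens(atoms, bonds):
    """Encode a molecular graph as a set of text tokens (illustrative scheme).

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs into `atoms`
    """
    # Build an adjacency list for the molecular graph.
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    def distances_from(start):
        # Breadth-first search gives shortest path lengths counted in bonds.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return dist

    tokens = set()
    for i in range(len(atoms)):
        for j, d in distances_from(i).items():
            if j > i:  # visit each unordered atom pair once
                a, b = sorted((atoms[i], atoms[j]))
                tokens.add(f"{a}_{d}_{b}")
    return tokens


def tanimoto(tokens_a, tokens_b):
    """Set-overlap similarity between two token sets (1.0 means identical)."""
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Heavy-atom skeletons only; hydrogens are left out to keep the example short.
ethanol = atom_pair_tokens(["C", "C", "O"], [(0, 1), (1, 2)])
methanol = atom_pair_tokens(["C", "O"], [(0, 1)])
print(sorted(ethanol))
print(tanimoto(ethanol, methanol))
```

Because the tokens are plain strings, they could be stored as a text field in a document database and queried with the same machinery used for keyword search, which is the kind of reuse the abstract describes.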