{"id":4,"date":"2007-03-17T22:10:46","date_gmt":"2007-03-18T03:10:46","guid":{"rendered":"http:\/\/wp.javatechniques.com\/public\/lucene-in-memory-text-search-example\/"},"modified":"2007-06-25T13:15:12","modified_gmt":"2007-06-25T18:15:12","slug":"lucene-in-memory-text-search-example","status":"publish","type":"page","link":"https:\/\/javatechniques.com\/blog\/lucene-in-memory-text-search-example\/","title":{"rendered":"Lucene In-Memory Text Search Example"},"content":{"rendered":"<p>The <A HREF=\"http:\/\/jakarta.apache.org\/lucene\/\">Lucene<\/A> text search engine library (from the Apache Jakarta project) provides fast and flexible search capabilities that can be easily integrated into many kinds of applications. Lucene provides a number of advanced capabilities &#8220;out of the box&#8221;, and can be extended to accommodate special needs.<\/p>\n<p>For large text collections, you will almost always want to use disk-based indexes that can be updated and reused across multiple executions of an application. For small collections, especially when running in an unsigned applet or WebStart application where disk access is not permitted, Lucene provides a mechanism for maintaining an in-memory index. The example below provides a simple illustration of this capability.<\/p>\n<p>At a minimum, using Lucene typically involves the following steps:<br \/>\n<OL><br \/>\n<LI> Build an index using <CODE>IndexWriter<\/CODE><br \/>\n<OL><br \/>\n<LI> For file-based indexes, a directory name can be passed to the <CODE>IndexWriter<\/CODE> constructor. In this example, however, we use the <CODE>RAMDirectory<\/CODE> class to maintain an in-memory index.<br \/>\n<LI> For each item to be searched, add a <CODE>Document<\/CODE> object to the <CODE>IndexWriter<\/CODE>. A <CODE>Document<\/CODE> is a collection of <CODE>Field<\/CODE> objects. 
Static factory methods of <CODE>Field<\/CODE> (for example, <CODE>Field.Text<\/CODE> and <CODE>Field.UnIndexed<\/CODE>) create indexed or unindexed fields.<br \/>\n<LI> Optimize and close the <CODE>IndexWriter<\/CODE> object.<br \/>\n<\/OL><br \/>\n<LI> Update the index by either rebuilding it from scratch or deleting (and, where appropriate, re-adding) <CODE>Document<\/CODE>s. Somewhat unintuitively, <CODE>Document<\/CODE>s are deleted from an index with an <CODE>IndexReader<\/CODE> object, while re-adding them is done with an <CODE>IndexWriter<\/CODE>.<br \/>\n<LI> Search the index using an <CODE>IndexSearcher<\/CODE> object.<br \/>\n<OL><br \/>\n<LI> As with <CODE>IndexWriter<\/CODE>s, <CODE>IndexSearcher<\/CODE>s can be constructed with a directory name for file-based indexes. In this example, we pass in the <CODE>RAMDirectory<\/CODE> object that we created when the index was built.<br \/>\n<LI> A <CODE>Query<\/CODE> object encapsulates the search query. It can be created using the <CODE>QueryParser<\/CODE> class.<br \/>\n<LI> The <CODE>Query<\/CODE> object is passed to the <CODE>IndexSearcher<\/CODE>&#8217;s <CODE>search(&#8230;)<\/CODE> method, which returns a <CODE>Hits<\/CODE> object that provides access to the <CODE>Document<\/CODE> objects that match the query.<br \/>\n<\/OL><br \/>\n<\/OL><br \/>\nThere are ways to customize practically every aspect of Lucene. 
The example in Figure 1 illustrates a minimal usage of the library.<br \/>\n<HR><\/p>\n<pre>\r\n\r\n\/**\r\n * A simple example of an in-memory search using Lucene.\r\n *\/\r\nimport java.io.IOException;\r\nimport java.io.StringReader;\r\n\r\nimport org.apache.lucene.search.Hits;\r\nimport org.apache.lucene.search.Query;\r\nimport org.apache.lucene.document.Field;\r\nimport org.apache.lucene.search.Searcher;\r\nimport org.apache.lucene.index.IndexWriter;\r\nimport org.apache.lucene.document.Document;\r\nimport org.apache.lucene.store.RAMDirectory;\r\nimport org.apache.lucene.search.IndexSearcher;\r\nimport org.apache.lucene.queryParser.QueryParser;\r\nimport org.apache.lucene.queryParser.ParseException;\r\nimport org.apache.lucene.analysis.standard.StandardAnalyzer;\r\n\r\npublic class InMemoryExample {\r\n\r\n    public static void main(String[] args) {\r\n        \/\/ Construct a RAMDirectory to hold the in-memory representation\r\n        \/\/ of the index.\r\n        RAMDirectory idx = new RAMDirectory();\r\n\r\n        try {\r\n            \/\/ Make a writer to create the index (the final argument,\r\n            \/\/ true, tells it to create a new, empty index)\r\n            IndexWriter writer = \r\n                new IndexWriter(idx, new StandardAnalyzer(), true);\r\n\r\n            \/\/ Add some Document objects containing quotes\r\n            writer.addDocument(createDocument(\"Theodore Roosevelt\", \r\n                \"It behooves every man to remember that the work of the \" +\r\n                \"critic, is of altogether secondary importance, and that, \" +\r\n                \"in the end, progress is accomplished by the man who does \" +\r\n                \"things.\"));\r\n            writer.addDocument(createDocument(\"Friedrich Hayek\",\r\n                \"The case for individual freedom rests largely on the \" +\r\n                \"recognition of the inevitable and universal ignorance \" +\r\n                \"of all of us concerning a great many of the factors on \" +\r\n                \"which the achievements of our ends 
and welfare depend.\"));\r\n            writer.addDocument(createDocument(\"Ayn Rand\",\r\n                \"There is nothing to take a man's freedom away from \" +\r\n                \"him, save other men. To be free, a man must be free \" +\r\n                \"of his brothers.\"));\r\n            writer.addDocument(createDocument(\"Mohandas Gandhi\",\r\n                \"Freedom is not worth having if it does not connote \" +\r\n                \"freedom to err.\"));\r\n\r\n            \/\/ Optimize and close the writer to finish building the index\r\n            writer.optimize();\r\n            writer.close();\r\n\r\n            \/\/ Build an IndexSearcher using the in-memory index\r\n            Searcher searcher = new IndexSearcher(idx);\r\n\r\n            \/\/ Run some queries\r\n            search(searcher, \"freedom\");\r\n            search(searcher, \"free\");\r\n            search(searcher, \"progress or achievements\");\r\n\r\n            searcher.close();\r\n        }\r\n        catch(IOException ioe) {\r\n            \/\/ In this example we aren't really doing any I\/O, so this\r\n            \/\/ exception should never actually be thrown.\r\n            ioe.printStackTrace();\r\n        }\r\n        catch(ParseException pe) {\r\n            pe.printStackTrace();\r\n        }\r\n    }\r\n\r\n    \/**\r\n     * Make a Document object with an unindexed title field and an\r\n     * indexed content field.\r\n     *\/\r\n    private static Document createDocument(String title, String content) {\r\n        Document doc = new Document();\r\n\r\n        \/\/ Add the title as an unindexed field...\r\n        doc.add(Field.UnIndexed(\"title\", title));\r\n\r\n        \/\/ ...and the content as an indexed field. Because it is built\r\n        \/\/ from a Reader, this Text field is tokenized and indexed but\r\n        \/\/ not stored verbatim, so Lucene can index very large chunks\r\n        \/\/ of text without keeping a full copy in the index. 
In this example we\r\n        \/\/ can just wrap the content string in a StringReader.\r\n        doc.add(Field.Text(\"content\", new StringReader(content)));\r\n\r\n        return doc;\r\n    }\r\n\r\n    \/**\r\n     * Searches for the given string in the \"content\" field.\r\n     *\/\r\n    private static void search(Searcher searcher, String queryString) \r\n        throws ParseException, IOException {\r\n\r\n        \/\/ Build a Query object\r\n        Query query = QueryParser.parse(\r\n            queryString, \"content\", new StandardAnalyzer());\r\n\r\n        \/\/ Search for the query\r\n        Hits hits = searcher.search(query);\r\n\r\n        \/\/ Examine the Hits object to see if there were any matches\r\n        int hitCount = hits.length();\r\n        if (hitCount == 0) {\r\n            System.out.println(\r\n                \"No matches were found for \\\"\" + queryString + \"\\\"\");\r\n        }\r\n        else {\r\n            System.out.println(\"Hits for \\\"\" + \r\n                queryString + \"\\\" were found in quotes by:\");\r\n\r\n            \/\/ Iterate over the Documents in the Hits object\r\n            for (int i = 0; i &lt; hitCount; i++) {\r\n                Document doc = hits.doc(i);\r\n\r\n                \/\/ Print the value that we stored in the \"title\" field. Note\r\n                \/\/ that this Field was not indexed, but (unlike the\r\n                \/\/ \"content\" field) was stored verbatim and can be\r\n                \/\/ retrieved.\r\n                System.out.println(\"  \" + (i + 1) + \". \" + doc.get(\"title\"));\r\n            }\r\n        }\r\n        System.out.println();\r\n    }\r\n}\r\n<\/pre>\n<p><HR><br \/>\n<CENTER>Figure 1. 
A simple example of using Lucene for an in-memory text search.<\/CENTER><\/p>\n<p>To compile and run this class, you will need to include the Lucene jar file (downloaded from <A HREF=\"http:\/\/jakarta.apache.org\/lucene\/\">http:\/\/jakarta.apache.org\/lucene\/<\/A>) in your classpath. Note that the example is written against the Lucene 1.x API: <CODE>Field.Text<\/CODE>, <CODE>Field.UnIndexed<\/CODE> and the static <CODE>QueryParser.parse<\/CODE> method were deprecated in Lucene 1.9 and removed in 2.0, so compiling it as shown requires a 1.x jar. Figure 2 shows the output from the class.<BR><\/p>\n<p><HR><\/p>\n<pre>\r\nHits for \"freedom\" were found in quotes by:\r\n  1. Mohandas Gandhi\r\n  2. Ayn Rand\r\n  3. Friedrich Hayek\r\n\r\nHits for \"free\" were found in quotes by:\r\n  1. Ayn Rand\r\n\r\nHits for \"progress or achievements\" were found in quotes by:\r\n  1. Theodore Roosevelt\r\n  2. Friedrich Hayek\r\n<\/pre>\n<p><HR\/><br \/>\n<CENTER>Figure 2. Output from an execution of the <CODE>InMemoryExample<\/CODE> class.<\/CENTER><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Lucene text search engine library (from the Apache Jakarta project) provides fast and flexible search capabilities that can be easily integrated into many kinds of applications. Lucene provides a number of advanced capabilities &#8220;out of the box&#8221;, and can be extended to accommodate special needs. 
For large text collections, you will almost always want &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/javatechniques.com\/blog\/lucene-in-memory-text-search-example\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Lucene In-Memory Text Search Example&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"open","template":"","meta":{"footnotes":""},"class_list":["post-4","page","type-page","status-publish","hentry","entry"],"_links":{"self":[{"href":"https:\/\/javatechniques.com\/blog\/wp-json\/wp\/v2\/pages\/4","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/javatechniques.com\/blog\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/javatechniques.com\/blog\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/javatechniques.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/javatechniques.com\/blog\/wp-json\/wp\/v2\/comments?post=4"}],"version-history":[{"count":0,"href":"https:\/\/javatechniques.com\/blog\/wp-json\/wp\/v2\/pages\/4\/revisions"}],"wp:attachment":[{"href":"https:\/\/javatechniques.com\/blog\/wp-json\/wp\/v2\/media?parent=4"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}