pjstadig/es

A simple REST-based Elasticsearch client for Clojure.


License
MPL-2.0

Documentation

es

A simple REST-based Elasticsearch client for Clojure.

It does not support every Elasticsearch operation or API, but if there are operations or APIs you’d like to see supported, open an issue or send a pull request.

Tested with Elasticsearch 0.90.10.

Example

pjstadig.es> (index-create "http://localhost:9200/" "twitter"
                           :settings {:index {:number_of_replicas 1}}
                           :mappings {:tweet {:properties {:user {:type :string :index :not_analyzed}
                                                           :post_date {:type :date :format "YYYY-MM-DD'T'HH:mm:ss"}
                                                           :message {:type :string}}}})
{"ok" true, "acknowledged" true}

doc-create, doc-index, doc-delete perform the create, index, and delete operations (respectively) for a document.

pjstadig.es> (doc-create "http://localhost:9200/" "twitter"
                         {:_type "tweet"
                          :_id 425444752310276096
                          :user "wm"
                          :post_date "2014-01-20T20:48:00"
                          :message "Really excited about what I'm reading to the kids tonight. pic.twitter.com/BafBIAbHSU"})
{"ok" true
 "_index" "twitter"
 "_type" "tweet"
 "_id" "425444752310276096"
 "_version" 1}

search-once will return a single page of raw Elasticsearch results.

pjstadig.es> (search-once "http://localhost:9200/" "twitter" (q/match-all))
{"hits"
 {"hits"
  [{"_score" 1.0,
    "_index" "twitter",
    "_type" "tweet",
    "_source"
    {"message" "Really excited about what I'm reading to the kids tonight. pic.twitter.com/BafBIAbHSU",
     "user" "wm",
     "post_date" "2014-01-20T20:48:00",
     "_type" "tweet",
     "_id" 425444752310276096},
    "_id" "425444752310276096"}],
  "total" 1,
  "max_score" 1.0},
 "timed_out" false,
 "_shards" {"total" 5, "successful" 5, "failed" 0},
 "took" 1}

You can use the convenience function search-hits to get back the hits from the results. By default, if no fields are specified, the entire _source will be returned.

pjstadig.es> (search-hits (search-once "http://localhost:9200/" "twitter"
                                       (q/match-all)))
({"post_date" "2014-01-20T20:48:00",
  "message" "Really excited about what I'm reading to the kids tonight. pic.twitter.com/BafBIAbHSU",
  "_id" 425444752310276096,
  "user" "wm",
  "_type" "tweet"})

Otherwise the fields specified will be returned.

pjstadig.es> (search-hits (search-once "http://localhost:9200/" "twitter"
                                       (q/match-all)
                                       :fields [:message]))
({"message" "Really excited about what I'm reading to the kids tonight. pic.twitter.com/BafBIAbHSU"})

doc-bulk will perform bulk operations. The doc-bulk-create, doc-bulk-index, and doc-bulk-delete convenience functions will create a create, index, and delete bulk operation (respectively) for use with doc-bulk.

pjstadig.es> (doc-bulk "http://localhost:9200/" "twitter"
                       [(doc-bulk-index {:_type "tweet"
                                         :_id 425444752310276096
                                         :user "wm"
                                         :post_date "2014-01-20T20:48:00"
                                         :message "Really excited about what I'm reading to the kids tonight. pic.twitter.com/BafBIAbHSU"})])
{"took" 3,
 "items" [{"index"
           {"_index" "twitter"
            "_type" "tweet"
            "_id" "425444752310276096"
            "_version" 2
            "ok" true}}]}

doc-bulk consumes, encodes, and transmits the bulk operations to Elasticsearch incrementally. So you can produce a lazy sequence of bulk operations and it will not be realized into memory all at once.

NOTE: Depending on how much data you are sending, you will probably need to tweak the :socket-timeout configuration value in *default-http-options*. Otherwise the socket may timeout waiting for ES to execute the operations.

pjstadig.es> (binding [*default-http-options* (assoc *default-http-options* :socket-timeout 45000)] 
               (doc-bulk "http://localhost:9200/" "twitter"
                         (map doc-bulk-index
                              (map #(hash-map :_type "tweet"
                                              :_id %
                                              :post_date "2014-01-01T12:00:00"
                                              :message (str "foo" %))
                                   (range 500000)))))
{"took" 15761,
 "items"
 [{"index"
   {"_index" "twitter",
    "_type" "tweet",
    "_id" "0",
    "_version" 7,
    "ok" true}}
  {"index"
   {"_index" "twitter",
    "_type" "tweet",
    "_id" "1",
    "_version" 7,
    "ok" true}}
  {"index"
   {"_index" "twitter",
    "_type" "tweet",
    "_id" "2",
    "_version" 7,
    "ok" true}}
  {"index"
   {"_index" "twitter",
    "_type" "tweet",
    "_id" "3",
    "_version" 7,
    "ok" true}}
  {"index"
   {"_index" "twitter",
    "_type" "tweet",
    "_id" "4",
    "_version" 7,
    "ok" true}}
  ...]}

If you have a large non-bulk operation you can also stream it using the :stream? option.

pjstadig.es> (doc-index "http://localhost:9200/" "twitter"
                        {:_type "tweet"
                         :_id 425444752310276096
                         :user "wm"
                         :post_date "2014-01-20T20:48:00"
                         :message "Really excited about what I'm reading to the kids tonight. pic.twitter.com/BafBIAbHSU"}
                        :stream? true)
{"ok" true
 "_index" "twitter"
 "_type" "tweet"
 "_id" "425444752310276096"
 "_version" 3}

Instead of search-once you can use search to return a lazy sequence of all of the pages of results for a search query. You can also use the search-hits convenience function to get the hits from the result.

pjstadig.es> (search-hits (search "http://localhost:9200/" "twitter"
                                  (q/match-all)
                                  :fields [:message]))
({"message" "foo10000"}
 {"message" "foo10005"}
 {"message" "foo10012"}
 {"message" "foo10017"}
 {"message" "foo10024"}
 ...)

License

Copyright © 2013-2014 Paul Stadig. All rights reserved.

This Source Code Form is subject to the terms of the Mozilla Public License,
v. 2.0. If a copy of the MPL was not distributed with this file, You can
obtain one at http://mozilla.org/MPL/2.0/.

This Source Code Form is "Incompatible With Secondary Licenses", as defined
by the Mozilla Public License, v. 2.0.