Spotlight search with ElasticSearch on Rails

by-elasticsearch, rails

If you need to add a Spotlight-like search feature to your Rails site, ElasticSearch is a good answer. However, there are several things you need to cover.

While we were working on a project, the client asked us to implement a search feature that works much like OS X Spotlight: instant results in a popup, search-as-you-type, and highlighted matches. We decided to use ElasticSearch. Here's the tale of what we did, the problems we found and how we solved them.

The scenario

The following image shows the intended design requested:

[Image: Design]

Search results should be taken from Enquiries and Bookings, both active and past (the latter called Alumni), and should show the student's name, their email, the reference the Enquiry or Booking is identified by, and an icon indicating its status. Alumni should be shown in a separate section of the results, to help the user tell them apart. In any field, the text matching the search input must be highlighted. Finally, when hovering over a search result, we want it to change color and be clickable, so we can visit the desired Booking or Enquiry.

The following Cucumber feature test gives an idea of the desired behaviour of the feature:

  @javascript
  Feature: Global search
    In order to access Bookings and Enquiries
    as a blog administrator
    I want to use a search form that returns classified rich results in real time

    Scenario: Search is present
      Given I am in the home page
      Then I should see a search field in the top menu

    Scenario: Use Search
      Given a booking exists
      Given an alumni exists
      Given an enquiry exists
      And I am in the home page
      Then I should see a search field in the top menu
      When I type in stud
      Then I should see search results for stud
      And I should see the highlighted stud
      And I type in student
      Then I should see search results for student
      And I should see the highlighted student

    Scenario: Empty results
      Given I am in the home page
      Then I should see a search field in the top menu
      When I type in Cinderella
      Then I should not see results for Cinderella

Setting up ElasticSearch

Local development

ElasticSearch is available for OS X, Unix and Windows. You can install it via a package manager (including Homebrew) or by downloading it from the project's website. Note that you need Java installed on your machine for ES to run.

Using ElasticSearch on a Ruby on Rails application

We chose the Tire gem, as we found it had more documentation and related posts on the web that could help us troubleshoot any problems we ran into. There are alternatives, such as ElasticSearch-Ruby, which is very similar to Tire (in fact, both are developed by Karmi).

Continuous Integration Server

This project uses CircleCI as its continuous integration server, and using ES on it is very easy. From the CircleCI documentation:

Several services are disabled by default because they're not commonly used, or because of memory requirements. We try to detect and enable them automatically, but in case we fail (or don't have inference in your language), you can enable them by adding to your circle.yml:

machine:
  services:
    - elasticsearch #version 0.90.2

Production server

The project is deployed on EngineYard, so we made use of this EngineYard recipe.

Adapting The Backend

Models

In order to get ElasticSearch to show results, we need to index our data first. This means our search does not hit our application database, but a separate data store maintained by ES, in which the selected information from our database is stored and kept up to date by our application.

To get our Bookings and Enquiries indexed in the ES database, we need to modify our models:

class Enquiry

  # This indicates that we are going to use Tire to index this model
  include Tire::Model::Search
  # This allows the model to be reindexed with its new info when modified
  include Tire::Model::Callbacks

  # We'll need this further on to specify which fields from the indexed data
  # we want to perform our search on
  SEARCHABLE_FIELDS = [:student_name, :student_email, :reference]
  SEARCHABLE_STATUS = ['new', 'open', 'sent_for_payment', 'requires_approval', 'rejected']

  # We setup a mapping to use ElasticSearch built-in tokenizers and analyzers
  # for certain fields
  settings ElasticSearchSettings do
    mapping do
      indexes :student_email, type: 'string', analyzer: 'email_analyzer'
    end
  end

  # This is used to limit and control which attributes are indexed
  # and the info indexed depending on the model status
  def to_indexed_json
    reduced_info = {
      id: _id.to_s,
      student_name: self.student_name,
      student_email: self.student_email,
      category: "enquiries",
      reference: self.reference,
      status: self.status
    }
    reduced_info.to_json
  end
  ...
end

The changes made to the Booking model are very similar:

class Booking

  include Tire::Model::Search
  include Tire::Model::Callbacks

  SEARCHABLE_FIELDS = [:student_name, :student_email, :reference]

  settings ElasticSearchSettings do
    mapping do
      indexes :student_email, type: 'string', analyzer: 'email_analyzer'
    end
  end

  def to_indexed_category
    if self.end_date_inclusive.past? && !self.cancelled?
      "alumni"
    else
      "bookings"
    end
  end

  def to_indexed_json
    reduced_info = {
      id: _id.to_s,
      student_name: self.student_name,
      student_email: self.student.try(:email),
      category: self.to_indexed_category,
      reference: self.reference,
      status: self.cancelled? ? "cancelled" : "active"
    }
    reduced_info.to_json
  end
  ...
end

Configuration

We'll need to configure the aforementioned email_analyzer in config/elastic_search.yml so that it uses the proper ES tokenizer and is case insensitive:

analysis:
 analyzer:
   email_analyzer:
     tokenizer: "uax_url_email"
     filter: ["lowercase"]

We also need a config/initializers/elasticsearch.rb:

ElasticSearchSettings = YAML.load_file('config/elastic_search.yml').with_indifferent_access
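
For reference, this is roughly the hash that ends up in ElasticSearchSettings and is handed to the settings call in the models. The sketch below is plain Ruby without Rails: it parses the same YAML from an inline string (the constant name is ours), so the keys are plain strings rather than indifferent-access keys.

```ruby
require "yaml"

# Same content as config/elastic_search.yml, inlined so the snippet runs standalone.
ELASTIC_SEARCH_YAML = <<~YAML
  analysis:
    analyzer:
      email_analyzer:
        tokenizer: "uax_url_email"
        filter: ["lowercase"]
YAML

settings = YAML.safe_load(ELASTIC_SEARCH_YAML)
email_analyzer = settings["analysis"]["analyzer"]["email_analyzer"]

puts email_analyzer["tokenizer"]       # name of the tokenizer ES will use
puts email_analyzer["filter"].inspect  # token filters applied after tokenizing
```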

After that, in order to get all our DB data into ES, we need to call a rake task:

$ rake environment tire:import:all

The controller

Apart from the Cucumber scenario, there were some requirements our search needed to satisfy, and these search_controller#perform tests were designed to cover them. Note that for each test about a specific aspect of the Bookings results, there is a very similar one for Enquiries and for past bookings (Alumni).

it "#perform will not respond to an empty query" do
  post :perform
  assert_response 406
end

it "#perform will not produce results for a query with less than 3 chars" do
  post :perform, :query => 'Jo'
  assert_nil assigns(:results)
end

it "#perform will not produce results for something that doesn't exist on the database" do
  post :perform, :query => 'John Doe'
  assert_nil assigns(:results)
end

it "#perform will produce results for bookings" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_name
  assigns(:results)[:bookings].first[:student_name].must_equal booking.student_name
end

it "#perform will not produce results for queries that don't have matches in the database" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => "Student 0"
  assert_nil assigns(:results)
end

it "#perform on Bookings will return the correct fields" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_name
  expected_keys = ["id", "category", "link", "student_name", "student_email", "reference", "status", "status_icon"]
  assigns(:results)[:bookings].first.keys.must_equal expected_keys
end

it "#perform won't return more than six results in Bookings search" do
  bookings = FactoryGirl.create_list(:booking, 10)
  post :perform, :query => "Student"
  assigns(:results)[:bookings].size.must_be :>=, 1
  assigns(:results)[:bookings].size.must_be :<=, 6
end

# We need one like this for each field returned
it "#perform on Bookings will return a link to the booking" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_name
  assigns(:results)[:bookings].first[:link].must_equal edit_or_show_path(booking)
end

# We need one like this for each field returned that uses highlight
it "#perform on Bookings will return the email correctly highlighted" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_email
  assigns(:results)[:bookings].first[:student_email].must_equal "<b>#{booking.student_email}</b>"
end

Below is a dissected version of our perform search method in search_controller.rb:

# As per requirements, we won't perform any search if the query is shorter than 3 characters
if query.present? && query.length >= 3

  booking_formatted_query = query_from_tokens(Booking::SEARCHABLE_FIELDS, query, "category:bookings")

Let's stop there. To create a query that Tire can process and that gives us the right results, we need to combine the query with the searchable fields we defined on the model, and then combine all of that with any scope limit we want to impose. So if we want to search for "John", and in this case we don't care about the status of the booking, query_from_tokens will return a query string like this:

(student_name:John OR student_email:John OR reference:John) AND (category:bookings)
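
The helper itself is not shown in this post, so here is a minimal sketch of how something like query_from_tokens could be written. It is a reconstruction under assumptions, not our exact code: it splits the input on whitespace, builds one OR group per token across the given fields, and appends the optional scope clause.

```ruby
# Hypothetical reconstruction of query_from_tokens.
# fields: attribute names to search on
# query:  raw user input
# scope:  optional extra clause, e.g. "category:bookings"
# Note: this sketch does not escape characters that are special in the
# query string syntax (such as "+", "-" or ":").
def query_from_tokens(fields, query, scope = nil)
  clauses = query.split(/\s+/).reject(&:empty?).map do |token|
    "(" + fields.map { |field| "#{field}:#{token}" }.join(" OR ") + ")"
  end
  clauses << "(#{scope})" if scope
  clauses.join(" AND ")
end

SEARCHABLE_FIELDS = [:student_name, :student_email, :reference]

puts query_from_tokens(SEARCHABLE_FIELDS, "John", "category:bookings")
# prints: (student_name:John OR student_email:John OR reference:John) AND (category:bookings)
```

With several words in the input, each word gets its own OR group and the groups are joined with AND, so every word has to match at least one of the fields.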

Now we have the real query that is fed to ElasticSearch. We use the Tire DSL and its methods to get our results:

  bookings = Booking.search do
    query { string booking_formatted_query, :default_operator => "AND" }
    # Limits the number of results we will get, we had six as per requirements
    size 6
  end

We perform a similar search on the Enquiries and Alumni bookings:

  enquiry_formatted_query = query_from_tokens(Enquiry::SEARCHABLE_FIELDS, query, Enquiry::SEARCHABLE_STATUS.map{|s| "status: #{s}"}.join(" OR "))
  enquiries = Enquiry.search do
    query { string enquiry_formatted_query, :default_operator => "AND" }
    size 6
  end

  alumni_formatted_query = query_from_tokens(Booking::SEARCHABLE_FIELDS, query, "category:alumni")
  alumni = Booking.search do
    query { string alumni_formatted_query, :default_operator => "AND" }
    size 6
  end

After that, we get our results together, and put them in a hash object that we will feed as a JSON object to our front-end:

  returned_results = [enquiries, bookings, alumni].flatten.compact

  @results = {}

  query_regexp = query.split(/\s+/).map{|e| "(#{e})"}.join("|")

  returned_results.each do |result|
    @results[result["category"].to_sym] ||= []
    @results[result["category"].to_sym] << {
      id: result["id"],
      category: result["category"],
      link: edit_or_show_path(result),
      student_name: result["student_name"].gsub(/#{query_regexp}/i, '<b>\+</b>'),
      student_email: result["student_email"].gsub(/#{query_regexp}/i, '<b>\+</b>'),
      reference: result["reference"].gsub(/#{query_regexp}/i, '<b>\+</b>'),
      status: result["status"],
      status_icon: status_to_icon(result["status"]).first.capitalize
    }
  end

Why are we not using nGrams and ElasticSearch highlighting?

It's worth noting here that our index is only composed of names, emails and reference numbers.

We tried nGrams, but because of the nature of our index, the results were very unpredictable. Also, since our index is quite small and targeted, we feel confident that our regexp-based approach won't have a massive performance impact.

We also tried ElasticSearch highlighting, but it highlights the whole word found. For example, searching for 'cind' would highlight the whole word 'Cinderella', which we didn't want. This is why we reverted to our own highlighting system.
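
To make the difference concrete, here is a small standalone sketch of partial highlighting in plain Ruby. The highlight helper is a name made up for this example, and it differs from the controller code in two ways: it escapes the tokens with Regexp.escape, so characters such as "." in an email are matched literally, and it uses the block form of gsub.

```ruby
# Wraps every occurrence of any query token in <b> tags, ignoring case.
# Only the matched fragment is wrapped, not the whole word.
# Note: the text is not HTML-escaped here; a real view should take care of that.
def highlight(text, query)
  tokens = query.split(/\s+/).reject(&:empty?)
  return text if tokens.empty?

  pattern = Regexp.new(tokens.map { |t| Regexp.escape(t) }.join("|"), Regexp::IGNORECASE)
  text.gsub(pattern) { |match| "<b>#{match}</b>" }
end

puts highlight("Cinderella", "cind")  # prints: <b>Cind</b>erella
puts highlight("John Doe", "jo do")   # prints: <b>Jo</b>hn <b>Do</b>e
```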

Changing The Frontend

To the controller and back again

Adding a search field to our layout was fairly easy, but we needed it to work in a "search-as-you-type" fashion. For this, we handle the keyup event with JavaScript, triggering an AJAX search request on each keystroke once at least 3 characters have been typed:

  $("#spotlight-search-query").on "keyup", (e) ->
    if $.trim($(this).val()).length > 2
      $("form#spotlight-search").submit()
    else
      $(".search-results-popover").hide()

The submit handler is in charge of making the request, receiving the JSON with the results and passing them to the view template.

  $("form#spotlight-search").on "submit", (e) ->
    e.preventDefault()

    $.post($(this).attr("action"), {query: $("#spotlight-search-query").val()}, (data) ->
      content = $("<div class='search-results'>")
      receiver = $(".search-results-popover")

      searchResults = $("#spotlight-search-query")
      if data?
        for result in ['enquiries', 'bookings', 'alumni']
          if data[result]?
            content.append(
              JST['search_results']({
                results: data[result]
                _type: result
              })
            )
      else
        content.append("<p class='no-results'>No results found.</p>")

      receiver.html(content)
      receiver.show()

    , "json").done(->
    ).fail(->
    ).always(->
    )

Testing ALL THE THINGS!

Testing our Rails application with the ES dependence can be tricky to start with. We followed this article on BitsAndBit and this post on StackOverflow. Here are the highlights:

Indexes uniqueness

To prevent our development search indexes from being wiped out on each test run, we needed to define separate search indexes for tests. This can be solved by adding an index prefix to your ElasticSearch initializer:

Tire::Model::Search.index_prefix "#{Rails.application.class.parent_name.downcase}_#{Rails.env.to_s.downcase}"
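
As a quick illustration of what that prefix evaluates to, here is a plain Ruby sketch. The application name "StudentLodgings" is made up, and the final index name assumes that Tire joins the prefix and the pluralized model name with an underscore, which is how we understand its default naming.

```ruby
# Mirrors the interpolation used in the initializer, without Rails.
# app_name stands in for Rails.application.class.parent_name,
# env stands in for Rails.env.
def index_prefix(app_name, env)
  "#{app_name.downcase}_#{env.to_s.downcase}"
end

%w[development test].each do |env|
  prefix = index_prefix("StudentLodgings", env)
  # Assumption: the index name is "<prefix>_<pluralized model name>".
  puts [prefix, "bookings"].join("_")
end
```

This way the test run works on its own set of indexes and never touches the development ones.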

Clean indexes on each run

Also, when testing, we need clean search indexes. If you're using RSpec, you can do this in a before :all block, but we are using Minitest, which doesn't allow :all blocks, so we chose to add it at the top of our search_controller_test.rb. In any case, be sure to do it just once (i.e. not in a before :each block):

[...]
describe SearchController do

  #Set up the test indexes
  indexed_models_list = [Booking, Enquiry]
  indexed_models_list.each do |klass|
    # make sure that the current model is using tire
    if klass.respond_to? :tire
      # delete the index for the current model
      klass.tire.index.delete
      # the mapping definition must get executed again. for that, we reload the model class.
      # silenced warning to avoid warning due to constant redefinition on files load.
      silence_warnings do
        load File.expand_path("../../../app/models/#{klass.name.downcase}.rb", __FILE__)
      end
    end
  end

  before do
  [...]

Manually refresh indexes

Note that ES indexes are automatically refreshed every second. When testing, many assertions run within that second, so there's a chance that the changes we expect to test are not yet searchable in ElasticSearch. To solve this, we need to manually refresh the search index after each change to our test database. So, a test like this:

it "#perform on Bookings will return the email correctly highlighted" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_email
  assigns(:results)[:bookings].first[:student_email].must_equal "<b>#{booking.student_email}</b>"
end

Turns into this:

it "#perform on Bookings will return the email correctly highlighted" do
  booking = FactoryGirl.create(:booking)
  Booking.tire.index.refresh
  post :perform, :query => booking.student_email
  assigns(:results)[:bookings].first[:student_email].must_equal "<b>#{booking.student_email}</b>"
end
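
If several tests create records for more than one model, the refresh calls can be wrapped in a small helper. This is only a sketch: the helper name is made up, and it relies on nothing but the Model.tire.index.refresh call shown in the test.

```ruby
# Refreshes the search index of each given model, so that documents indexed
# just before the call are visible to the next search.
# Each model is expected to respond to .tire.index.refresh.
def refresh_search_indexes(*models)
  models.each { |model| model.tire.index.refresh }
end
```

In a test it would be called right after the factories, for example refresh_search_indexes(Booking, Enquiry).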

Allow calls to ES

Finally, we need to configure WebMock in our test helper to allow requests to ElasticSearch from within our tests:

WebMock.disable_net_connect!(:allow => "localhost:9200")

Wait for JS to do its thing

When writing the feature test, we realized we needed to give the JavaScript call time to execute, return, and present the results. The wait_until method was really helpful here. Note, however, that wait_until was removed in Capybara 2.0, so it is not available in recent versions.

When /^I type in (.*)$/ do |search_term|
  page.fill_in "query", :with => search_term
  wait_until {page.find(".search-results").visible?}
end

Then /^I should see search results for (.*)$/ do |search_term|
  wait_until {page.find(".search-results").visible?}
  within('.search-results') do
    page.should have_selector('li', text: /#{search_term}/i)
  end
end

Testing for the highlighted result is easy; you just need to look for the right tag:

And /^I should see the highlighted (.*)$/ do |search_term|
  wait_until {page.find(".search-results").visible?}
  within('.search-results') do
    page.should have_selector('b', text: /#{search_term}/i)
  end
end
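
If you are on a Capybara version without wait_until, a similar helper is easy to write yourself. The following is a minimal sketch in plain Ruby, not Capybara's original implementation; the timeout and interval defaults are arbitrary. Keep in mind that in Capybara 2 and later, finders and matchers such as find and have_selector already wait for elements to appear, so you may not need it at all.

```ruby
# Polls the block until it returns a truthy value or the timeout expires.
# Returns the block's value, or raises once the deadline has passed.
def wait_until(timeout: 2, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end
```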
