If you need to add a Spotlight-like search feature to your Rails site, ElasticSearch is a good answer. However, there are several things you need to cover.
While we were working on a project, the client asked us to implement a search feature that should work much like OS X’s Spotlight: search-as-you-type with instant results in a popup, sugared with highlighting of the matching text. We decided to use ElasticSearch. Here’s the tale of what we did, the problems we found and how we solved them.
The scenario
The following image shows the requested design:
Search results should come from Enquiries and from Bookings, both active and past (past bookings are called Alumni). Each result should show the student’s name, their email, the reference that identifies the Enquiry or Booking, and an icon indicating its status. Alumni should appear in a separate section of the results so the user can tell them apart. In every field, the text matching the search input must be highlighted. Finally, when hovering over a search result, it should change color and be clickable, so we can visit the selected booking or enquiry.
The following Cucumber feature test gives an idea of the desired behaviour of the feature:
@javascript
Feature: Global search
  In order to access Bookings and Enquiries
  As an administrator
  I want to use a search form that returns classified rich results in real time

  Scenario: Search is present
    Given I am in the home page
    Then I should see a search field in the top menu

  Scenario: Use Search
    Given a booking exists
    Given an alumni exists
    Given an enquiry exists
    And I am in the home page
    Then I should see a search field in the top menu
    When I type in stud
    Then I should see search results for stud
    And I should see the highlighted stud
    And I type in student
    Then I should see search results for student
    And I should see the highlighted student

  Scenario: Empty results
    Given I am in the home page
    Then I should see a search field in the top menu
    When I type in Cinderella
    Then I should not see results for Cinderella
Setting up ElasticSearch
Local development
ElasticSearch is available for OS X, Unix and Windows. You can install it through a package manager (such as Homebrew on OS X) or by downloading it from the ElasticSearch website. Note that ES runs on Java, so you need a Java runtime installed on your machine.
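Once ES is running it listens on port 9200 by default, and a GET on the root URL returns a small JSON document that includes the version number. A minimal sketch for checking this from Ruby with only the standard library; the helper name elasticsearch_version is our own, and the default URL assumes a stock local install:

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: returns the ES version string (e.g. "0.90.2") if a node
# answers at the given URL, or nil if nothing usable is listening there.
def elasticsearch_version(url = "http://localhost:9200")
  response = Net::HTTP.get_response(URI(url))
  return nil unless response.is_a?(Net::HTTPSuccess)

  JSON.parse(response.body).dig("version", "number")
rescue SystemCallError, SocketError, Net::OpenTimeout, Net::ReadTimeout, JSON::ParserError
  # Connection refused, DNS failure, timeout or a non-JSON body
  nil
end

puts elasticsearch_version || "ElasticSearch is not reachable"
```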
Using ElasticSearch on a Ruby on Rails application
We chose the Tire gem because we found more documentation and blog posts about it than about the alternatives, which would help us troubleshoot any problems we ran into. There are alternatives, such as ElasticSearch-Ruby, which is very similar to Tire (in fact, both are developed by Karmi).
Continuous Integration Server
This project uses CircleCI as its continuous integration server, and using ES there is very easy. From the CircleCI documentation:
Several services are disabled by default because they’re not commonly used, or because of memory requirements. We try to detect and enable them automatically, but in case we fail (or don’t have inference in your language), you can enable them by adding to your circle.yml:
machine:
  services:
    - elasticsearch # version 0.90.2
Production server
The project is deployed on Engine Yard, so we used this Engine Yard recipe for ElasticSearch.
Adapting The Backend
Models
Before ElasticSearch can return any results, we need to index our data. This means our searches don’t hit the application database; they run against a separate index maintained by ES, in which selected information from our database is stored and kept up to date by our application.
To get our Bookings and Enquiries indexed in the ES database, we need to modify our models:
class Enquiry
  # This indicates that we are going to use Tire to index this model
  include Tire::Model::Search
  # This allows the model to be reindexed with its new info when modified
  include Tire::Model::Callbacks

  # We'll need this further on to specify which fields from the indexed data
  # we want to perform our search on
  SEARCHABLE_FIELDS = [:student_name, :student_email, :reference]
  SEARCHABLE_STATUS = ['new', 'open', 'sent_for_payment', 'requires_approval', 'rejected']

  # We set up a mapping so that the email field is analyzed with our
  # email_analyzer, built on ElasticSearch's own tokenizer and filter
  settings ElasticSearchSettings do
    mapping do
      indexes :student_email, type: 'string', analyzer: 'email_analyzer'
    end
  end

  # This is used to limit and control which attributes are indexed
  # and the info indexed depending on the model status
  def to_indexed_json
    reduced_info = {
      id: _id.to_s,
      student_name: self.student_name,
      student_email: self.student_email,
      category: "enquiries",
      reference: self.reference,
      status: self.status
    }
    reduced_info.to_json
  end
  ...
end
The changes made to the Booking model are very similar:
class Booking
  include Tire::Model::Search
  include Tire::Model::Callbacks

  SEARCHABLE_FIELDS = [:student_name, :student_email, :reference]

  settings ElasticSearchSettings do
    mapping do
      indexes :student_email, type: 'string', analyzer: 'email_analyzer'
    end
  end

  # Past bookings that weren't cancelled are indexed as Alumni
  def to_indexed_category
    if self.end_date_inclusive.past? && !self.cancelled?
      "alumni"
    else
      "bookings"
    end
  end

  def to_indexed_json
    reduced_info = {
      id: _id.to_s,
      student_name: self.student_name,
      student_email: self.student.try(:email),
      category: self.to_indexed_category,
      reference: self.reference,
      status: self.cancelled? ? "cancelled" : "active"
    }
    reduced_info.to_json
  end
  ...
end
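To see exactly what ES receives for a booking, the category rule and to_indexed_json can be exercised in plain Ruby. This is a sketch with a hypothetical FakeBooking Struct standing in for the real model (no database, no Tire), and with a plain Date comparison in place of ActiveSupport’s past?; the sample values are made up for illustration:

```ruby
require "date"
require "json"

# Hypothetical stand-in for Booking: only the attributes to_indexed_json reads.
FakeBooking = Struct.new(:id, :student_name, :student_email, :reference,
                         :end_date_inclusive, :cancelled, keyword_init: true) do
  def cancelled?
    cancelled
  end

  # Same rule as Booking#to_indexed_category; `today` is injectable for testing
  def to_indexed_category(today = Date.today)
    (end_date_inclusive < today && !cancelled?) ? "alumni" : "bookings"
  end

  def to_indexed_json(today = Date.today)
    {
      id: id.to_s,
      student_name: student_name,
      student_email: student_email,
      category: to_indexed_category(today),
      reference: reference,
      status: cancelled? ? "cancelled" : "active"
    }.to_json
  end
end

booking = FakeBooking.new(id: 42, student_name: "Student 1",
                          student_email: "student1@example.com", reference: "B-0001",
                          end_date_inclusive: Date.new(2013, 6, 30), cancelled: false)
puts booking.to_indexed_json(Date.new(2013, 9, 1))
# {"id":"42","student_name":"Student 1","student_email":"student1@example.com","category":"alumni","reference":"B-0001","status":"active"}
```

Note that a cancelled past booking stays in the "bookings" category, with its status set to "cancelled".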
Configuration
We’ll need to configure the aforementioned email_analyzer in config/elastic_search.yml, so that it uses ES’s uax_url_email tokenizer (which keeps email addresses as single tokens) and is case insensitive:
analysis:
  analyzer:
    email_analyzer:
      tokenizer: "uax_url_email"
      filter: ["lowercase"]
We also need an initializer, config/initializers/elasticsearch.rb:
ElasticSearchSettings = YAML.load_file(Rails.root.join('config', 'elastic_search.yml')).with_indifferent_access
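Because YAML nesting is indentation-based, it’s worth checking that the settings load into the structure ES expects for index settings (analysis, then analyzer, then the analyzer name). A stdlib-only sketch that parses the same settings inline; the real initializer reads the file and adds ActiveSupport’s with_indifferent_access on top:

```ruby
require "yaml"

# Same content as config/elastic_search.yml, inlined for the example
settings_yaml = <<~YAML
  analysis:
    analyzer:
      email_analyzer:
        tokenizer: "uax_url_email"
        filter: ["lowercase"]
YAML

settings = YAML.safe_load(settings_yaml)
email_analyzer = settings.dig("analysis", "analyzer", "email_analyzer")

puts email_analyzer["tokenizer"]       # uax_url_email
puts email_analyzer["filter"].inspect  # ["lowercase"]
```

If the indentation is lost, the keys end up as siblings at the top level and the analyzer is silently never defined.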
After that, to get all our existing DB data into ES, we run a rake task:
$ rake environment tire:import:all
The controller
Beyond the Cucumber scenario, the search had a few more requirements, which we covered with the following tests for search_controller#perform. Note that for each test about a specific aspect of the Bookings results, there is a near-identical one for Enquiries and one for past bookings (Alumni).
it "#perform will not respond to an empty query" do
  post :perform
  assert_response 406
end

it "#perform will not produce results for a query with less than 3 chars" do
  post :perform, :query => 'Jo'
  assert_nil assigns(:results)
end

it "#perform will not produce results for something that doesn't exist on the database" do
  post :perform, :query => 'John Doe'
  assert_nil assigns(:results)
end

it "#perform will produce results for bookings" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_name
  assigns(:results)[:bookings].first[:student_name].must_equal booking.student_name
end

it "#perform will not produce results for queries that don't have matches in the database" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => "Student 0"
  assert_nil assigns(:results)
end

it "#perform on Bookings will return the correct fields" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_name
  expected_keys = ["id", "category", "link", "student_name", "student_email", "reference", "status", "status_icon"]
  assigns(:results)[:bookings].first.keys.must_equal expected_keys
end

it "#perform won't return more than six results in Bookings search" do
  bookings = FactoryGirl.create_list(:booking, 10)
  post :perform, :query => "Student"
  assigns(:results)[:bookings].size.must_be :>=, 1
  assigns(:results)[:bookings].size.must_be :<=, 6
end

# We need one like this for each field returned
it "#perform on Bookings will return a link to the booking" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_name
  assigns(:results)[:bookings].first[:link].must_equal edit_or_show_path(booking)
end

# We need one like this for each field returned that uses highlight
it "#perform on Bookings will return the email correctly highlighted" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_email
  assigns(:results)[:bookings].first[:student_email].must_equal "<b>#{booking.student_email}</b>"
end
Below is a dissected version of the perform action in search_controller.rb:
# As per requirements, we won't perform any search if the query is shorter than 3 characters
if query.present? && query.length >= 3
  booking_formatted_query = query_from_tokens(Booking::SEARCHABLE_FIELDS, query, "category:bookings")
Let’s stop there. To build a query that Tire can process and that returns the right results, we combine the query with the searchable fields defined on the model, and then combine all of that with any scope limit we want to impose. So if we search for “John” and, in this case, don’t care about the status of the booking, query_from_tokens returns a query string like this:
(student_name:John OR student_email:John OR reference:John) AND (category:bookings)
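The post doesn’t show query_from_tokens itself. Here is a hypothetical reconstruction that reproduces the example string: each whitespace-separated word is matched against every searchable field (joined with OR), the word groups and the scope are joined with AND, and Lucene special characters are escaped so input such as a stray parenthesis can’t break the query syntax. Whether the real helper also appends wildcards (so that “stud” matches “Student”) isn’t stated; this sketch doesn’t.

```ruby
# Hypothetical reconstruction of query_from_tokens (not the original code).
# Characters with special meaning in Lucene query string syntax.
LUCENE_SPECIAL_CHARS = /([+\-=&|<>!(){}\[\]^"~*?:\\\/])/

# Prefix every special character with a backslash.
def escape_lucene(term)
  term.gsub(LUCENE_SPECIAL_CHARS) { "\\#{Regexp.last_match(1)}" }
end

# fields: searchable field names; query: raw user input;
# scope: optional extra clause, e.g. "category:bookings".
def query_from_tokens(fields, query, scope = nil)
  tokens = query.to_s.split(/\s+/).reject(&:empty?)
  clauses = tokens.map do |token|
    escaped = escape_lucene(token)
    "(" + fields.map { |field| "#{field}:#{escaped}" }.join(" OR ") + ")"
  end
  # Parenthesise the scope so an "a OR b" scope keeps the right precedence
  clauses << "(#{scope})" unless scope.to_s.empty?
  clauses.join(" AND ")
end

fields = [:student_name, :student_email, :reference]
puts query_from_tokens(fields, "John", "category:bookings")
# (student_name:John OR student_email:John OR reference:John) AND (category:bookings)
```

Wrapping the scope in parentheses matters for the Enquiry search, whose scope is a list of statuses joined with OR.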
Now we have the real query that is fed to ElasticSearch. We use the Tire DSL to get our results:
bookings = Booking.search do
  query { string booking_formatted_query, :default_operator => "AND" }
  # Limits the number of results we get; six, as per the requirements
  size 6
end
We perform a similar search on the Enquiries and Alumni bookings:
enquiry_formatted_query = query_from_tokens(Enquiry::SEARCHABLE_FIELDS, query, Enquiry::SEARCHABLE_STATUS.map { |s| "status:#{s}" }.join(" OR "))
enquiries = Enquiry.search do
  query { string enquiry_formatted_query, :default_operator => "AND" }
  size 6
end

alumni_formatted_query = query_from_tokens(Booking::SEARCHABLE_FIELDS, query, "category:alumni")
alumni = Booking.search do
  query { string alumni_formatted_query, :default_operator => "AND" }
  size 6
end
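Under the hood, each of these Tire blocks becomes a search request against the model’s index. As far as we understand the DSL, the request body for the Booking search is roughly the hash below: a query_string query carrying the options passed to string, plus a top-level size. The search_body helper is purely illustrative, and the exact JSON Tire emits may differ between versions:

```ruby
require "json"

# Illustrative only: approximately the body produced by the Tire DSL above.
def search_body(formatted_query, size: 6)
  {
    query: {
      query_string: {
        query: formatted_query,
        default_operator: "AND" # terms without an explicit operator must all match
      }
    },
    size: size # at most six hits per category
  }
end

puts JSON.pretty_generate(
  search_body("(student_name:John OR student_email:John OR reference:John) AND (category:bookings)")
)
```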
After that, we merge the results and group them by category in a hash, which we send to the front end as JSON:
returned_results = [enquiries, bookings, alumni].flatten.compact
@results = {}
# Escape each query word so characters like "(" or "+" can't break the regexp
query_regexp = query.split.map { |e| "(#{Regexp.escape(e)})" }.join("|")
returned_results.each do |result|
  @results[result["category"].to_sym] ||= []
  @results[result["category"].to_sym] << {
    id: result["id"],
    category: result["category"],
    link: edit_or_show_path(result),
    # \+ in the replacement is the last group that matched, i.e. the matched word
    student_name: result["student_name"].to_s.gsub(/#{query_regexp}/i, '<b>\+</b>'),
    student_email: result["student_email"].to_s.gsub(/#{query_regexp}/i, '<b>\+</b>'),
    reference: result["reference"].to_s.gsub(/#{query_regexp}/i, '<b>\+</b>'),
    status: result["status"],
    status_icon: status_to_icon(result["status"]).first.capitalize
  }
end
Why are we not using nGrams and ElasticSearch highlighting?
It’s worth noting here that our index only contains names, emails and reference numbers.
We tried nGrams, but with an index like ours the results were very unpredictable.
Also, since our index is quite small and targeted, we’re confident that regexp-based highlighting won’t have a noticeable performance impact.
We also tried ElasticSearch’s built-in highlighting, but it highlights the whole matching word. For example, searching for ‘cind’ would highlight the whole word ‘Cinderella’, which we didn’t want. This is why we fell back to our own highlighting.
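The partial-word behaviour we wanted is easy to check in plain Ruby. This is a sketch of a standalone highlight helper (hypothetical; the controller inlines the same idea with gsub). It escapes each query word with Regexp.escape, prefers longer words when they overlap, matches case-insensitively, and HTML-escapes all the text with CGI.escapeHTML so that only our own <b> tags reach the browser:

```ruby
require "cgi"

# Hypothetical helper: wraps every case-insensitive match of any query word
# in <b> tags and HTML-escapes everything else.
def highlight(value, query)
  text = value.to_s
  tokens = query.to_s.split.sort_by { |t| -t.length } # longest words first
  return CGI.escapeHTML(text) if tokens.empty?

  # One capture group around the alternation, so split keeps the matches
  pattern = /(#{tokens.map { |t| Regexp.escape(t) }.join("|")})/i
  text.split(pattern).each_with_index.map do |part, i|
    escaped = CGI.escapeHTML(part)
    i.odd? ? "<b>#{escaped}</b>" : escaped # odd indices are the matches
  end.join
end

puts highlight("Cinderella", "cind")       # <b>Cind</b>erella
puts highlight("Student One", "stud one")  # <b>Stud</b>ent <b>One</b>
```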
Changing The Frontend
To the controller and back again
Adding a search field to our layout was fairly easy, but we needed it to work in a “search-as-you-type” fashion. For this, we handle the keyup event in JavaScript (CoffeeScript, in our case), triggering an AJAX search request on each keystroke once at least 3 characters have been typed:
$("#spotlight-search-query").on "keyup", (e) ->
  if $.trim($(this).val()).length > 2
    $("form#spotlight-search").submit()
  else
    $(".search-results-popover").hide()
The submit handler makes the request, receives the JSON with the results and passes them to the view template.
$("form#spotlight-search").on "submit", (e) ->
  e.preventDefault()
  $.post($(this).attr("action"), {query: $("#spotlight-search-query").val()}, (data) ->
    content = $("<div class='search-results'>")
    receiver = $(".search-results-popover")
    if data?
      # The controller groups the results under these category keys
      for category in ['enquiries', 'bookings', 'alumni']
        if data[category]?
          content.append(
            JST['search_results']({
              results: data[category]
              _type: category
            })
          )
    else
      content.append("<p class='no-results'>No results found.</p>")
    receiver.html(content)
    receiver.show()
  , "json")
Testing ALL THE THINGS!
Testing a Rails application that depends on ES can be tricky at first. We followed this article on BitsAndBit and this post on StackOverflow. Here are the highlights:
Indexes uniqueness
To keep our development search indexes from being wiped out on every test run, we needed separate indexes for the test environment. This can be done by adding an index prefix to the ElasticSearch initializer:
Tire::Model::Search.index_prefix "#{Rails.application.class.parent_name.downcase}_#{Rails.env.to_s.downcase}"
Clean indexes on each run
Also, when testing, we need clean search indexes. If you’re using RSpec, you can do this in a before :all block, but we are using Minitest, which doesn’t support :all blocks, so we put it at the top of our search_controller_test.rb. In any case, make sure it runs only once (i.e. not in a before :each block):
[...]
describe SearchController do
  #Set up the test indexes
  indexed_models_list = [Booking, Enquiry]
  indexed_models_list.each do |klass|
    # make sure that the current model is using tire
    if klass.respond_to? :tire
      # delete the index for the current model
      klass.tire.index.delete
      # the mapping definition must get executed again. for that, we reload the model class.
      # silenced warning to avoid warning due to constant redefinition on files load.
      silence_warnings do
        load File.expand_path("../../../app/models/#{klass.name.downcase}.rb", __FILE__)
      end
    end
  end
  before do
    [...]
Manually refresh indexes
Note that ES refreshes its indexes automatically every second. In tests, many assertions run within a single second, so there’s a chance that the changes we want to test are not yet visible to search in ElasticSearch. To solve this, we manually refresh the search index after each change to our test database. So a test like this:
it "#perform on Bookings will return the email correctly highlighted" do
  booking = FactoryGirl.create(:booking)
  post :perform, :query => booking.student_email
  assigns(:results)[:bookings].first[:student_email].must_equal "<b>#{booking.student_email}</b>"
end
Turns into this:
it "#perform on Bookings will return the email correctly highlighted" do
  booking = FactoryGirl.create(:booking)
  Booking.tire.index.refresh
  post :perform, :query => booking.student_email
  assigns(:results)[:bookings].first[:student_email].must_equal "<b>#{booking.student_email}</b>"
end
Allow calls to ES
Finally, we need to configure WebMock in our test helper to allow requests to ElasticSearch from within our tests:
WebMock.disable_net_connect!(:allow => "localhost:9200")
Wait for JS to do its thing
When writing the feature test, we realized we needed to give the JavaScript request time to execute, return and render the results. The wait_until method was really helpful here. Note, however, that wait_until was removed in Capybara 2.0.
When /^I type in (.*)$/ do |search_term|
  page.fill_in "query", :with => search_term
  wait_until { page.find(".search-results").visible? }
end

Then /^I should see search results for (.*)$/ do |search_term|
  wait_until { page.find(".search-results").visible? }
  within('.search-results') do
    page.should have_selector('li', text: /#{search_term}/i)
  end
end
Testing for the highlighted result is easy; we just look for the right tag:
And /^I should see the highlighted (.*)$/ do |search_term|
  wait_until { page.find(".search-results").visible? }
  within('.search-results') do
    page.should have_selector('b', text: /#{search_term}/i)
  end
end
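Since wait_until is gone from recent Capybara versions, the step definitions above need a helper of their own when run on Capybara 2. A minimal polling sketch in plain Ruby; the keyword arguments and defaults are our own choice (Capybara 1.x’s helper defaulted to Capybara.default_wait_time), and exceptions raised inside the block are not rescued:

```ruby
require "timeout"

# Hypothetical replacement for Capybara 1.x's wait_until.
# Re-evaluates the block until it returns a truthy value and returns that
# value; raises Timeout::Error once `timeout` seconds have passed.
def wait_until(timeout: 2, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end
```

In a step this could be called as wait_until { page.has_selector?(".search-results", visible: true) }. Bear in mind that Capybara 2 matchers such as have_selector already retry until the default wait time elapses, so many steps can drop the explicit wait altogether.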