Full text search for document attachments with Rails & ElasticSearch
I've started working on a project that requires full text search on uploaded documents using ElasticSearch. Luckily, ElasticSearch has the Mapper Attachments Type, a plugin that is easy to install. There are a few important things to note here:
- ES accepts an attachment as a base64-encoded string
- By default, only the first 100,000 characters are extracted from an attachment. You need to change the configuration if you need more
- It handles a lot of file types, not just documents. See here
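To make the first two points concrete, here is a minimal sketch in plain Ruby of the value an attachment field could be given. The helper name `attachment_payload` is made up for illustration, and the `_content` and `_indexed_chars` keys are what I recall from the plugin's documentation, so check them against the plugin version you install.

```ruby
require "base64"

# Hypothetical helper (not part of any gem): builds the value for a field
# mapped with type "attachment".
# Assumption: the plugin accepts a hash with "_content" (the base64 string)
# and an optional "_indexed_chars" (-1 meaning "extract all text").
def attachment_payload(bytes, indexed_chars: nil)
  payload = { "_content" => Base64.strict_encode64(bytes) }
  payload["_indexed_chars"] = indexed_chars if indexed_chars
  payload
end

# Example: encode some raw bytes and ask for the whole text to be extracted.
payload = attachment_payload("Hello, ElasticSearch!", indexed_chars: -1)
puts payload
```

As far as I know, the limit can also be changed for a whole index through the index.mapping.attachment.indexed_chars setting, instead of per document.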
There are several gems that make it easy to work with ElasticSearch, such as Tire, Chewy, ElasticSearch Rails and Searchkick. Except for Tire, which was retired a long time ago, I believe any of the other three will work well. I chose Chewy because it has a dedicated wiki page that gives an example configuration for attachment full text search.
And of course, CarrierWave is used to handle the upload process.
The following is sample code for a Product model with two fields, name and attachment:
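require "base64"

class ProductsIndex < Chewy::Index
  define_type Product do
    field :name
    # The file is sent to ES as a base64 string; the plugin extracts the text.
    field :attachment, type: "attachment", value: ->(product) {
      if product.attachment.present?
        # File.binread reads the raw bytes and closes the file afterwards.
        Base64.encode64(File.binread(product.attachment.path))
      else
        ""
      end
    }
  end
end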
A shortcut for quick search:
class << self
  def search(keyword)
    fields = %w[name attachment]
    ProductsIndex.query(multi_match: { query: keyword, fields: fields })
  end
end
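For reference, the shortcut above boils down to a plain multi_match query. The sketch below builds an equivalent request body by hand, assuming Chewy places the hash passed to `query` under a top-level `query` key; `multi_match_body` is a hypothetical helper that only shows the shape of the request.

```ruby
require "json"

# Hypothetical helper: the search body that a multi_match over the two
# indexed fields would roughly correspond to.
def multi_match_body(keyword, fields = %w[name attachment])
  { query: { multi_match: { query: keyword, fields: fields } } }
end

# Print the body as JSON, e.g. to paste into a manual request against ES.
puts JSON.pretty_generate(multi_match_body("invoice"))
```

Because both fields are listed, a keyword matches whether it appears in the product name or in the text extracted from the attachment.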
Link to the demo: https://github.com/nguyenducgiang/chewy-demo
But it's not the only solution
What if we just extract the text content from the document ourselves before passing it to ES as a normal string? That is possible using a gem like Yomu.
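Here is a rough sketch of that idea. `extracted_text` and `YOMU_EXTRACTOR` are names I made up; the only Yomu call used is `Yomu.new(path).text`, and the gem is loaded lazily because it needs a Java runtime. The extractor is passed in as an argument so that it can be swapped out, for example in tests.

```ruby
# Default extractor: reads the document with Yomu and returns its text.
# Assumption: the yomu gem and a Java runtime are installed.
YOMU_EXTRACTOR = lambda do |path|
  require "yomu"
  Yomu.new(path).text
end

# Hypothetical helper: returns the plain text of a document so it can be
# indexed as an ordinary string field. Returns "" when there is no file.
def extracted_text(path, extractor: YOMU_EXTRACTOR)
  return "" if path.nil? || !File.exist?(path)
  extractor.call(path).to_s.strip
end

# In the Chewy index, the attachment field could then become a normal field:
#   field :attachment_text, value: ->(product) {
#     extracted_text(product.attachment.path)
#   }
```

With this approach the Mapper Attachments plugin is no longer needed, at the cost of doing the extraction inside the Rails app.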