
Data Indexing (Trigram)

The next problem to tackle is how to provide partial matches to the user quickly. My first thought was to use an index. The catch is that the app gives suggestions in real time, so it has to search for entries based on partial input: a user who types “goog” is looking for Google, and I want to show suggestions at that point instead of waiting until the whole word is typed.

I remembered reading about a similar problem in an academic paper [1] a while ago, where the strings were split into substrings and those substrings were indexed. After some research I found that this is called trigram indexing, and it fits my application well. Trigram indexing splits every key (in this case the tickers and names of the stocks) into all of its adjacent three-character sequences and indexes them. For example, “google” yields “goo”, “oog”, “ogl” and “gle”. Because a search query can be broken into trigrams the same way, the backend can quickly compare it against an indexed text field and find approximate or partial matches.

I can now implement this in the deployment script, which creates the database, creates the table, loads the data from the JSON file and creates the trigram indexes automatically. I save the stock_index model in the models folder alongside the main database, and with that the database part of the recommendation backend is done.
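To make the idea concrete, here is a minimal in-memory sketch of trigram indexing in Python. It is not the code from my deployment script, and a database's trigram index may differ in details such as padding and similarity scoring. The function names (`trigrams`, `build_index`, `search`) and the sample keys are made up for illustration.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of adjacent three-character substrings of text (lowercased)."""
    normalized = text.lower()
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def build_index(keys):
    """Map every trigram to the set of keys that contain it."""
    index = defaultdict(set)
    for key in keys:
        for gram in trigrams(key):
            index[gram].add(key)
    return index


def search(index, query):
    """Return keys sharing at least one trigram with the query, best match first."""
    counts = defaultdict(int)
    for gram in trigrams(query):
        # .get() avoids creating empty entries for unknown trigrams
        for key in index.get(gram, ()):
            counts[key] += 1
    # Rank by number of shared trigrams, then alphabetically for a stable order
    return sorted(counts, key=lambda k: (-counts[k], k))


# Hypothetical sample keys: a mix of tickers and company names
stock_index = build_index(["GOOGL", "Google", "Apple", "AAPL"])
suggestions = search(stock_index, "goog")
```

With these sample keys, the partial input “goog” shares the trigrams “goo” and “oog” with both “GOOGL” and “Google”, so both come back as suggestions while “Apple” and “AAPL” do not. Note that a query shorter than three characters produces no trigrams in this sketch, so it returns nothing.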