Decoding Search: What it means for JavaScript Developers

Introduction and Welcome

The speaker welcomes the audience to the conference and appreciates the organizers for providing excellent food. Transitioning from a previous session on generative AI, the current talk will focus on the importance of search in software applications, emphasizing performance aspects and the integration of search functions in both frontend and backend development.

The Importance of Search Performance

The speaker discusses the significance of search in enhancing user experience, particularly in the context of e-commerce sites, where delayed search results lead to poor user experiences. The session targets engineers across various programming disciplines, aiming to impart knowledge on designing efficient search experiences using Generative AI principles to improve user interfaces.

Introduction to Search Techniques

The speaker introduces the session's agenda, which includes exploring different search techniques and their applications in JavaScript, addressing use cases, and design considerations, and highlighting the importance of understanding the nature of data and search performance requirements.

Keyword Search Explained

An explanation of keyword search is provided, highlighting how it matches query keywords with words in a dataset. The limitations of keyword search are addressed, especially in cases requiring nuanced understanding beyond exact keyword matches, with humorous references to the "Game of Thrones" series used for illustrative purposes.

Semantic Search and Contextual Understanding

Semantic search is described as leveraging natural language processing to interpret user queries for more contextually relevant search results. Through examples, the speaker demonstrates how semantic search improves understanding of a query's intent, offering a detailed explanation of its benefits over simple keyword searching.

Vector Search and Mathematical Representation

Vector search is introduced as a method for converting queries and dataset entries into mathematical vectors to determine similarity. The technique's reliance on mathematical principles, like cosine similarity, helps in achieving highly relevant search results, especially in systems requiring similarity-based document retrieval.

Comparing Search Methods and Introducing Hybrid Search

The speaker explains how different search techniques, including keyword, semantic, and vector searches, can be integrated to form a hybrid search approach. This combination allows for more accurate, contextually enriched results by leveraging the strengths of each method across different applications and queries.

Implementation Tips and Open Source Libraries

The speaker offers practical advice for implementing search functionality, including using indexing and specific libraries for different search types. A discussion on maintaining consistency in data representation and the selection of open-source tools for integrating and enhancing search capabilities in applications is also provided.

So again, welcome.

And I know this has been a power-packed day, and I really enjoyed the food.

How many of you enjoyed the food?

By the way, I think Web Directions has some of the best conference food, and this is coming from a person who attends a lot of technical conferences.

And I can say that with a guarantee, so a huge shout-out to every organizer for such amazing food.

But now we have had food, and we had such an energy-packed session by Phil.

He's amazing.

And he spoke a lot about generative AI.

And of course, there will be some sections about how you will use vector search.

So of course, he gave a really basic, a really nice primer on what vector search is.

But of course, today we are going to be talking more about search in general.

And of course, when we talk about search, we also talk about performance, right?

Because a lot of times we say that when you're visiting a website, it should load very quickly.

So that is why we had so many performance talks. But at the same time, search is such a vital experience, because if you're on an e-commerce site and you're searching for a particular product and it's taking 20 or 25 seconds to actually load your desired product, it's a very bad user experience.

So when you're designing applications that have search as an integral part, having a performant search is very important.

And of course, this particular talk is not just for JavaScript developers, but for software engineers in general who are designing a good search experience.

So regardless of whether you are a JavaScript or frontend developer or a backend developer, this will be helpful for understanding the core basics: what are the different types of search that you can use, and, in today's world of Gen AI, how you can adopt Gen AI principles to build more compelling user experiences that give the end user as much freedom as possible to get the results they want through a good search experience.

So with that, I'll quickly start.

So I'm Shivay, and thanks for the introduction, by the way.

And I'm a developer relations engineer at Couchbase.

So Couchbase is a NoSQL database platform.

And of course, as with a lot of the other database platforms, we were in that adoption line.

So we adopted vector search as part of the core database search service earlier this year, probably in April.

With that in mind, let's quickly get started.

Of course, you have probably come across, especially during the early 2000s, the custom built-in Google search that used to be there on PHP or WordPress sites.

And you might have seen how terrible that experience was: when you tried to use it, you were not getting any sort of relevant search results.

So it's a good thing that we have come a lot further and have improved the overall experience of using search.

Now, throughout these slides, I'll be using a reference from Game of Thrones, of course.

And the question is: is winter coming?

Now it's probably an oxymoron, because we are probably on the hottest day in Australia, but of course there is winter in the rest of the world.

I just spent a day in Yosemite and there was a lot of nice snowfall.

A couple of weeks ago I was in Salt Lake City and Park City in Utah, where there is a lot of snowfall.

So coming into bright and sunny Australia, it's probably an oxymoron, but let's just stick with winter so that all of us feel chilly in this room.

So we'll keep it themed around Game of Thrones.

So how many of you know about Jon Snow, by the way, the character over here?

I'll be making some references, but just for everyone's awareness, he's a central figure in the entire Game of Thrones timeline.

And he talks about winter coming to that particular town.

Again, I have not watched a lot of Game of Thrones, just some references, meme references.

So that's how I know about Jon Snow.

So let's imagine that Jon is asking this question: is winter coming?

Now Jon goes and clicks on search, of course, and searches through different kinds of search experiences.

Now, of course, the question is, what does Jon Snow want to know?

Of course, he wants to know whether winter is coming to the town of Westeros, which is a fictional town in Game of Thrones.

But if Jon Snow were to do a simple Google search today on "is winter coming", what do you think?

Would the result be that it's coming to the town of Westeros?

How many of you think that would be the result?

This is a simple question.

Is winter coming?

How many of you? Can you raise your hand?

Okay.

So no one raises their hand.

So perfect.

So of course it wouldn't do that, right?

It would probably say that yes, in Australia, based on your geographical location, winter is going to be coming in the month of probably May, right?

And if you are in India or in the US, it's already there.

So the thing with Google search is that, yes, it's a really great way to retrieve context, but it has become a lot more difficult to get relevant search results on the first page itself, because it will have a lot of sponsored content and you might have to go to the second or the third page.

So it has become a lot more convoluted, right?

And of course that's probably not the best experience that you want when you're trying to search.

And of course, kudos to Google Search that they now have an AI-powered search where it automatically summarizes and generates the content for your query.

And a lot of times it's very relevant, because of course it is using vector search and a lot of advanced Gen AI search techniques that give you the result that you want, but of course it might still not be a hundred percent accurate.

So what we're going to be covering today is why is search relevant?

Of course, we have covered that.

It is one of the most important user experiences when you are designing some sort of an application.

And we'll be exploring the different kinds of search techniques and how you can apply them in JavaScript with the help of some code samples.

And of course, some of the libraries that you can use in order to incorporate these search experiences.

And we'll also learn when you should actually use each of these search techniques, along with some quick do's and don'ts for adopting them.

Of course, first of all, like, why does this matter to all of us?

One morning you see this Jira ticket by your management, which says add search functionality.

And the management wants you to build a search bar, build it ASAP, and it should work.

Now, does this give you a lot of context into what you want to build?

Not really.

Because of course, search could be as simple as a plain document search, but it could become extremely complex, especially if you're trying to build an e-commerce search where you are not just searching on the product name, but also adding a lot of advanced filters: for example, searching within a certain product category, or filtering products by their rating, such as a rating greater than 3.5.

So of course, search can become very complex.

So that is why you have to ask certain questions when you're designing a search experience, such as what are the performance requirements?

What is the context of search?

What is the use case?

Like, why are we even building a search in the first place?

What is the nature of the data that you're going to be dealing with?

Because when you're dealing with different kinds of data, your search experience will vary vastly.

And what is this problem that we are trying to solve?

These questions will help you to better understand what is the type of implementation you want to use and what is the type of search you want to use.

So of course, these design questions are what you should ask your management, and then that's how you should implement the search experience.

So let's start by covering the different types of searches.

The first one is keyword search.

Now, as the name suggests, it's very straightforward: you match the keywords in the search query against the data set that you're running the search query on.

So again, let's go back to Jon and he asks, is winter coming?

Now, as I mentioned, you are basically matching the words in the query, word by word, with the set of words inside of your documents.

Now, how does this work?

So you take your search query, "is winter coming", and you tokenize it. Tokenizing means that you break it down so that each and every single word becomes a separate token.

Similarly, of course, on your dataset side, you are also breaking the documents down into tokens.

So for example, one of the documents that you have in your dataset is "winter is coming to Westeros", and you break it down into its tokens.

Now, you might have some additional data inside of your database, which might be "winter clothing sale is coming".

Of course, in Australia, it's summer clothing, but again you break this down into its tokens.

So let's say that these are the two documents that you have in your database.

So the idea is again, you split your query into the different words, you calculate the relevance score based on how many of these tokens actually match, and you order the documents in descending order of the number of matching tokens.

So the top one is the most relevant search result.

And that's a basic piece of JavaScript code that you see on the right-hand side, which is just doing what I mentioned, step by step.

And again, like this is a sample implementation.

I didn't want to focus on a particular type of search engine.

It's a very raw implementation that you would probably implement in your own solution, right?

So of course, considering those two documents, you have "winter is coming to Westeros" and "winter clothing sale is coming", and you have the query "is winter coming".

Now, if you look at the results, both of the scores are actually equal, because in both cases the documents contain the words "winter", "is" and "coming", right?
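
The slide code is not part of this transcript, so the following is only a minimal sketch of the steps described above (tokenize, count matching tokens, sort by score). The function names `tokenize`, `keywordScore` and `keywordSearch` are illustrative assumptions, not the speaker's code.

```javascript
// Split text into lowercase word tokens, dropping punctuation.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Score = number of distinct query tokens that also appear in the document.
function keywordScore(query, doc) {
  const docTokens = new Set(tokenize(doc));
  const queryTokens = new Set(tokenize(query));
  let score = 0;
  for (const token of queryTokens) {
    if (docTokens.has(token)) score += 1;
  }
  return score;
}

// Rank documents by descending score, dropping documents with no match.
function keywordSearch(query, docs) {
  return docs
    .map((doc) => ({ doc, score: keywordScore(query, doc) }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}

const documents = [
  "Winter is coming to Westeros",
  "Winter clothing sale is coming",
];

const results = keywordSearch("Is winter coming?", documents);
console.log(results);
```

With these two documents, both results get the same score of 3, which is the tie that the talk describes: token counting alone cannot tell which document Jon actually meant.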

So of course, Jon will get confused, because he really meant to see the result "winter is coming to Westeros", but he's getting this other search result as well, "winter clothing sale is coming".

And probably during the time when Game of Thrones was set, in the age of the dragons, something like a clothing sale, they wouldn't even know what that was.

Sales and markets were probably not a concept back then.

So just imagine that Jon gets confused about what this "winter clothing sale is coming" is.

So it's probably not that relevant for him.

So as you can see, in this case, while the search results are fine, they are probably not as relevant for Jon specifically, right?

So keyword search, as I mentioned is really effective for straightforward queries, but it struggles when you want to deal with more complex and more nuanced searches where there is a lot of context added as well.

So that is why we go into the second type of search, which is semantic search.

So let's explore the core idea behind semantic search.

So let's say again, Jon is looking. The first website that he went to was called full text search dot com.

Now he goes to semantic search dot com and tries the same query, which is "is winter coming".

In the case of semantic search, we are leveraging what is known as natural language processing, or NLP, which is the ability of computer systems, or in this case the AI, to interpret the natural language that users are speaking and to derive the intent and the context from the query.

So it is better able to understand what's happening inside the query and how it is related to the data.

And then it gives you more relevant search results.

And so with this idea, if you look at the query "is winter coming", we are expanding it.

So of course, the word "winter" could be related to "cold season" or "winter season", and "coming", if you derive the context, could be related to "arriving" or "approaching".

So the two types of questions that you can derive from the context here are: is the cold season arriving, or is the winter season approaching?

So we are really looking at each and every token that you have in the query and trying to derive its contextual meaning.

And beyond that, you also try to understand the context based on your documents.

So for example, based on the documents that you already have in your data, this could relate to Game of Thrones, as in "winter is coming to Westeros", or it could relate to seasonal sales, right?

So these are the contexts that natural language processing is able to derive internally when you're using semantic search.

So again, if you look at how you might implement this in JavaScript, you analyze your query with the help of some sort of NLP technique.

There are a bunch of libraries, as of course Phil covered, that allow you to do this analysis, such as TensorFlow.js and a bunch of other libraries that you can use inside of JavaScript.

And then of course you expand the query with the concepts that it is related to.

And then you calculate a relevance score, which again is something that you can define: you compute the semantic relevance score for each of the documents.

And then you order your results based on their semantic relevance.
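
As a rough illustration of the expand-then-score steps above, here is a toy sketch. It assumes a small hand-written map of related terms (`RELATED_TERMS`) in place of a real NLP model, and the weights (1 for an original query word, 0.5 for a related term) are arbitrary choices made for this example. It does not reproduce the scores shown on the slides.

```javascript
// Hypothetical, hand-written concept map standing in for a real NLP model.
const RELATED_TERMS = new Map([
  ["winter", ["cold", "snow", "season"]],
  ["coming", ["arriving", "approaching"]],
]);

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Expand the query: original tokens get weight 1, related terms weight 0.5.
function expandQuery(query) {
  const weights = new Map();
  for (const token of tokenize(query)) {
    weights.set(token, 1);
    for (const related of RELATED_TERMS.get(token) ?? []) {
      if (!weights.has(related)) weights.set(related, 0.5);
    }
  }
  return weights;
}

// Sum the weights of all expanded terms that occur in the document.
function semanticScore(query, doc) {
  const docTokens = new Set(tokenize(doc));
  let score = 0;
  for (const [term, weight] of expandQuery(query)) {
    if (docTokens.has(term)) score += weight;
  }
  return score;
}

function semanticSearch(query, docs) {
  return docs
    .map((doc) => ({ doc, score: semanticScore(query, doc) }))
    .filter((result) => result.score > 0)
    .sort((a, b) => b.score - a.score);
}

const documents = [
  "The cold season is approaching Westeros",
  "Summer clothing sale starts today",
];

const results = semanticSearch("Is winter coming?", documents);
console.log(results);
```

The first document shares only the word "is" with the query, yet it still ranks, because "cold", "season" and "approaching" are matched through the expansion. A production system would derive those relationships from a language model rather than a fixed map.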

So again, let's say that Jon does that particular search on semantic search dot com.

And these are the same set of documents, but this time he's running the semantic search instead of the full text search.

And this time the scores are actually a bit different.

So you see that "winter is coming to Westeros" has a higher score, and "winter clothing sale is coming" has a much lower score.

So in terms of relevance, you immediately see that Jon Snow, because he's related to the entire Game of Thrones series, is now seeing much more relevant search results.

So that's a high-level overview of how semantic search works.

So it really excels at understanding the context and the intent that you have behind your search queries, which is why it's more effective when you have more complex and more nuanced queries, right?

So for example, if your query was 'a black chair', that's perfect for keyword search. But for a much more complex query, I was giving this example yesterday with Jason, because Jason wore a really impressive jacket that he usually wears at his conference talks.

I'm searching for this kind of a black leather or cell.

Is it like svelte?

It's felt material.

So black felt material kind of jacket.

And that's where keyword search would probably fail miserably, but a semantic search will actually work because it will know what I'm trying to search for based on my intent that I have inside of my search query.

So now, of course, vector search, Phil gave a wonderful description of what vector search is.

So I will not take a lot of time over here, but the idea again is, let's say Jon now visits vector search dot com and he's trying to make the same search query.

So of course, with vector search, we are representing your entire query as a mathematical object.

So you are converting your query into mathematical vectors by using embedding models.

And of course, these mathematical vectors carry the context, or the understanding of the features that you have inside of your data.

And now, how vector search basically works is that, as I mentioned, you take your query, you convert it into embeddings, and you do that for both your documents and your query.

And by the way, the sensible thing to do for your existing documents is to store the embeddings directly alongside your data.

So that you don't have to generate the vectors again and again.

And now you would do things like compute cosine similarity, Manhattan distance, Euclidean distance, a bunch of different techniques that allow you to find the distance between the two things you are comparing, which are the query and a candidate search result.

So we are mainly looking at the mathematical distance, or the closeness, between the different vectors, which represents how closely the real-world features of your data are related.
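
To make the distance measures named above concrete, here is a small sketch of Euclidean and Manhattan distance over plain number arrays. The vectors are made-up toy values, not real embeddings, and the function names are illustrative.

```javascript
// Guard against comparing vectors of different dimensions.
function assertSameLength(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
}

// Straight-line distance: square root of the summed squared differences.
function euclideanDistance(a, b) {
  assertSameLength(a, b);
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - b[i]) ** 2, 0));
}

// Grid distance: sum of the absolute differences per dimension.
function manhattanDistance(a, b) {
  assertSameLength(a, b);
  return a.reduce((sum, value, i) => sum + Math.abs(value - b[i]), 0);
}

// Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
const queryVector = [1, 2, 3];
const docVector = [4, 6, 3];

console.log(euclideanDistance(queryVector, docVector)); // 5
console.log(manhattanDistance(queryVector, docVector)); // 7
```

For both measures a smaller value means the two vectors are closer, which is the opposite direction from cosine similarity, where a larger value means more similar.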

And then we get the similarity scoring based on the results.

So of course, one of the questions would be: okay, we spoke about semantic search and we spoke about vector search.

So are both of them the same, or are they different?

So we'll answer that particular question in just a couple of minutes, but again, it's a very similar architecture: you convert your query into vectors and then you use cosine similarity.

Here I didn't really use a library to generate the vectors, but you can see again a reference implementation where we are writing a function to compute the cosine similarity, using the dot product of the vectors that we have generated.

Again, you can use whatever embedding model you want to generate the vectors, and then do the compute to find the similarity scores.
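
The reference implementation on the slide is not included in the transcript. A minimal sketch of the same idea might look like the following, assuming the embeddings have already been generated and stored with each document. The three-dimensional vectors here are invented purely for illustration.

```javascript
// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Cosine similarity: dot product divided by the product of the magnitudes.
function cosineSimilarity(a, b) {
  const normA = Math.sqrt(dot(a, a));
  const normB = Math.sqrt(dot(b, b));
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot(a, b) / (normA * normB);
}

// Hypothetical documents with precomputed (made-up) embeddings stored
// alongside the text, so they are not regenerated on every query.
const documents = [
  { text: "Winter is coming to Westeros", embedding: [0.9, 0.1, 0.2] },
  { text: "Winter clothing sale is coming", embedding: [0.3, 0.9, 0.1] },
];

// Rank documents by similarity to the query embedding, highest first.
function vectorSearch(queryEmbedding, docs, topK = 5) {
  return docs
    .map((doc) => ({
      text: doc.text,
      score: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// In a real system this vector would come from the same embedding model
// that produced the document embeddings.
const queryEmbedding = [0.8, 0.2, 0.3];
const results = vectorSearch(queryEmbedding, documents);
console.log(results);
```

The query and the documents have to be embedded with the same model, otherwise the similarity scores are not meaningful.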

And again, very similarly, this time Jon Snow is using the vector search function.

And again, you get different results.

And in this case as well, you get more relevant search results when you are doing vector search based on the query that you are sending.

So of course, vector search also captures the semantic relationships, by converting these words and phrases into numerical representations.

So it's really great when you're trying to find similar type of documents or closely related documents.

And that's the reason why vector search is probably the number one approach that you should adopt when you're building retrieval based systems, when you're trying to find similar related documents, because it does a very good job at finding the relevant or the similar type of documents that you might have in your system.

Of course, the question then is that, what's the difference between semantic search and vector search?

So a very simple way to understand it is that semantic search builds on NLP.

Now, of course, NLP is also mathematics.

So there are a lot of similarities between the two, but there are some slight differences in the way that they operate.

In a nutshell, semantic search works from the context and the intent that NLP derives.

Whereas vector search is more purely mathematical in how it determines the context or the relevance between the different search results.

So of course, as we are now coming towards the end of the session, let's wrap it up and see how you can benefit from combining all of them, right?

Because we saw that full text search is really great at keyword matching.

So keyword search works really well when you just want exact matches.

Whereas semantic search or vector search works really well at understanding the contextual relevance of your search results.

So why not just adopt both of them and have what we call hybrid search?

So hybrid search, again, we are taking example of Jon Snow.

And he now goes to hybrid search dot com and tries to find the search results.

So the idea behind hybrid search is that it will utilize all of the different search techniques that we just talked about, and it will give you more precise, contextually relevant and semantically similar search results, right?

So it gets the precise results from doing the keyword matching and adding your filters, and then you can add your vector search or semantic search to get more relevant search results, which again improves the experience for your users.

So here the process usually would be as follows. This is more of a design choice in how you implement hybrid search, but most of the time you'll see that first you do a keyword search.

So a very simple example could be that first you add a filter, for example based on the year or the rating. Let's say you're in an e-commerce app and you just want search results with ratings above four.

So first you apply your full text search filter, which is "I only want search results with a rating higher than four", and then you apply your vector search, where you find relevant results for that particular product.
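
The filter-then-rank flow described above can be sketched as follows. The product catalog, ratings and two-dimensional embeddings are invented for illustration, and the embeddings are assumed to be unit length so that a plain dot product equals cosine similarity.

```javascript
// Dot product of two equal-length vectors. For unit-length vectors this
// is the same value as their cosine similarity.
function dotProduct(a, b) {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Hypothetical catalog with made-up, unit-length 2-d embeddings.
const products = [
  { name: "Black leather jacket", rating: 3.2, embedding: [0.8, 0.6] },
  { name: "Black felt jacket", rating: 4.6, embedding: [0.6, 0.8] },
  { name: "Black office chair", rating: 4.8, embedding: [0, 1] },
];

// Step 1: apply the exact filter. Step 2: rank what is left by similarity.
function filteredVectorSearch(queryEmbedding, items, minRating) {
  return items
    .filter((item) => item.rating > minRating)
    .map((item) => ({
      name: item.name,
      score: dotProduct(queryEmbedding, item.embedding),
    }))
    .sort((a, b) => b.score - a.score);
}

const queryEmbedding = [0.8, 0.6];
const results = filteredVectorSearch(queryEmbedding, products, 4);
console.log(results);
```

Here the leather jacket is the closest vector to the query, but it never reaches the ranking step because its rating fails the filter. Filtering first also means the similarity computation runs over fewer candidates.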

So this way you can combine the benefits of both keyword search and vector search, and get more relevant and quicker search results.

Again, we are seeing an example where, as I mentioned, you combine all of these search techniques and then compute a relevance score.

And this is where it is up to you how much weightage you want to give to the keyword search versus the vector search.

So this is more of a design choice: if you want to give more weightage to the vector search than to the keyword search, you can set that weight.
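
One simple way to express that weighting is a linear blend of the two scores. The sketch below assumes both scores have already been computed and normalized to the range 0 to 1; the candidate values and the `keywordWeight` parameter name are invented for this example.

```javascript
// Blend two normalized scores. keywordWeight is between 0 and 1, and the
// vector score receives the remaining weight.
function hybridScore(keywordScore, vectorScore, keywordWeight = 0.5) {
  if (keywordWeight < 0 || keywordWeight > 1) {
    throw new RangeError("keywordWeight must be between 0 and 1");
  }
  return keywordWeight * keywordScore + (1 - keywordWeight) * vectorScore;
}

// Rank candidates by their blended score, highest first.
function rankHybrid(candidates, keywordWeight) {
  return candidates
    .map((c) => ({
      ...c,
      score: hybridScore(c.keywordScore, c.vectorScore, keywordWeight),
    }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical candidates with made-up, already normalized scores.
const candidates = [
  { doc: "Winter is coming to Westeros", keywordScore: 1.0, vectorScore: 0.9 },
  { doc: "Winter clothing sale is coming", keywordScore: 1.0, vectorScore: 0.4 },
  { doc: "The cold season is approaching", keywordScore: 0.25, vectorScore: 0.8 },
];

const keywordHeavy = rankHybrid(candidates, 0.8).map((c) => c.doc);
const vectorHeavy = rankHybrid(candidates, 0.2).map((c) => c.doc);
console.log(keywordHeavy);
console.log(vectorHeavy);
```

With a keyword-heavy weight the clothing sale document stays in second place, while a vector-heavy weight moves the "cold season" document above it. That shift is the design choice the talk refers to.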

I'll also quickly show you an example of how that might look in terms of a design.

And again, you implement that hybrid search function, and you'll find a different search result.

So what you're probably noticing is that all of these search results are a bit different.

So of course, this will depend on your design considerations for the experience that you're trying to build.

And that's what we are now going to talk about: when to use which type of search, right?

So keyword search, as I mentioned, just as a quick summary, is really great for straightforward queries.

Now, of course, the implementation of a keyword search is very simple.

It's a lot more straightforward, and usually it needs a lot less compute as well.

Because you're not using any AI models to generate embeddings, and you're not using NLP, which of course would require you to use an AI model.

So in this case, generally, this will be a lot safer, a lot easier to implement, and a lot more cost effective.

And of course, you should use it when you're doing direct search matches.

So for example, in an e-commerce app or an inventory management app, you might be searching on the basis of the SKU, because the SKU is unique for every product.

So if you're just doing a SKU match, keyword matching is a perfect fit for that.
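
For an exact identifier such as a SKU, the lookup can be as simple as a map keyed by the normalized SKU. The inventory entries and SKU format below are made up for illustration.

```javascript
// Hypothetical inventory; the SKU format is invented for this example.
const inventory = [
  { sku: "JKT-BLK-001", name: "Black felt jacket" },
  { sku: "CHR-BLK-014", name: "Black office chair" },
];

// Normalize so that case and stray whitespace do not break exact matching.
const normalizeSku = (sku) => sku.trim().toUpperCase();

// Build the lookup table once; each lookup is then a single map access.
const inventoryBySku = new Map(
  inventory.map((item) => [normalizeSku(item.sku), item])
);

function findBySku(sku) {
  return inventoryBySku.get(normalizeSku(sku)) ?? null;
}

console.log(findBySku(" jkt-blk-001 "));
console.log(findBySku("JKT"));
```

A partial SKU returns `null` here on purpose: this kind of lookup is for exact matches only, with no embedding model or scoring involved.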

Now, semantic search, as I mentioned, works with the help of natural language processing, and it can understand the context and the intent that are relevant for your particular query.

So an example here would be an FAQ, right?

So if you have an FAQ-based chatbot or a frequently asked questions section on your website, a question could be, how do I implement this thing?

And of course, users are not always going to ask the question in the same structure. They might rephrase the way that they ask it. So a semantic search, which understands the context of the query, is a really good fit here, because it does not care how the user phrases the question.

It will be able to find the relevant context of that question and respond in the desired format.

And then of course vector search is based on mathematical vectors.

Hence, it really helps you to handle complex queries with semantic relationships.

Perfect example is Netflix.

So of course, a lot of us love to watch YouTube, Netflix during our free time.

So Netflix works on a content recommendation system, right?

So based on the content that you have just watched, it will show you content that closely matches it.

So there it's using a lot of content recommendation.

And in fact, if you look at cosine similarity, or the FAISS library from Facebook, these kinds of similarity algorithms have been around for almost 15 or 20 years, ever since Facebook dot com came into the picture and was implementing these cosine similarity algorithms to find the relevant people that you should connect with, whether on LinkedIn or on Facebook.

So they were already using these types of algorithms, but now we are just using them for search, so that you find relevant search results based on the content preferences that you have on these platforms.

And of course, hybrid search, as I mentioned, combines the benefits of all of the different kinds of search techniques that we have covered.

And of course, one thing to keep in mind is that you'll always have to balance the compute cost and the accuracy when you're using hybrid search.

And of course, the perfect example is an e-commerce site, where you don't just have a search bar, but you have all of the different kinds of search filters.

For example, you have a slider for narrowing down certain categories, or you have a filter where you can filter based on the rating.

So it can become a lot more complex, right?

You're not just finding results based on the exact keyword matching.

So just a quick recap, I know I'm running a bit out of time, but this is a quick recap of what we covered in the session.

Now, let's talk about some implementation tips that you might want to apply.

So these are some do's and don'ts. When you're doing keyword search, always ensure that you use indexing, because indexing makes your search queries faster: essentially you break your documents down into tokens ahead of time, so at query time you can look up the matching documents much faster.
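
A common way to do this kind of indexing is an inverted index, which maps each token to the documents that contain it. The talk does not name a specific index structure, so the sketch below is an assumption about what "indexing" could look like in plain JavaScript.

```javascript
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Build the index once: token -> set of ids of documents containing it.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach((doc, id) => {
    for (const token of tokenize(doc)) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  });
  return index;
}

// At query time, only the posting lists of the query tokens are visited,
// instead of scanning every document.
function searchIndex(index, query) {
  const scores = new Map(); // document id -> number of matched query tokens
  for (const token of new Set(tokenize(query))) {
    for (const id of index.get(token) ?? []) {
      scores.set(id, (scores.get(id) ?? 0) + 1);
    }
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const documents = [
  "Winter is coming to Westeros",
  "Winter clothing sale is coming",
  "Summer clothing sale",
];

const index = buildIndex(documents);
const hits = searchIndex(index, "winter sale");
console.log(hits);
```

The index has to be updated whenever documents are added, changed or removed, which is the price paid for the faster lookups.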

And of course, also use stemming and lemmatization.

What that means is that, for example, if you have the word "running", stemming converts it into its base form, "run".

And lemmatization maps a word to its dictionary form, so something like "bought" becomes "buy".

So these are some techniques that make keyword search more effective.
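
To show the idea only, here is a deliberately naive normalizer: a tiny lookup table for irregular forms (lemmatization-like) plus two suffix rules (stemming-like). The table and rules are assumptions made for this example. Crude rules like these over-stem some words (for instance "this" becomes "thi"), which is why real projects usually rely on an established stemmer or lemmatizer instead.

```javascript
// Hypothetical lookup table for irregular forms (lemmatization-like step).
const IRREGULAR_FORMS = new Map([
  ["bought", "buy"],
  ["ran", "run"],
  ["better", "good"],
]);

// Naive normalization: irregular-form lookup first, then suffix stripping.
function normalizeWord(word) {
  const lower = word.toLowerCase();
  if (IRREGULAR_FORMS.has(lower)) return IRREGULAR_FORMS.get(lower);

  // Strip "-ing" and collapse a doubled final consonant ("runn" -> "run"),
  // except for l, s and z ("fall" stays "fall").
  if (lower.length > 5 && lower.endsWith("ing")) {
    let stem = lower.slice(0, -3);
    const last = stem[stem.length - 1];
    const doubled = stem.length > 2 && last === stem[stem.length - 2];
    if (doubled && !"lsz".includes(last)) stem = stem.slice(0, -1);
    return stem;
  }

  // Strip a plural "-s", but leave words ending in "ss" alone.
  if (lower.length > 3 && lower.endsWith("s") && !lower.endsWith("ss")) {
    return lower.slice(0, -1);
  }
  return lower;
}

console.log(normalizeWord("running")); // "run"
console.log(normalizeWord("Bought")); // "buy"
console.log(normalizeWord("sales")); // "sale"
```

Applying the same normalization to both the indexed documents and the query lets "running" in a query match "run" in a document.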

And of course, for the things that you should not do: you should not blindly ignore stop words. Words like "not" and "don't" are often dropped.

But they can change the meaning of your entire sentence.

So ensure that you always take them into account for relevance and context.
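
A small sketch of that point: the stop-word and negation lists below are short, hand-picked assumptions, and the option name `keepNegations` is invented for this example.

```javascript
// Short, hand-picked lists for illustration only.
const STOP_WORDS = new Set(["a", "an", "the", "is", "to", "of"]);
const NEGATIONS = new Set(["not", "no", "never", "don't"]);

// Keep apostrophes so that "don't" stays a single token.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Remove stop words, optionally preserving negations that carry meaning.
function removeStopWords(tokens, { keepNegations = true } = {}) {
  return tokens.filter((token) => {
    if (NEGATIONS.has(token)) return keepNegations;
    return !STOP_WORDS.has(token);
  });
}

const tokens = tokenize("Winter is not coming");
const withNegation = removeStopWords(tokens).join(" ");
const withoutNegation = removeStopWords(tokens, { keepNegations: false }).join(" ");

console.log(withNegation); // "winter not coming"
console.log(withoutNegation); // "winter coming"
```

Once "not" is dropped, "winter is not coming" and "winter is coming" reduce to the same tokens, so a search could no longer tell the two statements apart.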

Similarly for semantic search, you can use pre-trained NLP models like BERT, or other existing models such as OpenAI models or open source models.

And always ensure that you regularly fine-tune your models based on your content, because your content might be changing dynamically.

And of course, don't forget the compute cost, because you are using AI models.

And this is similar for vector search, where you should regularly update your vectors based on how your data is changing.

Because a lot of times what might happen is that your data gets changed, but you forget to change your vectors.

And then that can cause a discrepancy when you're trying to do your vector search.
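
One way to guard against that drift is to record, next to each stored vector, the text it was generated from, and to regenerate the vector whenever the two no longer match. The field names and the `placeholderEmbed` function below are assumptions for illustration; a real system would call its embedding model instead, usually asynchronously.

```javascript
// A document is stale if it has no vector yet, or if its text has changed
// since the vector was generated.
function isStale(doc) {
  return doc.embedding === null || doc.embeddedText !== doc.text;
}

// Regenerate vectors only for stale documents; returns how many were updated.
function refreshStaleEmbeddings(docs, embed) {
  let refreshed = 0;
  for (const doc of docs) {
    if (isStale(doc)) {
      doc.embedding = embed(doc.text);
      doc.embeddedText = doc.text;
      refreshed += 1;
    }
  }
  return refreshed;
}

// Placeholder standing in for a real embedding model call.
const placeholderEmbed = (text) => [text.length, text.split(/\s+/).length];

const docs = [
  {
    text: "Winter is coming to Westeros",
    embeddedText: "Winter is coming to Westeros",
    embedding: placeholderEmbed("Winter is coming to Westeros"),
  },
  {
    // The text was edited after the vector was generated.
    text: "Summer clothing sale is coming",
    embeddedText: "Winter clothing sale is coming",
    embedding: placeholderEmbed("Winter clothing sale is coming"),
  },
];

const refreshedCount = refreshStaleEmbeddings(docs, placeholderEmbed);
console.log(refreshedCount); // 1
```

Storing the full source text is the simplest option for a sketch. In practice a hash of the text or a version number would serve the same purpose with less storage.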

And of course, some other things to keep in mind: building effective vector search indexes, and ensuring that you use a relevant type of embedding so that the proper context is stored inside of the vectors, are also really important.

And with hybrid search, the main thing to keep in mind is that you have to monitor the weights of the different search components: how much weight you want to give to your full text search versus the vector search or semantic search that you might use as part of your search technique.

And that will be more focused on the user experience and will be something that you might have to tweak in order to make your user experience much better.
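The weighting idea can be sketched as a weighted average of the component scores. This assumes both scores are already normalized to the range 0 to 1; the `hybridScore` and `rank` functions, the default weights, and the sample scores are hypothetical values chosen for illustration.

```javascript
// Weighted average of a keyword score and a vector score, both assumed to be in [0, 1].
function hybridScore(keywordScore, vectorScore, weights = { keyword: 0.3, vector: 0.7 }) {
    const total = weights.keyword + weights.vector;
    return (weights.keyword * keywordScore + weights.vector * vectorScore) / total;
}

// Rank candidates by hybrid score, highest first, and return their ids.
function rank(candidates, weights) {
    return candidates
        .map(c => ({ id: c.id, score: hybridScore(c.keywordScore, c.vectorScore, weights) }))
        .sort((a, b) => b.score - a.score)
        .map(c => c.id);
}

const candidates = [
    { id: "A", keywordScore: 1.0, vectorScore: 0.2 },   // exact wording, weak similarity
    { id: "B", keywordScore: 0.25, vectorScore: 0.75 }, // different wording, similar meaning
];

// The same candidates rank differently depending on the weights.
console.log(rank(candidates, { keyword: 0.75, vector: 0.25 }));
console.log(rank(candidates, { keyword: 0.25, vector: 0.75 }));
```

Because the ordering flips with the weights, this is the knob to tune against real user feedback rather than something to set once and forget.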

As we close, I also wanted to cover some of the open source libraries.

Now I divide them into two major parts.

One is the core search engines, which are dedicated search libraries that you can use.

There are closed source options like Algolia, and open source libraries like Meilisearch, Typesense, and Elasticsearch that you can use.

But of course, there are a lot of different databases that have built in capabilities.

So of course, Phil just spoke about Cassandra.

And then, of course, I work at Couchbase, where full text search is a built-in functionality inside the core database itself.

So you can use databases that come with built-in search so that you don't have to use an external core search engine.

Although you can still use them side by side as well.

So those are some of the libraries that you can consider using when you are trying to implement a full text search or a search experience inside of your code base.

With that, I'll conclude.

But before doing that, I also wanted to quickly show you an example.

So in this case, what you see is that I have a movie dataset over here.

So I can search for a movie like Shrek, or I can search for Top Gun.

It's doing a keyword search here, and you can see that my dataset probably doesn't have that movie, but it returns a show that's perhaps related to Top Gun.

It might not be a hundred percent correct because it's working on a limited dataset.

Okay.

But now, for example, if I want to add a filter, I can add one based on when that particular movie was released, right?

So I can add filters based on that.

I only want search results for movies released between 1901 and 1989.

And then I search specifically for fighting movies.

So hopefully you can see the way that it's adding that filter to the search.

In the query, it's adding this clause, which is basically another sub-query that says I only want search results where the minimum value of the year is 1901.

And the maximum is 1989, and I can have more search filters of this kind that I might want to allow or adapt.

So you can see that in this case, the first search result is from 1976, because I added the filter that the movie should fall between 1901 and 1989.

So let me just quickly show you how this has been implemented.

So this is where I'm adding the filter, in which I provide the year range, the minimum rating, and whether I want to search in the title.

And then, when I finally go ahead and run my search query, the simple thing I'm doing here is that, inside my filter options, I'm providing the actual query along with the context, which is the different types of filter operations that I'm giving to my code.

And of course, this is an example of a vector search.

So I'm providing the embeddings.

In this case, I'm using the Couchbase vector store to generate the embeddings and store them inside my database.

And then finally, when I'm using the similarity search, I'm also passing the filter option.

That search filter is what gives me the hybrid search capability, where I'm using keyword search to do the filters.

And then, after I add the filters, I use the vector search.
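The filter-then-rank flow from the demo can be sketched in plain JavaScript. This is not the Couchbase or LangChain API; it is an in-memory illustration, and `filteredVectorSearch`, the placeholder movie titles, and the two-dimensional vectors are all made up for the example.

```javascript
// Cosine similarity between two equal-length vectors (0 if either has zero length).
function cosineSimilarity(a, b) {
    const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
    const normA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
    const normB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
    return normA === 0 || normB === 0 ? 0 : dot / (normA * normB);
}

// Apply the structured filter first, then rank the survivors by vector similarity.
function filteredVectorSearch(queryVector, movies, { minYear, maxYear }, topK = 3) {
    return movies
        .filter(movie => movie.year >= minYear && movie.year <= maxYear)
        .map(movie => ({ ...movie, score: cosineSimilarity(queryVector, movie.vector) }))
        .sort((a, b) => b.score - a.score)
        .slice(0, topK);
}

// Placeholder data: titles, years, and vectors are invented for this sketch.
const movies = [
    { title: "Movie A", year: 1976, vector: [0.9, 0.1] },
    { title: "Movie B", year: 1986, vector: [0.2, 0.8] },
    { title: "Movie C", year: 2001, vector: [1.0, 0.0] },
];

const results = filteredVectorSearch([1, 0], movies, { minYear: 1901, maxYear: 1989 });
console.log(results.map(movie => movie.title));
```

Note that "Movie C" is the closest vector match but never appears, because the year filter removes it before similarity is computed. In a real database the filter would run inside the search index rather than over an in-memory array.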

So that's a quick example of how you would see this inside of a code base.

So this is an example of a Next.js app that I just showed you.

But yeah, these are the links to my slides and I'll be around over here to answer any queries.

Thank you so much.

Thanks.

Bye.

Some of the common search experiences

Custom builtin Google Search

- Slow search

Image of a Google Custom Search box with a search bar labeled "Search this site:" and another search bar with the text "Sweetshirt".
A snowy path lined with bare trees covered in snow.

IS WINTER COMING?

Jon Snow dressed in medieval armor and fur cloak.
Close-up image of a computer screen displaying a web interface with a "Search" button and a drop-down menu, with a cursor pointing to the button.

WHAT DOES JON SNOW WANT TO KNOW?

What will the search return?

Using Google once felt like magic, and now it’s a lot more difficult to find search results due to sponsored content.

The Tragedy of Google Search, The Atlantic, September 2023

What we are going to cover

  • Why does this matter for you?
  • Exploring the different types of search:
    • Keyword
    • Semantic
    • Vector
    • Hybrid
  • When to use each type of search
  • Implementation Quick Dos and Do Nots
Image of Daenerys Targaryen from Game of Thrones, with the context of the presentation slide "What we are going to cover".

Why does this matter to you?

Screenshot of a project management interface with a task titled "Add Search Functionality" and related task details.

What are the performance requirements?

What is the context of the search?

What is the use case for search?

What is the nature of the data?

What problem is it solving?

These questions (and more!) will help guide you to an implementation solution.

Illustration of two characters from a medieval fantasy setting, engaged in conversation by a table.

Exploring the different types of search

Image of a person touching a virtual search bar icon on a screen.

Keyword Search

IS WINTER COMING?

The image of Jon Snow repeated

Word Match

Keyword Search

Match the words in the query to words in a set of documents.

Illustration of two arrows pointing horizontally to the right, placed next to the words "Keyword Search".

Tokenization

Three interconnected orange circles labeled "Is," "Winter," and "Coming," suggesting the breaking down of the phrase "Is Winter Coming?" into tokens.

WINTER IS COMING TO WESTEROS

Visual with the text "Winter Is Coming To Westeros" stylized similar to Game of Thrones logo on the left. On the right, four orange circles with arrows pointing left. Text in circles reads "To", "Westeros", "Is", "Winter".

WINTER CLOTHING SALE IS COMING!

Image of a billboard in a snowy landscape with text 'WINTER CLOTHING SALE IS COMING!', accompanied by orange circles containing the words 'Sale Coming Is Winter Clothing' and two large, stylized arrows.

Keyword Search in JavaScript

  • Query is split into individual words (tokens)
  • Calculate a relevance score based on the number of matching tokens in the document text
  • Iterate over each document, calculate the relevance score, and filter out documents with a score of zero
  • Order the results by relevance score in descending order
// Keyword search function
function keywordSearch(query, documents) {
   // Lowercase, strip punctuation, and split on whitespace
   const tokenize = text =>
       text.toLowerCase().replace(/[^\w\s]/g, "").split(/\s+/).filter(Boolean);
   // Tokenize the query
   const queryTokens = tokenize(query);
   // Function to calculate relevance score: number of query tokens found in the text
   const calculateScore = (text, tokens) => {
       let score = 0;
       const textTokens = tokenize(text);
       tokens.forEach(token => {
           if (textTokens.includes(token)) {
               score += 1;
           }
       });
       return score;
   };
   // Search through the documents, drop non-matches, and sort by score (highest first)
   const results = documents.map(doc => {
       const score = calculateScore(doc.text, queryTokens);
       return { ...doc, score };
   }).filter(doc => doc.score > 0)
     .sort((a, b) => b.score - a.score);
   return results;
}

Keyword search is effective for straightforward queries, but it struggles with complex or nuanced searches.
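A small sketch of that limitation: the same kind of token matching scores a paraphrased query at zero, even though a person would read it as the same question. The `countMatches` helper is a compact stand-in written for this illustration, not the slide's function.

```javascript
// Count how many query tokens appear in the text (case-insensitive, punctuation ignored).
function countMatches(query, text) {
    const tokenize = s => s.toLowerCase().split(/\W+/).filter(Boolean);
    const textTokens = new Set(tokenize(text));
    return tokenize(query).filter(token => textTokens.has(token)).length;
}

const doc = "Winter is coming to Westeros";

// Same words as the document: every token matches.
console.log(countMatches("Is winter coming?", doc));
// Same meaning, different words: nothing matches.
console.log(countMatches("Cold season arriving", doc));
```

This gap between wording and meaning is what the semantic and vector techniques that follow try to close.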

Semantic Search

IS WINTER COMING?

Natural Language Processing

Semantic Search

Utilizes natural language processing (NLP), and context understanding to interpret queries

Illustration of two arrows pointing right, adjacent to the words 'Semantic Search'.

Query Expansion

Is Winter Coming?
  • Cold Season
  • Winter Season
  • Arriving
  • Approaching

Is Cold Season Arriving? / Is Winter Season Approaching?

Flowchart illustrating query expansion for the question "Is Winter Coming?" showing branches to "Cold Season," "Winter Season," "Arriving," and "Approaching."

Context Understanding

Semantic Analysis

Context Recognition

Game of Thrones "Winter is Coming to Westeros"

Seasonal Context "Winter Clothing Sale is Coming"

Related to Game of Thrones / Related to Seasonal Sales

Diagram showing relationships between semantic analysis and context recognition. Includes examples related to "Game of Thrones" and seasonal sales context.

Semantic Search in JavaScript

  • Analyze the query to understand intent using NLP techniques
  • Expand the query with related concepts
  • Calculate the relevance score based on semantic similarity between the query and documents
  • Iterate over each document, calculate the semantic relevance score, and filter out documents with a score below a certain threshold
  • Order the results by semantic relevance in descending order
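// Semantic search function
async function semanticSearch(query, documents) {
    const queryAnalysis = await analyzeQuery(query);
    return documents.map(doc => {
        const score = computeSemanticSimilarity(doc.text, queryAnalysis);
        return { ...doc, score };
    }).filter(doc => doc.score > 0.5)
      .sort((a, b) => b.score - a.score);
}
// NLP query analysis (mocked here with a small synonym table)
async function analyzeQuery(query) {
    const synonyms = {
        "winter": ["cold season", "winter season"],
        "coming": ["approaching", "arriving"]
    };
    // Strip punctuation so "coming?" matches "coming", and keep each original
    // token alongside its synonyms: one group of alternatives per query word
    return query.toLowerCase().replace(/[^\w\s]/g, "").split(/\s+/).filter(Boolean)
        .map(token => [token, ...(synonyms[token] || [])]);
}
// Compute semantic similarity: the fraction of query words for which the
// word itself or one of its synonyms appears in the text
function computeSemanticSimilarity(text, queryGroups) {
    if (queryGroups.length === 0) return 0;
    // Pad with spaces so whole words and multi-word phrases can be matched
    const normalizedText = ` ${text.toLowerCase().replace(/[^\w\s]/g, "")} `;
    const matched = queryGroups.filter(group =>
        group.some(term => normalizedText.includes(` ${term} `)));
    return matched.length / queryGroups.length;
}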
Image shows JavaScript code snippets demonstrating a semantic search function using natural language processing techniques and relevance scoring.

Is Winter Coming

const documents = [
  { id: 1, text: "Winter is coming to Westeros" },
  { id: 2, text: "Winter clothing sale is coming" }
];
const query = "Is winter coming?";
semanticSearch(query, documents).then(results =>
console.log("Search Results:", results));

Semantic search excels at understanding the context and intent behind queries, making it highly effective for complex or nuanced searches.

Vector Search

Illustration of an orange icon with overlapping arrows and a red geometric shape.

IS WINTER COMING?

Math, and more math

Vector Search

Utilizes mathematical representations (vectors) to capture semantic relationships and similarities between queries and documents

Illustration of two arrows aligned with the text "Vector Search" on a presentation slide.

Vector Search Process

“Is Winter Coming?”

Word Embeddings

Convert Query and Documents to Vectors

Compute Cosine Similarity

Similarity Scoring

What are embeddings? Numerical representations that capture the meaning of words or phrases.

What are vectors? Array of numbers representing words or phrases

Flowchart illustrating the vector search process with steps: Word Embeddings, Convert Query and Documents to Vectors, Compute Cosine Similarity, and Similarity Scoring.

Vector Search in JavaScript

  • Convert query and documents to vectors.
  • Calculate relevance scores using cosine similarity.
  • Iterate over each document, calculate the relevance score, and filter out low-scoring documents.
  • Order the results by relevance in descending order.
// Vector search function
async function vectorSearch(query, documents) {
    const queryVector = await getVector(query);
    // Resolve every per-document promise before filtering and sorting
    const scored = await Promise.all(documents.map(async doc => {
        const docVector = await getVector(doc.text);
        const score = cosineSimilarity(queryVector, docVector);
        return { ...doc, score };
    }));
    return scored.filter(doc => doc.score > 0.5)
                 .sort((a, b) => b.score - a.score);
}
// Mock function to get vector representation
async function getVector(text) {
    // Placeholder for actual vector generation
    const vectors = {
        "Is winter coming?": [0.1, 0.2, 0.3],
        "Winter is coming to Westeros": [0.1, 0.2, 0.4],
        "Winter clothing sale is coming": [0.1, 0.3, 0.2]
    };
    return vectors[text] || [0, 0, 0];
}
// Compute cosine similarity
function cosineSimilarity(vec1, vec2) {
    const dotProduct = vec1.reduce((sum, v, i) => sum + v * vec2[i], 0);
    const normA = Math.sqrt(vec1.reduce((sum, v) => sum + v * v, 0));
    const normB = Math.sqrt(vec2.reduce((sum, v) => sum + v * v, 0));
    // Avoid dividing by zero for the all-zero fallback vector
    if (normA === 0 || normB === 0) return 0;
    return dotProduct / (normA * normB);
}
The right side of the slide contains a code snippet that demonstrates vector search, vector representation retrieval, and cosine similarity computation in JavaScript.

Is Winter Coming

const documents = [
  { id: 1, text: "Winter is coming to Westeros" },
  { id: 2, text: "Winter clothing sale is coming" }
];
const query = "Is winter coming?";
vectorSearch(query, documents).then(results =>
  console.log("Search Results:", results));
[
  { id: 1, text: "Winter is coming to Westeros", score: 0.99 },
  { id: 2, text: "Winter clothing sale is coming", score: 0.93 }
]

Vector search captures semantic relationships by converting words and phrases into numerical representations, making it highly effective for finding similar documents.
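For reference, the cosine similarity used above can be worked through by hand. The derivation below uses the mock vectors from the slide code for the query "Is winter coming?" and the document "Winter is coming to Westeros"; the symbols q and d are just labels chosen here for those two vectors.

```latex
\[
\cos(\mathbf{q}, \mathbf{d}) =
  \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
\]
\[
\mathbf{q} = (0.1,\ 0.2,\ 0.3), \qquad \mathbf{d} = (0.1,\ 0.2,\ 0.4)
\]
\[
\mathbf{q} \cdot \mathbf{d} = 0.01 + 0.04 + 0.12 = 0.17, \qquad
\lVert \mathbf{q} \rVert = \sqrt{0.14} \approx 0.374, \qquad
\lVert \mathbf{d} \rVert = \sqrt{0.21} \approx 0.458
\]
\[
\cos(\mathbf{q}, \mathbf{d}) \approx \frac{0.17}{0.374 \times 0.458} \approx 0.99
\]
```

A value close to 1 means the two vectors point in nearly the same direction, which is why this document ranks at the top for the query.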

SEMANTIC SEARCH

Understanding Content and Intent Through NLP

VECTOR SEARCH

Mathematical similarity of words and phrases

Image of Tyrion Lannister from Game of Thrones wearing armor, alongside a graphic with the words 'NOW I UNDERSTAND' on a note.

Hybrid Search

Illustration of interlocking orange arrows and abstract symbols.

IS WINTER COMING?

Why choose only one?

Hybrid Search

Utilizes keyword matching, semantic analysis, and vector search to provide precise, contextually relevant, and semantically similar search results.

Illustration of two arrows aligned horizontally next to the text "Hybrid Search".

Hybrid Search Process

"Is Winter Coming?"
Illustration of a winding road leading to process steps: Keyword Match, Word Embeddings, Convert Query and Documents to Vectors, Compute Cosine Similarity, Similarity Scoring, Contextual Understanding.

Hybrid Search in JavaScript

  • Combine keyword matching, semantic analysis, and vector similarity.
  • Analyze the query and expand with related concepts.
  • Convert query and documents to vectors.
  • Calculate relevance scores using cosine similarity and context understanding.
  • Iterate over each document, calculate the hybrid relevance score, and filter out low-scoring documents.
  • Order the results by hybrid relevance in descending order.
async function hybridSearch(query, docs) {
    // Lowercase, strip punctuation, and split on whitespace
    const tokenize = text =>
        text.toLowerCase().replace(/[^\w\s]/g, "").split(/\s+/).filter(Boolean);
    const queryTokens = tokenize(query);
    const queryVec = await getVector(query);

    // Resolve every per-document promise before filtering and sorting
    const scored = await Promise.all(docs.map(async doc => {
        const docTokens = tokenize(doc.text);
        const docVec = await getVector(doc.text);
        // Fraction of query tokens found in the document, so both scores lie in [0, 1]
        const keywordScore = queryTokens.length === 0 ? 0 :
            queryTokens.filter(t => docTokens.includes(t)).length / queryTokens.length;
        const vectorScore = cosineSimilarity(queryVec, docVec);
        const score = (keywordScore + vectorScore) / 2;
        return { ...doc, score };
    }));
    return scored.filter(doc => doc.score > 0.5).sort((a, b) => b.score - a.score);
}

async function getVector(text) {
    const vectors = {
        "is winter coming?": [0.1, 0.2, 0.3],
        "winter is coming to westeros": [0.1, 0.2, 0.4],
        "winter clothing sale is coming": [0.1, 0.3, 0.2]
    };
    return vectors[text.toLowerCase()] || [0, 0, 0];
}

function cosineSimilarity(v1, v2) {
    const dot = v1.reduce((sum, v, i) => sum + v * v2[i], 0);
    const norm1 = Math.sqrt(v1.reduce((sum, v) => sum + v * v, 0));
    const norm2 = Math.sqrt(v2.reduce((sum, v) => sum + v * v, 0));
    // Avoid dividing by zero for the all-zero fallback vector
    if (norm1 === 0 || norm2 === 0) return 0;
    return dot / (norm1 * norm2);
}
Screenshot of a presentation slide with a title about hybrid search in JavaScript, a bulleted list on hybrid search techniques, and JavaScript code snippets for relevant functions.

IS WINTER COMING

const documents = [
    { id: 1, text: "Winter is coming to Westeros" },
    { id: 2, text: "Winter clothing sale is coming" }
];

const query = "Is winter coming?";

hybridSearch(query, documents).then(results =>
    console.log("Search Results:", results)
);
// Scores shown rounded to three decimals
[
    { id: 1, text: "Winter is coming to Westeros", score: 0.996 },
    { id: 2, text: "Winter clothing sale is coming", score: 0.964 }
]
Image of a magnifying glass and arrow surrounding text "IS WINTER COMING" with visual connectors to blocks of text.

Hybrid search combines keyword matching, semantic analysis, and vector similarity, providing precise, contextually relevant, and semantically similar search results.

When to use each type of search

When To Use

KEYWORD SEARCH
  • Straightforward queries
  • High precision on exact matches
  • Simple implementations
  • Lower computational cost

SEARCHING BY SKU

Image of a person scanning a barcode on a package using a smartphone.

When To Use

SEMANTIC SEARCH

  • Queries with natural language
  • Context and intent are relevant
  • Handling synonymy and polysemy
A slide showing three checkmarked bullet points related to using semantic search.
Image of a finger touching a digital search button overlay on top of a wooden texture with blocks spelling FAQ and a woman pointing upwards. There is also a plant in the background.

When To Use

VECTOR SEARCH
  • Finding similar documents or content
  • Recommendations based on similarity
  • Handling complex queries with semantic relationships

NETFLIX

Canva
Image of a person touching a virtual search icon and a red mug with "Just One More Episode" in front of a blurry laptop screen.

When To Use

HYBRID SEARCH

  • Combine benefits of: keyword, semantic and vector
  • Needing precision and context
  • Balance computational cost and accuracy
Image of a miniature shopping cart containing boxes marked 'Fragile' next to a small orange shopping basket, with a person typing on a laptop in the background.

Recap

Keyword Search

Document titles, product codes, exact titles

Semantic Search

Customer support queries, FAQ search, ambiguous meaning

Vector Search

Similar types of documents, content recommendations, image search, fraud detection

Hybrid Search

E-commerce platform search portal, healthcare information systems

Implementation Tips

DO THESE THINGS

& DO NOT DO THESE THINGS

A scene with two characters sitting at a table in an old stone room

Tips for Keyword Search

  • Rely solely on keyword search for complex queries
  • Neglect performance optimizations for large datasets
  • Ignore stop words without considering the context
  • Overlook user experience
red prohibition icons next to each bullet point.

Tips for

Semantic Search

  • Overlook importance of context in interpreting queries
  • Rely solely on semantic search for highly specific queries
  • Forget the compute cost for running NLP models
  • Forget ambiguity and polysemy
red prohibition icons next to each bullet point.

Tips for

HYBRID SEARCH

  • Balance computational cost and search accuracy
  • Monitor and adjust the weighting of different search components
  • Combine the search techniques best for your use case
Illustrations of check marks next to each tip.

Tips for

HYBRID SEARCH

  • Forget users are important!
    Solicit feedback
  • Overly rely on one technique in your build
  • Make your code overly complex and hard to maintain
three red prohibition symbols next to bullet points about hybrid search tips.

Open Source Libraries / Closed Source Libraries for Search with JavaScript

Core Search Engines

  • Algolia
  • Meilisearch
  • Typesense
  • Elasticsearch

Databases built in with Search

  • Couchbase

Thank you!

shivay.lamba@couchbase.com
@howdevelop

A QR code on the right side of the slide links to the slides.
Screenshot of a web application titled "Movie Search - Streamlit". It shows search options including "Use LangChain" and "Enable filters", a section to find a movie with "Shrek" as an example, and details about the movie "Shrek" such as a synopsis, score, release year, IMDB rating, and runtime, alongside an image of the Shrek movie poster.

  • AI powered search
  • JavaScript
  • JavaScript code
  • natural language processing (NLP)
  • cosine similarity
  • vector search
  • cosine similarity function
  • vector search indexes
  • full text search