Create your own search engine

In the Hugging Face community event I learned how to use FAISS (Facebook AI Similarity Search) to find the documents that are most semantically similar to a given query. The goal of this project is to extend this idea into a retrieval and reranking system, where the retriever returns possibly relevant results and the reranker evaluates how relevant each of these hits is to the query.
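To make the two stages concrete, here is a minimal sketch of the retrieve-then-rerank flow in plain Python. It is not the code from the app: the bag-of-words `embed` function stands in for a sentence-embedding model plus a FAISS index, and the `coverage` scorer stands in for a real reranking model. All function names and the tiny corpus are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in (assumption) for a sentence-embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, corpus, top_k=3):
    """Stage 1: cheap similarity search over the whole corpus (the role FAISS plays)."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(doc)), i) for i, doc in enumerate(corpus)),
        key=lambda pair: (-pair[0], pair[1]),
    )
    # Keep at most top_k hits and drop documents with no similarity at all.
    return [i for score, i in scored[:top_k] if score > 0]


def coverage(query, doc):
    """Toy stand-in (assumption) for a reranker: share of distinct query tokens found in the document."""
    q = set(tokenize(query))
    return len(q & set(tokenize(doc))) / len(q) if q else 0.0


def rerank(query, corpus, candidates, score_pair=coverage):
    """Stage 2: rescore only the retrieved candidates, one query-document pair at a time."""
    return sorted(candidates, key=lambda i: score_pair(query, corpus[i]), reverse=True)


corpus = [
    "Search engines: search tips and search tricks.",
    "A cross-encoder can rerank the search results returned by the retriever.",
    "Streamlit turns Python scripts into web apps.",
    "FAISS indexes dense vectors for fast similarity search.",
]
query = "rerank search results"

candidates = retrieve(query, corpus)
for i in rerank(query, corpus, candidates):
    print(i, corpus[i])
```

In this toy corpus the retriever puts the document that merely repeats "search" first, and the reranker moves the document that covers every query term to the top. In a real system you would swap `embed` and `coverage` for actual models while keeping the same two-stage shape.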

An example of the architecture might look as follows (taken from the sentence-transformers library):

Model(s):

The sentence-transformers models on the Hub are great for the reranking task.

Datasets:

Wikipedia is usually a good corpus to test retrieval systems on and you can find a dump in various languages here:

Challenges:

Implementing the full retriever-reranker architecture might be a challenge, so a simpler place to start is with a single long document. You can then chunk that document into paragraphs and compute a relevance score for each paragraph.
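As a rough sketch of that simpler starting point, the snippet below splits one document into paragraphs on blank lines and ranks them against a query. The `term_overlap` scorer is a deliberately naive placeholder of my own (an assumption, not what the app uses); the idea is that you pass a model-based scorer through the `score` parameter instead. The function names are hypothetical.

```python
import re


def split_paragraphs(document):
    """Split a document on blank lines and drop empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def term_overlap(query, paragraph):
    """Placeholder relevance score (assumption): share of distinct query words found in the paragraph."""
    def words(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    q = words(query)
    return len(q & words(paragraph)) / len(q) if q else 0.0


def top_paragraphs(query, document, k=5, score=term_overlap):
    """Return up to k (score, paragraph) pairs, most relevant first."""
    scored = [(score(query, p), p) for p in split_paragraphs(document)]
    # sorted() is stable, so paragraphs with equal scores keep document order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


document = (
    "FAISS is a library for similarity search.\n\n"
    "Streamlit builds web apps.\n\n\n"
    "A reranker scores query and paragraph pairs."
)

for score, paragraph in top_paragraphs("similarity search library", document, k=2):
    print(f"{score:.2f}  {paragraph}")
```

The default of `k=5` matches the "top 5 most relevant paragraphs" target of the project; splitting on blank lines is only one possible chunking rule and may need adjusting for your document.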

The complete code used to develop this app during the event can be found on my GitHub:
https://github.com/abhibisht89/hf_course_event_adr/tree/main/NSE

Desired project outcomes:

  • Create a Streamlit or Gradio app on Hugging Face Spaces that allows a user to enter a search query about a document (or a whole corpus of documents) and returns the top 5 most relevant paragraphs.

A demo of this app can be found on Hugging Face Spaces:
https://huggingface.co/spaces/abhibisht89/neural-search-engine

Additional resources:

I am thrilled to have received some very positive responses from the community on the Space demo. Here are some of them!
