Interesting post! But in general, I was wondering: since we can host Chroma, Faiss, or Qdrant on our own infrastructure without relying on APIs, wouldn’t they offer better performance and scalability than Postgres for large-scale vector search? Is my understanding correct, or am I missing something?
What do you think, Prashant?
Thank you for this valuable insight! While MongoDB also offers hybrid search, I'm particularly intrigued by PostgreSQL's native capabilities without external dependencies. I tried MongoDB in one of my projects, but based on your explanation of PostgreSQL's built-in BM25 + vector search, I'm now eager to explore it for my next project.
Cool, yeah - I don't know if it came through clearly in the video - but a lot of platforms do hybrid search by trying to combine both scores into one. This can lead to bad results because BM25 and vector search are effectively orthogonal methods. Better to grab the top chunks with each method and inject them all (rather than trying to create a hybrid metric).
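To make the "grab the top chunks with each method and inject them all" idea concrete, here is a minimal Python sketch. It assumes each retriever has already returned a ranked list of chunk IDs; the function name `merge_top_chunks` and the sample IDs are made up for illustration and are not from the video.

```python
from typing import Iterable, List


def merge_top_chunks(
    keyword_ranked: Iterable[str],
    vector_ranked: Iterable[str],
    k: int = 3,
) -> List[str]:
    """Return the union of the top-k chunk IDs from each ranked list.

    No scores are blended: each method contributes its own top-k,
    and duplicates are dropped while keeping first-seen order.
    """
    merged: List[str] = []
    seen = set()
    for ranked in (keyword_ranked, vector_ranked):
        for chunk_id in list(ranked)[:k]:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged


# Hypothetical ranked results from a BM25 retriever and a vector retriever.
bm25_hits = ["c7", "c2", "c9", "c4"]
vector_hits = ["c2", "c5", "c7", "c1"]

context_chunks = merge_top_chunks(bm25_hits, vector_hits, k=3)
print(context_chunks)  # ['c7', 'c2', 'c9', 'c5']
```

The merged list is what you would inject into the prompt. A chunk found by both methods appears only once, and neither method's scores need to be put on a common scale.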
Nice comment:
1. For the keyword/BM25 part (which I think is critical to good overall retrieval performance), Postgres offers a lot natively. So Postgres is quite a good option if you want hybrid search (dense + sparse), which I think you should want because it improves retrieval performance. That's not to say you can't do sparse/BM25 in Chroma/Faiss/Qdrant, but they may not provide as good native support in certain cases.
2. The biggest thing about Postgres is that it's just common and a lot of people already have their data in it. If you do, you can keep your tech stack simpler (and more transferable) by just sticking to Postgres.
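As a rough sketch of what the two retrieval queries could look like on the Postgres side, here is one possible setup in Python. It rests on several assumptions of mine: a hypothetical `chunks` table with `id`, `content`, and `embedding` columns, the pgvector extension installed for the `<=>` distance operator, and Postgres's built-in full-text ranking (`ts_rank`) standing in for the keyword side. Note that `ts_rank` is not BM25 itself; true BM25 scoring generally needs an extension.

```python
from typing import List, Sequence, Tuple

# Keyword side: built-in full-text search, ranked with ts_rank (not BM25).
KEYWORD_SQL = """
SELECT id, content
FROM chunks
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
ORDER BY ts_rank(to_tsvector('english', content),
                 plainto_tsquery('english', %s)) DESC
LIMIT %s
"""

# Vector side: assumes the pgvector extension; <=> is cosine distance.
VECTOR_SQL = """
SELECT id, content
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def retrieval_queries(
    query_text: str,
    query_embedding: Sequence[float],
    k: int = 5,
) -> List[Tuple[str, tuple]]:
    """Build (sql, params) pairs for the keyword and vector searches.

    The two queries are run separately; their results are not
    combined into a single hybrid score.
    """
    # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return [
        (KEYWORD_SQL, (query_text, query_text, k)),
        (VECTOR_SQL, (vector_literal, k)),
    ]
```

Each pair can be passed to a driver that uses `%s` placeholders, for example `cursor.execute(sql, params)` in psycopg, and the two result sets then merged as separate top-k lists.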