Hacker News | mtricot's comments

Great to see you here!

Let me see what's up and fix that!

Just want to call out a couple of nuances in our methodology. In general, we tried our best to do apples-to-apples comparisons where we could, and gave ourselves a discount where we couldn’t. Unsurprisingly, it’s a challenge to find MCPs for various vendors (which is another reason we are trying to solve this). Here’s a video walkthrough of the benchmark harness: https://www.loom.com/share/9d96c8c64c1a4b7fad0356774fc54acc

Where the comparison wasn't valid or wasn't apples-to-apples:

Gong and Zendesk: no official native MCP exists, so we used the most popular community implementations we could find. We were only able to benchmark Gong Search as the Gong MCP does not have a Get tool call.

While our Search testing yielded the same number of records on either path, vendor-specific search implementations mean the results aren’t identical. Contents are similar in general, so the ratios remain directionally correct.

The general test set:

2 scenarios (Retrieval and Search) across 4 connectors isn’t a huge test set. While we hope to extend this over time, we’ve made the harness public so anyone can contribute in the meantime. Let us know if you find any MCP with better results!

Where the vendor MCP wins or ties:

Salesforce showed the smallest win at 16%. This is primarily because Salesforce, unlike many vendors, provides great search support out of the box with SOQL.

We see identical records for Get. As noted, Search returns different record sets with identical counts. Airbyte uses fewer tokens because the Salesforce records contain mandatory metadata (type and url).

Where the vendor MCP is costly to context:

Zendesk is a great example of this. The extreme gap is because the Zendesk MCP (reminder: a community alternative) returns the entire API response in search results. This averages 9KB per record against our production Zendesk account!

Airbyte’s implementation provides filtering, which allows agents to retrieve the minimal data needed to achieve the outcome, explaining the drastic gap.
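To make the filtering point concrete, here is a minimal sketch of field projection. It is not Airbyte's or the Zendesk MCP's actual code: the `project_fields` helper and the sample ticket are hypothetical, and serialized JSON length is used only as a rough stand-in for token count.

```python
import json


def project_fields(record, fields):
    """Keep only the requested top-level fields of a record (hypothetical helper)."""
    return {key: record[key] for key in fields if key in record}


# Hypothetical ticket standing in for a verbose, unfiltered API response.
full_record = {
    "id": 101,
    "subject": "Cannot log in",
    "status": "open",
    "description": "x" * 2000,
    "custom_fields": [{"id": i, "value": None} for i in range(50)],
}

# The agent asks only for the fields it needs to answer the question.
slim_record = project_fields(full_record, ["id", "subject", "status"])

# Serialized size as a rough proxy for how much context each version consumes.
full_size = len(json.dumps(full_record))
slim_size = len(json.dumps(slim_record))
print(f"full: {full_size} bytes, filtered: {slim_size} bytes")
```

The gap grows with the number of records returned, which is why an unfiltered search response gets expensive so quickly.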


Talk about going down memory lane :) The initial name of the project was "conduit"...


Not at the moment but let me bring that to the team so we can brainstorm what it could look like.


When reading the tutorial, we are describing one stack to build a specific app. But the stack is made of building blocks that you can replace with others if you need to.

- Airbyte has two self-hosted options: OSS & Enterprise

- Langchain: OSS

- OpenAI: you can host an OSS model if you want to

- Pinecone: there are OSS/self-hosted alternatives


> - OpenAI: you can host an OSS model if you want to

Just to confirm: you mean models like Facebook's Llama 2 and variants right? Since OpenAI hasn't released any OSS models.


correct


What about the embedding?


No good reason. Does "it made the post's title too long" work?


Works for me!


Isn't it the dream? Today there is a lot of stack that needs to be built to enable what you're describing. This is actually what we are doing with that post: what foundations do we need to build so that the UX for the end user is what you're describing? Will take some time to get there :)


It depends.

Airbyte comes in 3 flavors: OSS, Cloud, Enterprise.

For OSS & Enterprise, data doesn't leave your infra since Airbyte is running in your infrastructure. For Cloud, you would have to allowlist some IPs so that we can access your local db.


For the purpose of the tutorial that we built, it really comes down to the type of data that you're using.

If you have data with PII:

One option would be to use Airbyte to bring the data into files or a local db rather than directly into the vector store, add an extra step that strips all PII from the data, and then configure Airbyte to move the clean files/records to the vector store.

The option that jmorgan mentioned is also relevant here: using a "self-hosted" model.
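The PII-stripping step in the first option could look something like the sketch below. This is only an illustration under loud assumptions: the regexes, the `redact_pii` and `clean_record` helpers, and the placeholder tokens are all hypothetical, and simple patterns like these will miss plenty of real-world PII (names, addresses, IDs), so a production setup would likely need a dedicated tool.

```python
import re

# Deliberately simple, hypothetical patterns; they only catch obvious cases.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


def clean_record(record, text_fields):
    """Return a copy of the record with the given text fields redacted."""
    cleaned = dict(record)
    for field in text_fields:
        if isinstance(cleaned.get(field), str):
            cleaned[field] = redact_pii(cleaned[field])
    return cleaned


# Hypothetical record as it might land in a local file or db before cleaning.
raw = {"id": 7, "body": "Contact jane.doe@example.com or +1 415-555-0100."}
print(clean_record(raw, ["body"]))
```

The cleaned records are what you would then point the second sync at, so the vector store never sees the raw text.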

