Saying Grok is uncensored is like saying DeepSeek is uncensored. If anything, DeepSeek is probably less censored than Grok. The Dolphin family has given me the best results, though mostly in niche cases.
Thanks for testing this. The Bannon email from June 30, 2019 is in there (HOUSE_OVERSIGHT_029622). Good stress test idea.
Couple things happening:
Semantic search limitation: Less-famous names don't have strong embeddings, so it defaults to general connections rather than specific mentions
Keyword search gap: You're right — raw grep can catch exact names I'm missing
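To make the two failure modes above concrete, here is a minimal sketch of a keyword-first hybrid search: exact matching runs before the vector index, so a rare name that has a weak embedding still gets caught by the grep pass. The document store, IDs, and function names are illustrative, not the site's actual code.

```python
import re

# Stand-in corpus: doc ID -> extracted text (illustrative content only).
DOCS = {
    "HOUSE_OVERSIGHT_029622": "Email from Steve Bannon, June 30, 2019 ...",
    "HOUSE_OVERSIGHT_000123": "General correspondence about scheduling ...",
}

def keyword_hits(query: str) -> list[str]:
    """Case-insensitive exact-substring pass over every document."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [doc_id for doc_id, text in DOCS.items() if pattern.search(text)]

def hybrid_search(query: str, semantic_fallback) -> list[str]:
    """Prefer exact matches; only consult the vector index when grep finds nothing."""
    exact = keyword_hits(query)
    return exact if exact else semantic_fallback(query)
```

The point of running exact match first is that it can never be beaten to the punch by a fuzzy semantic neighbor: a literal mention of a less-famous name wins outright, and embeddings only fill in when there is no literal hit.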
I saw a similar problem. Roger Schank had some conversations with Epstein, and the emails can be seen on Epsteinvisualizer.com, but your site claimed there were no emails or connection. To be fair to Roger, who was an AI legend of his time and someone I knew personally before his untimely death, he really was not a pedo and most likely never got involved with the girls; I think he and Epstein mostly just talked about AI and education.
Shareable conversations would definitely make the tool more useful, yeah.
I really like the query-parameter approach over UUIDs, since it would make links human-readable.
On the limited dataset: Completely agree - the public files are a fraction of what exists, and I should have made clear the tool covers everything publicly released, not everything that exists. But that's exactly why making even this subset searchable matters. The bar right now is people manually ctrl+F-ing through PDFs or relying on secondhand claims. This at least lets anyone verify what is public.
On LLMs vs traditional NLP: I hear you, and I've seen similar issues with LLM hallucination on structured data. That's why the architecture here is hybrid:
- Traditional exact regex/grep search for names, dates, identifiers
- Vector search for semantic queries
- LLM orchestration layer that must cite sources and can't generate answers without grounding
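Roughly, the orchestration check looks like this sketch: reject any model answer whose cited doc IDs are not a non-empty subset of what retrieval actually returned. The ID pattern, set names, and function are hypothetical stand-ins for the real layer.

```python
import re

# IDs actually returned by retrieval for this query (illustrative values).
RETRIEVED_IDS = {"HOUSE_OVERSIGHT_029622", "HOUSE_OVERSIGHT_031007"}

def validate_answer(answer: str, retrieved_ids: set[str]) -> bool:
    """Accept only answers whose cited doc IDs are a non-empty subset of
    the documents retrieval actually returned; otherwise re-prompt or
    return 'no supported answer'."""
    cited = set(re.findall(r"HOUSE_OVERSIGHT_\d{6}", answer))
    return bool(cited) and cited <= retrieved_ids
```

So an answer with no citation, or one citing a document that was never retrieved, gets bounced before it reaches the user.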
"can't" seems like quite a strong claim. Would you care to elaborate?
I can see how one might use a JSON schema that enforces source references in the output. But there is no technique I'm aware of that constrains a model to only draw on the grounding docs, as opposed to generating a response from pretrained data (or hallucinating one) while still listing the provided RAG results as attached references.
It feels like your "can't" would be tantamount to having single-handedly solved the problem of hallucinations, which if you did, would be a billion-dollar-plus unlock for you, so I'm unsure you should show that level of certainty.
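To illustrate the gap: a schema validates the *shape* of the output (answer plus sources), not its provenance, so a fully hallucinated answer with copied-over RAG IDs still passes. This is a minimal stand-in for real JSON-schema validation; the schema fields and payload are invented for the example.

```python
import json

def matches_schema(payload: dict) -> bool:
    """Minimal stand-in for JSON-schema validation: require a non-empty
    'answer' string and a non-empty 'sources' list. Checks shape only."""
    return bool(
        isinstance(payload.get("answer"), str) and payload["answer"]
        and isinstance(payload.get("sources"), list) and payload["sources"]
    )

# An answer invented from pretraining data, with retrieved IDs pasted in,
# is structurally valid even though nothing in it came from the sources.
hallucinated = json.loads(
    '{"answer": "Claim invented from pretraining data.",'
    ' "sources": ["HOUSE_OVERSIGHT_029622"]}'
)
```

Schema enforcement stops malformed output; it says nothing about whether the answer text is actually entailed by the cited documents.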
Trump famously told New York Magazine in 2002: "I've known Jeff for 15 years. Terrific guy. He's a lot of fun to be with. It is even said that he likes beautiful women as much as I do, and many of them are on the younger side."
Trump and Epstein were social acquaintances in Palm Beach and New York circles during the 1990s and early 2000s. They socialized together at Mar-a-Lago and other venues.
If you want to understand these people, watch the Daily Beast podcast "Inside Trump's Head" with Michael Wolff. It's a little slow, but it paints the picture of their motivations, friendship, falling out, etc.