If you require an account to try the web search (which you have every right to do; it's your service), tell us before we enter the service and type in our search. This comes across as sneaky. You should be clear up front.
fanzhang 18 hours ago
If they put the sign-up first, you wouldn't know how much more information you'd need to give before doing a search (it's just a string), and that would seem to undersell the product.
This seems a lot better than those quizzes or quotes that ask a bunch of questions first and then ask for your email at the end -- or worse -- a payment.
vetleen 12 hours ago
I didn't mind, and I think a lot of people don't mind.
a_n 16 hours ago
why do people complain so much, just be happier
OsrsNeedsf2P 15 hours ago
In case you're serious, continuous dark patterns make the web exhausting to use. When I get an email I didn't want, I report it as spam so the sender's domain reputation decreases[0]. Making a comment calling out the practice is the closest you can do here.
I searched for 'data providers that start with the letter R that sell job postings data', and after 15 minutes it had barely verified the first row.
But if it had first filtered to "start with the letter R", it would only have to look at perhaps 5% of the results it's trying to verify!
So it's doing needless verification of results that will be thrown out by another filter that should've been applied first!
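The ordering argument above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Exa's search planner: `run_pipeline`, the provider names, and the always-true verification stub are hypothetical, and the cheap predicate stands in for a filter like "starts with the letter R".

```python
from typing import Callable, Iterable


def run_pipeline(candidates: Iterable[str],
                 cheap_filter: Callable[[str], bool],
                 expensive_verify: Callable[[str], bool]) -> tuple[list[str], int]:
    """Apply the cheap predicate first and the costly check only to survivors.

    Returns the matches and how many times the expensive check ran.
    """
    matches: list[str] = []
    expensive_calls = 0
    for name in candidates:
        if not cheap_filter(name):      # constant-time string test: discard early
            continue
        expensive_calls += 1            # stands in for a slow LLM/web verification
        if expensive_verify(name):
            matches.append(name)
    return matches, expensive_calls


# Hypothetical candidates; only two start with "R".
providers = ["Alpha Jobs", "Rivet Data", "Beta Postings", "Rook Labs", "Gamma Hire"]

matches, calls = run_pipeline(
    providers,
    cheap_filter=lambda n: n.startswith("R"),
    expensive_verify=lambda n: True,    # stub: pretend every survivor verifies
)
print(matches, calls)  # ['Rivet Data', 'Rook Labs'] 2
```

Verifying first would have run the expensive check five times instead of two; the gap grows with the share of candidates the cheap filter rejects.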
liam-hinzman 1 day ago
We were down for a bit! Ran your search, got 8 matches after analyzing 100 results. Took 40 seconds for the first match, and another 80 seconds for the other matches.
We use an agentic search planner that adapts its search strategy as matches are found, but it could be smarter with substrings.
I think you guys nailed the "selling shovels during a gold rush" angle, as the biggest issue with LLMs currently is their reliability/hallucinations, not their capabilities. If I can use Websets to back up LLM responses through your API, that's super useful.
Since you were part of YC S21, could you share a bit about the pivots/product iterations you went through over the last 4 years?
willbryk 1 day ago
Mission of Exa has always been to build much better web search. The evolution has been:
- 2022: Consumer-facing embeddings search (back when we were known as Metaphor)
- 2023: Web search for AIs - once the AI ecosystem heated up, we made a business out of web search + crawling API. This is still our primary business.
- Now: Websets, a useful product built on top of our search tech
If you're curious, our company right now is fully devoted to:
1. Dramatically improving Websets quality
2. Building the best general search engine in the world
gavinward 1 day ago
It must be heartening for a startup trying to build the best general search engine in the world to know that Google has absolutely no interest in competing with you.
willbryk 1 day ago
Because Google makes money from ads, they're not actually optimized to build the best general search engine in the world; they're optimized to build the search engine that makes the most from ads, which is correlated with being a good search engine but not perfectly aligned. Our business model (paying directly for the search) incentivizes us to try to return the highest-quality results, without any bias toward making money from ads. It also enables us to do things like pour a ton of compute/resources into a query to get the best possible results we can find, because someone would pay us a lot for that, and that's hard to do under an ads-based model.
cannonpalms 19 hours ago
Can you provide more information (or links) about that billing model you describe?
The incentive structure behind paying by the search has diminishing returns, as I see it. You need the results to be of a high enough quality to drive the user to want to run another search with you. Beyond that point, though, in the absence of a direct competitor, where is the incentive for you to continue improving search result quality?
xp84 1 day ago
This is super cool! It took a while, but it did a great job of evaluating the results, and the Airtable-like results UI feels great.
Congrats on your launch. Since this lends itself so naturally to comparison shopping, it's an amazing tool for people trying to find "the best X for me", whether that's a TV, a school, etc. So much of the content you find on Google when trying to answer that type of query is designed to trick, bamboozle, and hide the facts that you might use to answer the question (but most of all to get you to click affiliate links).
wdrw 3 hours ago
I was trying to submit some feedback using your "Feedback" button on the top right, but got an error when trying to submit it :(
Anyway, the model used doesn't seem to be very good; it did not understand a basic "OR" criterion. I asked for a list of companies with an office in Toronto that are involved in hardware development such as custom silicon, robotics, satellites or drones. It completely misunderstood the "or" part (and the "such as" part). E.g. I see many robotics companies marked as a "Miss" because they only do robotics and none of the other things on my list.
Overall, though, I love the idea, and I would pay for your service (on a pay-as-you-go, per-query basis) if the underlying model were smart enough for me to actually rely on the results.
byearthithatius 1 day ago
I was so excited for this, but sadly it doesn't work at all, and there's not even UI feedback for the error :(
The UI showed literally no change. So I checked and the console shows:
```
Try: 14 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 15 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 16 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 17 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 18 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 19 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 20 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Gave up after 10 seconds. 681-7df1b139fa2dc9f0.js:14:3379
filteredSuggestions
Array(3) [ {…}, {…}, {…} ]
681-7df1b139fa2dc9f0.js:14:3379
```
Also, your table doesn't fit in the viewport, so I can't see the results.
Firefox on Ubuntu.
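The console output above looks like a polling loop that retries on 404 and then gives up without telling the UI. A minimal sketch of a loop that instead returns an explicit failure state the page could render; `fetch` is a hypothetical callable returning `(status, body)`, the 10-second timeout mirrors the log, and nothing here is Exa's actual frontend code.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PollResult:
    ok: bool
    status: int
    body: Optional[str]
    message: str              # text the UI can show instead of failing silently


def poll_until_ready(fetch: Callable[[], tuple[int, str]],
                     timeout_s: float = 10.0,
                     interval_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> PollResult:
    """Poll `fetch` until it returns HTTP 200 or `timeout_s` elapses."""
    deadline = clock() + timeout_s
    status = 0
    while clock() < deadline:
        status, body = fetch()
        if status == 200:
            return PollResult(True, status, body, "ready")
        sleep(interval_s)           # treat 404 as "not ready yet": wait and retry
    # Out of time: return an explicit failure instead of only logging it.
    return PollResult(False, status, None,
                      f"Search did not finish within {timeout_s:.0f}s "
                      f"(last HTTP status {status}). Please try again.")
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting; the important part is the final return, which gives the caller something to display.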
pilingual 1 day ago
When OpenAI was rumored to acquire Windsurf last week, I went to their site and switched languages. When I tried to switch back, it got into a weird state and didn't display the original language. Not sure what to make of that other than that vibe coding may need a little more oversight. (Who is working on AI QA? Winning pickaxe and shovel business right there.)
tibbar 1 day ago
I also thought the UX had silently died on me, but over the course of a few hours, results slowly rolled in. And they were pretty good, for what it's worth! It's clear they have far more demand than supply, or at least more than can reasonably be offered for free.
liam-hinzman 1 day ago
We were down for a bit, back up now! Lmk how the search quality turns out for you
joshstrange 1 day ago
I think it might be a good idea to give some kind of indication that work is being done in the background (or perhaps mine stalled out?).
The initial search/experience is good, but then I got dumped here [0] and it's not clear to me whether things are still happening or whether it broke (it's been at least 5 min with no UI updates).
I can't see the full results yet, but this is very interesting and a task I periodically ask OpenAI's Deep Research to attempt. It makes a good show of doing the work, but the results are not great IMHO (when asking it to generate lists/tables of data like this). I can see this tool being incredibly useful for lead generation (which is how I am testing it out).
btw I like how you host screenshots on your personal website
mbeavitt 1 day ago
This is super cool. You provide examples of “searches that work” - can you give an idea of the limitations here? What kind of searches won’t work?
willbryk 1 day ago
We're a startup, so most of our resources go toward the use cases our users care about most. So the search should work best for people, companies, papers, and high-quality written content (e.g., blogs, news). It should work well for more than just those (try GitHub repo search, it's quite good :D), but those are the best supported.
Types of searches Websets doesn't currently do well at:
- products (e.g., ecommerce sites)
- content that requires authentication/permissions to access
- non-English content
Some of the above are on our roadmap, and let us know if there's some type of data you'd like us to support!
and it did nothing to the page at all, choosing to still show the "Full-stack engineers in SF that are great at design, and have worked at an AI startup" example table
I'm open to the fact that "I'm holding it wrong" or whatever, but the response payload included things that are clearly not GitHub Repositories
It seems it is about 30/70 on finding the things I asked for, so I don't mean to imply it's worthless, but it is yet another example of "turns out, AI does not solve all problems"
---
I make a habit of having the dev tools open when interacting with things where the comments have explicitly called out "we were down and we don't check our response.statusCode", and that's the only reason I am able to offer you any feedback whatsoever
liam-hinzman 20 hours ago
The API response you were looking at is the preview search, the full search linked below found 25 matches in a minute.
> github repos that are implementations of ReBAC authorization servers
I don't know what "preview search" means; I thought I had made clear that if I didn't have the dev tools open, I wouldn't have "previewed" anything. I also didn't understand that one needed to put the term "github repos" in the actual query
Anyway, two things which may interest you:
- please don't reimplement <table> in whatever whizbang JS framework-o-the-day; your results have the columns fixed at 180px, truncating all descriptions and URLs. Maybe it's an upsell for all I know
- your cURL in the Get Code is demonstrably wrong and I have no idea how it escaped a basic straight-face test; -d '{\"foo\":1}' literally sends brace backslash doublequote
And then, just like my first experience, not all of the matches are repos that meet the query criteria. My colleague at work has to tell Cursor "try harder", so maybe you can benefit from including that in your prompt, too
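The cURL quoting problem can be checked without any network call. This sketch uses Python's `shlex.split`, which follows POSIX shell quoting rules in its default mode, to show the argument a shell would actually hand to cURL after `-d`. The `{"foo":1}` payload is the commenter's illustration, not the real generated snippet, and `payload` is a hypothetical helper.

```python
import json
import shlex

# The pattern in the generated snippet: backslash-escaped quotes inside single quotes.
broken = r"""curl -d '{\"foo\":1}'"""
# Inside single quotes nothing is an escape, so plain double quotes suffice.
fixed = r"""curl -d '{"foo":1}'"""


def payload(cmd: str) -> str:
    """Return the argument a POSIX shell would pass right after -d."""
    args = shlex.split(cmd)           # POSIX mode is the default
    return args[args.index("-d") + 1]


print(payload(broken))             # {\"foo\":1}  (backslashes survive: not valid JSON)
print(payload(fixed))              # {"foo":1}
print(json.loads(payload(fixed)))  # {'foo': 1}
```

Within single quotes a POSIX shell treats backslashes literally, which is why the escaped form sends them on to the server.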
colkassad 1 day ago
Geospatial data would be great. This stuff is notoriously annoying to search for. For example:
"Give me a list of free imagery service endpoints I can use in a maplibre style sheet. Include information such as name, description, service endpoint, service type, extent (global/regional)."
willbryk 1 day ago
This might be possible if you specify geospatial location as an enriched column. Visualizing it as a map isn't supported in the UI, though; that can be built by giving an LLM access to the Websets API
vetleen 12 hours ago
You did get me to click the 'upgrade' button, but the pricing is too high for me.
I did one search with 4 criteria, then added the two free columns, and at that point I had spent 750 of my 1,000 free credits. The next tier is $49 for only 8,000 credits, which means only 10 searches a month.
The search I did was super useful, and I would love to use the product and recommend it to my coworkers. But the pricing is what stops me.
Best of luck. I'll probably use it once a month if I can remember :)
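The arithmetic behind "only 10 searches a month", using the figures quoted in the comment. The 750 credits per search is this one user's observation for a 4-criteria search, not a published rate.

```python
# Figures from the comment above (one user's observation, not a price sheet).
CREDITS_PER_SEARCH = 750
TIER_PRICE_USD = 49
TIER_CREDITS = 8000

searches_per_month = TIER_CREDITS // CREDITS_PER_SEARCH          # whole searches only
cost_per_search = TIER_PRICE_USD * CREDITS_PER_SEARCH / TIER_CREDITS

print(searches_per_month)          # 10
print(round(cost_per_search, 2))   # 4.59
```

At that rate each search costs roughly $4.59 on the $49 tier, with 500 credits left over that don't cover an eleventh search.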
drob518 8 hours ago
Some feedback for you.
1. I love the idea.
2. The UI needs to work on smaller screens (e.g., tablets). The current layout is VERY cramped.
3. Its ability to search for businesses in a given geography is poor. I asked it to search for businesses in a city and it was giving me results that were obviously incorrect from halfway across the country.
4. For a homepage URL for a business, it once gave me a parked domain name at GoDaddy's "domain for sale" page. That seemed like a blunder. Is that because it's pulling in WHOIS information and it connected some addresses?
5. Performance is quite poor. Perhaps that's because you're getting "Hackernews'd" with a surge of people consuming all your capacity.
campl3r 7 hours ago
The UI is perfect for my phone. Love the information density
drob518 7 hours ago
Interesting. I haven't tried on my phone. Perhaps my iPad resolution falls in the middle of a switch between two different layouts and it's trying to use the desktop version.
liam-hinzman 6 hours ago
I’ll add better support for tablet breakpoints in an hour, thanks for flagging!
theamk 1 day ago
I ran my favorite search query, and the results were pretty bad, as expected:
"robotics servo motors with two-directional control for under $100"
2. https://www.pololu.com/ - this is a huge store, but they do have some motors like that. Pass, but I wish it linked to the specific page and not the top-level one.
3. dh-robotics.com - no prices, but some of their products on the open market cost a few thousand dollars. Likely a fail as well.
5. https://www.lynxmotion.com/ - another huge store; most two-directional motors are expensive, but there are some under $100... Pass, but I wish it linked to the specific page and not the top-level one.
> So the search should work best for people, companies, papers, high quality written content.
> Types of searches Websets doesn't currently do well at: products, content that requires authentication/permissions to access, and non-English content
gertlex 1 day ago
Curious: what is an example of a robotics servo motor with one-directional control?
My experience around such started with pwm hobby servos, includes dynamixels, and I've worked with larger stuff using harmonic drive gearboxes. Can't recall encountering a "servo" that is one-directional.
theamk 1 day ago
A PWM-controlled hobby servo (a 1-2 ms pulse every 20 ms or so) is the one-directional control I had in mind. In the under-$100 range, a surprising number of servos use the same simple single-wire protocol, even large-ish 150 kg·cm / 100 W units.
Dynamixels are two-way, and they are exactly the kind of thing I wanted to see in search results.
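For reference, the hobby-servo timing described above maps position to pulse width linearly. A minimal sketch assuming the nominal 1-2 ms range and a 20 ms frame (real servos vary in both); the normalized `position` parameter is a hypothetical convention. The signal only flows from controller to servo, so there is no readback channel, unlike a bus servo such as a Dynamixel.

```python
PERIOD_MS = 20.0      # one frame roughly every 20 ms (about 50 Hz)
MIN_PULSE_MS = 1.0    # nominal pulse width at one end of travel
MAX_PULSE_MS = 2.0    # nominal pulse width at the other end


def pulse_width_ms(position: float) -> float:
    """Map a normalized position in [0, 1] to a pulse width in milliseconds."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be within [0, 1]")
    return MIN_PULSE_MS + position * (MAX_PULSE_MS - MIN_PULSE_MS)


def duty_cycle(position: float) -> float:
    """Fraction of each frame during which the signal line is held high."""
    return pulse_width_ms(position) / PERIOD_MS


for p in (0.0, 0.5, 1.0):
    print(p, pulse_width_ms(p), duty_cycle(p))
```

Across the full travel the duty cycle only spans 5% to 10%, which is why these servos are driven by pulse width rather than by a general-purpose duty-cycle setting.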
gertlex 19 hours ago
Ahh, you seem to be referring to two-directional (two-way) communication, and I took it to mean rotational direction. I was imagining servos analogous to devices that simply power a motor in on/off states and so can't reverse.
esafak 1 day ago
I suggest caching results and enabling sharing of them. I am not signed in, so I don't know if that is a feature I am missing.
I searched for "alternatives to jq with a functional API" and one of the criteria it came up with was "Provides technical details or comparisons relevant to the alternatives" but the table only listed the repo's url and description. And the description was truncated with ellipses with no way for me to resize the columns. Also, it missed the opportunity to tell me that some shells can replicate jq's functionality. Finally, it would have to be faster to be a daily driver. At this speed, it is something I would reserve for backup, for when the workhorse fails. Which means I would not want to pay $49/month.
Hope that helps. Interesting idea.
willbryk 1 day ago
Thanks for the feedback!
Yeah, we'd love to make the product as accessible and cheap as possible, but given the state of AI costs in 2025, it's a very expensive product to run, so we have it login-gated. If you're willing to log in, though, you'll find a lot of the features you're mentioning :)
seektable 4 hours ago
Websets are cool. I remember that two decades ago there was a project in Google Labs that tried to return Google search results as 'objects' x 'properties', but it never left their research sandbox (unfortunately, I can't remember the project's name).
Searches that give tabular results can be cheap if you already have structured datasets (extracted from crawled data), so an LLM can simply convert the user's natural-language query to a SQL (or SQL-like) query that can be executed cost-efficiently, say, with DuckDB. This approach can also give more correct results, as values in these structured datasets can be validated in the background rather than as an individual 'deep research' task.
I understand that this is another kind of search service, however, this can be a way to offer free/cheap searches for users who don't need expensive individual research tasks.
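A minimal sketch of that idea, with two substitutions stated up front: Python's built-in `sqlite3` stands in for DuckDB so the example needs no dependencies, and the LLM translation step is replaced by a hardcoded query of the kind an LLM might emit for a request like "hardware companies in Toronto doing custom silicon, robotics, satellites or drones". The table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical structured dataset, standing in for entities extracted from crawled pages.
rows = [
    ("Acme Robotics", "Toronto", "robotics"),
    ("Northwind Silicon", "Toronto", "custom silicon"),
    ("Blue Orbit", "Vancouver", "satellites"),
    ("Maple Drones", "Toronto", "drones"),
    ("Ledger Soft", "Toronto", "accounting software"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (name TEXT, city TEXT, sector TEXT)")
conn.executemany("INSERT INTO companies VALUES (?, ?, ?)", rows)

# The query an LLM might produce; the "or" becomes an explicit IN list.
sql = """
SELECT name FROM companies
WHERE city = ?
  AND sector IN ('custom silicon', 'robotics', 'satellites', 'drones')
ORDER BY name
"""
hardware_in_toronto = [row[0] for row in conn.execute(sql, ("Toronto",))]
print(hardware_in_toronto)  # ['Acme Robotics', 'Maple Drones', 'Northwind Silicon']
```

Once the data is structured, the per-query cost is a table scan rather than a round of LLM calls per candidate, which is the trade-off the comment describes.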
liam-hinzman 1 day ago
Without signing in you’re only able to view the preview table, which is just Exa’s regular search.
If you sign in each result will be graded by an LLM, supporting references will be found, you can get agents to add arbitrary data to each result, and the table UI is much better.
Understand if you don’t want to sign up, I’d just look at the examples linked in the OP in that case
dbuxton 1 day ago
Hey! Congrats on the launch. I just signed up for a trial account and I’m pretty impressed with the search API (haven’t used websets yet but looks cool).
Our experimental use case is enabling quick and dirty integration of web-based docs into an employee service agentic chatbot - lots of the questions are around “how do I max out my 401k”, which connects to internal information, but some are more like “how do I link a calendar to calendly”.
The one thing I’d love to have in the search product is a cruft cleaner for the results of web queries. Where you have cached the data, presumably this wouldn’t add much overhead. It would reduce what you have to feed to the LLM downstream and might improve embedding performance.
willbryk 1 day ago
By cruft cleaner, do you mean cleaning the HTML well? Right now, we do 2 things to help with that: a pretty robust parsing stack, as well as a "summaries" feature that returns an LLM-generated, query-biased text output for every webpage returned.
If it's something else, though, I'm curious.
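One naive reading of a "cruft cleaner", sketched with Python's standard `html.parser`: drop any text inside tags that usually hold boilerplate and keep the rest. The tag list is an assumption, `CruftStripper` and `clean_html` are hypothetical names, and this is not a description of Exa's parsing stack.

```python
from html.parser import HTMLParser

# Tags assumed to contain boilerplate rather than page content.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class CruftStripper(HTMLParser):
    """Collect visible text, skipping anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0                  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_html(html: str) -> str:
    parser = CruftStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p { color: red }</style></head><body>"
    "<nav>Home | Pricing | Login</nav>"
    "<h1>Link a calendar</h1><p>Open settings.</p>"
    "<script>track('view')</script>"
    "<footer>Cookie policy</footer></body></html>"
)
print(clean_html(page))  # Link a calendar Open settings.
```

A tag heuristic like this is cheap but blunt; real pages put navigation in `div`s too, which is where a model-based summary step earns its cost.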
frankramos 1 day ago
The Exa LinkedIn webset is something very innovative. Many current providers make it difficult if not against "Terms of Service" to build a product using their data. The irony is that they simply scraped LinkedIn.
willbryk 1 day ago
Thanks for the support - we're getting hug of death though so please bear with us while we scale up!
srameshc 1 day ago
So the crawlers are feeding a database, something is classifying the data stream and organizing the data, and everything is open as a very large dataset. This is an interesting concept.
cobertos 1 day ago
What you describe is the same concept as what https://hash.ai purports to be
willbryk 1 day ago
Yup exactly! And we expose this as a regular search API as well as in the Websets product.
ixxie 9 hours ago
Seems awesome, but let me know when your entry level plan is under $10. I'd love to be able to prepay for credits rather than have a subscription!
jackienotchan 1 day ago
AI crawlers have led to a big surge in scraping/crawling activity on the web, and many don't use proper user agents or stick to the scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN (and as I've experienced myself).
Do you have any built-in features that address these issues?
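For context, the two practices named above (honoring robots.txt and rate-limiting per site) can be sketched with Python's standard `urllib.robotparser`. This is a minimal illustration, not a description of Exa's crawler: the robots.txt text, the `ExampleBot/1.0` user agent, and the `PoliteFetcher` class are hypothetical, and a real crawler would download robots.txt per host rather than parse a constant.

```python
import time
from typing import Callable, Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    policy = RobotFileParser()
    policy.parse(robots_txt.splitlines())
    return policy


class PoliteFetcher:
    """Gate requests on robots.txt rules and space them by Crawl-delay."""

    def __init__(self, policy: RobotFileParser, user_agent: str,
                 default_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.policy = policy
        self.user_agent = user_agent
        # Honor the site's Crawl-delay if it sets one, else fall back to our own.
        self.delay = policy.crawl_delay(user_agent) or default_delay
        self.clock = clock
        self.sleep = sleep
        self.last_request: Optional[float] = None

    def allowed(self, url: str) -> bool:
        return self.policy.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        """Block until `delay` seconds have passed since the previous request."""
        now = self.clock()
        if self.last_request is not None:
            elapsed = now - self.last_request
            if elapsed < self.delay:
                self.sleep(self.delay - elapsed)
        self.last_request = self.clock()


fetcher = PoliteFetcher(make_policy(ROBOTS_TXT), "ExampleBot/1.0")
print(fetcher.allowed("https://example.com/docs/page"))   # True
print(fetcher.allowed("https://example.com/private/x"))   # False
```

A crawler would call `allowed(url)` before each request and `wait_turn()` immediately before sending it; a descriptive, stable user agent is what lets site owners target it in robots.txt in the first place.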
antoniojtorres 1 day ago
I work in the adtech ad verification space, and this is very true. The surge in content scraping has made things very, very hard in some instances. I can’t really fault the website owners either.
whoisjuan 1 day ago
Did you guys change the pricing of Exa?
When I checked this a year or so ago, I might have gotten the impression that it was cheaper. Now, it costs the same as what Perplexity charges for search-grounded queries, which is the same as what Google charges for Gemini queries with search.
So basically, one player sets a price, and everyone is anchored on that as the pricing for the entire category? I'm just genuinely interested in why every offering in this space is priced like this.
It seems a bit misaligned with how pure LLM queries are priced.
I have a product that would benefit from search grounding, but this pricing wouldn't work with my volume of queries.
liam-hinzman 1 day ago
We charge $5 per 1000 requests with our search and answer endpoints.
Perplexity charges the same on their lowest tier model, and three times as much for their more expensive models.
What product of yours would benefit, if you don't mind my asking?
upcoming-sesame 1 day ago
This is a nice alternative for my Gemini Deep Research use case.
Most of the time I want to find some vendors/companies, and Deep Research does that but also responds with a wall of unnecessary text when I just want the table
mfrye0 1 day ago
Congrats on the launch!
How do you dedupe entities, like companies and people? I've noticed ChatGPT tends to provide "great" results when asking about different entities, but in reality it just groups similar sounding entities together in its answer.
For example, I asked ChatGPT about a well-known startup. It gave me a confident answer about how much they raised, their current status, etc. When I looked at the 3 sources it cited, though, they were actually 3 different companies with similar-sounding names that it had just grouped together to form its answer.
Basically, how do I trust the output of your system?
liam-hinzman 1 day ago
We find supporting references when evaluating the search criteria / enrichments of each result, and you can view these citations
I understand Clay, which your Websets product is clearly inspired by, does a fair amount of matching based on domain name or LinkedIn url.
If Websets is doing fuzzy or naive matching, that's okay. I'm just trying to understand the limitations and potential use cases of your current system.
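A minimal sketch of the deterministic matching the commenter describes: key each entity on a normalized domain or a LinkedIn company slug, so records merge only when they share that key. `entity_key`, `dedupe`, and the records are hypothetical; this is not Clay's or Exa's implementation. It keeps similar names with different domains apart, which is the failure mode described a few comments up.

```python
from urllib.parse import urlsplit


def entity_key(url: str) -> str:
    """Normalize a company URL into a matching key (a hypothetical heuristic).

    LinkedIn company pages key on their /company/<slug> path; any other site
    keys on its host name with a leading "www." removed.
    """
    parts = urlsplit(url if "//" in url else "//" + url)   # accept bare domains
    host = (parts.hostname or "").lower()                  # drops port, lowercases
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "linkedin.com":
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) >= 2 and segments[0] == "company":
            return f"linkedin:{segments[1].lower()}"
    return f"domain:{host}"


def dedupe(records: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (name, url) records by key; matching names alone never merge."""
    groups: dict[str, list[str]] = {}
    for name, url in records:
        groups.setdefault(entity_key(url), []).append(name)
    return groups


records = [
    ("Acme", "https://acme.io"),
    ("Acme Inc.", "http://www.acme.io/about"),
    ("Acme Labs", "https://acmelabs.com"),                  # similar name, different company
    ("Acme", "https://www.linkedin.com/company/acme-io/"),  # stays separate without more evidence
]
for key, names in dedupe(records).items():
    print(key, names)
```

Linking the LinkedIn page to the domain would need extra evidence (for example, the website field on the profile), which is where an LLM or a lookup step would come in.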
liam-hinzman 1 day ago
Deduplication is mainly driven by LLMs with search results as context. Our entity resolution works well because Exa’s main business is crawling and indexing the web at scale, and we can control how we search across that within Websets.
As far as I know ChatGPT’s search is primarily a wrapper around another company’s search engine, which is why it often feels like it’s just summarizing a page of search results and sometimes hallucinates badly.
mfrye0 1 day ago
Thanks for the info. That makes sense.
Looking forward to trying out the product more when I have a moment.
waterproof 19 hours ago
I love the enrichments feature. Have you considered making it available separately from the initial web search?
I often have projects where the enrichments feature alone would be super useful: I would provide, say, a list of company names, and then use enrichments to qualify them based on location, age, founder experience etc etc.
willbryk 18 hours ago
stay tuned!
foobahhhhh 1 day ago
Very nice! Like a Databricks for Google, or perhaps think of it as Google backend-as-a-service (at least their AI-like backend, not the main search).
It disrupts anyone who merely does one of the things this does. E.g., what a contact-building app does can be done with this. I imagine many "wrapper" apps can be built on this.
I am serious though. It felt like using databricks a little bit, obviously without all the functionality but that will come.
I'm bullish! Modulo competition. Someone who does this makes their billion.
AznHisoka 24 hours ago
Disagree. Sure, there are a few general companies that need “good enough” results, but what's more likely is that they need extremely high-quality results for their specific need/niche.
bhl 23 hours ago
> The second is that LLMs provide the last-mile intelligence needed to verify every result. Each result and piece of data is backed with supporting references that we used to validate that the result is actually a match for your search criteria.
Evals on this would be great to benchmark the gap between using Websets and a generic web search tool. Otherwise, to a developer, it's just marketing.
The search engine was impressive enough but I think this implementation was a nice cherry on top.
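A minimal sketch of the kind of eval being asked for: score each system's result list against a hand-labeled set of true matches for the same query. The labels and result lists below are hypothetical placeholders, not measurements of Websets or any other tool.

```python
def precision_recall(returned: list[str], relevant: set[str]) -> tuple[float, float]:
    """Precision and recall of one result list against hand-labeled ground truth.

    Empty inputs score 0.0 by convention (precision is otherwise undefined).
    """
    unique = set(returned)              # ignore duplicate rows
    if not unique or not relevant:
        return 0.0, 0.0
    hits = len(unique & relevant)
    return hits / len(unique), hits / len(relevant)


# Hypothetical labels for one query and two hypothetical systems' outputs.
relevant = {"a", "b", "c", "d"}
system_x = ["a", "b", "c", "x"]
system_y = ["a", "x", "y", "z"]

print(precision_recall(system_x, relevant))  # (0.75, 0.75)
print(precision_recall(system_y, relevant))  # (0.25, 0.25)
```

A single query says little; averaging these scores over a set of labeled queries is what would turn the claim into a benchmark.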
ByteAtATime 1 day ago
This is really cool! Just a small nitpick: on a low-powered device, the hero globe is really laggy (it's fine if I scroll past it, though).
herpdyderp 1 day ago
Even when it's working, I initially thought it was trying to convey some meaning, but it's just a bunch of logos not really doing anything.
koakuma-chan 1 day ago
And it doesn't work at all if you have WebGL disabled, just shows "Application error: a client-side exception has occurred (see the browser console for more information)."
Upvoted for your rational stance and thanks for pointing that out. I could have expressed myself in a better way anyways.
arglebarnacle 20 hours ago
A dead project that has been replaced with a fork with a different name? Maybe I’m missing something but this really doesn’t seem so bad as far as name collisions go
wormius 20 hours ago
Yeah I realized that as I went to the site. Egg on face, I'm a big disgrace... (etc etc). Also boo to me for going against the "site vibes" here. I will admit I could have been a bit more chill about the issue.
forthwall 1 day ago
Really novel idea, but I think there's a bug in the first example: when I land on the Websets page, it searches "Engineers with startup experience based in california", but what's returned is a bunch of tennis websites
liam-hinzman 24 hours ago
That was an unmaintained landing page with old iframes, thanks for catching that! Removed all links to it, and set it to redirect to the correct page at websets.exa.ai
henryway 1 day ago
Perhaps the word “set” is difficult to disambiguate between web sets and tennis sets.
Gamester 23 hours ago
Congratulations on the launch
Very helpful for candidate searching, but still a bit slow for everyday use,
like “what are the events happening today in my city”
But I believe you guys will crack it soon and make it better
tibbar 1 day ago
Wow, this is such an exciting product to me, a great application of modern tools. I'm using it to search for people who have very specific backgrounds that I would be interested to talk to. Thank you for building this.
thm 1 day ago
Now that you've got some money in the bank, you should get a license for the serif on your website (font-family: RecklessTrial-Regular;).
saadatq 1 day ago
This looks really great.
It's also how “internal” business intelligence/operations tools should work: search first to find relevant artifacts (“top 10 customers in AMEA”), followed by agentic verification and enrichment.
Congrats on the launch!
willbryk 1 day ago
Thanks! Let us know how you find it :)
mh- 1 day ago
Congrats on the launch!
Can it perform searches that rely on the rendered (JS-executed) state of the website? If so, does it have access to the DOM?
Example use case: "The 10 most trafficked e-commerce sites that load Adobe Analytics tag(s)."
willbryk 1 day ago
We render JS and then parse pages, but that process will definitely parse out Adobe Analytics tags unfortunately.
Noting this though!
BiraIgnacio 24 hours ago
Love the idea, keep up the work, and I think this can really be something between a "standard web search engine" and WolframAlpha
tcbtcb 1 day ago
This is so cool! What are the top use cases you’re seeing rn? The semantic heavy search is something most sourcing platforms fail consistently on, especially around people search
twostorytower 1 day ago
Congrats on the launch! Given you were in YC S21, when AI was much more under the radar, did you recently pivot? I'm guessing it wasn't a 4 year road to launch.
willbryk 1 day ago
Not a pivot - Websets is just a new product!
Mission of Exa has always been to build much better web search. The evolution has been:
- 2022: Consumer-facing embeddings search (back when we were known as Metaphor)
- 2023: Web search for AIs - once the AI ecosystem heated up, we made a business out of web search + crawling API. This is still our primary business.
- Now: Websets, a useful product built on top of our search tech
If you're curious, our company right now is fully devoted to:
1. Dramatically improving Websets quality
2. Building the best general search engine in the world
orliesaurus 1 day ago
Nice! This feels like Clay(.com) interface (sales people love it) but for every piece of data that needs adjacent information.
Mockapapella 1 day ago
Honestly, I thought you guys had launched already (and didn't know you were part of YC); I've been aware of you for years now, it seems. Congrats on the launch! Hope the Twitter issues aren't causing you too many problems.
Normally I'd send this as a DM or email, but I think it could be useful for others to learn about how to use your service/the limitations of it. A couple weeks ago I made a search for:
In early 2023, Andrej Karpathy said something like "large training runs are a good test of the overall health of the network." Something something resilience as well I think. I need you to find it.
Unfortunately it wasn't able to find it, but it was either in a tweet or a really long presentation, neither of which are good targets for search. It was around the same time that this (https://www.youtube.com/watch?v=c3b-JASoPi0) video was posted, like within a couple weeks before or after. How could I have improved my query? Does exa work over videos?
liam-hinzman 1 day ago
I think I found it! Unfortunately we do not include tweets in our search index
> TLDR LLM training runs are significant stress-tests of an overall fault tolerance of a large computing system acting as a biological entity.
Holy shit I think that might be it! I have been looking for that tweet for like a year now. Thanks!
jppope 1 day ago
I really love the concept here. Lots of utility. Going to play around with it tonight and see if it works for some use cases.
ing33k 17 hours ago
Quick question: how does it compare to what Diffbot offers?
smolder 18 hours ago
IMO, we should stop abusing personal data for profit. What does this bring to the table that doesn't advance the surveillance state? Does it help individuals without hurting them?
oofbaroomf 1 day ago
How big do you think your index is compared to Google?
philipkglass 1 day ago
A smaller index could actually be a benefit if it's missing all the "mailing list archives rehosted with more ads" sites that pollute my Google search results in recent years.
adi_lancey 9 hours ago
looks great, nice work
justanotheratom 1 day ago
can websets enrich a column with images?
willbryk 1 day ago
There aren't currently any vision LLMs involved. But if you asked for image links, it'd probably find you something!
xena 1 day ago
Do you respect robots.txt? How can I block your crawlers?
skylerwiernik 1 day ago
Do you find this to be worse than googlebot somehow?
alecdewitz 1 day ago
Congrats guys!
willbryk 1 day ago
ty!
mschrage 24 hours ago
Congrats on the launch!
moralestapia 1 day ago
I wish you all the best, exa is pretty much Perplexity done right.
So nice!
benatkin 23 hours ago
This sounds promising.
I tried "Full-stack web frameworks started 2023 or later" and the first result was FastHTML which is a very good answer. I was hoping for Dioxus but that I think is actually a little bit older. Of course Google's results, including Gemini, were useless. MeteorJS was not started in 2023 or later. LOL.
mkrishnan 1 day ago
Congratulations, great idea!
Some issues I noticed: I searched "lucid air touring models available for sale Under 20,000 miles" and tried to add a "sale price" column, but didn't get the price details; same for other cars as well
willbryk 1 day ago
Hm! I'll try this out. Sometimes info like prices is hard to parse out because the data may be on ecommerce-style websites that have many crawling protections
artembugara 1 day ago
Will, Jeff, I am a BIG Exa fan. Congrats on finally doing your HN Launch.
I think NewsCatcher (my YC startup) and Exa aren’t direct competitors, but we definitely share the same insight: SERP is not the right way to let LLMs interact with the web, because it’s literally optimized for humans who can open 10 pages at most.
What we found is that LLMs can sift through 10k+ web pages if you pre-extract all the signals from them.
But we took a bit of a different angle. Even though we have over 1.5 billion news stories alone in our index, we don’t have a way to sift through them the way your Websets do (saw your impressive GPU cluster :))
So instead we build bespoke pipelines for our customers (who are mostly large enterprise/F1000), fine-tuning LLMs for specific information extraction with very high accuracy.
Our insight: for many enterprises the solution should be either a perfect fit or nothing. And that’s where they’re ok to pay 10-100x for the last mile effort.
P.S. Will, loved your comment on a podcast where you said Exa can be used to find a dating partner.
willbryk 1 days ago [-]
Thanks Artem! That makes sense to specialize for the biggest customers. Yes, a lot of problems in the world would be improved by better search, including dating.
I hate name collisions and this sort of thing only reinforces my ire. It doesn't help that I'm already team anti-AI, but it would annoy me regardless of the tech. Why don't people even bother to look and be original? (I feel like I'm going "against the ideals of the site" when I get angry like this, but come on, people, it's a simple google search. If you can't be arsed to do that, why should we even give you money - would be my FIRST question as an investor, but I'm just an idiot not a world famous inventor of a non-released LISP and checks list - uh. Yahoo Storefront.
Still though, come on man, why people why. I remember when we had "domainsquatting" but I guess AI doesn't give a fuck about people's copyrights/trademarks anyway.
(Sorry to vent as a reply, but it was nice to see SOMEONE mention it at least, and had to give a hard agree on pointing it out).
wormius 20 hours ago [-]
(ugh, while my point stands I guess technically it's a dead project, so I got egg on my face, gloat everyone gloat at the pathetic clown :P)
rushingcreek 1 days ago [-]
Congrats on the launch!
willbryk 1 days ago [-]
Thanks!
koakuma-chan 1 days ago [-]
> We’d love to hear your feedback!
I gave it a try and my first search got one match, 14 misses, and all other results are "Verifying..." but it seems stuck (it's been minutes). I can see why you cut your demo (please don't try to hide that it's so slow, especially since you seem to imply that you're a Google competitor ("Google has gotten worse over time"), while your product is incomparably slower than Google; it's more like deep research).
85392_school 1 days ago [-]
> while your product is incomparably slower than Google
Exa was originally just a search engine. They try to hide it these days to promote Websets, but you can still use it at https://exa.ai/search.
liam-hinzman 1 days ago [-]
We hid it because the UI wasn’t maintained.
I joined recently, and live-streamed myself designing and shipping the new search frontend in 5h32m
Added a link to it on our homepage, thanks for pointing that out!
85392_school 1 days ago [-]
Oh, good to hear. I've been waiting for its return.
dang 1 days ago [-]
Edit: the parent has edited their comment several times, which is fine since I invited them to, but the edits obscure the original comment, which was "I gave it a try and my first search got one match, 14 misses, and all other results are "Verifying..." but it seems stuck (it's been minutes). I can see why you cut your demo.".
Everyone is familiar with how often software launches run into glitches, and there's no need to be uncharitable.
(If you didn't mean it as a swipe and I just misread you, feel free to edit your comment and I'll delete this when I'm back online.)
koakuma-chan 1 days ago [-]
What is a swipe?
dang 1 days ago [-]
A bit of gratuitous nastiness.
Edit: adding "no offense" doesn't change this.
koakuma-chan 1 days ago [-]
Is it better now?
dang 1 days ago [-]
It's marginally better because it explains what you mean, and that at least eliminates other nasty interpretations.
However, I don't think it's fair for you to assume they're "trying to hide that it's so slow". There's no need to impute bad motives to people, and you don't have nearly enough information to justify such a claim.
What's wrong with simply reporting the problem that you're experiencing with the software? That would make your comment helpful, with no trace of a putdown.
[0] https://lemmy.ml/post/13850772
https://websets.exa.ai/cmad36arq009fl30i4dvkc7wn
Since you were part of YC 21, could you share a bit about the pivots/product iterations you went through over the last 4 years?
- 2022: Consumer-facing embeddings search (back when we were known as Metaphor)
- 2023: Web search for AIs - once the AI ecosystem heated up, we made a business out of web search + crawling API. This is still our primary business.
- Now: Websets, a useful product built on top of our search tech
If you're curious, our company right now is fully devoted to:
1. Dramatically improving Websets quality
2. Building the best general search engine in the world
The incentive structure behind paying by the search has diminishing returns, as I see it. You need the results to be of a high enough quality to drive the user to want to run another search with you. Beyond that point, though, in the absence of a direct competitor, where is the incentive for you to continue improving search result quality?
Congrats on your launch. With the natural way this lends itself to comparison shopping, this is an amazing tool for people trying to find "the best X for me", whether that's a TV, a school, etc. So much content that you find on Google when trying to answer that type of query is designed to trick, bamboozle, and hide the facts that you might use to answer this question (but most of all to get you to click affiliate links).
Anyway, the model used doesn't seem to be very good; it did not understand a basic "OR" criterion. I asked for a list of companies with an office in Toronto that are involved in hardware development such as custom silicon, robotics, satellites or drones. It completely misunderstood the "or" part (and the "such as" part). E.g. I see many robotics companies marked as a "Miss" because they only do robotics but not any of the other things on my list.
Overall though I love the idea, I would pay for your service (on a pay-as-you-go per-query basis) if the underlying model was smart enough for me to actually rely on the results.
The UI showed literally no change. So I checked and the console shows:
```
Try: 14 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 15 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 16 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 17 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 18 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 19 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Try: 20 Not Found 681-7df1b139fa2dc9f0.js:14:3379
Gave up after 10 seconds. 681-7df1b139fa2dc9f0.js:14:3379
filteredSuggestions Array(3) [ {…}, {…}, {…} ] 681-7df1b139fa2dc9f0.js:14:3379
```
Also your table doesn't fit in the viewport so I can't see the results.
Firefox Ubuntu.
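The console log suggests a client that retries on "Not Found", gives up after 10 seconds, and never tells the user. A minimal sketch of the alternative, assuming (as the log implies but does not confirm) that 404 means "not ready yet": retry only on that status, treat anything else as a hard failure, and raise on timeout so the UI layer can show an error. `Response`, `poll`, and `PollTimeout` are hypothetical names, not the site's actual code.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Response:
    status_code: int
    body: Optional[dict] = None

class PollTimeout(Exception):
    """Raised so the UI can show an error instead of silently giving up."""

def poll(fetch: Callable[[], Response], timeout_s: float = 10.0,
         interval_s: float = 0.5) -> dict:
    """Retry while the server answers 404 (assumed to mean 'not ready yet')."""
    deadline = time.monotonic() + timeout_s
    attempt = 0
    while True:
        attempt += 1
        resp = fetch()
        if resp.status_code == 200:
            return resp.body or {}
        if resp.status_code != 404:
            # Any other status is a real failure, not "still working".
            raise RuntimeError(f"attempt {attempt}: HTTP {resp.status_code}")
        if time.monotonic() >= deadline:
            raise PollTimeout(f"gave up after {attempt} attempts")
        time.sleep(interval_s)
```

Catching `PollTimeout` in the view is where a visible "this search stalled" message would go, rather than a console line nobody sees.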
The initial search/experience is good but then I got dumped here [0] and it's not clear to me if things are still happening or if it broke (it's been at least 5 min with no UI updates).
I can't see the full results yet, but this is very interesting and a task I ask OpenAI's Deep Research to attempt periodically. It makes a good show of doing the work, but the results are not great IMHO (when asking it to generate lists/tables of data like this). I can see this tool being incredibly useful for lead generation (which is how I am testing it out).
[0] https://cs.joshstrange.com/dySqK1mb
“List of food festivals on the east coast specializing in small dishes or encourages sampling from multiple vendors. features more than 20 vendors”
https://websets.exa.ai/cmad3sonh001zhx0i1h7t692f
btw I like how you host screenshots on your personal website
Types of searches Websets doesn't currently do well at:
- products (e.g., ecommerce sites)
- content that requires authentication/permissions to access
- non-English content
Some of the above are on our roadmap, and let us know if there's some type of data you'd like us to support!
Since you called it out, I gave it a whirl:
https://websets.exa.ai/api/trpc/getPreview?batch=1&input=%7B...
and it did nothing to the page at all, choosing to still show the "Full-stack engineers in SF that are great at design, and have worked at an AI startup" example table
I'm open to the possibility that "I'm holding it wrong" or whatever, but the response payload included things that are clearly not GitHub repositories
and its .text contains no mention of ReBAC
Later on it came closer
but, of course, no ReBAC in its .text either
It seems it is about 30/70 on finding the things I asked for, so I don't mean to imply it's worthless, but it is yet another example of "turns out, AI does not solve all problems"
---
I make a habit out of having the dev-tools open when interacting with things where the comments have explicitly called out "we were down and we don't check our response.statusCode" and that's the only reason I am able to offer you any feedback whatsoever
> github repos that are implementations of ReBAC authorization servers
https://websets.exa.ai/cmadcu6st004fmg0iofbytsfh
Anyway, two things which may interest you:
- please don't reimplement <table> in whatever whizbang JS framework-o-the-day; your results have the columns fixed at 180px, truncating all descriptions and URLs. Maybe it's an upsell for all I know
- your cURL in the Get Code is demonstrably wrong and I have no idea how it escaped a basic straight-face test; -d '{\"foo\":1}' literally sends brace backslash doublequote
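The cURL complaint is about POSIX shell quoting: inside single quotes a backslash is not an escape character, so `-d '{\"foo\":1}'` passes the backslashes through verbatim and the request body is no longer valid JSON. A small check makes this concrete, using Python's `shlex.split` (which tokenizes like a POSIX shell) and `json.loads`. The `payload_from_curl` helper and the example.com URL are illustrative only, not Exa's generated snippet.

```python
import json
import shlex

def payload_from_curl(command: str) -> str:
    """Return the argument after -d as a POSIX shell would pass it to curl."""
    argv = shlex.split(command)  # POSIX-mode tokenization, like sh/bash
    return argv[argv.index("-d") + 1]

def is_valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Backslashes inside single quotes are passed through literally.
broken = r"""curl -d '{\"foo\":1}' https://example.com"""
# Fix 1: no backslashes inside single quotes.
fixed_single = r"""curl -d '{"foo":1}' https://example.com"""
# Fix 2: keep the backslashes but use double quotes, where \" is an escape.
fixed_double = r"""curl -d "{\"foo\":1}" https://example.com"""

for cmd in (broken, fixed_single, fixed_double):
    body = payload_from_curl(cmd)
    print(body, is_valid_json(body))
```

Only the first command sends `{\"foo\":1}`; the other two both send `{"foo":1}`.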
And then, just like my first experience, the matches do not all return repos matching the query criteria. My colleague at work has to tell Cursor "try harder" so maybe you can benefit from including that in your prompt, too
"Give me a list of free imagery service endpoints I can use in a maplibre style sheet. Include information such as name, description, service endpoint, service type, extent (global/regional)."
I did one search with 4 criteria, then added the two free columns, and at this point I had spent 750 of my 1000 free credits. The next tier is $49 with only 8000 credits, which means only 10 searches a month.
The search I did was super useful, and I would love to use the product and recommend it to my coworkers. But the pricing is what stops me.
Best of luck. I'll probably use it once a month if I can remember :)
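The "only 10 searches a month" figure follows from the numbers in the comment above. The arithmetic, assuming every search costs the same 750 credits as the commenter's (4 criteria plus 2 enrichment columns), works out to roughly $4.59 per search on the $49 tier:

```python
FREE_CREDITS = 1000
TIER_PRICE_USD = 49
TIER_CREDITS = 8000
# Assumption: every search costs what the commenter's did
# (4 criteria + 2 enrichment columns = 750 credits).
CREDITS_PER_SEARCH = 750

free_searches = FREE_CREDITS // CREDITS_PER_SEARCH   # whole searches on the free tier
paid_searches = TIER_CREDITS // CREDITS_PER_SEARCH   # whole searches per paid month
usd_per_search = TIER_PRICE_USD * CREDITS_PER_SEARCH / TIER_CREDITS

print(free_searches, paid_searches, round(usd_per_search, 2))  # 1 10 4.59
```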
1. I love the idea.
2. The UI needs to work on smaller screens (e.g., tablets). The current layout is VERY cramped.
3. Its ability to search for businesses in a given geography is poor. I asked it to search for businesses in a city and it was giving me results that were obviously incorrect from halfway across the country.
4. For a homepage URL for a business, it once gave me a parked domain name at GoDaddy's "domain for sale" page. That seemed like a blunder. Is that because it's pulling in WHOIS information and it connected some addresses?
5. Performance is quite poor. Perhaps that's because you're getting "Hackernews'd" with a surge of people consuming all your capacity.
"robotics servo motors with two-directional control for under $100"
1. https://mjbots.com/ - their motors are $1369. FAIL.
2. https://www.pololu.com/ - this is a huge store, but they do have some motors like that. Pass, but I wish it linked to a specific page and not the top-level one.
3. dh-robotics.com - no prices, but some of their products on the open market cost a few thousand dollars. Likely fail as well.
4. https://www.robotarticulation.com/ - The product is not for sale (early beta), and it likely costs much more than $1K. FAIL.
5. https://www.lynxmotion.com/ - another huge store; most two-directional motors are expensive, but there are some under $100. Pass, but I wish it linked to a specific page and not the top-level one.
> So the search should work best for people, companies, papers, high quality written content.
> Types of searches Websets doesn't currently do well at: products, content that requires authentication/permissions to access, and non-English content
My experience with these started with PWM hobby servos and includes Dynamixels, and I've worked with larger stuff using harmonic drive gearboxes. I can't recall encountering a "servo" that is one-directional.
Dynamixels are two-way, and they are exactly the thing I'd wanted to see in search results.
I searched for "alternatives to jq with a functional API" and one of the criteria it came up with was "Provides technical details or comparisons relevant to the alternatives" but the table only listed the repo's url and description. And the description was truncated with ellipses with no way for me to resize the columns. Also, it missed the opportunity to tell me that some shells can replicate jq's functionality. Finally, it would have to be faster to be a daily driver. At this speed, it is something I would reserve for backup, for when the workhorse fails. Which means I would not want to pay $49/month.
Hope that helps. Interesting idea.
Yeah, we'd love to make the product as accessible and cheap as possible, but given the state of AI costs in 2025, it's a very expensive product to run, so we have it login-gated. If you're willing to log in, though, you'll find a lot of the features that you're mentioning :)
Searches that give tabular results can be cheap if you already have structured datasets (extracted from crawled data): an LLM can simply convert the user's natural-language query into a SQL (or SQL-like) query that can be executed cost-efficiently, say, with DuckDB. This approach can also give more correct results, as values in these structured datasets can be validated in the background rather than in an individual 'deep research' task.
I understand that this is a different kind of search service; however, it could be a way to offer free/cheap searches for users who don't need expensive individual research tasks.
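A minimal sketch of the "LLM writes SQL over a pre-extracted table" idea. Two assumptions to flag: the stdlib `sqlite3` stands in for DuckDB (the SQL would look much the same), and `nl_to_sql` is a hypothetical stub that returns a fixed translation where a real system would prompt a model. The rows are made-up placeholders, not real companies. One side benefit is that the `IN (...)` clause makes "any of these sectors" semantics explicit, which is exactly the kind of OR criterion an LLM-per-row grader can misread.

```python
import sqlite3

# Hypothetical rows standing in for a table pre-extracted from crawled pages.
ROWS = [
    ("Northwind Robotics", "Toronto", "robotics"),
    ("Maple Silicon", "Toronto", "custom silicon"),
    ("Lakeshore Drones", "Montreal", "drones"),
    ("Harbour Apps", "Toronto", "software"),
]

def nl_to_sql(question: str) -> tuple[str, tuple]:
    """Hypothetical stand-in for the LLM step; a real system would prompt a model."""
    sql = (
        "SELECT name FROM companies "
        "WHERE city = ? "
        "AND sector IN ('custom silicon', 'robotics', 'satellites', 'drones') "
        "ORDER BY name"
    )
    return sql, ("Toronto",)

def answer(question: str) -> list[str]:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE companies (name TEXT, city TEXT, sector TEXT)")
    con.executemany("INSERT INTO companies VALUES (?, ?, ?)", ROWS)
    sql, params = nl_to_sql(question)
    rows = con.execute(sql, params).fetchall()  # cheap, deterministic execution
    con.close()
    return [name for (name,) in rows]

print(answer("hardware companies with an office in Toronto"))
# ['Maple Silicon', 'Northwind Robotics']
```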
If you sign in each result will be graded by an LLM, supporting references will be found, you can get agents to add arbitrary data to each result, and the table UI is much better.
I understand if you don’t want to sign up; in that case I’d just look at the examples linked in the OP.
Our experimental use case is enabling quick and dirty integration of web-based docs into an employee service agentic chatbot - lots of the questions are around “how do I max out my 401k”, which connects to internal information, but some are more like “how do I link a calendar to calendly”.
The one thing I’d love to have in the search product is a cruft cleaner for the results of web queries. Where you have cached the data, this presumably wouldn’t add much overhead. It would reduce what you have to feed to the LLM downstream and might improve the embeddings' performance.
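A crude version of the requested cruft cleaner can be built on the standard library's `html.parser`: drop everything inside navigation, header, footer, script, and style containers and keep the remaining text. The `CRUFT_TAGS` set is an assumption to tune per site, and real pages usually need more than tag-based stripping, so treat this as a sketch rather than anything Exa ships.

```python
from html.parser import HTMLParser

# Assumed set of "cruft" containers; tune per site.
CRUFT_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}

class CruftStripper(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0       # > 0 while inside a cruft element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in CRUFT_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in CRUFT_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we're outside every cruft container.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def clean_page(html: str) -> str:
    parser = CruftStripper()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = (
    "<html><head><style>p{}</style></head><body>"
    "<nav>Home | About</nav>"
    "<main><h1>Link a calendar</h1><p>Open Settings.</p></main>"
    "<footer>© 2025</footer><script>track()</script></body></html>"
)
print(clean_page(page))  # Link a calendar Open Settings.
```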
If something else though, curious.
Do you have any built-in features that address these issues?
When I checked this a year or so ago, I might have gotten the impression that it was cheaper. Now, it costs the same as what Perplexity charges for search-grounded queries, which is the same as Google charges for Gemini queries with search.
So basically, one player sets a price, and everyone is anchored on that as the pricing for the entire category? I'm just genuinely interested in why every offering in this space is priced like this.
It seems a bit misaligned with how pure LLM queries are priced.
I have a product that would benefit from search grounding, but this pricing wouldn't work with my volume of queries.
Perplexity charges the same on their lowest tier model, and three times as much for their more expensive models.
Gemini charges $35 per 1000 requests.
https://exa.ai/pricing
https://docs.perplexity.ai/guides/pricing
https://ai.google.dev/gemini-api/docs/pricing
Most of the time I want to find some vendors / companies and Deep Research does that but also responds with a wall of unnecessary text where I just want the table
How do you dedupe entities, like companies and people? I've noticed ChatGPT tends to provide "great" results when asking about different entities, but in reality it just groups similar sounding entities together in its answer.
For example, I asked ChatGPT about a well known startup. It gave me a confident answer about how much they raised, their current status, etc. When looking at the 3 sources they cited though, it was actually 3 different companies that all had similar sounding names that it just grouped together to form its answer.
Basically, how do I trust the output of your system?
https://imgur.com/dsGK5dS
My question is how you can confirm the entity you're referencing in each source is actually the entity you're looking for?
An example I ran into recently is Vast (https://www.vastspace.com/). There are a number of other notable startups named Vast (https://vast.ai/, https://www.vastdata.com/).
I understand Clay, which your Websets product is clearly inspired by, does a fair amount of matching based on domain name or LinkedIn url.
If Websets is doing fuzzy or naive matching, that's okay. I'm just trying to understand the limitations and potential uses cases of your current system.
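The Vast example shows why name matching alone is unsafe. One common mitigation, in the spirit of the domain-based matching attributed to Clay above, is to key entities on a normalized domain rather than on the display name. A minimal sketch: the normalization is naive (lowercase host, strip `www.`, no public-suffix handling), and the `"Vast Space"` mention is a hypothetical extra row added to show two names merging. Nothing here describes how Websets actually dedupes.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def domain_key(url: str) -> str:
    """Naive normalization: lowercase host, drop 'www.' (no public-suffix list)."""
    if "://" not in url:
        url = "https://" + url
    host = (urlsplit(url).hostname or "").lower()
    return host.removeprefix("www.")

def group_mentions(mentions: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (name, url) mentions by domain, so same-named companies stay apart."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, url in mentions:
        groups[domain_key(url)].append(name)
    return dict(groups)

mentions = [
    ("Vast", "https://www.vastspace.com/"),
    ("Vast", "https://vast.ai/"),
    ("Vast", "https://www.vastdata.com/"),
    ("Vast Space", "vastspace.com/about"),  # hypothetical second mention
]
print(group_mentions(mentions))
```

Three companies that share the name "Vast" end up as three separate keys, while the two vastspace.com mentions merge.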
As far as I know ChatGPT’s search is primarily a wrapper around another company’s search engine, which is why it often feels like it’s just summarizing a page of search results and sometimes hallucinates badly.
Looking forward to trying out the product more when I have a moment.
I often have projects where the enrichments feature alone would be super useful: I would provide, say, a list of company names, and then use enrichments to qualify them based on location, age, founder experience etc etc.
It disrupts anyone who merely does one thing this does. E.g., what a contact-building app does can be done by this. I imagine many "wrapper" apps can be built on this.
I am serious though. It felt like using databricks a little bit, obviously without all the functionality but that will come.
I'm bullish! Modulo competition. Someone who does this makes their billion.
Evals on this would be great to benchmark the gap between using websets versus a generic web search tool. Otherwise to a developer, it's just marketing.
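The simplest eval along these lines is precision: the fraction of returned rows that a human grader accepts, computed for Websets and for a baseline on the same queries. A minimal sketch follows; the function names are hypothetical. As a worked example it uses the servo-motor search graded earlier in this thread, with results 2 and 5 marked "Pass" and result 3 ("likely fail") counted as a fail, which gives a precision of 0.4.

```python
def precision(verdicts: list[bool]) -> float:
    """Fraction of returned results that a human grader accepted."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0

def precision_gap(tool: list[bool], baseline: list[bool]) -> float:
    """Positive when `tool` returns a higher share of correct rows than `baseline`."""
    return precision(tool) - precision(baseline)

# Servo-motor search from this thread: results 2 and 5 passed, 1, 3, 4 failed.
servo = [False, True, False, False, True]
print(precision(servo))  # 0.4
```

Grading the same queries with a generic web search tool and feeding both verdict lists to `precision_gap` gives the number the comment asks for.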
Would love to see how it's set up: beyond the questions you linked to a ChatGPT chat, the system prompt and tool calls would all be useful.
The search engine was impressive enough but I think this implementation was a nice cherry on top.
Very helpful for candidate searches but still a bit slow for everyday use, like “what are the events happening today in my city”
But I believe you guys will crack it soon and make it better
And also how “internal” business intelligence/operations tools should work: search first to find relevant artifacts (“top 10 customers in AMEA”), followed by agentic verification and enrichment.
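The search → verify → enrich flow can be sketched as three stages, with one ordering choice worth making explicit: run cheap, deterministic filters before the expensive verification step so it only sees rows that could survive. Everything below is hypothetical: `Candidate`, `run_pipeline`, the `.example` rows, and the `fake_verify` stub that stands in for an LLM call. It is an illustration of the idea, not Exa's planner.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlsplit

@dataclass
class Candidate:
    name: str
    url: str
    enrichments: dict = field(default_factory=dict)

def run_pipeline(
    candidates: list[Candidate],
    cheap_filters: list[Callable[[Candidate], bool]],
    verify: Callable[[Candidate], bool],
    enrichers: dict[str, Callable[[Candidate], str]],
) -> list[Candidate]:
    # 1. Cheap, deterministic filters first, so the expensive step sees fewer rows.
    pool = [c for c in candidates if all(f(c) for f in cheap_filters)]
    # 2. Expensive verification (e.g. an LLM judgment) only on the survivors.
    verified = [c for c in pool if verify(c)]
    # 3. Enrichment columns are filled only for verified rows.
    for c in verified:
        for column, fn in enrichers.items():
            c.enrichments[column] = fn(c)
    return verified

calls: list[str] = []

def fake_verify(c: Candidate) -> bool:
    calls.append(c.name)  # count how often the "expensive" step runs
    return "data" in c.name.lower()

found = run_pipeline(
    [Candidate("Acme Jobs", "https://acme.example"),
     Candidate("Rho Data", "https://rho.example"),
     Candidate("Beta Feeds", "https://beta.example"),
     Candidate("Radar Postings", "https://radar.example")],
    cheap_filters=[lambda c: c.name.startswith("R")],
    verify=fake_verify,
    enrichers={"domain": lambda c: urlsplit(c.url).hostname},
)
print([c.name for c in found], found[0].enrichments, len(calls))
# ['Rho Data'] {'domain': 'rho.example'} 2
```

With the name filter applied first, verification runs on 2 of the 4 candidates instead of all of them.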
Congrats on the launch!
Can it perform searches that rely on the rendered (JS-executed) state of the website? If so, does it have access to the DOM?
Example use case: "The 10 most trafficked e-commerce sites that load Adobe Analytics tag(s)."
Noting this though!
Mission of Exa has always been to build much better web search.
Normally I'd send this as a DM or email, but I think it could be useful for others to learn about how to use your service/the limitations of it. A couple weeks ago I made a search for:
Unfortunately it wasn't able to find it, but it was either in a tweet or a really long presentation, neither of which are good targets for search. It was around the same time that this (https://www.youtube.com/watch?v=c3b-JASoPi0) video was posted, like within a couple weeks before or after. How could I have improved my query? Does exa work over videos?
> TLDR LLM training runs are significant stress-tests of an overall fault tolerance of a large computing system acting as a biological entity.
https://x.com/karpathy/status/1765424847705047247
https://x.com/LiamHinzman/status/1911244983291514941
---
> I can see why you cut your demo.
Can you please edit out swipes, as the site guidelines request (https://news.ycombinator.com/newsguidelines.html)? Your comment would be just fine without that bit.