Invisible Search: Optimizing for AI Traffic in the GenAI Era
How Generative AI is Killing Clicks, Reinventing Funnels, and Reshaping Online Discovery
The other day, I realized I could go days without opening a web browser or even Google. Generative AI tools like Luzia, ChatGPT, or Perplexity have completely transformed how I learn, shop, and discover content online. I first wrote about this idea in 2023, and now, with agents, MCPs¹, and other exciting AI developments, it's becoming a mainstream reality, so much so that the first lawsuits are already flying:
In February 2025, we witnessed a watershed moment in the AI-search relationship when education platform Chegg filed a landmark lawsuit against Google. The company alleged that Google's AI Overviews had transformed the search giant from a 'search engine' into an 'answer engine,' keeping users on Google's platform rather than directing them to original sources. Chegg reported their non-subscriber traffic plummeted by 49% in January 2025, compared to just an 8% decline in mid-2024. This legal battle highlights the existential threat some businesses face as AI reshapes discovery patterns, with Chegg even exploring going private as a direct result of this traffic diversion. The case represents the first major antitrust lawsuit specifically targeting AI-generated summaries and could set precedents for how content creators and platforms negotiate the new invisible search landscape.
I have been asked several times to talk about SEO in the genAI era, and this post is my attempt to compile my thoughts on how online businesses should adapt to a new reality where the traditional commercial funnel is being reshaped, from browsing tens of pages to receiving customized answers. I'll also discuss my thesis that, as synthetic data (generated by other models) increasingly dominates AI training, traditional SEO positioning and brand discovery could temporarily become more challenging. Warning: this post will be less technical than what I normally write.
Hard Data On Generative AI as a New Channel
While generative AI-driven traffic is still small compared to traditional channels, it's rapidly becoming a significant force in online shopping.
Adobe Analytics reports that between July 2024 and February 2025, traffic from generative AI sources to U.S. retail websites surged by an astounding 1,200%. This influx is reshaping the entire consumer journey, with 55% of users leveraging AI for research, 47% seeking product recommendations, and 43% hunting for deals. Although the growth might seem outsized, it aligns with a rapidly accelerating trend—traffic from generative sources has doubled every two months since September 2024, with nearly 40% of consumers already adopting generative tools. One notable caveat is a 9% lower conversion rate, likely driven by selection bias and increased friction during the purchase process.

Interestingly, as we saw when we experimented with these ideas in Luzia, these shoppers are highly engaged (more time spent, lower bounce rates). That is surely partly a novelty effect, but I would argue that this new experience also has many real advantages.
The Good, The Bad, and The Invisible Search
AI search is convenient, personalized, and educational: it saves users from endless tab-hopping. Remember that colleague with so many Chrome tabs you could barely see the logos (ahem, Javi)? GenAI kills that.
We, users, no longer explore—we receive direct answers, bypassing multiple discovery points. I call this "invisible search" (I probably didn't coin the term but couldn't find an original reference). It's an experience where relevant products and responses come straight to you.
The impact of this 'invisible search' paradigm is already measurable. Recent studies reveal that pages featured in Google's AI Overviews can experience traffic spikes up to 3.6x their normal clicks, creating new winners in the digital landscape. Conversely, high-ranking pages (positions 1-3) excluded from these AI Overviews suffer dramatically, seeing up to 50% fewer clicks compared to searches without AI summaries. This represents a fundamental shift in the SEO equation: ranking #1 organically is no longer enough if you're not selected for the AI Overview. The effect varies by intent as well—informational searches see traffic diverted from top positions but increased for positions 3-10, while transactional searches benefit featured pages regardless of their original position. This data confirms we're witnessing not just an evolution but a revolution in how visibility translates to traffic.
Yet, along with these advantages, users might notice a few trade-offs: information isn't always the most current, new content can be slower to surface, and occasional AI errors or hallucinations do occur. Nothing we can’t fix, but something to keep in mind.
This shift means businesses must rethink their strategies. While you might experience fewer direct website visits, adapting effectively ensures your overall online presence and sales remain robust.
How to Stay Relevant: Optimizing Across the AI Lifecycle
I find it useful to structure thinking around the different stages of the AI lifecycle, considering what we can do at each stage to stay relevant. A helpful analogy is to imagine that you are optimizing not for an old-fashioned mechanistic algorithm, but for a knowledgeable product or industry expert. Think of Google’s PageRank as the old algorithm, while Luzia represents the modern expert. When Luzia recommends a product or explains a topic, she incorporates more nuance and context into the decision.
This lifecycle has three main phases: pre-training, fine-tuning, and inference. Pre-training is when the AI model is initially trained on vast amounts of data, building its foundational knowledge; think of this as stocking the shelves of a library. Fine-tuning happens next and is about refining and specializing the model's knowledge based on additional, targeted training, similar to organizing that library into clearly marked, trusted sections based on reputation and accuracy. Finally, inference is when users directly interact with the AI model. At this stage, the model uses its existing knowledge (all the way up to its cutoff date²) to provide real-time answers, but it can also incorporate additional, current information from external sources within a "context window". The context window allows the AI model to temporarily include recent or dynamic information, such as current product prices, real-time availability, or breaking news. Practically, this means the model can enhance its responses by accessing scraped web data provided at the moment of our question. This technique makes the cutoff date less relevant and, more importantly for the topic here, it is what makes it possible to use genAI for commercial purposes.
How Online Search Works in GenAI: When the model identifies it requires up-to-date information, it translates the user's query into one or more online searches—imagine it performing a quick Google search. The model then scrapes relevant websites, extracting timely information. Because it's impractical to share every detail, only the most pertinent parts are included in the model's context window (normally through a vector search). Finally, combining this fresh context with its existing knowledge, the model crafts an informed and accurate response. This process is fundamental in platforms like Google, Perplexity, OpenAI and Luzia.
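To make that flow more concrete, here is a toy sketch in Python. It is an illustration under heavy assumptions, not how any specific platform implements it: the "scraped" pages are hard-coded strings, a simple bag-of-words cosine similarity stands in for a real embedding-based vector search, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(page: str) -> list[str]:
    """Split a scraped page into paragraph-sized chunks."""
    return [p.strip() for p in page.split("\n\n") if p.strip()]


def select_context(query: str, pages: list[str], top_k: int = 2) -> list[str]:
    """Rank every chunk against the query and keep only the top_k."""
    q = Counter(tokenize(query))
    chunks = [c for page in pages for c in chunk(page)]
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:top_k]


# Stand-ins for pages returned by the online search step (made-up content).
pages = [
    "Our marathon running shoes cost 120 euros and are in stock.\n\n"
    "Subscribe to our newsletter for weekly offers.",
    "Company history: we were founded in 1998.\n\n"
    "Marathon shoes review: light, cushioned, good for long runs.",
]

query = "best running shoes for a marathon"
context = select_context(query, pages)

# Only the selected chunks travel with the question into the context window.
prompt = "Answer using this context:\n" + "\n".join(context) + "\n\nQuestion: " + query
print(prompt)
```

Notice what gets dropped: the newsletter pitch and the company history never reach the model. Only the chunks that directly answer the question make the cut, which is the practical reason clear, on-topic paragraphs matter.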
Pre-Training (Get Noticed)
Pre-training is the phase where the AI model acquires its foundational knowledge from almost the entire internet. Essentially, it's the model's first impression of the world, and first impressions matter. What the model learns about your brand at this stage is fundamental.
Why does this matter? Research [1][2] consistently shows AI models perform significantly better with concepts they've encountered frequently during their initial training phase. Models struggle with less common or "long-tail" topics absent from their training data. Retrieval methods applied at inference (more on this later) can help, but they can't fully replicate the depth of understanding gained during initial learning. Simply put, you're at a disadvantage if you're not visible during pre-training.
Recommendations to stay relevant
High-Quality: Ensure your content is referenced by reputable, authoritative websites, enhancing visibility during web crawls like Common Crawl, a key source for AI training.
Forget outdated SEO tactics, including all those senseless SEO blog posts that every company publishes. As we said before, AI is not just a mechanistic algorithm; it has some level of intelligence, and this intelligence is able to discern fluff from real value. Low-quality training data (all those crappy internet SEO articles) is very often removed from the training dataset because it hurts the end model's performance and increases training cost without adding any value [3].
Focus instead on genuinely valuable content like in-depth guides that comprehensively address topics from multiple angles, original research offering unique data or insights that position your brand as a trusted authority, detailed case studies providing real-world examples of your ideas in action, and clear visual content—such as graphics, videos, and interactive elements—that significantly enhance recognition and encoding by AI models.
Proper Indexing: Confirm your website is fully indexed and accessible. Tools like Google Search Console help ensure visibility to crawlers.
Clear Content Timestamps: Dates help AI accurately contextualize your content, making it easier for models to assign reliability and relevance.
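For the indexing point above, one quick sanity check is whether your robots.txt actually lets AI-related crawlers in. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the URL are made up for illustration, and the crawler names (GPTBot, CCBot, PerplexityBot) are assumptions you should verify against each vendor's current documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; in practice, load your own site's file.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

# Assumed crawler names; check the vendors' docs for the current ones.
AI_CRAWLERS = ["GPTBot", "CCBot", "PerplexityBot"]


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether robots.txt allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = crawler_access(
    ROBOTS_TXT, "https://example.com/guides/marathon-shoes", AI_CRAWLERS
)
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

In this made-up file, Common Crawl's bot is blocked from the whole site, which would quietly keep the guide out of one of the key sources mentioned above.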
Fine-Tuning (Build Brand)
If pre-training is about being visible, fine-tuning is about being trusted.
This is the phase where models are refined using smaller, curated datasets—often with human feedback—to improve usefulness, tone, and safety. It’s less about scale and more about precision. Can the model handle nuance? Does it respond with expertise? Does it reflect helpfulness?
And because humans—and their preferences—are involved, how your brand is discussed online can have an indirect impact. Fine-tuning doesn’t target individual brands, but it does learn from examples of good answers, helpful explanations, and reputable sources. That’s where reputation becomes structure: the more your brand appears in high-signal, trusted content, the more likely it is to shape future outputs.
Unlike pre-training, which scrapes everything, fine-tuning is curated and intentional. The teams behind OpenAI, Perplexity, Luzia and others are selecting data that improves answer quality—often drawing on: high-quality Q&A pairs, product decision flows, topic-specific knowledge benchmarks, summaries of public sentiment, and internal corpora from reliable sources.
So, no—models aren’t hardcoding brand preferences. But models do learn patterns from how helpful answers are phrased, who’s referenced, and what tone is trusted.
Recommendations to stay relevant
Your goal isn’t to “get picked” in fine-tuning—but to be part of the answer patterns that fine-tuning reinforces. Here’s how:
Become a Referenced Authority: Be the kind of source smart people cite—research, tools, explainer content, etc.
Drive Positive and Consistent Sentiment: Ensure you show up reliably across trusted sources—app stores, Reddit, review sites.
Publish Structured, Domain-Specific Content: Particularly in high-trust categories like finance, health, and education.
Design for Human and Machine Readability: Clean formatting, semantic markup, and clarity matter—both for users and models.
TL;DR: Fine-tuning won’t make you famous, but it will reward brands that behave like experts. When someone asks “best [your product category],” your goal isn’t to be found—it’s to be expected. That happens when your brand becomes a common ingredient in high-quality answers.
Inference (Real-Time Relevance)
Inference is the stage where users directly interact with the AI model. At this point, the model leverages its existing knowledge (up to the cutoff date) and dynamically incorporates real-time data gathered via online searches. As explained earlier, when the AI identifies a need for current information—such as today's product pricing or breaking news—it translates the user's query into targeted online searches. It then selects relevant results and places them into its "context window," essentially a temporary memory space allowing the model to generate timely and accurate responses.
Here's how you can influence your relevance during inference:
1. Be Indexed and Accessible:
If your content isn't part of the model's existing knowledge or returned in search results, it effectively doesn't exist for the model. Ensuring your content is discoverable is crucial:
Structured Data: Clearly structure your content using schema.org metadata (products, FAQs, reviews). Structured data significantly boosts your visibility to AI-driven searches.
Technical Performance: Prioritize speed, reliability, and overall site performance. AI tends to skip slow-loading or unreliable sites.
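For the structured data point above, here is a minimal sketch of what schema.org markup for a product can look like, generated with Python. The product details are invented, and the helper name `product_jsonld` is hypothetical; the property names come from the schema.org `Product`, `Offer`, and `AggregateRating` types.

```python
import json


def product_jsonld(name: str, description: str, price: str, currency: str,
                   rating: float, review_count: int) -> str:
    """Build a schema.org Product description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }
    return json.dumps(data, indent=2)


# Invented example product.
snippet = product_jsonld(
    name="Marathon Runner 3",
    description="Lightweight cushioned shoe for long-distance running.",
    price="120.00",
    currency="EUR",
    rating=4.6,
    review_count=212,
)

# JSON-LD is normally embedded in the page inside a script tag like this one.
html_tag = f'<script type="application/ld+json">\n{snippet}\n</script>'
print(html_tag)
```

The point is that price, availability, and ratings become explicit fields a machine can read directly, instead of something it has to guess from the page layout.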
2. Get Selected into the Context Window:
Once returned in search results, the model will scrape your website (ensure it is allowed!) and decide which content chunks—if any—are relevant. Typically, only the top results are included, similar to appearing on the first page of Google results. Fresh, relevant, and clear content is essential:
Semantic Clarity: Organize your content clearly around defined semantic clusters or topics, helping AI match your content precisely to user queries.
Content Freshness: Timestamp your content clearly and update it frequently. AI explicitly values recency, improving your chances of being selected.
Conciseness and Clarity: Avoid intrusive pop-ups, cluttered layouts, and excessive advertisements. AI favors easily extractable, direct answers.
3. Adapt Your Online Presence and Services for Machine Use:
I have an overdue task to research and write about MCP (Model Context Protocol), an emerging standard that allows LLMs to directly interact with online services. In the medium term, while indexing and context window selection remain relevant, direct MCP integrations will likely dominate the decision and purchase stages of user funnels. Imagine asking Luzia to buy Nike shoes for a marathon and Luzia completing the purchase seamlessly.
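To give a rough feel for what "machine use" means here, this is a sketch of the kind of message an AI client could send to an MCP server to invoke a tool. MCP messages follow JSON-RPC 2.0 and tools are invoked with the `tools/call` method; everything else below is an assumption for illustration, in particular the `purchase_item` tool and its arguments, which a real shop would define itself.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments; a real server advertises its own tools.
request = build_tool_call(
    1, "purchase_item", {"sku": "marathon-shoe-42", "quantity": 1}
)
print(request)
```

If your service exposes actions like this, an assistant can complete the purchase without the user ever loading a product page.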
TL;DR: Optimizing for inference isn't just about visibility. It's about becoming the AI’s go-to source for timely answers and making it easy for agents to use your site.
Embrace the new reality.
It's not only me and the other geeks who don’t visit Google every day. The way we discover and act online is changing fast: from browsing to getting things done, from explicit search to invisible search. In this new reality, users don’t visit ten sites; they get one smart answer. The funnel collapses. But that doesn’t mean brands are powerless. Far from it.
The main thesis of this post is that brands can influence this setup. The earlier in the AI lifecycle you show up (pre-training, fine-tuning, inference), the more context models have about your brand. That context shapes how often you're surfaced and how you're represented. And while we’re still early in the game, now is the best moment to get involved. Why? Because the future may get trickier: as AI relies more on synthetic data (models trained on model outputs), it may become harder for your original content to stand out or have meaningful influence. That’s a working hypothesis, but one worth acting on.
In the era of intelligence as a service, quality wins. In a world where AI is the interface, models don’t just rank—they reason. They favor content that’s clear, valuable, and trustworthy. Old-school SEO hacks won’t cut it anymore. The models are smarter, and your content has to be too.
Last, the metrics we track need to evolve. Maybe it’s not just about traffic anymore. Maybe it’s how often you're cited by LLMs, how well agents can navigate your services, or how easily users can take action through AI. That’s the new game.
The web isn’t dying—it’s becoming intelligent.
¹ The Model Context Protocol (MCP) is like a universal adapter for AI systems, allowing them to connect easily with various external data sources and tools, much like how a USB-C port lets different devices plug into a computer.
² The cutoff date is the latest available date in the training dataset. Anything that happened in the world after that date is not incorporated in the model's knowledge.





