Google MUM and Multi-Modal Search: Agency Site Impact
By Rome Thorndike
What MUM Does, Without the Marketing
MUM (Multitask Unified Model) is Google's multi-modal search infrastructure. It can read text, parse images, transcribe audio, and reason across all three at once. It powers Lens, Circle to Search, video search, and the multi-modal expansion of AI Overviews. The practical effect for site owners: Google now ranks pages based on what they show, not just what they say.
This sounds abstract until you see it in action. Search "running shoes for high arches with wide toe box" on mobile and Google's results include shoe images annotated with the relevant features. Search "what kind of plant is this" with a photo and Google pulls answers from pages that have clearly labeled plant images. Search "best meeting room layouts for 12 people" and Google returns image-heavy results from agency sites that captioned and structured their case studies properly.
Agency sites (web design, branding, marketing, architecture, interior design, consulting) sit right in the path of these changes because the work is visual and the buying intent is exploratory. Visitors want to see, then read, then contact. Sites that surface their visual work to MUM rank. Sites that hide their work behind unlabeled JavaScript carousels do not.
Image SEO Stopped Being Optional
For years, image SEO was a checklist item: add alt text, compress files, use descriptive filenames. Done. MUM raised the stakes. Google's image understanding now weighs the page text near the image, the schema annotations, and the visual content itself. Pages that align all of these signals rank in image search and, increasingly, in main search.
The agency-specific fixes we ship most often: replace stock photography with original work (Google's image embedding model recognizes stock images and discounts pages that rely on them), add visible captions under every portfolio image (not just alt text), and use ImageObject schema with the caption and creator fields filled in.
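A rough way to check the visible-caption rule on an existing page is to scan its HTML for images that are not wrapped in a `<figure>` with a `<figcaption>`. The sketch below uses only Python's standard-library `html.parser`. The `find_uncaptioned_images` helper is hypothetical, and it assumes captions are marked up as `<figure>` plus `<figcaption>`, which is one common pattern rather than the only valid one.

```python
from html.parser import HTMLParser


class CaptionAudit(HTMLParser):
    """Collects the src of every <img> that has no visible <figcaption> alongside it."""

    def __init__(self):
        super().__init__()
        self.figure_depth = 0          # > 0 while inside a <figure>
        self.figure_images = []        # images seen in the current outermost <figure>
        self.figure_captioned = False  # did that <figure> contain a <figcaption>?
        self.uncaptioned = []

    def handle_starttag(self, tag, attrs):
        if tag == "figure":
            self.figure_depth += 1
            if self.figure_depth == 1:
                self.figure_images = []
                self.figure_captioned = False
        elif tag == "figcaption" and self.figure_depth:
            self.figure_captioned = True
        elif tag == "img":
            src = dict(attrs).get("src") or ""
            if self.figure_depth:
                self.figure_images.append(src)
            else:
                self.uncaptioned.append(src)  # bare <img>: alt text at best, no caption

    def handle_endtag(self, tag):
        if tag == "figure" and self.figure_depth:
            self.figure_depth -= 1
            if self.figure_depth == 0 and not self.figure_captioned:
                self.uncaptioned.extend(self.figure_images)


def find_uncaptioned_images(html):
    audit = CaptionAudit()
    audit.feed(html)
    audit.close()
    return audit.uncaptioned
```

Running it over the saved HTML of each portfolio page gives a short list of images that still lack a visible caption.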
The single biggest win for agency sites: a dedicated page for each project (not a shared gallery), 200+ words of context per image, and structured data linking the image to the project and the agency. We have seen agencies double their organic traffic from this change alone. The deeper playbook lives in our schema markup guide.
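The structured-data step can be sketched as a single JSON-LD `@graph` in which the image, the project, and the agency reference each other by `@id`. The helper name, the `#agency`/`#project`/`#image` fragment identifiers, and the use of `CreativeWork` for the project are illustrative assumptions; `ImageObject`, `caption`, `creator`, and `contentUrl` are standard schema.org terms. It also assumes the agency shot the photo; if an outside photographer did, their own `Person` node belongs in the image's `creator` instead.

```python
import json


def project_jsonld(agency_name, agency_url, page_url, project_name, image_url, caption):
    """Build a JSON-LD @graph linking one project image to its project page and agency."""
    agency_id = agency_url.rstrip("/") + "/#agency"
    project_id = page_url + "#project"
    image_id = page_url + "#image"
    graph = [
        {"@type": "Organization", "@id": agency_id, "name": agency_name, "url": agency_url},
        {
            "@type": "CreativeWork",
            "@id": project_id,
            "name": project_name,
            "url": page_url,
            "creator": {"@id": agency_id},  # the agency produced the work
            "image": {"@id": image_id},     # this image depicts the project
        },
        {
            "@type": "ImageObject",
            "@id": image_id,
            "contentUrl": image_url,
            "caption": caption,
            "creator": {"@id": agency_id},  # assumption: agency-shot photo
        },
    ]
    doc = json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)
    # "<\/" is a valid JSON escape for "</", so a caption cannot close the <script> tag early.
    return doc.replace("</", "<\\/")
```

The returned string goes inside a `<script type="application/ld+json">` tag in the page source, next to the visible caption it repeats.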
Video Transcripts and Captions
MUM transcribes uploaded video and matches the transcript to user queries. If your agency posts client testimonials, case study videos, or service explainers, the transcript is searchable content. Without a transcript on the page, a 3-minute testimonial gives Google little text to match against. The same testimonial with a transcript embedded on the page can rank for the specific phrases the client used.
What we ship: every video on an agency site gets a written transcript in the page source, marked up with VideoObject schema, with timestamp cues if the video runs longer than 2 minutes. YouTube embeds inherit YouTube's auto-transcript, which helps, but a transcript on your own page builds your domain's topical authority. Inline transcripts also help with accessibility (Section 508, WCAG 2.1 compliance), which agencies serving government clients need anyway.
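A minimal sketch of that markup follows. It assumes the page's embedded player honors a `?t=<seconds>` start parameter, which the `Clip` URLs need in order to land at the right moment. The transcript goes in `VideoObject.transcript`, and chapter cues become `Clip` entries under `hasPart` only when the video runs past two minutes. The function names and argument layout are hypothetical; the schema.org property names are standard.

```python
def iso8601_duration(total_seconds):
    """Format whole seconds as an ISO 8601 duration, e.g. 190 -> "PT3M10S"."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes:
        out += f"{minutes}M"
    if seconds or out == "PT":
        out += f"{seconds}S"
    return out


def video_jsonld(page_url, name, description, thumbnail_url, upload_date,
                 embed_url, duration_s, transcript, chapters=()):
    """VideoObject with an inline transcript; adds Clip cues for videos over 2 minutes.

    chapters: iterable of (start_seconds, title) pairs.
    """
    video = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "thumbnailUrl": thumbnail_url,
        "uploadDate": upload_date,
        "embedUrl": embed_url,
        "duration": iso8601_duration(duration_s),
        "transcript": transcript,
    }
    cues = sorted(chapters)
    if duration_s > 120 and cues:
        clips = []
        for i, (start, title) in enumerate(cues):
            # Each clip ends where the next one starts; the last runs to the end.
            end = cues[i + 1][0] if i + 1 < len(cues) else duration_s
            clips.append({
                "@type": "Clip",
                "name": title,
                "startOffset": start,
                "endOffset": end,
                "url": f"{page_url}?t={start}",  # assumes the page's player honors ?t=
            })
        video["hasPart"] = clips
    return video
```

Serialize the returned dict with `json.dumps` and place it in a `<script type="application/ld+json">` tag alongside the visible transcript, so the markup and the on-page text say the same thing.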
Multi-Format Pages
MUM treats a page as a bundle of signals: text, images, video, structured data, and the relationships between them. Pages that combine multiple formats with clear linking between them outrank text-only pages on visual-intent queries.
For an agency case study, the multi-format pattern that ranks: opening photo of the finished work, 300 words of project context, mid-page video walkthrough, results data in a clear table, additional photo gallery, client quote with photo, and a closing photo of the team. Each format reinforces the others. Each gets indexed differently. The page surfaces in image search, video search, AI Overviews, and traditional search results.
Compare that to the typical agency case study: a hero image, 800 words of text, and an embedded Vimeo video at the bottom that 80% of visitors never reach. That layout was fine in 2019. In 2026, it loses to multi-format competitors on every visual-intent query.
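To check whether an existing case study follows the multi-format pattern described above, a rough inventory of which formats appear on the page is often enough. The sketch below uses Python's `html.parser`. Treating `<img>`, `<video>`/`<iframe>`, `<table>`, `<blockquote>`, and JSON-LD `<script>` tags as stand-ins for each format is our assumption about typical markup, and `missing_formats` is a hypothetical helper.

```python
from collections import Counter
from html.parser import HTMLParser

# Assumption: on a case study page, iframes are video embeds (YouTube, Vimeo).
FORMAT_BY_TAG = {
    "img": "image",
    "video": "video",
    "iframe": "video",
    "table": "data table",
    "blockquote": "quote",
}
REQUIRED_FORMATS = ("image", "video", "data table", "quote", "structured data")


class FormatInventory(HTMLParser):
    """Counts how many elements of each content format a page contains."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in FORMAT_BY_TAG:
            self.counts[FORMAT_BY_TAG[tag]] += 1
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.counts["structured data"] += 1


def missing_formats(html, required=REQUIRED_FORMATS):
    """Return the required formats that never appear on the page."""
    inventory = FormatInventory()
    inventory.feed(html)
    inventory.close()
    return [fmt for fmt in required if inventory.counts[fmt] == 0]
```

A non-empty result flags the formats a case study is missing; it says nothing about whether they are placed or linked well, which still needs a human read.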
The Static Site Advantage Here Is Real
Multi-format pages tend to load slowly on WordPress. Every image, video, and embed adds plugin overhead and database queries. Agency WordPress sites with portfolio plugins (Essential Grid, Portfolio Grid, JetEngine) routinely score in the 40s on mobile PageSpeed because the plugins ship 200KB+ of JavaScript per page to lazy-load images that could have been served as plain static files.
Static HTML sites handle multi-format content without that overhead. Images are direct CDN fetches. Videos use native HTML5 or a single embed. Schema is in the page source. Layouts are pure CSS grid. The same agency portfolio that scores 48 on WordPress scores 96 as a static build. The MUM benefits compound: faster crawling, better extraction, more pages indexed, higher visibility. Our writeup on agencies moving off Webflow covers the migration pattern.
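As a rough illustration of direct image fetches and a pure CSS grid, the sketch below renders a portfolio grid to static HTML at build time: plain `<figure>` elements with visible captions, explicit `width`/`height` to reserve space, native `loading="lazy"` on every image after the first, and no JavaScript. The `Project` fields and the `render_portfolio` helper are hypothetical; any static site generator can produce equivalent output.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Project:
    title: str
    url: str        # the project's own page
    image_url: str  # plain CDN path, fetched directly by the browser
    alt: str
    caption: str
    width: int
    height: int


# Responsive grid in plain CSS: no carousel, no layout JavaScript.
GRID_CSS = (
    ".portfolio{display:grid;grid-template-columns:repeat(auto-fill,minmax(18rem,1fr));gap:1.5rem}"
    ".portfolio img{width:100%;height:auto}"
)


def render_portfolio(projects):
    figures = []
    for i, p in enumerate(projects):
        # The first image is likely above the fold, so only later ones get native lazy loading.
        lazy = "" if i == 0 else ' loading="lazy"'
        figures.append(
            f'<figure><a href="{escape(p.url)}">'
            f'<img src="{escape(p.image_url)}" alt="{escape(p.alt)}" '
            f'width="{p.width}" height="{p.height}"{lazy}></a>'
            f"<figcaption>{escape(p.title)}: {escape(p.caption)}</figcaption></figure>"
        )
    return f'<style>{GRID_CSS}</style>\n<section class="portfolio">\n' + "\n".join(figures) + "\n</section>"
```

Because every image URL is in the served HTML, a crawler sees all of them without executing any script.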
What Agency Sites Get Wrong
The three most common mistakes we see when auditing agency sites for MUM readiness:
One: portfolio images loaded via JavaScript carousels that Google cannot reach. In a typical lazy-loading Slick.js carousel, the first image is in the DOM; the rest load only when a visitor clicks through. Google indexes one image per project instead of ten. Switch to a CSS-only grid layout. Lose the carousel.
Two: case studies on a single /case-studies/ page with internal anchor links instead of individual pages. A 5,000-word omnibus case study page cannot be the canonical for ten different visual queries. Break each case study into its own URL.
Three: stock photography mixed into client work without distinction. Google's image embeddings recognize stock images. When your case study uses iStock photos alongside client photos, Google discounts the whole page's visual signal. Use original photography for client work, even if the photos are not magazine-quality. Authentic beats polished here.
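For the first of those mistakes, carousel-hidden images can often be spotted in the served HTML: look for `<img>` tags whose real URL sits only in a lazy-load attribute such as Slick's `data-lazy` or the common `data-src` convention. The sketch below is a heuristic under that assumption. It reads raw HTML without running JavaScript, the attribute list is ours rather than exhaustive, and the helper names are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: attributes commonly used by lazy-load scripts (Slick uses data-lazy).
LAZY_ATTRS = ("data-lazy", "data-src")


class CarouselImageAudit(HTMLParser):
    """Splits <img> URLs into those present in the served HTML and those a script must swap in."""

    def __init__(self):
        super().__init__()
        self.in_html = []
        self.script_loaded = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        deferred = next((attrs[name] for name in LAZY_ATTRS if attrs.get(name)), None)
        if deferred and deferred != src:
            # src is empty or a placeholder; the real image only arrives via JavaScript.
            self.script_loaded.append(deferred)
        elif src:
            self.in_html.append(src)


def audit_images(html):
    audit = CarouselImageAudit()
    audit.feed(html)
    audit.close()
    return audit.in_html, audit.script_loaded
```

A project page where most URLs land in the second list is a good candidate for the CSS-only grid rebuild.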
Where to Start
Audit your portfolio pages first. In Search Console, set the Performance report's search type to Image and check which project pages are earning image impressions. If some projects show none at all, your portfolio is structurally underexposed. The fix is usually a layout rebuild plus image-level schema, not a content rewrite.
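If you export the Image search-type Performance data from Search Console as CSV, a few lines of Python can flag project pages with no image-search impressions at all. The column names `Top pages` and `Impressions` are an assumption about the export format, so check them against your own file. The check is also a proxy: it tells you which project pages surface in image search, not how many images on each page are indexed.

```python
import csv
import io


def pages_without_image_impressions(csv_text, project_urls,
                                    page_col="Top pages", impressions_col="Impressions"):
    """Return project URLs that earned zero impressions in an Image search-type export.

    Column names are assumptions; match them to the headers in your own export.
    """
    impressions = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        page = row[page_col].strip()
        impressions[page] = impressions.get(page, 0) + int(row[impressions_col] or 0)
    return [url for url in project_urls if impressions.get(url, 0) == 0]
```

Reading the file with `open(path, encoding="utf-8-sig")` strips a byte-order mark if one is present, so the first header name still matches.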
If you want a baseline reading, our free site audit reports image SEO health alongside PageSpeed and structured data. For agency-style service businesses we work with frequently, see SharpPages for professional services. For the broader image SEO playbook, our multi-location SEO guide shares the same per-asset structure principles.
Frequently Asked Questions
What is Google MUM in plain terms?
MUM is Google's multi-modal search infrastructure. It reads text, parses images, transcribes audio, and reasons across all three. It powers Lens, image search, video search, and parts of AI Overviews. The practical effect: pages with strong visual content and clean structure rank better on visual-intent queries.
Do I need to write transcripts for every video?
For videos hosted on your own domain, yes. The transcript builds topical authority and lets Google match the video to specific queries. For YouTube embeds, the YouTube transcript helps, but a transcript on your own page is still valuable for accessibility and domain-level signals.
How does MUM treat stock photography?
Google's image embeddings recognize stock images and discount pages that rely on them. Original photography, even at lower polish, outperforms stock for ranking purposes. For agency portfolios, this means using real client work photos rather than illustrative stock images.
Should I split my case studies into separate pages?
Yes. Each case study should be its own URL with its own schema, its own images, and its own canonical. Omnibus case study pages dilute the visual signal and prevent individual projects from ranking for project-specific queries.
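When splitting an omnibus page, each project needs a stable slug, its own URL, and a self-referencing canonical tag. The sketch below derives those from project titles; the `/case-studies/<slug>/` URL pattern and the `plan_case_study_pages` helper are assumptions for illustration.

```python
import re
import unicodedata


def slugify(title):
    # Strip accents, lowercase, and collapse anything non-alphanumeric into hyphens.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def plan_case_study_pages(base_url, titles):
    """Map each case study title to its own URL and canonical tag; reject slug collisions."""
    plan = {}
    for title in titles:
        slug = slugify(title)
        if slug in plan:
            raise ValueError(f"duplicate slug {slug!r} for {title!r}")
        url = f"{base_url.rstrip('/')}/case-studies/{slug}/"
        plan[slug] = {
            "title": title,
            "url": url,
            "canonical": f'<link rel="canonical" href="{url}">',
        }
    return plan
```

One practical note: `#anchor` fragments are never sent to the server, so old in-page links such as `/case-studies/#harbor-lofts` cannot be 301-redirected server-side. Update internal links to the new URLs, and if the old page stays live, a small script on it can forward known anchors.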
Does this matter for sites that are not agencies?
Yes, but the impact is largest for visual-intent businesses: agencies, architects, interior designers, photographers, contractors, restaurants, retailers. Text-heavy sites (law, finance, B2B SaaS) see smaller MUM impact, but image SEO still helps for branded queries and AI Overview citations.
Ready to Rebuild Your Portfolio for MUM?
Static HTML, image schema, video transcripts, multi-format pages. Built to surface your work.