We are in control of how Google crawls our sites. This means we can mess it up.
Index bloat: the site has a large number of low-value pages indexed.
If you attempt to index everything:
- Google might not crawl all of it, since crawl budget gets spent on low-value pages
- Google might not index the pages it does crawl
- Valuable pages may not be crawled and therefore may not rank
Q: What’s the high-level KPI we’re being measured against? A: Usually organic leads or revenue.
Divide page types into buckets. Find which buckets produce those KPIs.
Consider not indexing buckets that do not contribute significantly to KPIs, particularly if they represent a large portion of the indexed pages.
Divide into further buckets where that makes sense. For example, if you are indexing internal search results that contain filters, you might have many different permutations of those results indexed. Divide again by filter to see which filter combinations should be indexed.
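
A minimal sketch of this bucketing analysis, assuming a hypothetical CSV export (`indexed_pages.csv` with `url`, `page_type`, and `organic_revenue` columns, stitched together from your index-coverage and analytics data):

```python
import csv
from collections import defaultdict

# Aggregate indexed-page counts and organic revenue per page-type bucket.
# "indexed_pages.csv" and its column names are assumptions, not a real export format.
pages = defaultdict(int)
revenue = defaultdict(float)

with open("indexed_pages.csv", newline="") as f:
    for row in csv.DictReader(f):
        bucket = row["page_type"]
        pages[bucket] += 1
        revenue[bucket] += float(row["organic_revenue"] or 0)

total_pages = sum(pages.values()) or 1
total_revenue = sum(revenue.values()) or 1

# Buckets holding a large share of the index but a tiny share of revenue
# are the noindex candidates.
for bucket in sorted(pages, key=pages.get, reverse=True):
    print(f"{bucket}: {pages[bucket] / total_pages:.1%} of index, "
          f"{revenue[bucket] / total_revenue:.1%} of organic revenue")
```

Re-running the same aggregation with a finer bucket key (e.g. page type plus filter parameter) covers the sub-bucketing step.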
Before removing, answer these questions:
- Do they take up a lot of crawl budget? (see the log-parsing sketch after this list)
- Do they rank for important terms?
- Do they have unique content?
- Do they have high-quality backlinks?
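
For the crawl-budget question, one approach is counting Googlebot hits per bucket in your server access logs. A rough sketch, assuming combined-log-format lines and a hypothetical `bucket_for()` classifier (a real check should also verify the hits come from genuine Googlebot IPs, since the user agent can be spoofed):

```python
import re
from collections import Counter

# Hypothetical classifier mapping a URL path to one of your buckets.
def bucket_for(path: str) -> str:
    if path.startswith("/search"):
        return "internal search results"
    if path.startswith("/product"):
        return "product pages"
    return "other"

# Combined log format: the request path sits inside e.g. "GET /path HTTP/1.1".
request_re = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

hits = Counter()
with open("access.log") as f:
    for line in f:
        if "Googlebot" not in line:  # crude user-agent filter
            continue
        m = request_re.search(line)
        if m:
            hits[bucket_for(m.group(1))] += 1

total = sum(hits.values()) or 1
for bucket, n in hits.most_common():
    print(f"{bucket}: {n} Googlebot hits ({n / total:.1%} of crawl)")
```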
Remember:
- Make sure sitemaps automatically remove noindexed pages (see the sitemap-filtering sketch after this list)
- Pages won’t deindex overnight; removal can take months or longer, depending on site size.
- Block the URLs in robots.txt only after 1-3 months. If you block too soon, the crawler can no longer fetch the pages to see the noindex directive, so they may never be de-indexed.
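
A small sketch of the sitemap-filtering step, assuming a standard sitemap.xml and a hypothetical `is_noindexed()` predicate backed by whatever rule marks your pages noindex:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical predicate: True for URLs you have marked noindex.
def is_noindexed(url: str) -> bool:
    return "/search" in url

tree = ET.parse("sitemap.xml")
root = tree.getroot()

# Drop every <url> entry whose <loc> points at a noindexed page, so the
# sitemap never advertises pages you are trying to remove from the index.
for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
    loc = url_el.findtext(f"{{{SITEMAP_NS}}}loc", default="")
    if is_noindexed(loc):
        root.remove(url_el)

tree.write("sitemap.xml", encoding="utf-8", xml_declaration=True)
```

Once Search Console confirms the pages have dropped out, a plain robots.txt rule such as `Disallow: /search` stops Googlebot from wasting further crawl budget on them.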
Set priority for changes based on SEO impact vs. tech effort
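
To make that prioritization concrete, a simple impact-per-effort ranking works; the entries and 1-5 scores below are placeholders, not recommendations:

```python
# Hypothetical 1-5 estimates: SEO impact vs. engineering effort.
changes = [
    ("Noindex filtered search results", 5, 2),
    ("Auto-remove noindexed URLs from sitemaps", 4, 3),
    ("Consolidate thin tag pages", 3, 4),
]

# Rank by impact per unit of effort; the highest ratio ships first.
for name, impact, effort in sorted(changes, key=lambda c: c[1] / c[2], reverse=True):
    print(f"{impact / effort:.2f}  {name} (impact {impact}, effort {effort})")
```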