The following is a live blog from State Of Search. Please pardon any typos or grammar issues as we really didn’t have time for much proofing between sessions.

Alan Audit … errr … Bleiweiss is on the main stage after lunch chatting about advanced ranking factors…

He starts by discussing what is arguably one of the most important aspects of search that people don't understand: we're not dealing with "an algorithm", we're dealing with multiple algorithms, and they don't always play well together.

He talks about a client who had a million products but 11 million pages in Google's index, while a site: search only returned 600k. This disparity tells you Google has a problem understanding the site. The client had 6 million URLs in the sitemap but, again, only 1 million products. Looking at core data is critical.
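To make that kind of comparison routine, here is a minimal Python sketch that expresses each count as URLs per product and flags anything far from 1. The `SiteCounts` fields and the 1.5 tolerance are assumptions for illustration, not numbers from the talk; only the example figures (1 million products, 11 million indexed, 600k from site:, 6 million sitemap URLs) come from the client story above.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    products: int          # distinct products in your own catalogue
    indexed_reported: int  # pages Google reports as indexed
    site_query: int        # rough count returned by a site: search
    sitemap_urls: int      # URLs listed across all XML sitemaps


def disparity_report(c: SiteCounts, tolerance: float = 1.5) -> dict[str, tuple[float, bool]]:
    """Express each count as URLs-per-product and flag anything outside the band.

    `tolerance` is an arbitrary illustrative threshold: a ratio above `tolerance`
    or below 1 / `tolerance` is flagged as a disparity worth investigating.
    """
    ratios = {
        "indexed_per_product": c.indexed_reported / c.products,
        "site_query_per_product": c.site_query / c.products,
        "sitemap_per_product": c.sitemap_urls / c.products,
    }
    return {
        name: (round(r, 2), not (1 / tolerance <= r <= tolerance))
        for name, r in ratios.items()
    }


if __name__ == "__main__":
    # Figures from the client example in the talk.
    report = disparity_report(SiteCounts(1_000_000, 11_000_000, 600_000, 6_000_000))
    for name, (ratio, flagged) in report.items():
        print(name, ratio, "<-- investigate" if flagged else "")
```

With the client's numbers every source gets flagged, which is exactly the "Google has a problem understanding the site" signal.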

Duplicate content makes it difficult for Google to understand which version is most important. Also key are perceived duplicates: cases where the majority of the content on the page is common (footers, widgets, etc.), so the rest may be perceived as thin.
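One rough way to quantify a perceived duplicate is to measure how much of a page's text also appears in the site-wide blocks (footer, widgets, etc.). The sketch below uses word 3-gram shingles; the shingle size, the `looks_thin` helper and its 0.5 threshold (a literal reading of "majority") are hypothetical choices, and this is not a description of how Google measures anything.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams; n=3 is an arbitrary choice for illustration."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(page_text: str, common_text: str, n: int = 3) -> float:
    """Share of the page's n-grams that also occur in the site-wide common text."""
    page = shingles(page_text, n)
    if not page:
        return 0.0
    return len(page & shingles(common_text, n)) / len(page)


def looks_thin(page_text: str, common_text: str, threshold: float = 0.5) -> bool:
    """Hypothetical rule of thumb: flag a page when most of it is common text."""
    return shared_fraction(page_text, common_text) > threshold
```

In practice `common_text` would be the concatenated footer, sidebar and widget text shared across the template.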

Another issue is conflicting indexation signals: robots.txt, X-Robots-Tag, canonical, meta robots, etc. Consistency among these signals is critical.
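A simple consistency check over those signals might look like this. It assumes you have already collected each signal per URL (the `IndexSignals` record is hypothetical). The rules encode the sitemap/noindex and canonical conflicts from this session, plus two widely documented points: a robots.txt block stops crawlers from ever seeing a noindex, and sitemaps should list canonical URLs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexSignals:
    """Indexation signals collected for one URL (hypothetical record)."""
    url: str
    in_sitemap: bool
    robots_txt_blocked: bool
    meta_noindex: bool
    x_robots_noindex: bool     # from the X-Robots-Tag HTTP header
    canonical: Optional[str]   # href of rel=canonical, if any


def find_conflicts(s: IndexSignals) -> list[str]:
    """Return human-readable descriptions of contradictory signals."""
    conflicts = []
    noindex = s.meta_noindex or s.x_robots_noindex
    if s.in_sitemap and (noindex or s.robots_txt_blocked):
        conflicts.append("in sitemap but noindexed or blocked by robots.txt")
    if s.robots_txt_blocked and noindex:
        conflicts.append("robots.txt block means crawlers never see the noindex")
    if noindex and s.canonical == s.url:
        conflicts.append("noindexed page declares itself canonical")
    if s.in_sitemap and s.canonical and s.canonical != s.url:
        conflicts.append("sitemap lists a URL that canonicalizes elsewhere")
    return conflicts
```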

Tools he uses are:

Don't include a page in the sitemap and then noindex it or block it in robots.txt. What are the engines supposed to do with that?
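That particular contradiction is easy to test for. Here is a minimal sketch using only Python's standard library (`xml.etree.ElementTree` and `urllib.robotparser`), assuming a single `<urlset>` sitemap rather than a sitemap index. Note that `urllib.robotparser` does plain prefix matching and does not understand Google's `*` and `$` wildcards, so treat its answers as approximate for wildcard rules. Catching noindex tags as well would require fetching each page.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract every <loc> from a <urlset> sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def blocked_sitemap_urls(sitemap_xml: str, robots_txt: str, agent: str = "Googlebot") -> list[str]:
    """Sitemap URLs that robots.txt disallows for `agent`: a contradictory signal."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls(sitemap_xml) if not rp.can_fetch(agent, url)]
```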

If you don’t want a page indexed:

  • Eliminate the canonical tag
  • Don’t set URL parameters to “let Googlebot decide”
  • Eliminate OpenGraph tags
  • Don’t include it in the sitemap
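Most of that checklist can be verified from the page itself. Here is a sketch with the standard-library `html.parser`; the class and function names are hypothetical, and the URL-parameter setting lives in Google's tools rather than in the HTML, so it is not checked here.

```python
from html.parser import HTMLParser


class IndexHintScanner(HTMLParser):
    """Collects rel=canonical and og:* meta tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical = None
        self.og_tags = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")
        elif tag == "meta" and (a.get("property") or "").startswith("og:"):
            self.og_tags.append(a["property"])


def noindex_checklist(html: str, url: str, sitemap: set[str]) -> list[str]:
    """List leftover signals on a page you do NOT want indexed."""
    scanner = IndexHintScanner()
    scanner.feed(html)
    problems = []
    if scanner.canonical:
        problems.append(f"canonical tag present ({scanner.canonical})")
    if scanner.og_tags:
        problems.append("OpenGraph tags present: " + ", ".join(scanner.og_tags))
    if url in sitemap:
        problems.append("URL is listed in the sitemap")
    return problems
```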

If you do leave canonicals in place, understand that Google may override them. A canonical is not a directive; it's a hint.

The two primary ways of dealing with indexation we don't want are a 301 or blocking. If the pages have no value, a 404 is fine, but if they have any value at all, a 301 is key.

And if you want to remove a page, use a 410, not a 404. With a 404 they don't know whether you intended it (it might be a mistake), whereas with a 410 they deal with it faster. A 410 basically means "yeah, we killed it on purpose."
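Put together, the 301 / 404 / 410 advice reduces to a small decision. The sketch below is one reading of it, with hypothetical inputs (`has_value`, `replacement`, `removed_on_purpose`); the returned status and headers would be plugged into whatever server or framework you use.

```python
from http import HTTPStatus
from typing import Optional


def retirement_response(has_value: bool, replacement: Optional[str],
                        removed_on_purpose: bool) -> tuple[HTTPStatus, dict[str, str]]:
    """Pick a status code (and headers) for a URL you no longer want indexed."""
    if has_value and replacement:
        # Any remaining value (links, traffic): pass it on with a permanent redirect.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": replacement}
    if removed_on_purpose:
        # 410 Gone: "we killed it on purpose".
        return HTTPStatus.GONE, {}
    # Otherwise (nothing to redirect to, no deliberate removal): a plain 404.
    return HTTPStatus.NOT_FOUND, {}
```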

When you 301, de-index or kill pages …

  • Make sure internal links are updated
  • Make sure sitemap files are updated
  • Do inbound link outreach to reclaim links
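For the internal-link part, a redirect map can be used to point every link straight at its final destination and to surface links to pages that were killed outright. This is a minimal sketch; the map format and function names are assumptions, and the same `rewrite_links` pass could be run over the list of sitemap URLs.

```python
def resolve(url: str, redirects: dict[str, str], max_hops: int = 10) -> str:
    """Follow a redirect chain to its final URL, refusing loops and long chains."""
    seen = set()
    while url in redirects:
        if url in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {url}")
        seen.add(url)
        url = redirects[url]
    return url


def rewrite_links(links: list[str], redirects: dict[str, str],
                  removed: set[str]) -> tuple[list[str], list[str]]:
    """Point links straight at final destinations; collect links to removed pages."""
    kept, dropped = [], []
    for link in links:
        target = resolve(link, redirects)
        if target in removed:
            dropped.append(link)
        else:
            kept.append(target)
    return kept, dropped
```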

Tip: Add &noidx=1 to the URL string and set robots.txt to block all those instances. This is a really good idea.
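Here is a sketch of how that tip could be wired up. `add_noidx` appends the parameter (falling back to `?noidx=1` when there is no query string yet), and `ROBOTS_RULE` is one possible robots.txt line, not Alan's exact one. Because Python's `urllib.robotparser` ignores mid-path `*` wildcards, the sketch includes a small hypothetical matcher that follows the wildcard semantics Google documents (`*` matches any run of characters, a trailing `$` anchors the end).

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical rule; Google's robots.txt handling honours the * wildcard.
ROBOTS_RULE = "Disallow: /*noidx=1"


def add_noidx(url: str) -> str:
    """Append noidx=1, producing ?noidx=1 or &noidx=1 as appropriate."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    if ("noidx", "1") not in query:
        query.append(("noidx", "1"))
    # Note: urlencode re-encodes existing parameters in its own style.
    return urlunsplit(parts._replace(query=urlencode(query)))


def google_style_match(pattern: str, path_and_query: str) -> bool:
    """Match a robots.txt path pattern: '*' = any characters, trailing '$' = end."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path_and_query) is not None
```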

IA Flat Architecture

This is the idea that the closer to the root a page is, the more important it is. According to Alan, this is not, and never was, valid. In reality, what matters more is how the pages and sections are built into the internal link hierarchy of your site.
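To see why URL depth and link depth are different things, you can compare the number of path segments with the number of clicks from the home page (a breadth-first search over the internal link graph). This is a minimal sketch; the link graph in the example is made up for illustration.

```python
from collections import deque
from urllib.parse import urlsplit


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search: fewest clicks from `home` to each reachable page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def url_depth(path: str) -> int:
    """Number of non-empty path segments, i.e. 'distance from the root' in the URL."""
    return len([seg for seg in urlsplit(path).path.split("/") if seg])


if __name__ == "__main__":
    graph = {
        "/": ["/shoes/", "/about/"],
        "/shoes/": ["/shoes/running/trail/model-x/"],
        "/about/": ["/team/"],
        "/team/": ["/press/"],
    }
    for page, clicks in click_depths(graph).items():
        print(f"{page}: url depth {url_depth(page)}, click depth {clicks}")
```

Here the deeply nested product URL is only two clicks from home, while the "flat" /press/ page is three clicks away.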

Nested Categories

Helps with topic group identification. Example URL would be

Multiple copies of each product page

If a product exists in 5 categories, you cannot have a unique URL for each.
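Here is one common way to handle a product that lives in several categories, sketched under assumptions: pick a primary category, build the product URL from it, and map every other category path to that single canonical URL (to be used for a rel=canonical or a 301). The slugs and the `product_urls` helper are hypothetical, and a canonical remains only a hint.

```python
def product_urls(slug: str, category_paths: list[str], primary: str) -> dict[str, str]:
    """Map every category-nested URL for a product to one canonical URL."""
    if primary not in category_paths:
        raise ValueError("primary category must be one of the product's categories")
    canonical = f"/{primary.strip('/')}/{slug}/"
    return {f"/{path.strip('/')}/{slug}/": canonical for path in category_paths}
```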

Infinite Nesting

At a certain point the nesting gets too deep.


  • Avoid flat architecture
  • Use a consistent nesting model
  • Use duplication only in gentle ways

Takeaway: use data to make decisions.  If you need to remove or change a page, make sure you’re making the right decision on how to deal with that.

Page Speed – he's not going to talk a lot about it, but slow sites lose authority and trust. Also, don't trust PageSpeed Insights alone; trust your visitors and real load times.
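Trusting visitors over a lab score means looking at field data. Here is a minimal nearest-rank percentile over real-user load times. The sample values are invented, and reporting the 75th percentile mirrors how Google assesses Core Web Vitals; it is not something from the talk.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of real-user measurements (e.g. load times in ms)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]
```

Looking at the 75th or 95th percentile shows what your slower visitors actually experience, which a single lab run can hide.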

Really think about what you're doing with speed. When you're told to minify but you have 30 scripts loading and some are over a meg, you should really be thinking about ways to shrink or eliminate those scripts, not just minify them (though minifying should still be done later).
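Before minifying, it helps to see where the bytes actually are. Here is a minimal audit over a map of script URL to transferred size (collected however you like, e.g. from your browser's network panel). The 1,000,000-byte "heavy" threshold and the example sizes are assumptions.

```python
def script_audit(sizes: dict[str, int], heavy_bytes: int = 1_000_000) -> dict[str, object]:
    """Summarise script weight; `sizes` maps script URL -> transferred bytes."""
    total = sum(sizes.values())
    # Heaviest offenders first: candidates to shrink or eliminate before minifying.
    heavy = sorted((s for s, b in sizes.items() if b > heavy_bytes), key=lambda s: -sizes[s])
    heavy_total = sum(sizes[s] for s in heavy)
    return {
        "count": len(sizes),
        "total_bytes": total,
        "heavy": heavy,
        "heavy_share": round(heavy_total / total, 3) if total else 0.0,
    }
```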

When all these things are addressed, when there is consistency in quality, sites perform well.

SlideDeck – his full slide deck is available at

Tip from the Q&A – loading resources from gstatic slows sites. Do not pull fonts from Google.
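To find pages that still load fonts from Google, a quick scan of `href`/`src` attributes for `fonts.googleapis.com` and `fonts.gstatic.com` is a reasonable start. This sketch uses the standard-library `html.parser` and will miss fonts pulled in via CSS `@import` or JavaScript.

```python
from html.parser import HTMLParser

GOOGLE_FONT_HOSTS = ("fonts.googleapis.com", "fonts.gstatic.com")


class FontHostScanner(HTMLParser):
    """Records href/src attribute values that point at Google font hosts."""

    def __init__(self) -> None:
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and any(h in value for h in GOOGLE_FONT_HOSTS):
                self.hits.append(value)


def google_font_refs(html: str) -> list[str]:
    scanner = FontHostScanner()
    scanner.feed(html)
    return scanner.hits
```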


  1. Alan Bleiweiss says:


    Thank you for live blogging this! my hope is this info helps people who want to get to that next level and where they didn’t know how to get there until now.

  2. thanks .. good summary. reads like Alan..

  3. Thanks for the great presentation Alan and glad you enjoyed it Steve. 🙂