
Google SEO News and Discussion Forum

    
Matt Cutts Interviewed by Eric Enge
phranque
8:43 pm on Mar 14, 2010 (gmt 0)

eric enge interviewed matt cutts on several interesting subjects in a must read:
[StoneTemple.com...]
matt talks about pagerank, crawl budget, duplicate content, 301 vs 302 redirects, rel=canonical tag, javascript, pdfs, video, kml, navigation, and more.
no discussion about page speed or validation, however.

 

tedster
9:40 pm on Mar 14, 2010 (gmt 0)

We have dedicated threads for some of the sub-topics that Matt touched on:

301 Redirect Means "Some Loss of PageRank" [webmasterworld.com]

Does Page Rank Still Matter? [webmasterworld.com]

I also appreciated Matt's comments about the challenges of "faceted navigation" - different sort orders of the same basic data. He talks about using rel="canonical" and using the "ignore parameter" choices in Webmaster Tools. And he also mentions that some users can get lost with faceted navigation. I agree - it happens to me on Google SERPs when I forget which options I've clicked on!

g1smd
1:45 pm on Mar 15, 2010 (gmt 0)

I wanted to say something about faceted navigation, as it is a topic that I touched on a few years ago, but which is still widely misunderstood.

Imagine that I sell a left-handed rotating widget, catalogue number 24981.

Take for granted that it will be listed in the Gadgets category, the Widgets category, and the New Products listings.

www.example.com/gadgets
www.example.com/widgets
www.example.com/new-products

On a vast number of sites, that product will have the following URLs:

www.example.com/gadgets/24981-left-handed-widget
www.example.com/widgets/24981-left-handed-widget
www.example.com/new-products/24981-left-handed-widget

There is no need to do this. That's a duplicate content nightmare.

The content page needs a single URL:

www.example.com/24981-left-handed-widget

The URL does not need to record the category hierarchy path the user took to get to that page.

On the content page, breadcrumb navigation can show the 'route' - Home > Gadgets > Left-Handed > Product 24981 - that this particular visitor took to get to that page (tracked using cookies or database entries to reconstruct the breadcrumb links).

The content page can also show links to the category index pages for "find more gadgets", "find more widgets", and "find more new products" as applicable.

This stuff can be pulled from the database using a product table showing which products belong in which categories, it does not need to be reflected into the URL or the URL structure.

For sites that also check that the product-name part of the URL matches the record number, there are other advantages.

That is, for the product at www.example.com/24981-left-handed-widget, if a user requests www.example.com/24981-left or www.example.com/24981-left-handed-widgets-are-great-buy-one-now, the site should issue a 301 redirect to the correct URL for that product.

Having got that functionality in place, you can then deliberately post links like example.com/24981 to Twitter and other places that need 'short' URLs (yes, I know the shortening rules have recently changed on Twitter), knowing that your site will redirect the user to the correct place, without having to rely on a third-party URL-shortening service. With an extra few lines of database wizardry you can also track incoming traffic for those 'short' URLs.
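
To make that concrete, here is a minimal sketch of the redirect logic in Python. Flask is used purely for illustration, and the product table and slug are invented for the example; a real site would look the catalogue number up in its database.

import re
from flask import Flask, redirect, abort

app = Flask(__name__)

# Hypothetical lookup: catalogue number -> canonical product slug.
# On a real site this would be a database query.
PRODUCTS = {
    "24981": "24981-left-handed-widget",
}

@app.route("/<path:requested>")
def product(requested):
    # Pull the leading catalogue number off whatever was requested,
    # e.g. "24981-left" or "24981-left-handed-widgets-are-great-buy-one-now".
    match = re.match(r"\d+", requested)
    if not match:
        abort(404)
    slug = PRODUCTS.get(match.group(0))
    if slug is None:
        abort(404)
    if requested != slug:
        # Anything other than the exact canonical URL gets a 301 to it.
        return redirect("/" + slug, code=301)
    return "Product page for catalogue number %s" % match.group(0)

With something like that in place, example.com/24981 and any mangled variation of the product URL all land on the one canonical page, and the redirect handler is also a convenient place to log traffic arriving via the 'short' URLs.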

JS_Harris
8:24 pm on Mar 15, 2010 (gmt 0)

I love reading juicy tidbits straight from the source, great article.

The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank


the low PageRank pages on your site are competing against a much larger pool of pages with the same or higher PageRank.


Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We'll drop two out of the three pages and keep only one, and that's why it looks like it has less good content. So we might tend to not crawl quite as much from that site.


Eric Enge: Can you talk a little bit about Session IDs? Matt Cutts: Don't use them.


(on paid affiliate links:) "... we usually would not count those as an endorsement"


Good stuff.

alahamdan
10:02 am on Mar 16, 2010 (gmt 0)

(on paid affiliate links:) "... we usually would not count those as an endorsement"


Excuse my English please. What does "endorsement" mean here? And how does Google know that an affiliate link is an affiliate link?

pageoneresults
10:23 am on Mar 16, 2010 (gmt 0)

Excellent interview with Matt Cutts by Eric Enge!

"... if you are trying to block something out from robots.txt, often times we'll still see that URL and keep a reference to it in our index. So it doesn't necessarily save your crawl budget"


Some may not have caught that little tidbit that floats off to the right of the discussion about KML Files. I've been involved in some recent robots.txt discussions and my stance is that you SHOULD NOT use them to block indexing of content. Google broke the protocol when they decided to show URI-only listings. I've been reading that protocol top to bottom, left to right, etc. to see where it states that a UA can index a URI and display it while performing specific queries. IT DOESN'T!

So folks are left with the BEST option, which is to control the indexing and following of content at the page level, either via META Robots or X-Robots-Tag (or whatever other methods you've conjured up). X-Robots-Tag seems to be the preferred method amongst some of me high-tech peers; we're also using it for global NoArchive directives.
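
For anyone who hasn't played with X-Robots-Tag yet, it is just an HTTP response header carrying the same directives as a META Robots tag, which makes it usable for non-HTML files too. Here is a minimal sketch in Python; the Flask wiring and the file name are only for illustration, and in practice the header is more often set in the web server config.

from flask import Flask, send_file

app = Flask(__name__)

@app.after_request
def global_noarchive(response):
    # Site-wide NoArchive directive via the X-Robots-Tag response header,
    # equivalent to <meta name="robots" content="noarchive"> on every page.
    # setdefault() leaves any per-page header set below untouched.
    response.headers.setdefault("X-Robots-Tag", "noarchive")
    return response

@app.route("/internal-report.pdf")
def internal_report():
    # Page-level control for a file type that cannot carry a META tag:
    # keep this PDF out of the index; its links may still be followed.
    resp = send_file("internal-report.pdf")
    resp.headers["X-Robots-Tag"] = "noindex"
    return resp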

Back to this robots.txt and crawl equity. I'm working with a real-world example now. I'll generalize it a bit, but it goes like this: the site should ONLY have about 10k pages indexed; these are the final destination pages that have the meat. There is a Disallow for sub-directories which contain content that SHOULD NOT be crawled.

The internal linking structure of the website points to those Disallowed directories. Googlebot indexes the site and continually gets instructions on a large group of URIs that are Disallowed via robots.txt. How many URIs? Oh, about 40k+ that are now URI only listings.

Question, what do you think that does to Crawl Equity? I'd be interested to know your thoughts.

We're going to find out. The Disallows are coming out and we will be implementing page level directives to block indexing. My experience over the years shows me that the SE bots obey META Robots NoIndex, or NoFollow, or both NoIndex, NoFollow. NoIndex removes the page from the index - period. There appear to be no questions there. I see some folks stating otherwise but I've yet to see a real world example.

Note: I see a lot of people who Disallow: /search/ in their robots.txt files; that's like a MilkBone® for Googlebot and others. Do a site:example.com/search/ and expand the results. How many URI-only listings do you have? Do a site:example.com/****** for any items listed in robots.txt, and expand the results. :(

julinho
12:51 pm on Mar 16, 2010 (gmt 0)

Another (half) answer to a question which has been asked before:

We absolutely do process PDF files. I am not going to talk about whether links in PDF files pass PageRank. But, a good way to think about PDFs is that they are kind of like Flash in that they aren't a file format that's inherent and native to the web, but they can be very useful.

phranque
1:43 pm on Mar 16, 2010 (gmt 0)

note that he makes a distinction between image-based pdfs and text-based pdfs.
then he kind of does half a backpedal re: OCR "in some situations".
then he points out the poor user experience with pdfs.

[sarcasm]"as clear as an azure sky of deepest summer"[/sarcasm]

tedster
2:13 pm on Mar 16, 2010 (gmt 0)

What does "endorsement" mean here?


Google wants to consider a dofollow backlink as a "vote" or "endorsement" of the page it links to. As I see it, Matt is saying that they don't want to transfer PR or other link juice through an affiliate link because the link is not freely given -- it is there for financial purposes.

and how Google know the affiliate link as an affiliate?


They don't tell us, but I also don't think it would be too hard to build a list of affiliate sites and see how the tracking is done in most cases. Then only backlinks that don't involve the tracking technology would be treated as true editorial "votes".

pontifex
7:40 pm on Mar 16, 2010 (gmt 0)

but, by the way: do not care about PageRank

The best way to think about it is that the number of pages that we crawl is roughly proportional to your PageRank


yeah... DO care about it a lot, if you have a large site!

dstiles
8:43 pm on Mar 16, 2010 (gmt 0)

pageoneresults - yes, I noticed it. I put it down to just another Google discourtesy: grabbing more pages to boost their publicised stats (we're the best!) whilst ignoring the robots.txt spec - or at least misinterpreting it.

I get around it by returning a 405 for pages they aren't supposed to visit, for detected non-browsers, i.e. bots. That also copes with quite a few scrapers and form-spammers.
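
A rough sketch of that idea in Python follows; the user-agent test, the blocked paths, and the Flask wiring are all placeholders, and real non-browser detection looks at much more than the User-Agent string.

from flask import Flask, request, abort

app = Flask(__name__)

# Paths that detected non-browsers have no business visiting (placeholders).
BLOCKED_FOR_BOTS = ("/search/", "/contact-form/")

def looks_like_bot(user_agent):
    # Deliberately crude placeholder check.
    ua = (user_agent or "").lower()
    return any(token in ua for token in ("bot", "spider", "crawl"))

@app.before_request
def reject_bots():
    if request.path.startswith(BLOCKED_FOR_BOTS) and looks_like_bot(
            request.headers.get("User-Agent")):
        abort(405)  # the status dstiles mentions; a 403 would also do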

aleksl
3:11 am on Mar 17, 2010 (gmt 0)

g1smd, but then you are missing a whole lot of situations with your single URL. What if a product naturally belongs to Widgets, Gadgets, and SuperGadgets on Sale categories?

With your single URL, you have no clue which particular section the user is in.

And how would you display breadcrumbs for this product if the categories are not inclusive? Would you display 3 separate breadcrumbs? Sounds silly.

The world should NOT revolve around search engines in a situation like this one. It is way more natural TO PEOPLE to have the same product in different categories, and there should be a way to reflect that in your hierarchy. If SEs don't like that - tough luck; this is such a simple and COMMON example for them to follow and resolve that I think they ought to have figured it out by now.

tedster
3:36 am on Mar 17, 2010 (gmt 0)

If SEs don't like that - tough luck


But it's likely to be tough luck for the website if its business model depends on free search traffic from Google.

A URL does not need to reflect the site's logical structure - and coming as close as you can to "one URL for one unique bit of content" can help you in search.

Any site can do as it likes for its visitors; clearly that has always been so. So go ahead with as much complexity in your faceted navigation as you feel makes sense for your users. Still, understanding how the search engines work today can be very useful for getting more traffic - today - and that was Matt's topic here.

g1smd
7:24 pm on Mar 17, 2010 (gmt 0)

What if a product naturally belongs to Widgets, Gadgets, and SuperGadgets on Sale categories?


Great. Link to those categories from the product page. Find more... 'Widgets', 'Gadgets', 'New Products'.

If the site uses a database and cookies, it is easy to build a separate breadcrumb trail for that visitor.

The multi-faceted hierarchy does not need to be included in the URL of the individual product pages.

pageoneresults
8:21 pm on Mar 17, 2010 (gmt 0)

My strategy is to provide only one URI per product. We do this by using NoIndex at the intermediary page levels. I've found that most carts are breadcrumb-based and you end up with product URIs that represent each category, which is not optimal. In fact, it will drag you down in the SERPs if you're not careful.

example.com/mfg/sku

^ That is the final destination URI. Don't give me any crap about not having keywords in the URI either. The breadcrumb leading up to the final destination had all the keywords I was targeting. ;)

We'll NoIndex all those category levels in between and send the bot directly to the final destination URIs. Oh, they do an excellent job of obeying protocol and not indexing the document; they will still Follow links, as that is the default behavior when using just NoIndex.

One of me peers on Twitter referred to it as a Reverse Pyramid. Note, there are other elements at play here, like the link rel using next, prev, start, etc. There are ways to create a taxonomy for the bots and not have them bounce around all the intermediary pages.

You also need to make sure that any sitemaps are feeding only those final destination URIs. There is no need to waste crawl time on documents that serve no other purpose than to take the visitor one step further into the drilldown.

If you have 100k documents and you only have 50k crawl time credit, you surely don't want that bot wasting resources on intermediary (taxonomy drilldown) documents, do you?
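
As an illustration of feeding only the final destination URIs to the engines, here is a small Python sketch that writes a sitemap from a product list and deliberately leaves every category/drilldown page out. The paths and domain are invented for the example.

from xml.sax.saxutils import escape

# Hypothetical final-destination product URIs (mfg/sku style); the
# intermediary category and drilldown pages are deliberately omitted.
PRODUCT_PATHS = [
    "/acme/24981",
    "/acme/24982",
    "/widgetco/10034",
]

def build_sitemap(base_url, paths):
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
    for path in paths:
        lines.append("  <url><loc>%s</loc></url>" % escape(base_url + path))
    lines.append("</urlset>")
    return "\n".join(lines)

if __name__ == "__main__":
    with open("sitemap.xml", "w") as f:
        f.write(build_sitemap("http://www.example.com", PRODUCT_PATHS))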

Oh, we also Ajax the heck out of everything we don't want the bot getting into. We know from experience that if bots can get into dynamic content, they will do harm. They will find some flaw in your rewrites, whatever. I never thought I'd hear myself say that we block the bots from getting content that most others would think they want indexed. Think really hard, do those documents really serve a purpose? :)

Ya, I know, ya'll are going to raise a big stink over the above. There is a method to my madness. And no, we are not adding NoIndex to the upper-level categories; those are your power docs. :)

Note: I think using robots.txt to block indexing of content is the Kiss of Crawl Death if the right environment is present.

TheMadScientist
8:35 pm on Mar 17, 2010 (gmt 0)

Ya, I know, ya'll are going to raise a big stink over the above.

Of course...
You just gave away the farm.

I hope all readers disregard your suggestions and those of g1smd...
Please don't post again, Either of You... Thanks! ;) (lol)

g1smd
8:42 pm on Mar 17, 2010 (gmt 0)

If you're using pretty much any of the popular forum, blog, cart, or CMS solutions out there, they are full of the problems we're talking about here, as well as many other problems that I have mentioned over the last few years. I first raised some of these points in about 2004, and six years later they still exist. Not much danger of any of this stuff being fixed any time soon, so you're safe for quite a while yet.

TheMadScientist
8:51 pm on Mar 17, 2010 (gmt 0)

Not much danger of any of this stuff being fixed any time soon, so you're safe for quite a while yet.

Thanks for the reassurance...

I first raised some of these points in about 2004, and six years later they still exist.

That's exactly why I code all my own software.

tedster
3:41 pm on Mar 18, 2010 (gmt 0)

Don't give me any crap about not having keywords in the URI either.

I almost spit coffee on my keyboard, pageone! I just had a big go-round with a development team yesterday about this very topic. It's one of those "everyone knows" ideas that I say just isn't true. From what I can see, keyword-in-path (not keyword-in-domain, that's another story) is not a primary relevance signal, but rather it is used as a second level or merely reinforcing signal.

When URL rewriting gives you the ability to create any URL you want for any resource, how could it be otherwise? It's almost as vaporous as the keyword meta-tag. What I see websites doing is creating unnecessarily long URLs just to jam their keywords in -- and then what happens? Google truncates it in the visible SERP anyway, or gives you a breadcrumb trail instead. Have you noticed how short those display URLs are in the Google SERPs these days?

For my money, the URL is mostly one of your click magnets, and that is its most important function - for people, not for the algo. The important keyword signals for the algo are elsewhere -- in the title, on the page, in anchor text, in semantically co-occurring phrases, etc. And the URL does NOT need to reflect the breadcrumb navigation.

potentialgeek
7:24 am on Mar 19, 2010 (gmt 0)

Matt Cutts: "Our philosophy has not changed, and I don't expect it to change. If you are buying an ad, that's great for users, but we don't want advertisements to affect search engine rankings. . . . Our stance has not changed on that, and in fact we might put out a call for people to report more about link spam in the coming months. We have some new tools and technology coming online with ways to tackle that. We might put out a call for some feedback on different types of link spam sometime down the road."

Hey, Tedster, I see Mr. Enge has an SEO book out (2009). When are you going to write one?

p/g

phranque
10:46 am on Mar 19, 2010 (gmt 0)

...we might put out a call for people to report more about link spam in the coming months.

i found this part to be particularly alarming.
you could be generous and call this "crowd-sourcing of search quality" which actually sounds like an admission of failure.
closer to the truth would be to call it "ratting out your competition", which is simply bad karma in my opinion, and incentivising snitches leads to unintended consequences.
google should instead strive to identify and reward quality.

g1smd
9:28 pm on Mar 24, 2010 (gmt 0)

@aleksl
g1smd, but then you are missing a whole lot of situations with your single URL. What if a product naturally belongs to Widgets, Gadgets, and SuperGadgets on Sale categories?

With your single URL, you have no clue which particular section the user is in.

And how would you display breadcrumbs for this product if the categories are not inclusive? Would you display 3 separate breadcrumbs? Sounds silly.

No. It's really very easy to build a personalised category tree for the visitor.

You need to set up cookies named Level_1, Level_2, Level_3, Level_4, and so on for however many navigation levels there could be on your site.

In your database and/or in your CMS/cart logic you assign a 'navigation level' to each page. When the visitor moves to a 'lower level' within the site, cookie data recording the last visited page at the 'next level up' is sent to the browser.

For levels that involve a user search input, the cookie data is sent to the browser as the search results page is shown, that search result being 'one level down' from the search page. The cookie should contain the keywords that were typed into that previous search.

On any page, anywhere on the site, the breadcrumb trail is built simply using each of the 'last visited category' URLs (gleaned from that visitor's cookie data) for each of the levels above the currently visited level.

For users accessing the pages, but where cookies are not stored and used (especially where users are search engine bots), the database data can be used to build a 'default' breadcrumb trail (i.e. not based on any previous visitation path) on the current page, and for any page of the site - and in so doing can herd the bot to follow the best navigation paths.

Sideways pointing 'customers also looked at/bought...' up-sell links, and 'find more category1', 'find more category2', 'find more category3' product links, complete the picture.
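
A condensed sketch of that scheme in Python (Flask again, purely for illustration): the Level_N cookie names follow the description above, while the page table, URLs, and default parents are invented for the example.

from flask import Flask, request, make_response

app = Flask(__name__)

# Hypothetical page table: URL -> (navigation level, link text, default parent).
PAGES = {
    "/": (1, "Home", None),
    "/gadgets": (2, "Gadgets", "/"),
    "/gadgets/left-handed": (3, "Left-Handed", "/gadgets"),
    "/24981-left-handed-widget": (4, "Product 24981", "/gadgets/left-handed"),
}

def default_at_level(path, level):
    # Walk the default parents upward until we reach the wanted level; this
    # is the fallback trail for cookieless visitors and search engine bots.
    while PAGES[path][0] > level:
        path = PAGES[path][2]
    return path

def breadcrumb_for(path):
    level, text, _parent = PAGES[path]
    trail = []
    for lvl in range(1, level):
        # Prefer the page this visitor actually came through at this level,
        # as recorded in the Level_N cookie; otherwise use the default tree.
        crumb = request.cookies.get("Level_%d" % lvl)
        if crumb not in PAGES:
            crumb = default_at_level(path, lvl)
        trail.append(PAGES[crumb][1])
    trail.append(text)
    return trail

@app.route("/")
@app.route("/<path:slug>")
def page(slug=""):
    path = "/" + slug if slug else "/"
    if path not in PAGES:
        return "Not found", 404
    level, text, _parent = PAGES[path]
    body = "<p>%s</p><h1>%s</h1>" % (" &gt; ".join(breadcrumb_for(path)), text)
    resp = make_response(body)
    # Remember this page as the last one visited at its own level, so pages
    # further down can rebuild the route this visitor actually took.
    resp.set_cookie("Level_%d" % level, path)
    return resp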

Robert Charlton
11:55 pm on Mar 27, 2010 (gmt 0)

I can see a careless reading of the interview leading to too many direct links from home to product pages and perhaps a loss of hierarchical structure on sites that really need it.
