Forum Moderators: Robert Charlton & goodroi

Is it possible to over-block Google?


realmaverick

10:49 am on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been working hard on fixing up the errors in WMT.

I've been using a combination of robots.txt, 301 redirects and noindex, follow.

The errors are shrinking daily, but of course, the access denied messages are increasing. There are millions of pages either blocked by robots.txt or via noindex.

I have removed links to most of the sources of the errors, as well as the links to profiles, which have been noindexed.

However, certain links still exist. For example, we have a report button on each download page, for users to report content. A unique ID is assigned when the link is clicked, and the next time Google visits it, access is denied. I'm not sure how to get around this at the moment. (hide it from Google? :op)
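One way to keep Googlebot off those one-time report URLs entirely is to disallow the pattern in robots.txt, so the crawler never requests them and never sees the access-denied response. A sketch only: the parameter name `report` below is hypothetical, since the thread doesn't show the site's actual URL structure.

```
# Hypothetical: assumes report links look like
# /download.php?report=UNIQUEID -- adjust to the real parameter name.
# Disallowed URLs are never fetched, so they can't produce
# "access denied" crawl errors.
User-agent: *
Disallow: /*?report=
Disallow: /*&report=
```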

All of the pages I've blocked are similar to this: junk index.php? type links.

Anyway, the big question is, could blocking such a huge volume of pages be an issue? Making such changes always makes me anxious.

Logically, these changes seem in keeping with Panda, but I'd still like a second opinion.

Thanks :)

lucy24

6:03 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does the unique ID go with a specific named parameter? If so, you can go to GWT and tell it to ignore that parameter.

deadsea

6:31 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've worked on a site that had a million crawlable pages and at least ten unique links off each page that were restricted by robots.txt. Those links would also change frequently. Googlebot would see hundreds of millions of uncrawlable urls in a month. Never had a problem with it. Just make sure not to restrict something that you actually want crawled.

realmaverick

7:13 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks Deadsea, useful to know.

Lucy, would this be preferable to blocking via robots.txt?

lucy24

8:02 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Blocking and ignoring both work. But if you block something in robots.txt, Google doesn't know that it isn't important.

There was a post within the last few days that showed the exact syntax for blocking query strings in robots.txt.
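That post isn't reproduced in this thread, but the commonly used robots.txt patterns for blocking query-string URLs look like the sketch below. Note the `*` wildcard is a Googlebot extension, not part of the original robots.txt standard, so other crawlers may not honour it.

```
User-agent: *
# Block any index.php URL that carries a query string
# (prefix match: "/index.php?" matches /index.php?type=x
#  but not /index.php itself)
Disallow: /index.php?
# Or block every URL with a query string, site-wide --
# only safe if no canonical pages use parameters
Disallow: /*?
```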

realmaverick

8:14 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In robots.txt at present, I'm using wildcards, which capture the query at any point in the URL.

In GWT, if I block the parameter "app" for example, will that block all other parameters that contain "app"?
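On the robots.txt side of that same worry: an unanchored wildcard pattern matches substrings, so a rule like `Disallow: /*app` would also catch parameters such as `apple` or `myapp`. Anchoring the rule on the separator characters avoids that. A sketch, assuming the parameter appears as `app=value`:

```
User-agent: *
# Too broad: would also match ?apple=1, ?myapp=2, /app-page
# Disallow: /*app
# Anchored on "?"/"&" and "=": matches only the "app"
# parameter itself, wherever it sits in the query string
Disallow: /*?app=
Disallow: /*&app=
```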

netmeg

8:17 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I block tons of stuff. Tons and tons. If I added it up, I probably block more stuff than I allow (due to the gunked-up way many ecommerce and CMS packages are structured). Hasn't hurt me, and I'd probably make the argument that it's helped.

realmaverick

8:26 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's my feeling: this should help. I have link juice leaking all over the place, and literally junk pages being crawled over and over, and even being indexed.

This should get my important content crawled much better, help keep link juice flowing to all the right places and get junk out of the index.

netmeg

8:58 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have always had the hunch, based on nothing whatsoever except that it's a hunch, that it gives off some kind of a quality signal to Google (and for that matter, maybe Bing too) if you are obviously keeping your less worthy stuff out of the index. Like maybe that someone is at least paying attention to the site, and working at it, as opposed to just tossing it out there all naked. I could be wrong. Probably am. It's just a hunch, but it's always worked for me.

deadsea

9:46 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Using robots.txt won't help conserve pagerank. Linking to something that is in robots.txt drops pagerank on the floor the same way that using nofollow does. I've tested it.

Using robots.txt will allow Googlebot to crawl your site more efficiently. That can lead to deeper indexing and fewer server resources to support Googlebot.

g1smd

9:47 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Because MediaWiki is so completely SEO-unfriendly, I'm blocking >10,000 URLs and allowing <100 URLs.

Doesn't seem to be causing any issues.

realmaverick

10:49 pm on Mar 13, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Using robots.txt won't help conserve pagerank


No, I realise that. It was a general statement about my overall optimisations.