Googlebot crawls bad links pulled from content

3:11 pm on Oct 31, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 13, 2004
votes: 10

First, a reminder if you use Webmaster Tools:
When you're researching 404 Crawl errors in Webmaster Tools, you frequently must "view source" to see which link Googlebot actually used. In many cases the link contains hidden characters that are not displayed by the Webmaster Tools HTML. A link in Webmaster Tools may look fine, but the href in the Crawl Errors page source code is BAD!
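As a rough way to spot those hidden characters, a short script can percent-decode a URL and flag anything that renders as invisible. This is just a sketch; the example URL and the zero-width space in it are made up for illustration:

```python
import unicodedata
from urllib.parse import unquote

def find_hidden_chars(url):
    """Return (index, Unicode name) pairs for characters in a
    percent-decoded URL that render as nothing in a browser
    but break the link."""
    decoded = unquote(url)
    hidden = []
    for i, ch in enumerate(decoded):
        # Cc = control, Cf = format (zero-width space etc.),
        # Zl/Zp = line/paragraph separators; the non-breaking
        # space is category Zs, so it is checked explicitly.
        if unicodedata.category(ch) in ("Cc", "Cf", "Zl", "Zp") or ch == "\u00a0":
            hidden.append((i, unicodedata.name(ch, "UNNAMED")))
    return hidden

# %E2%80%8B is a zero-width space hiding in the path:
print(find_hidden_chars("http://www.example.com/page%E2%80%8B.html"))
# [(27, 'ZERO WIDTH SPACE')]
```

Run it on the href copied from the Crawl Errors page source; an empty list means the characters you see are all there is.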

Googlebot is crawling content and creating BAD links

For me, in many cases, the text content of a page (typically on an MFA site) shows a URL as plain text only, not as an href in the page's source code. Googlebot is turning this text into links and crawling them. The problem is that Googlebot is not validating the link in any way at all, so lots of 404 "file not found" errors are produced in my sites' logs. (Again, please see the reminder above.)
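To gauge how many of these phantom URLs Googlebot is actually hitting, a quick pass over the access log works. This sketch assumes an Apache-style combined log format; adjust the pattern if your server logs differently:

```python
import re

# Combined (Apache-style) log format is an assumption here.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .* "(?P<agent>[^"]*)"$'
)

def googlebot_404s(lines):
    """Yield request paths for which a Googlebot user-agent got a 404."""
    for line in lines:
        m = LOG_LINE.search(line)
        if m and m.group("status") == "404" and "Googlebot" in m.group("agent"):
            yield m.group("path")

sample = ('66.249.66.1 - - [31/Oct/2011:15:11:00 +0000] '
          '"GET /phantom-page HTTP/1.1" 404 512 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
print(list(googlebot_404s([sample])))  # ['/phantom-page']
```

Feed it the log file and you get a de-facto list of the bad "text only" links Googlebot has manufactured.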

This is getting tiresome. In one case an MFA site had 74 bad "text only" links, spread over that many pages, all pointing at a page on one of my sites. I don't know what Google's algo makes of all this. Does Google's algo even realize it is being fed this crap by Googlebot? I don't expect webmasters to keep plain text, which was never encoded as a link in the first place, formatted as a legitimate hyperlink. Sure, it would be nice.

I'm sure Google's looking for links in javascript etc, but just grabbing any old text and using it to crawl with is JUST GOING TOO FAR!

Google invented "nofollow", but this type of "text" link crawling certainly circumvents it to some extent. Nofollow was a big mistake.

I do get tired of webmasters who actually declare a site to be reference material, but then in the source code tell Google it is untrusted content with a "nofollow"!

It seems like Google is trying to undermine their own invention.

I'm just wondering: how pervasive is this Googlebot bad-link crawl phenomenon?
4:01 pm on Oct 31, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
votes: 0

This has been going on for quite a while, but it has gotten really bad in the last few months.

If the URLs really don't exist and your site correctly returns "404 Not Found", then there's little to worry about, though it is a pain in the rear to see lots of these links in the WMT reports.

There are already several other threads here with much longer discussions of this topic.

Google does trawl through JavaScript, and they now request example.com/$1 and example.com/folder/$2 on a regular basis on a site I recently worked on.
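One plausible explanation, and this is my assumption rather than anything Google has confirmed, is that those $1/$2 strings are regex replacement backreferences sitting in inline JavaScript, and a crawler that scrapes anything path-like out of the raw source picks them up literally:

```python
import re

# Inline JavaScript from a page; "/folder/$2" is a regex
# replacement string, not a real path on the site.
js_source = 'var url = href.replace(/\\/old\\/(\\w+)\\/(\\w+)/, "/folder/$2");'

# A naive "grab anything path-like" extractor, of the kind a
# crawler might run over raw page source:
candidates = re.findall(r'"(/[^"\s]*)"', js_source)
print(candidates)  # ['/folder/$2'] - requested literally, it can only 404
```

Which would line up neatly with the literal /$1 and /folder/$2 requests showing up in the logs.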
5:02 pm on Oct 31, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 13, 2004
votes: 10

Yes, I did see the threads on Googlebot crawling JavaScript, but I must have missed the comments about it crawling pure content and creating bad links. In JavaScript I would expect authors to be creating legitimate links, but in pure text content I have no such expectation. (Of course, it would be nice.)

I really do wonder whether this could be related to some of the problems around Oct 13th, but it's probably just coincidence.