
Google SEO News and Discussion Forum

Googlebot crawls bad links pulled from content

WebmasterWorld Senior Member 10+ Year Member

Msg#: 4381626 posted 3:11 pm on Oct 31, 2011 (gmt 0)

First, a reminder if you use Webmaster Tools:
When you're researching 404 crawl errors in Webmaster Tools, you frequently must "view source" to see what link Googlebot actually used. In many cases there are hidden characters in the link that are not displayed by the Webmaster Tools HTML. A link may look fine in Webmaster Tools, but the href in the Crawl Errors page source code is BAD!
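To illustrate the point (a hypothetical sketch, not taken from any real report): a URL string can contain an invisible character that renders as if the link were clean, and only inspecting the raw value or its percent-encoding exposes it. The href below is made up for the example.

```python
# Sketch: why a link can "look fine" on screen but be bad in the source.
# The href is hypothetical; it hides a zero-width space (U+200B) in the path.
import urllib.parse

href = "http://example.com/page\u200b.html"

print(href)        # typically renders as if the URL were clean
print(repr(href))  # exposes the hidden \u200b character
print(urllib.parse.quote(href, safe=":/"))  # percent-encoding also reveals it
```

Fetching that href would request a path that does not exist, producing exactly the kind of 404 that looks inexplicable in the report view.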

Googlebot is crawling content and creating BAD links

In many of my cases, the text content of a page (typically an MFA site) shows a URL as plain text only; there is no href in the source code of the page. Googlebot is turning this text into links and crawling with them. The problem is that Googlebot is not validating the link in any way at all, so lots of 404 "file not found" errors are produced in my sites' logs. (Again, please see the reminder above.)
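A naive text-URL extractor shows how this goes wrong (a minimal sketch with a made-up page body, not Google's actual extraction logic): plain text around the URL, like a sentence's trailing period, can get swept into the "link".

```python
import re

# Hypothetical page body: the URL appears only as plain text, with no <a href>.
html = '<p>Visit www.example.com/widgets. It has every widget.</p>'

# A naive text-URL matcher of the sort a crawler might run over raw content.
pattern = re.compile(r'(?:https?://|www\.)[^\s<>"\']+')
matches = pattern.findall(html)

# The sentence's trailing period gets swept into the "URL", so a crawler
# fetching it would request .../widgets. and get a 404.
print(matches)  # ['www.example.com/widgets.']
```

With no href to define where the link ends, the extractor has to guess, and every wrong guess becomes a bad request in someone's logs.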

This is getting tiresome. In one case an MFA site had 74 bad "text only" links, spread over that many pages, pointing to a page on one of my sites. I don't know what Google's algo thinks of all this. Does Google's algo even realize it is being fed this crap by Googlebot? I don't expect webmasters to keep text that was never encoded as a link in the first place formatted like a legitimate hyperlink. Sure, it would be nice.

I'm sure Google's looking for links in JavaScript etc., but just grabbing any old text and using it to crawl with is JUST GOING TOO FAR!

Google invented "nofollow", but this type of "text" link crawling certainly circumvents it to some extent. Nofollow was a big mistake.

I do get tired of webmasters who present a site as reference material, but then in the source code tell Google the content is untrusted with a "nofollow"!

It seems like Google is trying to undermine their own invention.

I'm just wondering how pervasive this Googlebot bad-link crawl phenomenon is?



WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month

Msg#: 4381626 posted 4:01 pm on Oct 31, 2011 (gmt 0)

This has been going on for quite a while, but has got really bad in the last few months.

If the URLs really don't exist and your site correctly returns "404 Not Found" then there's little to worry about - but it is a pain in the rear to see lots of these links in WMT reports.
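Worth verifying the "correctly returns 404" part: some sites answer unknown paths with a 200 "soft 404" page, which is worse. A quick sketch of such a check (using a throwaway local server as a stand-in for a real site; the handler and paths are hypothetical):

```python
# Sketch: confirm that an unknown path returns a genuine "404 Not Found"
# status rather than a 200 "soft 404" page.
import http.server
import threading
import urllib.request
import urllib.error

def status_of(url):
    """Return the HTTP status code for url, including error statuses."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code

# Minimal stand-in server: only "/" exists; everything else is a hard 404.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"home")
        else:
            self.send_error(404, "Not Found")
    def log_message(self, *args):  # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

print(status_of(base + "/"))              # 200
print(status_of(base + "/no-such-page"))  # 404
```

Run the same status check against the bad URLs from the WMT report on your own site; anything not returning 404 (or 410) is worth fixing first.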

There are already several other threads here with much longer discussions of this topic.

Google does trawl through JavaScript, and they now request example.com/$1 and example.com/folder/$2 on a regular basis on a site I recently worked on.
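Those $1-style requests are consistent with a crawler lifting string literals straight out of JavaScript: a regex replacement target like "/$1" sits in the source as a quoted, slash-leading string, indistinguishable from a path to a naive extractor. A sketch (the JavaScript snippet is hypothetical):

```python
import re

# Hypothetical JavaScript from a page: the "/$1" is a regex backreference
# in a replacement string, not a real path, but it sits in the source as a
# plain quoted string literal.
js = 'var clean = location.pathname.replace(/^\\/old\\/(.*)$/, "/$1");'

# A naive extractor that pulls quoted, slash-leading strings as candidate
# URLs would happily surface "/$1" for crawling.
candidates = re.findall(r'"(/[^"]*)"', js)
print(candidates)  # ['/$1']
```

That would explain why the requested paths contain literal dollar signs: the extractor never evaluates the script, it just scrapes its strings.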


WebmasterWorld Senior Member 10+ Year Member

Msg#: 4381626 posted 5:02 pm on Oct 31, 2011 (gmt 0)

Yes, I did see the threads on Googlebot crawling JavaScript, but I must have missed the comments about it crawling pure content and creating bad links. In JavaScript I would expect authors to be creating legitimate links, but in pure content I have no such expectation. (Of course it would be nice.)

I really do wonder if this could be related to some of the problems around Oct 13th, but it's probably just coincidence.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved