

Weird URL in Webmaster Tools Crawl Report

Could it get me de-ranked?


dataguy

1:12 pm on Sep 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got a pretty major web site that took a hit from Google last night (Friday night): traffic dropped from about a thousand visitors per hour to about 10 per hour. I have no idea what brought this on, but when I logged into Google's Webmaster Tools I noticed that this site suddenly has thousands of crawl errors, recorded a few days earlier. The errors are "Not found" errors, and the URLs that were not found are ones that are not, and never have been, in my sitemaps. The URLs look like this:

"http://MyDomain.com/%3CA%20href='http://www.someoneelsesdomain.com/"

Any idea what would cause these URLs to show up in my crawl error reports? Could this be related to why my site has suddenly been de-ranked? Any advice on what I should do?

Thank you for your insight.

tedster

5:11 am on Sep 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google reports two types of crawl errors: URLs listed in your sitemap, and URLs on your domain that returned errors when Google found them during its regular web crawl.

The question is where they are finding these URLs, which may have been designed to get someone else's domain indexed. When you unescape the "extra" part of the bogus URL, it reads "<a href='anotherdomain'>". It looks like a call to some kind of script, the kind that is used to count clicks on links or to stop passing PR through links -- something like that.
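To see exactly what the escaped part says, you can percent-decode the reported URL. A minimal sketch using Python's standard `urllib.parse.unquote`; the domains are just the placeholders from the original post:

```python
from urllib.parse import unquote

# The bogus URL as it appeared in the crawl-error report (placeholder domains).
reported = "http://MyDomain.com/%3CA%20href='http://www.someoneelsesdomain.com/"

# %3C decodes to "<" and %20 decodes to a space.
decoded = unquote(reported)
print(decoded)
# http://MyDomain.com/<A href='http://www.someoneelsesdomain.com/
```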

First thing I would do is verify that your server actually returns a 404 status for these URLs, and nothing else. Check the actual HTTP response header; don't just look at the page the browser shows. If by any chance your server is doing something else, get that fixed as fast as you can, or the problem will never go away.
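One way to see the raw status code, rather than whatever page the browser ends up on, is a single request that does not follow redirects. A minimal sketch with Python's standard `http.client`; the function name `fetch_status` is just illustrative. A 200, or a 301/302 to a custom error page, instead of a 404 would mean the server is not plainly telling the crawler that the URL doesn't exist:

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_status(url: str, method: str = "HEAD",
                 timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header) for one request to `url`.

    http.client never follows redirects, so a 301/302 to a custom error
    page is reported as 301/302 here, not as the error page's own status.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    # Send the path exactly as it appears in the report (still percent-encoded).
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `fetch_status("http://MyDomain.com/%3CA%20href='http://www.someoneelsesdomain.com/")` should come back as `(404, None)` if the server is behaving. Some servers treat HEAD differently from GET, so passing `method="GET"` is a reasonable double check.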

Next I would verify that these URLs are not even mentioned anywhere on your domain -- this would include looking for any server logs and analysis files that might accidentally be open to public crawling.
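For the "is it mentioned anywhere on my domain" check, a quick search over a local copy of the document root, plus any log or stats directories, can help. A minimal sketch in Python; `find_mentions` and the paths you pass it are assumptions for illustration. It searches each fragment both as given and percent-decoded, since the same URL may appear either way. The search is case-sensitive and reads whole files, and an ordinary, correct link to the other domain can also match, so treat the hits as a list to review by hand:

```python
import os
from typing import Iterable, List
from urllib.parse import unquote


def find_mentions(root: str, fragments: Iterable[str]) -> List[str]:
    """Return sorted paths of files under `root` that contain any fragment.

    Each fragment is searched both as given and percent-decoded.
    """
    needles = set()
    for frag in fragments:
        needles.add(frag)
        needles.add(unquote(frag))
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            if any(needle in text for needle in needles):
                hits.append(path)
    return sorted(hits)
```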

If you have verified that no hint of these URLs exists anywhere on YOUR domain, then you've done your due diligence. Time to write to Google, I'd say -- thousands of these URLs showing up all at once could well be part of a bigger picture that is hurting you.

gehrlekrona

5:23 am on Sep 17, 2006 (gmt 0)

10+ Year Member



That might be a problem. Last time, in June when they had another major screwup, I noticed something similar. The paths were all wrong, and there were nested paths and God knows what.
However, I don't think it really has to do with your site. Well, in a way it does, since you and I and tons of others have been hit with the hammer again, but I think it is a major flaw in Google's algo. My thinking is that they are trying to get rid of spam sites and don't care if other sites also go to hell while they're doing it. If people don't complain to Google about it, they don't care. They think they have done a great job and everybody is satisfied with the "great search results" they can show, even if it is other spam sites with stolen content at the top......
Google is just such a mess, and if *I* had a search engine, they would have been deleted ASAP for spreading all these spam sites, promoting stolen content, and making money from it. It's almost got to be illegal, the way they do business?!?

g1smd

9:20 am on Sep 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another possibility is that Google is mis-spidering your site because of an HTML validation error on some other site that links to you.

It looks like the other site failed to provide anchor text or to close the </a> tag on the first link (to your site), and Google carried on parsing the next URL on their page as if it were still part of the first URL it was already considering. However, as already stated, as long as your site returns a 404 status in the HTTP header, this will never be a problem.

I would say that they found something like this on some other site:

<a href="http://www.yoursite.com/a href="http://www.someothersite.com/">the anchor text</a>...

That looks like a simple cut and paste error, and I would expect there to be millions of similar such errors across the web. Notice the lack of closing quotes on the first URL. I also assumed that %3C is a / but I didn't actually look. It might be a > instead.
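The exact broken markup behind the URL in the original post is unknown. As one hypothetical example, a whole anchor tag pasted where only a URL belonged is enough to reproduce it. The sketch below uses Python's standard `html.parser` to pull out the href value as a parser sees it, then percent-encodes it roughly the way browsers encode URL paths (the `safe` character set is an approximation chosen for this illustration, not Google's actual rule):

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import quote


class HrefCollector(HTMLParser):
    """Collect the raw href value of every <a> tag, exactly as parsed."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical broken markup: an anchor tag pasted inside another href.
broken = ("<a href=\"http://MyDomain.com/<A href='http://www.someoneelsesdomain.com/\">"
          "the anchor text</a>")

collector = HrefCollector()
collector.feed(broken)
collector.close()
href = collector.hrefs[0]

# "<" and space get escaped; characters such as ' = : / are left alone.
encoded = quote(href, safe="/:@!$&'()*+,;=")
print(encoded)
# http://MyDomain.com/%3CA%20href='http://www.someoneelsesdomain.com/
```

The result matches the URL from the crawl report character for character, and it shows that %3C stands for "<".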

If the owner of the other site had used the HTML validator or a link checker such as Xenu's Link Sleuth, they would have found this error very quickly.

dataguy

5:56 pm on Sep 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the input, I found the problem.

I didn't realize that the Not found report included URLs on a web site that aren't listed in the site's sitemaps.

My site has a lot of user-created content, and occasionally someone will post an outbound link using incorrect HTML. It appears that Google is interpreting the link as a relative, local link and prepending the base URL of my site to it, so that the link looks like 'http://mysite.com%20http://www.theirsite.com', which of course is totally incorrect.
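On a site with user-created content, one defensive option is to check outbound links when they are posted and reject (or trim) anything that is not a complete absolute URL, since a crawler will resolve anything else relative to the page it appears on. A minimal sketch in Python; the function name `clean_outbound_href` and the exact rejection rules are assumptions, not anything Google documents:

```python
from typing import Optional
from urllib.parse import urlsplit

# Characters that should never appear inside a well-formed link target.
_FORBIDDEN_CHARS = set(' \t\r\n<>"')


def clean_outbound_href(raw: str) -> Optional[str]:
    """Return a usable absolute http(s) URL, or None if the link should be rejected."""
    href = raw.strip()  # a stray leading space is what produced ".com%20http://"
    if not href or any(ch in _FORBIDDEN_CHARS for ch in href):
        return None
    try:
        parts = urlsplit(href)
    except ValueError:  # e.g. a malformed IPv6 host such as "http://[::1"
        return None
    # Relative references like "www.theirsite.com" have no scheme or host.
    if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
        return None
    return href
```

With these rules, `" http://www.theirsite.com"` is trimmed to a usable link, while `"www.theirsite.com"` and a pasted `"<A href='..."` are rejected so the user can fix them.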

I don't know why these are suddenly showing up in the Not found report in Webmaster Tools; they weren't there before. Regardless, I don't think this is what caused my site to tank on Sept. 15th. For the record, my server does return a standard 404 when any of these URLs are requested.

Now if someone can tell me why my site tanked on Sept. 15th, that would be the million dollar question.

Thanks for the help.

g1smd

6:29 pm on Sep 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think you found a problem, one that is common on many sites, but I don't think you found the exact problem for the example you gave in your original post.

Look again, your original post contained ..a%20href="http://.. in the middle of the example URL. I still think that one is caused by duff HTML on some other site.

MrSpeed

9:05 pm on Sep 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have made the same mistake when linking out and got the same problem.

Have you run the site through Xenu?
Even when I think things are perfect, it seems to find things.

Sorry to hear about your troubles. I think I know what some of your sites are and they don't deserve to be tanked.