Forum Moderators: Robert Charlton & goodroi
"http://MyDomain.com/%3CA%20href='http://www.someoneelsesdomain.com/"
Any idea what would cause these URLs to show up in my crawl error reports? Could this be related to why my site has suddenly been de-ranked? Any advice on what I should do?
Thank you for your insight.
The question is where they are finding these URLs, which may have been designed to get someone else's domain indexed. When you look at the unescaped "extra" part of the bogus URL, it is "<a href='anotherdomain'>". It looks something like a call for a script, the kind that is used to count clicks on links, or to stop passing PR through links -- something like that.
First thing I would do is verify that your server actually returns a 404 header response for these URLs, and nothing else. Verify the actual HTTP header; don't just look at the resulting browser page. If by any chance your server is doing something else, get that fixed as fast as you can or the problem will never go away.
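One way to check the raw status code rather than the rendered page is a short script. This is only a sketch: the helper name `fetch_status` is made up for illustration, and it assumes a plain HEAD request is enough for your server. It uses Python's standard `http.client`, which does not follow redirects, so a 301 or 302 shows up as itself instead of being hidden behind the final page.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code the server sends for url.

    Redirects are NOT followed, so the value is whatever the first
    response header says (404, 200, 301, ...).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        # Send the path exactly as reported, escapes included.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        return conn.getresponse().status
    finally:
        conn.close()


# Example call (needs network access; substitute one of your reported URLs):
# fetch_status("http://MyDomain.com/%3CA%20href='http://www.someoneelsesdomain.com/")
```

Anything other than 404 (or 410) for one of the bogus URLs, for example a 200 with a "not found" page or a redirect to the home page, is the "something else" worth fixing.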
Next I would verify that these URLs are not even mentioned anywhere on your domain -- this would include looking for any server logs and analysis files that might accidentally be open to public crawling.
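If you have file access to the document root (and to any log or stats directories that sit under it), a crude search can confirm whether the bogus fragment appears anywhere. The sketch below is an assumption-laden illustration: `find_mentions` is a hypothetical helper, the size cap is arbitrary, and it only sees static files, not pages generated from a database.

```python
import os


def find_mentions(root, needle, max_bytes=5_000_000):
    """Yield (path, line_number) for each line under root containing needle.

    The comparison is case-insensitive so that %3CA and %3ca both match.
    Files larger than max_bytes and unreadable files are skipped.
    """
    needle_bytes = needle.encode("utf-8").lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > max_bytes:
                    continue
                with open(path, "rb") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle_bytes in line.lower():
                            yield path, lineno
            except OSError:
                continue


# Example call (the path is a placeholder for your own document root):
# for hit in find_mentions("/var/www/html", "%3CA%20href"):
#     print(hit)
```

It is worth running it once for the escaped form (`%3CA%20href`) and once for a distinctive piece of the unescaped form, since a page might contain either.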
If you have verified that no hint of these URLs exists anywhere on YOUR domain, then you've done your due diligence. Time to write to Google, I'd say -- thousands of these URLs showing up all at once could well be part of a bigger picture that is hurting you.
It looks like the other site failed to provide anchor text or to close the </a> tag on the first link (to your site) and Google carried on parsing the next URL on their page as if it were still a part of the first URL that it was already considering. However, as already stated, as long as your site returns a 404 status in the HTTP header, then this will never be a problem.
I would say that they found something like this on some other site:
<a href="http://www.yoursite.com/a href="http://www.someothersite.com/">the anchor text</a>...
That looks like a simple cut and paste error, and I would expect there to be millions of similar errors across the web. Notice the lack of closing quotes on the first URL. I also assumed that %3C is a / but I didn't actually look. It might be a > instead.
If the owner of the other site had used either the HTML validator, or something like Xenu LinkSleuth, then they would have found this error very quickly.
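For reference, the escapes do not have to be guessed; they can be decoded mechanically. A minimal Python sketch using the standard library's `urllib.parse.unquote` on the URL from the original post:

```python
from urllib.parse import unquote

# URL exactly as it appeared in the crawl error report.
reported = "http://MyDomain.com/%3CA%20href='http://www.someoneelsesdomain.com/"

decoded = unquote(reported)
print(decoded)
# http://MyDomain.com/<A href='http://www.someoneelsesdomain.com/

# The individual escapes, for comparison:
print(unquote("%3C"), unquote("%3E"), unquote("%2F"), repr(unquote("%20")))
# < > / ' '
```

So %3C is the opening angle bracket `<`, and %20 is a space, which means the extra part is the start of a second `<A href=...>` tag that ended up inside the first link's URL.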
I didn't realize that the Not found report included URLs on a web site that aren't listed in the site's sitemaps.
My site has a lot of user-created content, and occasionally someone will post an outbound link using incorrect HTML. It appears that Google is interpreting the link as a relative, local link and prepending the base URL of my site onto it, so that the link looks like 'http://mysite.com%20http://www.theirsite.com', which of course is totally incorrect.
I don't know why these are suddenly showing up in the Not found report in Webmaster Tools; they weren't there before. Regardless, I don't think this is what caused my site to tank on Sept. 15th. For the record, my server does return a standard 404 when any of these URLs are viewed.
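The general mechanism can be reproduced with standard relative-URL resolution. The sketch below is only an illustration under assumptions: the page address and the malformed href (a link posted without its `http://` scheme) are made up, and Google's actual parser may treat other kinds of broken markup differently. Anything that does not start with a scheme or a slash gets resolved against the page it was found on:

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve href the way a crawler resolves a link found on page_url."""
    return urljoin(page_url, href)


page = "http://mysite.com/forum/"        # hypothetical page holding the user post
bad_href = "www.theirsite.com/page"      # hypothetical href with the scheme left off

print(resolve_link(page, bad_href))
# http://mysite.com/forum/www.theirsite.com/page

# A well-formed absolute link is left alone:
print(resolve_link(page, "http://www.theirsite.com/page"))
# http://www.theirsite.com/page
```

The first result is a URL on my own domain that never existed, which is exactly the kind of entry that ends up in a Not found report.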
Now if someone can tell me why my site tanked on Sept. 15th, that would be the million dollar question.
Thanks for the help.
Look again: your original post contained ..a%20href="http://.. in the middle of the example URL. I still think that one is caused by duff HTML on some other site.