Msg#: 4366179 posted 6:14 am on Sep 23, 2011 (gmt 0)
In my Webmaster Tools account, I found that some URLs are reported as 404 Not Found, but they come from text links on third-party sites. For example, the third-party page shows
http://www.example.com/ 29-Jun-11 but Webmaster Tools treats it as http://www.example.com/29-Jun-11
These links are from directories and bookmarking-type sites. If the third-party site has a layout problem and can't show the complete URL on one line, why does it show up in the Crawl Errors section? Why is it counted as a broken link?
So, my questions are:
1. What can reasonably be done to make these errors stop appearing?
2. Any thoughts on how long these errors will continue to appear... months/years?
3. Is there any way to contact Google and have them cut this nonsense out?
4. My site has also dropped out of Google. Is the 404 error the reason for that?
[edited by: tedster at 2:53 pm (utc) on Sep 23, 2011] [edit reason] switch to example.com [/edit]
Msg#: 4366179 posted 4:20 am on Sep 24, 2011 (gmt 0)
Most of these sites are scrapers that use broken links to trick Google. I wouldn't waste time checking whether they send traffic; you don't need their traffic. I would save the 301 (and the server processing it costs) for important cases.
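For the few cases that are worth a 301, something along these lines could work. This is only a rough sketch in Python, not tied to any particular server: the REDIRECTS table and the resolve() function are made-up names for illustration, and sending /29-Jun-11 to the home page is an assumption based on the example in the first post (a home page link followed by a date).

```python
from urllib.parse import urlsplit

# Hypothetical, hand-maintained table: broken path seen in the crawl
# errors report -> the path the link was meant to reach.
# Assumption: the mangled link was the home page URL followed by a date.
REDIRECTS = {
    "/29-Jun-11": "/",
}


def resolve(url):
    """Return (status, location) for a requested URL.

    Only paths listed in REDIRECTS get a 301; everything else that is
    unknown stays a plain 404 with no redirect target.
    """
    path = urlsplit(url).path
    target = REDIRECTS.get(path)
    if target is not None:
        return 301, target
    return 404, None
```

Keeping the table short and explicit means only the links you actually care about cost a redirect; the rest simply stay 404.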
I've had Googlebot test for WordPress on non-WordPress sites several times, its favorite page (on my sites) being xmlrpc.php (the WordPress remote posting page). That page outputs just one line of text, which differs depending on whether you allow or disallow remote publishing. It's a nice lightweight page that also tells Google whether your site is more vulnerable to hacking.
If you have a WordPress site you can visit example.com/xmlrpc.php to see what I mean. If you don't, you can do a Google search for "XML-RPC server accepts POST requests only." WITH quotes and see over a million listings for a page that is rarely ever linked to. Google is doing some detective work beyond just crawling your pages, looking for signatures...
"aha, this is a wordpress site, apply known wordpress filters." - GBot
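If you want to see what that kind of signature check might look like, here is a minimal sketch in Python. It is only a guess at the idea, not how Googlebot actually does it: the function names are made up, and the only thing taken from the post above is the one-line text that xmlrpc.php returns to a plain GET request.

```python
import urllib.error
import urllib.request

# The one-line response quoted earlier in this thread.
SIGNATURE = "XML-RPC server accepts POST requests only."


def looks_like_wordpress_xmlrpc(body):
    """Return True if the response body contains the xmlrpc.php signature line."""
    return SIGNATURE in body


def fetch_xmlrpc(base_url, timeout=10):
    """Fetch <base_url>/xmlrpc.php with a GET and return the body as text.

    Hypothetical helper: an HTTP error response (for example a 404 or 405)
    still has a body, so that body is returned instead of raising.
    """
    url = base_url.rstrip("/") + "/xmlrpc.php"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.read().decode("utf-8", errors="replace")
```

On your own site you could run looks_like_wordpress_xmlrpc(fetch_xmlrpc("http://www.example.com")) and see whether the page gives the signature away.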