The HTML validates as XHTML 1.0 Transitional.
User-agent: *
Disallow: /dir1/dir2/donotspider.html

From the HTML file, let's say dir1/dir2/indexd.html:

<a href="donotspider.html"><img src="donotspider.gif" /></a>

and she requested /dir1/dir2/donotspider.gif/donotspider.html. I am pretty sure it is an XHTML parsing issue. No, there is surely no external link to that disallowed file.
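For what it's worth, that mangled URL looks like what you would get if a crawler resolved the relative href against the img src it had just seen, treating the .gif as if it were a directory. A hypothetical sketch (the example.com host and page URL are my assumptions, and the "buggy base" is a guess at the failure mode, not Googlebot's actual code) - note the side effect that the mangled URL also slips past the Disallow rule:

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

page = "http://example.com/dir1/dir2/indexd.html"  # hypothetical page URL

# Correct resolution: the href is relative to the page's directory.
good = urljoin(page, "donotspider.html")
print(good)  # http://example.com/dir1/dir2/donotspider.html

# Hypothetical bug: resolve the href against the preceding img src,
# treating "donotspider.gif" as if it were a directory.
bad_base = urljoin(page, "donotspider.gif") + "/"
bad = urljoin(bad_base, "donotspider.html")
print(bad)   # http://example.com/dir1/dir2/donotspider.gif/donotspider.html

# Side effect: Disallow matching is prefix-based, so the file is no
# longer protected under its bogus path.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /dir1/dir2/donotspider.html"])
print(rp.can_fetch("*", good))  # False - the correct URL is blocked
print(rp.can_fetch("*", bad))   # True  - the mangled URL slips through
```

So if some spider does garble the link this way, robots.txt would not even stop the resulting request.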
----------
I e-mailed googlebot at google dot com and am waiting for a response. There is also a 301 issue currently making me extremely nervous: Googlebot is getting the proper and wanted 301s on some dozen files but not following them - Slurp does.
Following a good link badly is the only thing I can currently imagine, where the good link is disallowed. Not something really severe, but I think it is due to the XHTML code. Is there any problem with "cleaned up" (here: auto-generated) html, i.e. w/o any unnecessary line feeds, tabs, spaces?
----------
Is there any problem with "cleaned up" (here: auto-generated) html, i.e. w/o any unnecessary line feeds, tabs, spaces?
This is something I have worried about in the past, as my pages generally (unless there is inserted advertiser code) don't have any whitespace or new lines at all.
I am confident that Googlebot doesn't have a problem with non-stop markup, as I have had some very large pages indexed in Google with no problem.
However, it is easy to imagine an amateur spider that fetches a page into a local file and then processes that file line by line having trouble with a 100% auto-generated page.
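To illustrate the difference (with a made-up one-line page, not anyone's real markup): a tokenizing parser like Python's html.parser is fed characters, not lines, so a page with zero line feeds parses exactly like a pretty-printed one, while a script that assumes one tag per line finds nothing:

```python
import re
from html.parser import HTMLParser

# Hypothetical auto-generated page: the whole document on one line.
page = ('<html><head><title>t</title></head><body>'
        '<a href="a.html">a</a><a href="b.html">b</a>'
        '<a href="donotspider.html"><img src="donotspider.gif" /></a>'
        '</body></html>')

# Naive line-oriented spider: expects each <a ...> to start a line.
naive = [m.group(1) for line in page.splitlines()
         if (m := re.match(r'<a href="([^"]+)"', line))]
print(naive)  # [] - the single "line" starts with <html>, so nothing matches

# A real tokenizing parser sees the same tags regardless of whitespace.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['a.html', 'b.html', 'donotspider.html']
```

Any serious crawler works like the second approach, which is why I wouldn't expect Googlebot to care about missing whitespace.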
It would be comforting to hear GG state that Googlebot is a professional when it comes to non-stop "pure" markup...