Now that Mister Googlebot has thoroughly groped my site, I've noticed that he's hit a few errors for internal links that don't exist on my site. I started off with a few relatively linked pages ("../../widgets.html") but then switched to absolute links ("/widgetfile/widgets.html") - though there are still a few of the relative ones left.
I also noticed that the errors always include "/images/" with the rest of an existing link after it - my robots.txt says stay out of "/images/" - perhaps that has something to do with it as well?
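On the robots.txt question, here is a rough sketch of how a blanket "/images/" rule is usually interpreted. The robots.txt contents below are an assumption (the actual file isn't shown in this thread), and the paths are only illustrative. It uses Python's standard urllib.robotparser, which treats Disallow lines as path prefixes:

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt; the real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A path that starts with /images/ is blocked by the rule.
top_level_allowed = parser.can_fetch("Googlebot", "/images/logo.gif")

# A path that merely contains /images/ deeper down is not a prefix match,
# so this rule alone does not keep a crawler out of it.
nested_allowed = parser.can_fetch(
    "Googlebot", "/salon/boutique/images/select/select.html"
)

print("top-level /images/ allowed:", top_level_allowed)
print("nested /images/ allowed:", nested_allowed)
```

If that assumption holds, the rule would only cover a top-level /images/ directory, so it probably isn't what is producing (or preventing) requests for the nested paths.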
Just wondering, and hoping Google 404s aren't of any consequence...
Are you using Flash? That is the only case I know of where problems arise from relative links (see this thread: Googlebot reading flash? [webmasterworld.com])
We use a <base href="http://www.widgets.co.uk/"> and all our links are relative because of the directory structure (i.e. www.widgets.co.uk/products/product.htm). I'm noticing hundreds of '500 errors' because Google was trying to pull www.widgets.co.uk/category/products/product.htm rather than the correct www.widgets.co.uk/products/product.htm, and also A LOT of 404s ;)
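To illustrate what may be going on, here is a minimal sketch of how the same relative link resolves with and without the base href being honoured. The page URL "category/index.htm" is a made-up example, and the resolve() helper is hypothetical; only the base href and the link target come from the post above.

```python
from urllib.parse import urljoin

BASE_HREF = "http://www.widgets.co.uk/"
# Hypothetical page that contains the link; not taken from the thread.
PAGE_URL = "http://www.widgets.co.uk/category/index.htm"
LINK = "products/product.htm"


def resolve(link, page_url, base_href=None):
    """Resolve a relative link against the base href if one is honoured,
    otherwise against the URL of the page that contains the link."""
    return urljoin(base_href or page_url, link)


with_base = resolve(LINK, PAGE_URL, BASE_HREF)
without_base = resolve(LINK, PAGE_URL)

print(with_base)     # http://www.widgets.co.uk/products/product.htm
print(without_base)  # http://www.widgets.co.uk/category/products/product.htm
```

The second result has the same shape as the URLs that are failing, which would be consistent with the crawler ignoring the base href on those requests. That is only one possible explanation, not a confirmed cause.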
This has only happened this morning and nothing has been changed over the past month or so (we get crawled daily).
The IP addresses of the googlebots were:
64.68.86.9
64.68.87.43
64.68.86.54
64.68.86.79
For example, I get 404s for links like "/print/send", whereas on the "/print/" page I only have links like href="/send/".
It seems like a bug to me, unless it's not really Googlebot.
Must be - I've never noticed anything like this in my logs before. It's not an imposter as far as I can tell: the IP addresses I mentioned earlier also appear in this post (http://www.webmasterworld.com/forum3/15655.htm)
Our other website was spidered fine, although it doesn't use subdirectories for product categories - all files (products/categories) sit in the root directory.
Google has visited yet again today! But...
[error] [client 64.68.88.145] File does not exist: /home/********/public_html/salon/boutique/images/select/select.html
[Tue Sep 16 08:13:59 2003] [error] [client 64.68.88.151] File does not exist: /home/********/public_html/salon/boutique/images/original/original.html
[Tue Sep 16 05:59:37 2003] [error] [client 64.68.88.7] File does not exist: /home/********/public_html/salon/images/gallery/gallery.html
[Tue Sep 16 03:15:05 2003] [error] [client 64.68.88.131] File does not exist: /home/********/public_html/salon/boutique/images/couture/couture.html
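For anyone wanting to pull the same pattern out of their own logs, here is a rough sketch that parses "File does not exist" lines in the format shown above and checks which ones fall under an /images/ directory. The regular expression and the parse_errors() helper are my own assumptions, written against these four lines only; other Apache configurations may log differently.

```python
import re
from collections import Counter

# Timestamp is optional because the first line above was posted without one.
LINE_RE = re.compile(
    r"^(?:\[(?P<timestamp>[^\]]+)\] )?"
    r"\[error\] \[client (?P<client>[\d.]+)\] "
    r"File does not exist: (?P<path>\S+)$"
)

SAMPLE = """\
[error] [client 64.68.88.145] File does not exist: /home/********/public_html/salon/boutique/images/select/select.html
[Tue Sep 16 08:13:59 2003] [error] [client 64.68.88.151] File does not exist: /home/********/public_html/salon/boutique/images/original/original.html
[Tue Sep 16 05:59:37 2003] [error] [client 64.68.88.7] File does not exist: /home/********/public_html/salon/images/gallery/gallery.html
[Tue Sep 16 03:15:05 2003] [error] [client 64.68.88.131] File does not exist: /home/********/public_html/salon/boutique/images/couture/couture.html
"""


def parse_errors(text):
    """Return one dict (timestamp, client, path) per matching log line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


entries = parse_errors(SAMPLE)
under_images = [e for e in entries if "/images/" in e["path"]]
clients = Counter(e["client"] for e in entries)

print(len(entries), "missing-file errors,", len(under_images), "under /images/")
for client, count in clients.items():
    print(client, count)
```

Grouping by client and by path like this makes it easier to see whether every bad request shares the same extra /images/ segment.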
...no relative links in Flash, either : P
We now have irrefutable proof that Googlebot DOES read Flash files. Sure enough, after doubting the certainty of my comments above, I opened up my Flash files to have another look, and lo and behold, I had a few relative links in there. The thing is, they were relative to the page where the Flash file was embedded, not to the Flash file itself: the "widgetroom.html" page, residing within the "widgetroom" folder, called the Flash file from its own /images/ folder within the same folder... so if one were to follow a relative "newlink" link directly from the Flash file, one would get "/widgetroom/images/newlink"
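A quick sketch of the difference, using Python's urljoin. The Flash filename "movie.swf" is invented for illustration; the page, folder and "newlink" names are the ones described above.

```python
from urllib.parse import urljoin

# Page that embeds the Flash file (from the description above).
PAGE = "/widgetroom/widgetroom.html"
# Hypothetical name for the Flash file inside the page's /images/ folder.
FLASH_FILE = "/widgetroom/images/movie.swf"
LINK = "newlink"

# A browser resolves the link against the embedding page.
from_page = urljoin(PAGE, LINK)
# A crawler fetching the Flash file on its own would resolve it here instead.
from_flash = urljoin(FLASH_FILE, LINK)

print(from_page)   # /widgetroom/newlink
print(from_flash)  # /widgetroom/images/newlink
```

Switching the links inside the Flash file to root-relative ones (starting with "/") should make both cases resolve to the same place.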
This brings up a few more questions, though... : P