Forum Moderators: open
I know the links are all correct, since all of them are generated by the script.
Googlebot indexes them as /folder/another_folder <- without the trailing slash. Some of them do have the slash and some don't. How can I prevent it from dropping the slash?
I am talking about Googlebot: when it crawls my website, it drops the trailing slashes.
This results in the error page (which is identical for every error) saying "widget not found". In the SERPs they show up as separate pages; I thought Google combined identical results.
I will rewrite the code to handle the version without the slash, but I am still curious why it would do this.
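One way to handle both versions in the script is to normalize the requested path before looking it up, so the slashed and slashless forms resolve to the same page. A minimal sketch, assuming a Python script; the names (PAGES, resolve_page) are illustrative, not from this thread:

```python
# Hypothetical page lookup table; canonical keys end with a trailing slash.
PAGES = {
    "/folder/another_folder/": "widget page content",
}

def resolve_page(path):
    # Treat /folder/another_folder as equivalent to /folder/another_folder/
    # by appending the missing trailing slash before the lookup.
    if not path.endswith("/"):
        path += "/"
    return PAGES.get(path, "widget not found")
```

With this, both resolve_page("/folder/another_folder") and resolve_page("/folder/another_folder/") return the same content instead of the error page.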
www.example/subdirectory
and
www.example/subdirectory/
with the link without the trailing slash being treated as a dead link (no title, no description, just the URL).
I have looked throughout my site for links that are missing the trailing slash, but they all have it. Gbot is either following an external link that is missing the trailing /, or there is some sort of quirk in the Gbot crawler right now that indexes subdirectories both with and without the trailing slash.
I started noticing it about a week ago.
I wish I could notify Google about it. But who is going to listen to me? :)
Report errors, bugs and broken links: webmaster@google.com
I bet they'll listen to you! ;)
However, normally a web server appends the trailing slash to a directory request: a request without the trailing slash normally returns a 301 redirect to the URL with the trailing slash. So it might be a good idea to implement something similar in your scripts to avoid those errors.
Example:
curl [webmasterworld.com...]
returns this:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.webmasterworld.com/forum3/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.26 Server at www.webmasterworld.com Port 80</ADDRESS>
</BODY></HTML>
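The redirect behaviour shown above can be sketched in the script itself. A hedged example, assuming a Python handler; the function name and the known_dirs parameter are illustrative, not part of any real framework:

```python
def redirect_if_missing_slash(path, known_dirs):
    """Mimic the web server behaviour shown above: if a known directory
    is requested without its trailing slash, answer with a 301 pointing
    at the slashed URL; otherwise serve the request normally (200).

    Returns a (status_code, redirect_location_or_None) pair.
    """
    if not path.endswith("/") and path + "/" in known_dirs:
        return 301, path + "/"
    return 200, None
```

For example, with known_dirs = {"/forum3/"}, a request for "/forum3" yields (301, "/forum3/"), matching the curl output above, while "/forum3/" is served directly with (200, None).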