Forum Moderators: open
I know the links are all correct, since all of them are generated by the script.
Googlebot indexes them as /folder/another_folder <- without the trailing slash. Some of them do have the slash and some don't. How can I prevent it from dropping the slash?
I am talking about Googlebot: when it crawls my website, it drops the trailing slashes.
This results in the error page (which is identical for every error) saying "widget not found". In the SERPs they show up as separate pages; I thought Google combined identical results.
I will rewrite the code to handle the version without the slash, but I am still curious why it would do this.
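One way to handle both versions in the script is to normalize the requested path before looking it up, so the slashed and slashless forms resolve to the same page. A minimal sketch, assuming a Python script; the names (PAGES, resolve_page) are illustrative, not from this thread:

```python
# Hypothetical page lookup table; canonical keys end with a trailing slash.
PAGES = {
    "/folder/another_folder/": "widget page content",
}

def resolve_page(path):
    # Treat /folder/another_folder as equivalent to /folder/another_folder/
    # by appending the missing trailing slash before the lookup.
    if not path.endswith("/"):
        path += "/"
    return PAGES.get(path, "widget not found")
```

With this, both resolve_page("/folder/another_folder") and resolve_page("/folder/another_folder/") return the same content instead of the error page.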
www.example/subdirectory
and
www.example/subdirectory/
with the link without the trailing slash being treated as a dead link (no title, no description, just the URL).
I have looked throughout my site for links that are missing the trailing slash, but they all have it. Gbot is either following an external link that is missing the trailing /, or there is some sort of quirk in the Gbot crawler right now that indexes subdirectories both with and without the trailing slash.
I started noticing it about a week ago.
I wish I could notify Google about it. But who is going to listen to me? :)
Report errors, bugs and broken links: webmaster@google.com
I bet they'll listen to you! ;)
However, normally a web server appends the trailing slash to a directory request: a request without the trailing slash normally returns a 301 redirect to the URL with the trailing slash. So it might be a good idea to implement something similar in your scripts to avoid those errors.
Example:
curl [webmasterworld.com...]
returns this:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>301 Moved Permanently</TITLE>
</HEAD><BODY>
<H1>Moved Permanently</H1>
The document has moved <A HREF="http://www.webmasterworld.com/forum3/">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.26 Server at www.webmasterworld.com Port 80</ADDRESS>
</BODY></HTML>
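The redirect behaviour shown above can be sketched in the script itself. A hedged example, assuming a Python handler; the function name and the known_dirs parameter are illustrative, not part of any real framework:

```python
def redirect_if_missing_slash(path, known_dirs):
    """Mimic the web server behaviour shown above: if a known directory
    is requested without its trailing slash, answer with a 301 pointing
    at the slashed URL; otherwise serve the request normally (200).

    Returns a (status_code, redirect_location_or_None) pair.
    """
    if not path.endswith("/") and path + "/" in known_dirs:
        return 301, path + "/"
    return 200, None
```

For example, with known_dirs = {"/forum3/"}, a request for "/forum3" yields (301, "/forum3/"), matching the curl output above, while "/forum3/" is served directly with (200, None).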