Googlebot and characters in URL

What is supported?

         

Sinner_G

8:35 am on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have dynamic pages, some of which are (or, to be exact, look like they will be) in the index for the first time with the November update.

The problem is that some of the URLs have special characters in them, e.g. [mysite.com...]

But only the part up to 'blue' appears in the link from Google. Is there a problem for Googlebot? The spider simulator on searchengineworld gets it right.

bcc1234

2:54 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It should not be a problem, but Google will not index them unless you have enough PR.

Format your URLs according to the encoding rules and any decent client (including googlebot) should be able to request them.

Sinner_G

3:14 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google will not index them unless you have enough PR

...which I will get with good content which will only show when the pages are indexed which...

Format your URLs according to the encoding rules

Which rules are these?

bcc1234

3:24 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



...which I will get with good content which will only show when the pages are indexed which...

Content has nothing to do with PR. If you have enough inbound links pointing to your pages (or just the home page) then... maybe...

Which rules are these?

Check the RFC on it; I don't remember the exact number.

Sinner_G

3:52 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RFC 2396 on URI syntax:

An escaped octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing the octet code. For example, "%20" is the escaped encoding for the US-ASCII space character.

Now please correct me if I've misunderstood, but doesn't that mean the %20 code should be used instead of a space?
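To make the quoted RFC passage concrete, here is a minimal Python sketch of the escaped-triplet rule. The helper name `escape_octet` and the sample value "blue widgets" are only illustrative assumptions; `quote` and `unquote` are the standard library's percent-encoding functions.

```python
from urllib.parse import quote, unquote

def escape_octet(byte: int) -> str:
    """Build the escaped triplet for one octet: '%' plus two hex digits."""
    return "%{:02X}".format(byte)

# The US-ASCII space is octet 0x20, so its triplet is "%20".
space_triplet = escape_octet(ord(" "))

# quote() applies the same rule to every character that needs escaping.
encoded = quote("blue widgets")

# unquote() reverses the escaping.
decoded = unquote(encoded)

print(space_triplet, encoded, decoded)
```

So yes, under this reading a literal space in a URL should be written as %20.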

bcc1234

4:51 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Either %20 or +
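A small Python sketch of the difference between the two forms, in case it helps. The "+" form comes from HTML form encoding and is only treated as a space when a query string is decoded; a plain percent-decoder leaves it alone. The parameter name `page` and its value are assumptions for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, parse_qs

value = "blue widgets"

# Two ways to encode the space.
percent_form = quote(value)      # percent-escaped space
plus_form = quote_plus(value)    # form-style space

# A plain percent-decoder does not turn "+" back into a space...
plain_decoded = unquote(plus_form)

# ...but a form-style decoder does.
form_decoded = unquote_plus(plus_form)

# Query-string parsing accepts both spellings and yields the same value.
from_plus = parse_qs("page=" + plus_form)
from_percent = parse_qs("page=" + percent_form)

print(percent_form, plus_form, plain_decoded, form_decoded)
```

Since %20 is understood in both the path and the query string, it is the safer choice when you are unsure how a client will decode the URL.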

Sinner_G

4:58 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Which is exactly what I have.

Don't get it.

bcc1234

6:36 pm on Nov 28, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just told you that you won't get indexed if you don't have enough PR.

Sinner_G

7:21 am on Nov 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's a misunderstanding here. The URLs ARE indexed, i.e. if I search for 'widgets site:www.mysite.com' those pages are listed, but with the URL [mysite.com...] instead of [mysite.com...]

Lisa

9:14 am on Nov 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In the links to that page, is your href like this?
<a href="http://www.mysite.com/somepage.asp?page=blue widgets">Blue Widgets</a>

If so, I think a bot will not get the right URL. Try
<a href="http://www.mysite.com/somepage.asp?page=blue%20widgets">Blue Widgets</a>

That is my guess.
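If the links are generated by a script, one way to avoid a raw space in the href is to encode the query value when building the link. This is only a sketch in Python: `build_link` is a hypothetical helper, and the URL and parameter are the example ones from this post, not anyone's real site code.

```python
from html import escape
from urllib.parse import quote, urlencode

def build_link(base: str, params: dict, text: str) -> str:
    """Return an <a> tag whose query string is percent-encoded."""
    # quote_via=quote makes urlencode emit %20 for spaces instead of "+".
    query = urlencode(params, quote_via=quote)
    href = base + "?" + query
    # Escape for HTML so that "&" or quotes in the URL cannot break the attribute.
    return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(text))

link = build_link(
    "http://www.mysite.com/somepage.asp",
    {"page": "blue widgets"},
    "Blue Widgets",
)
print(link)
```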

Sinner_G

9:38 am on Nov 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nice try, but... ;)

On my site, all links are coded the second way (with %20). I looked at the code of every link to these pages I could find (with Google and with link.all at Fast), and they all use %20 as well.

Any other guesses?

bcc1234

12:27 pm on Nov 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Post your site in your profile.