
CGI Parameters and Google

How I tricked Google to index my site.


hanan_cohen

9:41 am on Jan 2, 2004 (gmt 0)

10+ Year Member



For too long I was frustrated that Google didn't index my site properly.

The main page is a list of items showing only part of the text of each item. A link leads to the full text of the item, like this:

index.php leading to item.php?item=107150141804760623

One of the problems was that the main page was too large for GG (more than 100 KB), so I reduced it to 20 items.

Didn't help.

Then I read somewhere here about site map pages.

I added a page called site_map.php?begin=1 that shows 50 items per page.

GG found the site map pages and indexed them, but didn't index the items.

Then a friend suggested a trick. He said that maybe GG is "afraid" of CGI parameters, so maybe I should pass the item number differently, like this:

item.php/107150141804760623

And it works!

GG thinks that item.php is the name of a directory and that 107150141804760623 is the name of another directory, so it believes it is indexing a directory rather than a page with a parameter.

Thank you Roman K.

I hope you all find this information useful.

Hanan

Hissingsid

12:02 pm on Jan 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



item.php/107150141804760623

Did you have to make a change to your server setup to make this get parsed?

Best wishes

Sid

dirkz

12:17 pm on Jan 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Did you have to make a change to your server setup to make this get parsed?

You should use URL rewriting in order to accomplish this in a clean way. By the way, just look at this forum :)

hanan_cohen

12:37 pm on Jan 2, 2004 (gmt 0)

10+ Year Member



I didn't have to change anything on the server to do that.

I can send the PHP code to anyone who asks. Send me a sticky mail, or (with the forum admins' permission) I can post it here.

It's really simple.

abates

5:24 am on Jan 3, 2004 (gmt 0)

10+ Year Member



I did something similar in Perl for a database on my site, though I included a superfluous .html on the end of the URL just for show ;)

ALbino

5:45 am on Jan 3, 2004 (gmt 0)

10+ Year Member



This has long been the solution to that particular 'issue' with Google. The only problem is that it doesn't work for all the engines (AV comes to mind). Hope it helps your site though, as G's all that really counts right now anyway :)

amznVibe

6:25 am on Jan 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's called dynamic to static URL conversion. Lots of threads around here on it for a variety of environments.

GoogleGuy

7:54 am on Jan 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Glad it worked for you, hanan_cohen!

dirkz

8:24 am on Jan 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sticky on the way. By the way, it should be all right to post that.

hanan_cohen

5:13 pm on Jan 3, 2004 (gmt 0)

10+ Year Member



I can pass the parameter to the page either as

item.php?item=105688773543182278

or as

item.php/105688773543182278

The PHP code is:

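$item = isset($_GET['item']) ? $_GET['item'] : '';  // item.php?item=...
if ($item === '') {
    // item.php/... : the server hands the trailing path to the script as PATH_INFO
    $item = str_replace('/', '', isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '');
}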

That's it. No big deal.
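For anyone on a newer PHP version, here is a minimal sketch of the same idea wrapped in a function, assuming PHP 7.1 or later (register_globals and ereg_replace no longer exist there). The function name get_item_id and the digits-only check are hypothetical additions, not part of Hanan's code; the check assumes item IDs are purely numeric, like the Blogger IDs in this thread.

```php
<?php
// Hypothetical helper: returns the item ID from either
//   item.php?item=107150141804760623   (query string)
//   item.php/107150141804760623        (trailing path, exposed as PATH_INFO)
// or null if neither form carries a numeric ID.
function get_item_id(array $get, array $server): ?string
{
    $item = isset($get['item']) ? (string) $get['item'] : '';
    if ($item === '' && isset($server['PATH_INFO'])) {
        // Drop the slashes, as the original snippet does.
        $item = str_replace('/', '', (string) $server['PATH_INFO']);
    }
    // Assumption: IDs are digits only, so anything else is rejected.
    return ctype_digit($item) ? $item : null;
}

// Usage at the top of item.php:
// $item = get_item_id($_GET, $_SERVER);
```

Passing $_GET and $_SERVER in as arguments, rather than reading the superglobals inside the function, just makes it easy to try out with made-up values.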

kazonik

5:57 pm on Jan 3, 2004 (gmt 0)

10+ Year Member



Hi hanan,

Thanks for the code snippet!
It's a simple way to solve the problem, and it saves having to mess with the web server's configuration file.

I'll keep that one in mind for when I need it :)

fkottar

1:16 am on Jan 4, 2004 (gmt 0)

10+ Year Member



Great snippet, Hanan.
The best part is that it does not involve messing with the configuration of Apache.

Thanks

mipapage

1:37 am on Jan 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hanan_cohen,

Congrats on the good work!

Could any of you who know about using PATH_INFO and mod_rewrite comment over here [webmasterworld.com]?

Not hijacking, just looking for some help!

dirkz

7:40 pm on Jan 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hanan_cohen, thanks for that.

g1smd

9:43 pm on Jan 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nice code snippet. Very useful.

I guess that Google was wary in case the very long number was a session ID.
Google avoids pages that have those.

hanan_cohen

9:52 pm on Jan 4, 2004 (gmt 0)

10+ Year Member



Actually, it's a Blogger item ID. Since Blogger and Google are one company now, I sure hope they won't filter it out.

GoogleGuy

5:14 am on Jan 5, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only reason not to trust long parameters is session IDs. You can end up getting the same page many times with different urls because the session ID is different; that's why we're wary about crawling urls with those long id numbers. I'm glad you mentioned this tip.
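To illustrate the duplicate-URL problem GoogleGuy describes: if a script appends a session ID to every link, one page ends up reachable under many addresses. Below is a minimal sketch of stripping such parameters before printing links, so each page is linked under a single URL. The function name strip_session_id is hypothetical, and the parameter names sid and PHPSESSID are assumptions (PHPSESSID is PHP's default session name; forum software may use something else). It also assumes the URL has no #fragment.

```php
<?php
// Hypothetical helper: removes session-ID parameters from a URL so the
// same page is always linked under one address.
function strip_session_id(string $url, array $names = ['sid', 'PHPSESSID']): string
{
    $parts = explode('?', $url, 2);
    if (count($parts) < 2) {
        return $url;                        // no query string at all
    }
    parse_str($parts[1], $params);          // query string -> array
    foreach ($names as $name) {
        unset($params[$name]);              // drop the session-ID parameters
    }
    $query = http_build_query($params);     // array -> query string
    return $query === '' ? $parts[0] : $parts[0] . '?' . $query;
}
```

Short, stable parameters such as tid=8278 pass through untouched; only the per-visitor value is removed.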

antrat

5:30 am on Jan 5, 2004 (gmt 0)



Hi hanan_cohen

That is simply cool!

I have a forum on my site that Google seems to eat up well. It uses URLs like php?tid=8278. However, I may soon be forced to switch to forum software that uses those horribly long sid numbers.

Where would I place your PHP code?

hanan_cohen

6:00 am on Jan 5, 2004 (gmt 0)

10+ Year Member



To GoogleGuy:

So what you are actually saying is that the problem is not with CGI parameters per se, but with long values? That makes sense, because I have another site that's crawled quite nicely even though it has URLs like:

static.asp?apd=27&scd=143&pd=284

And another thing: what I have shown here is a trick, a workaround. That means that in the future, if it gets too popular, Google might decide to detect it, and then we are back to square one.

Isn't that so?

Or maybe we can rely on Google to look up the valid Blogger IDs and know they are kosher?

To antrat:

The code goes at the top of the file, where the script decides which dynamic value to act on.

dirkz

1:59 pm on Jan 6, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A statement from GoogleGuy on dynamic URLs! Just flagged this thread :-)