Google will not index all php pages


pob123

9:53 am on May 10, 2003 (gmt 0)

10+ Year Member



Hi

I have never had a problem getting my sites listed on Google until recently, when someone designed a new site for me in PHP.

Since then Google will only list the main index page; it won't list any of the other pages in the /doc directory.

Any ideas?

[edited by: heini at 10:22 am (utc) on May 10, 2003]
[edit reason] please, no urls. Thank you [/edit]

ncsuk

9:55 am on May 10, 2003 (gmt 0)

10+ Year Member



Everything after the ? is ignored so technically you are still on the index page.

nebuhost

10:09 am on May 10, 2003 (gmt 0)

10+ Year Member



Not true at all! The parameters after the ? are not ignored.

Gotta love that auto? formatting. :(
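For what it's worth, the query string is part of the URL itself, so two addresses that differ only after the ? are different addresses. A minimal Python sketch of that point follows; the host example.com and the `page` parameter are made up for illustration, and it says nothing about how any particular crawler chooses to treat such URLs.

```python
from urllib.parse import urlsplit, parse_qs


def describe(url):
    """Split a URL into its path and its decoded query parameters."""
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


# Hypothetical URLs: same script, different query strings.
home = "http://example.com/index.php"
contact = "http://example.com/index.php?page=contact"

print(describe(home))
print(describe(contact))
print(home == contact)  # the two URLs are distinct strings
```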

beebware

10:17 am on May 10, 2003 (gmt 0)

10+ Year Member



Urgh. If the programmer who designed your site was any good, they'll be able to easily do away with that ?page part of the URL. Use Apache's mod_rewrite module, or just use an .htaccess file to define "docs" as a PHP file and then have that file use the PATH_INFO variable to pick up the real page info. On one site of mine I've got several hundred thousand pages that all look like either directories or static .html files, but they are all driven and created by the same 4 Perl files (1 main library and then 3 others for different functions and database calls).
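The PATH_INFO idea can be sketched as follows. With a request such as /docs/about (a made-up URL), the server runs the script named "docs" and hands it the remainder, "/about", as PATH_INFO; the script then maps that to a page. This is only an illustration in Python rather than PHP or Perl, and the function name, the default page, and the validation rule are all assumptions, not anything from the site described above.

```python
def page_from_path_info(path_info, default="index"):
    """Return the page name carried in PATH_INFO, e.g. '/about' -> 'about'."""
    segments = [s for s in path_info.split("/") if s]
    if not segments:
        return default  # bare /docs or /docs/ falls back to the default page

    page = segments[0]
    # Let URLs look like static files: /docs/about.html -> 'about'
    if page.endswith(".html"):
        page = page[: -len(".html")]

    # Accept only simple names (letters, digits, '-' and '_') so the value
    # cannot be used to reach files outside the page store.
    if not page.replace("-", "").replace("_", "").isalnum():
        raise ValueError("unsupported page name: %r" % page)
    return page


print(page_from_path_info("/about"))
print(page_from_path_info("/about.html"))
print(page_from_path_info(""))
```

In a CGI-style script the argument would come from the environment, for example `os.environ.get("PATH_INFO", "")`.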

mitchofoz

10:07 pm on May 10, 2003 (gmt 0)

10+ Year Member



Google is working hard to index pages with parameters like the ones used by PHP. However, here's how I use Apache's mod_rewrite to solve the problem:

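# Server/virtual-host context; inside .htaccess the pattern must not start with "/".
RewriteEngine On
# Map /style/substyle/pageIndex to the real URL; [P] proxies the request and needs mod_proxy.
RewriteRule ^/([^/]*)/([^/]*)/([0-9]*)/?$ /cgi-bin/WebObjects/Store.woa/wa/browse?style=$1&substyle=$2&pageIndex=$3 [P,L]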

I use Apple's WebObjects for my site, but you should be able to write similar rewrite rules for PHP pages with as many parameters as you wish.
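To see what the rule's pattern does to an incoming path, the same regular expression can be tried in Python. This is a rough sketch only: Python's `re` stands in for mod_rewrite's matching, the proxying done by the [P] flag is not modelled, and the path /shirts/polo/2 is a made-up example.

```python
import re

# Same pattern as the RewriteRule: two path segments, then a numeric page index.
RULE = re.compile(r"^/([^/]*)/([^/]*)/([0-9]*)/?$")
TARGET = ("/cgi-bin/WebObjects/Store.woa/wa/browse"
          "?style={0}&substyle={1}&pageIndex={2}")


def rewrite(path):
    """Return the substituted URL, or None if the pattern does not match."""
    match = RULE.match(path)
    if match is None:
        return None
    return TARGET.format(*match.groups())


print(rewrite("/shirts/polo/2"))   # matches: three segments, last one numeric
print(rewrite("/shirts/polo/x"))   # no match: last segment is not numeric
```

A crawler only ever sees the short path; the parameterised URL stays internal to the server.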

You also need to make sure mod_rewrite and mod_proxy are properly compiled into your httpd executable:

./httpd -l
Compiled-in modules:
http_core.c
mod_env.c
mod_log_config.c
mod_mime.c
mod_negotiation.c
mod_status.c
mod_include.c
mod_autoindex.c
mod_dir.c
mod_cgi.c
mod_asis.c
mod_imap.c
mod_actions.c
mod_userdir.c
mod_alias.c
mod_rewrite.c <--------------
mod_access.c
mod_auth.c
mod_proxy.c <--------------
mod_setenvif.c
mod_ssl.c
mod_php4.c
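Checking a listing like the one above by eye is easy to get wrong, so here is a small Python sketch that scans the text printed by `httpd -l` for the two modules. The function names are made up for illustration. Note that `httpd -l` only reports compiled-in modules, so a module loaded dynamically with LoadModule would not appear in it.

```python
def compiled_in_modules(listing):
    """Collect module source names (e.g. 'mod_rewrite.c') from `httpd -l` text."""
    modules = set()
    for line in listing.splitlines():
        tokens = line.split()
        # The first token of each module line is the name; ignore headers and notes.
        if tokens and tokens[0].endswith(".c"):
            modules.add(tokens[0])
    return modules


def missing_modules(listing, required=("mod_rewrite.c", "mod_proxy.c")):
    """Return the required modules that the listing does not contain."""
    present = compiled_in_modules(listing)
    return [name for name in required if name not in present]


sample = """Compiled-in modules:
http_core.c
mod_rewrite.c <--------------
mod_php4.c
"""
print(missing_modules(sample))
```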

skipfactor

10:23 pm on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Everything after the ? is ignored so technically you are still on the index page.

Definitely not the case for me. I have a new site, and Deepbot grabbed all of the *.asp?* pages (around 100 pages) on the first deepcrawl. A friend who has a PHP site confirms this as well.

If your site's only a few months old, and as long as you keep session IDs limited to one, two max, I'd wait for the next deepcrawl logs before making changes to page names.