Google doesn't index .php's?

php, google

alik

6:25 pm on Apr 12, 2004 (gmt 0)

I use PHP server-side programming quite a lot on my sites.
All of my pages have the .php extension and I am starting
to suspect that Google doesn't index pages with that
extension, since I can't get listed in Google by any
means.

Does Googlebot have trouble crawling sites built with PHP? It shouldn't, as this is server-side programming, not client-side...

Jesse_Smith

1:08 am on Apr 13, 2004 (gmt 0)

It does crawl php, but .html is always the best. Use mod_rewrite to change the URL.

rogerd

1:20 am on Apr 13, 2004 (gmt 0)

Alik, it's less likely that the .php extension is the problem than a query string that follows it. Long query strings and in particular session IDs are hindrances.

The other thing you can do to encourage spidering is to increase your inbound links. Also, try spidering the site yourself to be sure bots can navigate the site (use Xenu, for example). Check your robots.txt, too, to be sure you aren't inadvertently telling Googlebot to go away.

bbonline

1:54 am on Apr 13, 2004 (gmt 0)

I have many sites with php and Google spiders them all, including search terms.

One of many things that shows Google is the best is searching and indexing.

trimmer80

4:44 am on Apr 13, 2004 (gmt 0)

>It does crawl php, but .html is always the best

Totally disagree.
Google does not discriminate between file extensions. It does prefer simple structures though. mod_rewrite should be used to simplify complex query string navigation.
For example,
www.example.com/index.php?section=computers&subsection=monitors&brand=hp

may not be crawled as well as

www.example.com/computers/monitors/hp/

which can be served by the same file using mod_rewrite.
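
As a rough sketch of what that could look like, assuming Apache with mod_rewrite enabled and the rule placed in an .htaccess file at the site root. The parameter names come from the example URL above; the character classes and flags are my own assumptions and would need adjusting for a real site:

```apache
# Assumption: Apache with mod_rewrite enabled, rule lives in the site root .htaccess.
RewriteEngine On

# Map /computers/monitors/hp/ onto index.php and its query string.
# The [a-z0-9-]+ pattern for each path segment is an illustrative choice.
RewriteRule ^([a-z0-9-]+)/([a-z0-9-]+)/([a-z0-9-]+)/?$ index.php?section=$1&subsection=$2&brand=$3 [L,QSA]
```

The visitor and the spider only ever see the short path, while index.php still receives section, subsection and brand as ordinary GET parameters.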

sandor

7:02 am on Apr 13, 2004 (gmt 0)

i have lots of .php and google gives them lots of love .. same as for .html in my view.

curlykarl

7:58 am on Apr 13, 2004 (gmt 0)

>i have lots of .php and google gives them lots of love

Same here, pure php with no problems!

Karl

expert_21

8:25 am on Apr 13, 2004 (gmt 0)

Google crawls php, even with queries. It just doesn't crawl those with session IDs.

MikeBeverley

3:28 pm on Apr 13, 2004 (gmt 0)

>Use mod_rewrite to change the URL.

Everyone keeps suggesting this but I think it is bad advice. This goes in line with the 'creating pages for search engines rather than users' problem. Google doesn't have a problem indexing .php pages and if your linking structure is sound (internal and inbound) then you've got nothing to worry about.
The only reason you really need to use mod_rewrite is if you have terrible inbound links and the only other thing you can think of is to change what Google sees as your URL. It doesn't affect whether Google lists your page and ranks it.

ThreeQuarks

3:35 pm on Apr 13, 2004 (gmt 0)

php urls with session ids are being crawled by googlebot at my site - no problems at all.

netguy

3:49 pm on Apr 13, 2004 (gmt 0)

If your php code looks like this:
ht*p://www.yourdomain.com/store/index.php?action=item&substart=0&id=85

Google won't follow it. As rogerd said, Long query strings and in particular session IDs are hindrances. (Some have found this out the hard way).

GoogleGuy

5:12 pm on Apr 13, 2004 (gmt 0)

We give equal love to .php and .html, and even .asp. :) We normally dislike "&id=yyy" parameters though because they are normally session IDs.

Some Google aficionados will probably notice that we're doing better on dynamic urls too. We always crawled dynamic urls with one or two parameters, but recently we've started to loosen those restrictions a little more.

Yes, I did use Google spellcheck for aficionado. I've gotten to where I just type something close like aficiando cuz I know it'll find it. :)

alik

7:09 pm on Apr 13, 2004 (gmt 0)

>Google won't follow it. As rogerd said, Long query
>strings and in particular session IDs are hindrances.
>(Some have found this out the hard way).

So if your site is built with session IDs in the URL,
how are you supposed to avoid them? Using cookies instead?

MikeBeverley

7:28 pm on Apr 13, 2004 (gmt 0)

Thanks Googleguy, it gets annoying when people keep announcing that you need to mod_rewrite because Google doesn't like dynamic pages.

How do you feel about mod_rewrite, given that it is aimed at making Googlebot think it's a static file when it's not?

ThomasB

9:07 pm on Apr 13, 2004 (gmt 0)

GG, is there a limit on the number of parameters based on PR? Is something like domain.com/script/?para1=none a problem?

Bones

11:13 pm on Apr 13, 2004 (gmt 0)

"We give equal love to .php and .html, and even .asp. :) We normally dislike "&id=yyy" parameters though because they are normally session IDs. "

...but .cgi pages (without any parameters) are still not crawled, and don't appear to have been for a long time.

MikeBeverley

8:59 am on Apr 14, 2004 (gmt 0)

.cgi pages are crawled, but they have to be linked to.

.cgi & .cgi?param= are in the Google index right now and show on results pages.

The reason you won't see many is because most people have a templated robots.txt file which excludes the cgi-bin or they use the cgi bin for external programming which will not be crawled by Googlebot ... yet ...

cgilvarry

9:36 am on Apr 14, 2004 (gmt 0)

Any suggestions on how to remove session id's from .php without using mod_rewrite to change the URL?

ThreeQuarks

1:21 pm on Apr 14, 2004 (gmt 0)

my widget product pages are of this format, and linked to from my front page:

widget1234.php?widgetshopcode=4321&session=ef25afda4d8b437182ada0d82a60d68e

google is picking them up, no problem - it's just dropping the session id variable at the end, which is fine.

if you don't have a session id, my site generates one for you and adjusts all the urls accordingly.

there is no need to use mod_rewrite - one of my new widget product pages was placed around 2 weeks ago, and googlebot has grabbed it. no problems.

rogerd

1:56 pm on Apr 14, 2004 (gmt 0)

Cgilvarry, the sessionIDs are in the PHP code. Usually, the alternative is to store session info in a cookie.
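
A minimal sketch of the cookie approach, assuming a PHP version that supports the session.use_only_cookies setting. The three ini settings are standard PHP session options; the helper name is made up for illustration:

```php
<?php
// Hypothetical helper: keep the session ID in a cookie and out of the URL.
// Must run before session_start() and before any output is sent.
function configure_cookie_only_sessions()
{
    ini_set('session.use_cookies', '1');      // store the session ID in a cookie
    ini_set('session.use_only_cookies', '1'); // ignore session IDs passed in the URL
    ini_set('session.use_trans_sid', '0');    // do not append the session ID to links
}

configure_cookie_only_sessions();
session_start();
```

Visitors with cookies disabled will not keep a session this way, so it is a trade-off rather than a free fix.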

ThreeQuarks, that's interesting - based on your report, it sounds like Google is learning how to distinguish session IDs from real query parameters. Nevertheless, based on past experience, I'd still avoid taking chances by feeding GB session IDs.

pageoneresults

2:02 pm on Apr 14, 2004 (gmt 0)

>There is no need to use mod_rewrite.

Yes there is. Think about the user first. Those long URI strings are not friendly at all. They usually break in certain email programs.

In addition to that, you don't want the user bookmarking session IDs or sending those session IDs as links to someone else.

Also, Google is not the only search engine. Other bots are not as smart as Googlebot is and those long query strings are going to be major roadblocks for getting content indexed.

Think outside the G! ;)

elklabone

3:11 pm on Apr 14, 2004 (gmt 0)

You don't need to use mod_rewrite ...

There are many good online tutorials that explain a simple way to make your URLs SE friendly with PHP... sticky me if you want the address...

On another note, it's also a good idea to turn off PHP sessions (if you use them) for spiders. I've found a simple way to do this is by using the following code.

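// Read the user agent from $_SERVER so this works without register_globals.
if (isset($_SERVER['HTTP_USER_AGENT']) && preg_match('/Mozilla/i', $_SERVER['HTTP_USER_AGENT'])) {
    // Start a session only for clients that identify as a browser.
    session_start();
}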

In other words, sessions will only start if the browser HTTP_USER_AGENT contains "Mozilla" somewhere in the identifier. All versions of Netscape/Mozilla and Internet Explorer (and even Opera) do this.

py9jmas

3:32 pm on Apr 14, 2004 (gmt 0)

>All versions of Netscape/Mozilla and Internet Explorer (and even Opera) do this.

It depends on the User-Agent setting within Opera. If Opera is set to report itself as Opera (like mine is) the User agent string is something like

Opera/7.23 (X11; Linux i686; U) [es]
or
Opera/7.23 (Windows NT 5.1; U) [en]
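
Given that, one alternative is to turn the test around: skip sessions only for user agents that look like spiders, instead of requiring "Mozilla". This is just a sketch. The function name and the token list are assumptions for illustration, and the list is nowhere near complete:

```php
<?php
// Hypothetical helper: true when the User-Agent contains a known spider token.
// The token list is an illustrative assumption, not an exhaustive registry.
function looks_like_spider($userAgent)
{
    $tokens = array('Googlebot', 'Slurp', 'msnbot');
    foreach ($tokens as $token) {
        // Case-insensitive substring match.
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

$userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

// Start a session for anything that does not look like a spider,
// so Opera identifying as Opera still gets one.
if (!looks_like_spider($userAgent)) {
    session_start();
}
```

The downside is that any spider missing from the list still gets a session ID, so the list needs maintaining.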

Jon.

slade7

4:02 pm on Apr 14, 2004 (gmt 0)

>If your php code looks like this:
>ht*p://www.yourdomain.com/store/index.php?action=item&substart=0&id=85
>Google won't follow it.

uh, yes they will unless your parameters are crazy long.

AthlonInside

4:20 pm on Apr 14, 2004 (gmt 0)

>If your php code looks like this:
>ht*p://www.yourdomain.com/store/index.php?action=item&substart=0&id=85
>Google won't follow it.

>uh, yes they will unless your parameters are crazy long.

Actually, what happens is that they will only follow a certain number of pages that have parameters, to avoid ending up in an endless loop.

GoogleGuy

5:51 pm on Apr 14, 2004 (gmt 0)

"Also, Google is not the only search engine. Other bots are not as smart as Googlebot is and those long query strings are going to be major roadblocks for getting content indexed."

Good point, pageoneresults. If you want your site to be crawled by as many search engines as possible, things like static urls always help, and if you can't do that then fewer parameters probably help.

jatar_k

6:07 pm on Apr 14, 2004 (gmt 0)

google has no problems with the php extension, don't think they have since I have been writing the pages, 4 yrs or so.

long get strings are not very spider friendly
long get strings are not very user friendly

spider friendly == user friendly

GET strings are just easy and make life easy for programmers who have trouble understanding the tenets of spiderability or just don't care. Those may be the same people who think "If I build it they will come".

Most of the time GET strings can be avoided or only used on pages you don't care about being spidered. If not, rewrite them, helps your users and helps the spiders.

your choice

Bones

10:57 pm on Apr 14, 2004 (gmt 0)

MikeBeverley, maybe you can sticky me an example search that returns a URL that *ends* in ".cgi" then please (no parameters).