Forum Moderators: coopster & jatar k

Can I feed the spiders PHP in disguise?

Will this .htaccess trick affect spidering?

7:28 pm on Nov 27, 2002 (gmt 0)

10+ Year Member

As we all know, whether spidering search engines treat dynamic pages with .php/.asp/.cgi extensions the same as plain .html files is a matter of constant debate.

I've recently discovered a very handy method of forcing your server to treat .html files as if they were .php by adding the following line to an .htaccess file:

AddType application/x-httpd-php .php .html

The result is that you can create .html files with embedded PHP scripts and they are parsed in exactly the same way as .php files.

What I want to know is whether a spidering engine such as Google or Inktomi will be able to differentiate between a normal .html page and one that has been through the PHP interpreter. Both appear as plain HTML in the browser, but will this affect spidering?

FYI - none of these pages will have variable=value pairs passed in the URLs such as index.html?prod=23.

8:36 pm on Nov 27, 2002 (gmt 0)

10+ Year Member

It is possible that some header may be sent back with the page that could give it away.

The one thing I can think of is that the Last-Modified header won't be sent (by default) with your processed pages. That's a decent way to tell whether the page was preprocessed or static.

See "Are you using If Modified Since?" [webmasterworld.com] for some useful info.
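
To illustrate the Last-Modified point, here is a minimal sketch of how a PHP-processed page could send the header itself and answer If-Modified-Since requests with a 304. It assumes a reasonably modern PHP (7.1 or later); `conditional_status()` and `send_cache_headers()` are hypothetical helper names, not built-in functions.

```php
<?php
// Sketch only: conditional_status() and send_cache_headers() are hypothetical
// helper names chosen for this example, not part of PHP itself.

/**
 * Decide whether a request can be answered with 304 Not Modified.
 * $lastModified is a Unix timestamp; $ifModifiedSince is the raw request header.
 */
function conditional_status(int $lastModified, ?string $ifModifiedSince): int
{
    if ($ifModifiedSince === null) {
        return 200; // no conditional header: send the full page
    }
    $since = strtotime($ifModifiedSince);
    if ($since === false) {
        return 200; // unparseable date: play safe and send the full page
    }
    return $lastModified <= $since ? 304 : 200;
}

/**
 * Send Last-Modified and, where appropriate, a 304 status.
 * Must be called before any output. Returns true when the body can be skipped.
 */
function send_cache_headers(int $lastModified): bool
{
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    $status = conditional_status($lastModified, $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null);
    if ($status === 304) {
        http_response_code(304);
        return true;
    }
    return false;
}
```

At the very top of a page, before any HTML, something like `if (send_cache_headers(filemtime(__FILE__))) { exit; }` would use the file's own modification time. That is only a stand-in: if the page pulls its content from a database, the real "last changed" time has to come from the data.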

11:12 pm on Nov 27, 2002 (gmt 0)

10+ Year Member

Google has no problem indexing a .php page!

10:15 am on Nov 28, 2002 (gmt 0)

10+ Year Member

Thanks for the responses. Most of my pages have .php extensions and so far I've not had any problems getting them indexed.

I was just wondering if some spiders might take two identical pages, one with .php and one with .html, and give the latter a better ranking.

My other concern was that a spider might detect that it was being served a dynamic page even though it had a .html extension, and penalise it for that.

The info on the 304 response and the If-Modified-Since header was good, thanks slade.

12:02 pm on Nov 28, 2002 (gmt 0)

10+ Year Member

This method can kill a busy server though, as every .html page will now be passed through PHP.

Most search engines read PHP pages perfectly fine, since they get the same HTML a web browser does (i.e. what you see with View Source is what the search engine spiders see).


1:53 pm on Nov 28, 2002 (gmt 0)

10+ Year Member

You can be quite specific about which pages are parsed by the PHP interpreter, since an .htaccess file only affects files in its own directory and the subdirectories below it.

But I hear what you're saying. Processing true HTML files that don't contain any PHP is not only pointless but also a drain on server resources.

It should probably be used with care.
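
One way to narrow it down further, sketched here as an assumption rather than something tested on this setup: wrap the directive in a FilesMatch block so that only named pages go through PHP. The file names below are made up, and whether the application/x-httpd-php type is honoured depends on how PHP is installed on the server.

```apache
# Hypothetical file names: only index.html and products.html are handed to PHP;
# every other .html file in this directory is served as static HTML.
<FilesMatch "^(index|products)\.html$">
    ForceType application/x-httpd-php
</FilesMatch>
```

This keeps the plain .html files out of the interpreter, which addresses the server load concern raised above.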

