Can I feed the spiders PHP in disguise?

Will this .htaccess trick affect spidering?

   
7:28 pm on Nov 27, 2002 (gmt 0)

10+ Year Member



As we all know, whether or not dynamic pages with .php/.asp/.cgi extensions are treated the same as plain .html files by the spidering search engines is a matter of constant debate.

I've recently discovered a very handy method of forcing your server to treat .html files as if they were .php by adding the following line to an .htaccess file:

AddType application/x-httpd-php .php .html

The result is that you can create .html files with embedded PHP scripts and they are parsed in exactly the same way as .php files.

What I want to know is: will a spidering engine such as Google or Inktomi be able to differentiate between a normal .html page and one that has been through the PHP interpreter? Both appear as plain HTML in the browser, but will this affect spidering?

FYI - none of these pages will have variable=value pairs passed in the URLs such as index.html?prod=23.

8:36 pm on Nov 27, 2002 (gmt 0)

10+ Year Member



It is possible that some header may be sent back with the page that could give it away.

The one thing I can think of is that the Last-Modified header won't be sent (by default) with your processed pages. That's a decent way to tell whether a page was preprocessed or static.

See "Are you using If Modified Since?" [webmasterworld.com] for some useful info.

11:12 pm on Nov 27, 2002 (gmt 0)

10+ Year Member



Google has no problem indexing a .php page!

10:15 am on Nov 28, 2002 (gmt 0)

10+ Year Member



Thanks for the responses. Most of my pages have .php extensions and so far I've not had any problems getting them indexed.

I was just wondering if some spiders might take two identical pages, one with .php and one with .html, and give the latter a better ranking.

My other concern was that a spider might detect that it was being served a dynamic page even though it had a .html extension, and penalise the page for it.

The 304 page header info was good, thanks slade.

12:02 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



This method can kill a busy server though, as every .html page will now be passed through PHP.

Most search engines read PHP pages perfectly well, since they get the same HTML that a web browser does (i.e. what you see in View Source is what the search engine spiders see).

Allen

1:53 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



You can be quite specific about which pages are parsed by the PHP interpreter, since an .htaccess file only affects files in its own directory and the subdirectories below it.

But I hear what you're saying. Processing true HTML files when they don't contain any PHP is not only pointless but also a drain on server resources.

Should probably be used with care.
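
One way to narrow this further is to name the individual files that should go through PHP, rather than mapping the whole .html extension. The following is only a sketch: the file names are made up for illustration, and it assumes PHP is installed as an Apache module that responds to the application/x-httpd-php type, as the AddType line earlier in the thread implies. Other PHP setups (CGI, for instance) may need a different directive.

```apache
# Hypothetical .htaccess sketch. "index" and "products" are placeholder
# file names, not taken from the thread.
# Assumption: PHP runs as an Apache module keyed on the
# application/x-httpd-php type, as with the AddType line above.

# Only the named .html files are handed to PHP. Every other .html file
# covered by this .htaccess is still served as a plain static file.
<FilesMatch "^(index|products)\.html$">
    ForceType application/x-httpd-php
</FilesMatch>
```

Since FilesMatch tests only the file name, this also catches files with the same names in subdirectories. Used instead of the blanket AddType line, it keeps static pages out of the interpreter, which addresses the load concern raised above.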