| 1:27 pm on Oct 16, 2002 (gmt 0)|
You can edit your .htaccess file so that HTML pages are parsed as PHP pages.
Do a site search.
| 1:43 pm on Oct 16, 2002 (gmt 0)|
Ah, very good knighty! I wasn't even aware you could use PHP within .html pages. Thank you for the heads-up. Now I'm off to add that line to my .htaccess file.
| 1:45 pm on Oct 16, 2002 (gmt 0)|
Have a look at Changing index.htm to index.php [webmasterworld.com], and especially msg #12 [webmasterworld.com], for the code that tells Apache to parse .htm files as PHP code.
| 3:22 pm on Oct 16, 2002 (gmt 0)|
|I've read around here that people are having no SE problems with php sites. What are your views on this subject? |
Certainly nothing wrong with telling your server to process .html as .php, except some relatively minor overhead in that it will now parse all .html pages looking for PHP directives before serving. Since all my pages have PHP in them, this isn't a big deal for me, but it's something I've at least heard bandied about.
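To gauge that overhead on your own site, here is a minimal Python sketch (not from the thread) that lists which .html pages actually contain a PHP open tag, so you can see how many pages would go through the parser for nothing. It assumes pages use only the "<?php" and "<?=" forms; the bare short tag "<?" is deliberately skipped because it also matches "<?xml" declarations.

```python
from pathlib import Path

# Assumption: only long "<?php" and echo-style "<?=" tags are counted; the bare
# short tag "<?" is skipped because it would also match "<?xml" declarations.
PHP_OPEN_TAGS = ("<?php", "<?=")


def load_pages(root):
    """Read every .html file under root into a {relative_path: text} dict."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_text(encoding="utf-8", errors="replace")
        for p in root.rglob("*.html")
    }


def php_pages(pages):
    """Return the sorted names of pages containing at least one PHP open tag."""
    return sorted(
        name
        for name, text in pages.items()
        # PHP open tags are case-insensitive, so compare in lower case.
        if any(tag in text.lower() for tag in PHP_OPEN_TAGS)
    )


pages = {"index.html": "<p>Static page</p>", "order.html": "<?php echo date('Y'); ?>"}
print(php_pages(pages))  # ['order.html']
```

On a real site you would call php_pages(load_pages("/path/to/docroot")); if the list is short, mapping every .html file to PHP mostly buys you parsing cost.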
I do know that Google has definitely crawled and indexed some of my dynamic pages, based on the fact that my résumé is the number 3 hit for a search on the law firm I used to work at, and my contact info page is the number 3 hit for my name. No intentional SEO on either one, and they'll never show up without the right query string. The query string itself, however, has stayed the same for several years and across three different hosts.
| 3:39 pm on Oct 16, 2002 (gmt 0)|
Well, I added this line to my .htaccess file and changed the order.php to order.html and, imagine that, it worked;)
|AddType application/x-httpd-php .php .php3 .phtml .html |
Thanks for the help. Now I can have my cake and eat it too.
One quick, off-topic question: Do spiders follow form action links, or should I use a query string to pass info?
| 3:45 pm on Oct 16, 2002 (gmt 0)|
|Do spiders follow form action links |
Never seen that. They probably don't, since forms were not intended as navigational elements.
| 4:09 pm on Oct 16, 2002 (gmt 0)|
I believe there's something in the HTTP spec that says that while GET requests should return the same thing every time, POST responses are allowed to change. If so, a spider would at least be wasting its time following the "action" attribute of a form whose "method" is "post". No point in spidering something that's never going to be the same twice. It might not be a total waste to follow a form with a method of 'get' and only hidden inputs, but by that point it's easier to just write out the URL anyway.
| 4:13 pm on Oct 16, 2002 (gmt 0)|
Thanks, dingman. I'll switch to query strings.
| 6:26 pm on Oct 16, 2002 (gmt 0)|
dingman, you are probably referring to the fact that GET is considered to be idempotent [ftp.isi.edu] whereas POST is not.
While this is true, I would not expect spiders to follow the URI given in the action attribute, which specifies "[t]he program that will handle the completed and submitted form [...]. The receiving program must be able to parse name/value pairs in order to make use of them." ([w3.org]) Since a spider will know that the program expects parameters passed by the form, it will not follow that URI.
This behavior has nothing to do with whether the action URI contains a query string or not. (That a form submitted via the GET method has its data appended to the URL as a query string is just a coincidence. It cannot be used as an argument when deciding whether spiders will follow the URL given in the action attribute; it makes no sense for a spider to reach that decision based on a detail of the underlying transport protocol.) Furthermore, this behavior is entirely unrelated to the question of whether a spider will follow a link to a URI that contains a query string (which it will in most cases).
For navigation, use links, which spiders will follow. To receive user input, use forms, which spiders won't follow, since they have no valuable user input to contribute. (Not that they are aware of that, but they won't pretend to have something to contribute, since SEs have no interest in doing so.) Do this and you won't have any problems.
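A toy model of that split, sketched in Python with the standard library's HTMLParser: ordinary <a href> targets (query strings included) are queued for crawling, while <form action> targets are only recorded and never fetched. This is a simplified illustration, not the code of any real search engine; the base URL and page markup are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Queue <a href> targets for crawling; record <form action> targets
    separately without queuing them (a simplified spider model)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.to_crawl = []      # navigational links, query strings and all
        self.form_actions = []  # seen but not followed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.to_crawl.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "form" and attrs.get("action"):
            # A form expects user input the spider does not have, so skip it.
            self.form_actions.append(urljoin(self.base_url, attrs["action"]))


collector = LinkCollector("http://www.example.com/")
collector.feed('<a href="/products.php?id=42">Widget</a>'
               '<form action="/order.php" method="post"><input name="qty"></form>')
print(collector.to_crawl)      # ['http://www.example.com/products.php?id=42']
print(collector.form_actions)  # ['http://www.example.com/order.php']
```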
| 6:58 pm on Oct 16, 2002 (gmt 0)|
Andreas, my only point with 'get' vs 'post' was that a truly ambitious spider author might possibly decide to submit 'get' method requests if there wasn't any user input in the form. I didn't mean to imply that such a thing was likely - just that 'post' would be even more absurd to try to spider. If one is both going to use 'get' and use no form of input other than hidden, there is no reason to use a form at all, so you might as well use URLs with query strings, which are much more likely to be crawled. Sorry if I wasn't clear about that, and thank you for taking the time to make sure I didn't lead anyone astray.
Frankly, the only reason I can think of to make a spider do such a thing is if its intended purpose is off-line browsing, rather than SE indexing.
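To make the "truly ambitious spider" case concrete, here is a minimal Python sketch that finds GET forms whose only data comes from hidden inputs and builds the URL a browser would request on submit. It assumes the action attribute carries no query string of its own, ignores submit buttons (as if they had no name), and treats any visible input, select, or textarea as real user input, so such forms are skipped. The output is exactly the plain query-string link you could have written out instead.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class HiddenGetFormURLs(HTMLParser):
    """Build the submit URL for each GET form whose only fields are hidden."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []
        self._form = None  # state of the <form> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._form = {
                "action": urljoin(self.base_url, a.get("action") or ""),
                "method": (a.get("method") or "get").lower(),
                "fields": [],
                "hidden_only": True,
            }
        elif self._form is None:
            return  # element outside any form
        elif tag == "input":
            kind = (a.get("type") or "text").lower()
            if kind == "hidden":
                if a.get("name"):
                    self._form["fields"].append((a["name"], a.get("value") or ""))
            elif kind not in ("submit", "button"):
                # A visible field means the form expects real user input.
                self._form["hidden_only"] = False
        elif tag in ("select", "textarea"):
            self._form["hidden_only"] = False

    def handle_endtag(self, tag):
        if tag == "form" and self._form is not None:
            form, self._form = self._form, None
            if form["method"] == "get" and form["hidden_only"]:
                query = urlencode(form["fields"])
                self.urls.append(form["action"] + ("?" + query if query else ""))


page = HiddenGetFormURLs("http://www.example.com/")
page.feed('<form action="/catalog.php" method="get">'
          '<input type="hidden" name="cat" value="shoes">'
          '<input type="hidden" name="page" value="2">'
          '<input type="submit" value="Go"></form>'
          '<form action="/order.php" method="post">'
          '<input type="hidden" name="item" value="42"></form>')
print(page.urls)  # ['http://www.example.com/catalog.php?cat=shoes&page=2']
```

The POST form is never turned into a URL, matching the point that spidering 'post' would be even more absurd.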
|For navigation use links which the spiders will follow, to receive user input use forms which the spiders won't follow since they don't have to contribute valuable user input (which they are very well aware of. NOT really but they won't pretend they have something to contribute since SEs have no interest to do that.). Doing this you won't have any problems. |
|Furthermore this behavior is entirely unrelated to the question of whether a spider will follow a link to an URI that contains a query string (which it will in most cases) |
In what sorts of cases are spiders likely not to follow such links?
| 7:03 pm on Oct 16, 2002 (gmt 0)|
In my experience (far from gospel), spiders lose interest as the strings get longer. Three variables in the string seems to be where you get into trouble.
We all know for a fact that no query string is always better. So if that is the ideal, then the farther we get from that model, the more likely we are to find problems.
Clean URLs, unless absolute necessity says otherwise.
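If you want to check your own links against that rule of thumb, a small Python sketch will count query-string variables and flag URLs at or over the limit. The threshold of 3 is taken from the anecdote above and is an assumption, not a documented rule of any search engine; the example URLs are made up.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumption: threshold from the "3 vars" rule of thumb above, not any
# engine's documented behavior.
TROUBLE_AT = 3


def param_count(url):
    """Count the name=value pairs in a URL's query string (blank values included)."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def flag_risky(urls, trouble_at=TROUBLE_AT):
    """Return the URLs whose query strings carry trouble_at or more variables."""
    return [u for u in urls if param_count(u) >= trouble_at]


links = [
    "http://www.example.com/about.html",
    "http://www.example.com/item.php?id=42",
    "http://www.example.com/list.php?cat=shoes&color=red&size=9&page=2",
]
print(flag_risky(links))  # ['http://www.example.com/list.php?cat=shoes&color=red&size=9&page=2']
```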
| 7:10 pm on Oct 16, 2002 (gmt 0)|
hmm... might not bode well for my tendency to create sites where everything is a querystring on the end of the domain name.
Any idea whether using '/' as a delimiter instead of '&' makes any difference? Should I maybe take this branch of questioning to a more appropriate forum?
| 7:14 pm on Oct 16, 2002 (gmt 0)|
|whether using '/' as a delimiter instead of '&' makes any difference |
It definitely does. I have worked with sites that had great rankings, then moved to a dynamic setup with long query strings of 2 to 7 variables in the URL. That effectively removed every listing they had.
Simply rewriting the URLs in directory style solved the spidering problem. They still look 2 to 7 (or more) levels down in the directory structure, but the spidering issues are gone.
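Here is a minimal Python sketch of generating such directory-style links from existing query-string URLs. The mapping scheme is hypothetical: the script name minus ".php" becomes the first path segment, and each name/value pair becomes two more segments. Pairs with blank values are dropped and only the path is returned. The server still needs a matching rewrite rule to map these paths back to the script.

```python
from urllib.parse import parse_qsl, quote, urlsplit


def to_directory_style(url):
    """Turn /catalog.php?cat=shoes&page=2 into the path /catalog/cat/shoes/page/2.

    Hypothetical scheme: script name minus ".php" is the first segment, then
    one segment each for every name and value. Blank-valued pairs are dropped,
    and scheme and host are discarded.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith(".php"):
        path = path[: -len(".php")]
    segments = [path.rstrip("/")]
    for name, value in parse_qsl(parts.query):
        # Percent-encode each piece so a "/" inside a value cannot add a level.
        segments += [quote(name, safe=""), quote(value, safe="")]
    return "/".join(segments) or "/"


print(to_directory_style("http://www.example.com/catalog.php?cat=shoes&page=2"))
# /catalog/cat/shoes/page/2
```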
| 7:18 pm on Oct 16, 2002 (gmt 0)|
I concur with Adam: The more variables in the query string the higher the chance to be ignored by spiders.
And frankly I don't see any point in having a query string. One RewriteRule takes care of that. If you don't have mod_rewrite, install it. If your hosting company does not have it, switch hosts. If you are running IIS, you apparently have the money to buy an IIS rewriting engine.
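As a rough illustration of what that one RewriteRule does, here is a Python emulation under stated assumptions: the URL scheme (catalog/cat/NAME/page/N) and the script name catalog.php are hypothetical. The incoming clean path is matched against a regular expression and rewritten internally into the script URL with a rebuilt query string, so visitors and spiders only ever see the directory-style URL while the PHP script still receives its usual parameters.

```python
import re

# Hypothetical rule for one directory-style URL scheme. In .htaccess (after
# "RewriteEngine On") the equivalent would look roughly like:
#   RewriteRule ^catalog/cat/([^/]+)/page/([0-9]+)/?$ catalog.php?cat=$1&page=$2 [L]
# In .htaccess context the pattern is matched without the leading slash.
PATTERN = re.compile(r"^catalog/cat/([^/]+)/page/([0-9]+)/?$")
SUBSTITUTION = r"catalog.php?cat=\1&page=\2"


def rewrite(path):
    """Return the internal script URL for a matching path; otherwise the path unchanged."""
    match = PATTERN.match(path)
    if match is None:
        return path
    return match.expand(SUBSTITUTION)


print(rewrite("catalog/cat/shoes/page/2"))  # catalog.php?cat=shoes&page=2
print(rewrite("about.html"))                # about.html
```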
dingman, I didn't get that you were only talking about the truly ambitious spider author. ;)
| 7:29 pm on Oct 16, 2002 (gmt 0)|
And here I thought I was playing with delimiters just for the sheer fun of it ;)
|If you don't have mod_rewrite, install it. If your hosting company does not have it, switch hosts. |
I am my host, I have mod_rewrite installed, I just don't know how to use it :) Frankly, I'm so used to the control I get running my own servers that I don't think I could stand to switch to any form of hosting other than colocation.
How does one use rewrite rules to eliminate query strings?
| 7:31 pm on Oct 16, 2002 (gmt 0)|
There are a ton of threads around about mod_rewrite. I think most of them are in either the Technology or the General forum.