Forum Moderators: coopster
If you have a simple URL like http://www.example.com/myphppage?id=12 then it is probably OK. It is possible to use mod_rewrite in Apache to have this URL show up as http://www.example.com/myphppage/id/12, which avoids any possibility of search engines not being able to index your link, but unless your URLs are complex (i.e. they pass many variables at a time) I don't think this is necessary these days.
On the other hand, look at this very forum. I'll almost guarantee that Brett does not keep each thread in a separate file, so the link:
[webmasterworld.com...]
is most likely a mod_rewrite of:
[webmasterworld.com?forum=88&thread=8501...]
or something along those lines.
If it's good enough for WebmasterWorld it's probably the right way for you to jump too.
HTH,
BAD
[edited by: jatar_k at 9:09 pm (utc) on June 15, 2005]
[edit reason] examplified [/edit]
When you access a file in the format of file.php/var/val, PHP sets a $_SERVER['PATH_INFO'] entry holding that extra path. All you need to do is split it up and store it in an array as key/value pairs. You can then merge this into your $_GET superglobal array and voila, it's like mod_rewrite, only it doesn't need specific rules for it :)
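A rough sketch of that split-and-merge step, in case it helps. The function name path_info_to_params is made up for illustration, and the sketch assumes the path always alternates key/value (so /id/12/colour/blue, not /12/blue). It also assumes the server has already URL-decoded PATH_INFO, so no extra decoding is done here.

```php
<?php
// Hypothetical helper: turns "/id/12/colour/blue" into
// ['id' => '12', 'colour' => 'blue'].
function path_info_to_params(string $pathInfo): array
{
    // Split on "/" and drop empty segments (leading, trailing or doubled slashes).
    $segments = array_values(array_filter(
        explode('/', $pathInfo),
        function ($segment) { return $segment !== ''; }
    ));

    $params = [];
    // Walk the segments two at a time: key first, then value.
    for ($i = 0; $i + 1 < count($segments); $i += 2) {
        $params[$segments[$i]] = $segments[$i + 1];
    }

    // A trailing key with no value is ignored.
    return $params;
}

// PATH_INFO is only set when the request has extra path after the script name.
$pathInfo = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';

// Array union: real query-string values win over path values.
// Swap the operands if the path should take priority instead.
$_GET = $_GET + path_info_to_params($pathInfo);
```

With this at the top of file.php, a request for file.php/id/12 should leave $_GET['id'] set to '12', the same as file.php?id=12 would.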
This used to be true more than it is today. If you have a simple url like http://www.example.com/myphppage?id=12 then it is probably OK. It is possible to use mod_rewrite in Apache to have this URL show up as http://www.example.com/myphppage/id/12 to avoid any possibility of search engines not being able to index your link but, unless your URLs are complex (i.e. pass many variables at a time) I don't think this is necessary these days.
Can someone give a brief explanation of how to do this? I started looking into it and found this thread on WW that says you cannot rewrite the URL in the way described above.
[webmasterworld.com...]
Thanks
Don't use "&id=" as a parameter in your URLs, as we don't include these pages in our index.
If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
I've seen on this forum that it is possible to use mod_rewrite to change a URL from say http://www.example.com/index.html?var=bluewidget
into:
http://www.example.com/bluewidget/
BUT, I've also read on this forum that mod_rewrite can't rewrite a URL that doesn't exist. So if I wanted to do this, then www.mysite.com/bluewidget/index.html would still have to exist as a file.
The two explanations I've read, as you can see, very much contradict each other. So I'm at a bit of a loss as to whether this can or cannot be done. I'll be researching it myself as soon as I find time, but until then I'll hold out the hope that someone here will help me out :)
[edited by: jatar_k at 9:07 pm (utc) on June 15, 2005]
[edit reason] changed to example.com [/edit]
Another, different question: if URLs are passing session data, how bad is that, and how can I keep session variables out of the URLs without changing application code? ;)
anshul: how a page is refreshed depends on the server headers. This is a part of the page you can't see just by viewing source. Google 'live http headers' and you'll find a number of sites that let you see the HTTP headers of whatever page you want; the Firefox webdev toolbar also has this facility. Very important when you get further into web development.
Normally, Apache does a great job returning info on .html pages without you knowing anything about all the hard work it's doing. It sends a bit of data out in an 'ETag' header, and sends out the date the page was last modified. When you want to look again at a page that's in your browser's cache, your browser sends back either this ETag info or the last-modified date. Apache looks at this, and can tell if the page has been modified or not. If not, it just sends back a tiny bit of info, 'no, your cache is still valid'. This isn't a whole HTML page, so it happens real nice and fast.
PHP won't give you any of this information unless you ask it to. Since the browser doesn't have any information on how long the page stays fresh, it just asks for the whole page again, and PHP serves up the whole entire page, again. You can add smart cache-headers to your scripts, but this takes some thinking. There are also drop-in cache options like jpcache.
If you use mod_rewrite to make your .php pages look like .htm pages, or make PHP parse .htm pages, this situation doesn't change - the .htm page is still served by PHP, and won't give any significant cache headers. This can be good or bad, depending on the page - if your pages don't change much, it's bad, if they're always updated with fresh stuff, it's good.
Search engines are getting much better at indexing pages with parameters.
Session IDs: see the ini settings session.use_cookies and session.use_only_cookies.
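For what it's worth, a minimal sketch of setting those at runtime, assuming you can touch one shared include that runs before session_start(). The session.use_trans_sid line is my addition (it is a real PHP setting, but not one named above). If you really can't change any code, the same setting names can go in php.ini instead.

```php
<?php
// These must be set before session_start() is called.
ini_set('session.use_cookies', '1');      // keep the session ID in a cookie
ini_set('session.use_only_cookies', '1'); // ignore session IDs passed in the URL
ini_set('session.use_trans_sid', '0');    // don't rewrite links to append the session ID

// session_start() would follow here in the real page.
```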
You can put cache control into .php pages so that they will return a 304 response code. However, many search bots don't even seem to send an If-Modified-Since header, and some don't send it even when getting .htm files.
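Something along these lines is what I mean by returning a 304 from PHP. It's only a sketch: the helper name is_not_modified is made up, and it assumes the page content changes only when the script file itself changes, which won't hold for database-driven pages (there you'd use the timestamp of your newest record instead).

```php
<?php
// Hypothetical helper: true when the browser's cached copy is still current.
function is_not_modified(int $lastModified, ?string $ifModifiedSince): bool
{
    if ($ifModifiedSince === null || $ifModifiedSince === '') {
        return false; // the client sent no date, so send the full page
    }
    $since = strtotime($ifModifiedSince);
    // An unparseable date counts as "no date".
    return $since !== false && $since >= $lastModified;
}

// Assumption: this script's own modification time stands in for the content's.
$lastModified = filemtime(__FILE__);
$ifModifiedSince = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? $_SERVER['HTTP_IF_MODIFIED_SINCE']
    : null;

if (is_not_modified($lastModified, $ifModifiedSince)) {
    // Tell the client its cached copy is still good and send no body.
    header('HTTP/1.1 304 Not Modified');
    exit;
}

// Otherwise advertise the date so the client can ask conditionally next time.
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
// ... output the page as usual ...
```

This has to run before any output is sent, since it sets headers.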
Googleguy said it would help with Googlebot. The first page I tested it on didn't work with Googlebot, but I now suspect that was because it is my main page, which is normally reached via a 301 redirect. I'm trying on another page now, but results aren't yet in ;-)