i think you can disable this default behavior within your apache config:
Options -Indexes
give it a try. to "fix" the already existing problem, try robots.txt for that url and maybe something with mod_rewrite. or just put an index.html into that directory, that should do the job.
is there custom coding involved with preventing activity like www.webmasterworld.com/?testfor404 from producing a 200/ok response?
going on the fact i am able to get to the main page of webmasterworld by trying that, i would think it's safe to ignore this problem?
i have spotted yahoo looking for urls like this... but if everyone (with php) experiences the same type of response, why would yahoo even bother trying to see the type of response any page would give?
also... say portions of your site are dynamic in this scenario:
- 10 products per page
- 100 products in stock
= that translates into 10 different pages.
*these pages are dynamic: index.php?page=1 and index.php?page=2 and so on..
*you sell 50 products, you are down to 5 pages.
*google, msn and yahoo are still hitting index.php?page=10 and getting a 200/ok response even though you don't have enough products to list past page 5
...what can be done to thwart the 200/ok responses on these types of pages? is it possible to custom-code into your website backend, or are you doomed for having the blasted little '?' in the urls?
A query string such as "?S=A" is not part of the URL-path, and by itself does not identify a specific resource as far as HTTP and the Apache server are concerned. Rather, it is a string of data attached to a URL, to be passed to the resource *at* the given URL -- in your example, your script at "/" -- the default index page of your site.
As such, only the base URL "/" can be checked for existence or non-existence by the server. If you want to check whether the script-generated 'page' identified by "S=" exists, then the script itself must perform that checking function.
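To make that split concrete, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `split_request` is hypothetical, and this only illustrates how the path and the query string separate. It is not how Apache itself is implemented.

```python
from urllib.parse import urlsplit

def split_request(url):
    """Return (path, query) for a URL.

    The path is what the server can check for existence;
    the query is just data handed to whatever lives at that path.
    """
    parts = urlsplit(url)
    return parts.path, parts.query

# The path is "/" (the default index page); "testfor404" is only data for it.
path, query = split_request("http://www.webmasterworld.com/?testfor404")
print(path, query)
```

Since "/" exists, the server answers 200 unless the script at "/" inspects the query and decides otherwise.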
You *can* use mod_rewrite code in .htaccess to return a 410-Gone response based on the query string, but that is a high-maintenance and error-prone approach. If you elect to follow this path (not recommended), see mod_rewrite's RewriteCond directive, which can be used to check the requested %{QUERY_STRING} available as a server variable.
The problem is that, from your description, it sounds like your query-string-based pages come and go frequently. Since it can often take a search engine many months (or even a couple of years) to finally recognize a 410-Gone or 404-Not Found response and stop asking for the obsolete page, your list of removed pages might grow very large, making maintenance difficult. (You would have a big problem deciding how long to leave each removed-page 410 in place; if you removed the 410 too soon and a search engine spider re-requested that page, then you'd have to add the code for that page back in, and start the clock for that removed entry all over again.)
I suggest that you modify the script to check the database to see if it can generate the requested page, and if not, return a 410-Gone response along with a custom error page that the visitor can use to find a similar product. (Explain that the product is no longer available and provide links on that 410 page pointing to your site map, home page, and product selector, for example.)
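A minimal sketch of that check for the paginated listing described earlier, written in Python for illustration (the same logic carries over to a PHP script). The names `PER_PAGE`, `last_page` and `status_for_page` are hypothetical, and `product_count` is assumed to come from a database query that is not shown. Treating page 1 as always present and returning 404 for malformed page values are also assumptions, not something stated in this thread.

```python
import math

PER_PAGE = 10  # products per page, as in the example scenario

def last_page(product_count, per_page=PER_PAGE):
    """Highest page number that still has products to list."""
    # Assumption: page 1 always exists, even when the listing is empty.
    return max(1, math.ceil(product_count / per_page))

def status_for_page(raw_page, product_count, per_page=PER_PAGE):
    """Pick the HTTP status for a request like index.php?page=<raw_page>."""
    try:
        page = int(raw_page)
    except (TypeError, ValueError):
        return 404  # not a page number at all
    if page < 1:
        return 404  # never a valid page
    if page > last_page(product_count, per_page):
        return 410  # past the end of the listing: serve the custom error page
    return 200

print(status_for_page("10", 100))  # 100 products: page 10 exists
print(status_for_page("10", 50))   # 50 products: page 10 is gone
```

Because the page count is derived from the current stock on every request, there is no list of removed pages to maintain by hand.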
Jim