
If I Use PHP Will Google Still Like Me? Part 2

         

trebor

4:28 pm on Mar 28, 2002 (gmt 0)

10+ Year Member



Continued From: [webmasterworld.com...]


Here are a few tips:

1) The advice about rewriting urls with ? in them is sound. Only use ? on pages you don't want spiders to mess around in. And only use rewriting for pages you DO want the spiders to see. WebmasterWorld should use it, for example!

2) Hide all access to dynamic content you don't want spidered behind POST-based forms. The classic example is shopping carts. You don't want googlebot or anyone else browsing your entire site and buying everything!

3) Avoid what I call "Stupid ASP tricks". The classic one is a page redirecting to itself to set a cookie. Or using javascript for essential site functions. These confuse spiders.

4) A huge mistake that a lot of sites make is setting an individual context as soon as they get a visitor (ie: redirecting to a URL with ?id=whatever). This is a variant of the cookie-setting stupid ASP trick. Sites should only establish an individual context for a user when they absolutely need to (eg: when the user adds the first item to their shopping cart).
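Tip 1 above can be sketched as a mod_rewrite rule (a minimal .htaccess sketch; /article.php and the URL pattern are made-up examples, not anything from this thread):

```apache
# Map static-looking URLs like /articles/42.html onto the
# real dynamic script /article.php?id=42, so spiders never
# see a "?" in the links you want them to follow.
RewriteEngine On
RewriteRule ^articles/([0-9]+)\.html$ /article.php?id=$1 [L]
```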

Every part of your site that you want spiders to see ought to be viewable with a minimally configured browser. No java, no scripting, no DHTML.

Best
R

rpking

2:21 pm on Mar 29, 2002 (gmt 0)

10+ Year Member



Is there a quick and easy test to see whether or not mod_rewrite is turned on? My host states that it is but I can't get gethan's example to work. I just get a 404 returned for the static URL request.
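One quick sanity check (assuming Apache with .htaccess overrides allowed) is a trivial rule that rewrites a made-up path to a page you know exists:

```apache
# In .htaccess: if mod_rewrite is active, requesting
# /rewrite-test should serve index.html instead of a 404.
RewriteEngine On
RewriteRule ^rewrite-test$ /index.html [L]
```

If /rewrite-test still returns a 404, the usual suspects are the module not being loaded at all, or the host's AllowOverride setting ignoring your .htaccess rewrite directives.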

Brett_Tabke

7:50 am on Mar 31, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Gethan, I'm going to try it without the Webmaster World in titles for a while and see what happens (it really messed with the site search engine on kw's with "webmaster" in the searches). Or if you meant just that we take time to make sure titles reflect the post content, then ya - we do that all the time now - too many posts not to.

Kapow, that data is deprecated. I don't think Google has as much of a problem with PHP or any dynamic URL as they used to. (just keep them low key without too many parameters)

trebor, I can see your point. If it were just search engines I was concerned about, I'd agree with you 100%. However, rogue spiders are the worst. The random ? URLs with browser-cache-busting random strings have the effect of deterring the rogues. It's also why only about 50% of the site is accessible to unauthorized spiders (ya, I cloak out of self preservation).

DenRomano

12:50 am on Apr 4, 2002 (gmt 0)

10+ Year Member



I use PHP but I have stayed away from parsing the URL with "/", as it seemed to me that Google decreased the PageRank for each sub-directory. If this is true, would it be better to

[domain.com...]

or

[domain.com...]

Would Google not give me a 1-point PageRank decrease for each directory deep if I let it find the "default" index.html program?

gethan

1:37 pm on Apr 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



DenRomano: Welcome to the boards. The effect you see of Google decreasing PageRank one point per sub-dir is actually not quite as simple as that. Where Google doesn't know of any links to a particular page, the googlebar returns an estimate for the PageRank; that estimate is basically minus one point per level of depth, based on the PR of the domain's home page.

Where links are known, Google calculates the PR in a more complex way (based on the number of links and the PR of the pages on which they appear) - it's possible to have sub-directories with much higher PRs than the homepage.
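The point that known links can give a deep page more PR than the homepage can be illustrated with a toy power-iteration sketch of the published PageRank formula (the four-page site below is entirely made up, and this is of course not Google's actual implementation):

```python
# Toy PageRank by power iteration:
#   PR(p) = (1 - d)/N + d * sum(PR(q) / outdeg(q)) over pages q linking to p.
# In the made-up graph below, every page links to /sub/popular.html,
# so that sub-directory page ends up outranking the homepage "/".
def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}          # uniform starting scores
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Sum the rank flowing in from every page that links to p.
            inbound = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new[p] = (1 - d) / n + d * inbound
        pr = new
    return pr

links = {
    "/": ["/a.html", "/sub/popular.html"],
    "/a.html": ["/sub/popular.html"],
    "/b.html": ["/sub/popular.html", "/"],
    "/sub/popular.html": ["/", "/a.html"],
}
ranks = pagerank(links)
```

Running this, /sub/popular.html scores higher than "/" despite sitting a directory deeper - depth doesn't matter once real links are known.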

Brett: Yep, and the side effect is that your site search works better and search engines index your pages appropriately. All because you've taken the time to get the information before returning the headers. There are so many forums where the title is "powered by myforum" for every post - in fact I'm about to customise a forum to get around these limitations very soon.

rpking: Put a post in the serverside or website technologies forums and we'll see if we can get to the bottom of the mod_rewrite prob.

Mittoni

12:41 am on Apr 8, 2002 (gmt 0)



This is a great thread!

My concern (and that of my server admin) is that mod_rewrite is a security risk, since it requires FollowSymLinks.

Are there ways of overcoming the security risks of mod_rewrite? This method seems to be the most elegant way of serving PHP with queries.

In our case it provided a doorway for hackers to fiddle with the website and database.
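One possible mitigation worth checking (an assumption to verify against your Apache version, since support has varied): per-directory rewrites may also be allowed under the stricter SymLinksIfOwnerMatch option, which only follows a symlink when it and its target have the same owner:

```apache
# Stricter than "Options +FollowSymLinks": symlinks are only
# followed when link and target share the same owner, which
# blocks users symlinking into other accounts' directories.
Options +SymLinksIfOwnerMatch
RewriteEngine On
```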

DenRomano

1:55 am on Apr 8, 2002 (gmt 0)

10+ Year Member



I do not use mod_rewrite at all. Set up a 404 error page that is a PHP program that parses the URL and includes the correct file.
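This trick boils down to one directive (handler.php is a hypothetical name; the script would inspect the requested path from the server variables, include the matching content, and send a normal 200 response):

```apache
# Route every not-found URL to a PHP script that parses the
# requested path and serves the right content itself.
ErrorDocument 404 /handler.php
```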

ahmad

7:19 pm on Apr 9, 2002 (gmt 0)

10+ Year Member



So, by setting up a 404 page, doesn't it show the 404 page's URL instead of the real one?

Say I want to redirect foobar.com/blah to foobar.com/index.html?dir=blah, and I forward everything to foobar.com/redirect - what's shown in the browser's address bar when a user types foobar.com/blah?

DenRomano

9:40 pm on Apr 9, 2002 (gmt 0)

10+ Year Member



The URL in the browser bar shows whatever page was requested by the browser; it never knows the server served a different page.

ahmad

10:09 pm on Apr 9, 2002 (gmt 0)

10+ Year Member



I think there's an exception to your statement, DenRomano:

If the error page is defined as a full URL on another domain, like [anotherdomain.com,...] the browser shows the new address.

gethan

9:04 pm on Apr 10, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mittoni: Welcome to the boards,

I tried to follow up your query on FollowSymLinks being a security risk, but was unable to find anything newer than 1998. I would hope that any holes would have been fixed by now - 4 years on!

If it is still a risk then I'd be interested in knowing more.

DenRomano and Ahmad: The discussion here provides some additional ways of providing mod_rewrite style functionality without mod_rewrite - [webmasterworld.com...]

One of them is the custom 404 script - I used to use that method ;)

alxdean

11:12 pm on Jun 25, 2002 (gmt 0)

10+ Year Member



Not sure if anybody is going to pick this one up, as this topic seems to be done and chewed by now, but I still have a question for you all out there.

I am designing a website which has purely dynamic content. The site has an online input interface for editors to edit content with 0% HTML knowledge, and each page can change from one second to the next. Thus I had to build in a way of ensuring that the user sees the newest possible version of the page and not a cached copy from his/her hard disk. So I integrated a datetime stamp into each link to ensure that no page would be cached. In some cases I even added a ?timestamp to img tags to ensure the right picture would show. I have based my server-side scripting on ASP. Yes, ASP. Don't go cursing me - it has done the trick for me so far.

But this also means that my URLs are absolutely spider unfriendly.

Now, has anybody actually come across this problem: each page must not be cached on the user's machine under any circumstances and must be reloaded from the server, but you want to be spider-friendly as well?
And if so, found an answer other than putting hidden links into a div style=display:none tag to allow spiders to crawl without timestamps?

korkus2000

11:18 pm on Jun 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld alxdean

I eliminate browser and server caching with this code at the top of my asp pages.
<%
' Buffer the response so headers can still be set
Response.Buffer = True
' Eliminate browser and proxy caching
Response.Expires = -1                          ' expire immediately
Response.ExpiresAbsolute = Now() - 2           ' absolute expiry date in the past
Response.AddHeader "pragma", "no-cache"        ' for HTTP/1.0 proxies
Response.AddHeader "cache-control", "private"  ' shared caches must not store
Response.CacheControl = "no-cache"             ' for HTTP/1.1 browsers
%>

This way you won't have to use strange query-string links.
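For reference, the ASP calls above amount to sending response headers roughly like these (a sketch only - the exact Expires date varies per request, and the precise header set depends on the IIS/ASP version):

```
Expires: Mon, 24 Jun 2002 23:18:00 GMT
Pragma: no-cache
Cache-Control: no-cache
```

An Expires date in the past plus Pragma/Cache-Control no-cache covers both HTTP/1.0 and HTTP/1.1 caches, which is why the combination works where Response.Expires alone does not.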

There is also an excellent ASP forum with some really knowledgeable people. All NT-related stuff is discussed, including ASP 3.0 and .NET
[webmasterworld.com...]

alxdean

1:39 am on Jun 26, 2002 (gmt 0)

10+ Year Member



korkus2000, you're a genius.
And many, many happy thanks and compliments to whoever created this message board. I have never in my 5 years of internet experience seen a response this fast on a closed techie topic.

I tried the Response.Expires one before, and the ExpiresAbsolute one too, but it did not quite work on all browsers or platforms. I did not try your combination of buffering, Expires, ExpiresAbsolute, cache control and header modification. I can see that you have had many sleepless nights over this problem too.

Should this work, I will credit you on every page where I use this combination. Promise. Because you have just saved me about 50 sleepless nights. Wish I had figured this out earlier, as by now I must have added timestamps to about 500 links in my website.

feedmenow

7:12 am on Sep 9, 2002 (gmt 0)



I hear what several people have said about Google being okay with PHP. But I think that might only be true if you have a *mix* of PHP and HTML pages.

Case in point: my site (virginiaartists.net) started out as all html, evolved to a few php pages, and recently went to all php. That's right, every stinkin page. (Why? I like include and mysql.)

As each step of the switch from html to php extensions went online, I noticed the number of indexed pages dropping. At first I thought they didn't like something else about my site (who knows?). Now that all the pages have been converted, I have dropped from 120+ indexed pages to 1. Just the home page. That's it.

Now I'm going to try the AddType trick to parse html as php, and change all the extensions, again! Betcha anything the pages get indexed around the 20th...
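The "AddType trick" referred to is the classic directive that makes Apache run .html files through the PHP parser (the exact MIME type / handler name varies by PHP version and install, so treat this as a sketch to check against your own setup):

```apache
# Parse .html (and .htm) files as PHP so URLs can keep
# their old static-looking extensions.
AddType application/x-httpd-php .html .htm
```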

BTW: Does anyone make a tool that will evaluate a site for Google niceness?

darex

4:00 pm on Nov 29, 2002 (gmt 0)

10+ Year Member



Does anybody know if it is possible to replicate this masking of PHP query strings on W2K and IIS (for PHP running on IIS)?