

Remove php files from Google Index

How to remove php files from Google index

     
10:51 pm on Jul 13, 2006 (gmt 0)

New User

10+ Year Member

joined:June 16, 2005
posts:5
votes: 0


Hi guys - I am getting concerned and need some immediate help.
I use a PHP redirect script for affiliate links on my website (for example, example.com/redirect.php?1) that redirects to merchants. Obviously, I do not want these pages indexed, but when I look up the search results from my site in Google, they appear. In my robots.txt file, I have specified that PHP files should not be indexed, yet they still appear in Google's index. Does anyone know how I can go about fixing this issue?
6:48 am on July 14, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 4, 2004
posts:877
votes: 0


I have a similar problem: I've excluded entire directories and three weeks later they are still there. A few of these results haven't even existed for months... good to know that if it were sensitive information I could easily get it removed promptly...
6:55 am on July 14, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Mar 4, 2003
posts:274
votes: 0


Use the remove-URL tool. It is quite a boring and painful solution, but it worked for me when I was working for my previous firm. We removed some 2000 URLs.
7:04 am on July 14, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 27, 2003
posts:570
votes: 0


"I use a PHP redirect script for affiliate links on my website
(for example, example.com/redirect.php?1) that redirects to merchants. Obviously, I do not want these pages indexed, but when I look up the search results from my site in Google, they appear. In my robots.txt file, I have specified that php files should not be indexed, but yet they appear in Google's index. Does anyone know how I can go about fixing this issue? "

Had some problems with these a while back. Are you using a 301 or a 302 redirect? Are they appearing as URL-only listings or are they fully indexed? If they are fully indexed, what content are they indexed with? A year ago I saw 301 tracking redirect URLs indexed with the content of the page the redirect link was on. Just a funny thing I saw happen. If they are URL only, then more than likely it shows that Google knows about the URL, since it is linked from a public page, and simply does nothing with it.

There are some form pages that I use for discussion posts that share the same script and URL and differ only in the query string. I just put the forms in a separate directory and disallow the whole directory, plus a disallow on the page itself to be safe. Works really well. There are many variations due to query strings, and Google knows about those URLs since they are on public pages (they show in site-maps and one version shows in the site: command), but it does not index any of the contents of those pages.

You could also be safe and put rel="nofollow" on the anchor tags containing the links. Keep in mind that Google will still know of the URL, but again it should do nothing with it, and if any were displayed they should be URL only.

7:14 am on July 14, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 19, 2002
posts:3171
votes: 8


>>In my robots.txt file, I have specified that php files should not be indexed

If I understand you properly, you can't do this.

7:19 am on July 14, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 3, 2005
posts:1585
votes: 0


Adding the rel=nofollow attribute to the anchor links may help, but Google seems to ignore that at times.

If the URL structure is example.com/redirect.php?N, with N being some number, I would hide the link using JavaScript, e.g. href="javascript:redir(N);", where the "example.com/redirect.php?" prefix is encoded inside the JavaScript function "redir". This should prevent a recurrence.

8:08 am on July 14, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 27, 2003
posts:570
votes: 0


The rel=nofollow attribute is just a bit of added security -- a "just in case" type thing.

JavaScript can work; just keep in mind that Google can crawl, or attempt to crawl, any URL or partial URL it finds within the JavaScript.

"example.com/redirect.php?" within the JavaScript can be picked up and requested by Google. If 100 pages carry the script, then 100 pages are "linking" to this URL (despite it being outside any href attribute), whether or not the page actually exists. Something to keep in mind, as I know for a fact that Google has done this.

11:36 am on July 14, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 3, 2005
posts:1585
votes: 0


"example.com/redirect.php?" within the javascript can be attempted to be crawled by google

If this is an issue, use some simple encoding of the URL or place the JavaScript in an external file.

4:27 pm on July 14, 2006 (gmt 0)

New User

10+ Year Member

joined:June 16, 2005
posts:5
votes: 0


The pages are indeed indexed as URLs only - does this make it less of a worry? And I was under the impression that using the line "disallow: /*.php$" in my robots.txt file would prevent them from indexing php files... was I wrong?
10:59 pm on July 14, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 27, 2003
posts:1642
votes: 0


Once a page is indexed, adding an entry to robots.txt to cover it will not delete it, although it should eventually be moved to the supplemental index.
You have to use the URL removal tool to get it out of the index once it's in. :(
2:19 am on July 15, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 3, 2005
posts:1585
votes: 0


using the line "disallow: /*.php$" in my robots.txt file would prevent them from indexing php files.

[robotstxt.org...] "*" and "$" do not have any special meaning within the context of "disallow" as far as I can make out. Does anyone have a better reference?
12:53 pm on July 19, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 27, 2003
posts:1642
votes: 0


Google is non-standard and recognises the * operator
5:38 pm on July 19, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


You can use the * in the disallow statement only in the section where the user-agent is "Googlebot".
1:58 am on July 20, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 3, 2005
posts:1585
votes: 0


And has the "$" any special meaning?
5:59 pm on July 20, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


It does, but I can't remember what it is right now.

Google's helpfiles have more on this..

6:27 pm on July 20, 2006 (gmt 0)

New User

5+ Year Member

joined:Mar 25, 2006
posts:7
votes: 0


The $ marks the end of the URL, so
/*.php$
means everything that ends with .php
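
To make the difference concrete, here is a rough Python sketch of how a Google-style pattern could be checked against a URL. It assumes that "*" matches any run of characters, that a trailing "$" anchors the end of the URL, and that the query string counts as part of what is matched. The function names pattern_to_regex and is_blocked are made up for illustration; this is not Google's actual matching code.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style Disallow pattern into a regex (illustrative only)."""
    # A trailing "$" means the pattern must match right up to the end of the URL.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # "*" stands for any run of characters; everything else is taken literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if the path (including any query string) matches the pattern."""
    return pattern_to_regex(pattern).search(path) is not None


if __name__ == "__main__":
    for pattern in ("/*.php$", "/*.php"):
        for path in ("/index.php", "/redirect.php?1"):
            print(pattern, path, is_blocked(pattern, path))
```

Under these assumptions, /*.php$ covers /index.php but not /redirect.php?1, because the URL ends in "?1" rather than ".php". Dropping the "$" makes the pattern a prefix-style match that covers both.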
11:34 pm on July 20, 2006 (gmt 0)

New User

10+ Year Member

joined:June 16, 2005
posts:5
votes: 0


So does this mean that if I don't want Google to index files such as name.com/file.php?12, I should use "disallow: /*.php" instead of "disallow: /*.php$"?
11:40 pm on July 20, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Yes.

Don't forget to place the User-agent: Googlebot line immediately above the disallow too.
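
Putting the advice in this thread together, the relevant robots.txt section might look something like the sketch below. It assumes you only want to keep Googlebot away from URLs containing ".php", and, as noted above, the wildcard is a Google extension that other crawlers may not honour.

```text
User-agent: Googlebot
Disallow: /*.php
```

As mentioned earlier in the thread, URLs that are already in the index may still need the URL removal tool; the robots.txt entry on its own is not expected to delete them.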