

Remove php files from Google Index

How to remove php files from Google index


eLeads123

10:51 pm on Jul 13, 2006 (gmt 0)

10+ Year Member



Hi guys, I am getting concerned and need some immediate help.
I use a PHP redirect script for affiliate links on my website (for example, example.com/redirect.php?1) that redirects to merchants. Obviously, I do not want these pages indexed, but when I look up the search results from my site in Google, they appear. In my robots.txt file, I have specified that php files should not be indexed, yet they appear in Google's index. Does anyone know how I can go about fixing this issue?
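For readers unfamiliar with this kind of script, here is a minimal sketch of what a redirect.php?N handler does, written in Python rather than PHP. The MERCHANTS table, the handle_redirect name and the choice of a 302 status are assumptions for illustration only, not eLeads123's actual script.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical table mapping link IDs (the part after "?") to merchant URLs.
MERCHANTS = {
    "1": "https://merchant-one.example/",
    "2": "https://merchant-two.example/",
}


def handle_redirect(request_url: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request such as /redirect.php?1."""
    link_id = urlsplit(request_url).query  # "1" for /redirect.php?1
    target = MERCHANTS.get(link_id)
    if target is None:
        return 404, {}
    # 302 = temporary redirect; a 301 would tell crawlers the move is permanent.
    return 302, {"Location": target}


print(handle_redirect("http://example.com/redirect.php?1"))
# (302, {'Location': 'https://merchant-one.example/'})
```

Every distinct query string is a separate URL to a crawler, which is why each ?N variant can show up on its own in the index.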

thecoalman

6:48 am on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a similar problem. I've excluded entire directories, and three weeks later they are still there. A few of these results haven't even existed for months... good to know that if it were sensitive information I could easily get it removed promptly...

aravindgp

6:55 am on Jul 14, 2006 (gmt 0)

10+ Year Member



Use the URL removal tool (remove-url). It is quite a tedious and painful solution, but it worked for me when I was working for my previous firm. We removed some 2,000 URLs.

arubicus

7:04 am on Jul 14, 2006 (gmt 0)

10+ Year Member



"I use a PHP redirect script for affiliate links on my website (for example, example.com/redirect.php?1) that redirects to merchants. Obviously, I do not want these pages indexed, but when I look up the search results from my site in Google, they appear. In my robots.txt file, I have specified that php files should not be indexed, but yet they appear in Google's index. Does anyone know how I can go about fixing this issue? "

Had some problems with these a while back. Are you using a 301 or 302 redirect? Are they appearing as URL only, or are they fully indexed? If they are fully indexed, what are they indexed with? A year ago I saw 301 tracking-redirect URLs indexed with the content of the page the redirect was on. Just a funny thing I saw happen. If they are URL only, then more than likely Google knows about the URL because it is placed on a public page, and it simply does nothing with it.

There are some form pages that I use for discussion posts that use the same script and URL except for the query strings. I just put the forms in a separate directory and disallow the whole directory, plus disallow them on the page itself to be safe. Works really well. There are many variations due to the query strings, and Google knows about those URLs since they are on public pages (they show in site maps and one version shows in the site: command), but it does not index any contents of those pages.
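The directory approach relies only on plain prefix matching, which every robots.txt parser supports. The sketch below checks it with Python's standard-library urllib.robotparser; the /forms/ directory name and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block everything under /forms/ for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forms/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Every query-string variant under /forms/ starts with the same prefix,
# so a single Disallow line covers all of them.
for url in (
    "http://example.com/forms/post.php?id=3",
    "http://example.com/forms/post.php?id=4&reply=1",
    "http://example.com/articles/page.html",
):
    print(url, rp.can_fetch("Googlebot", url))
# False, False, True
```

As arubicus notes, blocking crawling this way does not stop Google from knowing the URLs exist; it only keeps the contents out.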

You could also be safe and put rel="nofollow" on the anchor tags containing the links. Keep in mind that Google will still know of the URL, but again it should do nothing with it, and any that are displayed should be URL only.
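A minimal sketch of emitting such links server-side, in Python; the affiliate_link helper is hypothetical, and the /redirect.php?N URL shape is taken from the thread's example.

```python
import html


def affiliate_link(link_id: int, label: str) -> str:
    """Build an affiliate anchor tag carrying rel="nofollow" (hypothetical helper)."""
    href = f"/redirect.php?{int(link_id)}"  # int() keeps the ID numeric
    return f'<a href="{html.escape(href)}" rel="nofollow">{html.escape(label)}</a>'


print(affiliate_link(1, "Visit merchant"))
# <a href="/redirect.php?1" rel="nofollow">Visit merchant</a>
```

The URL is still plainly visible in the markup, which is why nofollow is only the "just in case" layer discussed below.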

topr8

7:14 am on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



>>In my robots.txt file, I have specified that php files should not be indexed

If I understand you properly, you can't do this.

daveVk

7:19 am on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Adding the rel=nofollow attribute to A links may help, but Google seems to ignore it at times.

If the URL structure is example.com/redirect.php?N, with N being some number, I would hide the link using JavaScript, e.g. href="javascript:redir(N);", where "example.com/redirect.php?" is encoded into the JavaScript function "redir". This should prevent a recurrence.
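To make daveVk's idea concrete, here is a sketch that generates the markup from Python. The js_redirect_link helper and the exact body of the redir function are assumptions; the point is that the anchor's href never names redirect.php, only the script does.

```python
import html

# Hypothetical client-side function: the redirect prefix lives only here.
REDIR_SCRIPT = """<script>
function redir(n) {
  window.location.href = "/redirect.php?" + n;
}
</script>"""


def js_redirect_link(n: int, label: str) -> str:
    """Anchor whose href calls redir(n) instead of linking to redirect.php."""
    return f'<a href="javascript:redir({int(n)});">{html.escape(label)}</a>'


print(js_redirect_link(1, "Visit merchant"))
# <a href="javascript:redir(1);">Visit merchant</a>
```

Note that REDIR_SCRIPT still contains the literal "/redirect.php?" string, which is exactly the weakness arubicus raises next.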

arubicus

8:08 am on Jul 14, 2006 (gmt 0)

10+ Year Member



The rel=nofollow attribute is just a bit of added security -- a "just in case" type thing.

JavaScript can work; just keep in mind that Google can crawl, or attempt to crawl, any URL or partial URL within the JavaScript.

Google may attempt to crawl "example.com/redirect.php?" within the JavaScript. If 100 pages have the script, then 100 pages are "linking" to this URL (despite it being outside any href attribute), whether or not the page is active. Something to keep in mind, as I know for a fact that Google has done this.

daveVk

11:36 am on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



"example.com/redirect.php?" within the javascript can be attempted to be crawled by google

If this is an issue, use some simple encoding of the URL or place the JavaScript in an external file.
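One possible "simple encoding", sketched in Python: base64-encode the prefix when generating a hypothetical external redir.js, and let the browser decode it with its built-in atob() function. Base64 is trivially reversible; this only keeps the literal URL string out of the file, and there is no guarantee about what any crawler chooses to do with it.

```python
import base64

# Redirect prefix from the thread's example.
PREFIX = "example.com/redirect.php?"


def encode_prefix(prefix: str) -> str:
    """Base64-encode the prefix so it never appears literally in the script."""
    return base64.b64encode(prefix.encode("ascii")).decode("ascii")


def build_redir_script(prefix: str) -> str:
    """Return the body of a hypothetical external redir.js file.

    The browser's atob() decodes the prefix at click time, so the file
    itself never contains the string "redirect.php".
    """
    encoded = encode_prefix(prefix)
    return (
        "function redir(n) {\n"
        f'  window.location.href = "http://" + atob("{encoded}") + n;\n'
        "}\n"
    )


print(build_redir_script(PREFIX))
```

Serving this from an external file, as daveVk suggests, also keeps it off the 100 pages that would otherwise each carry a copy.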

eLeads123

4:27 pm on Jul 14, 2006 (gmt 0)

10+ Year Member



The pages are indeed indexed as URLs only - does this make it less of a worry? And I was under the impression that using the line "disallow: /*.php$" in my robots.txt file would prevent them from indexing php files... was I wrong?

leadegroot

10:59 pm on Jul 14, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Once a page is indexed, adding an entry to the robots.txt to cover it will not delete it, although it should eventually be moved to the supplemental index.
You have to use the URL removal tool to get it out of the index once it's in. :(

daveVk

2:19 am on Jul 15, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



using the line "disallow: /*.php$" in my robots.txt file would prevent them from indexing php files.

[robotstxt.org...] "*" and "$" do not have any special meaning within the context of "disallow" as far as I can make out. Does anyone have a better reference?
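daveVk's reading matches the original prefix-only rules. Python's standard-library urllib.robotparser implements that prefix matching and, in the versions I'm aware of, has no wildcard support, so it makes a handy illustration of how a strictly standard parser treats the pattern: as a literal path that no real URL starts with.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /*.php$"])

# "/*.php$" is compared as a plain prefix, so nothing is actually blocked.
print(rp.can_fetch("Googlebot", "http://example.com/redirect.php?1"))  # True
print(rp.can_fetch("Googlebot", "http://example.com/file.php"))        # True
```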

leadegroot

12:53 pm on Jul 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google is non-standard and recognises the * operator

g1smd

5:38 pm on Jul 19, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



You can use the * in the disallow statement only in the section where the user-agent is "Googlebot".

daveVk

1:58 am on Jul 20, 2006 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



And does the "$" have any special meaning?

g1smd

5:59 pm on Jul 20, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It does, but I can't remember what it is right now.

Google's help files have more on this.

FireBrigade

6:27 pm on Jul 20, 2006 (gmt 0)

5+ Year Member



The $ marks the end of the URL, so
/*.php$
means everything that ends with .php
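A rough Python approximation of the Google-style matching described above: "*" matches any run of characters, a trailing "$" anchors the end of the URL, and everything else is matched literally from the start of the path. This is a sketch of the documented behaviour, not Google's code, and it ignores Allow lines and rule precedence. The helper names are hypothetical.

```python
import re


def google_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Googlebot-style Disallow value into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and turn each "*" into ".*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # \Z means "end of URL"; without "$" the rule is a prefix match.
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(pattern: str, path_and_query: str) -> bool:
    """True if the rule matches, starting at the beginning of the path."""
    return google_pattern_to_regex(pattern).match(path_and_query) is not None


print(is_blocked("/*.php$", "/file.php"))     # True
print(is_blocked("/*.php$", "/file.php?12"))  # False: URL does not end in .php
print(is_blocked("/*.php", "/file.php?12"))   # True
```

The second and third lines are the difference eLeads123 asks about next: with "$" the query-string URLs slip through, without it they are caught.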

eLeads123

11:34 pm on Jul 20, 2006 (gmt 0)

10+ Year Member



So does this mean that if I don't want Google to index files such as name.com/file.php?12, I should use "disallow: /*.php" instead of "disallow: /*.php$"?

g1smd

11:40 pm on Jul 20, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes.


Don't forget to place the User-agent: Googlebot line immediately above the disallow too.
