
Google SEO News and Discussion Forum

    
Remove php files from Google Index
How to remove php files from Google index
eLeads123

Msg#: 3006245 posted 10:51 pm on Jul 13, 2006 (gmt 0)

Hi guys, I am getting concerned and need some immediate help.
I use a PHP redirect script for affiliate links on my website (for example, example.com/redirect.php?1) that redirects to merchants. Obviously, I do not want these pages indexed, but when I look up my site's pages in Google's search results, they appear. In my robots.txt file I have specified that PHP files should not be indexed, yet they still appear in Google's index. Does anyone know how I can fix this?

 

thecoalman

Msg#: 3006245 posted 6:48 am on Jul 14, 2006 (gmt 0)

I have a similar problem: I've excluded entire directories, and three weeks later they are still there. A few of these results haven't even existed for months... good to know that if it were sensitive information I could easily get it removed promptly...

aravindgp

Msg#: 3006245 posted 6:55 am on Jul 14, 2006 (gmt 0)

Use the URL removal tool (remove-url). It is quite a tedious and painful solution, but it worked for me when I was working for my previous firm. We removed some 2,000 URLs.

arubicus

Msg#: 3006245 posted 7:04 am on Jul 14, 2006 (gmt 0)

"I use a PHP redirect script for affiliate links on my website (for example, example.com/redirect.php?1) that redirects to merchants. Obviously, I do not want these pages indexed, but when I look up the search results from my site in Google, they appear. In my robots.txt file, I have specified that php files should not be indexed, but yet they appear in Google's index. Does anyone know how I can go about fixing this issue?"

Had some problems with these a while back. Are you using a 301 or a 302 redirect? Are they appearing as URL-only listings, or are they fully indexed? If they are fully indexed, what content are they indexed with? A year ago I saw 301 tracking-redirect URLs indexed with the content of the page the redirect was on. Just a funny thing I saw happen. If they are URL-only, it most likely means Google knows about the URL because it is placed on a public page, and simply does nothing with it.
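For comparison, here is a minimal sketch of what a redirect.php?N style script does, written as a Python WSGI app rather than the original PHP. The ID-to-merchant table, the merchant URLs and the function name are all hypothetical; the point is only to show where the redirect status (302 here, 301 as the alternative) is chosen, which is the detail being asked about.

```python
from urllib.parse import unquote

# Hypothetical ID -> merchant table; the real script's data source is unknown.
MERCHANTS = {
    "1": "https://merchant-a.example/landing",
    "2": "https://merchant-b.example/landing",
}

def redirect_app(environ, start_response):
    """Minimal WSGI sketch of a redirect.php?N affiliate redirect.

    The link ID is the bare query string, e.g. "1" in /redirect.php?1.
    A 302 (temporary) redirect is sent; use "301 Moved Permanently" instead
    if the target is meant to be permanent.
    """
    link_id = unquote(environ.get("QUERY_STRING", ""))
    target = MERCHANTS.get(link_id)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Unknown link"]
    start_response("302 Found", [("Location", target),
                                 ("Content-Type", "text/plain")])
    return [b""]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever() will serve it, and requesting /redirect.php?1 should return a 302 with the merchant URL in the Location header.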

There are some form pages that I use for discussion posts that use the same script and URL except for the query strings. I just put the forms in a separate directory, disallowed the whole directory, and added a disallow on the page itself to be safe. It works really well. There are many variations due to the query strings, and Google knows about those URLs since they are on public pages (they show up in site maps, and one version shows in the site: command), but it does not index any of the contents of those pages.
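A directory-level disallow like the one described above is plain prefix matching, so every query-string variant under that directory is covered. The sketch below checks this with Python's standard urllib.robotparser, which follows the original prefix-matching standard rather than Google's extensions. The /forms/ directory name, the helper name and the URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the form scripts were moved into /forms/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forms/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def is_blocked(url: str, agent: str = "Googlebot") -> bool:
    """True if the parsed robots.txt disallows `url` for `agent`."""
    return not parser.can_fetch(agent, url)

for url in (
    "http://example.com/forms/post.php?thread=1",
    "http://example.com/forms/post.php?thread=2&reply=5",
    "http://example.com/articles/index.html",
):
    print(url, "->", "blocked" if is_blocked(url) else "allowed")
```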

You could also play it safe and add rel="nofollow" to the anchor tags containing the links. Keep in mind that Google will still know about the URL, but again it should do nothing with it, and any that are displayed should be URL-only.
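A small helper along these lines can keep rel="nofollow" consistent on every affiliate link. It is a sketch only: the function name, the /redirect.php base path and the ?N format are modeled on the thread's example.com/redirect.php?1 and are otherwise assumptions.

```python
from html import escape

def affiliate_link(link_id: int, text: str, base: str = "/redirect.php") -> str:
    """Build an <a> tag for a tracked redirect, always with rel="nofollow".

    `base` and the ?N query format are assumed from the thread's example.
    """
    href = f"{base}?{int(link_id)}"
    # escape() protects the attribute value and link text from stray &, <, > and quotes.
    return f'<a href="{escape(href)}" rel="nofollow">{escape(text)}</a>'

print(affiliate_link(1, "Visit Merchant A"))
```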

topr8

Msg#: 3006245 posted 7:14 am on Jul 14, 2006 (gmt 0)

>>In my robots.txt file, I have specified that php files should not be indexed

If I understand you properly, you can't do this.

daveVk

Msg#: 3006245 posted 7:19 am on Jul 14, 2006 (gmt 0)

Adding the rel=nofollow attribute to anchor (A) links may help, but Google seems to ignore it at times.

If the URL structure is example.com/redirect.php?N, with N being some number, I would hide the link using JavaScript, e.g. href="javascript:redir(N);", where "example.com/redirect.php?" is encoded into the JavaScript function "redir". This should prevent a recurrence.
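A rough sketch of this suggestion, written as Python that generates the HTML: each link carries only javascript:redir(N);, and the redirect base URL appears once, inside the redir function. Only redir(N) comes from the post; the helper names and the http://example.com/redirect.php? base are illustrative assumptions.

```python
import json
from html import escape

REDIRECT_BASE = "http://example.com/redirect.php?"  # assumed base URL

def redir_script(base: str = REDIRECT_BASE) -> str:
    """Emit the single <script> block that knows the real redirect URL."""
    # json.dumps yields a properly quoted JavaScript string literal.
    return (
        "<script>\n"
        "function redir(n) {\n"
        "  window.location.href = " + json.dumps(base) + " + n;\n"
        "}\n"
        "</script>"
    )

def redir_link(link_id: int, text: str) -> str:
    """Emit a link whose href contains no crawlable redirect URL."""
    return f'<a href="javascript:redir({int(link_id)});">{escape(text)}</a>'

print(redir_link(3, "Merchant C"))
print(redir_script())
```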

arubicus

Msg#: 3006245 posted 8:08 am on Jul 14, 2006 (gmt 0)

The rel=nofollow attribute is just a bit of added security -- a "just in case" type thing.

JavaScript can work; just keep in mind that Google may crawl, or attempt to crawl, any URL or partial URL within the JavaScript.

"example.com/redirect.php?" within the JavaScript can be picked up and crawled by Google. If 100 pages contain the script, then 100 pages are "linking" to that URL (despite it not being in an href attribute), whether or not the page is active. Something to keep in mind, as I know for a fact that Google has done this.

daveVk

Msg#: 3006245 posted 11:36 am on Jul 14, 2006 (gmt 0)

"example.com/redirect.php?" within the javascript can be attempted to be crawled by google

If this is an issue, use some simple encoding of the URL, or place the JavaScript in an external file.
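One possible "simple encoding", sketched in Python: ROT13 the redirect base before embedding it, and decode it in the page's script, so the literal example.com/redirect.php? never appears in the HTML source. ROT13 is an illustrative choice, not something the thread specifies, and it is obfuscation rather than protection. The helper names and the base URL are hypothetical.

```python
import codecs
import json

REDIRECT_BASE = "http://example.com/redirect.php?"  # assumed base URL

# JavaScript ROT13 decoder; __ENCODED__ is replaced with the encoded base.
_TEMPLATE = """<script>
function rot13(s) {
  return s.replace(/[a-z]/gi, function (c) {
    var start = c <= "Z" ? 65 : 97;
    return String.fromCharCode((c.charCodeAt(0) - start + 13) % 26 + start);
  });
}
var REDIR_BASE = rot13(__ENCODED__);
function redir(n) { window.location.href = REDIR_BASE + n; }
</script>"""

def encode_base(base: str = REDIRECT_BASE) -> str:
    """ROT13 shifts letters only; ':', '/', '.' and '?' pass through unchanged."""
    return codecs.encode(base, "rot13")

def encoded_redir_script(base: str = REDIRECT_BASE) -> str:
    """Emit the decoder script with the ROT13-encoded base embedded."""
    return _TEMPLATE.replace("__ENCODED__", json.dumps(encode_base(base)))

print(encode_base())  # uggc://rknzcyr.pbz/erqverpg.cuc?
```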

eLeads123

Msg#: 3006245 posted 4:27 pm on Jul 14, 2006 (gmt 0)

The pages are indeed indexed as URLs only. Does this make it less of a worry? And I was under the impression that using the line "disallow: /*.php$" in my robots.txt file would prevent Google from indexing PHP files... was I wrong?

leadegroot

Msg#: 3006245 posted 10:59 pm on Jul 14, 2006 (gmt 0)

Once a page is indexed, adding a robots.txt entry that covers it will not delete it, although it should eventually be moved to the supplemental index.
You have to use the URL removal tool to get it out of the index once it's in. :(

daveVk

Msg#: 3006245 posted 2:19 am on Jul 15, 2006 (gmt 0)

>>using the line "disallow: /*.php$" in my robots.txt file would prevent them from indexing php files.

[robotstxt.org...] As far as I can make out, "*" and "$" do not have any special meaning within the context of "disallow". Does anyone have a better reference?
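This reading matches how a parser that follows only the original standard behaves. Python's standard urllib.robotparser does plain prefix matching, so it is a quick way to see that a "Disallow: /*.php$" line blocks nothing for a standard-only crawler. The test URLs are assumptions, and Google's own non-standard handling is not modeled here.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /*.php$",
])

for url in ("http://example.com/redirect.php", "http://example.com/redirect.php?1"):
    # A standard-only parser treats "/*.php$" as a literal path prefix,
    # so neither URL starts with it and both remain fetchable.
    print(url, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
```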

leadegroot

Msg#: 3006245 posted 12:53 pm on Jul 19, 2006 (gmt 0)

Google is non-standard and recognises the * operator.

g1smd

Msg#: 3006245 posted 5:38 pm on Jul 19, 2006 (gmt 0)

You can use the * in the disallow statement only in the section where the user-agent is "Googlebot".

daveVk

Msg#: 3006245 posted 1:58 am on Jul 20, 2006 (gmt 0)

And does the "$" have any special meaning?

g1smd

Msg#: 3006245 posted 5:59 pm on Jul 20, 2006 (gmt 0)

It does, but I can't remember what it is right now.

Google's help files have more on this.

FireBrigade

Msg#: 3006245 posted 6:27 pm on Jul 20, 2006 (gmt 0)

The $ marks the end of the URL, so
/*.php$
means everything that ends with .php
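This description can be made concrete with a small matcher. It is a simplified sketch of the wildcard rules as described in the thread ("*" matches any run of characters, a trailing "$" anchors to the end of the URL, otherwise the pattern is a prefix match), not Google's actual code, and the function name is hypothetical. The "end of the URL" here includes any query string.

```python
import re

def pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a Googlebot-style Disallow pattern matches `path`.

    `path` is the URL path plus any query string, e.g. "/file.php?12".
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Treat everything literally, then turn each "*" into ".*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None  # must match to the very end
    return re.match(regex, path) is not None          # a prefix match is enough

for pattern in ("/*.php$", "/*.php"):
    for path in ("/file.php", "/file.php?12", "/index.html"):
        print(pattern, path, pattern_matches(pattern, path))
```

With "$", /file.php?12 does not match because the URL continues past ".php"; without it, "/*.php" matches as a prefix. A side effect of the prefix behavior is that "/*.php" would also match paths such as /file.php5, since anything may follow.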

eLeads123

Msg#: 3006245 posted 11:34 pm on Jul 20, 2006 (gmt 0)

So does this mean that if I don't want Google to index files such as name.com/file.php?12, I should use "disallow: /*.php" instead of "disallow: /*.php$"?

g1smd

Msg#: 3006245 posted 11:40 pm on Jul 20, 2006 (gmt 0)

Yes.

Don't forget to place the User-agent: Googlebot line immediately above the disallow too.
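To illustrate this last point, here is a rough sketch of how a crawler picks the group that applies to it: Googlebot obeys the "User-agent: Googlebot" group, and other robots fall back to "User-agent: *". The parser is deliberately simplified (exact user-agent names, Disallow lines only, no Allow), the /cgi-bin/ rule is an assumed extra, and the function names are hypothetical; the wildcard handling follows the "*" and "$" rules described earlier in the thread.

```python
import re

# Example robots.txt; the /cgi-bin/ rule is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /*.php

User-agent: *
Disallow: /cgi-bin/
"""

def pattern_matches(pattern: str, path: str) -> bool:
    """'*' matches any characters; a trailing '$' anchors to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    matcher = re.fullmatch if anchored else re.match
    return matcher(regex, path) is not None

def parse_groups(text: str) -> dict[str, list[str]]:
    """Map each lower-cased user-agent name to its Disallow patterns."""
    groups: dict[str, list[str]] = {}
    agents: list[str] = []
    in_agent_lines = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not in_agent_lines:  # a User-agent after rules starts a new group
                agents = []
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            in_agent_lines = True
        elif field == "disallow":
            if value:  # an empty Disallow means "allow everything"
                for agent in agents:
                    groups[agent].append(value)
            in_agent_lines = False
    return groups

def is_blocked(text: str, user_agent: str, path: str) -> bool:
    groups = parse_groups(text)
    # Use the robot's own group if present, otherwise fall back to "*".
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    return any(pattern_matches(rule, path) for rule in rules)

print(is_blocked(ROBOTS_TXT, "Googlebot", "/redirect.php?1"))
print(is_blocked(ROBOTS_TXT, "OtherBot", "/redirect.php?1"))
```

One consequence worth noticing: once Googlebot finds its own group, it ignores the "*" group entirely, so any general rules that should still apply to Googlebot have to be repeated under User-agent: Googlebot.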

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved