| 6:48 am on Jul 14, 2006 (gmt 0)|
I have a similar problem: I excluded entire directories, and 3 weeks later they are still in the index. A few of these results haven't even existed for months... Good to know that if it were sensitive information I could easily get it removed promptly...
| 6:55 am on Jul 14, 2006 (gmt 0)|
Use the remove-URL tool. It is quite a boring and painful solution, but it worked for me when I was at my previous firm. We removed some 2000 URLs.
| 7:04 am on Jul 14, 2006 (gmt 0)|
"I use a PHP redirect script for affiliate links on my website
(for example, example.com/redirect.php?1) that redirects to merchants. Obviously, I do not want these pages indexed, but when I look up the search results from my site in Google, they appear. In my robots.txt file, I have specified that php files should not be indexed, but yet they appear in Google's index. Does anyone know how I can go about fixing this issue? "
Had some problems with these a while back. Are you using a 301 or a 302 redirect? Are they appearing as URL only, or are they appearing fully indexed? If they are fully indexed, what are they indexed with? A year ago I saw 301 tracking redirect URLs indexed with the content of the page the redirect was on. Just a funny thing I saw happen. If they are URL only, then more than likely Google knows about the URLs because they are placed on a public page, and it simply does nothing with them.
There are some form pages that I use for discussion posts that share the same script and URL except for the query strings. I just put the forms in a separate directory and disallow the whole directory, plus a disallow on the page itself to be safe. Works really well. There are many variations due to query strings, and Google knows about those URLs since they are on public pages (they show in site-maps, and one version shows in the site: command), but it does not index any contents of those pages.
You could also be safe and put rel="nofollow" in the link tags containing those hrefs. Keep in mind that Google will still know of the URL but again should do nothing with it, and if any were displayed they should be URL only.
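For the directory approach described above, a minimal sketch using Python's standard urllib.robotparser can confirm that a plain Disallow on a directory covers every query-string variation under it. The /forms/ directory and the example.com URLs are assumptions made up for illustration; they are not from this post. This parser does plain prefix matching only, and it says nothing about what Google actually has in its index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /forms/ directory name is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forms/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the parsed rules let `agent` crawl `url`."""
    return parser.can_fetch(agent, url)


for url in (
    "http://example.com/forms/post.php?id=3",
    "http://example.com/forms/post.php?id=4&reply=1",
    "http://example.com/index.html",
):
    print(url, "->", "allowed" if allowed(url) else "blocked")
```

Both /forms/ URLs should come back blocked regardless of the query string, while the page outside the directory stays crawlable.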
| 7:14 am on Jul 14, 2006 (gmt 0)|
>>In my robots.txt file, I have specified that php files should not be indexed
If I understand you properly, you can't do this.
| 7:19 am on Jul 14, 2006 (gmt 0)|
Adding the rel=nofollow attribute to the A links may help, but Google seems to ignore it at times.
| 8:08 am on Jul 14, 2006 (gmt 0)|
The rel=nofollow attribute is just a bit of added security -- a "just in case" type thing.
| 11:36 am on Jul 14, 2006 (gmt 0)|
| 4:27 pm on Jul 14, 2006 (gmt 0)|
The pages are indeed indexed as URLs only - does this make it less of a worry? And I was under the impression that using the line "disallow: /*.php$" in my robots.txt file would prevent them from indexing php files... was I wrong?
| 10:59 pm on Jul 14, 2006 (gmt 0)|
Once a page is indexed, adding an entry to robots.txt to cover it will not delete it, although it should eventually be moved to the supplemental index.
You have to use the URL removal tool to get it out of the index once it's in. :(
| 2:19 am on Jul 15, 2006 (gmt 0)|
|using the line "disallow: /*.php$" in my robots.txt file would prevent them from indexing php files. |
[robotstxt.org...] "*" and "$" do not have any special meaning within the context of "disallow" as far as I can make out. Does anyone have a better reference?
| 12:53 pm on Jul 19, 2006 (gmt 0)|
Google is non-standard and recognises the * operator
| 5:38 pm on Jul 19, 2006 (gmt 0)|
You can use the * in the disallow statement only in the section where the user-agent is "Googlebot".
| 1:58 am on Jul 20, 2006 (gmt 0)|
And has the "$" any special meaning?
| 5:59 pm on Jul 20, 2006 (gmt 0)|
It does, but I can't remember what it is right now.
Google's help files have more on this.
| 6:27 pm on Jul 20, 2006 (gmt 0)|
The $ marks the end of the URL, so the pattern discussed above, /*.php$, means everything that ends with php.
| 11:34 pm on Jul 20, 2006 (gmt 0)|
So does this mean that if I don't want Google to index files such as name.com/file.php?12, I should use "disallow: /*.php" instead of "disallow: /*.php$"?
| 11:40 pm on Jul 20, 2006 (gmt 0)|
Don't forget to place the User-agent: Googlebot line immediately above the disallow too.
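To make the difference between the two patterns concrete, here is a rough sketch of the wildcard matching as Google describes it: * matches any run of characters and a trailing $ anchors the pattern to the end of the URL. The function names are made up for illustration, and this is a simplified model of the documented behaviour, not Google's own code.

```python
import re


def pattern_to_regex(pattern: str):
    """Translate a Disallow pattern with Google's * and $ extensions into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with "any run of characters".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """Return True if `path` (path plus query string) matches the Disallow pattern."""
    return pattern_to_regex(pattern).match(path) is not None


for pattern in ("/*.php$", "/*.php"):
    for path in ("/file.php", "/file.php?12", "/index.html"):
        print(pattern, path, "->", "blocked" if is_blocked(pattern, path) else "not blocked")
```

Under this model /*.php$ covers /file.php but not /file.php?12, because that URL ends in ?12 rather than php. Dropping the $ gives /*.php, which covers both, so that is the form to use for the redirect URLs with query strings.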