| 5:35 pm on Sep 1, 2012 (gmt 0)|
This annoys the **** out of me when it happens.
| 7:39 pm on Sep 1, 2012 (gmt 0)|
"If it is an URL, we will index it."
Is there a thread that explains how to attach a "noindex" directive to something that isn't html? robots.txt and sitemap.xml will do for starters.
:: quick detour to check obvious corollary question ::
SO FAR, an image search for "favicon" does not bring up a slew of actual sites' actual favicons. But give them time; I'm sure it is merely an oversight.
| 2:22 am on Sep 2, 2012 (gmt 0)|
|how to attach a "noindex" directive to something that isn't html |
There is an HTTP response header called X-Robots-Tag that allows a noindex directive to be placed in the HTTP header that's sent by the server. It's very handy for non-HTML document types, such as video files, PDF files, etc.
For details, see this page from the Google developers site: Robots meta tag and X-Robots-Tag HTTP header specifications [developers.google.com]
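As a rough illustration of the header approach described above, here is a minimal Python sketch, assuming the files are served through a WSGI application. The names `noindex_middleware`, `demo_app`, `response_headers` and the extension list are hypothetical, made up for this example; only the `X-Robots-Tag: noindex` header itself comes from the Google documentation.

```python
from wsgiref.util import setup_testing_defaults

# Extensions to keep out of the index. This list is an assumption for
# illustration; adjust it to whatever file types are showing up in results.
NOINDEX_EXTENSIONS = (".txt", ".xml", ".pdf")


def noindex_middleware(app, extensions=NOINDEX_EXTENSIONS):
    """Wrap a WSGI app so matching paths get an X-Robots-Tag: noindex header."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def patched_start_response(status, headers, exc_info=None):
            # Append the header only for the file types we want excluded.
            if path.endswith(tuple(extensions)):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for whatever actually serves the files."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"User-agent: *\nDisallow:\n"]


def response_headers(app, path):
    """Call the app once for `path` and return the headers it sent."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = headers

    list(app(environ, start_response))  # consume the body iterable
    return captured["headers"]


if __name__ == "__main__":
    app = noindex_middleware(demo_app)
    print(response_headers(app, "/robots.txt"))
    print(response_headers(app, "/index.html"))
```

On a typical Apache setup the same header would more likely be set in the server configuration (mod_headers) than in application code; the sketch only shows the idea. Note that a crawler has to be allowed to fetch the file in order to see the header at all.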
| 5:58 am on Sep 2, 2012 (gmt 0)|
robots.txt is such a unique name that it should be easy for them to exclude it from their index without any directive. But these days they are so focused on user experience, you know....
The interesting part is: what does it rank for to get the traffic? Is it some file or folder name that is unique and that you won't easily find elsewhere on the web, or is it a keyword that actually drives traffic to sites?
| 8:16 am on Sep 2, 2012 (gmt 0)|
|For details, see this page |
... which someone, quite possibly yourself, has already pointed me to in the recent past. I think I even looked at it.
Oh well. I did manage to get chummy with mod_expires yesterday. Only took about seven tries-- and NO pleas for help-- to hit on the right wording for what I wanted to do.
| 1:40 pm on Sep 2, 2012 (gmt 0)|
|There is an HTTP response header called X-Robots-Tag that allows a noindex directive to be placed in the HTTP header that's sent by the server. |
This is what I did for all .txt files (and some others) after some of them started showing up in the SERPs. I never saw a robots.txt in there, though; that's beyond ridiculous.