
Sitemaps, Meta Data, and robots.txt Forum

    
Google ignores one line of robots.txt
wheelie34




msg:3680655
 8:44 am on Jun 22, 2008 (gmt 0)

Hi
I have a MediaWiki install and can't find anything on the web that explains how to set up a spider-friendly MediaWiki.

My robots.txt file contains:
User-agent: *
Disallow: /bin/
Disallow: /cgi-bin/
Disallow: /config/
Disallow: /docs/
Disallow: /extensions/
Disallow: /includes/
Disallow: /languages/
Disallow: /local/
Disallow: /maintenance/
Disallow: /math/
Disallow: /serialized/
Disallow: /skins/
Disallow: /t/
Disallow: /tests/

When I run a site: command, the skins folder IS indexed, and so are all the subdirectories in it. Out of all the Disallow lines above, only skins has a problem. Any ideas where to look? Thanks.

 

Receptional Andy




msg:3680658
 8:51 am on Jun 22, 2008 (gmt 0)

Was the line added recently? It's possible that Google discovered the content a long time ago, and has not revisited the pages since being disallowed (you can check the cache date on the files to get an idea if this is the case).

This is often true of content that has few or no external links to it (which is likely the case with your /skins/ folder). Such content can hang around for months and months since Googlebot never revisits it.

If it's important to get the files removed you can use the URL removal tool in webmaster tools, otherwise it's just a case of waiting. Certainly, there doesn't appear to be any problem with your robots directives.
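For anyone who wants to double-check directives like these locally, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows how a standards-following parser reads the file posted above; it is not how Googlebot itself is implemented. The host `www.example.com` and the sample paths are placeholders, not taken from the original site.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content as posted at the top of the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bin/
Disallow: /cgi-bin/
Disallow: /config/
Disallow: /docs/
Disallow: /extensions/
Disallow: /includes/
Disallow: /languages/
Disallow: /local/
Disallow: /maintenance/
Disallow: /math/
Disallow: /serialized/
Disallow: /skins/
Disallow: /t/
Disallow: /tests/
"""


def is_blocked(path, user_agent="Googlebot", base="http://www.example.com"):
    """Return True if robots.txt disallows `path` for `user_agent`.

    `base` is a placeholder host (an assumption for this example).
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, base + path)


# Hypothetical sample paths, chosen only for illustration.
for path in ("/skins/", "/skins/common/shared.css", "/index.php"):
    print(path, "blocked" if is_blocked(path) else "allowed")
```

If `/skins/` comes back as blocked here, the file is doing its job as far as crawling goes, and any listings that remain are a question of what is already in the index rather than of the robots.txt syntax.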

wheelie34




msg:3680659
 8:55 am on Jun 22, 2008 (gmt 0)

Hi Andy

No, the whole robots.txt file was created in Feb 2008. I have just started to work on the site again, and to see how it was doing in the SERPs I ran a site: command and found the skins folder indexed, along with every folder within it. I thought it was strange.

edit: there's no cache date, only the "Similar pages" and "Note this" links

Receptional Andy




msg:3680661
 9:02 am on Jun 22, 2008 (gmt 0)

there's no cache date, only the "Similar pages" and "Note this" links

Is there a snippet underneath the listing, or do you just see the URLs? If it's just a URL, then this is quite common: files excluded in robots.txt often appear in Google listings in that way.

wheelie34




msg:3680663
 9:05 am on Jun 22, 2008 (gmt 0)

No snippet just

URL
Similar pages - Note this

I use the short URL directives in .htaccess, but I can't see that affecting anything.

Receptional Andy




msg:3680664
 9:08 am on Jun 22, 2008 (gmt 0)

In that case they are excluded, and your robots directives are being obeyed: Google is aware of the content because of links to it, but it is 'prevented' from retrieving it and so there is no cache or snippet.

Excluded files can hang around in this way for a long time (forever?) and while they make a mess of site: search results, in my experience there isn't usually any impact on performance.

wheelie34




msg:3680666
 9:11 am on Jun 22, 2008 (gmt 0)

OK, thanks Andy. I will leave it as is.

Receptional Andy




msg:3680668
 9:16 am on Jun 22, 2008 (gmt 0)

I tracked down a (very!) old thread on the same subject, which has a bit more detail:

Indexed pages that are disallowed by robots.txt [webmasterworld.com]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved