Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Google keywords
keywords, disallow, user, agent
cyberdyne

5+ Year Member



 
Msg#: 4406510 posted 1:08 pm on Jan 13, 2012 (gmt 0)

Google Webmaster Tools reports that the top three most common keywords found on my site are 'Disallow', 'User' and 'Agent', which clearly exist only in my 'robots.txt' file. Why is Google listing these words, and how do I prevent it from doing so, please?

Many thanks

 

DeeCee



 
Msg#: 4406510 posted 1:24 pm on Jan 13, 2012 (gmt 0)

That should not be coming from the robots.txt file if it is set up the normal way.

Try a site search on Google:

site:example.com Disallow

site:example.com Agent

and see what comes out.

cyberdyne

5+ Year Member



 
Msg#: 4406510 posted 1:55 pm on Jan 13, 2012 (gmt 0)

Having just done as you suggested, Google returned my robots.txt file for all three search terms.

I'm wondering if it has something to do with an entry in my .htaccess:

<Files ~ (403\.php|block\.php|robots\.txt)>
Order Allow,Deny
Allow from all
</Files>


Perhaps I shouldn't have robots.txt in that list.

cyberdyne

5+ Year Member



 
Msg#: 4406510 posted 3:53 pm on Jan 14, 2012 (gmt 0)

Can anyone else offer any help with this please?
I'm guessing it's something simple.
Thank you

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors of the Month



 
Msg#: 4406510 posted 11:00 pm on Jan 14, 2012 (gmt 0)

Part of the answer is simple: as noted in earlier threads, Google does index robots.txt itself.

It is appropriate to allow everyone access to robots.txt. Again, see assorted other threads. But you'd be better off with FilesMatch if you want to group them by pattern. I think Apache itself says so. I group mine by extension:

<FilesMatch "(forbidden|goaway|missing)\.html">
Order Allow,Deny
Allow from all
</FilesMatch>

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

<Files "favicon.ico">
Order Allow,Deny
Allow from all
</Files>
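
For comparison, the file list from the original .htaccess could be regrouped the same way. This is only a sketch, assuming the same three file names and the Apache 2.2 access syntax used in this thread. The ^ and $ anchors are an addition not present in the original pattern, and the Apache 2.4 directive is mentioned in a comment only.

```apache
# Hypothetical regrouping of the file names from the original .htaccess.
# The ^ and $ anchors are an addition, so that only these exact names match.
<FilesMatch "^(403\.php|block\.php|robots\.txt)$">
# Apache 2.2 access syntax, as used elsewhere in this thread
Order Allow,Deny
Allow from all
# On Apache 2.4 the equivalent would be: Require all granted
</FilesMatch>
```

Either form only controls who may fetch the files. It has no bearing on whether a search engine indexes them.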

cyberdyne

5+ Year Member



 
Msg#: 4406510 posted 12:54 am on Jan 15, 2012 (gmt 0)

Thank you Lucy, I have made the changes you suggest.
You say 'part' of the answer; do you not think this alone will solve the issue?
Thanks

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributors of the Month



 
Msg#: 4406510 posted 5:01 am on Jan 15, 2012 (gmt 0)

By "part of" I meant the discussion about material from robots.txt showing up in searches. Google also indexes sitemaps and-- if you're careless enough to leave them lying around-- raw logs. They would undoubtedly index your htaccess if only they could get to it.

But if the single most common words are "Disallow", "User" and "Agent", it suggests that they haven't got around to counting the rest of the keywords yet. Feed in some random exact-text phrases and make sure they come up in search results. Then you'll know that your other pages are indexed.

Keywords seem to be processed entirely separately from indexing-in-general. And I've got a hunch they don't generate the list at all until you sign up with GWT. In my case I'd just gotten used to a list absurdly packed with words like "it's" and, yes, "word"... and then suddenly a whole slew of names crops up. They belong to a rarely-visited page that happens to be fatter (in HTML) than anything else on the site. So as soon as they threw it into the Keywords mix, everything changed.

For a while, one of my most common keywords was "thumbnail". I finally forced myself to sit down and make proper alts for all my, ahem, thumbnails ;)
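
On the original question of keeping robots.txt itself out of the index: one commonly suggested approach is to send an X-Robots-Tag response header for that file. The block below is a sketch, not something confirmed in this thread. It assumes mod_headers is enabled and that .htaccess overrides are allowed for it on the server. Crawlers can still fetch and obey the file, and the header only asks them not to list it in results.

```apache
# Assumption: mod_headers is available; the IfModule wrapper just skips
# the block quietly if it is not.
<IfModule mod_headers.c>
<Files "robots.txt">
# Ask search engines not to index robots.txt itself
Header set X-Robots-Tag "noindex"
</Files>
</IfModule>
```

Do not add a Disallow rule for robots.txt to achieve this, since a crawler that cannot fetch the file would never see the header.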

cyberdyne

5+ Year Member



 
Msg#: 4406510 posted 12:01 pm on Jan 15, 2012 (gmt 0)

I'll add some keywords all over the site, then monitor GWT and see if it changes. Many thanks for the advice and explanation.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved