Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Why are my .swf files still being indexed by Googlebot?
ichthyous
msg:1526616
9:33 pm on Jun 15, 2004 (gmt 0)

Hi there... I have two Flash sites, each built from various .swf files linked together. Google has indexed each component file individually, and I'm actually getting hits to them, which I don't want. I added this to my robots.txt:
User-Agent: Googlebot
Disallow: /*.swf$

The .swf files are still indexed, though... is there some kind of command I can give Google to drop these files from the index? Thanks!
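As an aside on that Disallow line: the `*` and `$` wildcards are a Googlebot extension to the original robots.txt standard, not something every crawler understands. A minimal sketch of how such a pattern can be matched (the function name is mine, and this is only an approximation of Google's matching, not its actual code):

```python
import re

def googlebot_rule_matches(pattern: str, path: str) -> bool:
    """Translate a Googlebot-style Disallow pattern (with * and $
    wildcards) into a regex and test it against a URL path."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"       # * matches any run of characters
        elif ch == "$":
            regex += "$"        # $ anchors the pattern to the URL's end
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

# The rule from the post above:
print(googlebot_rule_matches("/*.swf$", "/intro.swf"))      # True: blocked
print(googlebot_rule_matches("/*.swf$", "/intro.swf?x=1"))  # False: $ anchors the end
```

Note that even when the pattern matches, a Disallow only stops future fetches; it does not by itself remove URLs that are already in the index.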

 

jdMorgan
msg:1526617
8:56 pm on Jun 24, 2004 (gmt 0)

ichthyous,

See this thread [webmasterworld.com] for reference, and see if it makes sense for your situation -- Google will list any link that it finds, regardless of whether robots.txt allows it to fetch the linked-to page. Ask Jeeves and Yahoo do this as well. The thread describes a fix that works for HTML-type pages, but it won't directly help with non-HTML pages.

However, you could try a conditional redirect based on the user agent, and see if you can get the spiders to accept an HTML page when they fetch one of your .swf pages. If so, you can then add <meta name="robots" content="noindex"> to that HTML page. Two notes: first, I don't know whether this will work, and second, this is technically cloaking -- but since there is no intent to mislead visitors, I wouldn't worry too much about it.

Because the spiders must already be set up to accept an HTML-type 404 error response when, for example, an .swf page is missing, I suspect they will accept the redirect (or just a 403 or 404 response) and see the meta tag. Anyway, if you haven't come up with any other ideas, it might be worth a try.
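A minimal .htaccess sketch of the conditional redirect described above -- the user-agent substrings and the /noindex.html file name are assumptions, mod_rewrite must be enabled, and as noted this is technically cloaking:

```apache
RewriteEngine On
# If the requester identifies as one of these spiders...
RewriteCond %{HTTP_USER_AGENT} (Googlebot|Slurp|Teoma) [NC]
# ...redirect any .swf request to an HTML page that carries
# <meta name="robots" content="noindex"> in its <head>.
RewriteRule \.swf$ /noindex.html [R=302,L]
```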

Jim

cpNmi
msg:1526618
6:26 pm on Jul 13, 2004 (gmt 0)

Maybe I should start a new thread -- I'm not sure, since I haven't posted here before.

My question is about the general case of not wanting Google -- or any robot -- to read a page, but it's really about the theory of optimization:

If I use an appropriate method (robots.txt, or a meta tag on the page) to keep a bot from crawling certain site pages (such as CustomerService [CS]), does that hurt my optimized page, given that the CS page has links pointing to the optimized page?

I'm not sure if I wrote that clearly...
Maybe the real question is: do internal links matter to relevance? It seems that blocking the crawl of non-optimized pages could hurt the rank of the optimized page.
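The intuition behind that question can be sketched with a toy PageRank computation (a simplified power iteration -- not Google's actual formula, and the page names are made up): if the crawler never sees the CS page, that page's internal links pass no rank.

```python
def pagerank(links, d=0.85, iters=50):
    """Naive PageRank by power iteration over a {page: [outlinks]} dict."""
    pages = sorted(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / len(pages) for p in pages}
        for p, outs in links.items():
            for q in outs:
                new[q] += d * rank[p] / len(outs)
        rank = new
    return rank

crawled = pagerank({
    "/home": ["/optimized", "/cs"],
    "/cs": ["/optimized"],      # CS page links to the optimized page
    "/optimized": ["/home"],
})
blocked = pagerank({
    "/home": ["/optimized", "/cs"],
    "/cs": [],                  # CS blocked: its outbound links are invisible
    "/optimized": ["/home"],
})
print(crawled["/optimized"] > blocked["/optimized"])  # True
```

In this toy model the optimized page scores higher when the CS page is crawlable, because the CS page's link contributes rank to it.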



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved