Why are my .swf files still being indexed by Googlebot?

ichthyous

9:33 pm on Jun 15, 2004 (gmt 0)

Hi there. I have two Flash sites, each built from several .swf files linked together. Google has indexed each component file individually, and I am actually getting hits on them, which I don't want. I added these lines to my robots.txt:
User-Agent: Googlebot
Disallow: /*.swf$

The .swf files are still indexed, though. Is there some kind of command I can give Google to drop these files from the index? Thanks!
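For anyone checking which paths a rule like the one above actually covers, here is a minimal Python sketch of Googlebot-style pattern matching, assuming `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper names `rule_to_regex` and `is_disallowed` are made up for this illustration and are not part of any real crawler or library. Note that it only models which paths may be fetched, not whether a URL shows up in the index.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Translate a robots.txt path rule into a compiled regex.

    Assumed semantics: '*' matches any run of characters and a
    trailing '$' anchors the match at the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then rejoin the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, rule: str) -> bool:
    """Return True if the URL path falls under the Disallow rule."""
    if not rule:
        # An empty Disallow value blocks nothing.
        return False
    return rule_to_regex(rule).match(path) is not None


if __name__ == "__main__":
    for candidate in ("/flash/intro.swf", "/flash/intro.swf?v=2", "/index.html"):
        print(candidate, is_disallowed(candidate, "/*.swf$"))
```

Because of the trailing `$`, a URL with a query string after `.swf` would not match under these assumptions, so it is worth checking how the files are actually linked.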

jdMorgan

8:56 pm on Jun 24, 2004 (gmt 0)

ichthyous,

See this thread [webmasterworld.com] for reference, and see if that makes sense for your situation -- Google will list any link that it finds, regardless of whether robots.txt allows it to fetch the linked-to page. Ask Jeeves and Yahoo do this as well. The thread describes a fix that will work for html-type pages, but won't directly help for non-html pages.

However, you could try a conditional redirect based on the user agent, and see if you can get the spiders to accept an html page when they fetch one of your .swf pages. If so, then you can add the <meta name="robots" content="noindex"> tag to that html page. Two notes: First, I don't know whether this will work, and second, this is technically cloaking, but since there is no intent to mislead visitors, I wouldn't worry too much about it.

Because the spiders must be set up to accept an html-type 404 error response if, for example, an .swf page is missing, I suspect that they will accept the redirect (or just a 403 or 404 response), and see the meta-tag. Anyway, if you haven't come up with any other ideas, it might be worth a try.
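To make the idea concrete, here is a rough Python sketch of the decision logic only, not a tested fix. The crawler substrings in `CRAWLER_TOKENS`, the `NOINDEX_PAGE` body, and the function name `crawler_override` are all assumptions for illustration. A real setup would wire this into whatever the server uses for rewrites or request handling, and it remains unknown whether the spiders would honor the result.

```python
from typing import Optional

# Assumed user-agent substrings for the three engines mentioned above;
# check your own access logs before relying on them.
CRAWLER_TOKENS = ("googlebot", "slurp", "teoma")

# Hypothetical stand-in page that carries the noindex meta tag.
NOINDEX_PAGE = (
    "<html><head>"
    '<meta name="robots" content="noindex">'
    "</head><body></body></html>"
)


def crawler_override(path: str, user_agent: str) -> Optional[str]:
    """Return an HTML body to send instead of the .swf file, or None.

    None means "serve the requested file as usual"; a string means the
    request came from a listed crawler asking for a .swf file, so the
    noindex page should be sent (or redirected to) instead.
    """
    is_swf = path.lower().endswith(".swf")
    is_crawler = any(token in user_agent.lower() for token in CRAWLER_TOKENS)
    if is_swf and is_crawler:
        return NOINDEX_PAGE
    return None


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    browser = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    print(crawler_override("/flash/intro.swf", bot) is not None)
    print(crawler_override("/flash/intro.swf", browser) is not None)
```

Ordinary visitors still get the Flash file; only requests whose user agent matches one of the listed tokens see the stand-in page.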

Jim

cpNmi

6:26 pm on Jul 13, 2004 (gmt 0)

Maybe I should start a new thread - not sure, since I haven't posted here before.

My question is about the general case of not wanting Google, or any robot, to read a page, but it is more about the theory of optimization:

If I use an appropriate method (robots.txt or a meta tag on the page) to keep a bot from crawling certain pages, such as Customer Service (CS), does that hurt my optimized page, given that the CS page links to the optimized page?

I'm not sure if I wrote that clearly...
Maybe the real question is: do internal links matter to relevance? It seems that blocking the crawl of non-optimized pages could hurt the rank of the optimized page.