

Why are my .swf files still being indexed by Googlebot?

   
9:33 pm on Jun 15, 2004 (gmt 0)

10+ Year Member



Hi there... I have two Flash sites, each built from various .swf files linked together. Google has indexed each component .swf individually and I am actually getting hits to them, which I don't want. I added this to my robots.txt:
User-Agent: Googlebot
Disallow: /*.swf$

The swf files are still indexed though...is there some kind of command I can give to google to drop these files from the index? Thanks!

8:56 pm on Jun 24, 2004 (gmt 0)

WebmasterWorld Senior Member jdmorgan (Top Contributor of All Time, 10+ Year Member)



ichthyous,

See this thread [webmasterworld.com] for reference, and see if it makes sense for your situation -- Google will list any link it finds, regardless of whether robots.txt allows it to fetch the linked-to page. Ask Jeeves and Yahoo do this as well. The thread describes a fix that works for html-type pages, but it won't directly help for non-html pages.

However, you could try a conditional redirect based on the user agent, and see if you can get the spiders to accept an html page when they fetch one of your .swf pages. If so, then you can add the <meta name="robots" content="noindex"> to the html page. Two notes: First, I don't know whether this will work, and second, this is technically cloaking, but since there is no intent to mislead visitors, I wouldn't worry too much about it.
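If you want to experiment with that, here is a minimal sketch of the idea, assuming Apache with mod_rewrite enabled. The rules go in .htaccess, and "noindex.html" is just a placeholder name for a page you would create yourself:

# Serve a small html page instead of the .swf when a known spider asks for it.
# Use [R=302,L] instead of [L] if you would rather send an actual redirect.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|slurp|teoma) [NC]
RewriteRule \.swf$ /noindex.html [L]

And noindex.html itself only needs the meta tag:

<html>
<head><meta name="robots" content="noindex"></head>
<body></body>
</html>

Whether the engines actually honor the meta tag on a URL that ends in .swf is the part I can't vouch for, so test it on one file first.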

Because the spiders must already be set up to accept an html-type response when, for example, a missing .swf page returns a 404 error, I suspect they will accept the redirect (or even just a 403 or 404 response) and see the meta tag. Anyway, if you haven't come up with any other ideas, it might be worth a try.

Jim

6:26 pm on Jul 13, 2004 (gmt 0)

10+ Year Member



Maybe I should start a new thread - I'm not sure, since I haven't posted here before.

My question is about the general case of not wanting Google (or any robot) to read a page, but it's really more about the theory of optimization:

If I use whatever method is appropriate (robots.txt or a meta tag on the page) to keep a bot from crawling certain pages of the site (such as a CustomerService [CS] page) - does that hurt my optimized page, given that the CS page has links pointing to the optimized page?

I'm not sure if I wrote that clearly ...
Maybe the real question is: do internal links matter to relevance? It seems that blocking the crawl of non-optimized pages could hurt the rank of the optimized page.
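(By "meta tag on the page" I mean the standard robots meta tag, something like the line below. My understanding is that the "noindex,follow" form keeps the page itself out of the index but still lets the robot follow the links on it, whereas a robots.txt Disallow stops the crawl entirely, so the links on the blocked page are never seen at all.)

<meta name="robots" content="noindex,follow">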

 
