Forum Moderators: Robert Charlton & goodroi
The correct MIME type for javascript files is text/javascript, I believe. Those should not be indexed; I can't see any sense in indexing them at all.
I have no problem with Google scanning the files to see what is in them, and rooting out sites with dodgy redirects to spam, trojans, and malware, but they should not appear in search results.
> The correct MIME type for javascript files is text/javascript, I believe. Those should not be indexed; I can't see any sense in indexing them at all.
I can verify that text/javascript IS indexed. I have an include file on one website that I also include on another, and it's listed in the search results. How can I prevent this?
From that admittedly limited experience, I would guess that files ending in .js or .css are usually excluded naturally (they are never fetched), while files with other extensions are fetched but then not indexed (listed as URL-only) if the declared MIME type is application/x-javascript or text/css. In other words, with a file extension other than .js or .css, Googlebot needs to fetch the file to check the MIME type; when those extensions are used, the MIME type is assumed and the file is ignored. You could put such files in a directory excluded by robots.txt to avoid any listing.
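For the robots.txt approach, a minimal sketch (the /includes/ directory name is hypothetical; use whatever directory actually holds your include files):

```
# Keep all crawlers out of the directory holding include files
User-agent: *
Disallow: /includes/
```

Bear in mind that a robots.txt block prevents fetching, not necessarily a URL-only listing, if the URL is linked from elsewhere.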
JS: [google.com...]
CSS: [google.com...]
The common denominator seems to be that the URI does not end with "js" or "css"
> The common denominator seems to be that the URI does not end with "js" or "css"
Yes, as I said: if the files don't end in .js or .css, Googlebot has to fetch them to check the MIME type and see whether they are worth indexing. And once a file is fetched, it is in the index, even if it is listed as URL-only.
Looks like, for Google at least, file extensions are as important as MIME types.
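The extension-first behaviour described above can be illustrated with Python's standard mimetypes module, which also guesses a type from the extension alone (this mirrors the idea, it is not Googlebot's actual code; the filenames are made up):

```python
import mimetypes

# Use only the built-in extension table, ignoring any system mime.types files,
# so results don't vary with the host configuration.
mimetypes.init(files=[])

# Guess the MIME type from the file extension alone, the way a crawler
# might shortcut a fetch when the extension is unambiguous. An unknown
# extension (like a hypothetical .inc include file) guesses as None,
# so the file would have to be fetched to learn its real type.
for name in ("script.js", "style.css", "header.inc"):
    guessed, _encoding = mimetypes.guess_type(name)
    print(name, "->", guessed)
```

Note that .js may guess as text/javascript or application/javascript depending on the Python version, which echoes the text/javascript vs. application/x-javascript uncertainty in this thread.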
But Google indexed those pages with a cache.
The funny thing is, I blocked the directory those pages are in with my robots.txt, and even used JavaScript links so that only users could click through, but Google still indexed the pages.
So I submitted my robots.txt file to Google's URL removal tool, which got rid of the pages.
There are options to remove URLs using a robots.txt file or via meta tags on the page, as well as removal of pages that no longer exist (if they return a 404 error).
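For the meta-tag route, the page itself carries the directive; a minimal sketch (the robots meta tag is standard, the placement in the page head is the usual convention):

```
<!-- inside the <head> of any page you want kept out of the index -->
<meta name="robots" content="noindex">
```

This only works for HTML pages that crawlers are allowed to fetch; a plain .js or .css file can't carry a meta tag, which is why the robots.txt and URL-removal routes come up in this thread.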
Google has definitely been spidering JavaScript lately on one of my sites, because I have pages linked only with JavaScript links, precisely so the engines wouldn't crawl the URLs.
I have had something similar happen to me before: even folders that were not linked from anywhere got listed, and I attributed those listings to the Google Toolbar.
Do you use the toolbar? If so, that could be how Google found the pages.