Forum Moderators: Robert Charlton & goodroi


Images and the Safe Search filters


bumpski

3:11 pm on May 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm still looking into a lack of indexing of images from one of my sites. One factor I've noticed: if I do an Image search for site:www.example.com with "Strict filtering" on, I see no image results at all. With moderate filtering I see all my images.

I began to experiment with Web search and noted that several of my pages did not appear in a site: search if I had Strict filtering on. The pages that were gone were those blocked by robots.txt! These are pages that shouldn't be indexed anyway, from my perspective, but from a strict filtering point of view I can see why Google would still want to crawl the content. I then investigated other sites' Image results and noted several sites whose images were missing even with moderate SafeSearch selected; these sites were blocking subdirectories with robots.txt. In sampling sites with no robots.txt at all, I saw no restriction on indexed images regardless of the SafeSearch filter setting.

So it now appears safer to let Google crawl your entire site by having no blocking in robots.txt, and instead to disallow indexing of specific pages using the robots metatag "noindex". You're telling Google: look at and read whatever you want, but don't index the pages I've marked. That way Google can still crawl all your pages and evaluate them from a SafeSearch perspective.
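As a rough sketch of the combination I mean (with www.example.com standing in for your own domain), the robots.txt would have no Disallow rules at all:

```
# robots.txt -- no blocking; every crawler may fetch every page
User-agent: *
Disallow:
```

and each page you want kept out of the index would carry the metatag in its head section:

```html
<html>
<head>
<title>Private page</title>
<!-- crawlable, but should not appear in search results -->
<meta name="robots" content="noindex">
</head>
</html>
```

Note the difference: robots.txt stops the page from being *fetched*, while the noindex metatag lets it be fetched but stops it from being *listed*.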

The method above was required in the past to successfully de-index pages. If you had a page indexed and wanted it removed from the index, it was not sufficient to just block it with robots.txt. You had to specifically mark the page with robots=noindex and allow Google to crawl it, I believe at least 3 times, before it was successfully de-indexed.

The observations above are mostly conjecture (except for the de-indexing method), based on very recent investigations.

For now, to work around my image indexing problem, my robots.txt file allows unlimited access to the site. I've also disabled my sitemap.xml file, because I certainly did not itemize all my image files (.jpg) in it; I didn't feel that was necessary, and now I'm not sure. And I'll be adding a few more robots=noindex metatags.

One other note about another page not shown in search results with strict filtering on: all I could find wrong with the page was that it had links containing the words "license agreement". I can see how Google might want to keep children away from pages with license agreements on them, but this was a bit of a surprise.

So an obvious suggestion is to review your site's search results under the various SafeSearch settings. Google is clearly getting more conservative with SafeSearch, and it may be that they have changed how they interpret robots.txt. And of course Google can make mistakes, and the bugs might impact non-SafeSearch results too. If I remember correctly, every site I looked at had fewer results with Strict SafeSearch filtering on! Seeing this, I've started tweaking content to be even "safer", except on my highest ranking pages, which I'm afraid to change!
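One quick way to do that review (assuming the usual &safe= URL parameter still behaves this way; www.example.com is a placeholder for your own domain) is to run the same site: query with each filter setting and compare the result counts:

```
http://www.google.com/search?q=site:www.example.com&safe=off
http://www.google.com/search?q=site:www.example.com&safe=moderate
http://www.google.com/search?q=site:www.example.com&safe=strict
```

Pages that appear with safe=off but disappear with safe=strict are the ones worth a closer look.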

[edited by: tedster at 4:07 pm (utc) on May 31, 2006]
[edit reason] use example.com [/edit]