Forum Moderators: Robert Charlton & goodroi


Google Custom Search Engine: Excluding useless pages from results


flapane

7:59 pm on Jun 7, 2012 (gmt 0)

10+ Year Member



Hi,
I noticed that a lot of my results are skewed by some pages that got indexed:
- my header menu (of course it shouldn't be indexed)
- my sitemap.xml (why in the world should it be indexed by CSE?)
- my Coppermine Gallery "last uploads" section, which shouldn't be indexed because the images are already indexed and the section is generated dynamically by the gallery
- the "archive" section of my WP blog (i.e. /blog/page[1...x]), which contains articles that have ALREADY been indexed by CSE

I excluded the URL patterns as shown in the attached image ([i.imgur.com ]).
Is anything wrong with the way I excluded them, and should they be excluded from CSE at all?
Thanks in advance

[edited by: tedster at 10:03 pm (utc) on Jun 7, 2012]
[edit reason] member request [/edit]

netmeg

12:47 pm on Jun 8, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do they show up as indexed in Google?

I run a lot of CSEs, and if I have the page blocked or noindexed, it doesn't show up in the local search.

If your unwanted pages were indexed before you added them to robots.txt, blocking them there won't remove them from the index.

You could try removing the robots.txt entry and putting NOINDEX on the pages that render as HTML (Google has to be able to crawl a page to see the tag). As for your sitemap.xml, I dunno why Google does that sometimes, but I'd exclude it in robots.txt, then remove it in GWT and see if that keeps it out.
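
For anyone who wants to check which URLs a robots.txt actually blocks before relying on NOINDEX, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `www.example.com` host, and the `is_crawlable` helper are assumptions made up for illustration; only `menu.php` and `sitemap.xml` come from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the rules are assumptions for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitemap.xml
Disallow: /menu.php
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` is allowed to fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder host, not the poster's site.
    for path in ("/sitemap.xml", "/menu.php", "/blog/"):
        url = "https://www.example.com" + path
        state = "crawlable" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(path, state)
```

A page reported as blocked here can't be fetched by a compliant crawler, so a NOINDEX tag on it would never be read. That is why the robots.txt entry has to go before the tag can take effect.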

flapane

9:57 pm on Jun 8, 2012 (gmt 0)

10+ Year Member



Sorry, I wasn't too clear about it. I excluded them yesterday.
I just wanted to know whether those patterns are wrong (i.e. whether they could cause other pages to be excluded from the search engine because of CSE bugs or other reasons I'm not aware of), and whether the logic behind excluding them is, generally speaking, right.

flapane

6:40 pm on Jun 10, 2012 (gmt 0)

10+ Year Member



I'm afraid it hasn't worked:
I searched for "UFO", which is one of the links in the header menu (I blacklisted menu.php, so it shouldn't be indexed), and here's a long list of results showing "UFO" just because it is one of the links in the header menu of every page: [i.imgur.com...]

I wonder if it's because menu.php is automatically included in the header of every page via <?php include ?>. If so, I'm afraid the spider can't exclude it.
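
To illustrate that suspicion, here is a toy model in Python. It is not how CSE works internally; the page contents, URLs, exclusion pattern, and the `render_page` and `search` helpers are all made up for the example. It only shows that a server-side include merges the menu markup into every page before it is served, so excluding the menu.php URL by pattern leaves the menu text in place on all the other pages.

```python
import fnmatch

# Stand-in for whatever menu.php outputs (hypothetical markup).
MENU_HTML = '<ul><li><a href="/ufo/">UFO</a></li></ul>'


def render_page(body: str) -> str:
    """Mimic a server-side include: the menu is merged in before serving."""
    return MENU_HTML + body


# Hypothetical site: what a crawler would receive for each URL.
PAGES = {
    "/menu.php": MENU_HTML,
    "/blog/post-1": render_page("<p>Notes on a trip.</p>"),
    "/gallery/": render_page("<p>Photos.</p>"),
}

# Assumed exclusion pattern, similar in spirit to blacklisting menu.php.
EXCLUDED_PATTERNS = ["*/menu.php"]


def search(term: str, pages: dict, excluded: list) -> list:
    """Return URLs whose served HTML contains `term`, skipping excluded URLs."""
    hits = []
    for url, html in pages.items():
        if any(fnmatch.fnmatchcase(url, pattern) for pattern in excluded):
            continue  # URL-level exclusion only removes this one URL
        if term.lower() in html.lower():
            hits.append(url)
    return sorted(hits)


if __name__ == "__main__":
    print(search("UFO", PAGES, EXCLUDED_PATTERNS))
```

In this model menu.php itself drops out of the results, but every page that includes it still matches "UFO", which is the same symptom as in the screenshot.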