How to delete Joomla site search pages from Google index?

     
9:57 pm on Aug 11, 2008 (gmt 0)

10+ Year Member



I have a Joomla site, and at some point in the last few months Google started including results pages from my site's own search function in its index. The searches are all on random-looking word fragments. Now I have 360 garbage pages in the index. I have them blocked now using robots.txt, but it was too late.

What is the best way to remove these from the index? I started to do so using the Webmaster Tools removal tool, but it will take a while.

12:53 am on Aug 12, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The robots.txt file will kick in to prevent future indexing. If you used the URL removal tool in your Webmaster Tools account, then it usually only takes a few days to see the removal. You've done what you can do.
3:06 am on Aug 12, 2008 (gmt 0)

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I use a triple play when possible:

410 them (if possible)
block with robots.txt
URL removal tool

4:17 am on Aug 12, 2008 (gmt 0)

10+ Year Member



Thanks - glad I'm on the right track. I'll try the 410 approach if the URL removal tool doesn't do the trick.
4:27 pm on Aug 17, 2008 (gmt 0)

10+ Year Member



The URL removal tool denied all of the removals of the garbage search results pages. It seems that I've followed the guidelines for removal, but they were denied anyway, with no further explanation than pointing me back to the list of conditions I complied with. Can't do the 410 because they are search results and not pages that have been removed. All URLs are showing up as blocked by robots.txt in Webmaster Tools. I guess all I can do is wait for them to drop from the index? I hope? Soon?
4:55 pm on Aug 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Can you use a "noindex" robots meta tag on the search results pages? If you can, it will prevent any future indexing and the old pages will gradually drop out once they are re-crawled.
12:22 am on Aug 18, 2008 (gmt 0)

10+ Year Member



The search results are displayed on the index page. The URL is something like index.php?component=search&searchstring=foo. Shouldn't the robots.txt file prevent future reindexing?
12:40 am on Aug 18, 2008 (gmt 0)

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Disallow: /index.php?component=search
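
A quick way to sanity-check a rule like this is to run it through a robots.txt parser. The sketch below uses Python's standard `urllib.robotparser`; the `User-agent: *` line and the example.com URLs are assumptions for illustration, and Python's parser does simple prefix matching, which may differ from Googlebot's own matching in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: the Disallow rule from this thread,
# placed under a catch-all User-agent group (hypothetical).
ROBOTS_TXT = """\
User-agent: *
Disallow: /index.php?component=search
"""


def is_blocked(url, user_agent="Googlebot"):
    """Return True if the assumed robots.txt rules disallow fetching url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com stands in for the real site.
    for url in (
        "http://example.com/index.php?component=search&searchstring=foo",
        "http://example.com/index.php",
    ):
        print(url, "blocked" if is_blocked(url) else "allowed")
```

This only confirms that the rule matches the search URLs and leaves the plain index page crawlable. It says nothing about whether pages already in the index will be dropped.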
9:21 pm on Aug 21, 2008 (gmt 0)

10+ Year Member



Yes, the pages are blocked by robots.txt, and Webmaster Tools confirms this by showing an error for each one, i.e. "URLs restricted by robots.txt". The problem is that they have been blocked for well over a month, maybe two, but they aren't dropping out of Google's index, and I can't remove them with the URL removal tool. Does anyone know if they will just go away by themselves, and if so, how long it will take? There are more of these garbage results in the Google index than actual valid pages on my website, which can't be a good thing.
10:41 pm on Aug 21, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



"I can't remove them with the URL removal tool"

There's an option in the tool to use your robots.txt rules for URL removal. Have you tried that?

12:24 am on Aug 22, 2008 (gmt 0)

10+ Year Member



No. I did run across that recommendation in my research, but wasn't able to find the function in the removal tool and figured it was something that had been taken out. When I go to the tool I see these options:

1) Individual URLs: web pages, images, or other files. Remove outdated or blocked web pages, images, and other documents from appearing in Google search results.
2) A directory and all subdirectories on your site. Remove all files and subdirectories in a specific directory on your site from appearing in Google search results.
3) Your entire site. Remove your site from appearing in Google search results.
4) Cached copy of a Google search result. Remove the cached copy and description of a page that is either outdated or to which you've added a noarchive meta tag.

Can you point me to the function that allows me to submit robots.txt to the removal tool? Maybe I'm looking in the wrong place.

1:07 am on Aug 22, 2008 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's that first option: a robots.txt rule or a meta robots noindex tag is what creates "blocked web pages". See the top page under the "Remove URLs" section for the exact description.
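
For the meta robots route, it can help to confirm that a search results page really carries a noindex directive before submitting it. Here is a minimal sketch using Python's standard `html.parser`; the sample HTML and the `has_noindex` helper name are made up for illustration. Keep in mind that a crawler has to be able to fetch a page in order to see its meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> style tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical page head, not taken from any real Joomla template.
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(sample))
```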
 
