| 7:17 pm on Oct 18, 2011 (gmt 0)|
How about a robots.txt rule first?
| 8:41 pm on Oct 18, 2011 (gmt 0)|
Hi Tedster, I thought of this, but would it actually remove them from the index (or the 'supplemental' index)?
| 8:45 pm on Oct 18, 2011 (gmt 0)|
Actually, I just checked my robots.txt and I have already disallowed the particular PHP file.
However, they are still appearing in the search results for the 'site:mywebsite' command, so obviously Google knows they are there and is taking them into account.
What do you suggest?
| 9:38 pm on Oct 18, 2011 (gmt 0)|
Well, you have a tangle here. Because the URLs are disallowed in robots.txt, they can no longer be crawled. This means you could request removal - which, as you say, might be quite time-consuming.
1. If the robots.txt block is relatively new, then you might just wait. I assume the URLs are not serving content but rather performing a redirect. So Google will most likely drop those URLs in the relatively near future. At any rate, they are unlikely to be showing up in the SERPs except in site: operator results.
2. Another approach would be to move your PHP script into a new directory (or even just give it a new name) that you disallow in robots.txt from the start. With this approach, you would also edit all the internal linking to point to the new location - do not rely on a 301 redirect here. If old-style URLs remain in your internal linking and crawling is disallowed, they may never go away.
No matter what you do here, anything except a URL Removal Request will likely take time to be reflected in the site: operator results. However, moving your link script to a new location, disallowing it in robots.txt and changing the internal links is the best long-term solution.
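As a sketch of that second approach - assuming the jump script currently lives at /jump.php and moves to a new /out/ directory (both paths hypothetical) - the new location is disallowed from day one:

```
User-agent: *
Disallow: /out/
```

Internal links would then be edited to point at the new path directly (e.g. /out/jump.php?id=123 instead of /jump.php?id=123), so Googlebot never discovers the new URL via a redirect from the old one.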
| 7:11 am on Oct 19, 2011 (gmt 0)|
Thanks for that reply, Ted.
The approach you suggest in point (2) is probably the way I'll go.
I'll create a new directory for the 'jump script', immediately disallow it in robots.txt, and then put in the hard yards and amend each link on my deep pages.
Once this is done, I'll go in and submit a URL removal request for the old jump-script URL.
| 7:24 am on Oct 19, 2011 (gmt 0)|
|Once this is done go in and ask for a URL removal request for the old jump script URL. |
You can just make the old script return a 404 and remove the disallow rule from robots.txt. Google will then be able to crawl it, get the 404 response several times, and then POOF! It will be gone without you needing to go through the ordeal of the URL removal tool.
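A minimal sketch of that 404 approach in Python (the thread mentions a PHP script; the paths and handler here are hypothetical, just to illustrate the responses involved): once the disallow rule is lifted, the old URL simply answers 404 while the relocated script keeps doing its redirect.

```python
from http.server import BaseHTTPRequestHandler

OLD_SCRIPT = "/jump.php"       # assumed old, now-retired location
NEW_SCRIPT = "/out/jump.php"   # assumed new, robots.txt-disallowed location

class JumpHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?")[0]
        if path == OLD_SCRIPT:
            # Old URL: answer 404 so Googlebot can crawl it and drop it.
            self.send_response(404)
            self.end_headers()
        elif path == NEW_SCRIPT:
            # New URL: perform the outbound jump as before.
            self.send_response(302)
            self.send_header("Location", "https://example.com/")
            self.end_headers()
        else:
            self.send_response(404)
            self.end_headers()
```

The key point is that the old URL must be crawlable (no disallow) *and* return 404; a URL that is disallowed can never be fetched, so Google never sees the 404.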
| 7:59 am on Oct 19, 2011 (gmt 0)|
Thanks Ted that sounds like the way to go.
| 8:30 am on Oct 19, 2011 (gmt 0)|
Just out of interest, Ted, which method do you use to assess pages of lower quality? (I guess 'supplemental index' is a phrase of yesteryear!)
I use site:mysiteaddress, click through to the last page, then click
"If you like, you can repeat the search with the omitted results included."
Then I click through to the last page again and measure the difference?
On a side note, when I use the site:mysite operator, Google indicates it has found 1,100 results, but when I scroll through there are only around 50 pages (10 results on each page) = around 500 results. Yet Google just said it found over 1,000?
Could you clarify this for me, Ted?