I wonder if anyone can beat my triple-indexed site? Four times, anyone?
I am not sure what you mean. The correct URLs, with ampersands etc., return 200 OK. The URLs Google is sending return 404 Not Found. I can only assume Google found a page it wanted to index (with an ampersand), incorrectly changed the URL on its servers to hex, and then provided that incorrect link in the SERPs. A user clicks the Google link and gets a 404 from my server (as they should). This has been going on for over a year. I simply don't use ampersands, spaces, etc. anymore.
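If the "hex" in those links is ordinary percent-encoding (an assumption, since the actual URLs aren't shown here), the effect is easy to reproduce. A minimal sketch using a made-up path, not one from the site in question:

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a space and an ampersand (illustration only).
original_path = "/folder/this & that.html"

# Percent-encode everything except "/", so the ampersand is escaped too.
encoded_path = quote(original_path, safe="/")

# Decoding the escaped form gives back the original path.
decoded_path = unquote(encoded_path)

print(encoded_path)
print(decoded_path == original_path)
```

A server that looks up the escaped string literally, without decoding it first, would not find the file, which would fit the 404s described above.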
example.com/folder/file.html/
The server, however, responds with 200 OK, and the page that shows up looks terrible (the CSS is the only file I use a relative link for, so that URL breaks it).
Also, for the first time in the almost 9 years this site has been alive, Google is now starting to index it without the www, with /index.html added, etc.
A few days ago I started adding 301s to try to correct the errors. Googlebot did follow one of the 301s; no idea whether it will actually help correct anything. (I tried fixing this before on another site. It never did recover, but it's worth a shot.)
G1smd, how would I return a 404 for something like file.html/? Would that be a better way to deal with that than by using a 301?
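To make the two options concrete, here is a rough sketch of the decision in Python. It is not tied to any particular server; the function name, the regex, and the `use_redirect` switch are all made up for illustration, and 200 just stands for "serve the page as usual".

```python
import re

# Hypothetical rule: ".html" followed by a slash (and anything after it)
# is treated as a dud URL. "clean" captures the path up to ".html".
_TRAILING_JUNK = re.compile(r"^(?P<clean>/.*\.html)/.*$")

def respond(path, use_redirect=True):
    """Return (status, location) for a request path."""
    match = _TRAILING_JUNK.match(path)
    if match is None:
        # Normal URL: serve it as usual.
        return (200, None)
    if use_redirect:
        # Option A: 301 to the URL without the trailing junk.
        return (301, match.group("clean"))
    # Option B: tell the client the dud URL does not exist.
    return (404, None)

print(respond("/folder/file.html/"))
print(respond("/folder/file.html/", use_redirect=False))
print(respond("/folder/file.html"))
```

Either branch stops the server from answering 200 OK for file.html/, which is what lets the broken version get indexed in the first place.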
1. Put a "Disallow: /page" entry in robots.txt
2. Put a <meta name="robots" content="noindex,nofollow"> tag on the page
3. Put a rel="nofollow" attribute on each link to the page
4. Shut down your server whenever you think Googlebot might be about to crawl the page.
Hope this helps!
PS: If you also want some of your pages to rank, you'll need to start several blogs that link "organically" to your site via a bunch of junky pseudo-articles. This is Web 0.2 Google style!
The on-page "noindex" meta tag completely removes a page from the index.
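For the robots.txt entry mentioned earlier, Python's standard-library parser gives a quick way to check what a rule actually blocks. A small sketch, assuming a two-line robots.txt and example.com URLs that are placeholders only:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (illustration only).
robots_lines = [
    "User-agent: *",
    "Disallow: /page",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# Disallow is a prefix match on the path, so /page is blocked for any bot.
blocked = not parser.can_fetch("Googlebot", "http://example.com/page")

# A path that does not start with /page stays crawlable.
allowed = parser.can_fetch("Googlebot", "http://example.com/other.html")

print(blocked, allowed)
```

Note that this only checks crawl permission. A crawler that obeys the Disallow never fetches the page, so it never sees a meta tag placed on it.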
More than once I've had a dud URL like that knock the real URL out of the SERPs. It would be nice if they'd ignore extraneous trailing spaces instead.
Google Sitemaps shows these as errors, but ironically I have to go to Yahoo's link: command to track them down!