Forum Moderators: Robert Charlton & goodroi
The thing about this is that these pages aren't just typo'd addresses or adding invalid parameters... they're page numbers that don't exist.
As an example... on this forum, you can change page numbers by clicking the links, links which point to:
http://www.webmasterworld.com/forum30/page1.htm
http://www.webmasterworld.com/forum30/page2.htm
http://www.webmasterworld.com/forum30/page3.htm
I have something similar on my site, and for the category in question, there's a total of 36 pages. According to the "Unreachable URLs" section of Webmaster Tools, Google is trying to get pages 37, 38, 39, and everything else up to 9721 (not every broken page is listed there, but some are, and the highest is 9721). Obviously, these are broken links, so I'm wondering if this is related to my PR drop.
Again, those pages shouldn't be in there - if you go to my site, there are no links pointing to those pages (I just double- and triple-checked). And as far as I can tell from a Google search, no other sites are linking to these invalid pages either. Short of manually changing the URL, you cannot get to those pages - so why is Google spidering them?
I obviously want Google to continue spidering the legit pages, but is there any way to tell it, you know, not to look on pages that I don't have linked?
[edited by: tedster at 1:14 am (utc) on Nov. 18, 2008]
[edit reason] de-link the url examples [/edit]
[webmasterworld.com...]
Imagine if everything between the top and bottom "Home / Forums Index / The Google World / Google Search News" was gone, and you'll see what my site does.
I don't think I could set a 404 on these invalid pages, since you access them through something like:
www.example.com/page/1
which, through the magic of htaccess, gets transparently rewritten to
www.example.com/myscript.php?page=1
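For reference, a rewrite like that is usually just a couple of lines of mod_rewrite in .htaccess. This is a hypothetical sketch, since the actual rule wasn't posted - only myscript.php comes from the thread; the pattern and flags are assumptions:

```apache
# Hypothetical sketch of the rewrite described above.
# Internally maps /page/1 to /myscript.php?page=1 (no visible redirect).
RewriteEngine On
RewriteRule ^page/([0-9]+)$ myscript.php?page=$1 [L]
```

Note that this is an internal rewrite, not a redirect - the browser (and Googlebot) never sees myscript.php, which is why the 404 handling has to happen inside the script itself.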
So, not an issue here?
If a URL should not exist, then a request for that URL should return a 404 status in the HTTP header sent by the server. How you code that will depend on your own server technology, but one way to verify your server headers is to use Firefox with the Live HTTP Headers add-on installed.
If you consistently return a 404 status, the spurious URL requests should slow way down over time. Among other benefits, this will help Google make more economical use of whatever crawl budget they allocate to your site.
However, I don't think this is the reason behind your home page PR now showing zero - although who knows. If your home page PR really fell from 5 to 0, your search traffic would also have fallen off dramatically.
TerrCan123 - I wrote the code myself, and trust me, I double checked both the code and the actual output. It shouldn't be putting out screwy pages, and if you look at the actual site, it's not. Oh, how I wish I could link it here ;)
On the flip side, there are 236 pages on here right now... if the webmaster of this site were in my shoes and tried to put in some code change, what would happen when the site has 237 pages tomorrow or next week?
I think I'm going to have to do what tedster suggested and serve up a 404 error, but this is sort of tricky with my setup - I can use PHP's header() function to send that error status, but then I can't redirect to the proper error page... and if I can't redirect, then any legitimate 404 will serve up a blank white page.
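For what it's worth, you don't actually need a redirect here: you can send the 404 status and then include your normal error page in the same response. A minimal PHP sketch, assuming a myscript.php-style setup - the 36-page maximum, the isValidPage() helper, and the 404.php filename are illustrative, not taken from the actual site:

```php
<?php
// Hypothetical sketch: send a 404 status AND render the error page in one
// response, so the visitor never sees a blank white screen.

function isValidPage(int $page, int $maxPage): bool {
    // A page number is legitimate only if it falls inside the real range.
    return $page >= 1 && $page <= $maxPage;
}

$page    = isset($_GET['page']) ? (int) $_GET['page'] : 1;
$maxPage = 36; // in practice, look this up (e.g. from the database) each request

if (!isValidPage($page, $maxPage)) {
    header('HTTP/1.1 404 Not Found'); // the status line goes out first...
    include '404.php';                // ...then the friendly error page body
    exit;                             // stop before rendering the normal page
}
```

Because the status header and the error page body go out in the same response, a human visitor sees the real error page while Googlebot still gets the 404 - and since $maxPage is looked up at request time, page 237 starts working automatically the day it exists.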
Ack... I think it'd be best to just see if they knock it off eventually.