What are your permalinks set to?
Actually, I was just playing with URLs from WordPress sites that do not belong to me, and they seem to do the same thing.
Well, WP uses numeric IDs for posts so I'm not surprised it finds the post using an ID. I don't think it's an issue - maybe an undocumented feature. But as long as your links use the permalink structure then you should be fine and the IDs will never get indexed.
Then why would they show up in WMT?
I just checked a WP site using permalinks and yeah, it's doing the same thing. What the heck . . .
I'd try to write some sort of 301 that redirects any URLs with a trailing number to the ones without it, unless you actually have any that end in numbers.
Here's what I came up with on a development site (not "live"), but my permalink structure is just /%postname%/
RewriteRule ^(.+)\/\d+\/?$ /$1 [R=301,L]
Seems to work without breaking anything but it's only **one** number after the permalink. What's the likelihood of postname/1/2?
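To see what the rule does with postname/1/2 without touching a server, here is a rough Python sketch that applies the same pattern to a few request paths. It assumes the rule lives in .htaccess, where Apache matches the path without its leading slash; the helper names `redirect_target` and `follow` and the sample paths are made up for illustration.

```python
import re

# Same pattern as the RewriteRule above. Assumption: .htaccess context,
# so request paths are matched without their leading slash.
RULE = re.compile(r"^(.+)/\d+/?$")


def redirect_target(path):
    """Return the 301 target for a request path, or None if the rule does not match."""
    m = RULE.match(path)
    return "/" + m.group(1) if m else None


def follow(path, max_hops=5):
    """Follow successive redirects, as a browser would, until the rule stops matching."""
    hops = []
    target = redirect_target(path)
    while target is not None and len(hops) < max_hops:
        hops.append(target)
        target = redirect_target(target.lstrip("/"))
    return hops


if __name__ == "__main__":
    for sample in ["my-post/2/", "my-post/1/2", "my-post/", "page/2/", "2012/03/"]:
        print(sample, "->", follow(sample))
```

Under these assumptions, my-post/1/2 is handled in two hops (first to /my-post/1, then to /my-post), so stacked numbers do get cleaned up, just with an extra redirect each. The same pattern also matches page/2/ and a date archive such as 2012/03/, which is the "unless you actually have any that end in numbers" case.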
What is happening is actually a problem I have seen when I activated the <!--nextpage--> quick tag.
After I started using the tag, I started seeing these errors come up in WMT as well. The nextpage tag lets me break up a long post into paginated pages; the subsequent pages are labeled www.example.com/longpost/2/ for page 2, www.example.com/longpost/3/ for page 3. Not sure if that applies to you or to any pagination plugin you're using, but that was where I encountered the problem.
You can fix that, and avoid duplicate meta and title tags, by using a canonical plugin like the Yoast WordPress SEO plugin. If you don't have paginated posts or pages this should do the trick.
If you have paginated posts as well, then you can also try ZB Phantom Toolkit (ZipsBazaar), where any phantom page, e.g. page 100, will always be redirected to the last authentic page in a post or page. Or try rocknbil's solution above; I haven't tried that one yet.
I don't know where these so-called phantom pages come from: either malicious bots, or actual scumbags typing or linking to these pages. I have seen pages go out to /29/ even.
[edited by: lorax at 9:34 pm (utc) on Mar 9, 2012]
I second the suggestion for using the Yoast SEO plugin; it lets you keep multiple versions of pages/posts out of your sitemaps. WP offers multiple ways to link to things, which is great for users but not for sitemaps. Check the sitemap that you have now to see if those URIs are in it.
@rocknbil - Good point. If it shows up in WMT there must be a link to them somewhere? I'm guessing here, but how else would Google find it? How many pages (rough %) of the total are showing up with numeric IDs?
|Seems to work without breaking anything but it's only **one** number after the permalink. What's the likelihood of postname/1/2? |
I can type any number after the slash and it comes up with that page. Since it showed up in webmaster tools, that concerns me.
I will try the Yoast plugin. Thanks for the suggestion. I hope it works.
I just tried rocknbil's suggestion and it worked. Thanks a million.
One problem I just discovered with this solution is that it breaks pagination at the bottom of the page. I had to remove it.
With WordPress' pretty permalinks, there are 3 discrete paging syntaxes -- for the Home Page, for paged Post or Page content, and for paged comments.
A page ordinal suffix of /0/ is always invalid, and /1/ is redundant. But both /0/ and /1/ (or any number, really) can usually sit happily in the address bar and cause issues of 'duplication'. You can test this on just about any WordPress blog out there right now. I recommend always having redirection in place for requests for pages /0/ and /1/.
The Home or Front Page only becomes 'paged' once the number of posts exceeds nn in Settings > Reading > Blog pages show at most [nn] posts. The suffix syntax for a sub-page of the Home Page is:
To redirect Home/Frontpage pages 0/1:
RewriteRule ^(page\/)?\/$ / [R=301,L]
Posts and Pages become paged when you insert the <!--nextpage--> tag into the content. The suffix syntax for the second and subsequent pages simply tacks a numeric segment on the end of the URL:
To redirect Page or Post pages 0/1:
RewriteRule ^(?!page)(.+?)\/\/?$ $1/ [R=301,L]
Post or Pages with comments are split into comment-pages according to Settings > Discussion > Break comments into pages with [nn] top level comments per page. The suffix syntax for second and subsequent pages here is:
To redirect paged-comments pages 0/1:
RewriteRule ^(.+?)\/comment-page-\/?(#comments)?$ $1/$2 [R=301,L]
If your blog doesn't require any paging support at all, you can replace  with \d+ in each RewriteRule above to redirect ANY page-number suffix to the canonical post or page URL.
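The rules as quoted above appear to have lost the token that matches the page number, so the following Python sketch is only a guess at their intent. It assumes the missing piece sat just before the final slash and was a character class such as [01]; `build_rules` and `first_redirect` are hypothetical helper names, and paths are written without a leading slash as in .htaccess context.

```python
import re


def build_rules(ordinal):
    """Return (pattern, target) pairs with `ordinal` in the assumed page-number slot."""
    return [
        # Home/front page: 1/ or page/1/ -> /
        (re.compile(r"^(page/)?" + ordinal + r"/$"),
         lambda m: "/"),
        # Paged post or page: my-post/1/ -> my-post/
        (re.compile(r"^(?!page)(.+?)/" + ordinal + r"/?$"),
         lambda m: m.group(1) + "/"),
        # Paged comments: my-post/comment-page-1/ -> my-post/
        (re.compile(r"^(.+?)/comment-page-" + ordinal + r"/?(#comments)?$"),
         lambda m: m.group(1) + "/" + (m.group(2) or "")),
    ]


def first_redirect(path, rules):
    """Return the target of the first rule that matches, or None if none does."""
    for pattern, target in rules:
        m = pattern.match(path)
        if m:
            return target(m)
    return None


strict = build_rules(r"[01]")  # assumed original: only pages 0 and 1
loose = build_rules(r"\d+")    # the "no paging at all" variant

if __name__ == "__main__":
    for sample in ["page/1/", "page/2/", "my-post/1/", "my-post/3/",
                   "my-post/comment-page-1/"]:
        print(sample, first_redirect(sample, strict), first_redirect(sample, loose))
```

With the assumed [01] class, page/2/ and my-post/3/ are left alone so real paging keeps working; with \d+ every numbered suffix is redirected to the unpaged URL.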
This is a very basic level of protection. If you do rely on any of WordPress' paging functionality, you can run into another problem -- paged URLs with ordinals that might be out-of-bounds at the high end. Say you have a WordPress Page that's split into 3 sub-pages, and all 3 are indexed in the SEs. You later chop some waffle out of that Page, so it's now just a 2-page Page. How do you explain where page 3 went?
The plot thickens. What's to stop anyone posting malicious links to stupidly high page numbers (whether the content is really paged or not)? To fix that issue, you need to load WordPress so you can access the requested content and test the page number in the URL against the true number of pages.
For that, you will need a plugin.
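The check such a plugin has to make is simple enough to sketch. The Python below is not WordPress code, just the decision logic, and it assumes the number of pages equals the number of <!--nextpage--> tags plus one; `total_pages` and `canonical_page` are hypothetical names. Sending out-of-range requests to the last real page follows the behavior described for ZB Phantom Toolkit earlier in the thread; returning a 404 instead would be an equally reasonable policy.

```python
NEXTPAGE = "<!--nextpage-->"


def total_pages(content):
    """Assumption: each nextpage tag starts one additional page."""
    return content.count(NEXTPAGE) + 1


def canonical_page(content, requested):
    """Return the page number that a requested ordinal should resolve to."""
    if requested <= 1:
        # /0/ is invalid and /1/ is redundant: both map to the bare URL.
        return 1
    # Anything past the end falls back to the last real page.
    return min(requested, total_pages(content))


if __name__ == "__main__":
    post = "Part one" + NEXTPAGE + "Part two"
    for n in [0, 1, 2, 3, 29, 100]:
        print(n, "->", canonical_page(post, n))
```

A request would then be redirected whenever the canonical page differs from the requested one, which covers both the chopped-down 3-page Page and the links to stupidly high page numbers.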