Hi guys, I had this issue a long time ago and thought I had solved it. But I hadn't, because today I found the same issue again.
I have a WordPress blog with URLs like this:
http://www.example.com/post-title-etc-.html
In some posts I have images like this: <a href=http://www.example.com/heres-image-name.jpg><img src=http://www.example.com/heres-image-name.jpg></a>. Everything was OK until one day I found that Google had indexed some pages (over 10) like this:
http://www.example.com/post-title-etc-.html/heres-image-name which had only an image with AdSense ads and no content.
or http://www.example.com/post-title-etc-.html/ or http://www.example.com/page/11?updated-min=2010-01-01T00%253A00%253A00-08%253A00&updated-max=2011-01-01T00%253A00%253A00-08%253A00&max-results=9
I have no idea what happened, but apparently I got penalized by Panda. Can anybody here please help me?
meaning "If there is a slash after the .html, redirect to the form of the URL that stops at .html". There is no closing anchor in the pattern, so it also matches when there is more stuff after the directory slash.
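To make the matching logic concrete, here is a rough sketch in Python rather than in mod_rewrite syntax. It is not the actual rule, only an illustration of the pattern as described: a slash right after .html, with no closing anchor. The names HTML_SLASH and redirect_target are made up for this example.

```python
import re

# Assumption: illustrative pattern only, not the poster's actual rule.
# It matches a path that has a slash directly after ".html".
# There is no "$" at the end, so anything after that slash is matched too.
HTML_SLASH = re.compile(r"^(/.+?\.html)/")


def redirect_target(path):
    """Return the path to redirect to, or None if no redirect is needed."""
    match = HTML_SLASH.match(path)
    # group(1) is the part of the path that stops at ".html"
    return match.group(1) if match else None
```

With this sketch, both /post-title-etc-.html/ and /post-title-etc-.html/heres-image-name map back to /post-title-etc-.html, while the plain .html URL is left alone.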
This is one of the many rules that you don't need to add unless you're actually getting requests for bogus URLs. Others include rules for intercepting multiple directory slashes, and rules for .html files with an attached query string (assuming you're not secretly parsing .html as .php).
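The other two cleanups mentioned here can be sketched the same way. This is only an illustration in Python of what such rules would do, assuming .html pages really are static and never read a query string. The function name canonical_url is hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Collapse repeated slashes in the path and drop queries on .html pages."""
    parts = urlsplit(url)
    # Turn runs of two or more slashes in the path into a single slash.
    path = re.sub(r"/{2,}", "/", parts.path)
    # Assumption: static .html pages ignore query strings, so strip them.
    query = "" if path.endswith(".html") else parts.query
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

If the returned URL differs from the requested one, that is the case where a redirect would be issued. URLs that do not end in .html keep their query string.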
Does the link actually lead to a real resource?
By default you can add any kind of garbage after .html, or any other extension used with static files, and it will have no effect on the page.
True in most cases, but I once came across a situation (a couple of plugins stepping on each other, as I recall) that messed with the URL and generated 404s and 500s. It took me the longest time to figure out which ones were involved because the client had installed so many (20+).