| 2:33 am on Jun 14, 2012 (gmt 0)|
You cannot be responsible for other sites posting malformed links. There have been several discussions on Google's own webmaster forum where Google engineers clarified this point. The malformed links report in WMT is just there for your information - do with it what you will or ignore it. The word "error" just means "crawl error" - not "you made an error."
Some malformed backlinks are worth contacting the linking website about - that's where the report can be useful. Other 404s are just coming from some poorly written spammer's script. In either case, you do not hurt your own site by ignoring malformed EXTERNAL backlinks. Of course, if it's an internal link, then you're sending a poor quality signal.
| 10:47 am on Jun 14, 2012 (gmt 0)|
I use 400 bad request for a few urls, but in this particular case, I would implement a redirect, especially if you actually have a page.php. It looks like somebody was trying to link to page.php, but appended a bunch of junk accidentally. I have a rule in my htaccess to remove all the junk and redirect to the page.
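A minimal sketch of such a rule, assuming the real page lives at /page.php (the pattern and hostname are illustrative, not the poster's actual htaccess):

```apache
# Hedged sketch: if a request for page.php arrives with trailing junk
# (e.g. markup accidentally appended by the linking site), strip the
# junk and 301-redirect to the clean URL.
RedirectMatch 301 ^/(page\.php).+ http://example.com/$1
```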
| 9:08 pm on Jun 14, 2012 (gmt 0)|
tedster - Thanks for the clarification. I hadn't come across that particular post, as everything I search regarding these errors comes up with XML fragments.
They appear to be from an overseas spam hut. A few sites similar to mine are also appearing in my errored URLs, so it isn't just me that's getting targeted.
|Some malformed backlinks are worth contacting the linking website about - that's where the report can be useful. |
|I would implement a redirect, especially if you actually have a page.php. |
I do! Would you mind showing a snippet of the htaccess redirect? :)
| 10:10 pm on Jun 14, 2012 (gmt 0)|
Wish Google would give us the option to check off those malformed links and hide them from the report. Not much we can do about them unless we write to every site with a link problem and ask them to fix their links.
| 10:16 pm on Jun 14, 2012 (gmt 0)|
You can click the "fixed" link and Google will remove those URLs from the WMT report.
Some of the URLs may reappear in the report at some later date, simply because when first found as 404 Google will respider URLs returning 404 status several days in a row to see if the page comes back to life.
| 2:18 am on Jun 15, 2012 (gmt 0)|
|I would implement a redirect, especially if you actually have a page.php |
Since you've now identified the linking site as a "spam hut", I would not redirect. What's the value of claiming link juice from such a site? There's even a possible downside, I'd think, of intentionally associating your site with a spammer.
| 2:07 pm on Jun 15, 2012 (gmt 0)|
I use rules like this:
RedirectMatch 301 ^/(.*\.php).+ http://example.com/$1
RedirectMatch 301 ^/(.*\.html).+ http://example.com/$1
These assume that I will never create a url myself that has a .php or a .html somewhere in the middle.
| 3:50 pm on Jun 15, 2012 (gmt 0)|
Here's the way I look at things. In the old days, many SEOs felt that every link was sending you PR - and you wanted to capture "every drop". But now we know that some backlinks are either ignored or even penalized. I don't see that it is sensible to try to "capture" any effects at all from spam sites.
| 5:49 pm on Jun 15, 2012 (gmt 0)|
|RedirectMatch 301 ^/(.*\.php).+ http://example.com/$1 |
|RedirectMatch 301 ^/(.*\.html).+ http://example.com/$1 |
If you use RewriteRule anywhere on your site, you should convert those rules to RewriteRule syntax as well. Additionally, the .* sub-pattern should never be used at the start or in the middle of a RegEx pattern. Change it to something that can be parsed from left to right in one pass. Using .* can lead to thousands of "back off and retry" trial matching attempts per URL request, which can severely impact server performance.
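For what it's worth, a hedged sketch of what that conversion might look like - RewriteRule syntax with the leading .* replaced by a negated character class the RegEx engine can match in one left-to-right pass (example.com and the exact patterns are illustrative, and they carry the same assumption as the originals: no dots in the path before the extension):

```apache
# Hedged sketch, not the poster's actual rules.
# [^.]+ matches everything up to the first dot without backtracking,
# so the engine never has to "back off and retry".
RewriteEngine On
RewriteRule ^([^.]+\.php).+ http://example.com/$1 [R=301,L]
RewriteRule ^([^.]+\.html).+ http://example.com/$1 [R=301,L]
```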
| 1:25 am on Jun 16, 2012 (gmt 0)|
The way I look at it, if Googlebot can find a link, so can users. If it's an easy redirect, do it. Folks here are talking about "denying" links by returning 404 status for them. I don't buy into that at all unless I hear something from Google saying they support it.
| 7:36 pm on Jun 17, 2012 (gmt 0)|
Interesting difference of opinions regarding the redirect options.
With Google as of late, I'd have to agree with tedster that forwarding links coming from spammy pages could be (mis)interpreted as association.
One would think that, with the spammy pages linking to so many 4xx URLs, it would hurt their link juice/SEO efforts - but then, wouldn't linking to anything other than 200 OK pages do the same?
It would be interesting to hear from Google how they deal with these types of spam pages.
| 10:14 pm on Jun 17, 2012 (gmt 0)|
|if Googlebot can find a link, so can users |
Not necessarily. Error Reports show all kinds of things that are obviously not links, and would never be mistaken for links by humans. For example the one illustrated in the OP: a human would never see that. Or, conversely, the URLs with "..." in the middle. It may be what you see with your eyeballs, but it isn't what you get when you click-- no matter how often google tries to make it into a link.
When I see something odd in logs, like a 404 ending in ".htmlWill", I know it's only a matter of time before it shows up under Crawl Errors. But a human user can look at the URL and say "Oops, where did that 'Will' come from?" and delete it on the fly.
| 10:30 pm on Jun 17, 2012 (gmt 0)|
|When I see something odd in logs, like a 404 ending in ".htmlWill", I know it's only a matter of time before it shows up under Crawl Errors. |
I see these popping up all over the place. I don't adopt them. I don't redirect them. They are left as 404 status. I don't use 410 Gone, as they were never previously in a state to be gone from.
With 404 errors, Google tends to return several days in a row to check the status, then doesn't request the URL again for several weeks or months. At that point, I click the 'fixed' box in the WMT report and hope it doesn't return.
Only URLs that used to exist and no longer do get to return 410 Gone.
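A hedged sketch of how that policy might look in htaccess (the retired URL below is hypothetical):

```apache
# Hypothetical retired page: it used to exist, so it gets 410 Gone.
Redirect gone /old-page.html
# Spammer-mangled URLs that never existed are not listed here;
# they simply fall through to the server's default 404 handling.
```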