Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
WMT - Invalid Characters in my 400 Error Reports
brokaddr
msg:4465155 - 10:12 pm on Jun 13, 2012 (gmt 0)

According to GWT, I've been seeing an increasing number of 400 errors for URLs on my site that do not actually exist. Someone on the web somewhere is linking to these nonexistent URLs.

Most of these URLs are formatted as follows:
http://www.example.com/page.php%3C/web:Url%3E%3Cweb:DisplayUrl%3Ewww.example.com/...page.php%3C/web:DisplayUrl%3E%3Cweb:DateTime%3E [snipped]


In my opinion, the 400 response is acceptable, given the invalid characters in the URL.
But Google is reporting it as an 'Indexing Error' - what should I do in this case, try to return a 404 instead?

Their FAQ doesn't address this issue specifically; it only explains what the various error codes represent.

 

tedster
msg:4465207 - 2:33 am on Jun 14, 2012 (gmt 0)

You cannot be responsible for other sites posting malformed links. There have been several discussions on Google's own webmaster forum where Google engineers clarified this point. The malformed links report in WMT is just there for your information - do with it what you will or ignore it. The word "error" just means "crawl error" - not "you made an error."

Some malformed backlinks are worth contacting the linking website about - that's where the report can be useful. Other 404s are just coming from some poorly written spammer's script. In either case, you do not hurt your own site by ignoring malformed EXTERNAL backlinks. Of course, if it's an internal link, then you're sending a poor quality signal.

deadsea
msg:4465430 - 10:47 am on Jun 14, 2012 (gmt 0)

I use 400 Bad Request for a few URLs, but in this particular case I would implement a redirect, especially if you actually have a page.php. It looks like somebody was trying to link to page.php but accidentally appended a bunch of junk. I have a rule in my .htaccess to strip the junk and redirect to the page.

brokaddr
msg:4465629 - 9:08 pm on Jun 14, 2012 (gmt 0)

tedster - Thanks for the clarification. I hadn't come across that particular post, as everything I search for in regard to these errors comes up with XML fragments.

Some malformed backlinks are worth contacting the linking website about - that's where the report can be useful.
They appear to be from an overseas spam hut. A few sites similar to mine are also appearing in my errored URLs, so it isn't just me being targeted.

I would implement a redirect, especially if you actually have a page.php.

I do! Would you mind showing a snippet of the htaccess redirect? :)

AndyA
msg:4465643 - 10:10 pm on Jun 14, 2012 (gmt 0)

Wish Google would give us the option to check off those malformed links and hide them from the report. There's not much we can do about them unless we write to every site with a link problem and ask them to fix their links.

g1smd
msg:4465650 - 10:16 pm on Jun 14, 2012 (gmt 0)

You can click the "fixed" link and Google will remove those URLs from the WMT report.

Some of the URLs may reappear in the report at some later date, simply because when a URL is first found returning 404, Google will respider it several days in a row to see if the page comes back to life.

tedster
msg:4465686 - 2:18 am on Jun 15, 2012 (gmt 0)

I would implement a redirect, especially if you actually have a page.php

Since you've now identified the linking site as a "spam hut", I would not redirect. What's the value of claiming link juice from such a site? There's even a possible downside, I'd think, of intentionally associating your site with a spammer.

deadsea
msg:4465874 - 2:07 pm on Jun 15, 2012 (gmt 0)

I use rules like this:

RedirectMatch 301 ^/(.*\.php).+ http://example.com/$1
RedirectMatch 301 ^/(.*\.html).+ http://example.com/$1

These assume that I will never create a URL myself that has ".php" or ".html" somewhere in the middle.

tedster
msg:4465905 - 3:50 pm on Jun 15, 2012 (gmt 0)

Here's the way I look at things. In the old days, many SEOs felt that every link was sending you PR - and you wanted to capture "every drop". But now we know that some backlinks are either ignored or even penalized. I don't see that it's sensible to try to "capture" any effect at all from spam sites.

g1smd
msg:4465968 - 5:49 pm on Jun 15, 2012 (gmt 0)

RedirectMatch 301 ^/(.*\.php).+ http://example.com/$1
RedirectMatch 301 ^/(.*\.html).+ http://example.com/$1

If you use RewriteRule anywhere on your site, you should convert those rules to also use RewriteRule syntax. Additionally, the .* sub-pattern should never be used at the start or in the middle of a RegEx pattern. Change it to something that can be parsed from left to right in one pass. Using .* can lead to thousands of "back off and retry" trial-matching attempts per URL request, which can severely impact server performance.
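For illustration, a sketch of that conversion in .htaccess (RewriteRule syntax, with the pattern anchored so it parses left to right; like deadsea's rules, it assumes a valid URL contains ".php" or ".html" only at the end, and that the path holds no other dots before the extension):

```apache
RewriteEngine On
# .htaccess context: the pattern matches the URL path without its leading slash.
# [^.]+ consumes the path in a single left-to-right pass, avoiding the
# backtracking that a leading .* would trigger. The trailing (.+) requires at
# least one junk character, so clean requests for page.php are left alone.
RewriteRule ^([^.]+\.php)(.+)$ http://example.com/$1 [R=301,L]
RewriteRule ^([^.]+\.html)(.+)$ http://example.com/$1 [R=301,L]
```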

deadsea
msg:4466131 - 1:25 am on Jun 16, 2012 (gmt 0)

The way I look at it, if Googlebot can find a link, so can users. If it's an easy redirect, do it. Folks here are talking about "denying" links by returning 404 statuses for them. I don't buy into that at all unless I hear something from Google saying they support it.

brokaddr
msg:4466513 - 7:36 pm on Jun 17, 2012 (gmt 0)

Interesting difference of opinions regarding the redirect options.

With Google as of late, I'd have to agree with tedster that forwarding links coming from spammy pages could be (mis)interpreted as association.


One would think that, with the spammy pages linking to so many 4xx URLs, it would hurt their own link juice/SEO efforts. But wouldn't linking to anything other than 200 OK pages do the same?

It would be interesting to hear from Google how they deal with these types of spam pages.

lucy24
msg:4466545 - 10:14 pm on Jun 17, 2012 (gmt 0)

if Googlebot can find a link, so can users

Not necessarily. Error Reports show all kinds of things that are obviously not links, and would never be mistaken for links by humans. For example, the one illustrated in the OP: a human would never see that. Or, conversely, the URLs with "..." in the middle. It may be what you see with your eyeballs, but it isn't what you get when you click -- no matter how often Google tries to make it into a link.

When I see something odd in logs, like a 404 ending in ".htmlWill", I know it's only a matter of time before it shows up under Crawl Errors. But a human user can look at the URL and say "Oops, where did that 'Will' come from?" and delete it on the fly.

g1smd
msg:4466546 - 10:30 pm on Jun 17, 2012 (gmt 0)

When I see something odd in logs, like a 404 ending in ".htmlWill", I know it's only a matter of time before it shows up under Crawl Errors.

I see these popping up all over the place. I don't adopt them. I don't redirect them. They are left as 404 status. I don't use 410 Gone, as they were never previously in a state to be gone from.

With 404 errors, Google tends to return several days in a row to check the status, then doesn't request the URL again for several weeks or months. At that point, I click the 'fixed' box in the WMT report and hope it doesn't return.

Only URLs that used to exist and no longer do get to return 410 Gone.
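A minimal .htaccess sketch of that policy (old-page.html is a hypothetical path used only for illustration; the [G] flag makes mod_rewrite answer 410 Gone):

```apache
RewriteEngine On
# A page that used to exist gets an explicit 410 Gone via the [G] flag.
# URLs that never existed fall through and return the server's normal 404.
RewriteRule ^old-page\.html$ - [G]
```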

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved