This 37 message thread spans 2 pages.
WMT Crawl Errors: The Next Generation of Reports
The new WMT looks pretty interesting. Hopefully it will get rid of some of those 404s from years ago:
Crawl errors is one of the most popular features in Webmaster Tools, and today we’re rolling out some very significant enhancements that will make it even more useful.
We now detect and report many new types of errors. To help make sense of the new data, we’ve split the errors into two parts: site errors and URL errors.
Couple days ago I happened to notice* the googlebot trying to crawl an obviously bogus address:
Took about a day and a half for it to show up as a Crawl Error. I am not WebmasterWorld ;)
Sigh. Googlebot, do you really and truly believe that there might exist an URL with that configuration?
:: putter, putter ::
Sigh again. Is Alexa genuinely garbage or is the site just designed to look like garbage?
* Normally I wipe authorized robots from logs at an early stage, sight unseen. Maybe it was the %20 that caught my eye.
I'm assuming you know %20 is really a space. I've apparently been getting some of those visits too: a whole bunch showed up in my WMT Crawl Errors starting 4/5/2012, increasing in frequency daily. Of course, when you check the error report, there is nothing listed for "linked from", and the base pages appear to be totally random. At first I thought it was someone linking incorrectly, accidentally putting a stray space before the ? (perhaps a programmer at Gbot :). Without the space, my server would serve up the correct page and throw away the parameter list just fine.
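For anyone following along, here is the mechanics of that %20 in a quick Python sketch (the example.com URLs are just illustrative placeholders, not the actual pages involved):

```python
from urllib.parse import quote, unquote

# %20 is the percent-encoded form of a literal space.
assert unquote("%20") == " "
assert quote(" ") == "%20"

# To a crawler, a URL that differs only by a stray encoded space
# is a different resource, so following it verbatim yields a 404.
good = "http://www.example.com/page.html?id=1"
bad = "http://www.example.com/page.html%20?id=1"
print(good == bad)  # False - these are distinct URLs
```

That is why the framed "link" with the stray space works in a browser that cleans it up server-side, but fails when Googlebot requests it verbatim.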
Wow, just found the apparent source. It is an obscure "search engine" (I think - it has very little text, and what's there is in another language). They use that exact syntax to display others' pages in a frame over top of their own search page, and I found one of our affected pages there. The "link" on their site includes the space; it's apparently processed internally by their server and displays our page correctly within the frame, but when G collects the link verbatim and tries to follow it, it fails, producing a 404 and a crawl error report. Not sure who is in the wrong here - the search engine or Google? But I sure hope they fix it soon.
There have been problems for a few months now.
1) The added %20 at the end of URLs - the %20 truly does not exist on the links the bot supposedly follows. I ended up 301ing them to the correct page.
2) The obscure search engines having listings like:
Title of Page (linked to actual http://www.example.com/page.html page, correct URI)
Text from page, like a normal search engine.
http://www.example.com/p... (text only, but not displaying the entire URI)
G bot now crawls the text URI and reports it as an error, even though the correct URI is directly above it within the text. It results in hundreds and hundreds of 404 errors from every single obscure search engine out there.
3) G bot crawling their own SERPs incorrectly. Every once in a while, they'll be crawling some RSS feed of their own search results, but end up trying to crawl http://www.example.com/page.html<web:URL_whatever_the_RESULT_BELOW_MINE
Which again, sometimes results in hundreds of 404s.
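For the trailing-%20 case in point 1, the 301 fix can be done with a small helper that computes the cleaned redirect target. This is only a sketch of the idea, not the poster's actual server rules - the function name and URLs are mine:

```python
from urllib.parse import urlsplit, urlunsplit

def redirect_target(url):
    """Return the cleaned URL to 301 to if the path ends in encoded
    (%20) or literal spaces; return None if no redirect is needed."""
    parts = urlsplit(url)
    path = parts.path
    # Strip any run of trailing encoded or literal spaces.
    while path.endswith("%20") or path.endswith(" "):
        path = path[:-3] if path.endswith("%20") else path[:-1]
    if path == parts.path:
        return None  # path was already clean
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))

print(redirect_target("http://www.example.com/page.html%20"))
# http://www.example.com/page.html
```

In practice the same rule is usually expressed as a rewrite in the web server config; the logic is identical - match a trailing encoded space and 301 to the path without it.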
Google is clearly looking at pages more as part of a book instead of on an individual basis. I just checked again and I now have 2000+ errors. 1200 of those are pages I submitted removal requests for eons ago (requests that have since expired) - they haven't existed in a LONG time - and the other 800 are bogus, all just improper or incomplete links from other sites.
SO - my site has ZERO pages missing right now yet Google thinks I have 2200+. Nice.
Is this impacting my search rankings? It had better not be.
I believe this is an attempt by a competitor to cause 404 errors on our sites and create what appears to be unnatural linking, with massive amounts of links per search engine. This may not apply to the framed pages, but it could.
The truncated URLs under search results? Check the domains linking to you the most (the ones sending hundreds or thousands of unsolicited links) and see whether those truncated URLs aren't on their sites. I've seen some of those domains expire, or the web host shut them down for terms violations, and our SERPs improved. In any case, with all the search engine startups around at any given time, you'd expect to see this regularly and Google to normalize for it somehow.
Two other things I've found:
1. The index pages of the sites are totally different from the search results pages
2. About half of the sites linking to us most are only linking to us and some of our competitors, no other sites are represented.
One last note: I didn't get an email from Google about unnatural links, but when a couple of those sites went down, our SERPs went up.
Welcome to the forums, Str82u. If some competitor is using that approach to try to hurt your rankings, they're out of luck. 404 errors on external links don't have any effect on your rankings. They do get listed in WMT, but only as a kind of FYI.
What you want to avoid, however, is internal 404 links. Too many of those is a low-quality signal.