Forum Moderators: Robert Charlton & goodroi
It may be linked to this report: Webmaster Tools Content Analysis Glitch [webmasterworld.com]
When checking "Pages Not Found" they are reported as 404s (not found) - we have over 5,000 of them.
These pages have redirects on them to valid pages.
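When WMT flags redirected pages as 404s, it is worth confirming what status the URLs actually return, since a crawler records the first response on the wire rather than the page it eventually lands on. A minimal sketch of that check, using a throwaway local server as a stand-in for a real site (the paths here are made-up examples, not anyone's actual URLs):

```python
# Sketch: see what status code a URL really returns, with and without
# following redirects. The local server is a stand-in for a live site.
import http.server
import threading
import urllib.error
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/old-page":
            # A redirect in place on the "missing" page, as described above.
            self.send_response(301)
            self.send_header("Location", "/valid-page")
            self.end_headers()
        elif self.path == "/valid-page":
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# A client that follows redirects (like a browser) ends up at 200 ...
followed = urllib.request.urlopen(base + "/old-page")
print(followed.status, followed.geturl())  # 200, .../valid-page

# ... but the first response on the wire is the 301 itself, which is
# what a crawler would record. Disable redirect handling to see it.
class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, *args, **kwargs):
        return None

opener = urllib.request.build_opener(NoRedirect)
try:
    opener.open(base + "/old-page")
except urllib.error.HTTPError as e:
    print(e.code)  # 301
```

If the redirecting pages show 301 here rather than 404, the WMT report is stale or glitched rather than reflecting the current responses.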
-I haven't had data for 2-3 weeks either, what is the deal?
-When I click the 'Cached' link for my homepage listing in the SERPs it does not return a page, and there is a strange prefix cache:dpvMXAlxq7kJ:www.site.com/
It isn't happening with all the site's pages, and the rankings remain strong. Should I be concerned about this weird caching issue and the missing Webmaster Tools search data?
Thanks,
Greg
Also, in regard to the cache issue: I have seen this on several of our sites and on many of our leading competitors, and it is datacenter specific - again, some form of update propagating through their datacenters.
The link for three weeks ago still says data unavailable. Soon that data will just be part of "July" once we get into August, and the error will not show - even if a whole week of data really is still missing.
The prefix for the cache link is normal. It's some sort of ID for the document.
You can strip off all except that prefix and the domain name, and the cache link will still work.
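That observation can be expressed as a quick string manipulation. A minimal sketch, using the token quoted earlier in the thread; the page path is a hypothetical example:

```python
# Sketch: trim a Google cache query of the form 'cache:<id>:<host>/<path>'
# down to just the document ID and the domain, per the observation above.
# The token is the one quoted in the thread; the path is made up.
from urllib.parse import urlsplit

def trim_cache_query(query):
    """Reduce 'cache:<id>:<host>/<path>' to 'cache:<id>:<host>/'."""
    prefix, doc_id, url = query.split(":", 2)
    host = urlsplit("http://" + url).netloc
    return "%s:%s:%s/" % (prefix, doc_id, host)

print(trim_cache_query("cache:dpvMXAlxq7kJ:www.site.com/some/page.html"))
# cache:dpvMXAlxq7kJ:www.site.com/
```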
The "n weeks ago" links are all gone, and all the data is now merged under the "July" link. It is impossible to tell what data might still be missing.
The change from "n weeks ago" to a named "month" link happened on the first day of the new month, not at the weekend; so when does a new "week" start, and how many are there in a month?
However, normally I also see the Links Report update one to two days after the homepage visit message changes. Last time there was no such update.
The URL was originally just a duff incoming link from some other site; the URL in the link was a typo of what it should have been. Google found the link only days after it was created, and added it to the Crawl Errors report a few days later, as the URL returned 404 and did not return content.
A few days later I set up a 301 redirect to capture any incoming traffic from that typoed link, and redirect it to the correct URL for the content.
Google continued to show the URL as a 404 error from that time until some time earlier today, even though they updated the Incoming Links report several times in the interim and updated the information about the page that contains the duff link. Google has spidered the page that contains the duff link several times, and that page contains other links that are reported in the Incoming Links reports of other pages and/or sites - so the crawls are clearly happening.
Looks like the cycle of updating the Crawl Report is a lot slower than everything else. Other WMT link reports seem to update in only three to ten days, but for the crawl error it took Google almost two months to notice that the URL status had changed from a 404 to a 301.
Or has a WMT bug just been fixed?
It's come to our attention that some URLs are listed as 404s for some sites in Webmaster Tools even though they were apparently crawled correctly. In general, even if we were not able to crawl some URLs correctly once or twice, this should not affect a site's crawling, indexing or ranking in our search engine. We're currently analyzing the situation and will give you more information as soon as we have it.
[edited by: Whitey at 10:31 am (utc) on Aug. 3, 2008]
The good news is that Googlebot no longer indicates it is finding 404s for valid HTML pages.
Site with 84 pages, fully indexed for several months, has been showing the PageRank distribution like this in WMT under Statistics > Crawl Stats:
- Roughly 95%+ Low and
- 5% Not Yet Assigned.
Now today, it shows:
- Roughly 75% Low and
- 25% Not Yet Assigned.
Hang on, "not yet assigned" says to me that 25% of the pages have never had any PageRank assigned, but as you can see, last week only about 5% didn't have any assigned.
No new pages have come online for many months, and Google found all of the pages that are online many months ago. So is the word "yet" redundant, meaning that pages can have their PageRank un-assigned, or is it a glitch in the reporting?
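For scale, on an 84-page site those two distributions work out to roughly the following page counts (rough arithmetic, rounding to whole pages):

```python
# Rough arithmetic on the figures above: with 84 indexed pages, the jump
# in "Not Yet Assigned" from ~5% to ~25% is about 4 pages vs. about 21.
pages = 84
before = round(pages * 0.05)  # last week: ~4 pages
after = round(pages * 0.25)   # today: ~21 pages
print(before, after)  # 4 21
```

So on the order of 17 pages that previously had PageRank assigned now show none, which is what makes the "yet" wording look wrong.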
I took a peek at the site using a copy of IE with the Google Toolbar loaded. The root URL shows a PR of one. About 15 to 20% of the other pages show a white bar, and about 80 to 85% show a grey bar.
Nearly 10 days of gray bars now. I am concerned. One thing I noticed is that on sites that could be having issues, the TBPR distribution appears to be responding only to pages with IBLs.
WMT shows identical distribution of TBPR to sites that have the "green" bar. Not sure if this is reliable though.
[webmasterworld.com...]
Not sure what to say as I'm confused. Yo-yo, TBPR update, WMT with glitches...