Forum Moderators: Robert Charlton & goodroi

WMT - Web crawl glitch

Whitey

12:46 am on Jul 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Is anyone seeing glitches in the web crawl section of WMT?

It may be linked to this report: Webmaster Tools Content Analysis Glitch [webmasterworld.com]

Under "Pages Not Found", our pages are reported as 404s (not found); we have over 5,000 of them.

These pages have redirects to valid pages.
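One way to confirm what such URLs actually return is to request each one once without following redirects and look at the raw status code and Location header. Below is a minimal Python sketch using only the standard library. The paths (/old-page.html, /new-page.html) and the local demo server are hypothetical stand-ins for a real site, and the check only shows what your server sends today, not what Google recorded when it crawled.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301 is reported as a 301
    rather than as the status of the page it points to.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local site: one redirected URL, one real page, everything else 404."""

    REDIRECTS = {"/old-page.html": "/new-page.html"}
    PAGES = {"/new-page.html"}

    def do_HEAD(self):
        if self.path in self.REDIRECTS:
            self.send_response(301)
            self.send_header("Location", self.REDIRECTS[self.path])
        elif self.path in self.PAGES:
            self.send_response(200)
        else:
            self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_demo_server():
    """Start DemoHandler on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_demo_server()
    base = f"http://127.0.0.1:{server.server_port}"
    for path in ("/old-page.html", "/new-page.html", "/missing.html"):
        status, location = fetch_status(base + path)
        print(path, status, location)
    server.shutdown()
```

On a real site you would point fetch_status at the URLs listed in the report. A 301 with the expected Location means the redirect is in place even if WMT still lists a 404. Some servers answer HEAD differently from GET, so switching the method to GET is worth trying if the results look odd.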

Gshaughn

2:41 pm on Jul 28, 2008 (gmt 0)

10+ Year Member



"I still have no access to the Search Queries data for the first week of July, now billed as "three weeks ago", in WMT."

- I haven't had data for 2-3 weeks either. What is the deal?
- When I click the "Cached" link for my homepage listing in the SERPs, it does not return a page, and there is a strange prefix: cache:dpvMXAlxq7kJ:www.site.com/

It isn't happening with all of the site's pages. The rankings remain strong. Should I be concerned about this weird caching issue and the missing Webmaster Tools search data?

Thanks,
Greg

drall

3:48 pm on Jul 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Greg,
Our WMT data finally updated after 3 weeks of lag, and this is on a PR7/8 site, so don't worry; it's just some type of update in their backend.

Also, regarding the cache issue, I have seen this on several of our sites and on many of our leading competitors' sites. It is datacenter-specific: again, some form of update propagating throughout their datacenters.

g1smd

7:15 pm on Jul 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



When does a new week start?

The "three weeks ago" link still shows "data unavailable". Soon that data will just be part of "July" once we get into August, and the error will not show, even if a whole week of data really is still missing.

.

The prefix for the cache link is normal. It's some sort of ID for the document.

You can strip off all except that prefix and the domain name, and the cache link will still work.

TechMan

2:28 am on Jul 29, 2008 (gmt 0)

10+ Year Member



I still don't see TOP SEARCH QUERIES related to July.

System

4:37 pm on Jul 29, 2008 (gmt 0)

redhat



The following 2 messages were split off into a new thread by tedster. New thread at: google/3710725.htm [webmasterworld.com]
7:48 pm on July 29, 2008 (EDT -4)

g1smd

1:05 am on Jul 30, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Updated message today: "We Last Visited your Home Page on July 20th". That is still a massive lag in reporting: nine days.

The keyword data missing from "three weeks ago" has now moved behind the "four weeks ago" link, and is still missing.

TechMan

6:07 am on Jul 30, 2008 (gmt 0)

10+ Year Member



My July data related to TOP SEARCH QUERIES is back!

Brett_Tabke

12:59 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Ya, all the data was updated and back yesterday morning.

Did they ever say what the problem was? (I would be curious; I have heard some rumors...)

g1smd

1:14 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



There is now no way to know what is there and what is not.

The "n weeks ago" links are all gone, and all the data is now merged under the "July" link. It is impossible to tell what data might still be missing.

The change from "n weeks ago" to a named "month" link happened on the first day of the new month, not at the weekend. So when does a new "week" start, and how many are there in a month?

g1smd

9:35 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A recent change to the WMT message: it now says "Googlebot last visited your home page on July 26th", so the nine-to-ten-day lag is now down to six days. The last time the message changed was only two days ago. Before this ten-day lag started, it used to change every three days.
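The lag figures above are simple date differences. Here is a small Python sketch of that arithmetic. The three-day "usual" refresh interval is an assumption taken from this post, not a documented Google figure, and report_lag/is_lagging are hypothetical helper names.

```python
from datetime import date

# Assumption: the usual refresh interval is the roughly three days
# described in the post above; it is not a documented Google figure.
USUAL_REFRESH_DAYS = 3


def report_lag(last_visit, observed):
    """Days between the reported homepage visit and the day the message was read."""
    return (observed - last_visit).days


def is_lagging(last_visit, observed, usual=USUAL_REFRESH_DAYS):
    """True if the reported visit is older than the usual refresh interval."""
    return report_lag(last_visit, observed) > usual


if __name__ == "__main__":
    # Dates taken from the posts in this thread.
    print(report_lag(date(2008, 7, 20), date(2008, 7, 30)))  # 10
    print(report_lag(date(2008, 7, 26), date(2008, 8, 1)))   # 6
```

A check like this only measures how stale the message is. It says nothing about why the backend is slow.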

However, normally I also see the Links Report update one to two days after the homepage visit message changes. Last time there was no such update.

icedowl

9:57 pm on Aug 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Mine still says July 24th. Hmmm...

g1smd

8:30 pm on Aug 2, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The 404 Error previously listed for many weeks in the WMT Crawl Errors report has finally disappeared today.

The URL was originally just a duff incoming link from some other site; the URL in the link was a typo of what it should have been. Google found the link only days after it was created and added it to the Crawl Errors report a few days later, as the URL returned a 404 and no content.

A few days later I set up a 301 redirect to capture any incoming traffic from that mistyped link and redirect it to the correct URL for the content.

Google continued to show the URL as a 404 Error from then until some time earlier today, even though they updated the Incoming Links report several times during that period and updated the information about the page that contains the duff link. Google has spidered that page several times: it contains other links that are reported in the Incoming Links reports of other pages and/or sites, which confirms that it has been crawled.

It looks like the Crawl Errors report updates on a much slower cycle than everything else. Other WMT link reports seem to update in only three to ten days, but for this crawl error it took Google almost two months to notice that the URL status had changed from a 404 to a 301.

Or has a WMT bug just been fixed?

Whitey

10:30 am on Aug 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



John Mu from Google says they are onto it:

It's come to our attention that some URLs are listed as 404s for some sites in Webmaster Tools even though they were apparently crawled correctly. In general, even if we were not able to crawl some URLs correctly once or twice, this should not affect a site's crawling, indexing or ranking in our search engine.

We're currently analyzing the situation and will give you more information as soon as we have it.

[edited by: Whitey at 10:31 am (utc) on Aug. 3, 2008]

g1smd

4:25 pm on Aug 3, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the update. That was mainly about pages that are 200 OK being listed as 404.

My case was about a URL that became a 301 within days of Google finding the 404, after which Google seemingly could not see the 301.

I hope they are aware of that variant too.

rocco

11:26 pm on Aug 3, 2008 (gmt 0)

10+ Year Member



I am getting a bunch of wrong backlinks in "Pages with external links".

Key_Master

12:17 am on Aug 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google is continuing to show 404s for pages that do exist on my site. An interesting twist is that these same URLs also show up under "URLs restricted by robots.txt": I blocked Googlebot from spidering WML pages some weeks ago. The crawl dates in both error reports are the same.

The good news is that Googlebot no longer reports 404s for valid HTML pages.
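To confirm which URLs a robots.txt rule actually blocks, Python's standard urllib.robotparser can evaluate the file offline. This is a sketch under stated assumptions. The /wml/ directory and the rules are hypothetical. Also, urllib.robotparser only does plain prefix matching, so it will not evaluate the * and $ wildcard patterns that Googlebot supports, such as a rule aimed at the .wml file extension.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: assumes the WML pages live under /wml/.
# urllib.robotparser matches Disallow paths as plain prefixes only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /wml/

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/wml/index.wml",
                "http://www.example.com/index.html"):
        print(url, "allowed" if rp.can_fetch("Googlebot", url) else "blocked")
```

If the same URLs then appear in both the 404 list and the robots.txt list, a check like this at least tells you which of the two statuses your own rules should produce.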

g1smd

12:48 pm on Aug 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Another glitch, or just a poor choice of words?

A site with 84 pages, fully indexed for several months, has been showing this PageRank distribution in WMT under Statistics > Crawl Stats:

- Roughly 95%+ Low and
- 5% Not Yet Assigned.

Now today, it shows:

- Roughly 75% Low and
- 25% Not Yet Assigned.

Hang on: "not yet assigned" says to me that 25% of the pages have never had any PageRank assigned, but last week only about 5% had none assigned.

No new pages have come online for many months, and Google found all of the pages that are online many months ago. So is the word "yet" redundant, meaning that pages can have their PageRank unassigned, or is it a glitch in the reporting?
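The WMT percentages are rounded, so turning them into page counts is only approximate. For an 84-page site, though, the shift works out to roughly 17 pages moving from "Low" to "Not Yet Assigned". A small Python sketch of that arithmetic follows. pages_for is a hypothetical helper, and the input percentages are the rough figures quoted above.

```python
TOTAL_PAGES = 84  # site size from the post above


def pages_for(percent, total=TOTAL_PAGES):
    """Approximate number of pages that a WMT percentage corresponds to."""
    return round(total * percent / 100)


if __name__ == "__main__":
    before = {"Low": pages_for(95), "Not Yet Assigned": pages_for(5)}
    after = {"Low": pages_for(75), "Not Yet Assigned": pages_for(25)}
    print(before)  # {'Low': 80, 'Not Yet Assigned': 4}
    print(after)   # {'Low': 63, 'Not Yet Assigned': 21}
```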

.

I took a peek at the site using a copy of IE with the Google Toolbar loaded. The root URL shows a PR of one. About 15 to 20% of the other pages show a white bar, and about 80 to 85% show a grey bar.

Whitey

11:14 am on Aug 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nearly 10 days of gray bars now. I am concerned.

One thing I noticed is that on sites that could be having issues, the TBPR distribution appears to respond only to pages with IBLs (inbound links).

WMT shows an identical TBPR distribution to sites that have the "green" bar. Not sure if this is reliable, though.

[webmasterworld.com...]

Not sure what to say, as I'm confused. Yo-yo, TBPR update, WMT with glitches...

tedster

12:31 pm on Aug 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, it has never been wise to think of Google Search as a stable thing that we can understand once and for all time. They're in a state of "perpetual beta", so new bugs are to be expected, as are new features and behaviors.