My pages are still listed as url only, so does anyone know what the deep crawls indicate, and when title/desc may appear in serps?
Thanks!
walkman: Slurp indexed them all...a day after letting the spiders in. I was shocked
Annoyed at not having proper stats ... copied the logs across ... installed AWStats. Now I'm shocked ... first report is for 4+ days (1-6 May) ... Search bots account for 30% of the hits and 23% of the bandwidth:
It has taken quite some time, but the current scale of bot-scraping is beginning to sink in.
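For anyone without AWStats to hand, a rough version of the "bots as a share of hits and bandwidth" figure can be pulled straight from the raw log. This is only a sketch: it assumes Apache combined log format, and both the function name `bot_share` and the bot pattern (Googlebot, msnbot, Slurp) are my own assumptions, not what AWStats uses.

```shell
#!/bin/sh
# bot_share LOGFILE...: hypothetical helper, not part of AWStats.
# Assumes Apache combined log format, where splitting on '"' gives:
#   $3 = " status bytes "   and   $6 = the User-Agent string.
# The bot pattern is an assumption; extend it to suit your own logs.
bot_share() {
    awk -F'"' -v pat='Googlebot|msnbot|Slurp' '
        {
            split($3, f, " ")                       # f[1] = status, f[2] = bytes
            bytes = (f[2] ~ /^[0-9]+$/) ? f[2] : 0  # "-" counts as 0 bytes
            hits++; total += bytes
            if ($6 ~ pat) { bhits++; btotal += bytes }
        }
        END {
            if (hits == 0 || total == 0) { print "no data"; exit }
            printf "bot hits: %.0f%%  bot bytes: %.0f%%\n",
                   100 * bhits / hits, 100 * btotal / total
        }' "$@"
}
```

Called as `bot_share access_log*`, it prints one summary line for all the files together. It will undercount any bot that is not in the pattern.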
On Google's front: I'm back in the cache, but still can't find anything by searching (not even the longest phrase, or www.name.com) but I'll give it a few more days. The cache was from last night so it may need a day or two to spread. All the pages linked from the homepage were indexed, so I just added a random feature to pull 10-15 random pages each time. Hopefully in a week, everything will be indexed.
Assuming the site in question is the one in your profile.
Do you also have the same content on another domain like .co.uk?
I went searching for some pages from your site and some of them actually appear in the serps but are marked supplemental (possibly duplicate content?).
It however looks like Google is slowly doing something with your pages.
You might also want to look at the various pages produced by the search script you are using, it looks like title issues.
I've got to actually hit the correct keys.
some j-rk with Wget ... (50 sitemaps) kept pulling them about 10-15 times a second ... pulled over 30 times by the time I stopped it by banning his IP. My CPU was at 99% constantly...
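Spotting that kind of hammering before the CPU pegs is mostly a matter of counting requests per IP. A minimal sketch, assuming a common/combined format access log where the client IP is the first field and the timestamp (to the second) is the fourth; `top_talkers` and `peak_rate` are made-up names for illustration.

```shell
#!/bin/sh
# top_talkers LOGFILE [N]: the N client IPs with the most requests
# (hypothetical helper; N defaults to 10).
top_talkers() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# peak_rate IP LOGFILE: highest number of requests that IP made within
# any single second. Field 4 looks like "[12/May/2005:06:55:51".
peak_rate() {
    awk -v ip="$1" '
        $1 == ip { n[$4]++ }
        END {
            for (t in n) if (n[t] > max) max = n[t]
            print max + 0
        }' "$2"
}
```

`top_talkers access_log 5` shows who to look at; `peak_rate` on the worst offender tells you whether it is a polite crawler or something pulling 10-15 pages a second.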
Assuming the site in question is the one in your profile
Do you also have the same content on another domain like .co.uk?
Same content, no. Another domain, yes. However, html-only site, no dupe issues.
look at the various pages produced by the search script you are using, it looks like title issues.
That is very useful advice, thank you (never thought of the obvious).
Now, if you can only work out why so many of the site:mysite.com searches are url-only you will be my saviour! No title issues there (erm, I think).
BTW: isn't The Bear the creature that classically gets 'a sore head'? Is that why you chose your nick?! I am getting towards your (assumed) age, but no genetic gift of arthritis for me yet. Heart attacks and a wayward imagination - yes. Arthritis - no.
I took a look at your site. It looks to me like all is well.
6. Where is my page's title?
Unlike many search engines, Googlebot can return results for pages that are known but haven't been crawled yet. Since we haven't looked at those pages yet, their titles aren't shown; the Google results page displays the URL instead.
[google.com...]
Your pages are dynamically generated. We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index.
[google.com...]
It looks to me like you have changed server location (or something) and it is just taking time to index it.
It looks to me like you have changed server location (or something) and it is just taking time to index it.
I took a look at your site. It looks to me like all is well.
Well, I may agree with you (and thanks for looking), but a site:mysite.com Google search shows just 13 of the first 100 results with title + snippet. The other 87 are url-only. It also knows of 12,700 pages but only shows 908. Visitors from G have fallen by ~90% since last Dec. So, G thinks different.
It gets worse. Bots from Abovenet Communications are crawling all over my site
About the internet archive bot.
Send them an email showing the information from your log and politely ask them to look into the matter.
Explain to them that your next step if the bot isn't controlled will be a ban of the bot, but you are willing to allow them some time to make a change on their end first.
I've had very good luck with them. They listen and act when presented with hard evidence.
Now I sent you a sticky back a bit.
If I have further information I may be able to provide more help.
But I'll need specifics that don't work well when exemplified for the board.
There is something funny but I'm not sure what it is.
When I use 'poodle predictor' on your site it seems to render some strange results.
type in w*w.yoursite.com
and it seems to have numerous links to itself.
I don't know what's causing it but this could be the root of your problems.
This could explain multiple requests for robots.txt and maybe non-listings in google.
I don't know the cause - but it sure is producing strange results.
When I use 'poodle predictor' on your site it seems to render some strange results.
I certainly agree that the Home page appears to have many links to itself (looks very strange on the `Predictor' page). Easily explained: they are internal links
<a href='#link' title='A link on this page'>
duh - i should have known
In the meantime, virtually all of the tens of thousands of pages on my site are URL-only.
Is your sticky mail still down?
I have received one email notification (from Walkman's sticky) that I *had* a sticky (if you are reading this, Walkman, I sent a sticky back to you) but no sticky in the box. In fact, no mails of any kind in any box :(. Apart from that, nothing.
Send me one to test. I *really* would like to know.
[Thinks: best to test myself also, so I will try to send one to *you*, Mr Bear.]
So, I don't know what to think. I see Gbot much more frequently and much deeper than I ever did before the reincl request. But weeks and weeks go by and no changes to the stale serps.
C
And is it possible to change the 301 if it doesn't work? Something must happen now; it cannot be that we have to create similar sites just because Google cannot fix this.
I was hijacked once, where another domain.com came up when I searched for www.mydomain.com in Google, and the worst thing is that the site had a meta tag that said noindex. It all happened on 3 Nov.: from 38,000 unique visits to 0 from Google.
I see in the logs a hit from google with the old keyword and placement
I see this happening with my site sometimes too. So, I go to my raw stats log and extract the exact search url which shows the serps page on which the search phrase was found. I then go to that results page for the search query and see a 302. The traffic coming to my site from Google is through these 302s.
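The "extract the exact search url" step can be done with one awk call on the referrer field. A sketch only, assuming Apache combined log format (referrer is the fourth field when the line is split on double quotes); `google_referrers` is a name I made up, and the host pattern is an assumption that covers google.com and the country domains.

```shell
#!/bin/sh
# google_referrers LOGFILE...: list the Google search URLs that sent
# visitors, most frequent first (hypothetical helper).
# Splitting a combined-format line on '"' puts the referrer in $4.
google_referrers() {
    awk -F'"' '$4 ~ /^https?:\/\/(www\.)?google\./ { print $4 }' "$@" |
        sort | uniq -c | sort -rn
}
```

Each output line is a count followed by the full referring URL, including the query and any `start=` offset, so you can open that exact results page and check by hand whether the listing that sent the visitor is your own URL or a 302 redirect.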
Zeus, have you checked this out, to see if the visitors are entering your site via a redirect/302 url ranking on that phrase where your canonical url used to be?
C
If a site is replaced in the serps by another site, that's hijacking; then it's removed, plus the google bug sites.
Then logic tells me I would maybe not be in the serps for 1, max 2, months because of possible duplicated content which was spidered through the other googlebug (302) sites and hijacking sites, but nothing is happening after the fix, so what's wrong here? With time I really think all this mess is still related to all those "sites" added mid last year; with that we saw a lot of mess, and webmasterworld was full of weird questions and observations, which is still so, as are all the fake sites indexed.
I started this thread ... site suddenly started getting deep crawled ... only index page and ONE of my internal pages have returned with full title/desc. All others url only.
fgrep -c 'Googlebot' access_log*
and
ls -al access_log*:
May 15 04:03 access_log.1 : 600
May 8 04:03 access_log.2 : 3143
May 1 04:03 access_log.3 : 568
Apr 24 04:03 access_log.4 : 713
Comparative figures for the MSN-Bot are astonishing:
May 15 04:03 access_log.1 : 5324
May 8 04:03 access_log.2 : 6154
May 1 04:03 access_log.3 : 5490
Apr 24 04:03 access_log.4 : 6230
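The two tables above come from running the count once per bot and lining the results up by hand. A small loop does both bots in one pass per file. This is a sketch under assumptions: `bot_counts` is a hypothetical name, and matching MSN's crawler on the string "msnbot" (case-insensitive) is my guess at its user-agent, so check it against your own logs.

```shell
#!/bin/sh
# bot_counts LOGFILE...: print, per log file, how many lines mention
# Googlebot and how many mention msnbot (hypothetical helper).
# grep -c exits non-zero when the count is 0, hence the "|| true".
bot_counts() {
    for log in "$@"; do
        g=$(grep -c 'Googlebot' "$log" || true)
        m=$(grep -ci 'msnbot' "$log" || true)
        printf '%s\tGooglebot=%s\tmsnbot=%s\n' "$log" "$g" "$m"
    done
}
```

Run as `bot_counts access_log.*`. Note that it counts matching log lines, the same as `fgrep -c`, so a page plus its images counts once per request.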
You know google does have half-indexed pages: "we know the page exists but we haven't crawled it yet" is what url only means.
http://www.mysite.com/mfcs.php?nocompress=1&mid=30
egrep "GET /mfcs\.php\?(.*)&mid=30(.*)Googlebot" access_log*
66.249.64.37 - - [12/May/2005:06:55:51 +0100] "GET /mfcs.php?nocompress=1&mid=30&nid=3770 HTTP/1.0" 200 32175 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.66.137 - - [03/May/2005:20:05:36 +0100] "GET /mfcs.php?nocompress=1&mid=30 HTTP/1.1" 200 53900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.28 - - [28/Apr/2005:07:16:22 +0100] "GET /mfcs.php?nocompress=1&mid=30&nid=3770 HTTP/1.0" 200 32342 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
66.249.71.32 - - [17/Apr/2005:07:49:38 +0100] "GET /mfcs.php?nocompress=1&mid=30&nid=3770 HTTP/1.0" 200 32110 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"
"we know the page exists but we haven't crawled it yet" is what url only means.
17 of the first 20 SERPs for a site:mysite.com are url-only. Using a variation of the following I found that:
The following is from just 4+ weeks of logs.
egrep "GET /mfcs.php\?nocompress=1&mid=30 (.*)Googlebot" access_log*
66.249.66.137 - - [03/May/2005:20:05:36 +0100] "GET /mfcs.php?nocompress=1&mid=30 HTTP/1.1" 200 53900 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
URL-only:
01#1: 66.249.66.137 - - [03/May/2005:20:05:36 +0100] HTTP/1.1 200 53900 "-" "Mozilla/5.0
02#1: 66.249.66.137 - - [03/May/2005:09:06:14 +0100] HTTP/1.1 200 7021 "-" "Mozilla/5.0
03#1: 66.249.66.137 - - [03/May/2005:21:34:04 +0100] HTTP/1.1 200 6929 "-" "Mozilla/5.0
04#1: [no hit]
05#1: 66.249.66.137 - - [03/May/2005:13:43:55 +0100] HTTP/1.1 200 7408 "-" "Mozilla/5.0
.
06#1: 66.249.64.79 - - [26/Apr/2005:13:10:23 +0100] HTTP/1.0 200 32584 "-" "Googlebot/2.1
06#2: 66.249.65.80 - - [29/Apr/2005:17:41:23 +0100] HTTP/1.1 200 8188 "-" "Mozilla/5.0
.
08#1: 66.249.66.137 - - [03/May/2005:22:55:22 +0100] HTTP/1.1 200 6226 "-" "Mozilla/5.0
09#1: 66.249.66.137 - - [03/May/2005:23:42:48 +0100] HTTP/1.1 200 7750 "-" "Mozilla/5.0
10#1: [no hit]
11#1: 66.249.66.137 - - [03/May/2005:13:40:59 +0100] HTTP/1.1 200 7644 "-" "Mozilla/5.0
12#1: 66.249.66.137 - - [03/May/2005:22:59:54 +0100] HTTP/1.1 200 7678 "-" "Mozilla/5.0
.
14#1: 66.249.66.73 - - [13/May/2005:04:11:40 +0100] HTTP/1.1 200 6458 "-" "Mozilla/5.0
14#2: 66.249.66.137 - - [04/May/2005:01:59:26 +0100] HTTP/1.1 200 6594 "-" "Mozilla/5.0
.
15#1: 66.249.66.137 - - [04/May/2005:04:01:49 +0100] HTTP/1.1 200 10504 "-" "Mozilla/5.0
16#1: 66.249.66.137 - - [03/May/2005:19:41:11 +0100] HTTP/1.1 200 6964 "-" "Mozilla/5.0
18#1: 66.249.66.137 - - [03/May/2005:19:41:11 +0100] HTTP/1.1 200 6964 "-" "Mozilla/5.0
19#1: 66.249.66.137 - - [03/May/2005:20:25:51 +0100] HTTP/1.1 200 6931 "-" "Mozilla/5.0
20#1: 66.249.66.137 - - [03/May/2005:21:39:45 +0100] HTTP/1.1 200 7764 "-" "Mozilla/5.0
.
Title + snippet:
07#1: 66.249.71.69 - - [15/May/2005:07:27:16 +0100] HTTP/1.0 200 43804 "-" "Googlebot/2.1
07#2: 66.249.71.73 - - [30/Apr/2005:23:03:22 +0100] HTTP/1.0 200 43630 "-" "Googlebot/2.1
07#3: 66.249.64.30 - - [20/Apr/2005:01:19:02 +0100] HTTP/1.0 200 43625 "-" "Googlebot/2.1
.
13#1: 66.249.71.28 - - [15/May/2005:00:14:51 +0100] HTTP/1.0 200 42648 "-" "Googlebot/2.1
13#2: 66.249.71.72 - - [30/Apr/2005:21:33:47 +0100] HTTP/1.0 200 44252 "-" "Googlebot/2.1
13#3: 66.249.64.66 - - [20/Apr/2005:00:01:04 +0100] HTTP/1.0 200 41401 "-" "Googlebot/2.1
.
17#1: 66.249.64.33 - - [15/May/2005:02:41:22 +0100] HTTP/1.0 200 27565 "-" "Googlebot/2.1
17#2: 66.249.71.18 - - [30/Apr/2005:21:53:21 +0100] HTTP/1.0 200 27547 "-" "Googlebot/2.1
17#3: 66.249.71.29 - - [20/Apr/2005:01:08:09 +0100] HTTP/1.0 200 27493 "-" "Googlebot/2.1
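The listing above was put together by grepping one URL at a time and noting which Googlebot user-agent made each fetch. A sketch that tallies the same thing for every URL in one go, assuming Apache combined log format; `googlebot_variants` is a hypothetical name, and the two labels simply mirror the two user-agent strings seen in the log lines quoted in this thread.

```shell
#!/bin/sh
# googlebot_variants LOGFILE...: for each URL fetched by Googlebot, count
# fetches separately for the two user-agent strings seen in the logs:
#   "Googlebot/2.1 (+http://www.google.com/bot.html)"
#   "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
# Output columns (tab-separated): URL, variant, number of fetches.
googlebot_variants() {
    awk -F'"' '
        $6 ~ /Googlebot/ {
            split($2, req, " ")      # req[1]=method, req[2]=URL, req[3]=protocol
            kind = ($6 ~ /^Mozilla\/5\.0/) ? "Mozilla/5.0" : "Googlebot/2.1"
            count[req[2] "\t" kind]++
        }
        END { for (k in count) printf "%s\t%d\n", k, count[k] }' "$@" | sort
}
```

Comparing its output against which pages show title + snippet in a site: search is still a manual step; the script only does the log side of the comparison and says nothing about why the two variants might be treated differently.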