
Google News Archive Forum

    
Pages Crawled But Not Indexed
mightymid

10+ Year Member



 
Msg#: 27493 posted 9:04 pm on Jan 12, 2005 (gmt 0)

We recently restructured our site. The domain name stayed the same as did the content. However, all the page URLs were changed, and consequently so were all the internal links.

We launched the restructured site over four months ago. Although Googlebot seems to visit often, Google hasn't indexed many of our "new" pages; most of our listings in the SERPs still point to the old URLs.

Four months seems especially long. Has anyone else experienced such a long lag with Google?

 

diamondgrl

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27493 posted 10:22 pm on Jan 12, 2005 (gmt 0)

Have you set up 301 redirects from the old pages to the new ones? If not, that's your problem. Google isn't indexing the new pages because it sees them as duplicates of the old ones it already has. Definitely redirect ASAP.
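A 301 is just a status line plus a Location header that the server sends when someone requests the old URL. As a minimal sketch, assuming a Python WSGI setup (the thread does not say what anyone's server runs) and a hypothetical OLD_TO_NEW table of example paths, it could look like this:

```python
# Hypothetical table: old URL path -> new URL path. Replace with your own.
OLD_TO_NEW = {
    "/old-page.html": "/new/page.html",
    "/products.html": "/catalog/index.html",
}


def redirect_app(environ, start_response):
    """WSGI app that answers old URLs with 301 Moved Permanently."""
    path = environ.get("PATH_INFO", "/")
    target = OLD_TO_NEW.get(path)
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not found"]
    # RFC 2616 (HTTP/1.1 as specified in 2005) required an absolute URI in Location.
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
    location = f"{scheme}://{host}{target}"
    start_response(
        "301 Moved Permanently",
        [("Location", location), ("Content-Type", "text/plain")],
    )
    return [f"Moved permanently to {location}".encode("utf-8")]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()` will serve it. The part that matters to a crawler is the 301 status itself: a 302 signals a temporary move, so the old URL would be expected to stay.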

DerekH

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27493 posted 11:04 pm on Jan 12, 2005 (gmt 0)

If, as I once was, you're on an ISP where you can't do 301s, add <meta name="robots" content="noindex, follow"> to all your old pages so that when Google goes back, it drops them.
And make sure the old pages link to the new pages, rather than just linking amongst themselves, or the new pages won't have enough ways in for Google to call by very often.
I did this quite successfully (lost no pages along the way), but it took quite a while for the changeover to complete!
DerekH
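DerekH's workaround only helps if every old page really carries the directive. The sketch below uses Python's standard html.parser to confirm that a page says noindex without also saying nofollow; the class and function names are illustrative, not from the thread. Since follow is the default, noindex on its own behaves like "noindex, follow".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("name", "").lower() != "robots":
            return
        for part in values.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of lower-cased robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_noindex_follow(html):
    """True if the page asks not to be indexed but still lets links be followed."""
    directives = robots_directives(html)
    return "noindex" in directives and "nofollow" not in directives
```

Running is_noindex_follow over each old page's HTML before relying on the changeover would flag any page that was missed or tagged nofollow by mistake.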

coolsaint

5+ Year Member



 
Msg#: 27493 posted 4:18 am on Jan 13, 2005 (gmt 0)

Dear Friends,

I'm facing the same problem:

Googlebot is crawling my site but not indexing it.

Last week I made a lot of changes: I implemented mod_rewrite on my PHP pages,
so index.php has become index.html, module.php has become modules.html, and so on.
In most cases the page names have stayed the same. I have also added dynamic page titles, which means a lot of changes across the whole site.

After the changes I watched Google deep-crawl my website, and I was very happy to see that it had indexed my new pages.
But that lasted only one day. After that, Google dropped all my new pages from its index and put all my old links back in the search results.

I can see Google crawling my website more than 10 times a day, fetching a large number of pages and consuming a lot of my bandwidth, but it is not indexing the new pages at all. What should I do?

I don't have the old pages anymore, so it isn't possible to put noarchive in their meta tags.
And I don't know much about 301 redirects. Can I use them with my PHP pages? If so, how?
I have become frustrated with this situation.
What should I do?

cooolsaint
www.aanchol.com
coolsaint@gmail.com
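On the PHP question: the old files don't need to exist for a redirect to work, because the server can answer the old URL directly. In PHP this is usually done by sending a 301 status line and a Location header with header() before any output; a Location header on its own makes PHP send a 302 instead. The rename rule described above (index.php became index.html, module.php became modules.html) can be sketched as a mapping function. This is Python rather than PHP, the RENAMED table is a hypothetical stand-in for the site's real exceptions, and query strings on old dynamic URLs are not handled.

```python
import posixpath

# Hypothetical exceptions where the new file name differs from the old one.
RENAMED = {"module": "modules"}


def new_url_for(old_path):
    """Map an old /dir/name.php path to /dir/name.html, or None if it is not .php."""
    directory, filename = posixpath.split(old_path)
    stem, ext = posixpath.splitext(filename)
    if ext.lower() != ".php":
        return None
    stem = RENAMED.get(stem, stem)
    return posixpath.join(directory, stem + ".html")


def redirect_for(old_path):
    """Return (status line, headers) the server should send for old_path."""
    target = new_url_for(old_path)
    if target is None:
        return "404 Not Found", []
    # A real server would normally make the Location value an absolute URL.
    return "301 Moved Permanently", [("Location", target)]
```

A rule like this covers every old URL at once, which is easier to keep correct than listing hundreds of pages by hand.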

mightymid

10+ Year Member



 
Msg#: 27493 posted 2:48 pm on Jan 13, 2005 (gmt 0)

Yep, I've got a 301 redirect in place. It redirects the pages just fine, but I wonder if the response.status isn't getting conveyed to Googlebot. Would the log files shed any light on that?

All the old pages are gone, so the noindex, nofollow idea wouldn't work here, although it's a good one and I'll keep it in mind for future projects. (Thanks, DerekH.)

I'm still at a loss. We have the redirect script, we have a site map, we have lots of interlinking amongst the pages ...
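On whether the logs would help: they should. In Apache's common and combined log formats, the three-digit status code follows the quoted request line, so you can see exactly what Googlebot was sent for each old URL. A 301 is what you want; a 302 (temporary) or a 200 (no redirect at all) would explain the lag. A sketch, assuming the combined log format shown later in this thread; the function name is illustrative:

```python
import re
from collections import Counter

# Apache combined log: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_statuses(lines):
    """Count (path, status) pairs for requests whose user agent contains 'Googlebot'."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None or "Googlebot" not in m.group("agent"):
            continue  # unparseable line, or not Googlebot
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else "-"
        counts[(path, m.group("status"))] += 1
    return counts
```

Passing it an open log file (for example a hypothetical access_log) and scanning for old URLs paired with anything other than "301" shows at a glance whether the redirect script is really sending a permanent redirect.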

coolsaint

5+ Year Member



 
Msg#: 27493 posted 4:25 am on Jan 14, 2005 (gmt 0)

This is what I tracked today:

Online? User IP Address Host Name Last Viewed Hits
saint 203.91.159.171 203.91.159.171 2005-01-13 20:18:52 9
Yes 202.156.2.130 cache130.156ce.maxonline.com.sg 2005-01-13 20:14:41 2
Yes 203.91.159.171 203.91.159.171 2005-01-13 20:05:56 5
Yes 81.153.137.0 host81-153-137-0.range81-153.btcentralplus.com 2005-01-13 20:00:50 1
Yes 24.148.234.108 user-0c99qjc.cable.mindspring.com 2005-01-13 19:56:47 3
66.228.164.141 rtools3.yst.corp.yahoo.com 2005-01-13 19:19:06 1
66.249.64.79 crawl-66-249-64-79.googlebot.com 2005-01-13 19:18:42 1
66.196.91.133 lj1353.inktomisearch.com 2005-01-13 19:14:12 1
69.225.193.118 adsl-69-225-193-118.dsl.scrm01.pacbell.net 2005-01-13 19:10:10 2
211.29.175.138 d211-29-175-138.dsl.nsw.optusnet.com.au 2005-01-13 19:06:13 2
66.196.90.36 lj1020.inktomisearch.com 2005-01-13 18:46:20 1
66.249.71.72 crawl-66-249-71-72.googlebot.com 2005-01-13 18:40:07 1
66.249.66.75 crawl-66-249-66-75.googlebot.com 2005-01-13 18:36:14 17
64.231.34.119 HSE-Toronto-ppp295725.sympatico.ca 2005-01-13 18:11:07 2
66.249.71.40 crawl-66-249-71-40.googlebot.com 2005-01-13 18:10:00 1
66.249.71.32 crawl-66-249-71-32.googlebot.com 2005-01-13 18:09:47 1
198.53.226.252 d198-53-226-252.abhsia.telus.net 2005-01-13 17:58:31 2
66.249.64.37 crawl-66-249-64-37.googlebot.com 2005-01-13 17:50:41 2
196.40.5.138 196.40.5.138 2005-01-13 17:44:39 17
24.17.93.170 c-24-17-93-170.client.comcast.net 2005-01-13 16:52:42 1
66.249.71.73 crawl-66-249-71-73.googlebot.com 2005-01-13 16:48:52 2
83.227.79.20 c-144fe353.545-1-64736c10.cust.bredbandsbolaget.se 2005-01-13 16:48:38 1
210.86.223.133 ppp-210.86.223.133.revip.asianet.co.th 2005-01-13 16:36:49 1
62.252.64.16 lutn-cache-5.server.ntli.net 2005-01-13 16:35:57 1
66.249.71.28 crawl-66-249-71-28.googlebot.com 2005-01-13 16:34:30 1
24.57.198.124 d57-198-124.home.cgocable.net 2005-01-13 16:16:16 1
136.165.83.194 dorm83194.dorm-net.louisville.edu 2005-01-13 16:10:42 1
139.184.36.15 cs214310.pws.uscs.susx.ac.uk 2005-01-13 15:54:46 1
66.249.64.66 crawl-66-249-64-66.googlebot.com 2005-01-13 15:10:35 1
211.30.34.71 c211-30-34-71.rivrw6.nsw.optusnet.com.au 2005-01-13 15:08:07 2
213.84.221.11 mfb.xs4all.nl 2005-01-13 14:55:33 4
213.118.82.44 dD576522C.access.telenet.be 2005-01-13 14:47:13 2
62.234.133.74 nperspectief01.nieuw-perspectief.nl 2005-01-13 14:41:00 1
213.190.147.194 213.190.147.194 2005-01-13 14:39:12 10
81.10.40.176 host-81.10.40.176.tedata.net 2005-01-13 14:38:51 1
68.142.249.76 lj2066.inktomisearch.com 2005-01-13 14:34:01 1

Google has visited more than once and has crawled my pages, but it is not indexing them. I removed all the .php files of my previous website using Google's URL removal tool, and that worked. But why is it not indexing my new pages now?

Am I penalized for duplicate content? If I'm penalized for duplicate content, why is the bot still visiting me?

Any answer or suggestion would be great. I can't see any solution so far.

Fstorm

10+ Year Member



 
Msg#: 27493 posted 5:11 am on Jan 14, 2005 (gmt 0)

I don't understand at all.
My site is showing 2-month-old results marked "Supplemental Result". Googlebot visited my site 6 days ago, but there's still no update. The cache date shows 1969. It's been 2 months since it stopped updating, and I think it's serious now.

Can anyone please help me out?

Thanks
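A date of 1969 is the classic symptom of a zero or missing Unix timestamp (seconds counted from 1 January 1970, 00:00 UTC) being displayed in a timezone west of UTC. That this is what happened with the cache date is an assumption, not something the thread confirms, but the arithmetic is easy to check:

```python
from datetime import datetime, timedelta, timezone


def render_epoch_seconds(seconds, utc_offset_hours):
    """Render a Unix timestamp in a fixed-offset timezone as ISO 8601."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(seconds, tz).isoformat()


# A zero (missing) timestamp shown in US Pacific time (UTC-8) lands in 1969.
print(render_epoch_seconds(0, -8))  # 1969-12-31T16:00:00-08:00
print(render_epoch_seconds(0, 0))   # 1970-01-01T00:00:00+00:00
```

So a "1969" cache date most likely means "no date recorded" rather than a page that was actually cached in 1969.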

coolsaint

5+ Year Member



 
Msg#: 27493 posted 11:22 pm on Jan 14, 2005 (gmt 0)

I can see lots of visits from Google today as well.

But when I checked what it crawled, I found it had only visited my web root, i.e. the / directory, many times over.

Here are some lines from today's raw log file:

66.249.66.200 - - [14/Jan/2005:00:25:21 -0800] "GET / HTTP/1.1" 200 10884 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:23 -0800] "GET / HTTP/1.1" 200 10886 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:23 -0800] "GET / HTTP/1.1" 200 53938 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:24 -0800] "GET / HTTP/1.1" 200 10886 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:25 -0800] "GET / HTTP/1.1" 200 51078 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:26 -0800] "GET / HTTP/1.1" 200 10883 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:27 -0800] "GET / HTTP/1.1" 200 53938 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:29 -0800] "GET / HTTP/1.1" 200 10885 "-" "Mediapartners-Google/2.1"
66.249.66.200 - - [14/Jan/2005:00:25:30 -0800] "GET / HTTP/1.1" 200 51078 "-" "Mediapartners-Google/2.1"

I think GET / means the web root directory. But it has requested it 331 times today. What's the problem? Has it entered a loop?
I'd appreciate a reply from the search engine gurus. Please help.
Thanks in advance.

[edited by: coolsaint at 11:32 pm (utc) on Jan. 14, 2005]
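Tallying the raw log by user agent answers the "loop" question: in the lines above, every GET / comes from Mediapartners-Google/2.1, not from Googlebot. A sketch, assuming Apache's combined log format (which the quoted lines match); the function and variable names are illustrative:

```python
import re
from collections import Counter

# Apache combined log: host ident user [time] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def tally_by_agent(lines):
    """Count requests per (user agent, path); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            continue
        parts = m.group("request").split()
        path = parts[1] if len(parts) >= 2 else "-"
        counts[(m.group("agent"), path)] += 1
    return counts


# Two of the log lines quoted above.
sample = [
    '66.249.66.200 - - [14/Jan/2005:00:25:21 -0800] "GET / HTTP/1.1" 200 10884 "-" "Mediapartners-Google/2.1"',
    '66.249.66.200 - - [14/Jan/2005:00:25:23 -0800] "GET / HTTP/1.1" 200 53938 "-" "Mediapartners-Google/2.1"',
]
for (agent, path), n in tally_by_agent(sample).most_common():
    print(f"{n:5d}  {agent}  {path}")
```

Run over a whole day's log, the same tally separates Googlebot's crawl from the AdSense bot's fetches, so repeated hits on / can be attributed to the right crawler.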

Fstorm

10+ Year Member



 
Msg#: 27493 posted 11:29 pm on Jan 14, 2005 (gmt 0)

Mediapartners-Google/2.1 is not Googlebot; it's the AdSense bot.
BTW, Googlebot is mad! It hasn't updated in 2 months.

coolsaint

5+ Year Member



 
Msg#: 27493 posted 12:20 am on Jan 15, 2005 (gmt 0)

I don't think you're right, because I think it is Googlebot.

hostname:
crawl-66-249-66-200.googlebot.com
IP:
66.249.66.200

You can see Google crawler information here:

[google-dance-tool.com...]

My website's listing in Google's index has changed more than 5 times since last month. So when you say Google hasn't updated its index for 2 months, I think that's wrong information.

Can anyone help on this case please?

Powdork

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 27493 posted 12:47 am on Jan 15, 2005 (gmt 0)

If the user agent is Mediapartners-Google/2.1 then that is the adsense bot. No question about it.
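The hostname and the user agent answer different questions. A reverse DNS lookup, confirmed by a forward lookup of the returned name, shows that a request came from Google's network; that is the check Google describes for verifying its crawlers. But both Googlebot and the AdSense bot come from googlebot.com hosts (66.249.66.200 above is one), so only the user agent tells you which crawler it was. A sketch with Python's socket module; the resolver parameters exist only so the lookups can be swapped out for testing:

```python
import socket

GOOGLE_DOMAINS = (".googlebot.com", ".google.com")


def is_google_crawler_host(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-resolve ip, check the domain, then forward-resolve to confirm."""
    try:
        hostname = reverse(ip)[0].lower().rstrip(".")
    except OSError:  # socket.herror / gaierror: no PTR record
        return False
    if not hostname.endswith(GOOGLE_DOMAINS):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # A forged PTR record would not resolve back to the same address.
    return ip in addresses


def crawler_kind(user_agent):
    """Classify a Google request by its user-agent string."""
    if "Mediapartners-Google" in user_agent:
        return "adsense"
    if "Googlebot" in user_agent:
        return "googlebot"
    return "other"
```

With live DNS, is_google_crawler_host("66.249.66.200") answers "is this Google?", while crawler_kind("Mediapartners-Google/2.1") answers "which bot?", and for the log lines in this thread the second answer is "adsense".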

layer8

10+ Year Member



 
Msg#: 27493 posted 4:47 pm on Jan 15, 2005 (gmt 0)

We lost serps and cache even though we were being crawled daily.

When I last checked, the index seemed to be working OK again, but the SERPs are still changing. Have you seen any changes in the last 24 hours?

Fstorm

10+ Year Member



 
Msg#: 27493 posted 2:17 am on Jan 16, 2005 (gmt 0)

In my case, I had 500+ hits this month, on 8 Jan. But Google still shows the same old content in its cache/index, dated 1969. I have PR 4.

inbound

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 27493 posted 2:35 am on Jan 16, 2005 (gmt 0)

Mediapartners-Google/2.1 info

To clear this up: the above bot exists only to let AdSense know what a page contains, so that relevant ads can be shown. Usually a page is already in the main Google index and its content is known; if not, a copy of the page is requested by this separate bot.

This bot DOES NOT pass the details to the main Google database; it is for AdSense only. The usual pattern is that the AdSense bot can retrieve the page almost immediately, as the strain on it is nowhere near as bad as the crawls that Googlebot does.

I can't explain the multiple requests for the same page. Could it be that something is different, such as many domains pointing to the same content (bad), or that your logs aren't recording query strings or session IDs?

Interestingly, it is possible to get AdSense to show which ads would be served for any page you wish (i.e. find out what Google thinks the page is really about, which is sometimes a surprise). Just look up and install the AdSense preview tool (IE); it will show the ads associated with the words it thinks the site is about.

Nothing too exciting, and definitely not a way to jump any queues.

nzmatt

10+ Year Member



 
Msg#: 27493 posted 2:44 am on Jan 16, 2005 (gmt 0)

I changed 100s of dynamic urls/pages because of duplicate content worries...(anything to get some google traffic!)

Didn't do any good because although 2 months later most of them are now correctly indexed, we are still sandboxed to hell...


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved