Have you done 301 redirects from the old pages to the new? If not, that's your problem. Google doesn't index the new pages because they are duplicates. Definitely redirect ASAP.
If, as I once did, you have a site on an ISP where you can't do 301s, add a META ROBOTS tag with content "NOINDEX, FOLLOW" to all your old pages so that when Google comes back, it drops them.
And make sure the old pages link to the new pages, and are not simply interlinked amongst themselves, or the new pages won't have enough ways in for Google to call by very often.
I did this quite successfully - lost no pages on the way - but it took quite a time for the changeover to complete!
I'm facing the same problem:
Googlebot is crawling my site but not indexing it.
In the previous week I made a lot of changes; for example, I implemented mod_rewrite for my PHP pages,
so index.php has become index.html, module.php has become modules.html, and so on.
The names of the pages have remained the same in most cases. I have also implemented dynamic title modification. That means lots of changes across the whole site.
After the modification I watched Google's deep crawl of my website, and I was very happy to see that it had indexed my new, modified pages.
But that lasted only 1 day. After 1 day Google dropped all my new pages from its index and put all my previous links back in the search results.
I can see Google crawling my website more than 10 times each day, fetching a large number of pages and consuming a great amount of my bandwidth, but it is not indexing the new pages at all.
I don't have the previous pages anymore, so it is not possible to put noindex in my meta tags.
And I don't know about 301 redirects. Can I use them with my PHP pages? If I can, then how?
I have become frustrated with this situation.
What should I do?
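For anyone asking how a 301 works in practice, here is a minimal sketch of the idea, written in Python rather than PHP just to keep it short and testable. The names new_path_for, OVERRIDES and app are made up for illustration, and the .php to .html mapping is only assumed from the renames described above. In PHP the same effect is normally achieved by sending the status and Location header with header() before any page output, and since mod_rewrite is already in use, a rewrite rule flagged as a permanent redirect is another option.

```python
# Hypothetical sketch: answer requests for old .php URLs with a 301
# that points at the renamed .html page. Query strings are ignored here.

# Renames that do not follow the simple ".php -> .html" pattern
# (module.php became modules.html in the post above).
OVERRIDES = {"/module.php": "/modules.html"}


def new_path_for(old_path):
    """Return the new path for an old URL, or None if it is not an old URL."""
    if old_path in OVERRIDES:
        return OVERRIDES[old_path]
    if old_path.endswith(".php"):
        return old_path[: -len(".php")] + ".html"
    return None


def app(environ, start_response):
    """Minimal WSGI application: 301 for old URLs, 200 for everything else."""
    target = new_path_for(environ.get("PATH_INFO", ""))
    if target is not None:
        # "301 Moved Permanently" is what tells a crawler the move is permanent.
        # Location is a path here; a full URL may be preferable on a real site.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"new page\n"]
```

The important part is the status line: a redirect that sends the visitor to the right page but reports some other status than 301 looks fine in a browser and still does not tell Google that the old URL has moved for good.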
Yep, I've got a 301 redirect in place. It redirects the pages just fine but I wonder if the response.status isn't getting conveyed to Googlebot. Would the log files shed any light on that?
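The log files should show it, since each line records the status code that was actually sent. A rough sketch of how to check, assuming the logs are in the usual combined format (the same layout as the raw log lines quoted later in this thread). The function name status_counts and the three sample lines are invented for illustration and are not real log data.

```python
import re
from collections import Counter

# Combined log format: ip, two unused fields, [time], "request", status, size,
# "referer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def status_counts(lines, agent_substring="Googlebot"):
    """Count (path, status) pairs for requests whose user agent matches."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and agent_substring in match.group("agent"):
            counts[(match.group("path"), match.group("status"))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [13/Jan/2005:19:18:42 -0800] "GET /index.php HTTP/1.1" 301 0 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [13/Jan/2005:19:18:43 -0800] "GET /index.html HTTP/1.1" 200 10884 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.20 - - [13/Jan/2005:19:20:00 -0800] "GET /index.php HTTP/1.1" 301 0 "-" "Mozilla/4.0"',
]

for (path, status), hits in sorted(status_counts(SAMPLE).items()):
    print(path, status, hits)
```

If the old URLs show up with 302 or 200 next to Googlebot's requests instead of 301, the permanent status is not reaching the bot even though browsers end up on the right page.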
All the old pages are gone, so the noindex, follow idea wouldn't work here... although that's a good one and I'll keep it in mind for future projects. (Thanks, DerekH.)
I'm still at a loss. We have the redirect script, we have a site map, we have lots of interlinking amongst the pages ...
This is what I tracked today!
Online? User IP Address Host Name Last Viewed Hits
saint 188.8.131.52 184.108.40.206 2005-01-13 20:18:52 9
Yes 220.127.116.11 cache130.156ce.maxonline.com.sg 2005-01-13 20:14:41 2
Yes 18.104.22.168 22.214.171.124 2005-01-13 20:05:56 5
Yes 126.96.36.199 host81-153-137-0.range81-153.btcentralplus.com 2005-01-13 20:00:50 1
Yes 188.8.131.52 user-0c99qjc.cable.mindspring.com 2005-01-13 19:56:47 3
184.108.40.206 rtools3.yst.corp.yahoo.com 2005-01-13 19:19:06 1
220.127.116.11 crawl-66-249-64-79.googlebot.com 2005-01-13 19:18:42 1
18.104.22.168 lj1353.inktomisearch.com 2005-01-13 19:14:12 1
22.214.171.124 adsl-69-225-193-118.dsl.scrm01.pacbell.net 2005-01-13 19:10:10 2
126.96.36.199 d211-29-175-138.dsl.nsw.optusnet.com.au 2005-01-13 19:06:13 2
188.8.131.52 lj1020.inktomisearch.com 2005-01-13 18:46:20 1
184.108.40.206 crawl-66-249-71-72.googlebot.com 2005-01-13 18:40:07 1
220.127.116.11 crawl-66-249-66-75.googlebot.com 2005-01-13 18:36:14 17
18.104.22.168 HSE-Toronto-ppp295725.sympatico.ca 2005-01-13 18:11:07 2
22.214.171.124 crawl-66-249-71-40.googlebot.com 2005-01-13 18:10:00 1
126.96.36.199 crawl-66-249-71-32.googlebot.com 2005-01-13 18:09:47 1
188.8.131.52 d198-53-226-252.abhsia.telus.net 2005-01-13 17:58:31 2
184.108.40.206 crawl-66-249-64-37.googlebot.com 2005-01-13 17:50:41 2
220.127.116.11 18.104.22.168 2005-01-13 17:44:39 17
22.214.171.124 c-24-17-93-170.client.comcast.net 2005-01-13 16:52:42 1
126.96.36.199 crawl-66-249-71-73.googlebot.com 2005-01-13 16:48:52 2
188.8.131.52 c-144fe353.545-1-64736c10.cust.bredbandsbolaget.se 2005-01-13 16:48:38 1
184.108.40.206 ppp-220.127.116.11.revip.asianet.co.th 2005-01-13 16:36:49 1
18.104.22.168 lutn-cache-5.server.ntli.net 2005-01-13 16:35:57 1
22.214.171.124 crawl-66-249-71-28.googlebot.com 2005-01-13 16:34:30 1
126.96.36.199 d57-198-124.home.cgocable.net 2005-01-13 16:16:16 1
188.8.131.52 dorm83194.dorm-net.louisville.edu 2005-01-13 16:10:42 1
184.108.40.206 cs214310.pws.uscs.susx.ac.uk 2005-01-13 15:54:46 1
220.127.116.11 crawl-66-249-64-66.googlebot.com 2005-01-13 15:10:35 1
18.104.22.168 c211-30-34-71.rivrw6.nsw.optusnet.com.au 2005-01-13 15:08:07 2
22.214.171.124 mfb.xs4all.nl 2005-01-13 14:55:33 4
126.96.36.199 dD576522C.access.telenet.be 2005-01-13 14:47:13 2
188.8.131.52 nperspectief01.nieuw-perspectief.nl 2005-01-13 14:41:00 1
184.108.40.206 220.127.116.11 2005-01-13 14:39:12 10
18.104.22.168 host-22.214.171.124.tedata.net 2005-01-13 14:38:51 1
126.96.36.199 lj2066.inktomisearch.com 2005-01-13 14:34:01 1
Google has visited more than once and it has crawled my pages, but it is not indexing them. I removed all my .php files from the previous website with Google's URL removal tool, and that worked well. But now why is it not indexing my new pages?
Am I penalized for duplicate content? If I am penalized for duplicate content, why am I still being visited by the bot?
Any answer or suggestion would be great. I can't see any solution so far.
I don't understand at all.
My site is showing 2-month-old results marked "Supplemental Result". Googlebot visited my site 6 days ago but there is still no update. The cache date shows 1969. It's been 2 months of this not updating. I think it's serious now.
Can anyone please help me out?
I can see lots of visits from Google in today's records as well.
But when I checked what it crawled, I found it has only visited my web root, that is the / directory, many times over.
Here are some lines from my raw log file today.
188.8.131.52 - - [14/Jan/2005:00:25:21 -0800] "GET / HTTP/1.1" 200 10884 "-" "Mediapartners-Google/2.1"
184.108.40.206 - - [14/Jan/2005:00:25:23 -0800] "GET / HTTP/1.1" 200 10886 "-" "Mediapartners-Google/2.1"
220.127.116.11 - - [14/Jan/2005:00:25:23 -0800] "GET / HTTP/1.1" 200 53938 "-" "Mediapartners-Google/2.1"
18.104.22.168 - - [14/Jan/2005:00:25:24 -0800] "GET / HTTP/1.1" 200 10886 "-" "Mediapartners-Google/2.1"
22.214.171.124 - - [14/Jan/2005:00:25:25 -0800] "GET / HTTP/1.1" 200 51078 "-" "Mediapartners-Google/2.1"
126.96.36.199 - - [14/Jan/2005:00:25:26 -0800] "GET / HTTP/1.1" 200 10883 "-" "Mediapartners-Google/2.1"
188.8.131.52 - - [14/Jan/2005:00:25:27 -0800] "GET / HTTP/1.1" 200 53938 "-" "Mediapartners-Google/2.1"
184.108.40.206 - - [14/Jan/2005:00:25:29 -0800] "GET / HTTP/1.1" 200 10885 "-" "Mediapartners-Google/2.1"
220.127.116.11 - - [14/Jan/2005:00:25:30 -0800] "GET / HTTP/1.1" 200 51078 "-" "Mediapartners-Google/2.1"
I think GET / means the web root directory. But it has requested this 331 times today. What's the problem? Has it entered a loop?
I'm hoping for a reply from the search engine gurus. Please help.
Thanks in advance.
[edited by: coolsaint at 11:32 pm (utc) on Jan. 14, 2005]
Mediapartners-Google/2.1 is not Googlebot. It's the AdSense bot.
By the way, Googlebot is mad! It hasn't updated in 2 months.
I don't think you are right, because I think it is Googlebot.
You can see here for Google crawler information:
My website's listing in the Google index has changed more than 5 times in the last month and up to now. So when you say Google hasn't updated its index for 2 months, I think that is wrong information.
Can anyone help with this case, please?
If the user agent is Mediapartners-Google/2.1 then that is the adsense bot. No question about it.
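To see which bot is doing what, you can split the log by user agent. A small sketch, assuming combined-format log lines where the user agent is the last quoted field. The function names are made up, the first two sample lines are the ones quoted earlier in the thread, and the third (a Googlebot request) is hypothetical.

```python
from collections import Counter


def crawler_kind(user_agent):
    """Classify a user-agent string as 'adsense', 'googlebot' or 'other'."""
    if user_agent.startswith("Mediapartners-Google"):
        return "adsense"  # the AdSense bot, not the indexing crawler
    if "Googlebot" in user_agent:
        return "googlebot"
    return "other"


def user_agent_of(log_line):
    """Return the last quoted field of a combined-format log line."""
    parts = log_line.rstrip().rsplit('"', 2)
    return parts[1] if len(parts) == 3 else ""


def hits_by_crawler(lines):
    """Count log lines per crawler kind."""
    return Counter(crawler_kind(user_agent_of(line)) for line in lines)


SAMPLE = [
    # Two lines as quoted earlier in this thread.
    '188.8.131.52 - - [14/Jan/2005:00:25:21 -0800] "GET / HTTP/1.1" 200 10884 "-" "Mediapartners-Google/2.1"',
    '184.108.40.206 - - [14/Jan/2005:00:25:23 -0800] "GET / HTTP/1.1" 200 10886 "-" "Mediapartners-Google/2.1"',
    # Hypothetical Googlebot request, for comparison only.
    '192.0.2.10 - - [14/Jan/2005:00:30:00 -0800] "GET /index.html HTTP/1.1" 200 10884 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
]

print(hits_by_crawler(SAMPLE))
```

Run over a full day's log, this would show how many of the 331 requests for / came from the AdSense bot and how many, if any, came from Googlebot itself.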
We lost serps and cache even though we were being crawled daily.
At last check the index seems to be working OK again, but the serps are still changing. Have you seen any changes in the last 24 hours?
For me, I had 500+ hits on 8th Jan this month. But it still shows the same old stuff in its cache/index, with a 1969 date. I have PR 4.
To clear this up, the above bot is merely for letting AdSense know what the page contains so relevant ads can be shown. Usually a page will already be in the main Google index and its content is already known; if there is no info, a copy of the page is requested by the separate bot.
This bot DOES NOT pass the details to the main Google database, it is for adsense only. The usual pattern is that the adsense bot can retrieve the page almost immediately as the strain on it is nowhere near as bad as the crawls that GoogleBot does.
I can't explain the multiple requests for the same page. Could it be that something differs between them, such as many domains pointing to the same content (bad), or possibly your logs are not recording query strings or session IDs?
Interestingly, it is possible to get AdSense to show what ads would be shown for any page you wish (to find out what Google thinks the page is really about, which is sometimes a surprise). Just look up and install the AdSense preview tool (IE); it will show the adverts associated with the words it thinks the site is about.
Nothing too exciting and definitely not a way to jump any queues.
I changed 100s of dynamic urls/pages because of duplicate content worries...(anything to get some google traffic!)
Didn't do any good because although 2 months later most of them are now correctly indexed, we are still sandboxed to hell...