Erratic Google spidering and indexing.

I've waited for 3 weeks, but now have to face the music: I've got a problem

         

killroy

11:56 am on Aug 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ok, I've waited 3 weeks in the hope this would resolve itself but now I'm at a loss.

I've launched a new website. I've kept the design plain, the content rich. I linked to it from several PR6 pages.

Within hours googlebot was all over it. It kept coming back, respidering everything for around 3-4 days, then it popped 10 pages into the index.

So far so good.

Then, as content development progressed, I started to notice that no new pages were being added, even though I changed my PR6 inbounds to point to various deep locations not yet in the index. For the next 2 weeks, Googlebot would come back every 2 days and grab the homepage and one other page (which isn't any different from the other content pages). And sure enough those two pages always updated in the SERPS and had a fresh tag from the day before. None of the other pages have been updated since (I changed the titles and those are not reflected in the SERPS).

I started becoming suspicious and frantically checked all relevant areas. Robots.txt was my first guess as I had added it later on, so I renamed it. Checking it, everything seemed in order but I wanted to make sure.

Well, after some more soul searching I realised that somehow my brain must have short circuited and equated "NONE" with "NOT(noindex,nofollow)", when it in fact means "noindex, nofollow". That's what happens if you code too long.
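To make the mix-up concrete, here is a minimal sketch of how a robots meta content value expands into index/follow decisions. The function name parse_robots_meta is made up for illustration, and it only covers the common tokens (all, none, index, noindex, follow, nofollow); it is not a description of how any particular crawler is implemented.

```python
def parse_robots_meta(content):
    """Expand a robots meta content value into (index, follow) booleans.

    Illustrative helper only: "none" is shorthand for "noindex, nofollow",
    while "all" (or an empty value) means "index, follow".
    """
    tokens = {t.strip().lower() for t in content.split(",") if t.strip()}
    index, follow = True, True  # the default when nothing restrictive is given
    if "none" in tokens:
        index, follow = False, False
    if "noindex" in tokens:
        index = False
    if "nofollow" in tokens:
        follow = False
    return index, follow


print(parse_robots_meta("none"))               # (False, False)
print(parse_robots_meta("all"))                # (True, True)
print(parse_robots_meta("noindex, nofollow"))  # (False, False)
```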

Hopeful that I'd found my culprit, I removed the offending tag.

And, lo and behold, the next day (Google still kept coming back daily for those other two pages) it grabbed a whole new set of pages, perhaps another 10 or so.

Please note that at this point I had over 90 pages of great content, and was starting to get morale problems with my content writers as they didn't see any fruits for their labour.

Well, the sad fact is, that Google STILL freshdates the two original pages daily, STILL has the 10 original pages in the index (8 not updated since the first inclusion 3 weeks ago) and STILL hasn't added any of the new pages.

In fact one of the two freshed pages receives hits for keywords on pages 8 and up for which I've written whole new targeted pages that would rank top 10 if they were just included.

I have now added a "sponsored by" link on some of my larger pages, pointing to various deep pages in this new site, so there must be dozens of PR6 and PR5 inbound links by now. In addition it is listed in many directories and other search engines.

Googlebot already has respidered the two "special" pages today, and I hope it will spider more of the other pages. But as long as they don't get into the index, all this spidering is useless.

I have tried the Google directive "Allow" in my robots.txt.

Should I try the meta content="all" too?
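For anyone wanting to sanity-check what an "Allow" line permits, a rough sketch using Python's standard urllib.robotparser follows. The rules in SAMPLE_ROBOTS_TXT and the example.com URLs are assumptions made up for this example, not the actual file from the site, and Python's parser is not guaranteed to match Googlebot's own interpretation.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules; not the real robots.txt discussed in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot",
                 "http://www.example.com/deep/page.html"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "Googlebot",
                 "http://www.example.com/private/x.html"))
```

If the first call reports that a deep page is blocked, the robots.txt is a likely suspect; if both behave as expected, the cause probably lies elsewhere.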

What else can I do?

Curiously enough, with the meta content="none", shouldn't Google have stopped respidering and refreshing those two "special" pages too? The metas are exactly the same in all the pages.

I'm a bit at a loss. I've now republished the site on a subdomain to test if google will pick it up from there.

The pages are VERY simple in construction, no javascript or any funny business, CSS styled, very little navigation fluff.

I'm at a loss, how do I get google to include the spidered pages in the SERPS?

I appreciate all your help and your time to read my "epic" post, I wouldn't bother all of you if I wasn't very desperate.

SN

killroy

12:17 pm on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



* BUMP *

Just got approved...

SN

BlueSky

12:44 pm on Aug 19, 2003 (gmt 0)

10+ Year Member



I had problems with Googlebot and metatags. On some pages I put the noindex, nofollow metatag. The little guy kept ignoring it and indexing those pages. Luckily, he recognizes regular expressions in robots.txt which is how I finally got him to behave. If Googlebot is paying attention to your metatag, I'm kinda envious.

No idea what's happening on yours. Have you checked the headers on the indexed pages to see when they expire? You mention changing your inbound links. What I've done is leave the links the same and get new links when I put up new content. Maybe he doesn't like you changing them around like you're doing?
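A rough sketch of the header check suggested above, assuming the response headers have already been collected into a plain dict. The function name expiry_status and the sample dates are invented for illustration, and it only looks at the Expires header, not Cache-Control.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def expiry_status(headers, now=None):
    """Classify a page by its Expires header (illustrative helper)."""
    now = now or datetime.now(timezone.utc)
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get("expires")
    if raw is None:
        return "no Expires header"
    try:
        when = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return "unparseable Expires header"
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    return "expired" if when <= now else "fresh"


# Made-up sample values, checked against a fixed point in time.
check_time = datetime(2003, 8, 20, 12, 0, tzinfo=timezone.utc)
print(expiry_status({"Expires": "Tue, 19 Aug 2003 12:00:00 GMT"}, check_time))
print(expiry_status({"Expires": "Thu, 21 Aug 2003 12:00:00 GMT"}, check_time))
```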

killroy

12:50 pm on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, he should follow the internal links too though, even without inbound links. In fact he never even updated the other 8 pages after the initial index, while the two "special" pages are fresh dated every day.

SN

killroy

9:47 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Several days later I'm still experiencing similar problems. The one bout of new spiderings after I corrected the tags has never resulted in any other updates or new inclusions. The two magical pages still get fresh dated every day, while the other 8 pages from the original spidering have yet to be brought up to date.

In fact Googlebot's visits have slowed again to its regular checkup on the magic pages.

I now have a total of over 10000 backlinks from PR0-PR2 pages to all the internal pages, but none are followed properly.

Would it perhaps help to relink everything to the homepage, to create PR that will let Googlebot follow more deeply into the new site?

Does it ignore transferred PR to pages it doesn't spider?

I'm still desperate for a resolution of this issue.

Thanks,

SN

MonkeeSage

9:51 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As I understand it, Google has two kinds of bots, one for fresh pages and one for deep crawling. Maybe you are only getting hit by freshbot and Google just hasn't been deep crawling lately?

Jordan

killroy

10:32 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, after recent discussions I was under the impression that this split behaviour is now a moot point.

The issue with fresh crawls would be everflux, on and off. But that is not the case. The initial ten pages are rock solid: 8 are still in their "virgin" state (as indexed) several weeks later, while the two "magical" pages get a fresh date every day and updated content. The few other pages it grabbed right after I changed the tags have never entered the SERPs, not in an everflux fashion or otherwise, and I check so often I surely would've seen them.

SN

George Abitbol

10:53 am on Aug 20, 2003 (gmt 0)

10+ Year Member



Well, like I say in [webmasterworld.com...] I have a very similar problem. The difference is that this is not a new site, it's been online for more than 2 years now. Like you, I have two pages that are updated every day, but none of the roughly 10 pages I added during the last 2 or 3 weeks were indexed. I can see they have been crawled. I changed nothing on my pages, didn't modify the metas or anything (since everything had always worked fine, I didn't have any reason to change). Google doesn't seem to pay much attention to metas anyway...

New pages are linked from the home page (which is PR4 and which is one of the two "magical" pages that are updated every day).
The last new page indexed by Google on my site is dated 1st of August 2003.

Has someone else noticed similar behavior?

George Abitbol

killroy

11:25 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is two with "magical" daily fresh pages, a new site and an established site. If there are any others, please speak up. If this is a trend it might be something Google wants to look into.

SN

mcavill

11:35 am on Aug 20, 2003 (gmt 0)

10+ Year Member



My situation is similar; my site is only about a month old.

Google initially indexed about 20 pages - index page freshed everyday - initial 20 pages still in index - entire site of about 65 pages was hit for 2 days in a row about 4 days ago, most pages appeared in the index for about 2 hours and then disappeared.

I'm not spamming, trying to follow Brett's guidelines with good content etc - got some good links from PR3 - 5 sites / Dmoz / GoGuides etc - I'm hoping it's just a case of early days.

<added>
Another thing I've noticed is that I'm getting some hits from google.ca on some competitive keywords. I'm using a UK based host that I believe is hosted in Gloucester, UK
</added>

killroy

11:42 am on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you have recent inbound links of PR3 or up?

SN

mcavill

12:11 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



yes a couple of pr 5's (both on the toolbar and in the directory) - but just to my index page

albert

12:33 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



Just some thoughts:

clean, easy-to-spider URLs?

no JavaScript links?

sitemap?

build some deep links from the spidered pages to other pages?

go for inbound links to some of your deep pages?

[edit]important: validate HTML[/edit]

George Abitbol

12:50 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



Albert: as for my site, the URLs of the new pages are similar to the ones of the pages that were added until the 1st of August, those which were correctly indexed. This is a movie reviews site. So I regularly add reviews, but reviews added after this date were not indexed (yet they were crawled, so this is not a URL problem, I think). No Javascript links, I use CSS-based design and valid XHTML Strict 1.1, looks fine even with Lynx. New pages get a direct link from the daily-updated home page.

George Abitbol

BlueSky

12:53 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



I have now a total of over 10000 backlinks from PR0-PR2 pages to all the internal pages, but none are followed properly.

Is this part of some link exchange? Are these sites all related in content to yours?

When I go out to get links to my new pages, I go to a few PR6/7 sites which cover the exact same subject area. I experimented with this. If I get a link from just one of them, Googlebot comes by within 24 hours and grabs the new page, and it shows up in the SERPS within another 48 hours.

I've never attempted to get tens of thousands of links. How exactly are you getting these links and could they be the reason for the odd behavior you're seeing? Maybe it's flagging that many as being unnatural for a new site?


clean, easy-to-spider URLs? -- Yes, changed my URLs to static looking ones via mod_rewrite
no JavaScript links? -- None
sitemap? -- Yes
build some deep links from the spidered pages to other pages? -- Yes
go for inbound links to some of your deep pages? -- Yes
important: validate HTML -- Yes, valid XHTML

killroy

12:54 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



All that is taken into account. In fact this site was constructed from the ground up to be ideal: structure over style, content over fluff. I'd say near perfect themed structure (as far as possible for such a small site). Great URLs and spider-perfect html structure (all valid of course).

Considering I have a few sites with a total of about 30000 pages in Google that are completely messy, don't validate, use frames, use javascript and so on, and rank top across the board, I simply don't understand it.

Perhaps google likes messy code better? ;)

SN

mipapage

9:45 pm on Aug 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are we maybe asking for too much too soon?

I've seen new pages within a 'mature' site get added to the index rather quickly (talking days here to get stable).

I have no experience with fresh new sites, tho we will be launching one later this week - that's what brought me to this thread.

With the 'old Google' we'd never expect a site to be in within three weeks, would we? With the 'new Google', maybe fresh new sites take a bit longer than mature sites that have their own PR. Maybe once you get some PR attributed to the site things will change.

In addition, over the last PR update, we had some internal pages of our business site move up to pr5, and now they get freshed everyday.

Not sure if any of this blither helps - what I figure is that maybe you still may need a month or more for a new site to really root into Google...

oliphaunt

10:46 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



I have a very similar problem.

Although Google came in the beginning and spidered about 10 pages and then came quite frequently to take the robots and index file, it has now stopped completely for 10 days.

The site is 2 months old and even though it has over 500 backward links, which are shown in Google, it has no ranking at all.

I have over 20 sites which I look after, but this never happened before and I can't find any reason for it.

GOOGLE HELP :)

yankee

11:04 pm on Aug 20, 2003 (gmt 0)

10+ Year Member



I've noticed the same thing. New pages from low PR sites take a real loooooooong time to get in. This is something Google needs to improve on with freshdeepbot. The old monthly deepbot would get new pages from low PR sites in the index every cycle.

killroy

10:47 am on Aug 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, just got into DMOZ, three editors visited my site yesterday, I guess they wanted to make sure. It shows up for a search on DMOZ, but not yet in the category listing.

Maybe this is now the "seal of approval" that Google needed.

Kudos to the DMOZ folks, pretty quick turnaround for a brand new site. Thanks heaps.

SN

George Abitbol

11:19 am on Aug 21, 2003 (gmt 0)

10+ Year Member



Well, I've been in DMOZ for years now, so I don't seem to need that seal ;-)
Anyway, seems like the number of pages from my site that Google has in its index is dropping every day now... Better get prepared for no listing at all soon, I guess...

George Abitbol

dirkz

1:31 pm on Sep 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a very similar problem. Fresh site, Googlebot picked it up 3 days after the first launch, crawled two levels deep (but only a few links). Then after that, I appeared in the SERPs (only by a search for my site, but that's all right, since I don't have many links yet). It only shows my index file (in minor variations). To this day Googlebot only visits "/robots.txt" (which does not exist) and "/". Every day. Nothing else. And it doesn't even update the text in the SERPs, although I changed it. This has been going on for about 20 days now.
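A pattern like this can be confirmed from the server logs. Below is a minimal sketch that tallies Googlebot requests per path, assuming the log is in the common "combined" format. The sample lines (IP addresses, timestamps, byte counts) are invented for illustration, and googlebot_hits is a made-up helper name.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user agent of a
# combined-format access log line.
LOG_PATTERN = re.compile(
    r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

# Invented sample lines; a real log would be read from a file instead.
SAMPLE_LOG = """\
192.0.2.10 - - [08/Sep/2003:13:30:58 +0000] "GET /robots.txt HTTP/1.0" 404 209 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
192.0.2.10 - - [08/Sep/2003:13:31:00 +0000] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
192.0.2.10 - - [09/Sep/2003:13:31:02 +0000] "GET / HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
198.51.100.7 - - [09/Sep/2003:14:00:00 +0000] "GET /about.html HTTP/1.0" 200 2048 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"
"""


def googlebot_hits(lines):
    """Count requests per path for user agents containing 'googlebot'."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match and "googlebot" in match.group(2).lower():
            counts[match.group(1)] += 1
    return counts


for path, hits in googlebot_hits(SAMPLE_LOG.splitlines()).most_common():
    print(path, hits)
```

If only "/" and "/robots.txt" ever show up in the tally, the bot is not following internal links at all, which is a different problem from pages being crawled but not indexed.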

edavid

4:24 pm on Sep 9, 2003 (gmt 0)

10+ Year Member



I am experiencing the same problem. Great results a few weeks ago, new pages showing up, the works. Then everything stopped.

Hopefully a semi-dance is on the way that will update all this stuff at once and then we'll go back to the more frequent updating, once they have worked out the kinks.

killroy

4:39 pm on Sep 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, a little update might be helpful.

In the meantime Google has added 115 pages. It had spidered around 160 last week, and I was almost certain they would appear in the Sunday night update, but they haven't.

One of the two ever-fresh pages dropped to a non-freshed state from 3 weeks ago for 2 days but is now back to being freshed every 24hrs (together with the root page).

I have now been hit heavily by scooter, slurp and ZyBorg, and am happy that my homepage (a pretty fresh version) shows in MSN. All this without any PFI.

I can only pray that all spidered pages will be indexed soon, what is the timetable for these spiders?

In the meantime I'm glad for my income and can't wait for it to double when Google lists the >200 pages I have by now, all of which it picked up last weekend.

So it's been 30 days from 0 to 100 pages indexed and 500 unique visitors/day and 12 PR5 pages. Not bad I'd say.

SN

dirkz

5:16 pm on Sep 9, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I experienced the same with Scooter and Slurp. The latter tried to slurp my whole site from the first time it hit it, and I never paid for it. Do you really think that Google updates every weekend?

benc007

1:51 am on Sep 11, 2003 (gmt 0)

10+ Year Member



Killroy and friends,

What tool are you using to track pages that Google has indexed?

ralent

3:01 am on Sep 11, 2003 (gmt 0)

10+ Year Member



benc007 ... try this search parameter.

site:www.widgets.com -qwerty
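For checking this regularly, the query above can be turned into a search URL with a few lines of Python. This is only a sketch: the function name site_query_url is made up, and the www.google.com/search address with a q parameter is assumed here rather than taken from the thread.

```python
from urllib.parse import urlencode


def site_query_url(host, exclude="qwerty"):
    """Build a search URL for 'site:<host> -<exclude>' (illustrative helper)."""
    query = f"site:{host} -{exclude}"
    # Assumed search endpoint; the thread itself only gives the query text.
    return "http://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("www.widgets.com"))
# http://www.google.com/search?q=site%3Awww.widgets.com+-qwerty
```

The "-qwerty" part just excludes a word that is unlikely to appear on any page, so the result count approximates the number of indexed pages for the host.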