I've launched a new website. I've kept the design plain, the content rich. I linked to it from several PR6 pages.
Within hours Googlebot was all over it. It kept coming back, respidering everything, for around 3-4 days, then it popped 10 pages into the index.
So far so good.
Then, as content development progressed, I noticed that no new pages were being added, even though I had changed my PR6 inbounds to point to various deep locations not yet in the index. For the next two weeks, Googlebot came back every two days and grabbed the homepage and one other page (which isn't any different from the other content pages). Sure enough, those two pages were always updated in the SERPs with a fresh tag from the day before. None of the other pages have been updated since (I changed the titles, and those changes are not reflected in the SERPs).
I got suspicious and frantically checked all the relevant areas. robots.txt was my first guess, as I had added it later on, so I renamed it to rule it out. Everything in it seemed in order, but I wanted to be sure.
Well, after some more soul searching I realised that somehow my brain must have short-circuited and equated "NONE" with "NOT(noindex,nofollow)", when in fact it means "noindex, nofollow". That's what happens when you code too long.
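For anyone else who trips over this one, the difference is a single attribute value (standard robots meta conventions, shown here for reference):

```html
<!-- what I had: "none" is shorthand for BOTH restrictions -->
<meta name="robots" content="none">           <!-- equivalent to "noindex, nofollow" -->

<!-- what I actually wanted (also the default when the tag is absent entirely) -->
<meta name="robots" content="index, follow">  <!-- equivalent to "all" -->
```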
Hopeful that I'd found my culprit, I removed the offending tag.
And, lo and behold, the next day (Google still kept coming back daily for those other two pages) it grabbed a whole new set of pages, perhaps another 10 or so.
Please note that at this point I had over 90 pages of great content, and was starting to have morale problems with my content writers, as they didn't see any fruit from their labour.
Well, the sad fact is that Google STILL freshdates the two original pages daily, STILL has the 10 original pages in the index (8 not updated since their first inclusion 3 weeks ago), and STILL hasn't added any of the new pages.
In fact, one of the two freshed pages receives hits for keywords on results pages 8 and up, for which I've written whole new targeted pages that would rank top 10 if they were just included.
I have now added a "sponsored by" link on some of my larger pages, pointing to various deep pages on this new site, so there must be dozens of PR6 and PR5 inbound links by now. In addition, it is listed in many directories and other search engines.
Googlebot has already respidered the two "special" pages today, and I hope it will spider more of the others. But as long as they don't get into the index, all this spidering is useless.
I have tried the Google-specific "Allow" directive in my robots.txt.
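For what it's worth, "Allow" is a Google extension rather than part of the original robots.txt standard, and a fully permissive file shouldn't need it at all. A minimal permissive robots.txt looks like this:

```
User-agent: *
Disallow:
```

An empty Disallow line already permits everything, so if the file looks like this, robots.txt isn't the blocker.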
Should I try meta content="all" too?
What else can I do?
Curiously enough, with meta content="none", shouldn't Google have stopped respidering and refreshing those two "special" pages too? The metas are exactly the same on all the pages.
I'm a bit at a loss. I've now republished the site on a subdomain to test if google will pick it up from there.
The pages are VERY simple in construction: no JavaScript or any funny business, CSS styled, very little navigation fluff.
I'm at a loss, how do I get google to include the spidered pages in the SERPS?
I appreciate all your help and your time to read my "epic" post, I wouldn't bother all of you if I wasn't very desperate.
SN
No idea what's happening on yours. Have you checked the headers on the indexed pages to see when they expire? You mention changing your inbound links. What I've done is leave the links the same and get new links when I put up new content. Maybe he doesn't like you changing them around like you're doing?
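One way to do that header check: grab the Date and Expires headers for a page (e.g. with `curl -I`) and compare them. A minimal sketch; the header values below are made up for illustration, any header-fetching tool will do:

```python
# Compare the Date and Expires headers returned for a page to see how
# long a cached copy is considered fresh. Values here are hypothetical.
from email.utils import parsedate_to_datetime

date_header = "Mon, 01 Sep 2003 10:00:00 GMT"
expires_header = "Tue, 02 Sep 2003 10:00:00 GMT"

freshness = parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)
print(freshness.total_seconds() / 3600)  # hours until the cached copy expires -> 24.0
```

If Expires is far in the future, that's one possible reason a crawler might be in no hurry to refresh a page.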
In fact, Googlebot's visits have slowed again to its regular checkup on the magic pages.
I have now a total of over 10000 backlinks from PR0-PR2 pages to all the internal pages, but none are followed properly.
Would it perhaps help to relink everything to the homepage, to create PR that will let Googlebot follow more deeply into the new site?
Does it ignore PR transferred to pages it doesn't spider?
I'm still desperate for a resolution of this issue.
Thanks,
SN
The issue with fresh crawls would be everflux, on and off. But that is not the case here. The initial ten pages are rock solid: 8 are still in their "virgin" state (as indexed) several weeks later, while the two "magical" pages get a fresh date and updated content every day. The few other pages it grabbed right after I changed the tags have never entered the SERPs, not in an everflux fashion or otherwise, and I check so often I surely would have seen them.
SN
New pages are linked from the home page (which is PR4, and is one of the two "magical" pages that are updated every day).
The last new page indexed by Google on my site is dated 1st of August 2003.
Has someone else noticed similar behavior?
George Abitbol
Google initially indexed about 20 pages - index page freshed every day - initial 20 pages still in index - entire site of about 65 pages was hit for 2 days in a row about 4 days ago; most pages appeared in the index for about 2 hours and then disappeared.
I'm not spamming; I'm trying to follow Brett's guidelines with good content etc. I've got some good links from PR3-5 sites / DMOZ / GoGuides etc. I'm hoping it's just a case of early days.
<added>
Another thing I've noticed is that I'm getting some hits from google.ca on some competitive keywords; I'm using a UK-based host that I believe is located in Gloucester, UK.
</added>
George Abitbol
I have now a total of over 10000 backlinks from PR0-PR2 pages to all the internal pages, but none are followed properly.
When I go out to get links to my new pages, I go to a few PR6/7 sites which cover the exact same subject area. I experimented with this. If I get a link from just one of them, Googlebot comes by within 24 hours and grabs the new page and it shows up in the SERPS within 48 hours later.
I've never attempted to get tens of thousands of links. How exactly are you getting these links and could they be the reason for the odd behavior you're seeing? Maybe it's flagging that many as being unnatural for a new site?
clean, easy-to-spider URLs? -- Yes, changed my URLs to static looking ones via mod_rewrite
no JavaScript links? -- None
sitemap? -- Yes
build some deep links from the spidered pages to other pages? -- Yes
go for inbound links to some of your deep pages? -- Yes
important: validate HTML -- Yes, valid XHTML
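In case it helps anyone working through the same checklist, this is the kind of mod_rewrite rule I mean; the paths and script name are just placeholders for illustration, not my actual setup:

```apacheconf
RewriteEngine On
# present /widgets/blue-widget.html to spiders while the page
# is actually served by a dynamic script behind the scenes
RewriteRule ^widgets/([a-z0-9-]+)\.html$ /page.php?item=$1 [L]
```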
Considering I have a few sites with a total of about 30,000 pages in Google that are completely messy, don't validate, use frames, use JavaScript and so on, and rank top across the board, I simply don't understand it.
Perhaps google likes messy code better? ;)
SN
I've seen new pages within a 'mature' site get added to the index rather quickly (talking days here to get stable).
I have no experience with fresh new sites, though we will be launching one later this week; that's what brought me to this thread.
With the 'old Google' we'd never expect a site to be in within three weeks, would we? With the 'new Google', maybe fresh new sites take a bit longer than a mature site that already has its own PR. Maybe once some PR gets attributed to the site, things will change.
In addition, over the last PR update, some internal pages of our business site moved up to PR5, and now they get freshed every day.
Not sure if any of this blather helps; what I figure is that you may still need a month or more for a new site to really root into Google...
Although Google came in the beginning and spidered about 10 pages, and then came quite frequently to take the robots.txt and index file, it has now stopped completely for 10 days.
The site is two months old, and even though it has over 500 backlinks, which are shown in Google, it has no ranking at all.
I look after over 20 sites, but this has never happened before and I can't find any reason for it.
GOOGLE HELP :)
Maybe this is now the "seal of approval" that Google needed.
Kudos to the DMOZ folks, pretty quick turnaround for a brand new site. Thanks heaps.
SN
In the meantime Google has added 115 pages. It had spidered around 160 last week, and I was almost certain they would appear in the Sunday night update, but they didn't.
One of the two ever-fresh pages dropped to a non-freshed state (dated from 3 weeks ago) for 2 days, but is now back to being freshed every 24 hours (together with the root page).
I have now been hit heavily by Scooter, Slurp and ZyBorg, and am happy that my homepage (a pretty fresh version) shows in MSN. All this without any PFI.
I can only pray that all spidered pages will be indexed soon, what is the timetable for these spiders?
In the meantime I'm glad for my income and can't wait for it to double when Google lists the >200 pages I have by now, all of which it picked up last weekend.
So it's been 30 days from 0 to 100 pages indexed and 500 unique visitors/day and 12 PR5 pages. Not bad I'd say.
SN