How do you get 1,000 pages crawled easily?


knotz

11:06 pm on Mar 26, 2005 (gmt 0)

Inactive Member
Account Expired


I have about 1,000 or so pages on my site and only 40 of them get crawled by Google, and even fewer by Yahoo. I'm just wondering how I can improve this. All the files are located in the root directory and all end with .shtml. Should I change this setup? Any advice would be appreciated.
Knotz
1:17 pm on Mar 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 1, 2005
posts:741
votes: 0


Hi,

It's very improbable that a SE indexes more than 40/50 pages of a "big" site.
That depends on the SE algorithm, which assigns relevance to pages on the basis of statistical criteria.
So I don't believe there is much more you can do.

Regards

1:55 pm on Mar 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3476
votes: 76


>>>It's very improbable that a SE indexes more than 40/50 pages of a "big" site

sorry to be so forthright, but this is completely untrue.

our main site runs to a five-figure number of pages and nearly all of them are indexed by google.

knotz, there is no need to change your setup, but do ensure you have decent site map/s.

9:48 am on Mar 28, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 1, 2005
posts:741
votes: 0


I forgot to add that I was talking about NEW sites/domains.

However, I'm quite curious to see how you would set up a "sitemap" for over 1,000 pages...

Regards

11:03 pm on Mar 28, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member topr8 is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 19, 2002
posts:3476
votes: 76


>>>how you would set up a "sitemap" for over 1,000 pages...

as i said, site map/s... in my view each category or section of a site should have its own site map - call it an index page if you like - or you can even integrate site maps into the structure of the site itself. rather like links pages, they need to be implemented imaginatively.
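to give a rough idea of what i mean by per-section site maps, here's a little perl sketch that builds one site map page per section directory, pulling each page's <title> for the link text. the directory layout, the sitemap.shtml file name and the regexes are just my own assumptions for illustration, not a recipe:

#!/usr/bin/perl
# build-sitemaps.pl - hypothetical sketch: one site map page per section.
# assumes each section is a sub-directory of .shtml files with a <title> tag.
use strict;
use warnings;

my $root = shift || '.';

opendir(my $dh, $root) or die "cannot open $root: $!";
my @sections = grep { -d "$root/$_" && !/^\./ } readdir($dh);
closedir($dh);

for my $section (@sections) {
    my @links;
    for my $file (glob("$root/$section/*.shtml")) {
        open(my $fh, '<', $file) or next;
        my $html = do { local $/; <$fh> };         # slurp the whole page
        close($fh);
        next unless defined $html;
        my ($title) = $html =~ m{<title>(.*?)</title>}is;
        $title ||= $file;                          # fall back to the file name
        (my $rel = $file) =~ s{^\Q$root/\E}{};
        push @links, qq{<li><a href="/$rel">$title</a></li>};
    }
    next unless @links;
    open(my $out, '>', "$root/$section/sitemap.shtml") or die $!;
    print $out "<h1>\u$section site map</h1>\n<ul>\n",
               join("\n", @links), "\n</ul>\n";
    close($out);
}

then just link each section's sitemap.shtml from that section's main page and from the home page.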

imo one of the largest and best-designed sites on the web is the bbc: every section you get to has fantastic site maps, every page links to many relevant pages, and so on.

4:09 pm on Apr 27, 2005 (gmt 0)

New User

10+ Year Member

joined:Apr 26, 2005
posts:14
votes: 0


I have about 1,000 or so pages on my site and only 40 of them get crawled by Google, and even fewer by Yahoo. I'm just wondering how I can improve this. All the files are located in the root directory and all end with .shtml. Should I change this setup? Any advice would be appreciated.
Knotz

knotz. here's what to do:

1. make a list of your 1000 urls
2. use your word-processor or perl to turn it into a list of 1000 links to each url (ideally with the title of the page as the wording of the link)
3. use your word-processor or perl to paste 50 links onto the bottom of each of the 40 pages listed (a rough perl sketch follows below)
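if you'd rather script steps 2 and 3 than fight a word-processor, here's a rough perl sketch. it assumes a tab-separated urls.txt with url<TAB>title on each line - the file name and format are my own assumption, so adjust to taste:

#!/usr/bin/perl
# make-link-list.pl - hypothetical sketch for steps 2 and 3.
# reads a tab-separated list of "url<TAB>title" lines and writes the
# links out in chunks of 50, one chunk file per already-indexed page.
use strict;
use warnings;

my @links;
open(my $in, '<', 'urls.txt') or die "cannot open urls.txt: $!";
while (<$in>) {
    chomp;
    next unless /\S/;                             # skip blank lines
    my ($url, $title) = split /\t/, $_, 2;
    $title = $url unless defined $title && $title =~ /\S/;
    push @links, qq{<a href="$url">$title</a><br>};
}
close($in);

# 1000 links / 50 per chunk = 20 files: chunk01.html .. chunk20.html
my $n = 0;
while (@links) {
    my @chunk = splice(@links, 0, 50);
    open(my $out, '>', sprintf("chunk%02d.html", ++$n)) or die $!;
    print $out join("\n", @chunk), "\n";
    close($out);
}

each chunk file then gets pasted onto the bottom of one of your 40 already-indexed pages.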

forget about it for at least 1 month so you don't stress out unnecessarily or do anything stupid like go back on yourself.

and presto hey. your site will start to see more and more listings as each month passes. it probably won't all go up straight away, but it will in the end.

step 4 is optional and leads to a world of chaos where you no longer know when robots are coming and going. also, if you follow step 4, you will be carrying out what many people consider to be a form of activity which google may penalise; i don't personally subscribe to that theory, but it seems to have the majority vote.

4. paste a set of 20 to 50 links to other pages on every page of your site - that way each new link it eats will give it a list of 20 to 50 further links and speed up the rate at which it finds all the 'new' ones (see the sketch after step 5).

5. when all the pages are up, remove those extra links from your pages, unless they are pages whose content will be regularly updated.
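if you do go down the step 4/5 road, it pays to drop the extra links in between marker comments so they can be stripped out again in one pass later. another rough perl sketch - the marker comments and the related.html pool of pre-built links are purely my invention for the example:

#!/usr/bin/perl
# footer-links.pl - hypothetical sketch for steps 4 and 5.
# "add" mode appends a block of links between marker comments to every
# .shtml page in the current directory; "remove" mode strips it out again.
use strict;
use warnings;
use List::Util qw(shuffle);

my $mode  = shift || 'add';                       # 'add' or 'remove'
my $BEGIN = '<!-- extra-links -->';
my $END   = '<!-- /extra-links -->';

# pool of pre-built links, one <a ...> per line (an assumed file)
open(my $lf, '<', 'related.html') or die "cannot open related.html: $!";
chomp(my @pool = <$lf>);
close($lf);

for my $file (glob('*.shtml')) {
    open(my $fh, '<', $file) or next;
    my $html = do { local $/; <$fh> };
    close($fh);
    next unless defined $html;

    if ($mode eq 'remove') {
        $html =~ s/\n?\Q$BEGIN\E.*?\Q$END\E\n?//gs;
    } else {
        next if $html =~ /\Q$BEGIN\E/;            # this page is already done
        my @pick = grep { defined } (shuffle @pool)[0 .. 29];   # ~30 links
        $html .= "\n$BEGIN\n" . join("\n", @pick) . "\n$END\n";
    }

    open(my $out, '>', $file) or die $!;
    print $out $html;
    close($out);
}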

nb here is a huge discussion i found on this forum about whether google "punishes" sites for "aggressive link building"...

[webmasterworld.com...]

the only post you should read and obey on that thread is post number 124 (which outlines 'a simple method to get into Google's good graces') - be positive about the whole thing - that seems to be key

4:10 pm on Apr 27, 2005 (gmt 0)

New User

10+ Year Member

joined:Apr 26, 2005
posts:14
votes: 0


"use your word-processor or perl to turn it into a list of 1000 links to each url"

by that i mean (coz you can't apparently edit on this forum), "turn it into a list of 1000 links - 1 link going to each url"

(it's better to be clear, eh).

1:13 am on May 4, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 1, 2005
posts:135
votes: 0


There are many reasons why G and Y might have crawled only 40 pages out of 1000. Some might be internal linking issues, coding issues, duplicate content, seasoning of the domain, too many links with the same anchor text too fast, etc. A site map helps, but this scenario definitely looks like something deeper is going on.
1:43 am on May 4, 2005 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


It's very improbable that a SE indexes more than 40/50 pages of a "big" site.

Stats for my site are:
Google claims it indexed 38,500 pages
Yahoo claims it indexed 36,800 pages

They are both wimps, they stopped short :)

10:40 pm on May 7, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 13, 2005
posts:82
votes: 0


Incredibill, just out of curiosity - I understand you tried your best to optimize your thousands of pages; how many clicks does each one of them receive daily, on average? What are your stats for this site, at least for those 36k pages?
11:41 pm on May 7, 2005 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


I understand you tried your best to optimize your thousands of pages

I get 1.5M page views a month, heading for 2M around June based on current trends.

The bulk of these pages are dynamic, so all of the content, except for maybe 100 static pages, is optimized on the fly, and the optimization also changes based on how the content is accessed.

It's actually pretty tricky :)

I have various search shortcut links that are great for customers, simplifying access to content, but they also give the search engines specific points of entry, which helps massively with creating more SERPs all over the place and dead-on AdSense targeting. Each search shortcut pulls up specific data streams based on various keyword terms of interest, and the optimizations performed on the page are specific to the terms of each shortcut query. Not only that, but I make each search shortcut look like a static entry point with a keyworded page name. When I implemented the last draft of this dynamic optimization last summer, I got a 300% traffic boost in a month.

Most of the hits to my site are deep-link hits because of this optimization, which is why the primary keyword that drives visitors to the home page only accounts for an average of 2.5% of total site traffic - which is still tens of thousands of visitors a month.
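To give a rough idea of the general shape (a simplified sketch only, not my actual code - the names and paths are made up), a keyworded, static-looking page name can be fed to one dynamic script something like this:

#!/usr/bin/perl
# shortcut.pl - simplified sketch of a keyworded "static-looking" entry point.
# A URL like /shortcut/blue-widgets.html reaches this script (e.g. via a
# rewrite rule); the page name becomes the search term, and the title and
# meta tags are built from it before the dynamic results are printed.
use strict;
use warnings;

my $path = $ENV{PATH_INFO} || '/example-term.html';   # e.g. "/blue-widgets.html"
(my $term = $path) =~ s{^/|\.html$}{}g;                # strip the slash and .html
$term =~ s/[^a-z0-9-]//gi;                             # keep the term simple and safe
$term =~ tr/-/ /;                                      # "blue-widgets" -> "blue widgets"

my $title = ucfirst($term);

print "Content-type: text/html\n\n";
print <<"HTML";
<html>
<head>
<title>$title</title>
<meta name="keywords" content="$term">
<meta name="description" content="Results and resources for $term.">
</head>
<body>
<h1>$title</h1>
HTML

# ... the real script would now pull the matching records for "$term" from
# the database and print them; that part is site-specific and omitted here.

print "</body></html>\n";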

12:08 am on May 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 7, 2003
posts:1179
votes: 0


If you have static HTML pages, you should be able to get between 80% and 90% of your site indexed.
12:35 am on May 8, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 6, 2003
posts:2523
votes: 0


> I forgot to add that I was talking about NEW sites/domains.

Still not true, Spectre. I have a new domain (purchased April 5th) that started with 50 pages and now has 110; all but the ones I added last night have been spidered and are bringing traffic.

I agree with the other suggestions, and in addition, you might try changing your links from relative to absolute, and try acquiring a link from a page that is PR 5 or better.

12:53 am on May 8, 2005 (gmt 0)

New User

10+ Year Member

joined:Apr 26, 2005
posts:14
votes: 0


Google claims it indexed 38,500 pages
Yahoo claims it indexed 36,800 pages

They are both wimps, they stopped short

here's a little maths-teacher's demonstration of the way things are (followed by an english teacher's postscript)...

check out the number of kelkoo uk's pages listed on google (a thing i call plog); do it by entering the following in google's search box:

site:www.kelkoo.co.uk

note that there are 459,000 pages listed

now scroll down and click on 'search within results'

(if you're keen, also go to kelkoo.co.uk and note that on the vast majority of pages you go to, the word "shopping" appears right at the bottom middle)

search (within the results) for

"shopping"

(include the quotemarks)

only 66,900 of kelkoo's allegedly listed pages are actually properly indexed;

now, bearing in mind that a number of their pages may NOT say 'shopping', go back and change "shopping" to "on" or "and" or "the"

the counts all end up being approximately the same.

then i tried amazon.com...

56,500,000 in the site:www.amazon.com search
and
52,100,000 in the search within results for "Home"

hence, be positive; in the long run, google WILL ingest ALL your pages.

it's got 52 million for amazon. why worry about your 1000 here or 40,000 there? google's strength is derived from its comprehensiveness. it needs to list all your pages.

1:16 am on May 8, 2005 (gmt 0)

Full Member

10+ Year Member

joined:June 18, 2004
posts:327
votes: 0


are optimized on the fly

that's what i call ingenious...

8:28 pm on May 8, 2005 (gmt 0)

New User

10+ Year Member

joined:Apr 26, 2005
posts:14
votes: 0


because my money depends on it, i have done yet more research into why, when google lists a big site with many 1000s of pages, the majority of the listed pages appear without any content; they can be seen on the results pages for a site:www.mysite.com search as listings where the title is the file-pathname, with no content and no option for cache;

this time i tested the domain
search.ebay.co.uk

they have approx 1 million pages listed on that domain, and approx 200 thousand of those have content; the others don't.

hence an estimate of 15 to 20% of pages fully listed is definitely a good conservative figure for business planning.

nb - what this proves, since ebay is one of the most profitable e-sales portals on the web, is that these pages which turn into contentless titles occur not as a result of any penalisation system, but as a result of difficulties in swift and accurate keyword ranking;

the more data there is, the harder it becomes for previous keyword scoring algorithms to end up matching the right pages;

hence it is likely that at its current 8 billion pages, google is entering the 'second teething' period of the lifespan of a being - aka adolescence.

the temptation for most people is to imagine that the reason only 5 or 6 thousand of your 40,000 pages are listed, or whatever, is that someone wants to make things hard for you.

in fact, google are probably MORE pissed off about it than YOU are.

8:43 pm on May 8, 2005 (gmt 0)

New User

10+ Year Member

joined:Apr 26, 2005
posts:14
votes: 0


except for maybe 100 static pages, is optimized on the fly, and the optimization also changes based on how the content is accessed.

that's true of any properly designed online data archive;

eg a bogstandard retailer's website contains a script for lists of products in a particular category -

1. the categories script can be accessed in a few dozen to a few hundred ways, and depending on how it's accessed it will produce a page with specific words on it, including keywords generated for each particular category - and since each category could have up to a few hundred pages, that's a few thousand optimised-on-the-fly pages to begin with

2. product pages: then you get their 1,500 to 15,000 product entries, each a database item which is accessed from a single script - in each manifestation of the script, the page produced is optimised by means of category keywords and related-item keywords which all end up on the same product page (see the sketch below);
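to make that concrete, here's a stripped-down sketch of such a categories script - the table and field names, the query string and the database behind it are all made up for the example, but the shape is typical:

#!/usr/bin/perl
# category.pl - stripped-down sketch of a "categories script".
# called as category.pl?cat=garden-tools; the one script produces a
# differently optimised page for every category it is asked for.
use strict;
use warnings;
use CGI qw(param);
use DBI;

my $cat = param('cat') || 'default-category';
$cat =~ s/[^a-z0-9-]//gi;                    # keep the slug simple and safe
(my $cat_name = $cat) =~ tr/-/ /;

# hypothetical schema: categories(slug, keywords) and products(name, cat_slug)
my $dbh = DBI->connect('dbi:mysql:shop', 'user', 'pass', { RaiseError => 1 });
my ($keywords) = $dbh->selectrow_array(
    'SELECT keywords FROM categories WHERE slug = ?', undef, $cat);
my $products = $dbh->selectcol_arrayref(
    'SELECT name FROM products WHERE cat_slug = ?', undef, $cat);

my $title = ucfirst($cat_name);

print "Content-type: text/html\n\n";
print "<html><head><title>$title</title>\n";
printf qq{<meta name="keywords" content="%s">\n}, $keywords || $cat_name;
print "</head><body><h1>$title</h1>\n<ul>\n";
print "<li>$_</li>\n" for @$products;        # real code would HTML-escape these
print "</ul></body></html>\n";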

it's a totally natural form of 'optimisation' - in fact it's the real optimum, the thing that not-so-optimum sites aim to be like when they put themselves through 'optimisation'.

information sites - like, for example, the classics.edu site with its many books, or some cookery site with thousands of recipes in different categories - ALL (if built correctly) have this same natural, optimum presentation of information, which includes repeating the correct keyword metatags in the correct places and repeating the correct html menus on the right pages, so that robots have abundant information to work with.

there is no way on earth that google seeks to only partially represent these collections of pages, let alone to penalise them for repeating keywords across pages where relevant (since it's perfectly ethical and correct for those sites to do so)

even if google DOES have methods for countering actual spam - which consists of people mimicking the above sort of structure but with zero content and plenty of irritating popups, which themselves pull in a lot of bucks, sadly -

those methods always have to take second priority to the main purpose of google, which is to comprehensively list everything that IS legitimate;

i think the main reason things have seemed so turbulent over the last couple of years is that there has been, apparently, a big increase in spam and viruses;

but that will surely die down one way or another and then things will be swimming along nicely.

google DOES want to keep all of us internet businesses in the money, rather than the brick-and-mortar businesses who are, largely, the root cause of the virus turbulence of recent times!

7:05 pm on May 10, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 8, 2004
posts:63
votes: 0


I notice the best results from using site maps and from interlinking the individual pages with one another: on one page you have a link to another page, from that page you have a link to another page, and so on and so forth. These 2 methods seem to work best for me. Good luck to ya.