I have been combing through this forum and also the large FAQ on this site and had a few questions:
1. When I do a search on Google, the majority of top results usually end in .htm, .html, or .shtml. Does Google prioritize those pages, or was it just coincidence that the listings I searched for happened to end that way?
2. I read in the FAQ that there is a limit on URL size, but didn't get a number. Does anyone know what it is?
3. I have 6 sites that will be newly indexed by Google. I have included a table of contents for every site on every site, but some of these tables of contents are rather large. What is the maximum page size Google will spider? Will Google penalize me if the page is too large, or will it just index part of it? Is there a limit to the number of links per page?
4. I have come to believe that Google does not like URLs with "?" in them, so I have re-coded all of my sites to use "/" instead. In place of the "&" symbol I use "-u-". Therefore, most of my pages look like this:
http://www.domain.com/page.php3/var1=2-u-var2=5-u-var8=1
Does anyone see a problem with this?
5. I originally tested the sites using "/" in place of both the "?" and the "&", but saw that the PageRank dropped the deeper I went into "directories". So if I have a PageRank of 4 on the main page, then:
http://www.domain.com/page.php3/var1=2-u-var2=5-u-var8=1 has a pagerank of 2, whereas
http://www.domain.com/page.php3/var1=2/var2=5/var8=1 has no pagerank.
Is this normal?
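In case it helps, here's roughly what my encoding scheme does, as a simplified Python sketch (not my actual server code, and the function names are just my own labels):

```python
# Sketch of the "/" and "-u-" separator scheme described in #4 above:
# "?" in the query string becomes "/", and "&" becomes "-u-".

def encode_params(params):
    """Join query parameters with '-u-' instead of '&', prefixed by '/'
    instead of '?'."""
    return "/" + "-u-".join(f"{k}={v}" for k, v in params.items())

def decode_params(path_info):
    """Parse '/var1=2-u-var2=5' back into a dict, the way a rewrite rule
    or a script reading PATH_INFO might."""
    pairs = path_info.lstrip("/").split("-u-")
    return dict(pair.split("=", 1) for pair in pairs if "=" in pair)

suffix = encode_params({"var1": "2", "var2": "5", "var8": "1"})
print("http://www.domain.com/page.php3" + suffix)
# -> http://www.domain.com/page.php3/var1=2-u-var2=5-u-var8=1
print(decode_params(suffix))
# -> {'var1': '2', 'var2': '5', 'var8': '1'}
```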
Lots of questions, I know, but I am incredibly nervous about messing up this next crawl.
Thanks in advance everyone.
2) I wouldn't like to commit myself to a specific size, but if it looks reasonable it's probably not a problem - just avoid 'bloated' URLs.
3) I think the max is 100 links per page (it's in a Google guideline somewhere). Am I reading your post correctly that you are putting a full table of contents for all of the sites on each of the sites - i.e. each site has 6 tables of contents, one for each site? If so, you may want to check you're not forming a link farm / linking purely to get PR.
4) Not a problem in itself, but be careful 'var1=2-u-var2=5-u-var8=1' doesn't get too long with lots of ID numbers and the like - the nicer the URL looks, the more Google likes it!
5) I suspect this is a case of some of the pages not having gone through a full cycle - the PR shown would be an estimate. Estimated PRs, I think, are based on directory levels, hence the '/' affects it. Real PR is NOT affected in this way, so I wouldn't draw conclusions from that.
Regarding #4, what do you mean by "nicer URL"? If I can avoid certain characters I will... I just don't know which ones Google doesn't like. :-)
Regarding number 3:
1. If it has over 100 links, does Google ignore the page, or does it still spider up to 100?
2. "link farm/linking purely to get PR" - That is the idea, but I have it set up as follows: a large table of contents for every site on itself... so the TOC for one site will not appear on any other site except itself.
However, there is one site that I want linked to from every other site. So if I linked its table of contents from every other site, would that be harmful for my rank?
You also refer to the one site that you want to link to from all the others - you ask whether it will hurt. I don't know how you're planning to link, whether you mean a single link or a lot of links. I also don't know if you're linking because the other sites are relevant for your visitors, or whether you're merely linking for PR. If you ALWAYS do what's best for your visitors then you're not likely to have a problem. If you're concerned, read up on the subject - there are lots of discussions about it in the forums here.
Also, if you haven't already, look at what Google says in their webmasters section - I would sum it up as 'don't try to fiddle with or deceive Google', and that seems fair to me!
Thanks for the response.
I have another question...
I have certain forums on a site and it lists the threads at 20 threads per page, and then has page numbers on the bottom of the page that users can click on.
Would it be better to display 40 threads in the list to Google? Or would it be better to display 10 threads? What would increase the likelihood that Google will spider the entire site?
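To put numbers on the trade-off I'm asking about - fewer, longer listing pages versus more, shorter ones - here's a quick sketch (the total thread count is hypothetical):

```python
import math

def page_count(total_threads, per_page):
    """How many listing pages a forum section produces at a given page size."""
    return math.ceil(total_threads / per_page)

total = 500  # hypothetical number of threads in one forum
for per_page in (10, 20, 40):
    print(per_page, "per page ->", page_count(total, per_page), "pages to crawl")
# 10 per page -> 50 pages to crawl
# 20 per page -> 25 pages to crawl
# 40 per page -> 13 pages to crawl
```

So the question is really whether fewer pages with more links each, or more pages with fewer links each, gets spidered more completely.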
You certainly sound like you have done your research which could be the most important thing!
>>When I do a search on Google the majority of top results usually end in .htm, .html, or .shtml.
Most pages use these extensions, hence their dominance in the search results.
>>I read in the FAQ that there is a limit on the URL size, but didn't get a number. Is anyone aware of this?
Some search engines only read pages up to a certain size - otherwise results could be full of endless pages! You should keep pages small for the sake of usability. Google recommends you don't have more than 100 links per page:
(http://www.google.com/webmasters/guidelines.html).
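If you want to check your tables of contents against that 100-link figure, a quick standard-library Python sketch along these lines would do it (illustrative only - any link-counting tool works just as well):

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a> tags that actually carry an href attribute."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

def count_links(html):
    counter = LinkCounter()
    counter.feed(html)
    return counter.count

page = '<body><a href="/a.html">A</a><a href="/b.html">B</a><a name="x">no href</a></body>'
print(count_links(page))          # -> 2
print(count_links(page) <= 100)   # -> True (within the guideline)
```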
>>http://www.domain.com/page.php3/var1=2-u-var2=5-u-var8=1
I wouldn't particularly like to say one way or the other, but it looks OK to me.
>> the page rank reduced the deeper I went into "Directories"
The PR you see with the toolbar is not the real PR. It is quite often 'estimated' PR, and so the lower you go into a site's structure, the lower it gets. PR is based on links and has nothing to do with the directories on your site, although many sites use a logical directory structure which distributes PR in a way that makes it seem like the deeper you go, the lower it gets.
>>Would it be better to display 40 threads in the list to google? Or would it be better to display 10 threads?
For very specific questions your best bet is to experiment and see what works with your particular site.
Hope this is helpful, Decius - you definitely sound like you're on the right track.
domain.com/123/789/product-detail.html
I have *.html mapped to a Java servlet, and for each "page" (for example, product-detail) I have a parameter mapping. So the "123" and "789" get parsed out of the URL and are given names (for example, "category_id" and "product_id"). Possibly too complicated for some, but I get super-nice URLs.
It looks a lot better than:
domain.com/ProductServlet?page=product-detail&category_id=123&product_id=789
The only give-away that it's a dynamic page is that no file size is returned.
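The parsing itself is simple. Here's a Python sketch of the idea (a simplification of the servlet code - the names "category_id" and "product_id" are just the labels from my example):

```python
# Sketch of the per-page parameter mapping: positional path segments
# are given names according to which "page" the URL ends with.

PARAM_MAPS = {
    # page name -> names for the leading path segments, in order
    "product-detail": ["category_id", "product_id"],
}

def parse_path(path):
    """Turn '/123/789/product-detail.html' into a page name plus a
    dict of named parameters."""
    segments = path.strip("/").split("/")
    page = segments[-1].rsplit(".", 1)[0]      # drop the .html suffix
    names = PARAM_MAPS.get(page, [])
    return page, dict(zip(names, segments[:-1]))

page, params = parse_path("/123/789/product-detail.html")
print(page)    # -> product-detail
print(params)  # -> {'category_id': '123', 'product_id': '789'}
```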
The reason I wasn't keen to give you a direct answer is that it isn't quite as simple as the question implies. This factor (as is the case with many other factors in Google's algo) depends on things like the keywords you want to show up for (how much competition, how popular the query is etc.) or the structure of your page. To get visitors and encourage frequent spidering, make sure your best content is the first thing spiders see on the page, and also update your content as frequently as possible, either by adding pages or updating existing pages.
>>I am incredibly nervous about messing up this next crawl.
Seriously Decius, from your post you don't have too much to worry about :)
It looks like you have read up on a lot of the important issues, and if you have good content and have been keeping optimisation in mind you have nothing to worry about. If your site hasn't been indexed at all yet, expect Google visitors :)
Even if you don't get them then you will be able to find out why. Although it can be frustrating, trial and error is the only real way to test what you have learned. It's only a month between updates and as long as you don't use blatant spam techniques I see no major problem at all.
Thanks andy, and I understand what you mean.
I have a reasonably good page rank, and that isn't actually my concern.
Many of my sites have very rare and unique content that isn't available in many places, so my competition is not tough.
My concern is to make sure google crawls my entire site and contains that information, regardless of page rank.
I will show up in the top ten whether I have a PR of 5 or a PR of 1.