Length of query string
Why won't Google visit normal-looking pages?
My site has PR 7, so I get a visit from Googlebot every 3-5 days. For a long time Google hasn't indexed any of my deeper pages, and I thought it might be because they contain many long query strings.
So after reading this article [promotionbase.com] on making search-engine-friendly links, I decided to change the path to this:
Method 1: mysite.com/file/12_long_description_that_is_not_a_parameter.php
(the name of the PHP file is actually "file" and everything after "12" is removed). In this case Googlebot visited all my 8000 pages. However, only about 400 of them made it onto Google.
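For what it's worth, here's a minimal sketch of how a script behind Method 1 can recover the numeric ID while ignoring the descriptive tail. Python is used purely for illustration, and the path pattern and function name are my own, not the poster's actual code:

```python
import re

def id_from_path(path):
    """Pull the leading numeric ID out of a Method 1 style path,
    e.g. '/file/12_long_description.php' -> 12. Everything after
    the number is decorative and gets ignored."""
    m = re.match(r"/file/(\d+)", path)
    return int(m.group(1)) if m else None

print(id_from_path("/file/12_long_description_that_is_not_a_parameter.php"))  # 12
```

The point of the scheme is exactly this: the URL looks static to the crawler, but the server only ever cares about the leading number.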
Then I read an older thread here [webmasterworld.com] (see msg #10) about directory depth and PR and decided to try my luck with this format:
Method 2: mysite.com/file.php?id=12_file_description_that_is_not_a_parameter
According to my own tests, this did raise the PR by one on each page. However, I started running into some problems. It seems that Google refuses to even visit pages where the query string is longer than 15 chars.
E.g. Google will visit and index this: mysite.com/file.php?id=12_file_descrip
but not this: mysite.com/file.php?id=12_file_description_that_is_not_a_parameter
Now I have reduced the query string to a max of 15 chars, which seems to have helped - Google indexed more pages last night, but still nothing close to the 8000 pages I have.
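To make that concrete, here's a sketch of the kind of truncation described above. The 15-char cap is only what was observed in these tests, not anything Google documents, and the names are mine:

```python
QS_VALUE_LIMIT = 15  # apparent crawl limit observed above; not documented by Google

def trim_id_value(value, limit=QS_VALUE_LIMIT):
    """Trim the id= value so the query string stays short enough to crawl."""
    return value[:limit]

print(trim_id_value("12_file_description_that_is_not_a_parameter"))
# -> '12_file_descrip', matching the URL Google was willing to index
```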
The file descriptions are included in the query string and the filename, as I'm hoping this will please Google.
- Has anyone else experienced Google not wanting to look at a file with a long query string?
- Should I sacrifice the extra 1 PR for Method 1, which Google seems to like better?
- Should I drop including the file description in the URI altogether?
Might be time to start translating those pages into apparently static URLs... One thing you might look at is your directory and linkage structure, i.e., does the spider have nice, well-organized paths to get to the dynamic content? I'd say either many levels of dynamic pages (i.e., one query-string page leads to more, and those to still more) or an extremely flat structure (e.g., one page with 8000 links, or 10 pages with 800 links) could be a problem.
Welcome to WebmasterWorld, by the way! :)
Welcome to WebMaster World!
I can't find the thread right now, but just before the last update, GoogleGuy said something to the effect that Google had recently improved its ability to crawl dynamic sites. Among his tips were:
1) "In the past" making dynamic pages look static might have been an advantage, but in the future, it might actually hurt.
2) The parameters after the ? are going to be crawled fine, but he suggested that having elements in them that don't actually affect the content of the page could hurt. I'd assume this is a way to prevent people from spamming by putting keywords into a parameter even though the page doesn't need those keywords to display its content.
I don't know if any of this will be useful to you, but I hope it is.
I got nervous the moment I read your post, so I did a site search. Fortunately GoogleGuy didn't say in his thread that "rewriting might hurt":
He just said "Changing dynamic urls to appear static will be less important over time as Google crawls dynamic urls better" [webmasterworld.com] ... and that it's better to estimate server load / response time and balance the crawler hits depending on the load, etc.
Phew... thanks for the adrenaline! ;)
And new_shoes (welcome!),
he said "PageRank is on a page-by-page basis, so the number of slashes don't matter" - that's a clear answer to one part of your question, I guess. New to me too, BTW.
Google is definitely crawling dynamic pages now... and listing them randomly in the results! Very weird.
Hi Guys, and thanks for the warm welcome :)
After reading your replies, I have decided to go back to Method 1. Already it seems to be paying off... Googlebot is indexing my pages as I write, and far more pages than usual.
I made two small modifications to Method 1. I restricted the file description to 40 chars max and used the extension .html instead of .php, as I recall there was some speculation that Googlebot prefers .html over other extensions (I imagine this isn't true, but why take the chance?).
My URLs now look like this: mysite.com/file/12_file_description_not_more_than_40_chars.html
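A sketch of how such URLs might be generated (Python used purely for illustration; the helper name, slug rules, and domain are my assumptions, not the poster's actual code):

```python
import re

def make_url(record_id, description, max_desc=40):
    """Build a Method 1 style URL: lowercase the description, replace
    runs of non-alphanumerics with '_', cap it at max_desc chars,
    and end with .html rather than .php."""
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")[:max_desc]
    return f"mysite.com/file/{record_id}_{slug}.html"

print(make_url(12, "file description not more than 40 chars"))
# -> mysite.com/file/12_file_description_not_more_than_40_chars.html
```

The cap on the slug is applied before assembling the URL, so the numeric ID and the .html extension always survive even for very long descriptions.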
Thank you all for your replies.
Does anyone know if Google still has a limit of 15 characters for the query string? I find that a pretty small limit. I help run a birding website that has many dynamic pages, and we've kept the query strings as small as possible, but most of the pages we want indexed (the photo pages) have about 16 characters or more.
[edited by: heini at 3:20 pm (utc) on Nov. 30, 2002]
[edit reason] no urls as per TOS & Charter please ¦ thanks! [/edit]
If, in fact, 15 characters is the magic number, that is very helpful to me. I use something like this:
I have been trying to get more pages indexed for quite some time now without success. I suppose it could very well be that I would be advised to use this instead:
As longer surnames would far exceed 15 characters when added to "?Surnames=", that may be part of my problem. Thanks for the post, and if GoogleGuy is out there, can you please verify that the number of characters in the query string is important?
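If the whole query string is what counts toward the limit, a quick check like this shows why longer surnames would blow past 15 characters (a sketch only; the parameter name comes from the post above, the example surnames are mine):

```python
def surname_qs_len(surname):
    """Length of the full query string 'Surnames=<name>' (the part after the '?')."""
    return len("Surnames=" + surname)

print(surname_qs_len("Lee"))         # 12 -> under a 15-char limit
print(surname_qs_len("Fitzgerald"))  # 19 -> well over
```

Since "Surnames=" alone is already 9 characters, any surname longer than 6 letters would push the query string past 15.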