Just for starters, this forum is an amazing resource!
Now to my question:
has anyone noticed any restrictions on database query strings in URLs as far as Google is concerned?
An example:
Google visits my page at [blah.blahblah.bl...] daily and fetches [blah.blahblah.bl...] and [blah.blahblah.bl...].
Then, every month or so (the site has been up for 4-5 months now), it spiders all URLs like:
[blah.blahblah.bl...]
and even URLs like
[blah.blahblah.bl...]
However, no internal links with a URL like
[blah.blahblah.bl...]
are being spidered.
Now, just for the heck of it, I tried to find pages in Google with an "id=xxx" string in the URL, and the closest I came was a single URL, way down in the results. It was from the manufacturer of the most installed OS for PCs (OK, editor?) and had "ID=xxx" in it instead of "id=xxx".
Can anyone comment on these findings?
My site is mainly a hobby and test-case site for me, with no commercial interest whatsoever, but I'm curious whether the id string might cause restrictions on its own, or perhaps in combination with another parameter in the URL, like my fictitious URL mentioned above...
Maybe I should just submit those individual URLs with the id string in them by hand? Or is this a bad idea?
First of all, welcome to WebmasterWorld.
A lot of the people who I suspect could give you a good answer to your particular questions are currently en route to, or already in, London to attend PubCon. It's not that they don't care :)
There is no doubt that having complicated parameter strings in a URL can make getting spidered and listed more difficult. Over the last year or so the big boys have been able to spider sites that have what are called dynamically driven links,
e.g. page.asp?id=xxx
I have a whole heap of such pages that Google requests regularly and quickly, and they are all in the index for the particular search terms on those pages.
I have come across instances where the number of parameters means that the page does not seem to get indexed. From my experience I like to keep the parameter count down to 1. This works for me. I have also come across suggestions that the depth Google will crawl depends on the PR of a site, and that seems to hold true. Whether the same holds for URL complexity I would say not, but I am open to correction and would be interested to know for certain.
The difference between ID=xxx and id=xxx would have no effect. The parameter can be called whatever you want, but again, I try to keep parameter names simple and short.
If I understand you correctly, you are thinking of hand coding the URLs in the HTML instead of driving them from a database. It won't make a difference: database-driven pages (ASP, PHP, etc.) are produced server side, so as far as the spider is concerned the page is effectively static, because all it receives is HTML. So you would be wasting your time hand coding what you can generate from a database.
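If you want to act on the "keep the parameter count down to 1" rule of thumb, a quick audit of your own links can flag the ones that exceed it. This is a minimal sketch, assuming you already have a list of your site's URLs; the example.com URLs and the max_params threshold are hypothetical, and the threshold simply encodes the poster's personal preference, not any documented Google limit.

```python
from urllib.parse import urlsplit, parse_qsl


def count_query_params(url: str) -> int:
    """Return the number of name=value pairs in the URL's query string."""
    query = urlsplit(url).query
    # keep_blank_values=True so "a=&b=1" still counts as two parameters
    return len(parse_qsl(query, keep_blank_values=True))


def flag_complex_urls(urls, max_params=1):
    """Return the URLs whose query string has more than max_params parameters."""
    return [u for u in urls if count_query_params(u) > max_params]


if __name__ == "__main__":
    # Hypothetical links standing in for a real site's internal URLs
    links = [
        "http://www.example.com/page.asp?id=12",
        "http://www.example.com/page.asp?subj1=4&id=12",
        "http://www.example.com/index.asp",
    ]
    for u in flag_complex_urls(links):
        print(u)
```

Running it prints only the two-parameter URL, which would be the candidate to simplify under this rule of thumb.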
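To illustrate the point that a spider only ever sees the generated HTML, here is a minimal sketch of server-side link generation. It assumes a hypothetical articles table with id and title columns and uses an in-memory SQLite database purely for demonstration; a real ASP or PHP page would do the equivalent with its own database layer.

```python
import sqlite3
from html import escape


def render_article_links(conn):
    """Build the HTML a spider would receive: plain <a href> links generated from DB rows."""
    rows = conn.execute("SELECT id, title FROM articles ORDER BY id").fetchall()
    items = [
        # One parameter per URL, following the rule of thumb in the thread
        f'<li><a href="page.asp?id={article_id}">{escape(title)}</a></li>'
        for article_id, title in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


if __name__ == "__main__":
    # Hypothetical data; in practice the rows come from the site's real database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO articles (id, title) VALUES (?, ?)",
        [(1, "First post"), (2, "Tips & tricks")],
    )
    print(render_article_links(conn))
```

The output is ordinary markup such as `<a href="page.asp?id=2">Tips &amp; tricks</a>`; nothing in it tells the spider whether it was typed by hand or produced from a query.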
Hope that helps
Cheers
So the next project will be to make sure all URLs on my site have only one database parameter in them... Thanks for that one.
What I meant by "I should just post those individual url's with the id string in it by hand" was not hand coding all pages, but submitting those URLs on Google's "submit your site" page. I have read somewhere that this is necessary to get things going, but that one should keep the submission frequency low (e.g. just one URL per day).
I wonder whether this might help the spider visit these pages, or whether I should wait for the spider to get there from the main pages. Maybe someone has an idea?
Still, I think it's strange that Google did find and spider URLs with two parameters like
[blah.blahblah.bl...] but won't spider the same URL when "subj1=xxx" is replaced by "id=xxx".
Oh well, the mysteries of life and Google...
I wonder whether this might help the spider in visiting these pages, or that I should wait for the spider to get there from the main pages.
If you have built in easy-to-follow navigation, Google and others should find all your pages without any problem. Think of it this way: the likes of Macromedia and NASA have thousands of pages. They don't need to submit them all; Google simply traverses its way through the whole site on its own.
I can't answer your other question regarding id.
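The idea that a spider finds pages by following links from the home page can be modelled as a breadth-first traversal of the site's link graph. This is a deliberately simplified sketch, not a description of how Googlebot actually schedules or limits its crawling; the link graph and URLs below are hypothetical. It shows why linked pages get discovered without submission, while a page nobody links to does not.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first traversal: return every page reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: page -> pages it links to
site = {
    "/index.asp": ["/section.asp?subj1=1", "/section.asp?subj1=2"],
    "/section.asp?subj1=1": ["/page.asp?id=10", "/page.asp?id=11"],
    "/section.asp?subj1=2": ["/page.asp?id=20"],
    "/page.asp?id=10": ["/index.asp"],
    "/page.asp?id=99": ["/index.asp"],  # links out, but nothing links to it
}

if __name__ == "__main__":
    for page in sorted(reachable_pages(site, "/index.asp")):
        print(page)
```

In this model every page hanging off the navigation is found from /index.asp, whereas /page.asp?id=99 is never reached because no other page links to it; that orphaned case is the one where a manual submission, or better, an internal link, would matter.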
Cheers