Forum Moderators: open
My site consists mostly of dynamically created pages. I use a combination of SSI, database pulls, reading from text files with FileSystemObject, and Response.Write to build my HTML pages in ASP. My question is: when G or other SEs crawl sites like mine, do the bots read the content of the .asp file directly, or do they behave like a browser and go to the server to "request" the HTML generated by the .asp file, basing the indexing decision on that generated HTML?
If the bots read the .asp file directly, then I have no idea how I can optimize my site for SEs without turning my back on the convenience my scripts allow.
If the bots "request" the output HTML, then I guess I have prepared for this and the generated HTML is already SE friendly.
BTW, my querystrings are all very simple and short as in widget.com/showwidget.asp?id=1
Please educate me on this, oh great forum members.
I am not an expert in anything technical, so I guess I am just missing something here.
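The short answer to the crawling question: spiders behave like browsers and fetch URLs over plain HTTP, so they only ever see the generated HTML; the .asp source never leaves the server. A minimal way to verify this yourself, sketched in PHP just for illustration (the URL is the example from the question):

<?php
// Request the page exactly the way a spider would: a plain HTTP GET.
// What comes back is the generated HTML, not the .asp source.
$html = file_get_contents('http://widget.com/showwidget.asp?id=1');
echo $html;
?>

That said, querystrings create several problems for spiders.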
1) First, part of the query string may be something that doesn't lead to a unique page. Example:
example.com?a=1&userid=123
example.com?a=1&userid=456
example.com?a=1&userid=789
etc.
2) Second, let's suppose that I have a web page titled "Numbers under 1 billion":
example.com?a=1
example.com?a=2
...
example.com?a=999999999
3) Third, let's suppose that I have a dynamically generated page that generates hyperlinks like this:
<a href="example.com?randomnumber=562951413">randomly generated querystring</a>
where any random number is handled dynamically and each such page creates more hyperlinks like that.
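To make the third case concrete, here is a minimal sketch of such a trap in PHP (the file name trap.php is made up): whatever value the spider follows, the page answers with a link to yet another value, so a naive crawl never terminates.

<?php
// trap.php: every request emits a link to a brand-new URL,
// so a spider following links here can crawl forever.
$next = rand(1, 999999999); // fresh "randomnumber" on every hit
echo '<a href="trap.php?randomnumber=' . $next . '">randomly generated querystring</a>';
?>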
The first example has essentially an infinite number of pages from different user ids. In each case, the content is the same. The spider has to deal with that.
The second example has pages that are almost identical, generated from the querystring.
The third example creates a spider trap, where the spider can index the site from now to eternity, since any querystring will generate another page.
In all three cases this creates a burden on the search engine trying to provide relevant results. They cannot afford to spend time indexing millions of pages that are identical. Nor can they store the results of millions of pages that are almost identical. Finally, they cannot index the same site forever.
Any one of these on your site will likely kill your ranking. The more complicated the querystring, the more likely the spider will run into one of these situations, so the engines don't bother indexing complicated querystrings.
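One common way out of the first situation, sketched here as my own illustration rather than anything from the posts above: keep the user id out of the URL entirely, for example in a cookie, so every visitor and every spider sees a single URL per page.

<?php
// Per-user state travels in a cookie instead of the query string,
// so identical content lives at one URL for everybody.
if (!isset($_COOKIE['userid'])) {
    setcookie('userid', uniqid('u')); // must run before any output
}
// The page URL stays example.com?a=1 for every visitor, so the
// spider sees one page instead of one page per user id.
?>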
Are you essentially saying that the bot is trying to "guess" the range of values contained in querystrings?
In your second example, widget.com?a=certainvalue, doesn't the bot just take the certainvalue indicated in the referring page? Why must it "guess" all possible values?
I have my site set up like this:
level 1 - main page (myindexpage.asp)
level 2 - subject pages (mysubject.asp) contain links to individual article pages whose URLs are HARDCODED with individual querystring id values, as in showarticle.asp?id=1, showarticle.asp?id=100 and so on
level 3 - article pages (showarticle.asp?id=value), all dynamically created and template-based, whose actual content varies significantly
Do you think the bot will have a hard time dealing with my querystring values in level 2 even if they are right there on the page hardcoded?
If the answer to this question is YES, then I still don't get it.
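For reference, a level-2 subject page like that can be generated with a simple loop and still leave plain, hardcoded-looking links in the HTML the spider receives. A sketch in PHP; the database credentials, table, and column names are all made up:

<?php
// Subject-page sketch: one ordinary link per article, each with a
// short, stable querystring the spider can follow.
$db = new mysqli('localhost', 'user', 'pass', 'site');
$result = $db->query('SELECT id, title FROM articles');
while ($row = $result->fetch_assoc()) {
    echo '<a href="showarticle.asp?id=' . (int)$row['id'] . '">'
       . htmlspecialchars($row['title']) . "</a><br>\n";
}
?>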
1. Font size matters on the link to the "index" page.
2. Descriptions in the links on the index page MATTER, i.e. don't use ISBN: 0000000001 as the link text. Use the title of the book, or in your case the title of the article.
3. Make SURE that the page's title contains similar text to the link that referenced it; exactly the same has worked best (see the sketch below).
4. Make sure that keywords from the link are also found in the page referenced.
I have gotten much better results through G and Inktomi this way.
If you know of any other ways of doing this, please let me know.
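A sketch of points 2-4 in PHP (fetch_title() is a hypothetical lookup): one stored title string supplies the link text on the index page, the article's <title>, and its visible heading, so all three match exactly.

<?php
// Article-page sketch: the same stored title drives the <title>
// tag and the on-page heading; the index page uses the identical
// string as the link's anchor text.
$title = fetch_title((int)$_GET['id']); // hypothetical DB lookup
?>
<html>
<head><title><?php echo htmlspecialchars($title); ?></title></head>
<body>
<h1><?php echo htmlspecialchars($title); ?></h1>
<!-- article body here -->
</body>
</html>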
1) First, part of the query string may be something that doesn't lead to a unique page. Example:
example.com?a=1&userid=123
example.com?a=1&userid=456
example.com?a=1&userid=789
etc.
The first example has essentially an infinite number of pages from different user ids. In each case, the content is the same. The spider has to deal with that.
Not if you modify your robots meta tag depending on your variable:
// only the canonical page (userid 0) gets indexed and followed
if ($userid == "0") { $robotsmetatag = "all"; } else { $robotsmetatag = "none"; }
echo "<meta name='robots' content='" . $robotsmetatag . "'>";
This way the SE will index only one page (out of the million) and you are safe regarding duplicate content.
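In context, the tag has to be emitted inside <head> before any page content; a fuller sketch of the same idea, with a made-up page layout:

<?php
// Only the canonical URL (userid "0", i.e. no user id) may be
// indexed; every personalised variant gets robots "none".
$userid = isset($_GET['userid']) ? $_GET['userid'] : "0";
$robotsmetatag = ($userid == "0") ? "all" : "none";
?>
<html>
<head>
<meta name="robots" content="<?php echo $robotsmetatag; ?>">
<title>Numbers under 1 billion</title>
</head>
<body>
<!-- page content -->
</body>
</html>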