Just a quick Google question. I'm hoping (fingers crossed) to get listed in Google after the next update, and I've been keeping an eye on the server logs for visiting Googlebots. My site has been visited a few times, but only as far as the first page; the spider never follows any of the links. All the links contain a ? because the pages are dynamically generated, and they are of the form:
/shop/?page=pagename
rather than:
/shop/index.php?page=pagename
Would this prevent the spider from following the links? Also, I don't have any robot-specific meta tags such as <meta name="robots" content="index,follow">. Will this make a difference? I understood that these meta tags are still not used by the majority of search engines.
Any thoughts/advice appreciated.
Thanks.
Of course, if there are no links pointing at your site at the moment, I would work on that as well, since that is a major factor in a good listing in the big G!
Craig
I had the same problem a month ago. Google crawled only www.mydomain.com/ and didn't follow any links, even though I had installed a robots.txt and meta tags.
But now my site is completely spidered. The trick:
mod_rewrite in .htaccess. I mean something like this:
URL: www.shop.com/shop/page/firstpage/
Rewrite to www.shop.com/index.php?page=firstpage
If you are able to create your own .htaccess file, I can help you with the syntax.
regards,
tino
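For anyone following along, the rewrite tino describes might look roughly like the fragment below. This is only a sketch, not tino's actual file: it assumes the .htaccess sits in the document root, that mod_rewrite is loaded, and that the host allows rewrite directives in .htaccess. The character class used for the page name is my own assumption.

```apache
# Hypothetical .htaccess sketch, placed in the document root.
# Assumes mod_rewrite is available and overrides are permitted by the host.
RewriteEngine On

# /shop/page/firstpage/  ->  /index.php?page=firstpage
# Internal rewrite only: the visitor (and the spider) keeps seeing the clean URL.
# The page-name pattern [A-Za-z0-9_-]+ is an assumption; adjust it to your own names.
RewriteRule ^shop/page/([A-Za-z0-9_-]+)/?$ /index.php?page=$1 [L,QSA]
```

The links on the site itself would then need to point at the /shop/page/.../ form, since the spider only sees whatever URLs the pages link to.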
From telnet or SSH -
>lynx -source [yoursitehere.com...] > index.html
(all on one line)
That will create index.html, which will contain the exact HTML output that you get when you request index.php.
The problem with .php is that these sites often have some kind of user tracking via sessions, and if session IDs get appended to GET URLs, Google will not crawl those links.
See
[searchengineworld.com...] - Q: [Indexing] Does Google index dynamic content?
and
[webmasterworld.com...] - for the discussion
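To make the session ID point concrete, here is a small sketch (in Python, purely for illustration) of what such a link looks like and how the session parameter could be stripped before the link is written into a page. PHPSESSID is PHP's default session name; the extra name "sid" and the function name are assumptions of mine, not something from the posts above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Query parameters treated as session IDs. "phpsessid" is PHP's default
# session name; "sid" is a hypothetical extra added for illustration.
SESSION_PARAMS = {"phpsessid", "sid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_id("/shop/?page=pagename&PHPSESSID=abc123"))
```

In practice the same effect is usually reached by not appending the session ID to links in the first place, for example by relying on cookies for session tracking.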
Unfortunately, my ISP doesn't allow the use of mod_rewrite in .htaccess files. This leaves me with the option of implementing my own URL parsing trick so that I can use links without the ?, or generating a load of static pages that will be spidered (as suggested by bjseiler). However, I thought this was a technique that could lead to sites being blacklisted by some search engines.
The reason I've not gone down either of these routes yet is that, from everything I've read so far, I understood that Google was quite happy to follow dynamic links. This issue is obviously not as clear-cut as it first appears, and perhaps it's time to look at my options again.
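One way the "URL parsing trick" mentioned above could work without mod_rewrite is to put the page name in the path after the script, as in /shop/index.php/pagename, and have the script read the trailing part (the CGI PATH_INFO value, which PHP exposes as $_SERVER['PATH_INFO']). The sketch below shows only the parsing logic, written in Python for illustration; the function names, the "home" default, and the allowed characters are all assumptions, and whether the host passes PATH_INFO through would need checking.

```python
import re
from typing import Optional

# Allowed page names: letters, digits, underscore, hyphen (an assumption).
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def page_from_path(path_info: str, default: str = "home") -> Optional[str]:
    """Map a PATH_INFO value such as "/pagename" to a page name.

    Returns the default for an empty path, and None for anything that is
    not a single safe segment, so the caller can answer with a 404.
    """
    segments = [segment for segment in path_info.split("/") if segment]
    if not segments:
        return default
    if len(segments) != 1 or not _SAFE_NAME.fullmatch(segments[0]):
        return None
    return segments[0]


def clean_link(page: str, base: str = "/shop/index.php") -> str:
    """Build a link without a ? for the given page name."""
    return f"{base}/{page}"


print(clean_link("pagename"))
print(page_from_path("/pagename"))
```

Rejecting anything but a single safe segment keeps values such as ".." from ever reaching the code that loads the page.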