1. Keyword in Title Meta tag
2. Keyword in Description Meta tag
3. Keyword in Body text
4. Keyword density - 1-7%(?) - No more, no less. Results will vary.
5. Page Rank/Links - best if from related sites with higher PRs
6. Sufficient Content (Google seems to like bigger sites, more content)
7. Keyword in incoming links
8. Keyword in incoming link text
9. Keyword in text surrounding incoming link text
10. Keyword in <H1> tags
11. Keyword in outgoing links - best if to related sites with higher PRs
12. Keyword in outgoing link text
13. Keyword in text surrounding outgoing link text
14. Keyword in alt image tags
15. Keyword in bold
16. Keyword in italics
17. Keyword in domain or sub domain name
18. Keyword in keywords Meta tag
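For anyone wondering how the density figure in item 4 is usually worked out, here is a rough Python sketch. It assumes density means occurrences of the keyword divided by total words, times 100. That definition, the function name keyword_density and the sample text are my own assumptions for illustration, not anything Google has published.

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Return the keyword's share of all words in text, as a percentage."""
    # Split into lowercase words; punctuation is ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)

# Hypothetical page text, made up for this example.
sample = "Blue widgets for sale. Our widgets are the best widgets in town."
density = keyword_density(sample, "widgets")
print(f"{density:.1f}%")
```

The sample has 12 words and 3 of them are "widgets", so it comes out at 25%, far above the 1-7% range suggested in item 4.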
I think there is a general misunderstanding here. Php or html or asp or jsp or cgi or php3 or ... does not matter as far as i know.
The issue with dynamic pages is more about avoiding long querystrings in your url with lots of "x=this&y=that&ID=12345&ID2=987654321&general=stuff&such=nonsense"
- especially to avoid session ID's. Here's a few quotes from Google's own webmaster pages:
"If you decide to use dynamic pages (i.e., the URL contains a '?' character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small."
and:
"Allow search bots to crawl your sites without session ID's or arguments that track their path through the site."
Link: [google.com...]
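To make the querystring advice a bit more concrete, here is a small Python sketch that counts the parameters in a URL and strips anything that looks like a session ID. The names in SESSION_PARAMS are only guesses at common parameter names, and the helper functions and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumed names of session-tracking parameters; adjust for your own site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def count_params(url: str) -> int:
    """Count the query parameters in a URL."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))

def strip_session_ids(url: str) -> str:
    """Return the URL without any parameters listed in SESSION_PARAMS."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

# Hypothetical URL for illustration.
url = "http://www.example.com/page.php?id=12&PHPSESSID=abc123&cat=widgets"
print(count_params(url))
print(strip_session_ids(url))
```

The same idea applies whatever language the site is written in: keep the parameters few and short, and don't hand session IDs to spiders.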
/claus
Would you care to elaborate, please?
One of my sites is pure php... dynamically generated but with shorter URL parameters... all pages have been indexed by Google (1,300+), and every time I add new content, a few days later it's in the index.
On a related topic, it occurred to me that PHP could easily be used to feed different information to googlebot crawlers than everyone else who visits your site, along these lines:
if (preg_match("/googlebot\.com/", gethostbyaddr($_SERVER['REMOTE_ADDR']))) {
    // HTML PAGE FOR GOOGLEBOT HERE
} else {
    // 'REAL' HTML PAGE HERE
}
This opens up all sorts of malicious and devious possibilities, but I can see more legitimate uses for this idea too. For example, at the top of some of my pages are some tabs (Add to favorites, go to home page, etc.) and Google annoyingly almost always uses this text in its snippets. I could simply use PHP to hide these from googlebot spiders, couldn't I?
And it could be used to increase (or decrease) the apparent keyword density. It could also be one solution for all you posters who didn't want to use huge <H1> headings. Easy: just use php to feed <H1>Keyword</H1> to the googlebot (at some appropriate place on the page) and not to your visitors!
I admit this is a somewhat dishonest technique, and its usefulness may be limited, but can anyone see why it wouldn't work? Or maybe you have other ideas how it could be used positively...?
The Googlebot does not know that you don't want your tell-a-friend page spidered. To avoid this, you should mention this file in your robots.txt and to get it removed from the index you should first use this tag on the tell-a-friend page:
<meta name="robots" content="noindex,nofollow">
Here's more info on the robots.txt from Google: [google.com...]
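If you want to check that a robots.txt rule really covers the page, Python's standard urllib.robotparser module can test it offline. This is only a sketch: the file name tell-a-friend.php and the example.com URLs are assumptions for illustration, so swap in your own paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the path is an assumed file name.
robots_txt = """\
User-agent: *
Disallow: /tell-a-friend.php
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch() returns True when the named user agent may crawl the URL.
friend_ok = parser.can_fetch("Googlebot", "http://www.example.com/tell-a-friend.php")
index_ok = parser.can_fetch("Googlebot", "http://www.example.com/index.php")
print(friend_ok, index_ok)
```

Here the tell-a-friend page is reported as off limits and the index page as crawlable, which is the behaviour you want before you upload the file.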
>> it occurred to me that PHP could easily be used to feed different information
You can accomplish this using all kinds of file formats, even standard plain html - this is not specific to php at all. There's even a whole cloaking forum [webmasterworld.com] that deals with such issues.
>> more legitimate uses for this idea too
Yep, there are all kinds of good legitimate reasons as well. One related issue (that does not necessarily involve php) is rewriting your URL's to avoid those session ID's and parameters that the SE spiders tend to choke on.
>> can anyone see why it wouldn't work?
It will work nicely - meaning, you can make your pages do that easily if you know how to. Search Engines definitely don't like it, however, so if you plan on doing this you should be prepared for what happens when it gets discovered. The Google FAQ also has a topic about cloaking, and other SE's might have similar views:
[google.com...]
I'll suggest reading a bit in the cloaking forum before you go ahead with it (if that's what you want to do) - there are all kinds of things apart from Googlebot and your own pages that you would want to consider as well.
/claus
Welcome to WebmasterWorld World_Wide_Wibble :)
[edited by: claus at 11:30 pm (utc) on Sep. 29, 2003]
>> Googlebot does not know that you don't want your tell-a-friend page spidered.
No, of course; I'm not greatly bothered if it does spider it; there's not much interesting content so it won't get high in the rankings if it's indexed, and if it does, well, that can't be a bad thing!
>> I'll suggest reading a bit in the cloaking forum before you go ahead with it (if that's what you want to do)
I doubt I will use 'cloaking', though will bear it in mind as a possible workaround for problems. Thanks for drawing my attention to the online literature, etc. I'm sure if it's a significant problem for search engines (as you suggest) Google has (or will have) an anonymous spider that checks sites out... so as you say, probably not wise to use this idea :)
Sure. Watching the trends of Googlebot over the past 3 years. Working on 000's of sites for SEO this is my personal opinion, nothing more. Take it or leave it. However if you can demonstrate the same I'm no stickler to traditions and my opinion can change.
So,
If linked-to page = php, then don't crawl all the links, because it could be some kind of trap (with some of the amazing techniques that SEOs are using, a simple spider will treat any dynamic page with caution first time around, because spiders become easier to fool as the scripting gets more advanced... as previous posts show!). Or crawl with caution, i.e. crawl a bit this month, a bit next month.
However, if link to page = .html then crawl away like there is no tomorrow.
Exceptions:
If PR = <high or number of pages indexed = high, or page in index <x years old treat content as "honourable", and add new pages as they go, almost like HTML.
Google is getting better at parsing variables, as it now seems to crawl a dynamic page with one variable. Two variables occasionally, and three more rarely still. Please don't post your plus 4's, I'm talking about generalisations here.
Jumping left: Fantastically I took a domain the other day which was completely new. In 3 days Googlebot had captured "home". In 5 days it had crawled all the first level directory. I did no submission, just placed a link on 4 other websites. Next up who do you think got the domain? It was Inktomi, then Fast and on Altavista, I'm still waiting!
Don't know if this will help anyone but that's a small part of my Google ruleset.
Glyn.
My question is... those of you who have been saying Google loves fresh updated content, do you think the fact that the page is a little different every time Google looks at it is likely to (a) improve the site's rankings and/or (b) make Google come and crawl my site more frequently? and (c) any other advantages?
Are there any disadvantages? For example, for the past few days googlebots have requested my home page about 3 times daily. Do you think the googlebot is clever enough to get suspicious and say "hold on, this is being updated too frequently, it's probably 'random content'..."?
>> Sure. Watching the trends of Googlebot over the past 3 years. Working on 000's of sites for SEO this is my personal opinion, nothing more.
Well, that could be true of any extension and is not exclusive to php. Extensions such as .asp, .cgi and even .html/.htm/.shtml can be programmed to trap a spider.
So, I have to disagree with your observation that Google avoids new php sites because they could be spider traps.
Cheers
As to Googlebot, i really don't think php is any problem at all - spidering problems have always had another reason in the cases i've seen (i gladly admit that these are not too many). Then again, the line of reasoning that Gbot should "take care out there" sounds all right to me. I suspect they use other methods than looking at file extensions though, and that they don't think php is the only tool in the box.
For the thread topic, did anyone mention this link? There's good information found on those Google webmaster pages: [google.com...]
Also - as it's a "101" - i'd like to add to the discussion that Google is a "page-based" search engine, not a "site-based" one. It's not your site that gets shown in the serps, it's your individual pages. That's important, as a sub-sub-sub page can easily perform better for some searches than your index page.
/claus
Ash
(on the road in Mumbai, India)
>>jk3210 - Do you use H3 for the text or P? And does it make any difference?<<
No. My response to you was just an attempt to answer your question as to what Trillianjedi was referring to, not advocating H1, H3, H2.
I always use <P> and I rarely use anything other than css-modified H1's. :)
all this optimization for nothing now... =(
Look at this news here...
PageRank is Dead: [snip]some blog[//snip]
Does anyone know anything more?
[edited by: heini at 11:12 am (utc) on Oct. 18, 2003]
[edit reason] please don't post urls, thank you! [/edit]
Does it look like it matters to google what your pages are called?
I.e. will www.something.com/keyword.html
rank differently than
www.something.com/nonsens.html?
I think it used to be the case, but it doesn't seem to matter any more. Also, will it affect the relevancy of the page that links to it?
i.e. Should my widgets page link to www.something.com/widgets.html or www.something.com/nonsens.html ?