My Forum isn't getting spidered by Yahoo (or MSN)

         

chopin2256

7:40 am on Aug 4, 2005 (gmt 0)

10+ Year Member



Hi guys,

I recently bought a 10 year old site with a PageRank of 7. The site and forum are constantly getting spidered by Google: I write content, and it's spidered the next day. However, no other search engines seem to be spidering the forum, mainly Yahoo and MSN. It is an Invision board (search engine friendly URLs via a mod_rewrite mod), so I don't understand it. Why have Yahoo and MSN not visited the forum?

An interesting note: my 10 month old site, which has been penalized severely by Google, is very Yahoo and MSN friendly. I get about 45 percent of my referrals from Yahoo, and another 45 percent or more from MSN. Google is 1 percent, but obviously something unknown happened there. The point is, why is my 10 month old site doing better in Yahoo and MSN than my 10 year old pr 7 site? And why is that site so Google friendly? It is really weird.

martinibuster

7:54 am on Aug 4, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you take a sentence from your forum (around ten words) and run it in Yahoo using quotes, does it return any results?

If you do the same in Google, does it return non-supplemental results?

jd01

10:44 am on Aug 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



my 10 year old pr 7 site?

I don't want to pick on you, but I think when we stop thinking that what Google says a page's value is means something to *anyone* else, we will be able to look at our sites more objectively, and probably make a good deal of progress in determining how other SE's are evaluating things... It's kind of like saying "I'm in DMOZ, why not Yahoo?" Different places, different criteria, different worlds.

About your site:
When you search for your site, do you see dynamic URL's indexed by any of the SE's? IOW have they recently been converted from dynamic to static?

If you see any dynamic URL's can you access the same static version of the URL? If you can, there is a definite chance of a duplicate penalty.

Have you viewed the source on your site to see where the text is actually displayed on the page? One of the sites I just looked at from invision had almost no text on the page and what it did have started *very* low in the code.

What is the viewable text to code %? (I usually run 53-55% viewable text, but I use css extensively - I believe 50% is usually a good goal. Tough to get to some times, but still a good goal.)

What are the descriptions of the pages? Similar descriptions are a killer!

What are the static URL's based on? Are they similar?

How many deep-links do you have? Compare linkdomain:yoursite.com to link:yoursite.com in both Y and M... If they are close to the same number, it is safe to say, you do not have many deep links.

Do you have a site map(s)? M and Y both love/need site maps - The best information I received on this came from M's site 'All pages should be accessible within 3 clicks of the homepage' (I run a good size directory ~18K pages - I use almost 300 site maps to get to the 3 click rule.)

Where are your links in from and where do you link out to? Are they quality, on-topic, and only quality on-topic or are there some that are questionable? Who are your ROS links to? Are they to another similar site you operate or other sites that you operate period?

What is your users' behavior? - I believe M and Y are better than G at tracking this and use it in their rankings to some point - Mere speculation on my part.

Have you run header checks on your index and rewritten pages? Do they serve a proper 200 and 304 (not modified) or something else?

Did you change anything when you purchased the site? EG convert to static URLs?

Were the URL's indexed and spiders visiting when you purchased the site?

Do the home page topics change or stay the same? IOW do the topics rotate, so M and Y may be having a tough time determining what the site is about?

This should be a starting point. Hope it gives you some ideas.

Justin

chopin2256

5:13 pm on Aug 4, 2005 (gmt 0)

10+ Year Member



If you take a sentence from your forum (around ten words) and run it in Yahoo using quotes, does it return any results?

Google shows 7000 links
Yahoo shows 2
MSN shows 9

1. If you do the same in Google, does it return non-supplemental results?

Yes it does.

2. When you search for your site, do you see dynamic URL's indexed by any of the SE's? IOW have they recently been converted from dynamic to static?

Yes, I see dynamic and .html URLs in Google. Nothing in Yahoo or MSN. I am working on this problem, as the mod that converts the forum doesn't have 301 redirects! Should I take this mod off until the 301's can be implemented?

3. If you see any dynamic URL's can you access the same static version of the URL? If you can, there is a definite chance of a duplicate penalty.

Yes, but I just started the forum. This is its first month, and the links were not even spidered by Yahoo or MSN yet. I know this from the logs.

4. Have you viewed the source on your site to see where the text is actually displayed on the page? One of the sites I just looked at from invision had almost no text on the page and what it did have started *very* low in the code.

This I am not sure about. I don't see why it would be a problem if at least one search engine is spidering it like crazy. Are you saying it's a defect in the Invision board code?

5. What is the viewable text to code %? (I usually run 53-55% viewable text, but I use css extensively - I believe 50% is usually a good goal. Tough to get to some times, but still a good goal.)

I have no idea. It's the default invision board font size.

6. What are the descriptions of the pages? Similar descriptions are a killer!

What do you mean by descriptions? If you mean the topic titles, they are normally descriptive to the members' tastes. They are composers asking questions or getting critiques, so I don't think this would be a problem.

7. What are the static URL's based on? Are they similar?

Static URLs take the name of the topic/forum/member in this format: topic1-topic2-t34.html, forum1-f45.html, membername-m54.html

8. How many deep-links do you have? Compare linkdomain:yoursite.com to link:yoursite.com in both Y and M... If they are close to the same number, it is safe to say, you do not have many deep links.

4000 in yahoo
1000 in msn

9. Do you have a site map(s)? M and Y both love/need site maps - The best information I received on this came from M's site 'All pages should be accessible within 3 clicks of the homepage' (I run a good size directory ~18K pages - I use almost 300 site maps to get to the 3 click rule.)

Nope, no site map...should I do a google site map?

10. Where are your links in from and where do you link out to? Are they quality, on-topic, and only quality on-topic or are there some that are questionable? Who are your ROS links to? Are they to another similar site you operate or other sites that you operate period?

I am assuming the links are natural (it used to be a big forum...it died down, so I bought it, now I converted it to invision board...the links may be from its past users)

11. What is your users' behavior? - I believe M and Y are better than G at tracking this and use it in their rankings to some point - Mere speculation on my part.

How do composers act? They are enthusiastic, encouraging, sometimes snobby and sarcastic. It is moderated, so no garbage is on this site.

12. Have you run header checks on your index and rewritten pages? Do they serve a proper 200 and 304 (not modified) or something else?

I don't know :( I didn't write the mod, I downloaded it from a very smart coder. She just didn't do 301's yet! I brought that to her attention though.

13. Did you change anything when you purchased the site? EG convert to static URLs?

Everything is changed. It was impossible to keep the same page names using a new forum. The old forum was from 1997 and written in cgi. The forum is on completely new pages.

14. Were the URL's indexed and spiders visiting when you purchased the site?

Hmm...I noticed Google was visiting more than yahoo and msn from the start. I just wasn't sure why.

15. Do the home page topics change or stay the same? IOW do the topics rotate, so M and Y may be having a tough time determining what the site is about?

My homepage is a static html. For the main site I use ssi and css but I write using frontpage in html. So no, the topics in the main site stay the same.

I am wondering if I should take off that mod until the coder can do the 301's. Opinions?

jd01

7:28 pm on Aug 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1. If you do the same in Google, does it return non-supplemental results?

Yes it does.

Good.

2. When you search for your site, do you see dynamic URL's indexed by any of the SE's? IOW have they recently been converted from dynamic to static?

Yes, I see dynamic and .html URLs in Google. Nothing in Yahoo or MSN. I am working on this problem, as the mod that converts the forum doesn't have 301 redirects! Should I take this mod off until the 301's can be implemented?

I would not worry about removing the rewrite(s) if the forum is new AND there are *no* links to the php version of the URLs. Otherwise: redirecting php with a query string (the stuff after the ?) is tough - a lot of people do not have the skill to do it, and if they do, it takes time...

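This is a 301 redirect from a couple of php pages to static URL's:
# THE_REQUEST holds the original request line sent by the client.
# "\?" matches the literal question mark that starts the query string.
RewriteCond %{THE_REQUEST} /index\.php\?(category)_name=(.+)&page=([0-9]+)\ HTTP/ [OR]
RewriteCond %{THE_REQUEST} /index\.php\?(author)_name=(.+)&page=([0-9]+)\ HTTP/
# %1-%3 are taken from the condition that matched; the trailing "?" drops the old query string.
RewriteRule . http://www.yoursite.com/%1/%2/page/%3/? [R=301,L]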

3. If you see any dynamic URL's can you access the same static version of the URL? If you can, there is a definite chance of a duplicate penalty.

Yes, but I just started the forum. This is its first month, and the links were not even spidered by Yahoo or MSN yet. I know this from the logs.

If they do not have either URL then there should be no problem with duplicates, as long as the links on the forum only go to one version of the pages. IOW if you link to the .html page, you must always link to the .html page, not to the .php page - the converse is also true.

4. Have you viewed the source on your site to see where the text is actually displayed on the page? One of the sites I just looked at from invision had almost no text on the page and what it did have started *very* low in the code.

This I am not sure about. I don't see why it would be a problem if at least one search engine is spidering it like crazy. Are you saying it's a defect in the Invision board code?

One is spidering - different places, different criteria, different worlds... Not everyone likes the same flavor of page, nor the same style of code. Generally speaking, the closer to the top of the code the user-visible text is, the better.

5. What is the viewable text to code %? (I usually run 53-55% viewable text, but I use css extensively - I believe 50% is usually a good goal. Tough to get to some times, but still a good goal.)

I have no idea. It's the default invision board font size.

Oops! I should have been more clear... This is a comparison between the number of words it takes for a browser to produce the page and the number of words the user can see on the page. Generally, the more user viewable text the better.

Example: <code>word</code>

My code makes up just over 67% of the above text, and the viewable text (what someone reading the page can see) is under 33%

Example2: <code>word words</code>

My code makes up just over 50% of the above text, and the viewable text (what someone reading the page can see) is under 50%
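A rough way to measure this on a real page, as a minimal Python sketch and not how any search engine actually computes it: strip the tags with the standard library's HTMLParser and divide the visible characters by the total characters. The name visible_text_ratio is made up for illustration, and counting characters (whitespace included) instead of words is an assumption, so the figures will not line up exactly with the word-based ones above.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text a reader would see, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text_ratio(html):
    """Return visible characters divided by total characters (0.0 for empty input)."""
    if not html:
        return 0.0
    collector = _TextCollector()
    collector.feed(html)
    collector.close()  # flush any text still buffered by the parser
    visible = "".join(collector.chunks)
    return len(visible) / len(html)
```

Feeding it the saved source of a forum page gives a single number between 0 and 1 that can be compared before and after a template change.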

6. What are the descriptions of the pages? Similar descriptions are a killer!

What do you mean by descriptions? If you mean the topic titles, they are normally descriptive to the members' tastes. They are composers asking questions or getting critiques, so I don't think this would be a problem.

Oops... My fault again. These are coded into the page - they are what a search engine 'sees' to describe the page to them, and are sometimes (depending on the engine) used in the search results to tell searchers what the page is about.

If you use Firefox at all, they are very easy to see if you have them: From the tools menu, select 'page info' and they should be in the first tab of the menu.
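For more than a handful of pages, checking by hand gets tedious. Here is a minimal Python sketch, assuming you have already saved the HTML of each page into a dict keyed by URL (that input format and the name duplicate_descriptions are made up for illustration). It pulls the meta description out of each page and reports any description shared by more than one URL.

```python
from collections import defaultdict
from html.parser import HTMLParser


class _MetaDescription(HTMLParser):
    """Remembers the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "description":
            self.description = (attr.get("content") or "").strip()


def duplicate_descriptions(pages):
    """pages maps URL -> HTML source; returns description -> list of URLs sharing it."""
    groups = defaultdict(list)
    for url, html in pages.items():
        parser = _MetaDescription()
        parser.feed(html)
        parser.close()
        if parser.description:
            groups[parser.description].append(url)
    return {text: urls for text, urls in groups.items() if len(urls) > 1}
```

An empty result means every page that has a description has a unique one; pages without a description tag are simply ignored.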

7. What are the static URL's based on? Are they similar?

Static URLs take the name of the topic/forum/member in this format: topic1-topic2-t34.html, forum1-f45.html, membername-m54.html

As long as they are not all starting/ending with very similar text they should be fine.

8. How many deep-links do you have? Compare linkdomain:yoursite.com to link:yoursite.com in both Y and M... If they are close to the same number, it is safe to say, you do not have many deep links.

4000 in yahoo
1000 in msn

Again, I was not clear enough here. There are two separate commands to get the number of links in both M and Y. One is the standard link: command. This one shows how many links go to a specific page. The other is linkdomain: This one shows the total number of links in to a domain, not just a page.

A comparison example is:
In MSN
link:http://www.mysite.com -site:mysite.com

shows 1623 links (these are links only to my homepage)

linkdomain:www.mysite.com -site:mysite.com

shows 2811 links (these are links to my entire site.)

By subtracting the links to my homepage from the total number I can see I have 1188 deep links according to MSN, almost as many as links to my homepage
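The same subtraction as a tiny Python sketch, in case you want to track it over time. The function names are made up for illustration, and the counts are whatever the engine reports on the day you run the two queries.

```python
def deep_link_count(linkdomain_total, homepage_links):
    """Links to the whole domain minus links to the homepage."""
    if homepage_links > linkdomain_total:
        raise ValueError("homepage links cannot exceed the domain-wide total")
    return linkdomain_total - homepage_links


def deep_link_share(linkdomain_total, homepage_links):
    """Fraction of all inbound links that point below the homepage."""
    if linkdomain_total == 0:
        return 0.0
    return deep_link_count(linkdomain_total, homepage_links) / linkdomain_total
```

With the MSN figures above, deep_link_count(2811, 1623) returns 1188.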

9. Do you have a site map(s)? M and Y both love/need site maps - The best information I received on this came from M's site 'All pages should be accessible within 3 clicks of the homepage' (I run a good size directory ~18K pages - I use almost 300 site maps to get to the 3 click rule.)

Nope, no site map...should I do a google site map?

Unfortunately, Google site maps only work for Google and you are being indexed fine there. There are some downloads that will create site maps of forum archives for many of the popular message boards; you might be able to find one of these and then create one for the main topics of the forum.

If not, creating one for the main topic pages should help, and while the board is small, you might be able to run a site map creator on the pages to help get them in the index.

10. Where are your links in from and where do you link out to? Are they quality, on-topic, and only quality on-topic or are there some that are questionable? Who are your ROS links to? Are they to another similar site you operate or other sites that you operate period?

I am assuming the links are natural (it used to be a big forum...it died down, so I bought it, now I converted it to invision board...the links may be from its past users)

This would be good to at least look in to - using the same link: and/or linkdomain: commands, have a look at the sites that are listed as linking to you. Are they good sites? Are they the type of sites that tell a SE that your site is quality? Or are they just lists of links?

(Make sure when you use these commands, you subtract the links that are internal, by adding -site:yoursite.com - see above.)

Who you link to is just as important, if not more so.

11. What is your users' behavior? - I believe M and Y are better than G at tracking this and use it in their rankings to some point - Mere speculation on my part.

How do composers act? They are enthusiastic, encouraging, sometimes snobby and sarcastic. It is moderated, so no garbage is on this site.

More along the lines of what a SE sees people as doing - do they click on your site, then immediately back to a SE because they did not find what they were really looking for, or do they click through and stay on the site?

Look at your stats and see what the number of visitors who viewed one page is compared to the number of users who viewed multiple pages. (You will need a good log analyzer for this, so it may not be possible right now, but good for future reference.) Also, look to see how many visitors visited more than once, and what the average time on the site is. There are some other things to look at, but having a look at these things for a little while can give you an idea of what people are doing and what the trends are.
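If your log analyzer does not report this directly, the single-page figure is easy to work out yourself. A minimal Python sketch, assuming you can export hits as (visitor, url) pairs; that input format and the name single_page_share are made up for illustration, and it counts page views per visitor rather than proper sessions.

```python
from collections import Counter


def single_page_share(hits):
    """hits is an iterable of (visitor_id, url) pairs, one per page view.

    Returns the fraction of visitors who viewed exactly one page.
    """
    views_per_visitor = Counter(visitor for visitor, _url in hits)
    if not views_per_visitor:
        return 0.0
    single = sum(1 for views in views_per_visitor.values() if views == 1)
    return single / len(views_per_visitor)
```

A high share suggests people are landing and leaving; a low one suggests they are clicking through and staying.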

You might also look and see what people are finding you in SEs for... is it something you have information about, or do they have your site misclassified.

* Aside on the user behavior - I firmly believe this is a large part of the SEO of the future - can you create the user behavior necessary for the SEs to think the user found what they were looking for? If not, find a way...

12. Have you run header checks on your index and rewritten pages? Do they serve a proper 200 and 304 (not modified) or something else?

I don't know :( I didn't write the mod, I downloaded it from a very smart coder. She just didn't do 301's yet! I brought that to her attention though.

If you look in your control panel here on WebmasterWorld you should see a link that says 'server headers' on the left side. If you click there, it will ask you for a URL - try your main site and a couple of the .html pages that you know exist. You should see 200 OK after you submit the form (in the status code) if not this is something to look closer at.
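If you would rather check from your own machine, here is a minimal Python sketch using only the standard library. It sends a HEAD request and reports the status code without following redirects. The function names and the wording of the notes are made up for illustration; some servers answer HEAD differently from GET, so treat the result as a first look only.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request and return (status_code, Location header or None).

    http.client does not follow redirects, so a 301 or 302 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_status(code):
    """Short note on the status codes discussed in this thread."""
    notes = {
        200: "OK",
        301: "moved permanently",
        302: "found (temporary redirect)",
        304: "not modified",
        404: "not found",
    }
    return notes.get(code, "unexpected, worth a closer look")


# Example (needs network access):
#   status, location = fetch_status("http://www.example.com/forum/")
#   print(status, describe_status(status), location)
```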

13. Did you change anything when you purchased the site? EG convert to static URLs?

Everything is changed. It was impossible to keep the same page names using a new forum. The old forum was from 1997 and written in cgi. The forum is on completely new pages.

It appears SEs are looking not only at the date a URL was discovered, but also the date the individual pages were discovered - You may be (looks like you are) starting over, as if it is a new site in the 'eyes' of the SE. (Google does this almost for sure, M and Y I am not sure about yet.)

14. Were the URL's indexed and spiders visiting when you purchased the site?

Hmm...I noticed Google was visiting more than yahoo and msn from the start. I just wasn't sure why.

Again, check the links in and out. I would also look to see if the old forum is still live. If it is, could there be some duplicate issues from rewrites in the past?

15. Do the home page topics change or stay the same? IOW do the topics rotate, so M and Y may be having a tough time determining what the site is about?

My homepage is a static html. For the main site I use ssi and css but I write using frontpage in html. So no, the topics in the main site stay the same.

ssi? - do you allow access to all spiders without an include?

Hope this helps.

Justin

chopin2256

8:48 pm on Aug 4, 2005 (gmt 0)

10+ Year Member



What are the descriptions of the pages? Similar descriptions are a killer!

You mean the meta tag? They are all identical. It is just a short blurb about what my site does. You think identical meta tag descriptions will lead to a penalty? If that is the case, should I just delete all my meta description tags? I create too many pages to constantly keep updating new descriptions. My other site that got dumped from Google also had identical meta tags...should I take these off?

Again, I was not clear enough here. There are two separate commands to get the number of links in both M and Y. One is the standard link: command. This one shows how many links go to a specific page. The other is linkdomain: This one shows the total number of links in to a domain, not just a page.

Ok, In MSN
link:http://www.mysite.com -site:mysite.com (1359)

linkdomain:www.mysite.com -site:mysite.com (1530)

This would be good to at least look in to - using the same link: and/or linkdomain: commands, have a look at the sites that are listed as linking to you. Are they good sites? Are they the type of sites that tell a SE that your site is quality? Or are they just lists of links?

A lot of kids/compositional sites. Overall, I would say decent quality sites. I am trying to mature the audience by trying to link to sites such as Naxos, and other high profile music sites.

More along the lines of what a SE sees people as doing - do they click on your site, then immediately back to a SE because they did not find what they were really looking for, or do they click through and stay on the site?

Oh, it's a very well liked site. According to awstats 60% of my visitors bookmark the site. I don't get much traffic yet, and I know that when traffic goes up, that percentage will go down, but this is much higher than my other site, which is about 3% bookmarks. The forum gets 3000-5000 pageviews with about 130 members. So, I'd say it's very well liked. It's just a small community for now.

It appears SEs are looking not only at the date a URL was discovered, but also the date the individual pages were discovered - You may be (looks like you are) starting over, as if it is a new site in the 'eyes' of the SE. (Google does this almost for sure, M and Y I am not sure about yet.)

Well the domain didn't expire. I don't see why changing a few pages would have a huge effect. I did create brand new html pages, and they are not in the sandbox...they were spidered by Google immediately. I hope I don't have to start from complete scratch again. But I do have the old pages the guy used...should I keep these page names?

ssi? - do you allow access to all spiders without an include?

the ssi is for the borders. What do you mean do I allow access to all spiders? I don't have a robots.txt so I assume all robots have access.

jd01

9:22 pm on Aug 4, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They are all identical. It is just a short blurb about what my site does. ...should I remove them

Yes, absolutely - I have a large directory that used to have descriptions for the pages that were very similar, because the directory is similar to a phone directory, etc... I could only get to ~200 before I hit 'similar pages' in all major SEs, and had less than 400 pages indexed. I have since removed them to let the SEs decide what my pages are about, and can now go 98 pages on Y, a full 100 pages on G and 25 pages on M (25 is all they give you), and depending on the DC I hit have ~20K pages in G, ~2400 in Y, and ~1000 in M... (these numbers do highly depend on the DC in M and Y - they have just now begun picking up the changes I made a couple of months ago, probably because I made major navigation and structural changes at the same time, and just recently re-worked my site maps to be more Y and M friendly.)

I would be very careful about changing page names - new pages *must* start over, they are new. The only work around that might help is some 301 redirects from similar topic pages (old URLs) to the new ones. With removing all the old pages at the same time, you have removed the history of the domain with it, so essentially, taking down all the old pages and putting up new URLs is very much like starting over...

I know it does not apply here per se, but the G patent app. makes some direct references to the age of a domain based on the initial date a URL is found, the number of new pages to old pages, etc...

W3C 'cool URLs don't change' is another hint that we should create pages for the long term.

If you are steadily growing links and building for the long term you should be fine... work on the deep links to specific topics of interest, remove the duplicate descriptions, try to get a site map, and make sure there is enough text on your pages to keep the SEs thinking you provide information, not just code and you should be fine.

My guess is the descriptions are killing you right now, but that's just an educated guess...

Justin

chopin2256

11:43 pm on Aug 5, 2005 (gmt 0)

10+ Year Member



Maybe you can help me out with something.

I am trying to redirect this link

http://www.example.com/forum/index.php?showtopic=29

To be shown as this link

http://www.example.com/forum/user-topic-key-words-t29.html

The link just above this line was converted by this code:

RewriteRule ^(.*)-t([0-9][0-9]*)-s([0-9][0-9]*)\.html(.*)$ index.php?showtopic=$2&st=$3
RewriteRule ^(.*)-t([0-9][0-9]*)\.html(.*)$ index.php?showtopic=$2$3

Can you help me out with what coding I would need to do this? I am trying to 301 redirect my whole forum, but like you said, this is so complicated.

jd01

12:19 am on Aug 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We should meet in the Apache forum for this one - at its simplest it's complicated - from there it gets worse.

The first thing you will need to do is change the links in the forum to the .html version of the pages you would like to use the information from, and then use mod_rewrite to put the information from the dynamic page to the static location that does not really exist.

http://www.example.com/forum/user-topic-key-words-t29.html

RewriteRule ([a-z]{1})([0-9]{1,2})\.html(.*)$ /index.php?showtopic=$2 [L]

Any file request that ends in a single letter, followed by one or two numbers and then .html will be served the information that is located at yoursite.com/index.php?showtopic=THENUMBER
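Before putting a pattern like this live, it can help to try it against a few file names. A minimal Python sketch, assuming Python's re module treats this particular pattern the same way mod_rewrite's regex engine does (true for simple patterns like this one, but an assumption all the same). The name rewrite_target is made up for illustration.

```python
import re

# Same pattern as the RewriteRule above. Like mod_rewrite, search() is
# unanchored at the start, so the pattern may match anywhere in the path.
TOPIC_RULE = re.compile(r"([a-z]{1})([0-9]{1,2})\.html(.*)$")


def rewrite_target(path):
    """Return the internal URL the rule would serve, or None if it does not match."""
    match = TOPIC_RULE.search(path)
    if match is None:
        return None
    return "/index.php?showtopic=" + match.group(2)


for name in ("user-topic-key-words-t29.html",
             "user-topic-key-words-t123.html",
             "forum1-f45.html"):
    print(name, "->", rewrite_target(name))
```

Two things stand out when running it: an ID with three or more digits does not match the {1,2} limit, and forum1-f45.html matches too, because the pattern does not care which letter comes before the number.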

If the letter is necessary, you could remove the middle )( and use variable 1 or you can use $1 for the letter and $2 for the number independently.

If you are going to try to redirect the dynamic URL's back to static, it will require the use of THE_REQUEST as the condition. This redirects original requests (EG typing a URL in a browser or clicking on a link) to the new location, but allows secondary requests (EG 'silently' serving the dynamic page at the .html equivalent). So your script will still run off of index.php?category=t-26, but the only page that will be accessible in the browser will be the html version.
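Part of what makes this direction hard: the static name contains the topic's key words, and the dynamic URL only carries the topic number, so something has to look the words up. A minimal Python sketch of just that mapping logic, not of the Apache side; the slugs table and the name static_path_for are hypothetical, and in practice the words would have to come from the forum's database or the mod itself.

```python
from urllib.parse import parse_qs, urlsplit


def static_path_for(dynamic_url, slugs):
    """Map .../index.php?showtopic=N to .../<slug>-tN.html.

    slugs maps topic id (int) -> key-word slug. Returns None when the URL
    is not a topic URL or the topic id is unknown.
    """
    parts = urlsplit(dynamic_url)
    if not parts.path.endswith("index.php"):
        return None
    topic = parse_qs(parts.query).get("showtopic", [None])[0]
    if topic is None or not (topic.isascii() and topic.isdigit()):
        return None
    slug = slugs.get(int(topic))
    if slug is None:
        return None
    base = parts.path[: -len("index.php")]
    return f"{base}{slug}-t{topic}.html"
```

With a table like {29: "user-topic-key-words"}, the forum URL http://www.example.com/forum/index.php?showtopic=29 maps to /forum/user-topic-key-words-t29.html, which is the target a 301 would need to point at.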

Anyway, start a thread in the Apache forum and we'll get started helping you out. You might post a portion of this message and *exactly* what you are trying to accomplish there, so it's easier for others who might have some contributions to understand where we are trying to go.

This might help you with understanding mod_rewrite a little.

[webmasterworld.com...]

Justin

chopin2256

8:01 pm on Aug 8, 2005 (gmt 0)

10+ Year Member



I just noticed that Yahoo and MSN are spidering my old pages (old pages from the old forum that do not exist) as well as other html pages that don't currently exist. Maybe Yahoo and MSN are lost and need to be guided back on track?

Should I redirect all these error pages to my homepage?

jd01

10:46 pm on Aug 9, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are a couple of schools of thought on this one...

Some people believe you should let the SE's work things out for themselves, because redirecting everything to the homepage may cause problems.

Others think that rather than having a large number of errors, you should send the pages back to page 1 (the home page) and basically 'send' the SE's back through the site...

I use both to some degree - In this case, since you are not getting spidered now (correctly anyway) and I am guessing if your pages aren't indexed, they don't rank very well, there is really nothing to lose - If you rank well in G you might think twice about it.

Another option is to use the robots.txt to deny the old pages from all SE's and then add a couple of deep links and let them find those.
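If you go the robots.txt route, it is worth checking that the rules block only what you intend before uploading the file. A minimal Python sketch using the standard library's robots.txt parser; the /cgi-bin/ path is an assumption standing in for wherever the old forum pages lived, and the name is_allowed is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every spider out of the old forum's directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def is_allowed(url, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in ("http://www.example.com/cgi-bin/forum.cgi",
            "http://www.example.com/forum/user-topic-key-words-t29.html"):
    print(url, "allowed" if is_allowed(url) else "blocked")
```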

Justin