Home / Forums Index / Microsoft / Bing Search Engine News
Forum Library, Charter, Moderators: mack

Bing Search Engine News Forum

    
What is wrong with Live.com's indexing and updating?
jaha




msg:3299130
 2:12 pm on Apr 1, 2007 (gmt 0)

This is a dynamic website with both aspx and some html pages; the dynamic pages have a maximum of 3 parameters. It has been online for 13 months, and during that time Live.com has indexed only 160-168 pages (the number varies depending on which results page is checked). It has indexed some of the dynamic pages with the 3 parameters, so it doesn't look like the problem lies there.

In March it took 16 days to have the index page updated. Only some of the other pages have been updated during the last month, although they have been changed; this goes for both dynamic and html pages. A 301 redirect to lowercase URLs was set up several weeks ago, yet in the last couple of days it has even updated some of the pages under the old uppercase URL. What is this? The dynamic URLs were 301 redirected to the existing URLs months ago, and it seems to have figured that out, but there is no doubt it has serious problems with 301 redirects, which I have seen mentioned here several times earlier.

My website has daily hits from the msnbot and the msnbot-media (last 3 months: Jan about 1250, Feb about 2000, Mar between 4000 and 5000), but it does not seem to index and update the content anyway. So my question is this: is there any way to get the indexing and updating sped up? This seems rather hopeless; what is the point of the visits from the bots if the pages are not indexed after all?
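For anyone wanting to reproduce monthly crawler counts like the ones above, here is a minimal Python sketch. It assumes each log line begins with a YYYY-MM-DD date (as IIS W3C extended logs usually do) and contains the crawler's user-agent string; the function name and the sample lines are made up for illustration.

```python
from collections import Counter

# Assumption: the crawler names appear in the user-agent field of each line.
# The longer name is checked first so msnbot-media is not counted as msnbot.
BOTS = ("msnbot-media", "msnbot")


def count_bot_hits(lines):
    """Return a Counter keyed by (month, bot), e.g. ("2007-03", "msnbot")."""
    hits = Counter()
    for line in lines:
        if line.startswith("#"):  # skip W3C header lines such as "#Fields:"
            continue
        lowered = line.lower()
        for bot in BOTS:
            if bot in lowered:
                hits[(line[:7], bot)] += 1  # first 7 characters: YYYY-MM
                break  # count each request only once
    return hits


# Illustrative log lines, not taken from the real site.
sample = [
    "#Fields: date time cs-uri-stem cs(User-Agent)",
    "2007-03-01 10:00:00 /default.aspx msnbot/1.0+(+http://search.msn.com/msnbot.htm)",
    "2007-03-02 11:30:00 /images/a.jpg msnbot-media/1.0+(+http://search.msn.com/msnbot.htm)",
    "2007-03-02 11:31:00 /hotel.aspx Mozilla/4.0+(compatible;+MSIE+6.0)",
]
hits = count_bot_hits(sample)
```

Comparing these counts with the number of distinct URLs requested would show whether the bots are fetching many different pages or the same few over and over.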

aj

 

engine




msg:3301097
 3:15 pm on Apr 3, 2007 (gmt 0)

Welcome to WebmasterWorld jaha,

Is there a pattern developing, from what you've said? Could the pages being picked up be the ones that have new content?

jaha




msg:3301235
 5:33 pm on Apr 3, 2007 (gmt 0)

Thank you.

No, I don't see any pattern here; what is updated seems to be very random. Changes have been made to most of the pages, including some smaller updates in the dynamic pages, and they continue to update with the uppercase URL even in the pages updated in the latest days. So some are updated with the proper lowercase and others with the old uppercase; most seem to be with the uppercase, though.

Did a site search today: with www.domain.com/ it returned 160 (down from 168 the day I posted), with only domain.com/ 170, and with domain.com 169. The number changes for every different way of writing the domain, although all returned pages have the proper www in front (301 redirect set up 9-10 months ago). Also noticed today that there are many pages without the "Cached page" link; not sure if this is new, maybe I just had not noticed it before.

Not sure what is going on. Changed pages have been hit by the robots several times during the last month without being updated in the index, not to mention several thousand hits on pages that have not been indexed either. I just hope there is some delay in their indexing right now, but who knows these days.

caveman




msg:3302217
 3:48 pm on Apr 4, 2007 (gmt 0)

Hard to say without a close inspection. Some questions though:

1) Are you certain you don't have a structural problem? For example, is your system generating dynamic pages under multiple URI's? Or, is there some other issue that is causing the same page to be shown under multiple URI's? Sounds suspiciously like a dup content issue.

2) How's your backlink profile? Do you have many backlinks and if so are they from quality, unaffiliated sites? Are they mainly bought links? Does your site share a common backlink footprint/profile with other sites you own? Do you have another similar site?

3) What sort of content do you have on your site? Is it mainly from feeds or otherwise comprised of content largely accessed from other sources?

jaha




msg:3304177
 12:29 pm on Apr 6, 2007 (gmt 0)

Hi

I have answered your questions below and added some other things I found in the msn site search. It is a lot of reading, sorry about that.

"1) Are you certain you don't have a structural problem? For example, is your system generating dynamic pages under multiple URI's? Or, is there some other issue that is causing the same page to be shown under multiple URI's? Sounds suspiciously like a dup content issue."

No, this should not be a problem as far as I know. I have an httpd.ini file with rewrites and redirects set up, and those rules do return the 301 properly. (Got the rules from the Isapi Rewrite staff themselves.)
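One way to double-check that claim is to compare each response against the lowercase URL you expect. The Python sketch below is only an illustration: `lowercase_target` and `check_redirect` are hypothetical helper names, example.com stands in for the real domain, and it assumes the rules lowercase the host and path but leave the query string untouched. Feed it the status code and Location header reported by any HTTP header checker.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_target(url):
    """Return the URL with host and path lowercased (query string left as-is)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path.lower(),
                       parts.query, parts.fragment))


def check_redirect(requested_url, status, location):
    """Classify one server response for a possibly mixed-case URL."""
    expected = lowercase_target(requested_url)
    if requested_url == expected:
        return "already lowercase"
    if status != 301:
        return f"expected 301, got {status}"
    if location != expected:
        return f"301 points to {location}, expected {expected}"
    return "ok"


# Hypothetical example: an old uppercase URL and the answer the server gave.
result = check_redirect(
    "http://www.example.com/Hotels/Oslo.aspx?id=5",
    301,
    "http://www.example.com/hotels/oslo.aspx?id=5",
)
```

A 302 instead of a 301, or a redirect that needs two hops to reach the final URL, would both be worth fixing before concluding the problem is on the search engine's side.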

"2) How's your backlink profile? Do you have many backlinks and if so are they from quality, unaffiliated sites? Are they mainly bought links? Does your site share a common backlink footprint/profile with other sites you own? Do you have another similar site?"

Last time I checked there were around 800 links in msn, most of them from websites that have linked to me over the years (this is a 3-4 year old domain reused for the new website). So a lot of them link with the old info from the earlier website. Although it is a related topic, it is still a very different website; the old one was mainly affiliate links, and today I have none on the website.

And most of them have linked to the website without any reciprocal link. (I'm not too sure of the quality; I have not checked all these links, and there is not much to do about it anyway. How would I possibly remove those?) I had a link directory on the earlier website with a maximum of 200-300 links. I removed almost all of them when I started this website and kept only the ones that changed their info to the new site. Today I have 30 links on one page. I have stopped the reciprocal linking and instead submitted to directories and search engines (one by one), but have not had much time to focus on that.

The only bought links I have are from paying to get into directories about a year ago, not many though. (I think the rules here are very unclear on what exactly is defined as a paid link, whether that involves the directories as well, and whether that is a "bad thing" too.) And finally, I do not have any other websites, and I only have a link to the link page on the index page and the sitemap, nowhere else.

"3) What sort of content do you have on your site? Is it mainly from feeds or otherwise comprised of content large accessed from other sources?"

The content is accommodations. This is a totally (affiliate) XML driven website on the dynamic part, so there might be a problem with the content. I'm not sure when it comes to msn; it doesn't answer why it doesn't index pages. But in google it is in the famous "supplemental hell"; still, it has more than 26000 indexed, and has managed to get the lowercase URL updated. (I have blocked it in the pages now to get it to remove the dynamic pages, until I can get the pages more updated. It is only two URLs; the rest have been blocked in robots.txt from the start, as they never should be indexed.)

One thing that I have noticed in the last 2-3 days is that googlebot has started to ask for pages in my test folder, which has also been blocked in robots.txt since day one. The only way it can have found this is through the google toolbar. I had to put a login on that folder. Stupid bots not obeying robots.txt; same for msnbot getting scripts and other things from folders blocked in robots.txt.
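Before blaming the bots, it may be worth testing the rules themselves. The Python sketch below uses the standard library's `urllib.robotparser`; the rules and URLs are hypothetical stand-ins, not the real file. Note that robots.txt path matching is case-sensitive, so on a site that recently had mixed-case URLs a rule for /test/ would not cover /Test/.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; substitute the contents of the real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /scripts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(user_agent, url):
    """True if the parsed rules disallow this URL for this crawler."""
    return not parser.can_fetch(user_agent, url)


# Lowercase path matches the rule; the mixed-case variant does not.
blocked_lower = is_blocked("msnbot", "http://www.example.com/test/page.aspx")
blocked_upper = is_blocked("msnbot", "http://www.example.com/Test/page.aspx")
```

If the paths in the server log differ in case from the paths in robots.txt, that would be one possible explanation for the requests, though it does not rule out a crawler simply ignoring the file.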

...............
I did a further check on the site search in msn and saw that it has updated more of the pages, but it also has several pages indexed twice (html pages). One thing I did see was that, for one URL, the pages without the cache were due to the description being too similar, with only some words different. This was changed weeks ago to a description of just a few words, where only 3 are the same and the rest dynamic, so the updated pages do have the cache date. The other URL I still don't understand, as the description is different on each page (dynamically added).

I still think it has something to do with the 301 redirect problems in msn, as it doesn't manage to update the lowercase URLs, and believe it or not, the URLs without the cache date all have the lowercase applied, very strange. The number of pages still changes from page to page checked, about the same number as earlier. I am hoping every day that it will speed up the indexing, but no.

PS: One thing that was done in the last few days was to add the "modified since date" to pages. Not sure if that is a problem, but I finally found a way to do it. I am not too experienced in these dynamic things and windows servers, as you might understand; there are a lot of traps you never have to deal with on other servers, but I am learning along the way, or maybe rather the hard way.
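For reference, the usual logic behind a "modified since" check looks roughly like the Python sketch below (the site itself is ASP.NET, so this only illustrates the idea). `conditional_status` is a hypothetical helper: it answers 304 when the page has not changed since the date in the crawler's If-Modified-Since header, and 200 otherwise. If the stored modification date is wrong or never advances, a crawler could keep getting 304 and never pick up the changes, which might be worth ruling out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the client's copy is still current, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if not if_modified_since:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: send the full page
    if client_time.tzinfo is None:
        # Treat a date without usable zone information as UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200


# Hypothetical modification time and the header value a server would send.
page_changed = datetime(2007, 4, 2, 9, 30, tzinfo=timezone.utc)
last_modified_header = format_datetime(page_changed, usegmt=True)
```

The key point is that the Last-Modified value has to move forward whenever the page content changes, including changes that arrive through the XML feed.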

Anyway, thanks a lot for trying to help with these problems.

aj

caveman




msg:3304376
 4:38 pm on Apr 6, 2007 (gmt 0)

No, this should not be a problem as far as I know. I have an httpd.ini file with rewrites and redirects set up, and those rules do return the 301 properly. (Got the rules from the Isapi Rewrite staff themselves.)

"should not be a problem..." Ya, I've heard that before. Thinking it's not a problem doesn't mean much. What you need to do is search on some query strings that are unique to a given page and see if the page comes up more than once. (You can do the same for URI strings too.) Or, just scan the SERP's for pages indexed and see if there are dups. But that can be tedious when there are 1000's of pages.

... this is a totally (affiliate) XML driven website on the dynamic part, so there might be a problem with the content. I'm not sure when it comes to msn; it doesn't answer why it doesn't index pages. But in google it is in the famous "supplemental hell"; still, it has more than 26000 indexed,

Hmmm, this sort of site will almost certainly never rank well in today's Google, and MSN is increasingly adept at eliminating this stuff too. Basically you're talking about a large site with mainly undifferentiated content, delivered by feeds. Good luck with that.

... and has managed to get the lowercase URL updated. (I have blocked it in the pages now to get it to remove the dynamic pages, until I can get the pages more updated. It is only two URLs; the rest have been blocked in robots.txt from the start, as they never should be indexed.)
...............
I did a further check on the site search in msn and saw that it has updated more of the pages, but it also has several pages indexed twice (html pages). One thing I did see was that, for one URL, the pages without the cache were due to the description being too similar, with only some words different. This was changed weeks ago to a description of just a few words, where only 3 are the same and the rest dynamic, so the updated pages do have the cache date. The other URL I still don't understand, as the description is different on each page (dynamically added).

I'm not sure I understand everything going on here, but aside from the fact that you've got a site mainly built on a foundation of duplicate content, it seems as though you're saying that pages are indeed indexed under multiple URI's? If so, between that and the dup/feed content, here's what you need to do:

Be certain you are eliminating all dup URI's. One per page. 404 or 410 the rest. Must be done.

Second, find a way to add substantially more unique content to every page, or at least a vast majority of pages.

Failing to do both of these things will continue to have your site looking like just another spam/feed site with no or minimal unique content, and that's a model that is fading. Fast.

Alternatively, throw an ad network feed on the site, if it's not there already, squeeze what you can out of it, and move on.
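The duplicate-URI check can also be done mechanically once the indexed URLs are copied out of the site search results. The Python sketch below groups URLs that differ only by letter case, a leading www., a trailing slash, or the order of query parameters. The helper names and example.com URLs are hypothetical, and lowercasing the whole URL assumes the site treats parameter values case-insensitively.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalize(url):
    """Reduce a URL to a comparison key (scheme and fragment are ignored)."""
    parts = urlsplit(url.lower())  # assumption: case never changes the page
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = urlencode(sorted(parse_qsl(parts.query)))
    return f"{host}{path}?{query}" if query else f"{host}{path}"


def find_duplicates(urls):
    """Group URLs by normalized key and keep only keys seen more than once."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Hypothetical URLs as they might be listed in a site search.
indexed = [
    "http://www.example.com/Hotels.aspx?city=Oslo&id=5",
    "http://example.com/hotels.aspx?id=5&city=oslo",
    "http://www.example.com/about.html",
]
dups = find_duplicates(indexed)
```

Every group it reports is a page the engine holds under more than one address, which makes it a candidate for the one-URI-per-page cleanup.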

jaha




msg:3304526
 7:54 pm on Apr 6, 2007 (gmt 0)

I have checked a lot of URLs with a unique query string for each page, and there were no duplicate URLs for any of them, as I expected. There has never been any issue in msn regarding this; only in google have there been some episodes of it testing URLs by adding different stuff to them, but this was solved by rewrite rules as soon as they were found. So I do mean there should not be a problem regarding this; it has to be other things. I have not seen any duplicates in google either for that matter, at least not in the 1000 you can see.

"Hmmm, this sort of site will almost certainly never rank well in today's Google, and MSN is increasingly adept at eliminating this stuff too. Basically you're talking about a large site with mainly undifferentiated content, delivered by feeds. Good luck with that."

I think you misunderstood me here. It is content added to my website by XML elements through asp.net coding, in my own web design. It is not as if there are duplicate pages on any other websites; it is only the xml content itself (mainly the accommodation info from the owners themselves) that would be the same, and even that would differ somewhat depending on the xml elements used, so you cannot find any pages exactly the same as mine. But I do agree that more content has to be added, and I am working on solutions for that. There are still many websites like this in both google and msn, at least for now.

caveman




msg:3304616
 10:11 pm on Apr 6, 2007 (gmt 0)

Hey jaha, I think I understood. The majority of content in your site is found in the same exact form on many other sites around the Web. Is that right? If so, like I said, good luck with that. ;-)

Also, that is why I said: "Second, find a way to add substantially more unique content to every page, or at least a vast majority of pages." Enough unique content per page might, might help.

The engines have no interest in featuring pages from sites like this.

When a given search, e.g., "cityname hotels", yields the exact same information served up by the first ten pages listed in a SERP, that doesn't help anyone, and they know that.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved