How would you clean up the SERPS?

     
10:01 pm on Sep 9, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 8, 2003
posts:659
votes: 0


There seem to be a lot of threads about auto-generated sites feeding garbage to the search engines. First of all, I don't believe that all sites that auto-generate content put out garbage.

I'm talking about the bad sites here. How could you ever devise an algo that could clean up the SERPS? Not all of these sites are using duplicate content, and it doesn't take much imagination to figure out ways to churn out lots of pages without lifting someone else's text. I don't even think it's possible to filter out the bad sites without somehow getting user input.
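As a concrete illustration of the kind of filter being discussed, here is a minimal Python sketch of shingle-based near-duplicate detection (a sketch only; the shingle size and threshold are illustrative assumptions, not anything a real engine is known to use). It catches copied text, which is exactly why, as noted above, it does nothing against machine-churned pages that never lift anyone else's words.

# Minimal shingle-based near-duplicate check. Shingle size (k) and the
# similarity threshold are illustrative assumptions.

def shingles(text, k=5):
    # Set of k-word shingles (overlapping word windows) in the text.
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # Jaccard similarity of two shingle sets, 0.0 .. 1.0.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def looks_copied(page_a_text, page_b_text, threshold=0.8):
    # True if the two pages share most of their shingles.
    return jaccard(shingles(page_a_text), shingles(page_b_text)) >= threshold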

10:30 pm on Sept 9, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member essex_boy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 19, 2003
posts:3187
votes: 4


You really need a human to review the pages by hand, but boy oh boy, "suitable" is a highly subjective term, depending on what your ultimate view of the web is.

E.g., do you allow sites that are affiliate-only...?

10:32 pm on Sept 9, 2004 (gmt 0)

Administrator from CA 

WebmasterWorld Administrator bakedjake is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 8, 2003
posts:3878
votes: 57


How could you ever devise an algo that could clean up the SERPS?

Conspiracy theory of the day: Bad SERPs = More Clicks to Ads

10:35 pm on Sept 9, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 15, 2003
posts:7249
votes: 0


That's not a conspiracy theory Jake, that's just plain logic ;-)

But then, so is the logic that says:-

"Brand is based on consumer trust".

I can imagine the brand managers and accountants fighting over that one right now...

TJ

6:13 pm on Sept 10, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 2, 2003
posts:515
votes: 0


Google kicked ass in the SE domain because they used data that wasn't in the page, but data about it: links.

One thing I'd like to see used more is people's bookmarks. New services that let you share them with your friends could be mined as recommendations. Bloggers that share links can be another source, as would be fora like Google Answers and WW.
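A rough sketch of how the bookmark idea might be mined, assuming a made-up stream of (user, url) pairs rather than any particular bookmarking service's API: count how many distinct users saved each URL and treat that count as one recommendation signal among others.

from collections import defaultdict

def bookmark_scores(bookmarks):
    # bookmarks: iterable of (user, url) pairs from shared bookmark feeds.
    users_per_url = defaultdict(set)
    for user, url in bookmarks:
        users_per_url[url].add(user)
    # More distinct users saving a page -> stronger recommendation.
    return {url: len(users) for url, users in users_per_url.items()}

sample = [("alice", "http://example.com/a"),
          ("bob", "http://example.com/a"),
          ("alice", "http://example.com/b")]
print(bookmark_scores(sample))  # /a is saved by 2 users, /b by 1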

I'm sure someone will think of a deceptively simple way to do this or use another form of metadata. It might be Google, or it might be some new upstart; these days I think there would be room for another player.

7:34 pm on Sept 10, 2004 (gmt 0)

New User

10+ Year Member

joined:Aug 16, 2004
posts:25
votes: 0


You can also use this:
[google.com...]

If you're lucky, maybe they will remove some "cheaters" that are ranking higher than you on a given keyword. ;)

9:58 pm on Sept 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 15, 2004
posts:1300
votes: 0


How could you ever devise an algo that could clean up the SERPS?

Convince the world that it's not a good idea to have almost all access to web-based resources filtered through a tiny handful of companies, each seeking to maximize its income in whatever way possible.

Get major funding for an open source search engine, ideally through a mix of business and government funding, and make the algos as bulletproof as possible. Obviously, making your algo open source is a challenge, since the spammers have access to it, but that would also be its strength: none of this guessing about what's wrong or how to fix it. You'd know, and it could get fixed.

There are small projects like this, but none have the funding to create the huge server farms needed to actively spider the web.

I see this as the next major goal for the open source movement now that Linux is pretty much here to stay.

10:28 pm on Sept 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 8, 2003
posts:659
votes: 0


>huge server farms needed to actively spider the web

I don't know. All my sites get hit by Gigabot. I think redundant storage and returning SERPS are why Google has all the servers. I built little spiders with Perl and LWP that were lightning fast.
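For anyone who hasn't written one, a little spider of the sort described really is small. Here is a roughly equivalent fetch-and-extract-links loop as a Python sketch (the originals mentioned above were Perl/LWP); standard library only, and robots.txt handling and crawl delays are deliberately left out, though they would matter in real use.

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    # Collects href values from <a> tags as the page is parsed.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    # Breadth-first fetch of up to max_pages pages, following links as we go.
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen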

An open source search engine is probably a great idea if there were some way, maybe with a browser plugin, to gather information on which sites users stay on for a long time, but scripts could probably be written to fool that too.
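A sketch of how that plugin data might be aggregated, assuming a hypothetical stream of (site, seconds) reports: taking the median dwell time per site, rather than the mean, makes it a little harder for a handful of scripted visits to skew the number, though as noted it could still be fooled.

from collections import defaultdict
from statistics import median

def dwell_scores(visits):
    # visits: iterable of (site, seconds_on_page) reports from plugins.
    per_site = defaultdict(list)
    for site, seconds in visits:
        per_site[site].append(seconds)
    # Median per site; harder (though not impossible) to game than the mean.
    return {site: median(times) for site, times in per_site.items()}

sample = [("example.com", 120), ("example.com", 95), ("spam.example", 2)]
print(dwell_scores(sample))  # example.com: 107.5, spam.example: 2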

10:43 pm on Sept 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 15, 2004
posts:1300
votes: 0


My sites get hit by Gigabot too, but only for a few pages; I've never seen it download a whole site. It's not the spidering per se that is so processor intensive, it's analyzing and cataloging the results in real time for 4.5 billion or so web pages, plus running everything else that has to run to create meaningful SERPs when users click search, and having the speed necessary to deal with the traffic.
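To make the "analyzing and cataloging" point concrete, here is a toy Python sketch of the indexing side: build an inverted index (term -> pages containing it) and answer a simple AND query from it. Doing this continuously, for billions of pages, is where the server farms go; the tokenizing and query logic here are deliberately simplistic.

from collections import defaultdict

def build_index(pages):
    # pages: dict of url -> page text. Returns term -> set of urls.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    # Return the urls containing every query term (simple AND query).
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results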

But given that the new MSN, Yahoo, and Google serve up pretty similar results, I'd say writing a decent search engine algo is really not all that hard. Not easy, but not impossible.

Having your only real access to the web come through for-profit corporations? I don't know, there is something really troubling about that. There should be at least one viable alternative, to help force things towards a bit of honesty.