How would you clean up the SERPS?

     
10:01 pm on Sep 9, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 8, 2003
posts:659
votes: 0


There seem to be a lot of threads about auto-generated sites feeding garbage to the search engines. First of all, I don't believe that all sites that auto-generate content put out garbage.

I'm talking about the bad sites here. How could you ever devise an algo that could clean up the SERPS? Not all of these sites are using duplicate content, and it doesn't take much imagination to figure out ways to churn out lots of pages without lifting someone else's text. I don't even think it's possible to filter out the bad sites without somehow getting user input.
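As a concrete illustration of the kind of filter being discussed, here is a minimal Python sketch of shingle-based near-duplicate detection (a sketch only; the shingle size and threshold are illustrative assumptions, not anything a real engine is known to use). It catches copied text, which is exactly why, as noted above, it does nothing against machine-churned pages that never lift anyone else's words.

# Minimal shingle-based near-duplicate check. Shingle size (k) and the
# similarity threshold are illustrative assumptions.

def shingles(text, k=5):
    # Set of k-word shingles (overlapping word windows) in the text.
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    # Jaccard similarity of two shingle sets, 0.0 .. 1.0.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def looks_copied(page_a_text, page_b_text, threshold=0.8):
    # True if the two pages share most of their shingles.
    return jaccard(shingles(page_a_text), shingles(page_b_text)) >= threshold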

10:30 pm on Sept 9, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member essex_boy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 19, 2003
posts:3187
votes: 4


You really need a human to review the pages by hand, but boy oh boy, "suitable" is a highly subjective term, depending on what your ultimate view of the web is.

E.g., do you allow sites that are affiliate-only...?

10:32 pm on Sept 9, 2004 (gmt 0)

Administrator from CA 

WebmasterWorld Administrator bakedjake is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 8, 2003
posts:3878
votes: 57


How could you ever devise an algo that could clean up the SERPS?

Conspiracy theory of the day: Bad SERPs = More Clicks to Ads

10:35 pm on Sept 9, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member trillianjedi is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 15, 2003
posts:7249
votes: 0


That's not a conspiracy theory Jake, that's just plain logic ;-)

But then, so is the logic that says:-

"Brand is based on consumer trust".

I can imagine the brand managers and accountants fighting over that one right now...

TJ

6:13 pm on Sept 10, 2004 (gmt 0)

Preferred Member

10+ Year Member

joined:Dec 2, 2003
posts:515
votes: 0


Google kicked ass in the SE domain because they used data that wasn't in the page, but data about it: links.

One thing I'd like to see used more is people's bookmarks. New services that let you share them with your friends could be mined as recommendations. Bloggers that share links can be another source, as would be fora like Google Answers and WW.
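A rough sketch of how the bookmark idea might be mined, assuming a made-up stream of (user, url) pairs rather than any particular bookmarking service's API: count how many distinct users saved each URL and treat that count as one recommendation signal among others.

from collections import defaultdict

def bookmark_scores(bookmarks):
    # bookmarks: iterable of (user, url) pairs from shared bookmark feeds.
    users_per_url = defaultdict(set)
    for user, url in bookmarks:
        users_per_url[url].add(user)
    # More distinct users saving a page -> stronger recommendation.
    return {url: len(users) for url, users in users_per_url.items()}

sample = [("alice", "http://example.com/a"),
          ("bob", "http://example.com/a"),
          ("alice", "http://example.com/b")]
print(bookmark_scores(sample))  # /a is saved by 2 users, /b by 1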

I'm sure someone will think of a deceptively simple way to do this or use another form of metadata. It might be Google, or it might be some new upstart; these days I think there would be room for another player.

7:34 pm on Sept 10, 2004 (gmt 0)

New User

10+ Year Member

joined:Aug 16, 2004
posts:25
votes: 0


You can also use this:
[google.com...]

If you're lucky, maybe they will remove some "cheaters" that are ranking higher than you on a given keyword. ;)

9:58 pm on Sept 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 15, 2004
posts:1300
votes: 0


How could you ever devise an algo that could clean up the SERPS?

Convince the world that it's not a good idea to have almost all access to web-based resources filtered through a tiny handful of companies, each seeking to maximize its income in whatever way possible.

Get major funding for an open source search engine, ideally through a mix of business and government funding, and make the algos as bulletproof as possible. Obviously, making your algo open source is a challenge, since the spammers have access to it, but that would also be its strength: none of this guessing about what's wrong or how to fix it. You'd know, and it could get fixed.

There are small projects like this, but none have the funding to create the huge server farms needed to actively spider the web.

I see this as the next major goal for the open source movement now that Linux is pretty much here to stay.

10:28 pm on Sept 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 8, 2003
posts:659
votes: 0


>huge server farms needed to actively spider the web

I don't know. All my sites get hit by Gigabot. I think redundant storage and returning SERPS are why Google has all the servers. I built little spiders with Perl and LWP that were lightning fast.
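For anyone who hasn't written one, a little spider of the sort described really is small. Here is a roughly equivalent fetch-and-extract-links loop as a Python sketch (the originals mentioned above were Perl/LWP); standard library only, and robots.txt handling and crawl delays are deliberately left out, though they would matter in real use.

from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    # Collects href values from <a> tags as the page is parsed.
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    # Breadth-first fetch of up to max_pages pages, following links as we go.
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen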

An open source search engine is probably a great idea if there were some way, maybe with a browser plugin, to gather information on which sites users stay on for a long time, but scripts could probably be written to fool that too.
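A sketch of how that plugin data might be aggregated, assuming a hypothetical stream of (site, seconds) reports: taking the median dwell time per site, rather than the mean, makes it a little harder for a handful of scripted visits to skew the number, though as noted it could still be fooled.

from collections import defaultdict
from statistics import median

def dwell_scores(visits):
    # visits: iterable of (site, seconds_on_page) reports from plugins.
    per_site = defaultdict(list)
    for site, seconds in visits:
        per_site[site].append(seconds)
    # Median per site; harder (though not impossible) to game than the mean.
    return {site: median(times) for site, times in per_site.items()}

sample = [("example.com", 120), ("example.com", 95), ("spam.example", 2)]
print(dwell_scores(sample))  # example.com: 107.5, spam.example: 2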

10:43 pm on Sept 10, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 15, 2004
posts:1300
votes: 0


My sites get hit by Gigabot too, but only for a few pages; I've never seen it download a whole site. It's not the spidering per se that is so processor intensive, it's analyzing and cataloging the results in real time for 4.5 billion or so web pages, plus running everything else that has to run to create meaningful SERPs when users click search, and having the speed necessary to deal with the traffic.
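To make the "analyzing and cataloging" point concrete, here is a toy Python sketch of the indexing side: build an inverted index (term -> pages containing it) and answer a simple AND query from it. Doing this continuously, for billions of pages, is where the server farms go; the tokenizing and query logic here are deliberately simplistic.

from collections import defaultdict

def build_index(pages):
    # pages: dict of url -> page text. Returns term -> set of urls.
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index

def search(index, query):
    # Return the urls containing every query term (simple AND query).
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results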

But given that the new MSN, Yahoo, and Google serve up pretty similar results, I'd say writing a decent search engine algo is really not all that hard. Not easy, but not impossible.

Having your only real access to the web come through for-profit corporations? I don't know, there is something really troubling about that. There should be at least one viable alternative, to help force things towards a bit of honesty.