Welcome to WebmasterWorld

Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 99 message thread spans 4 pages; this is page 2.
Creating alternative to Google serps
EditorialGuy
msg:4690191 - 6:50 pm on Jul 23, 2014 (gmt 0)


System: The following 6 messages were cut out of thread at: http://www.webmasterworld.com/google/4690067.htm [webmasterworld.com] by goodroi - 9:17 pm on Jul 23, 2014 (utc -5)


Need in-depth research information about a product? Google won't provide it so searchers have to go elsewhere.


Yes, and in most cases, they go elsewhere by clicking on Google search results.

 

jmccormac
msg:4690447 - 7:29 pm on Jul 24, 2014 (gmt 0)

OK, let's look at some practical issues.
Not even close to being the right questions.

1) Where does the underlying data come from?
Ever hear of the World Wide Web?

2) How do you correct for "garbage in, garbage out"?
By being smart rather than stupid about what the search engine spiders and includes.

3) You talk about "demoting Amazon" and, presumably, other brands that compete with your search engine's sponsors.
It's got sponsors now?

4) Where is the demand for this "alternative Google" coming from?
This might be hard to hear, but not everyone thinks that Google is great. A new search engine has to be better than Google (with the banjaxed results, that might be getting easier), give the searchers what they want and provide opportunity, if necessary, for monetisation.

Regards...jmcc

iammeiamfree
msg:4690448 - 7:29 pm on Jul 24, 2014 (gmt 0)


1) which set of Google SERPs do you scrape?


It could be whatever engine each webmaster thinks is best for building their mini index, so not necessarily Google, or not only Google.

2) How do you correct for "garbage in, garbage out"? Let's say that Google is ranking scrapers higher than the original sources, or that John Doe's site has disappeared from the SERPs because of a wrongly-applied manual penalty.

The webmaster would remove any spam sites from their mini index and contact other participants in their niche, asking them to do the same. If the spammer also set up a mini index, their site would be included but would rank lower, because the community in that niche had more often removed or demoted it. I am thinking the software could have a bunch of settings so that webmasters could tune it to perform well for their particular niche and clean up the results.


Are you going to write a complicated algorithm to run on top of Google's already complicated algorithm? Are you going to hack into Google's data centers to extract pages that aren't being shown? And who decides whether the penalty against John Doe was legitimate or unfair from a searcher's point of view?

John Doe could manually add their site to the network, and it would then be positioned by the webmasters involved in that topic.

3) You talk about "demoting Amazon" and, presumably, other brands that compete with your search engine's sponsors. Is this idea driven by what searchers want or what you want? And who decides which sites should be "demoted"? (Not the sponsors, presumably, or Amazon could just kick in a few million bucks and demand a spot at the top of the heap.)

Amazon could join the network, form relationships with the other webmasters in the niche, and see where they end up. The participants would have an interest in ranking Amazon fairly, because they would want the project to succeed and the users to be happy with the results. If Amazon attempted to bribe the webmasters in the niche, it would need to approach a large proportion of them, so there would be a risk of exposure or of being banned from the index. Perhaps traffic bribes would be OK, but monetary ones would lead to a ban?



4) Where is the demand for this "alternative Google" coming from? Duckduckgo certainly hasn't taken the world by storm, and Bing hasn't been able to take market share from Google despite huge expenditures on search technology, promotion, and advertising. Is there an audience for an "alternative Google" beyond disgruntled SEOs and site owners?


Absolutely. With the backing of the webmasters, users will not need to go to Google, since all the participating sites will already be part of a search engine and the users will therefore already be on a search site. They will get much better results, and traffic will flow much more freely across sites. They will discover how the web was always supposed to be.

superclown2
msg:4690452 - 7:53 pm on Jul 24, 2014 (gmt 0)

Are you suggesting that the search engineers at Google, Bing, Yandex, etc. are marketers?


Is that a serious question? Wow. I think further comment is unnecessary.

There are none so blind, etc., etc.

CaptainSalad2
msg:4690453 - 7:55 pm on Jul 24, 2014 (gmt 0)

If there were a serious grassroots non-profit project, I'd be happy to invest several hundred pounds per month in helping it get going (as a business expense, of course ;)). I already donate money to the W3C (I'm not sure why anymore). Basically, the way Google's going, in a few years I, like my clients, will be broke anyway, so why not :)


Pool resources, build something free/open and save what's left of the internet; people can do anything if there is a will! Hell, if Google goes all ads I'm back to being a personal trainer, and although the side effects (abs, and random women wanting photos with you, much to your wife's disgust) are awesome, I'd rather sit on my butt tapping the keyboard all day and read EG telling me how Google is the best search engine ever. Love you, EG ;)

brotherhood of LAN
msg:4690455 - 8:09 pm on Jul 24, 2014 (gmt 0)

https://aws.amazon.com/datasets/41740

That would be an excellent starting point for someone to build an index/ranking algorithm without laying out large amounts of cash and setting up infrastructure.

Throw in freebase, wikipedia and other publicly available data for some easy contextualising.

Surely everyone is a marketer. If you built an engine of high quality you'd really be doing yourself a disservice by not telling anyone.
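As a sketch of a first step against such a crawl corpus: pulling the title and outlinks from each captured page, which is the raw material for both an index and a link graph. This uses only the standard library; the toy page at the bottom stands in for one crawl record, and whatever WARC-reading layer would feed it real pages is assumed, not shown.

```python
from html.parser import HTMLParser

class PageExtractor(HTMLParser):
    """Collects the <title> text and outgoing hrefs from one HTML page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract(html):
    """Return (title, outlinks) for one crawled page."""
    parser = PageExtractor()
    parser.feed(html)
    return parser.title.strip(), parser.links

# toy record standing in for one page out of the crawl corpus
title, links = extract(
    '<html><head><title>Widgets</title></head>'
    '<body><a href="http://example.com/a">a</a></body></html>'
)
```

Title text feeds the "easy contextualising" step; the outlinks are exactly what a link-based ranking algorithm would consume.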

coachm
msg:4690456 - 8:19 pm on Jul 24, 2014 (gmt 0)

If the goal is to create better results and save time, there is a solution I have been using for years: leveraging Google's own custom search engines. If you don't know, you can create a search engine but choose which pages and sites should be included in the search. These are called niche search engines.

I've done this for about seven topics, some related to specific business topics (leadership and management is one), and my main hobby, which is vintage audiophile stuff.

It's free but displays ads (which I believe I get paid for); there's also a paid option.

Anyway, the thing is that in many, perhaps most, sufficiently tight niches, you can get excellent results just searching 20-30 authority sites. That's the case with audiophile stuff, and also with business topics, since there just aren't that many fabulous sites in each niche.

I chose sites to include in the various engines by picking information-rich sites with minimal commercialism, or, if the sites were commercial, high information-to-sales-pitch ratios.

The result is that if I need to find information about, let's say, a specific Advent speaker made in the '70s, all I do is enter the model name and, zap, the results are focused, with no BS and no junk.

I have made these engines publicly accessible (the URL is in my profile, I think), but I haven't really told many people about them, so usage is minuscule (except for me). I use them for research purposes.

For me, the only way we'd get better results is to go niche, and this capability really works well. Can one scale it?

That would depend. Google probably won't like it and probably limits the APIs, but you can actually create a custom search engine and give others permission to add sites and manage the chosen content. You can also tag sites and pages, and the user can be presented with "tabs".

So for example, in my audiophile search engine, there's a tab for buying and selling, one for repairs/parts and one for info to choose, and of course the user can see all the search results.

I think it's a model that works, but scale and Google's reaction to widespread use would be potential problems, in addition to letting enough people know about the niche engine to actually "make a dent".

EditorialGuy
msg:4690462 - 9:21 pm on Jul 24, 2014 (gmt 0)

The original proposal in this thread was:

Let's create a new engine that doesn't have its own tech, but rather, scrapes Google, moves the ads back to the right sidebar, demotes Amazon, Demand Media, etc., and sends the results back to the user.


We seem to have gone beyond that (and certainly beyond the topic of Google Search). IMHO, this thread would be a better fit for the Alternative Search Engines forum, which could use the traffic:

[webmasterworld.com...]

jmccormac
msg:4690466 - 9:36 pm on Jul 24, 2014 (gmt 0)

Discussing a Google Killer search engine in the Google section? Sheer heresy! :) Wonder if a few more of the Alternative Search Engines posters are still around?

Regards...jmcc

rish3
msg:4690469 - 10:05 pm on Jul 24, 2014 (gmt 0)

The original proposal in this thread was:


Proposal is perhaps a bit of a stretch. I posted that to point out the irony that Google would be unhappy about it, even though that's basically what the KG is.

But, as often happens, the thread has taken on a life of its own :)

iammeiamfree
msg:4690473 - 11:02 pm on Jul 24, 2014 (gmt 0)


I think it's a model that works, but scale and Google's reaction to widespread use would be potential problems, in addition to letting enough people know about the niche engine to actually "make a dent".


Seriously, we have to take control out of Google's hands. Come Monday morning they will be working to co-opt these ideas and keep webmasters aboard the sinking ship. Google is rife with CIA and NSA; it is basically part of the security services, whereas we as webmasters have a different job to do.

Our job is to carry forward what was created by our forebears in the computer world. We as webmasters are properly responsible for managing search. This is why we are called webmasters; it is a type of modern-day wizard. A wizard would have 10 or 15 years of training. We may not all have had the appropriate training, but nevertheless there are those of us who are relatively wise and competent enough to take on the task of ensuring the right direction for the future of search and web development. This is really important.

jmccormac
msg:4690476 - 11:38 pm on Jul 24, 2014 (gmt 0)

We may not all have had the appropriate training but nevertheless there are those of us who are relatively wise and competent to take on the task of ensuring the right direction for the future of search and web development. This is really important.
Building search engines is a bit different to building websites and requires people to be able to think in a different manner. The last thing such a venture would want is to turn into another Wikia Search, with a bunch of enthusiasts and no professionals with real-world experience. Wizards? Pah! I'm a NetGod. :)

Just to illustrate the difference in thinking: in World War II, when the Special Operations Executive (not a great example) was working out how to take out buildings, equipment, and bridges efficiently, it didn't ask architects and engineers; it asked insurance claims adjusters.

Regards...jmcc

EditorialGuy
msg:4690486 - 1:06 am on Jul 25, 2014 (gmt 0)

We as webmasters are properly responsible for managing search.


No, search should be "managed" (I'd probably say "conducted") by third-party specialists, for the reasons jmccormac mentioned but also to preserve the impartiality of search.

iammeiamfree
msg:4690521 - 8:03 am on Jul 25, 2014 (gmt 0)

I don't find that argument convincing. The current state of things is anything but impartial. A game has been created setting webmasters against webmasters (divide and conquer): the thinking that other webmasters are competition and that a beguiling imposter at the heart of the military-industrial complex should manage search.

In reality, webmasters are colleagues who should work together to innovate in their niche. Rather than sites being separate entities all reinventing the wheel, they should be interconnected to improve efficiency and quality. The search specialists absolutely have their place in the project, but the webmasters themselves should be directly involved in managing their own mini search engines that form parts of the whole.

It is not about blowing up buildings. We are talking about something more like building roads, and if you want to build an efficient road network then the architects are definitely people who should be designing and planning how their individual building is going to fit in with the neighbouring community. Are there schools in the area, a suitable bus service nearby, etc.?

Big mistake to think the webmasters can be disenfranchised. We build the web, and we have the power to get people to download a toolbar with our search engine and convince the public that it is the better way. This idea about the mini search engine is about giving webmasters powerful tools to help their visitors find the best that is available, and the micro engine has a lot of power even out of the gate as a standalone index. It would be like being able to design the route to the local bakery: there is the post office, here is the primary school, etc. Oh dear, we have discovered there is no veterinary service in the area. Let's get one set up.

mcneely
msg:4690584 - 1:23 pm on Jul 25, 2014 (gmt 0)

And who'd dominate the open-source contributions?


Okay - so you've got my attention.

The course of this conversation seems to be putting the cart before the horse... you're talking about all this big scalable stuff, big money, when in reality most of this sort of thing can start with very minimal effort and cost.

I could see a possible Mozilla type of model, with contributing entities, i.e. those with boxes available to run on the side: a conglomeration of connected databases across the web. (For instance, I have 4 boxes to apply to the cause, and I wouldn't be at all above letting a trusted source access my niche search DBs for just such an effort as this.)

I've put a lot of work into what I've got and my listings are meticulously clean.

No tracking, and no amounts of immense data storage beyond what would be needed to supply the properly queried results.

I don't like Nutch... never have. Nutch has been sorely abused over the years, and it, along with variations of it, has been disallowed on my sites for quite literally years.

mcneely
msg:4690585 - 1:28 pm on Jul 25, 2014 (gmt 0)

I think DDG runs spiders.


Yes .. it's DuckDuckBot 1.0 and it's the crawler that DuckDuckGo uses these days.

RedBar
msg:4690588 - 1:32 pm on Jul 25, 2014 (gmt 0)

@iammeiamfree

Please could you use some line breaks; your posts are difficult to read. Thanks.

mcneely
msg:4690598 - 1:43 pm on Jul 25, 2014 (gmt 0)

@EditorialGuy

You can't scrape Google or Bing, or anyone else for that matter. If you did, you would be spending the rest of your life sifting through the garbage that these indexes think are relevant.

You'll write the rules into the bot and send it out into the World Wide Web ... You would have your clean results based on what sort of rules you assign for any given parsing sessions -- it all grows from there.

Provide a set of results that totally blows away the crap that these other engines value so highly. Real-world stuff, from dog groomers to SEO firms and everything in between.

In the marketing/search industry, you don't go out looking for demand ... You create the demand - just like Yahoo, Alta Vista, HotBot, Excite, and others did back in the day.

All of the underlying data comes from us. I'm pretty sure there are those of us here who can pull it off without having to scrape wikis and other indexes.

You want new, fresh, and relevant results? .. Then your new search engine has to start at the beginning and move along from there.

EditorialGuy
msg:4690601 - 2:11 pm on Jul 25, 2014 (gmt 0)

@EditorialGuy

You can't scrape Google or Bing, or anyone else for that matter. If you did, you would be spending the rest of your life sifting through the garbage that these indexes think are relevant.


Yep, that's what I was saying, too.

You want new, fresh, and relevant results? .. Then your new search engine has to start at the beginning and move along from there.


Precisely. And that's the hard part. Start building a DIY search engine today, and you just might catch up with AltaVista or HotBot in a few years.

bumpski
msg:4690602 - 2:11 pm on Jul 25, 2014 (gmt 0)

A BitTorrent-style approach could address many of the issues mentioned in previous posts. It provides distributed, free processing. Security would be a bear, though. Open source could certainly help there.

There's really no reason a BitTorrent-style approach couldn't provide a free "cloud", and replace Facebook too.

It would evolve into SkyNet and consume Google!

But I digress.

mcneely
msg:4690613 - 2:29 pm on Jul 25, 2014 (gmt 0)

@EditorialGuy

So AltaVista and the others might not have been the best examples... and even still, I don't mind being patronized.

My point is that you've got to have your own...

In the early days, everyone scraped everyone else, and when push came to shove (Yahoo purchasing AllTheWeb for instance) some search engines, like Google, were left having to fend for themselves - They had to hot foot it into building their own.

The problem I see with regard to starting a search engine these days is that too many people, including you, are much too focused on Google: oooh, Google is too big, we can never beat Google.

Focusing on beating anything is your first mistake here. The only one that's going to beat Google, is Google itself. So all bets are off for anyone who thinks that there is a Google killer out there separate and independent of Google.

Google's core has been on the decline for a while now, and this might not be such a bad time to really start considering some alternatives.

When Google goes down, and it will, there's going to be a vacuum to fill. Wouldn't it be great if there was something there to fill the void?

If you go into this with the mindset of beating Google, or Facebook, or whoever, then you might as well just hang it up and go home.

If you go into this with a genuine passion for creating a great search experience, then chances are good that you'll make it.

But you've got to start at the bottom .. just like everybody else did.

EditorialGuy
msg:4690647 - 4:11 pm on Jul 25, 2014 (gmt 0)

If you go into this with a genuine passion for creating a great search experience...


But is that the goal of the people who are calling for "an alternative to Google"?

Would a crowdsourced search engine be any less corrupt than the Open Directory Project?

CaptainSalad2
msg:4690656 - 4:53 pm on Jul 25, 2014 (gmt 0)

No man, let's make a massive profit and use 100% of the profit after expenses to cure cancer, then plough 100% of the profits from the cancer cure into solving another human condition, then plough 100% of the profits into space travel! Let's Star Trek it, all from open-source people working for people, for the betterment of mankind :)

It could be epic; it just needs a non-commercial spark! As a species we can do anything if we think open source, act selflessly, work together and use the current commercial climate to fund the betterment of all mankind! Okay, I'm kinda drunk, but I love the idea of human beings working for the greater good rather than narrow-minded personal profit :)

coachm
msg:4690667 - 5:22 pm on Jul 25, 2014 (gmt 0)

I'm surprised that on the few occasions I've mentioned using Google custom search to create niche search engines, few comment on or are excited by what I think is a powerful business model that yields much better search results. I haven't pursued it because I'm much closer to retirement than I was and have no desire now to start a new enterprise.

Seriously, any of you can make a search engine that is better within its niche than any universal engine, AND use AdSense revenue to fund it, if you want to.

The key is small, targeted and simple. For most niches, let's say art history or sculpture or the topics I have used as "proof of concept" (again, see my profile), you need fewer than one HUNDRED sites to cover the entire topic well, if they are hand-chosen. The niche engine can be created by ONE person in an afternoon, and it kicks *ss in terms of SERP quality.

Not only that, but the code can be made available to other sites for their use, again at no charge, which makes it a semi-viral process. Get such an engine on informational sites, let's say art museums (if we're talking about art), and you increase that visibility.

(I make the code for my engines available to anyone that wants to use them on their sites).

The only issue I don't know about is scalability: whether there are limits on how many searches Google will allow, and whether that would be a problem.

Try it out for yourself. Create your own engine on the topic of your website, hobby, business, whatever, and you'll find that you can surpass the big engines' SERP quality so quickly and easily, it's amazing.

NO universal search engine can compete. You don't come at it trying to be everything to everyone. You niche your engine.

So, as a business model, the issue is ONLY about getting people to use the thing, and that's the main reason that at this time in my life I haven't pursued this as a business.

Crowdsourcing doesn't work, though. And you don't need it for a niche engine. A couple of people, let's say ardent hobbyists, can create an engine of excellent quality. You don't have to crowdsource, like DMOZ did, so you avoid the corruption issue.

Will webmasters be unhappy if they aren't included? Who cares IF the goal is to create engines that allow searchers to find what they want without the spam, popups, and ridiculous commercialism.

jmccormac
msg:4690670 - 5:44 pm on Jul 25, 2014 (gmt 0)

The webmaster would remove any spam sites in their mini index and contact other participants in their niche asking them to do the same.
Self regulation is no regulation. It doesn't work when financial advantages are involved.

The Wikia Search venture was meant to be a social-media-driven search engine with voting on SERPs. It was a great idea in theory, but the fundamental problems (lack of an indexing strategy and a complete lack of index-maintenance expertise) gave it a reality check from which it didn't bounce back.

@Coachm The idea of a pure interest-driven search engine is a classic vertical search engine. The problem with the approach from a business point of view is that the business doesn't have control over the data and doesn't own the search engine. However, merging it with a high-quality site or directory would be a good idea. The problem is that one just cannot trust Google not to change the terms of access if it becomes successful.

Regards...jmcc

mcneely
msg:4690672 - 6:16 pm on Jul 25, 2014 (gmt 0)

@coachm

Spot on ... Since I run multiple niche search indexes, with my smallest using any combination of 21105343 keywords, minus the 'ifs' 'ands' and 'buts' I can't help but totally agree with you.

I don't use Google however - I've written my own and they all scale quite nicely thank you very much :).

As an aside, I've been called out on my UA not adhering to robots.txt in the past, but I think I've got that pretty much all cleared up now.

RedBar
msg:4690689 - 7:56 pm on Jul 25, 2014 (gmt 0)

The problem is that one just cannot trust Google not to change the terms of access if it becomes successful.


Their current stance:

choosing the content your users search: your site, a collection of sites that you choose or the entire web. You can also prioritise and restrict search to specific sections of sites.

webcentric
msg:4690716 - 11:26 pm on Jul 25, 2014 (gmt 0)

Bottom line: if you own the engine, you are the quality control. Want to let people bribe their way to the top of your results? It's your engine. Want to list everything on the web? Have at it. What is interesting, though, is that this discussion is pushing the idea that search sites, directory sites and content sites are all just websites in the grand scheme of things, and you can easily blur the line between these types of sites on the way to developing some very high-quality, and appreciably authoritative, Internet resources. Want to compete with the big fish? Think small and let them worry about what scales. Be great on your own scale. As soon as I get in front of an actual keyboard, I'll have more to add. This is a great topic, but typing on a touch pad sucks (or I suck at it).

iammeiamfree
msg:4691079 - 4:30 pm on Jul 27, 2014 (gmt 0)

Self regulation is no regulation. It doesn't work when financial advantages are involved.


For sure there is a problem when you are dealing with commercial queries and that applies to whichever method you use.

With individual sites each having their own mini search engine for their niche, and then an amalgamation of many mini niche indexes used as the basis for a big search engine, the problem is there too.

For something like a model railway niche it could work without any particular problem. The sites are probably quite cooperative and could put together good indexes.

For something like a real estate niche, you would probably have some sites creating dozens of extra mini indexes on separate domains so they could get more influence over the rankings.

The thing is that I am sure there is a solution. You could have people giving favour to sites that send them traffic, and this would mean that those sites rank higher than sites that do not refer much traffic. This would be a good thing, because those sites would receive traffic but then send it on.

You could give a boost to the sites who have updated their mini indexes most recently.

You could have users rate the mini indexes as they browse through the niche.

The idea is to make each web page content followed by resources, so you could track the flow of a user as they browse the niche. They might enter at site 1, then visit site 2 from site 1 and site 3 from site 2 (more efficient than hitting the back button). Site 1 is ranked number 1 because it has a very good selection of links and its index is one of the best in the niche. Sites 2 and 3 have given good positions to site 1 because it is generous with traffic and has good resources and content (including giving sites 2 and 3 good positions). Site 9 might have good content, but it doesn't link out much and has not participated in the engine.

The real estate agent with 10 spam sites trying to game the system might have been excluded from the indexes at sites 1, 2 and 3 for 9 out of the 10 domains, and sites 1, 2 and 3 might have selected a different domain from the spam real estate agent to include (they do have some decent properties). Then, tracking the flow of traffic through the niche, it is discovered that visitors are not happy with the 10 spam sites because they are all practically the same, so they get demoted, and the spam agent decides to remove them and play by the rules. Now his rankings improve for a single domain, and because he is driving traffic into the network from offline advertising, sites 1, 2 and 3 respond by giving the ex-spam site an improved position, and it moves up to number 4.

Anyway just an example of how it could work. The point being that the problem can be solved I believe if we put our minds to it.

Here is another idea: site 1 might also have a very good index for a number of related queries. The real estate visitors might want to know about primary schools in the area where they are interested in buying a house, and because site 1 has been working hard on its index and monitoring the keywords users enter, it also has good indexes, content and links for more related queries (e.g. primary schools) than sites 2 and 3, so that factor helps its rankings in the niche. The sites in the primary school niche rate site 1's content for primary schools highly, and some of them have also set up pages, indexes and links for real estate. So the webmasters in the primary school niche are affecting the rankings in the real estate niche, and they are relatively impartial. If you consider the influence of some other related queries on the real estate rankings, let's say the employment niche, it quickly gets to the stage where the most effective method of improving rankings is to work on your search engine, improve content and build relationships with your colleagues. Trying to game the system just doesn't work. Then again, it might be a recipe for gang warfare, but you could have a death-toll penalty.
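One way the traffic-flow idea above could be scored is a PageRank-style iteration over the referral graph: sites that receive traffic and send it on accumulate rank, while hoarders sink toward the baseline. A minimal sketch with hypothetical site names; the visit counts, damping factor and iteration count are illustrative assumptions, not a worked-out design.

```python
def traffic_rank(referrals, damping=0.85, iters=50):
    """Rank sites by the human traffic they pass on, PageRank-style.

    referrals: {site: {destination_site: visit_count}}
    Returns {site: score}; higher means better-connected in the niche.
    """
    sites = set(referrals) | {d for outs in referrals.values() for d in outs}
    rank = {s: 1.0 / len(sites) for s in sites}
    for _ in range(iters):
        new = {s: (1 - damping) / len(sites) for s in sites}
        for src, outs in referrals.items():
            total = sum(outs.values())
            for dst, visits in outs.items():
                # each site's rank flows out in proportion to referred visits
                new[dst] += damping * rank[src] * visits / total
        rank = new
    return rank

# hypothetical niche: site1 refers traffic generously, site9 hoards it
ranks = traffic_rank({
    "site1": {"site2": 80, "site3": 20},
    "site2": {"site1": 50},
    "site3": {"site1": 30},
    "site9": {},
})
```

Here site1 ends up on top because sites 2 and 3 send their visitors back to it, while site9, which never refers anyone on, stays at the floor.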

jmccormac
msg:4691101 - 6:43 pm on Jul 27, 2014 (gmt 0)

For sure there is a problem when you are dealing with commercial queries and that applies to whichever method you use.
It is a problem in all areas due to naturally occurring competition.

The thing is that I am sure there is a solution. You could have people giving favour to sites that send them traffic and this would mean that those sites rank higher than sites that do not refer much traffic. This would be a good thing coz those sites would receive traffic but then send it on.
Perhaps. But it would require that human traffic be distinguished from bot traffic.

You could give a boost to the sites who have updated their mini indexes most recently.
It would essentially reward content churning.

The point being that the problem can be solved I believe if we put our minds to it.
There is a solution: don't use Google's GIGO approach of relying on sorting out the spam after the spam has destroyed the index. This is where the simpleton approach of Garbage In, Garbage/Google Out used by Google just causes problems.

Trying to game the system just doesn't work.
There is a fast and elegant method of stopping a lot of the issues that cause problems for search engine submissions: one could deep-six any meatbot/spammer-submitted site.

There's a lot of talk in the thread about how things should be done after the search engine is built, but very little discussion of how to create an index or build the search engine. Search engine developers tend to think about these elements in some depth. Google's Infinite Monkeys approach of trying to spider everything and then throwing a bunch of meatbots at the results to bash them into shape doesn't work, and it doesn't scale well.

Regards...jmcc

iammeiamfree
msg:4691364 - 11:56 pm on Jul 28, 2014 (gmt 0)

I have an idea for a Version 1.0 to launch with.

We could start with a very simple traffic-exchange concept.

It would produce a set of 5 or 10 links on a page, based on incoming traffic from those URLs.

It would need to be able to work out that the visitor was real. It could use server logs or its own visitor records.

The webmaster would be able to review new referrers and decide whether they should be included, and for which pages. By default all links would be nofollow, but upon review a link could be set to follow.

Urls could be excluded where required.

New referrers could be invited to start sending traffic with a "your site here" link.

So you would install the script on your site and be able to log in there and check the new referrers. A cron job would be set up to check for 404s, redirects, etc. You could begin emailing sites telling them how it works so they could start exchanging related traffic.

A piece of code would go on each page to which you wish to add related links. The code would generate the links and possibly monitor referral traffic coming to that page.

Once this is set up and running effectively we can move onto version 1.1.

This seems like a good plan of action. Easy to get started. Quite a bit of power and something that should be an important part of the proper engine.

Basically there is a beginning to a database of urls.

The webmasters are getting involved and referring traffic.

More traffic flowing between sites. Fewer back-button clicks. More overall visitor activity in our network.

Growth underway of a user base.

A firm foundation to move forward from.
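The Version 1.0 described above can be sketched in a few lines: count human referral visits per URL, keep everything nofollow until the webmaster reviews it, and render the top handful of links. All names here are hypothetical; log parsing, bot detection and the cron checks would sit around this core.

```python
from collections import Counter

class LinkExchange:
    """Minimal core of the referrer-driven link widget described above."""

    def __init__(self, max_links=5):
        self.max_links = max_links
        self.visits = Counter()   # referring URL -> human visit count
        self.approved = set()     # reviewed by the webmaster: set to follow
        self.excluded = set()     # never shown

    def record_referral(self, referrer, is_human=True):
        """Called per incoming visit; bot traffic is ignored."""
        if is_human and referrer not in self.excluded:
            self.visits[referrer] += 1

    def approve(self, referrer):
        self.approved.add(referrer)

    def exclude(self, referrer):
        self.excluded.add(referrer)
        self.visits.pop(referrer, None)

    def render_links(self):
        """Top referrers as (url, rel) pairs; nofollow until reviewed."""
        top = [url for url, _ in self.visits.most_common(self.max_links)]
        return [(url, "follow" if url in self.approved else "nofollow")
                for url in top]

# hypothetical traffic: site-a sends three real visitors, site-b one
ex = LinkExchange(max_links=2)
for _ in range(3):
    ex.record_referral("http://site-a.example/")
ex.record_referral("http://site-b.example/")
ex.record_referral("http://bot.example/", is_human=False)
ex.approve("http://site-a.example/")
links = ex.render_links()
```

The nofollow-by-default choice matches the review step above: a new referrer earns a followed link only after a human has looked at it.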

webcentric
msg:4691387 - 3:09 am on Jul 29, 2014 (gmt 0)

There's a lot of talk in the thread of how things should be done after the search engine is built but very little discussion on how to create an index or build the search engine


Spent today digging deeply into inverted indexes, forward indexes, etc., and, while search engine design is not a simple subject, I'm of the belief that it doesn't have to be as complicated as the scale and scope inherent in the major engines dictate.

Full-text indexing and maintaining a large, constantly changing inverted index can present a host of challenges, but scaling down and being rather selective in what you index could somewhat minimize the downsides associated with rebuilding the index when the underlying data changes. Small engines could do this in a matter of minutes during off hours without causing too much disruption to the end-user experience. I could see this happening once a week or less if you're really hand-picking your data.

If you're crawling billions of pages like Google, then you pretty much need to be constantly maintaining the index, and it's gonna take a lot of time and server resources to do it. In a smaller context, updates can be batched during off hours to greatly minimize the impact on the end-user side. So my first point is about scale: engineering the next Google is one thing; engineering a smaller niche engine based on a selective body of content is another. Both will benefit from careful engineering, but if quality is used as a scoping factor, you may be able to avoid some of the issues associated with mega-indexes. Who says you have to index everything to be useful and relevant? Certainly not me.

Do I have to be fair? No. It's my index. If I want to let people know about your content, lucky you. Enjoy the traffic.

I just built an inverted index for the King James Bible in less than 30 seconds. Translate this to a couple hundred thousand rows of page data (which would make a pretty decent library of information on a given subject) and we've got the foundation of a fairly manageable index.
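For comparison, here is roughly what such a build looks like: a dictionary from term to the set of documents containing it, plus an AND query by set intersection. The two toy verses and their doc IDs are placeholders standing in for the couple hundred thousand rows mentioned above.

```python
import re
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}. Returns an inverted index: term -> {doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND semantics: only documents containing every query term."""
    terms = re.findall(r"[a-z]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# two toy documents standing in for the full corpus
docs = {
    "kjv:gen-1-1": "In the beginning God created the heaven and the earth",
    "kjv:john-1-1": "In the beginning was the Word",
}
index = build_index(docs)
```

Rebuilding from scratch on each batch update, as suggested above, stays cheap at small scale precisely because the whole structure is just this one map.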
