Mining world knowledge for analysis of search engine content

My research on search engines

         

JohnKing1

1:47 am on May 11, 2021 (gmt 0)

10+ Year Member



Here is a journal paper that I co-authored with Yuefeng Li, Xiaohui Tao and Richi Nayak. It was published in the Web Intelligence and Agent Systems Journal. It has received sixty-two citations, including two Highly Influential Citations. For the full list of citations see [semanticscholar.org]. You can download the PDF of the paper for free at [eprints.qut.edu.au].
I think that this paper will be very useful to the members of this forum.

JohnKing1

12:33 am on May 12, 2021 (gmt 0)

10+ Year Member



My intuition is that it is better to target keywords related to subjects that are not heavily covered in search engines like Google, as there is less competition and it is easier to rise in the rankings than for subjects which have more coverage. However, this is just my idea and it will need more testing.
I am sure that Google is automatically classifying webpages based on subject, and that pages with information highly related to a single subject are ranked higher than pages with more diverse subject matter. It is fairly easy to extract all words related to a subject and use them in a webpage and thus appear more relevant to Google. I think that webmasters should be careful to make all the subject matter of a webpage highly related to a single subject, and this will lead it to be an authority on that subject. Pages with more diverse subject matter will not be ranked as highly as pages highly related to a single subject.

JohnKing1

12:51 am on May 12, 2021 (gmt 0)

10+ Year Member



Another insight from the paper is that subject-based SEO is going to become more and more important over time. The more subject-centric a webpage is the more likely it is to be treated as an authority or a hub. Google and other search engines are automatically classifying webpages based on subject matter, and if you want your page to rank highly then you should make the page as subject-related as possible. This means removing all terms that are not highly related to the subject and only using terms that occur in one and only one subject.
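
One rough way to put a number on "subject-centric" is the share of a page's recognised terms that fall under its dominant subject. The sketch below is only an illustration: the `TERM_SUBJECT` table and the `subject_focus` function are made up for this example, and nothing here is claimed to be the method from the paper or something Google is known to do.

```python
from collections import Counter

# Hypothetical term-to-subject table; a real one would be learned from a corpus.
TERM_SUBJECT = {
    "engine": "cars",
    "gearbox": "cars",
    "sedan": "cars",
    "vaccine": "medicine",
    "antibody": "medicine",
}

def subject_focus(text):
    """Return (dominant_subject, share of recognised terms in that subject)."""
    words = text.lower().split()
    counts = Counter(TERM_SUBJECT[w] for w in words if w in TERM_SUBJECT)
    if not counts:
        return None, 0.0
    subject, hits = counts.most_common(1)[0]
    return subject, hits / sum(counts.values())

# A page that stays on one subject versus one that mixes two.
focused = subject_focus("sedan engine gearbox engine")
mixed = subject_focus("sedan engine vaccine antibody")
print(focused)
print(mixed)
```

A share close to 1.0 means nearly every recognised term points at one subject; a lower share means the page is spread across several. Whether a search engine rewards that is a separate question.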

NickMNS

12:59 am on May 12, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My intuition is that it is better to target keywords related to subjects that are not heavily covered in search engines like Google, as there is less competition and it is easier to rise in the rankings than for subjects which have more coverage.

This is true, but you are not considering an important fact: if competition is low, it is likely that search volume is low too, so writing content to specifically target such search terms will not produce enough traffic to generate sufficient revenue to cover your cost of producing and publishing the content. That is the most likely reason that competition is low for those search terms.

Also, you appear to be approaching this from the paradigm of the "keyword". The concept of the "keyword" is fading and in my opinion is long dead. Google has introduced algorithms such as Hummingbird and BERT that use NLP and other advanced concepts to derive meaning from pages far beyond a direct relation with a keyword.

It is fairly easy to extract all words related to a subject and use them in a webpage and thus appear more relevant to Google.

This is what spam sites are made of. This technique may have worked in 1995, but it certainly won't work today. Whether you use the word "car", "automobile" or "vehicle", Google doesn't care; their algo is smart enough to know that they all refer to the same concept. By the same token, if you use a phrase like "RNA is used as the vehicle to deliver vaccine to the cells", Google's algo is also smart enough to know that in that context "vehicle" is no longer the same as "car".

I think that webmasters should be careful to make all the subject matter of a webpage highly related to a single subject, and this will lead it to be an authority on that subject. Pages with more diverse subject matter will not be ranked as highly as pages highly related to a single subject.

I don't agree with this claim. It depends on the topic and most of all on the searcher's intent: a searcher may be looking for a broad view of some concept and thus would be best served by a page that provides a more general explanation, or the searcher could be researching some specific topic. As a publisher you need to know your target audience and publish content that meets their needs.

JohnKing1

1:33 am on May 12, 2021 (gmt 0)

10+ Year Member



Yes, I was just saying that it was my intuition that low-use subjects may do better; it just depends on the search engine and how its algorithms work.

It doesn't matter if you use keywords, phrases, sentences or documents, subject-based SEO is important. The more closely a webpage or site is related to a subject, the more chance it has of doing well in the rankings. That is why Wikipedia does well in the rankings: most of its pages are highly related to a single subject. If you do a search for any sort of subject matter, it is highly likely that the top results in Google and other search engines are related to that subject.

Have you even read my paper? I know that it has a number of contributions to SEO that you will be interested in.

NickMNS

2:06 am on May 12, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't read your paper, and I'm not convinced that I should.

The more subject-centric a webpage is the more likely it is to be treated as an authority or a hub.

There is a lot more to being the "authority" on a topic than publishing a webpage that is focused on only one specific topic. The most obvious is accuracy, but there are many others. A good place to start is the "Google search quality raters guideline" document that Google publishes; it provides an in-depth explanation of what indicators are used by Google to determine "authority". Google refers to it as EAT: "Expertise, Authoritativeness, Trustworthiness".

Here is a thread from September 2019 about the "Google search quality raters guideline"; there is likely an even more recent version.
[webmasterworld.com...]

JohnKing1

2:26 am on May 12, 2021 (gmt 0)

10+ Year Member



Here is the abstract of the paper:

Little is known about the content of the major search engines. We present an automatic learning method which trains an ontology with world knowledge of hundreds of different subjects in a three-level taxonomy covering all the documents offered in our university library. We then mine this ontology to find important classification rules, and then use these rules to perform an extensive analysis of the content of the largest general purpose internet search engines in use today. Instead of representing documents and collections as a set of terms, we represent them as a set of subjects, which is a highly efficient representation, leading to a more robust representation of information and a decrease of synonymy.
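
The last sentence of the abstract can be illustrated with a toy example. This is a minimal sketch with a hand-written term-to-subject table (an assumption for illustration; the paper learns its ontology from a library catalogue), showing how two texts that share no words can still share a subject. The names `TERM_TO_SUBJECT`, `as_terms` and `as_subjects` are invented for this sketch.

```python
# Hypothetical hand-written table; the paper trains its ontology automatically.
TERM_TO_SUBJECT = {
    "car": "motor vehicles",
    "automobile": "motor vehicles",
    "sedan": "motor vehicles",
    "physician": "medicine",
    "doctor": "medicine",
}

def as_terms(text):
    """Classic representation: the set of words in the text."""
    return set(text.lower().split())

def as_subjects(text):
    """Subject representation: the set of subjects the words map to."""
    return {TERM_TO_SUBJECT[w] for w in as_terms(text) if w in TERM_TO_SUBJECT}

doc_a = "used car for sale"
doc_b = "automobile dealer"

print(as_terms(doc_a) & as_terms(doc_b))        # no words in common
print(as_subjects(doc_a) & as_subjects(doc_b))  # one subject in common
```

Because "car" and "automobile" collapse into one subject, the comparison no longer depends on which synonym each text happened to use.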

Also you might want to do some research on my co-authors Professor Yuefeng Li, Associate Professor Xiaohui Tao and Professor Richi Nayak. They are leaders in their fields.

lammert

5:05 am on May 12, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For those who don't have the time to read the paper, here is a complete summary:
Published in 2007
Table 1: The search engines used in this paper: Altavista, AOL, Ask Jeeves, Google, MSN Search, Teoma, Wisenut, Yahoo Search
Leaders in the SEO field are not defined by the number of citations they receive from peers residing in the same heavily subsidized academic bubble. Instead, leaders in the SEO field are defined by the millions they rake in on a continuous basis for themselves and their clients through their expertise.

@JohnKing1, if you want to learn the basics of SEO in 2021, build your own site and submit it to the review forum in the supporters section of this board. There we can help you out with the basics to get your site decent search engine positions.

JohnKing1

5:23 am on May 12, 2021 (gmt 0)

10+ Year Member



So you are saying that automatic classification, data mining and large scale ontologies have no use in the world wide web in 2021? Why is it then that Google and Facebook and many other leading big-tech companies are so heavily investing in all of these? I am just saying that one of the findings of the paper could be that subject-based SEO is of use when considering how to get a site higher in Google and other search engines.

lammert

5:41 am on May 12, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I am not saying the techniques are useless. I am saying that a paper published 15 years ago, before the first iPhone was shipped, is useless in a mobile-first search world where seven of the eight search engines on which the paper was based have ceased operations.

JohnKing1

6:03 am on May 12, 2021 (gmt 0)

10+ Year Member



Subject-based SEO is still very relevant, and I am sure that many of the members of this forum use it without realising it. The more subject-specific a webpage or website is, the more likely a search engine will judge that it is relevant to that subject. Note that I am not ruling out other factors such as text quality measures and other off-page measurements. But you can be certain that Google is running subject-based automatic classification methods on all of your websites, and Google is only going to further increase its use of these methods. The method from my paper has the capability to classify documents and collections using over ten thousand subjects, and I can easily increase the number of subjects that I can classify with.
If you want to see the full results from my work, search Google for "Search Engine Content Analysis" together with my name.

SumGuy

11:57 pm on May 18, 2021 (gmt 0)

5+ Year Member Top Contributors Of The Month



My searches probably consist of 2 to 6 carefully-chosen words and I'm looking for a hit where all the words are strung closely together. I tend to find useful results in links pointing to discussion forums. Discussion forums can contain a huge amount of information, and the idea of pages or pagination is, I would think, not useful in that context.

By the way, does your research touch on the issue of left/right (liberal or marxist vs conservative) topics and the slant that a search engine may have (or is proven to have) in serving up search results?

JohnKing1

2:41 am on May 19, 2021 (gmt 0)

10+ Year Member



Yes, there are certain terms and phrases that probably only ever appear in documents promoting or disparaging something (e.g. promoting communism or Marxism). For example, there may be a book author who only ever promotes communism. If that author's name is found on a webpage or document, it probably means that there is more information on that page that promotes communism. The idea that I use is that if a term or phrase occurs in one and only one subject, then it is possible to use that term or phrase to automatically classify that document (or search engine). I am guessing that Google uses automatic classification with many millions of subjects to automatically classify web sites and pages. If you are able to convince Google that you are an authority on a certain highly specific subject, then your page may rank higher in Google for that subject. To do this you need to use highly subject-specific terms and phrases throughout your page. Also, all the terms or phrases that link to your page need to include subject-specific terms or phrases, and you should use other SEO methods that keep subjects in mind. Simply entering subject-specific terms into a search engine will tell you a lot about how the search engine treats a subject.
I was also able to detect that certain search engines were censoring their results, which was an interesting result.
It was also possible to detect if a search engine is for or against things like atheism and other world views. I would be very interested in running sentiment analysis on the results of a search from the search engines to see more about what was actually happening inside them.
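
The "one and only one subject" idea above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation from the paper: the training data is invented, terms are single whitespace-separated words, and the helper names `exclusive_terms` and `classify` are hypothetical.

```python
from collections import Counter, defaultdict

def exclusive_terms(training):
    """Map each term that appears under exactly one subject to that subject."""
    seen_in = defaultdict(set)
    for subject, documents in training.items():
        for doc in documents:
            for term in doc.lower().split():
                seen_in[term].add(subject)
    return {t: next(iter(s)) for t, s in seen_in.items() if len(s) == 1}

def classify(text, rules):
    """Pick the subject with the most exclusive-term hits, or None."""
    votes = Counter(rules[t] for t in text.lower().split() if t in rules)
    return votes.most_common(1)[0][0] if votes else None

# Toy training data, invented for illustration only.
training = {
    "astronomy": ["the telescope tracks the comet", "comet orbit data"],
    "cooking": ["whisk the batter", "the batter needs data on oven heat"],
}

rules = exclusive_terms(training)
print(classify("a new comet seen by telescope", rules))
print(classify("the data", rules))
```

Words such as "the" and "data" occur under both subjects, so they are dropped from the rules and carry no vote; a text made up only of such shared words stays unclassified.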