Fabricating logfiles.


Ariel

8:22 pm on Oct 6, 2002 (gmt 0)

10+ Year Member



Who has had success fabricating logfiles just for SEs?

Lisa

9:33 pm on Oct 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Would you please explain your question in more detail. I am not sure what you mean.

Ariel

11:40 pm on Oct 6, 2002 (gmt 0)

10+ Year Member



Knowing that SEs spider log files, what is to prevent a person from creating his own fictitious, highly exaggerated log files just for the spider bots?

bird

11:46 pm on Oct 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why should it matter to a spider what kind of numbers you write into such a file? Search engines index the words on a web site. They don't interpret their meaning.

Unless I still don't understand where you're heading with this, of course... ;)

Ariel

4:23 am on Oct 7, 2002 (gmt 0)

10+ Year Member



Well, let's see what kind of information is in a log file. How about referring pages AND the keywords used, if the visit came from a search engine?

Or what about the number of visitors to each web page? Or even the most popular paths through the site, or the time spent at a web page?
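A minimal sketch of the kind of summary described above: hits per page and the search keywords carried in referrers. The sample data, the function names, and the assumption that the search terms sit in a `q` query parameter are all hypothetical and only for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical sample data: (requested path, referrer URL) pairs.
HITS = [
    ("/index.html", "http://www.google.com/search?q=blue+widgets"),
    ("/widgets.html", "http://www.google.com/search?q=cheap+widgets"),
    ("/index.html", "http://example.org/links.html"),
    ("/index.html", "-"),  # "-" stands for a request with no referrer
]

def search_keywords(referrer, param="q"):
    """Return the search terms carried in a referrer's query string, if any."""
    query = urlparse(referrer).query
    terms = parse_qs(query).get(param, [])
    return terms[0].split() if terms else []

def summarize(hits):
    """Count hits per page and keyword occurrences across all referrers."""
    pages = Counter(path for path, _ in hits)
    keywords = Counter(word for _, ref in hits for word in search_keywords(ref))
    return pages, keywords

pages, keywords = summarize(HITS)
```

Everything here is computed from the server's own records, which is the point of the question: nothing in the numbers themselves proves that the visits happened.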

If Google, for example, wants the most popular pages to have a higher ranking, these details would certainly carry some weight, yet I have not found anyone addressing this.

Now, site popularity and link relativity can be truly established IF weblogs are spidered. And if they are, why not tailor web logs just the way we want them, with thousands of fictitious referrers, keywords and the time spent.

?

Mark_A

6:22 am on Oct 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ariel, to my knowledge SEs do not read log files unless, of course, a webmaster puts them into a public, unprotected area and links to them from a page that spiders are reading.

BTW: Log files in this case being the access log data i.e. lines of information in a text format written by the server to record access activity of users on a site or server.
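For readers who have not looked at one, here is a rough sketch of reading a single access log line. It assumes the server writes the widely used Combined Log Format (the thread does not say which format is in use), and the sample line is made up.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Made-up sample line, for illustration only.
SAMPLE = (
    '192.0.2.1 - - [07/Oct/2002:06:22:00 +0000] "GET /index.html HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98)"'
)

entry = parse_line(SAMPLE)
```

The `referer` field (spelled that way in the HTTP header) is the one the rest of this thread is concerned with.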

I have no idea where you are going with this but .. if you really expect there is any value in people falsifying their own logs and getting them read by a search engine ..

Who would you want to find and visit them from that search engine?

:-)

gmoney

6:38 am on Oct 7, 2002 (gmt 0)

10+ Year Member



Ariel,

I don’t think search engines have access to log files. If they did, then I would pretend I was a spider and do some serious study of log file data. I really like log file data.

Maybe search engines keep track of how many times your pages showed up on their SERPs, and maybe they even track when you get clicked. However, as you hinted above, it is possible to fabricate these types of results, so I don’t think it would be given much weight towards relevancy.

chiyo

7:12 am on Oct 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Actually log reports can be pretty good spider food! Google loved our Analog text-based reports when they were publicly accessible. Lots of relevant keywords, and links to all popular pages. We took them down after a while. For some reason bad neighbourhoods including p*rn sites were in the "referrers" and that would seem to Google like we were linking to them! We often get strange referral URLs that don't actually link to us and have no relevance - I think it's got something to do with multiple windows being open..

However, why fabricate them? To index them Google and others would have to find a link to the log report page, and you don't want your readers seeing fabricated info, do you? So much for the credibility of your web site, and the practice itself is a form of "deceiving search engines" which Google states is a quick way of getting penalties. Not only are you deceiving search engines but also deceiving your readers. Why not just make a nice index or web map instead?

Ariel

2:41 pm on Oct 7, 2002 (gmt 0)

10+ Year Member



Thank you for the input. Perhaps those claiming SEs spider their logs have their logs in a public place, or a link to them.

Is it possible to view the source code of a Google search result page?

Do SE's track hits to these links in their databases? And do they keep track of the referring page?

bird

3:06 pm on Oct 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> Thank you for the input. Perhaps those claiming SEs spider their logs have their logs in a public place, or a link to them.

At best, a published log file might have a similar effect as a site map as far as the internal links are concerned. If that is what you're heading for, why not just do a real site map that also benefits your users?

The other side is the external links, which may place you in a bad neighbourhood if left unchecked. Of course, if you want to specifically link to certain sites, creating an honest links page would be the obvious solution. Again, this will have the same effect for the engines, and benefits your human visitors a lot more than a "fabricated logfile".

I think the reason why nobody is doing what you suggest is that at least the same results can be had with other and more useful tools, while avoiding some of the potential problems.

Ariel

3:53 pm on Oct 7, 2002 (gmt 0)

10+ Year Member



"I think the reason why nobody is doing what you suggest is that at least the same results can be had with other and more useful tools, while avoiding some of the potential problems. "

What about link popularity, or the amount of traffic your web page is getting from external links? If SEs track popularity by click-throughs, and if they do that via web logs, then fabricating the amount of traffic could be a very easy and very powerful thing to do.

No?

hutcheson

8:29 pm on Nov 8, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The SEs may "index" files that happen to contain logs, but they do not "READ" logfiles.

I believe this means that they (legitimate search engines, not spambots) will NOT recognize URLs that are not part of a hyperlink, so the mere presence of "my-favorite-somthing-site.com/something_index.hmtl(*)" in a file will NOT add page rank to that page, nor will it associate this forum with a bad neighborhood around that page.

All it will do is find this thread when someone is looking for 'super adult site in neighborhood with favorite page rank'.

(*) I had included the "http://" bit on that (fake) URL above, but it occurred to me that the forum software might turn that into a real hyperlink, which would have made my statement false. But that's the forum software, not the search engine spider.

[edited by: msgraph at 9:44 pm (utc) on Nov. 8, 2002]
[edit reason] changed url to something more friendly [/edit]

hutcheson

2:14 am on Nov 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm. I believe the edited post made my statement false anyway. Let me see if I can reconstruct it.

Suppose you have a log file that saves the "REFERER" field, and someone in a "bad neighborhood" (say, the index.htm file at the vicious-spam.com domain) links to you. Now your log file will contain the text string "vicious-spam.com/index.htm" -- but that is not a URL: it is merely a text string containing dots, dashes, letters, and slashes. The robot is only going to recognize a string as a URL if it begins with the magic word "http://" (which is the protocol) AND is formatted like a URL, or else if it appears in HTML context as a URL (say, in the HREF field of an A tag.)

Neither of which is true of your logfile.

So the 'bot will either just index the words ("vicious", "spam", "com", "index", and "htm") or it will completely ignore them because it's looking for a recognizable URL. And when someone searches for "vicious spam com index htm" -- the search engine results may include your logfile. But if someone searches for backlinks to their site, the results will NOT include your logfile.
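The two rules described above can be sketched as follows. This is only an illustration of the poster's description, not how any real search engine crawler is known to work; `recognized_links`, `HrefCollector` and the regular expression are hypothetical names and choices.

```python
import re
from html.parser import HTMLParser

# Rule 1: a bare string counts as a URL only if it starts with "http://".
BARE_URL = re.compile(r'http://[^\s"<>]+')

class HrefCollector(HTMLParser):
    """Rule 2: collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def recognized_links(text):
    """Return the links a hypothetical robot following the two rules would see."""
    collector = HrefCollector()
    collector.feed(text)
    collector.close()
    return set(BARE_URL.findall(text)) | set(collector.hrefs)
```

Under these rules, a log line containing only `vicious-spam.com/index.htm` yields no links at all, while the same string prefixed with `http://` or wrapped in an `<a href="...">` tag does.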

rcjordan

2:34 am on Nov 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of the reported benefits of log-spamming is the ability to skew linkpop if/when the 3rd-party files can be spidered. The scripts are not uncommon in the bad neighborhoods chiyo mentions. Think of this exploit as the ability to create hallway pages on someone else's website.

arlin

4:40 am on Nov 20, 2002 (gmt 0)



RCJordan writes: "Think of this exploit as the ability to create hallway pages on someone else's website."

Good point! And if that site is a good quality site, those fictitious hits going to your site would be great - no?

rcjordan

3:12 pm on Nov 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> those fictitious hits going to your site would be great - no?

Hi arlin, welcome to WebmasterWorld.

BTW, to be more precise, I should restate that last sentence as: Think of this exploit as the ability to create a link on a hallway page on someone else's website. These scripts generally just link back to a single page on the offending site.

As for the quality of the link, particularly for those chasing PR, I'd think it would be very low. But think of the volume of links such a spider could create. And there are those who believe that even low-quality links add up in the algos.
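For anyone who publishes log reports and wants to avoid hosting such links, one possible precaution is to print referrers as plain text rather than as hyperlinks. The sketch below assumes the report is generated as HTML by your own script; `render_referrers` is a hypothetical helper, not part of Analog or any other log analyzer.

```python
from html import escape

def render_referrers(referrers):
    """Render referrer strings as an HTML list of plain text, never as <a> links."""
    items = []
    for ref in referrers:
        # Drop the scheme so a crawler matching "http://" sees no bare URL either.
        text = ref.split("://", 1)[-1]
        # Escape the rest so markup smuggled into a referrer is shown, not rendered.
        items.append(f"<li>{escape(text)}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Whether this is enough depends on how a given spider extracts links, which nobody in this thread can confirm; keeping the reports out of public, linked areas is the simpler option.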

c3oc3o

6:05 pm on Nov 20, 2002 (gmt 0)

10+ Year Member



Umm, my guess is that you misunderstood people saying search engines spider 'web logs'. They didn't mean access logs.

Weblogs, also called blogs, are a kind of online diary that is updated daily, contains lots of links and is heavily inter-linked in cliques and groups. They also tend to link to similar pages (person A sees the link on the site of friend B and decides to link to it on their own weblog, and so on). Blogs were in the past responsible for pretty much all so-called 'Google bombing', because they had the power to strongly influence page rank with that network of different pages, frequent updates, strong community and hundreds of links. Google has since tried to lessen their influence a bit.

msgraph

7:11 pm on Nov 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>>Weblogs, also called blogs, are a kind of online diary that is updated daily

Web logs are access logs too; it's those bloggers that messed everything up. :)