unusual

         

wilderness

9:52 am on Mar 15, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Anybody have a clue?
The first one I simply accepted as trash.
The second one makes me wonder, although CC has many open proxies.

I've broken the link in the UA.
This page is slightly related (at least in name) to an active news topic.

112.119.106.zz - - [15/Mar/2012:00:52:38 +0000] "HEAD /MyFolder/MySub/MyPage.html HTTP/1.1" 403 - "http:// googlenewssubmit. com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; Media Center PC 6.0; InfoPath.2; MS-RTC LM 8)"

173.12.249.zzz - - [15/Mar/2012:06:28:45 +0000] "HEAD /SameFolder/SameSub/SamePage.html HTTP/1.0" 200 - "http:// googlenewssubmit. com/how-can-it-help-me/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.2; OfficeLiveConnector.1.4; OfficeLivePatch.1.3; yie8)"

DeeCee

4:03 pm on Mar 15, 2012 (gmt 0)

10+ Year Member



Googlenewssubmit is a commercial press release company, promising they can get you to the top of Google News using various (invalid?) methods.

They are essentially advertising themselves by sending referrer spam into your logs, hoping that you will check out their prices and buy their "services". In itself an "invalid method". :-)

In my categorizations they fall under "link_spammer".
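For anyone who wants to apply that kind of categorization to raw logs, here is a minimal Python sketch. It is not the setup described in this post. It assumes Apache combined log format (as in the two lines quoted earlier), and the names `SPAM_REFERER_DOMAINS`, `referer_domain` and `categorize` are hypothetical, made up for illustration. It also strips the spaces that were deliberately inserted to break the link in this thread.

```python
import re
from urllib.parse import urlparse

# Combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical blocklist; maintain your own.
SPAM_REFERER_DOMAINS = {"googlenewssubmit.com"}


def referer_domain(referer):
    """Return the referrer's host name without a leading 'www.'."""
    # Remove the spaces used in this thread to break the link.
    cleaned = referer.replace(" ", "")
    host = urlparse(cleaned).hostname or ""
    return host[4:] if host.startswith("www.") else host


def categorize(line):
    """Return a category label for one log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    if referer_domain(match.group("referer")) in SPAM_REFERER_DOMAINS:
        return "link_spammer"
    return "uncategorized"


sample = (
    '112.119.106.zz - - [15/Mar/2012:00:52:38 +0000] '
    '"HEAD /MyFolder/MySub/MyPage.html HTTP/1.1" 403 - '
    '"http:// googlenewssubmit. com/" "Mozilla/4.0 (compatible; MSIE 8.0)"'
)
print(categorize(sample))  # link_spammer
```

Matching on the referrer's host rather than the full URL catches both the bare domain and the "/how-can-it-help-me/" variant seen in the second log line.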

enigma1

3:08 pm on Mar 23, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is that SEs nowadays pull URLs out of the page content (not just HTML anchors), and unfortunately this thread is no exception, since it posts log records. Perhaps that's one reason they spam this way.

wilderness

3:19 pm on Mar 23, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



enigma1,
Could you expand?

Are the SEs pulling out URLs that are NOT embedded links?

enigma1

4:55 pm on Mar 23, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are rumors they do. There are several threads implying this and at least from my logs I see weird accesses.

This post talks about Googlebot trying to interpret JS, but what's important to note is that it parses content and "interprets" it.
[webmasterworld.com...]
[webmasterworld.com...]
and various other threads I cannot recall right now.

lucy24

9:36 pm on Mar 23, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are the SEs pulling out URLs that are NOT embedded links?

Yes, there are mountains of them in the GWT (Google Webmaster Tools) error pages-- sometimes even when there is a link wrapped around the damaged text.

Concrete example under "not found":

hovercraft/h..

That's quoted verbatim, dots and all. If you follow the "linked from" links you arrive eventually at a page with the world's spammiest meta tags and a list of urls, including--

Wait, I've got to do some more verbatim quoting under the vague head of "With friends like these..."

<td width="580">
<div class="msnresult">
<div style="margin-bottom:5px; padding-left: 8px;">
<a href="http://www.example.com/{directory}/{filename}.html" target="_blank" class="msneresult" rel="nofollow">{page title} - {my domain name}</a>
<div class="msnresultcnt">
{text of my meta description}</div><span class="msnresulturl">http://www.example.com/hovercraft/h...</span></div></div></td>


Notice (a) the teeny-weeny detail that the "not found" version snips off one more dot-- truncated urls on the page always have three-- and (b) they seem to have decided that "nofollow" doesn't count on this page. The dot-snipping doesn't kick in after a fixed number of characters, though it may be some physical length in pixels. I ran out of interest at this point ;)

What's notable is that the "real" link, unsnipped, is only two lines away. But that one doesn't count as a link. (I checked a different gwt page.)

Someone, somewhere, programmed a computer to make these decisions.
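Purely as a guess at how a result like "hovercraft/h.." could arise, here is a small Python sketch. Nothing here is confirmed search-engine behavior. It assumes a crawler that collects URL-looking strings from visible text as well as from `href` attributes, and that trims one trailing "." as if it were sentence punctuation. The class `UrlCollector`, the helper `strip_one_trailing_dot` and the regex are all hypothetical.

```python
import re
from html.parser import HTMLParser

# Naive pattern for URL-looking strings in visible text (an assumption).
URL_IN_TEXT = re.compile(r'https?://[^\s<>"]+')


class UrlCollector(HTMLParser):
    """Collect URLs from <a href> attributes and from plain text separately."""

    def __init__(self):
        super().__init__()
        self.href_urls = []
        self.text_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.href_urls.append(value)

    def handle_data(self, data):
        self.text_urls.extend(URL_IN_TEXT.findall(data))


def strip_one_trailing_dot(url):
    # Assumed heuristic: treat a single final "." as sentence punctuation.
    return url[:-1] if url.endswith(".") else url


# Cut-down version of the markup quoted above.
SNIPPET = (
    '<a href="http://www.example.com/{directory}/{filename}.html" '
    'rel="nofollow">{page title}</a>'
    '<span class="msnresulturl">http://www.example.com/hovercraft/h...</span>'
)

collector = UrlCollector()
collector.feed(SNIPPET)
collector.close()

print(collector.href_urls)
print([strip_one_trailing_dot(u) for u in collector.text_urls])
```

Under those assumptions the text-derived URL ends in "hovercraft/h..", with two dots instead of three, which matches the "not found" entry. It says nothing about why the real `href` would be ignored.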