Dupe content checker - 302's - Page Jacking - Meta Refreshes

You make the call.

Marcello

11:35 am on Sep 7, 2004 (gmt 0)

10+ Year Member



My site, let's call it www.widget.com, has been in Google for over 5 years, steadily growing year by year to about 85,000 pages including forums and archived articles, with a PageRank of 6 and 8,287 backlinks in Google. No spam, no funny stuff, no special SEO techniques, nothing.

Normally the site grows by 200 to 500 pages a month indexed by Google and others ... but for about a week now I have noticed that my site has been losing about 5,000 to 10,000 pages a week from the Google index.

At first I simply presumed that this was the unpredictable Google flux, until yesterday, when the main index page of www.widget.com disappeared completely out of the Google index.

The index page was always in the top 3 positions for our main topics, aka keywords.

I tried all the techniques to find my index page, such as allinurl:, site:, direct link, etc., but the index page has simply vanished from the Google index.

As a last resort I took a chunk of text that can only belong to my index page, "company name own name town postcode" (a sentence of 9 words), and searched for it in Google.

My index page did not show up; instead, 2 pages from other sites showed up as having this information on their page.

Let's call them:
www.foo1.net and www.foo2.net

Wanting to know what my "company text" was doing on those pages I clicked on:
www.foo1.com/mykeyword/www-widget-com.html
(with mykeyword being my site's main topic)

The page could not load and the message:
"The page cannot be displayed"
was displayed in my browser window.

Still wanting to know what was going on, I clicked "Cached" in the Google SERPs ... AND YES ... there was my index page, as fresh as it could be, updated only yesterday by Google itself (I have a daily date on the page).

Thinking that foo was using a 301 or 302 redirect, I used the "Check Headers Tool" from WebmasterWorld, only to get a 200 response code for my index page on this other site.

So foo must be using a meta redirect ... very quickly I wrote a little robot in Perl using LWP, adding a little code that would recognize any kind of redirect.

I fetched the page, but again got a 200 response code with no redirects at all.
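Marcello's Perl/LWP robot isn't posted in the thread. A rough Python equivalent of the check he describes (a sketch, not his code) looks in the three places a non-JavaScript redirect can hide: a 3xx status with a Location header, a Refresh response header, and a `<meta http-equiv="refresh">` tag inside a page that returns 200. A header checker only sees the first two, which is why a meta refresh slips past it. The names `classify_response`, `refresh_target` and `MetaRefreshParser` are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Status codes that send the client elsewhere via the Location header.
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class MetaRefreshParser(HTMLParser):
    """Collects the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refreshes = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("http-equiv", "").lower() == "refresh":
            self.refreshes.append(attrs.get("content", ""))


def refresh_target(content):
    """Extract the URL from a refresh value such as '0; url=http://www.widget.com/'."""
    match = re.search(r"url\s*=\s*['\"]?([^'\"\s;]+)", content, re.IGNORECASE)
    return match.group(1) if match else None


def classify_response(status, headers, body):
    """Return (kind, target) describing how, if at all, a response redirects."""
    headers = {name.lower(): value for name, value in headers.items()}
    if status in REDIRECT_STATUSES:
        return "http-%d" % status, headers.get("location")
    if "refresh" in headers:
        return "refresh-header", refresh_target(headers["refresh"])
    parser = MetaRefreshParser()
    parser.feed(body)
    parser.close()
    if parser.refreshes:
        return "meta-refresh", refresh_target(parser.refreshes[0])
    return "none", None
```

A page that redirects with JavaScript, or that serves different content depending on who is asking, would still come back as `none` here.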

Thinking foo's site might be up again, I tried again to load the page and other pages on foo's site with IE, Netscape and Opera, but always got:
"The page cannot be displayed"

I tried it a couple of times with the same result: LWP can fetch the page, but browsers cannot load any of the pages from foo's site.
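One possible explanation for "LWP gets a 200, every browser gets an error" is that foo's server treats clients differently by User-Agent. The thread doesn't confirm that, and IP address, Accept headers or cookies could matter just as well. A quick way to test the User-Agent part is to request the same URL under different agent strings and compare the outcomes. The agent strings and function names below are assumptions for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical agent strings: one in the style of Perl LWP's default, one browser-like.
AGENTS = {
    "lwp": "libwww-perl/5.79",
    "browser": "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
}


def fetch_status(url, user_agent, timeout=10):
    """Return the final HTTP status (redirects are followed), or the error class name."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError) as err:
        return type(err).__name__  # no usable answer at all (refused, reset, timeout)


def differs_by_agent(results):
    """True if the agents did not all get the same outcome."""
    return len(set(results.values())) > 1


def compare_agents(url, agents=AGENTS):
    """Fetch url once per agent and report whether the outcomes differ."""
    results = {name: fetch_status(url, ua) for name, ua in agents.items()}
    return results, differs_by_agent(results)
```

If a server behaved the way Marcello describes, the two agents would get different outcomes and the flag would be True. Identical outcomes don't rule out cloaking by IP or by other request headers.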

Wanting to know more I typed in Google:
"site:www.foo1.com"
and got a huge list of pages, all constructed in the same way, such as:
www.foo1.com/some-important-keyword/www-some-good-site-com.html
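The URL pattern is regular enough to parse, which makes it easy to scan a long site: listing for your own domain. A minimal sketch, assuming every slug starts with "www-" as in both examples here; `guess_target` and `SLUG_PATH` are hypothetical names. Because every hyphen is turned back into a dot, a domain that really contains a hyphen comes out wrong, so the result is only a guess.

```python
import re
from urllib.parse import urlparse

# Path shape seen in the thread: /<keyword>/www-<host with dots written as hyphens>.html
SLUG_PATH = re.compile(r"^/(?P<keyword>[^/]+)/(?P<slug>www-[A-Za-z0-9-]+)\.html$")


def guess_target(url):
    """Return (keyword, guessed_host) for a URL of that shape, or None if it doesn't match."""
    if "://" not in url:
        url = "http://" + url  # the thread quotes URLs without a scheme
    match = SLUG_PATH.match(urlparse(url).path)
    if match is None:
        return None
    # Every hyphen becomes a dot, so hyphenated domains come out wrong: a guess only.
    return match.group("keyword"), match.group("slug").replace("-", ".")
```

For www.foo1.com/mykeyword/www-widget-com.html this gives ("mykeyword", "www.widget.com"), while the second example above comes out as "www.some.good.site.com", which shows the hyphen ambiguity.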

I also found more of my own best-ranking pages in this list, and on checking the Google index, all of those pages from my site have disappeared from it.

None of the pages found using "site:www.foo1.com" can be loaded with a browser, but they can all be fetched with LWP, and all of them are cached in their original form in the Google cache under foo's cache link.

I have sent an email to Google about this and am still waiting for a response.

kaled

8:34 pm on Sep 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Plumsauce,

My knowledge of http is about as thin as a cigarette paper so I am perfectly happy to accept that I am wrong. My logic is that servers simply chuck out text responses to requests in headers and that if the responses of two servers are essentially the same (as they should be) then Googlebot's response will be the same whether the server is running Apache, Windows or whatever.

However, I guess you are working on the "garbage in, garbage out" principle: if the responses of some servers are outside the norm, then Googlebot may become confused. However, I do not see this as a robot problem (though it might be, I suppose); I see it as an indexation problem.

So why am I prepared to stick my head above the parapet and say it is an indexation bug? Well, first reports (and some email responses from Google) suggest that the redirection trick works best if you have a PR advantage. Other than for scheduling, I see no reason for robots to have any knowledge of PR - therefore the problem must lie in indexation. If it lies in indexation, in order for the problem to be caused by server responses, those responses must be recorded in detail and processed by the indexation service. Whilst this is possible, I would say it is more likely that the robots record only simplified versions of response headers.

Kaled.

plumsauce

9:04 pm on Sep 24, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Kaled,

You are definitely in the right neighbourhood.

Consider if you will that search engines are not one monolith, but rather a suite of processes implemented in multiple programs.

Consider also that these programs are implemented by individual teams or programmers.

Consider finally that these programmers depend upon informal and formal knowledge of the output of the programs upstream from their own program and the expectations with respect to the outputs from their own program.

For GIGO to happen requires only one mistake anywhere in the chain. The observations available so far suggest a boundary-condition trap that I, as a programmer, could easily fall into. The reasoning then becomes: if one programmer could fall into the trap, then why not another?

On a more general note, I have received a few more URLs by sticky, but as usual, could use more. I need both the victim and donor URLs. If someone is reluctant to release the victim URL, then the URL of the external page is fine. That way, if it is a page containing multiple outbound links, I have no way of identifying the victim site.

boredguru

12:37 am on Sep 25, 2004 (gmt 0)

10+ Year Member



Hi,
I use XOOPS, and after reading this thread I have understood the effect of the linking strategy that comes with it. It affects me more because it's usually my users who post the links, and usually to their own sites or sites they like. We have only 31 links in total, as I was never into linking games and strategies, and the links that are there were added because of their value to the users. I hate the word reciprocal. (My policy was always: if I like what you show, then you can have my link anytime, and vice versa.)

My site is in my profile; just visit the links section. Any technical assistance on how to circumvent this problem is appreciated.

After doing a complete check I found lots of things. For example, a redirect page on my site has 16,500 backlinks. But luckily the site it links to is a behemoth with a PR of 9 or 10, so that's fine. And my page also has not been removed for duplicate content.

digtoo

3:08 am on Sep 25, 2004 (gmt 0)

10+ Year Member



Yes, I have experienced this problem with Google.

So not-my-page meta-refreshes to my-page, and my-page is dropped, but not-my-page remains in Google's search results, showing my-page's content under the not-my-page link.

So I email Google and they say to do a 301 redirect, and that they do not manually adjust search results, but I say that I cannot do a 301 redirect: not-my-page is, well, not mine.

End result? Nothing.

old_expat

3:00 am on Sep 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Are you sick of me yet?

WebDude "

Not even! Hang tough, Pard!

quotations

4:57 am on Sep 26, 2004 (gmt 0)

10+ Year Member



So I email Google and they say to do a 301 redirect, and that they do not manually adjust search results, but I say that I cannot do a 301 redirect: not-my-page is, well, not mine.

I am beginning to think that webmaster@google.com needs to be outsourced to India.

The people answering it lately apparently have not the faintest bit of understanding of what kind of troubles they are causing.

The lack of systems engineering talent at that address is criminal.

DaveAtIFG

5:40 am on Sep 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Re: messages #206 and #164

First, I made a mistake in #206, the page with the meta refresh to the test page returns a 200 response. Next, I was wrong about cloaking in #164.

The test pages I redirected have all been spidered and cached. I redirected three existing pages listed in Google, using a 301, a 302, and a meta refresh, to newly created test pages. In ALL cases, the cache now contains the new test pages.

I expected the 301 redirect to update the cache. I expected the meta refresh to update the cache but I expected it to take longer. I did NOT expect the 302 to update the cache.

This testing was all done within one domain. The server header for this server displays "Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) PHP/4.3.0." Also, the test site has a dedicated IP and this host uses VPS/VDS (Virtual Private/Dedicated Server) technology. All redirects used fully qualified domain names and paths, absolute addresses.
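DaveAtIFG's setup ran on Apache, and the thread doesn't show his configuration. The same three cases can be sketched as responses from a minimal Python handler: a 301 and a 302 carrying a Location header, and a 200 page whose only job is a zero-delay meta refresh. The paths and target URLs are hypothetical placeholders, and `build_redirect` is an illustrative name; the absolute-URL check mirrors his note that all redirects used fully qualified addresses.

```python
import html
from http.server import BaseHTTPRequestHandler


def build_redirect(kind, target):
    """Return (status, headers, body) for a '301', '302' or 'meta' redirect to target."""
    if not target.startswith(("http://", "https://")):
        # Dave used fully qualified, absolute addresses; the sketch enforces the same.
        raise ValueError("target must be an absolute URL")
    if kind in ("301", "302"):
        return int(kind), {"Location": target}, ""
    if kind == "meta":
        body = (
            "<html><head>"
            '<meta http-equiv="refresh" content="0; url=%s">'
            "</head><body></body></html>" % html.escape(target, quote=True)
        )
        return 200, {"Content-Type": "text/html; charset=utf-8"}, body
    raise ValueError("unknown redirect kind: %r" % kind)


# Hypothetical paths: the thread does not give Dave's actual URLs.
ROUTES = {
    "/old-301.html": ("301", "http://www.example.com/test-301.html"),
    "/old-302.html": ("302", "http://www.example.com/test-302.html"),
    "/old-meta.html": ("meta", "http://www.example.com/test-meta.html"),
}


class RedirectHandler(BaseHTTPRequestHandler):
    """Serves each path in ROUTES with the matching kind of redirect."""

    def do_GET(self):
        if self.path not in ROUTES:
            self.send_error(404)
            return
        status, headers, body = build_redirect(*ROUTES[self.path])
        data = body.encode("utf-8")
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To try it locally, `HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()` (with `HTTPServer` from `http.server`) serves the three pages; a header checker pointed at each shows that only the meta-refresh page answers 200.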

It looks to me as if any site can hijack any site with any kind of redirect, presently. Let the games begin! :) Page rank may still limit who can hijack who but I'm uncertain how to test that aspect.

webdude, are you seeing the same results I'm seeing? After you confirm my results, I'll try substituting a few pages at a different domain for these test pages.

[edited by: DaveAtIFG at 6:26 am (utc) on Sep. 26, 2004]

Marcello

6:13 am on Sep 26, 2004 (gmt 0)

10+ Year Member



DaveAtIFG
Thanks for the test

When you say:
"the cache now contains the new test pages"
are they listed under the "redirecting URL" or under the "redirected to URL"?

Secondly, did you add a robots meta tag such as:
<meta name="robots" content="follow, noindex">
See message #2 on page 1?
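Marcello is asking whether a noindex on the redirecting page changes anything, and the thread doesn't settle that. Checking which directives a page actually declares is simple, though. A minimal sketch with Python's standard HTML parser, where `robots_directives` and `may_index` are hypothetical helper names and "none" is treated as shorthand for "noindex, nofollow":

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = {name: (value or "") for name, value in attrs}
        if attrs.get("name", "").lower() == "robots":
            for part in attrs.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(page):
    """Return the set of lowercased robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(page)
    parser.close()
    return parser.directives


def may_index(page):
    """False if the page asks not to be indexed ('noindex', or 'none')."""
    directives = robots_directives(page)
    return "noindex" not in directives and "none" not in directives
```

This only reads what the page declares; whether a given engine honours it on a page that also redirects is exactly the open question.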

DaveAtIFG

6:32 am on Sep 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good question Marcello, sorry! The 302 and the meta refresh pages are listed under the redirecting URL. I missed it in my earlier post but the URL for the 301 test page has replaced the redirecting URL. It appears a 301 is still NOT a potential hijack tool.

Both the redirecting pages and the target pages contain:
<meta name="robots" content="index, follow">

Maia

6:39 am on Sep 26, 2004 (gmt 0)

10+ Year Member



It looks to me as if any site can hijack any site with a 302 or a meta refresh, presently.

Did you catch that, rustybrick?

I was wondering if I jumped the gun when I asked the other sites who used a 302 (but no meta refresh) to remove my listings. It's a bit strange to go from asking for links to begging people to please, please remove them!

Both the sites that hijacked me used meta refresh, but the first site returns a 302 and the second site returns a 200.

Attn: Plumsauce-

The second site has Apache 1.3x in the server header, but the first site just says Apache, in case that means anything for what you are researching.

My situation to date is: the first site still remains in the SERPS when you do a search on my domain name.

The second site did not reply to my request to remove me from their directory. Their directory has to do with travel accommodations. I have sent them a more stern email. I also found out that the guy who owns the site also owns a company that offers SEO and web design. He also designed the site of his hosting company, so I doubt I'm going to get anywhere by complaining to them about it.

That site now appears along with mine in the SERPS with my title and snippet, cached page and their URL. I'm just waiting for Google to drop my index page again.

Any new developments for you, Webdude?

plumsauce

6:42 am on Sep 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




DaveAtIFG,

point of clarification,

were all of these redirect cases interdomain?

DaveAtIFG

6:58 am on Sep 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Apparently you overlooked this, plumsauce:
This testing was all done within one domain.

I'm uncertain what you mean by "interdomain," to me that means between different domains...?

In any event, both the redirected pages and the targeted pages ALL reside within the same domain.

<added>
I'm now aware of two sites that are using 302 redirects and appear to be hijacking, one is using Apache 1.3.31, the other is 1.3.26.

I emailed GoogleGuy suggesting he review this thread and offering my test data. Based on past responses, I don't expect to learn anything specific from him. Things have changed at Google, IPO et al...
</added>

dirkz

8:21 am on Sep 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> It looks to me as if any site can hijack any site with any kind of redirect, presently. Let the games begin! :) Page rank may still limit who can hijack who but I'm uncertain how to test that aspect.

Hehe.

The Page Rank thing explains why I wasn't able to hijack google.com.

Marcello

8:58 am on Sep 26, 2004 (gmt 0)

10+ Year Member



"Page rank may still limit who can hijack"

In my case a PR3 page with a meta refresh hijacked my PR6 index page, which is still nowhere to be found in the Google index.

Since this happened, the hijacking page has kept moving up in the SERPs, even though Google deleted its cache in response to my DMCA complaint (the only result being that the proof is gone; nothing else has changed).

For any text snippet from the content of my vanished index page, and for my keywords, foo.com is still at the top of the results.

Further consequences:
My Index-Page at widget.com contains 4 outgoing links to high-ranking sites about the same topic and field.
These links are now showing up as coming from foo.com/widget-com.html instead of from my page at widget.com

Using "link:other-site.com"
the result gives 4373 backlinks with foo.com/widget-com.html in the top position, no sign of my widget.com/ which is the real page that contains a link to other-site.com.

cmendla

2:21 pm on Sep 26, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I admit to not being an expert on this, but I think I've found a hijacking attempt against my sites. The one site has decent PR but only gets about 200 visitors a day, so it's not a big fat target.

The text below is what I sent to Google's spam report.

The message to Google included the real site names. Here I substituted HIJACKER for the domain name and Widgets for the subject item.

=================================

I would like to report a page that is showing a PR of 3 on the google toolbar. I have reason to believe that they are using a method of pagerank hijacking.

I did a search in Yahoo to see who was linking to my site.

The yahoo search was Link:http://www.mysite.com -site:http://www.mysite.com

I noticed a suspicious entry of “You Search : " WIDGETS", " widgets". Search more " widgets". Find about " widgets".” which went to a Russian site

Following that link took me to [HIJACKER.ru...]

The top of that page (which has no PR at this point) shows all sex links. However, closer to the bottom, you see links to my site in the format of

www.mysite.com: widgets

Clicking on the link they give just takes you to another of their results pages.

I realize that the initial search here was in Yahoo and I am writing to Google.

However,

1. I believe that the [HIJACKER.ru...] site is trying to hijack PageRank through redirects.
2. This hijacking has been widely reported on webmasterworld.com and #*$!.com.
3. I see no logical reason why a page with results about sex toys etc. should have results about widgets mixed in.

I can provide a formatted copy of this in Word if you want, or you can contact me at

(I gave them my email and phone number)

========== end of the message I sent

The search engines need to fix this real soon ..
