Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Google may still be using my content after I deleted it
aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4535327 posted 5:02 pm on Jan 11, 2013 (gmt 0)

This morning, while checking the logs for one of my sites, I noticed a referral from Google to a page that I removed from the site about two months ago. This was a glossary page that I removed from the server so that it returns a 404. I submitted it to Google's URL removal tool immediately after I deleted it, and it was gone from Google's index by the following day. So at this point it has been gone from the server and de-indexed from Google for about two months.

So this morning I was surprised to see a referral to it from Google in my logs, and decided to take a closer look. When I did a Google search for the same term, which was "widget definition", the definition from my deleted glossary page was the first result. In fact, it was set apart from the results below it, apparently to get extra attention. (It might be part of the so-called "Knowledge Graph", but I'm not sure.) Of course the link it gave to the deleted glossary page returns a 404, since the page was deleted from the server about two months ago.

So Google Search is still showing content from a page that I deleted and got removed from the index about two months ago, as well as a bad link. Does anyone have an explanation for how this could happen?
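For anyone who wants to run the same kind of log check, here is a minimal sketch. It assumes the server writes the common "combined" access log format, and the sample lines, IP addresses, and the function name `google_referrals_to_missing_pages` are made up for illustration. It only matches `google.com` hosts, so country domains would need to be added.

```python
import re
from urllib.parse import urlparse

# Assumed "combined" access log format; adjust the pattern for other formats.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def google_referrals_to_missing_pages(lines):
    """Return (path, referrer) pairs for 404/410 hits referred by a google.com host."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        status = int(match.group("status"))
        referrer = match.group("referrer")
        host = urlparse(referrer).hostname or ""  # "-" has no hostname
        from_google = host == "google.com" or host.endswith(".google.com")
        if status in (404, 410) and from_google:
            hits.append((match.group("path"), referrer))
    return hits

# Made-up sample lines in the assumed format.
sample = [
    '203.0.113.7 - - [11/Jan/2013:09:15:02 +0000] "GET /glossary.html HTTP/1.1" 404 512 '
    '"http://www.google.com/search?q=widget+definition" "Mozilla/5.0"',
    '203.0.113.8 - - [11/Jan/2013:09:16:40 +0000] "GET /index.html HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0"',
]
print(google_referrals_to_missing_pages(sample))
```

Only the first sample line is reported, since it is the only request that both failed and came from a Google search page.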

 

TheMadScientist

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4535327 posted 9:41 pm on Jan 11, 2013 (gmt 0)

Cached version maybe? [support.google.com...]

Of course it might also not be that simple since it was a 'definition' and fact is not copyrightable, so maybe it's in their system as 'the definition' and they were citing you as a source for it?

Sgt_Kickaxe

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time



 
Msg#: 4535327 posted 12:36 am on Jan 12, 2013 (gmt 0)

I still get an adsense impression every single day from a domain that nobody has owned for 3+ years! Always one impression, always around the same time of day, EVERY stinking day. No matter how hard you try to remove all traces of a page there is always a copy, somewhere.

- removed url via GWT
- contacted adsense, got no response
- contacted my host, no help at all
- traced the originating IP, 95% sure it's in Yahoo's range.

In the end I whitelisted my own sites in adsense and called it a day. There is no page under my control with that adsense unit on it, and I obviously don't own the unregistered domain name that adsense says is sending it, so... if you figure this out PLEASE come back and share details.

Str82u



 
Msg#: 4535327 posted 2:08 am on Jan 12, 2013 (gmt 0)

I've had it happen after 2 years of a dot net being forwarded to a dot com. I decided to play with something on the dot net, figuring "Hey, it's been two years, no external links to the dot net". The SERP for a term had both the dot com and dot net together at #1 and #2, and it showed the actual content from 2 years back with the old page name that only exists on the dot com now, BUT the cache was the actual current content. I switched back to a 301 from the domain to the dot com.

This occurred around mid-2012.

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4535327 posted 11:32 am on Jan 12, 2013 (gmt 0)

TMS - Google doesn't show a cache link. It's not part of the main search results but is above them in the position that an ad normally would be. It says "Web definition" then shows the exact definition from the deleted glossary page followed by a link to it. When you click the link, you get a 404 error since the page no longer exists.
There is a small gap below it followed by the normal list of search results (not definitions).

TheMadScientist

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4535327 posted 11:45 am on Jan 12, 2013 (gmt 0)

It says "Web definition" then shows the exact definition from the deleted glossary page followed by a link to it.

Yeah, that sounds like my 'not so simple' guess ... lol

I would guess it's one of those glitches that will take them a bit to figure out how to 'undo': they have to back a set definition that cites a 404 page as its reference out of the system and have the system 'redefine it' again from some other site, automatically, of course ... I'm sure someone could probably go in there by hand, but they definitely prefer scalable automation over anything along those lines, so I would guess it'll be a bit before it disappears.

Maybe throw a page up and catch some clicks for a bit?

Or, if you want to try to just be done with it and get it out yourself, maybe change the 404 to a 410 and see what happens? It does seem like they would almost have to have a way to drop a reference to a 404 page written into the definition system already, and maybe it just takes a longer time than we'd think...

It does seem plausible it's 'stuck' though if the 'definition generation' system isn't 'hooked into' the removal tool, so maybe if you can get a 410 to gBot rather than a 'not found for some reason, could be temporary' message it'll drop out ... (I'm thinking it might be something along the lines of how much sooner a 410 is dropped from the regular results than a 404 is.)
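A minimal sketch of the 404-to-410 switch, using only Python's standard library. The page lists (`REMOVED_PATHS`, `LIVE_PAGES`) and the helper names are hypothetical, and a real site would more likely do this in its web server config. Whether it actually gets the definition dropped is still a guess.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page lists: pages removed on purpose get a 410,
# anything else that is unknown stays a plain 404.
REMOVED_PATHS = {"/glossary.html"}
LIVE_PAGES = {"/": b"<h1>Home</h1>"}

def status_for(path):
    """Pick the status code: 200 for live pages, 410 for deliberate removals, else 404."""
    if path in LIVE_PAGES:
        return 200
    if path in REMOVED_PATHS:
        return 410  # Gone: the removal is intentional and permanent
    return 404      # Not Found: says nothing about whether the page may come back

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string
        body = LIVE_PAGES.get(path, b"")   # empty body for 404/410 responses
        self.send_response(status_for(path))
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(port=8000):
    """Run the sketch locally; not called automatically."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` and requesting `/glossary.html` should return a 410, while any other unknown path still returns a 404.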

TheMadScientist

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4535327 posted 12:18 pm on Jan 12, 2013 (gmt 0)

Actually, the more I think about it, the tougher I think it might be to get it out, because if I remember right in reading about the Knowledge Graph, they can't have duplicates, and if there's 'downline relationships' (for lack of a better way of phrasing) based on your definition, what happens to those if they change it or delete it?

I don't have the energy to think it through right now, but it seems like it might not be a 'super simple' thing to back something out or overwrite it once it's in there if it's actually coming from the Knowledge Graph info...

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4535327 posted 3:56 pm on Jan 12, 2013 (gmt 0)

Thanks Mad Scientist

I don't want to throw the page up again because I'm working on a glossary page for another site that includes some of the same terms. Glossary pages don't get much traffic in my experience, but I want whatever does come to go to the new site.

Anyway, a few minutes ago I clicked the feedback link at the bottom of the Google results page and reported it as a broken link. I've used the feedback link a couple of times before to report obvious hacked sites in the results and within a few weeks they were gone. So maybe someone at Google does read them and take action.

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4535327 posted 7:54 pm on Jan 12, 2013 (gmt 0)

Isn't the Knowledge Graph basically just a database of content that Google has scraped from the web? If so, then even if an original source is deleted, the content could stay in the database and still be used by Google. Maybe that's what has happened in this case.

TheMadScientist

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4535327 posted 8:41 pm on Jan 12, 2013 (gmt 0)

Actually, it's technically based on a bunch of content someone else scraped from the web and/or people volunteered to input (Metaweb - Freebase), which Google bought, so they could continue scraping instead of letting someone else do it ... LOL

The 'core info' is also the base for some of what Bing does, which I find interesting ... Google owns it but lets Bing use it? I guess that's cool in an 'odd, slightly eerie' sort of way lol.

The knowledge base Metaweb built is called Freebase, and it’s still in operation today. It’s a collaborative database—technically, a semantic graph—that grows through the contributions of volunteers, who carefully specify the properties of each new entity and how it fits into existing knowledge categories.

While Freebase is now hosted by Google, it’s still open to submissions from anyone, and the information in it can be freely reused under a Creative Commons license. In fact, Microsoft uses Freebase to give its Bing search engine an understanding of entities, which is the same role now played by the Knowledge Graph at Google.

...

In a semantic graph, there are no rows and columns, only “nodes” and “edges,” that is, entities and relationships between them. Because it’s impossible to specify in advance what set of properties and relationships you might want to assign to a real-world entity (what’s known in database lingo as the “schema”), graph databases are far better than relational databases for representing practical knowledge.

From the Article Linked Here: [webmasterworld.com...]

Anyway, the point is, nothing 'goes in a straight line' with the Knowledge Graph (built on top of/from the Freebase DB) as far as I can tell ... The whole thing is about relationship mapping ... The easiest way I can think of to 'put it into an understandable picture' is to say, 'Think along the lines of 6 degrees of separation, only with people, places, things, events, queries, etc. and how they relate to each other.' ... Pulling something out of the middle of that isn't likely to be as easy as it seems. In fact, I would also guess that once something is in there, getting it 'out' probably means replacing it with something 'essentially the same' so it doesn't cause issues.
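To make the 'pulling something out of the middle' problem concrete, here is a toy sketch of nodes and edges. It is only an illustration of the general idea of a semantic graph, not how Freebase or the Knowledge Graph actually stores anything, and every node and relation name in it is made up.

```python
# Each edge is a (subject, relation, object) triple; the nodes are simply
# the names that appear in the edges. All names are made up.
edges = {
    ("widget", "defined_by", "definition:widget"),
    ("definition:widget", "sourced_from", "example.com/glossary"),
    ("gadget", "compared_with", "widget"),
    ("sprocket", "see_also", "definition:widget"),
}

def edges_touching(edge_set, node):
    """Return every edge that has `node` as its subject or its object."""
    return {edge for edge in edge_set if node in (edge[0], edge[2])}

def remove_node(edge_set, node):
    """Remove a node by dropping every edge that touches it."""
    return edge_set - edges_touching(edge_set, node)

affected = edges_touching(edges, "definition:widget")
remaining = remove_node(edges, "definition:widget")
print(len(affected), "edges touch the definition node")
print(sorted(remaining))
```

In this toy graph, three of the four edges touch the definition node, so deleting it outright also cuts 'widget' and 'sprocket' off from it. That is the kind of knock-on effect that would make swapping in a replacement safer than a plain delete.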

###

I do definitely think it's along the lines of what you're saying about something 'staying' after it's deleted ... I just thought I'd give people some more background on it since it's relatively new.

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4535327 posted 11:10 pm on Jan 12, 2013 (gmt 0)

Mad Scientist - Thanks for taking the time to explain all of that.

Somehow my reaction to the whole idea is negative, because it seems to me that the best database is the web itself.

TheMadScientist

WebmasterWorld Senior Member themadscientist is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4535327 posted 12:01 am on Jan 13, 2013 (gmt 0)

NP, glad you got something out of it ... My reaction to how far they're going with everything is basically 'not excited', but it is what it is I guess and it's what we're going to have to deal with moving forward, much as I don't necessarily like it.

piatkow

WebmasterWorld Senior Member piatkow is a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4535327 posted 3:15 pm on Jan 13, 2013 (gmt 0)

I remember a site that was shut down (run by somebody else) gradually drifting down the SERPs for several months before vanishing.

aristotle

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month



 
Msg#: 4535327 posted 4:05 pm on Jan 13, 2013 (gmt 0)

piatkow wrote-
I remember a site that was shut down (run by somebody else) gradually drifting down the SERPs for several months before vanishing.


When you use the Google URL removal tool, as I did for this glossary page, it's supposed to disappear from the results in a day or so. In fact it did disappear from the "normal" search results within a day. But part of it is apparently still in the system somewhere, maybe in the Knowledge Graph, and still shows up in searches for definitions.

So evidently the URL removal tool only removes a page from the "normal" search results, but not necessarily from all types of searches.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved