Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Google indexing large volumes of (unlinked?) dynamic pages

         

Receptional Andy

8:48 pm on Oct 28, 2007 (gmt 0)



Here's an odd one for a small site (around 300 pages) with medium pagerank.

In the last week or so Google has indexed a succession of URLs that appear to be unlinked from anywhere. These are in two categories:

- Search result pages

Google is up to 2,130 of these. They are all single word searches for words that do actually appear somewhere on the site. The search itself is simple and does not link to any search results other than next/previous pages.

- Results for an online tool

This involves a user-entered URL (using GET). I've tracked down a few hundred of these that Google has requested, for a bizarre mix of URLs, from massive sites to individual blog posts.

I'm only at the start of my detective work for this (I'm going to grab all of the search keywords indexed and the URLs checked and see if that throws up any clues, and do a bit more in-depth log analysis). I can't find any links to any of the pages indexed on Google or Yahoo.
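As a sketch of that log analysis (the log file name, the /search path and the "q" parameter are all assumptions here, not details from the site in question), something like this would pull out the single-word queries Googlebot has requested:

```python
import re
from collections import Counter

# Hypothetical example: extract search queries requested by Googlebot from a
# combined-format access log. The log name, the /search path and the "q"
# parameter are assumptions -- adjust them to the site under investigation.
LOG_LINE = re.compile(r'"GET (/search\S*q=([^&\s"]+)\S*) HTTP')

def googlebot_queries(logfile="access.log"):
    """Count the search keywords Googlebot has requested."""
    queries = Counter()
    with open(logfile, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            if "Googlebot" not in line:
                continue
            match = LOG_LINE.search(line)
            if match:
                queries[match.group(2)] += 1
    return queries
```

Dumping `googlebot_queries().most_common(20)` should show whether the "searched" words really do look like a word list harvested from the site itself.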

Here's my initial speculations:

- Someone may be linking to these pages deliberately, perhaps with a bit of noindex/follow . Would seem to be a bit pointless.

- Google might be indexing the pages based solely on the toolbar or another mechanism

- These pages have either been indexed for some time, or have built up over time. It is some change at Google that has made them visible now. This would also explain why the two very different types of page both suffer from the same problem now.

- I've screwed something up so that the pages are being linked to from the site, via some misbehaving script.

I can easily block the content from search engines, but for now I'm interested in tracking down the source, and I may as well see what the effect of thousands of junk pages on the site's performance is! ;)

Anyone have any suggestions as to what may have happened here?

One aside: Google really seems to like making troubleshooting difficult these days. The amount of hacking around just to get a complete list of indexed pages is starting to be an annoyance!

tedster

3:23 pm on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a new related report:

Google Indexed My Site Search Results Pages [webmasterworld.com]

Receptional Andy

4:30 pm on Feb 26, 2008 (gmt 0)



There was another two over here [webmasterworld.com] and here [webmasterworld.com] too. My investigations are ongoing!

[edited by: Receptional_Andy at 4:40 pm (utc) on Feb. 26, 2008]

pageoneresults

4:58 pm on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see a common denominator amongst a couple of these: WordPress.

Is there some sort of Plug-in or Widget that takes search queries and generates pages on the fly for the search engines?

Or, is there a flaw in the "search" script used? Is it possible someone found a way to run a bot against that search script and generate thousands, tens of thousands, even millions of pages? All from WordPress and other similar CMS platforms? I don't mean to pick on WP, that one just happens to be the one mentioned in the above referenced topics. This could be happening with any mass produced software.

Receptional Andy

5:02 pm on Feb 26, 2008 (gmt 0)



I've counted at least four different search mechanisms so far. Two I can absolutely confirm are not related to WordPress. They're even different technologies (one ASP, one PHP, one CGI script).

LunaC

5:31 pm on Feb 26, 2008 (gmt 0)

10+ Year Member



I've been seeing Googlebot going through search for a while with one of my sites as well. It's always for a single word that does exist somewhere on my site, but often it's something a human visitor would be extremely unlikely to search for (e.g. footer, red, etc.; always one word).

Going through the search logs (my site's search logs, that is, as well as server logs), I can't find where those terms have ever been searched other than by Googlebot.

I'd always had a noindex meta tag on those pages just in case this happened, but I'd allowed it to follow the links if it wanted.

Sadly I had a rather bumpy emergency move to another host last week with this site and have had to change the URL of the search, so I can no longer see if it's still being crawled. The new URL is not being crawled yet. (The new host only allows CGI scripts inside the cgi-bin.)

I use the same search script on 2 other sites currently and have never seen Googlebot do this on the others.

I ran a few link checkers through and nowhere am I linking to any search results. It is a bit odd.

Here's an old post about it: [webmasterworld.com...]

FWIW, mine isn't using Wordpress (anywhere on the site), it's FDSE.

[edited by: LunaC at 5:35 pm (utc) on Feb. 26, 2008]

pageoneresults

5:46 pm on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let me ask, doesn't the passing of the query in the URI string present challenges to begin with? I mean, doesn't that expose a vulnerability for indexing? Bear with me here, I'd like to hear more about this. We typically don't pass anything in the URI string at the public level. Behind a login is something different, and even then, we're careful not to expose too much of the query routines in the URI.

Google have become much more efficient at indexing URI strings with queries in them. How they are generating them is the main question I think. If there are no references to those on the site itself, then I would conclude that there are references somewhere. I don't think Googlebot is smart enough to latch on to that initial query and start generating a list of keywords, no, not yet anyway. Something caused that to happen. For example, a set of cloaked pages hit the net with references to all of those single word queries. They were taken down immediately after indexing. You'd never find them but the bots got them and are now calculating them into the equation, temporarily.

Or, those references are sitting on a parked domain somewhere in the bowels of the Internet and they cannot be easily found. One way or the other, something is causing the effect. Is it your own internal search script that maybe has a flaw? Or, are there other external factors at play that you cannot see due to the technology involved?

Receptional Andy

7:49 pm on Feb 26, 2008 (gmt 0)



Let me ask, doesn't the passing of the query in the URI string present challenges to begin with?

It depends on how you look at it. It is certainly the correct approach for search forms and many other types of content. As the W3C say in URIs, Addressability, and the use of HTTP GET and POST [w3.org]:

Use GET if:
- The interaction is more like a question (i.e., it is a safe operation such as a query, read operation, or lookup).

Use POST if:
- The interaction is more like an order, or
- The interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or
- The user should be held accountable for the results of the interaction.

Using POST simply to prevent indexing would be counterproductive IMO.
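To illustrate the distinction in Python terms (the /search endpoint here is purely hypothetical): with GET, the whole query lives in the URL, which is exactly what makes results bookmarkable and, by the same token, spiderable.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# With GET, the query is part of the URL itself, so a result page is
# addressable: it can be bookmarked, linked to -- and spidered.
url = "http://www.example.com/search?" + urlencode({"q": "blue widgets"})

# Anyone (or any bot) holding this URL can reconstruct and replay the query:
recovered = parse_qs(urlparse(url).query)["q"][0]
print(url)        # http://www.example.com/search?q=blue+widgets
print(recovered)  # blue widgets
```

A POST search has no such URL, which is why POST "hides" results from spiders, but only at the cost of addressability.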

I agree that the references (if they exist) are likely to be hard to find, but they must be available somewhere. To trigger spidering of half a million URLs requires a bit more than a single page. I would imagine that at least one source would crop up somewhere - in logfiles, webmaster tools, yahoo link data etc.

Certainly, in the two examples I've been able to look at in detail, there is no common pattern other than that search results use GET, which frankly is the right approach. I can find no other 'vulnerability'.

The overriding lesson is to block bots from these types of forms by default. Unfortunately, this is by no means desirable in all cases, since I have at least one form being 'spammed' (if that is the word) where, within reason, spidering is fine, since the data returned is valuable and unique.

A quick summary of what I've gathered from the small number of cases I've seen:

- Search forms use GET
- Search result pages are spiderable
- Google spiders results for single words taken from words present on the site somewhere (this necessitates spidering of the whole site, or at least most of it, to occur)
- I saw numbers of pages spidered go from a few hundred to tens of thousands and upwards over the course of a few months
- If links to the content exist, they are deliberately hidden in some way, and visitors have not followed the links
- The activity is not limited to search forms, although these are most common and so the most prominent example
- I've not (yet!) seen any spidering of non-form URLs with additional parameters, even those that accept query strings to change output
- No other spider has requested these URLs

[edited by: Receptional_Andy at 7:49 pm (utc) on Feb. 26, 2008]

bouncybunny

11:20 pm on Feb 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Receptional Andy, I'll continue this discussion here, instead of my essentially similar topic elsewhere [webmasterworld.com...]

I may have missed it, but what kind of topics are covered by your dynamic URLs?

I know that my site's knowledge base is linked to by at least one academic database source that acts almost as a kind of scraper. It's not a true scraper, in that it only takes URLs and then returns them as search results for its own users, pointing users directly back to my site.

I have yet to find out if this has any bearing on this though.

Receptional Andy

11:36 pm on Feb 26, 2008 (gmt 0)



Hi bouncybunny,

To address the point you mentioned in your other thread:

What appears to be more brilliant, is that it is not only obvious keywords (which Google will know from the general subject of my site), but really obscure technical (but extremely relevant) keywords from my knowledge base.

I'm seeing it for every word on a site, obscure or not. I haven't had a chance to verify that it's a complete word list, but it certainly appears that way. I'm actually thinking more and more that this is the result of some kind of spam or sabotage. There are some 'clues' in the pages spidered that point towards this, I think:

- Only single words. Clearly, content discovery would work better with more intelligent input
- Bad parsing. I've seen a couple of examples of nbsp passed as a search query. But maybe I'm giving Google too much credit, considering 9 million 'revealed' results [google.com] for that as a keyword
- URLs. On one site, this spidering also affected a form which expects a URL as input. I had a bit of spare time this evening to analyse the many thousands of URLs requested from this form by Googlebot. They are a very particular bunch, focussed almost entirely on 'social networking' types of URLs

It's still inconclusive as far as I can tell, but I'm determined to track down the source. If it's sabotage or spam, frankly it doesn't work very well as I've seen no impact other than server logs are getting quite big, and Google's Webmaster Tools taking on some surreal aspects.

In terms of links, I considered this as a possibility: the URLs get spidered because of a link to the URL containing a relevant query string, which is then subject to mass spidering/linkage based on previous input. The fact that I've seen URLs passed when the query string should be a URL points at signs of intelligence, at least.

bouncybunny

3:18 am on Feb 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Andy

I'm seeing it for every word on a site, obscure or not

Which makes our cases different to some degree. I'm wondering if Googlebot has followed one specific link from somewhere and then just looked at the link titles of the search results and run random search queries based on those. This might explain the on-topic nature of the indexed URLs. If this is the case, then this must be something that has been intentionally set up by Google (or as a by-product of something else). Because simply following the links of search results only takes the user to static HTML pages with static looking URLs.

- Only single words

Check, same as mine. I haven't found any 'blue+widgets' URLs indexed.

- Bad parsing...

Not in my case. Of course this might simply be due to our website software working in different ways. In my case, it really is picking up the URLs exactly as if a human user had searched for a keyword. It then follows the URLs in results pages using the <next> and <previous> links, which would make sense.

I think I will block these with a robots disallow, or noindex. Which would be better in this case, do you think?

theBear

10:17 pm on Feb 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sure sounds like someone is feeding Google URLs pointing to your search system, already prefilled. The motive could be to make your site look like spam.

You have to be very careful with scripts, too many ways to self inflict major damage or for others to cause mischief ;-).

Receptional Andy

10:46 pm on Feb 27, 2008 (gmt 0)



bouncybunny, apologies that I missed your question:

I think I will block these with a robots disallow, or noindex. Which would be better in this case, do you think?

I opted for a robots disallow, since the number of requests from Google was veering towards the excessive, and robots.txt (usually) prevents spider requests. Meta elements encourage repeat spidering, since spiders need to check whether they've changed.
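For reference, the disallow amounts to a one-liner (assuming, purely for illustration, that the search script lives at /search.php):

```
User-agent: *
Disallow: /search.php
```

Unlike a noindex meta element, this stops the requests themselves; the trade-off is that URLs already discovered can linger in the index as URL-only entries.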

However, there's a difference between URLs that have already been 'discovered' and those that spiders have never visited. Discovered URLs hang around for some time, in one way or another.

theBear:

You have to be very careful with scripts, too many ways to self inflict major damage or for others to cause mischief ;-)

I understand where you're coming from, but I haven't looked at a static HTML site for a number of years. The web is driven by dynamic webpages, most of which rely on parameters in one form or another.

Perhaps I'm naive, but I haven't seen this kind of large-scale sabotage/changed spidering behaviour before, which is why I'm so keen to look at it in more detail.

bouncybunny

11:01 pm on Feb 27, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



bouncybunny, apologies that I missed your question:

No problem, and thanks. I've just gone ahead and disallowed Googlebot, but not stopped any other spiders. The AdSense spider has been indexing these pages for a couple of years with no apparent ill effect. But that would make sense, as advertising on the results pages would feed URL information back to Google about the generated URLs. But Googlebot is supposed to be separate, surely? Or is this me being naive?

Yahoo and MSN don't appear to have found their way there at all and I have been unable to find any external (or internal for that matter) links to these urls from elsewhere.

theBear

2:57 am on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Receptional Andy,

I just looked at a site I'm quite familiar with that has a search routine that produces pages with adsense ads.

The site has been live since late spring 2005, it shows no signs of the ad handling bots handing off urls from that to the system that produces search results.

There are search routine results in the index from generated pages that the regular Google bots are allowed to harvest.

It is quite possible that it does happen; however, I can't for the life of me think of a reason that Google would do such a thing. Now, someone else playing in your sector, or just out to see what "this" does, might cause this to occur, intentionally or not.

Remember while your mileage may vary, most computer systems are quite consistent.

This reminds me of the query string issues that occur from time to time.

Receptional Andy

2:41 pm on Feb 28, 2008 (gmt 0)



Remember while your mileage may vary, most computer systems are quite consistent.

This is true in the sense that if you put the same data in, you should get the same result back. But having a site search and AdSense does not mean putting the same data in, so that doesn't qualify as a test of consistency.

Besides, I think adsense is a red herring, because the two examples I've looked at in detail have no adsense or indeed any other advertising.

I've looked at a number of sites across different industries, technologies and even different countries/hosting etc. The vast majority are not experiencing strange pages being spidered. But that does not really tell us anything. In terms of criteria, I can certainly rule some things out:

- Technology used doesn't matter
- It's not to do with sites being hosted in the same place, or any other 'links' like WHOIS data or actual hyperlinks
- Industry is irrelevant, and this affects non-commercial sites too
- There is no 'bug' in forms that causes this to happen

I can also say that:

- It requires fairly large scale spidering to harvest data
- If there is any manual intervention at all, it's minimal
- All that's required is a GET form
- It's only affecting Googlebot

ecmedia

3:41 pm on Feb 28, 2008 (gmt 0)

10+ Year Member



I have had the same problem using b2evolution CMS. It comes with a default search script and I thought that was the culprit. After removing it a month ago, the problem continues.

pageoneresults

4:14 pm on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Here's a simple routine that I use to get a snapshot of indexed URIs...

"example.com"

That's it. Just enter your domain in quotes and click search. That's the first step. Go to Advanced, change to 100 results per page and begin perusing. The first thing you may find is lots of references to your domain that may not be linked. You'll visit those pages and, in many instances, not be able to find your domain name reference even though it was shown in the snippet of the SERPs. Why is it there and how did it get there? Why isn't it showing in the Cached version? Why isn't there a Cached version?

You're probably going to uncover all sorts of stuff you didn't know existed. I've done some digging in this area myself and I'm convinced there is all sorts of foul play happening out there and each day I learn something new, I'm that much closer to being able to put all the pieces of the puzzle together. :)

That one simple search above is the start of the process. There are other "simple" searches that will uncover stuff that will leave you wondering why and how. There are so many rogue bots out there today that scrape sites and regurgitate who knows what. Like you said earlier, maybe a routine went bad during a scrape and all they could get were single keywords.

There's a topic in The Wall right now that discusses a new piece of software that turns Google into a Vulnerability Scanner. Download that program and review the list of Dorks and see if you are susceptible to any of those. Review the advanced queries they used in that software to find those vulnerabilities. After seeing that program at work, I'm a believer that there is an underground network of saboteurs in our industry! Ting, ting, ting...

theBear

4:19 pm on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On 2/28/2008 at some point in time pageoneresults was heard to thusly say,

"I'm a believer that there is an underground network of saboteurs in our industry! Ting, ting, ting... "

Yeppers ... but don't forget that foot guns are also at work in some cases.

theBear

4:32 pm on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Receptional Andy,

Let me suggest that:

- It's only affecting Googlebot

The most you can say is that Google has actually indexed something and you have no idea why it indexed that something, and further you don't see the other search engines indexing those items.

A possibility would be that someone is only feeding Googlebot and not the other bots.

You really need to see what is going on with various downloading systems (scrapers and rogue bots are downloading systems).

What I'd like to know is if those indexed pages slowly exit the index (because there is no on site link to the page).

pageoneresults

4:39 pm on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What I'd like to know is if those indexed pages slowly exit the index.

My guess would be that they do. Depending on the routine, someone or something may be dumping those URIs into the index at some sort of interval. Put em up, let them get indexed, pull them down. Wait a month or so, put them up, let them get indexed, pull them down. I don't know about you, but common sense tells me that has to have some sort of effect. What, I don't know. But, in many instances when people are suffering in the SERPs, the backtracking "always" leads to something from this realm of discussion.

Receptional Andy

6:12 pm on Feb 28, 2008 (gmt 0)



You really need to see what is going on with various downloading systems (scrapers and rogue bots are downloading systems).

To a large extent I do. I haven't found a request for any of the odd URLs from anything other than Googlebot. One site employs a pretty effective bot trap too, and there hasn't been anything noticeable there either.

if those indexed pages slowly exit the index

I can't say unfortunately, as for various reasons the examples I have control over have now blocked this via robots exclusion. So, the pages are going to trickle away as a result of that.

As best I can tell, the pages went pretty much straight into the supplemental index, and do pages ever disappear out of there anyway?

What I'm hoping to see is the type of snippet you see for pages blocked in robots.txt but with links to them being spidered still; that might allow identification of the link text used, which would be one step nearer to finding if and where links to the URLs exist.

Incidentally, while Google is much better at handling exact duplicates, that does not discount the possibility of this occurring on any site, since you can append useless variables to the end of any URL. Perhaps unlikely to have an effect (but then, the same applies to the URLs I've seen indexed: zero impact so far), although on such a large scale, who knows?

You could certainly use such techniques to try to damage a website's theme, and perhaps there are some believers in the 'supplemental ratio' out there.

Maybe Google are going to bring back the count of pages indexed on their homepage and are determined to win this time ;)

theBear

6:31 pm on Feb 28, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"and do pages ever disappear out of there anyway?"

Yes after a period of time.

keepontruckin

12:06 am on Feb 29, 2008 (gmt 0)

10+ Year Member



It would be nice if google made available in the google webmaster tools the origin of ALL the external links with a cache of the page. Seems like a simple solution to the problem. Then we would all know where the links originate and take some appropriate action. All this guessing is enough to drive one to drink.. cheers!

Oliver Henniges

6:39 pm on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How funny, I came across a similar question in this thread [webmasterworld.com], because I recently thought it was quite convenient for my visitors to make the results of my internal product search function bookmarkable, so that they may leave notes in forums independent of my internal categorial organization.

> They're valid URLs, valid parameters and valid content

What do you mean with valid? Syntactically correct?

You may add a sequence like ?q=rubbish to any URI and the output will in most cases be the same as with no parameters at all. The "validness" of such a URI is absolutely independent of the actual content of the database, and independent of whether the requested page is a GET CGI at all. No spider coming across such a link will be able to decide how the output is generated, and the RFC specifications necessarily cannot prescribe whether myfile.xyz is allowed to be a true CGI, nor what CGIs may and may not do.

I am not sure whether I really understood what this thread is about; I just roughly went through it and cannot provide case data from my logs. All I'd like to stress is that the whole topic seems closely related to the duplicate-content filters and the canonical issues.

It has been stressed on many occasions that it is actually Google's job to get this fixed, not that of us webmasters (e.g. by defining our preferred domain in Webmaster Central and redirecting www vs non-www adequately). But the more I think about it, the more I see that this is nearly impossible.

Maybe the whole idea of indexing (static?) content is completely outdated in times where most content is generated dynamically.

Receptional Andy

6:47 pm on Feb 29, 2008 (gmt 0)



What do you mean with valid? Syntactically correct?

Yes, but more than that.

Valid URLs: a 200 response, and a URL that it is expected visitors will request.

Valid parameters: parameters that the script is expecting to be present in a GET request, in order to deliver an appropriate response. They aren't random variables.

Valid content: the parameters result in an expected response. Unique content will be returned in response to a request for a URL containing these parameters.

closely related to the duplicate-content-filters and the canonical-issues

To be blunt, this is not really related to this thread at all. I want to determine the cause, not the effect (which, incidentally, is nil. I've not seen any impact on search results whatsoever).

[edited by: Receptional_Andy at 6:48 pm (utc) on Feb. 29, 2008]

Oliver Henniges

9:36 pm on Feb 29, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> To be blunt, this is not really related to this thread at all.

I apologize for repeating what pageoneresults had already tried to insinuate:

How do you check that the input parameters are "expected" and that the response is "unique"? What does your script throw out if input is as unexpected as your logfile entries? Does the CGI 301 to a more "static" page, or simply throw out the result as-is under exactly the requested URI?

Without any final redirect, it is quite likely that the response pages to such a high number of arbitrary words from the website will come up with very, very similar, if not identical, content (though details depend on your specific site, of course). Sorry, but I think this IS a potentially relevant cause: if a competitor manages to get thousands of such similar idempotent URIs from your site into Google's index, it may happen that a duplicate-content filter is applied to the whole site.

So, maybe, someone is just testing parameters for such a sabotage strategy on your small site before launching a larger attack in his real field.

Another speculative reason: google is collecting data on semantic relations.

And for the online-tool: What happens to the URL typed in after posting? Any copyright-issues involved? Any automated requests to sites that google owns, like youtube, violating their TOS?

Receptional Andy

9:55 pm on Feb 29, 2008 (gmt 0)



I'll give an example which I think illustrates the problem.

Assume a tool which translates words from one language to another. It's useful that people are able to translate these words, bookmark the resulting URLs, and easily get to results for words they haven't checked yet. A GET form, with visible URL parameters is the appropriate interface to such a system.

In the context of a site dedicated to translating words, this wouldn't even be undesirable behaviour. In the context of the sites I've seen this happen on, it's an annoyance. Hell, if I was that way inclined I would just capitalise on this volume of new pages.

I have wider concerns. There are many degrees of separation between the sites I've seen experience this effect. So, any GET form needs to be blocked in some way, right? Or perhaps, any URL parameter that doesn't come from a 'real' visitor? That's a major headache.

Oliver Henniges

12:44 pm on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I would just capitalise on this volume of new pages.

Yepp. Make people bookmark those pages and make them throw around those backlinks in forums. This is a very natural way sites may grow, and it is absolutely in accordance with what google WANTS to happen.

The only true danger I see so far is this duplicate content issue, so I take it as an important hint to make sure inadequate input will be properly redirected with a 301 to an explanatory page. The question is whether it is advisable to also redirect URIs with nonsense GET parameters (like the example I gave above) to the original page without the '?...'-sequence. Does anyone have an easy .htaccess entry at hand?
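For what it's worth, a deliberately blunt mod_rewrite sketch: 301 any request that carries a query string back to the bare URL (the trailing "?" on the substitution is what strips the query string). As written this would also break legitimate GET forms such as site search, so the condition would need narrowing to the specific nonsense parameter before real use:

```apache
RewriteEngine On
# Redirect any URI with a query string to the same path without one.
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ /$1? [R=301,L]
```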

I also understand some of your concerns from bandwidth issues or security risks to googlebot going mad, and also the alarming feeling of not knowing what is going on. But I'm afraid I still did not get the real focus:

> In the context of the sites i've seen this happen on, it's an annoyance.

Why exactly? Your server has to cope with one or the other hacker-attack anyways, so what is the new quality of this kind of logfile-entries?

Please excuse my naive questions. Hopefully trying to answer them will help you get clearer about the problems involved and distinguish them from mere paranoia. (Though the latter is an important motivator to continue learning.)

theBear

3:04 pm on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Andy,

Consider:

Step 1: A person requests copies of the pages on your site using a homemade bot or other form of downloader.

Step 2: That person feeds those pages to an indexing system.

Step 3: The indexing system provides a key word list.

Step 4: That person feeds only Googlebot a page with noindex,nocache,follow, where that page is composed of URLs like

http://www.example.com/searchsystem.cgi?query=firstkeyword

through

http://www.example.com/searchsystem.cgi?query=lastkeyword

What happens?

Well, for one, Google doesn't seem to like orphaned pages, and if Google doesn't find links the next time it looks, your site looks like it has a boatload of orphaned pages.

In addition to the orphaned page issue it also can look like abnormal growth compared to site history and of course it changes your site's update history.

We won't even discuss what happens if several of the searches bring up almost 100% copies of the same page.
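The four steps above can be sketched in a few lines of Python (the URLs and markup are purely illustrative), which shows how little effort such a feed would take:

```python
import re

def keyword_list(pages):
    """Steps 2-3: a crude 'indexing system' -- unique words across the pages."""
    words = set()
    for html in pages:
        text = re.sub(r"<[^>]+>", " ", html)               # strip markup
        words.update(w.lower() for w in re.findall(r"[a-zA-Z]{3,}", text))
    return sorted(words)

def search_urls(words, base="http://www.example.com/searchsystem.cgi?query="):
    """Step 4: one prefilled search URL per harvested keyword."""
    return [base + w for w in words]

# Step 1 (fetching the pages with a downloader) is omitted; here the
# 'indexer' is fed a single downloaded page:
urls = search_urls(keyword_list(["<p>Blue widgets and red widgets</p>"]))
```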

theBear

3:10 pm on Mar 1, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BTW a homemade bot isn't difficult to construct using some commonly available household products.

Indexing system likewise.

A little bit of scripting for glue and instant fun and games for some but not all.

This 68 message thread spans 3 pages: 68