Re: Tracking and Looksmart

From: Andrew Ellam
Date: Wed 17 Jul 2002 11:28:01 -0500

"Darin Velin" wrote:

> It is very possible that Looksmart does experience technical
> difficulties with your request. The tracking code you wish to include
> contains a "?" and most search engines spiders and robots cannot read
> past this, it throws them into a continuous loop that breaks down their
> system. Looksmart could be refusing to do this for you because they do
> not want jeopardize their system.

Erm, that's partly correct, but the reason seems wrong - unless I'm
misunderstanding completely, which is quite frankly always a possibility :)

It's true that most web spiders don't index pages which contain a question
mark in the URL (except Google, which does).

But the reason isn't that "it throws them into a continuous loop that breaks
down their system", it's that URLs containing question marks are usually
interfaces to a program rather than a static page. The part after the
question mark is a set of parameters which are passed to the program.

So if the URL was (not a real
URL, I made it up) then it might be a database backed website, and maybe
you're asking the database to return article number 192, in printable format.

And the parameters don't necessarily relate to what content you're being
sent, they might get the web server to actually *do* something - process a
payment, make a cup of tea (maybe) - perhaps as the result of a web form.
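
To make the "parameters after the question mark" idea concrete, here is a
minimal Python sketch that splits a URL into the program being called and the
parameters handed to it. Both URLs are invented for illustration (example.com,
with made-up script and parameter names), so treat them as assumptions rather
than anything a real site uses.

```python
from urllib.parse import urlsplit, parse_qs

def describe(url):
    """Return (path of the program, dict of parameters passed to it)."""
    parts = urlsplit(url)
    # Everything after the "?" is the query string; parse_qs turns it into
    # a dict that maps each parameter name to a list of values.
    return parts.path, parse_qs(parts.query)

# Hypothetical database-backed article page.
print(describe("http://www.example.com/articles.cgi?article=192&format=printable"))
# ('/articles.cgi', {'article': ['192'], 'format': ['printable']})

# Hypothetical landing page carrying a tracking tag.
print(describe("http://www.example.com/page.html?id=LSMRT"))
# ('/page.html', {'id': ['LSMRT']})
```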

So the initial reasons why search engines didn't crawl URLs containing
question marks were that 1) it might be a strain on the program or database
running behind the web server, and 2) it might have some unexpected
side-effect (such as a transfer of funds, or the tea ruining the carpet).

Google has changed this policy, I suspect largely because so many sites now
use content management systems to generate every page from a database -
which often means all their page URLs contain question marks.

Also, the convention has now emerged that question-mark URLs aren't used for
pages with potentially harmful side-effects (such as transferring money) -
those use HTTP POST requests instead, typically sent from a form.

As far as I can see, this issue hasn't got much to do with continuous loops,
because there's no reason why question-mark URLs should be any more likely
to generate infinite loops than normal web pages (which actually create them
quite a lot: page A links to B, which links back to A).

There's a more interesting sort of infinite loop too: some webmasters have
written programs which, when visited, create an infinite chain of unique
pages, all slightly different, and all containing fake email addresses, in
order to disrupt spammers who use their own web-crawling programs to search
the web for email addresses. And you don't need to use question-mark URLs to
create these truly infinite loops of pages.

So if a search engine crashed each time it encountered a question-mark - or
an infinite loop of links - then it would crash all the time. The web
contains millions of such links - the search engines can cope.
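
As a rough idea of why neither kind of loop is fatal, here is a small Python
sketch of how a crawler might protect itself. It is not how any particular
search engine works, just the usual textbook approach: remember which URLs
have been seen, and give up after a fixed number of pages. The link graph,
the trap and the max_pages limit are all made up for the example.

```python
from collections import deque

def crawl(start, get_links, max_pages=1000):
    """Breadth-first crawl that visits each URL once, up to max_pages."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # A -> B -> A stops here
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical site where page A links to B, which links back to A.
LOOP = {"/a.html": ["/b.html"], "/b.html": ["/a.html"]}

def loop_links(url):
    return LOOP.get(url, [])

def trap_links(url):
    # Hypothetical spammer trap: every page links to a brand-new page.
    number = int(url.rsplit("/", 1)[1])
    return ["/trap/%d" % (number + 1)]

print(crawl("/a.html", loop_links))                 # ['/a.html', '/b.html']
print(len(crawl("/trap/0", trap_links, max_pages=5)))  # 5
```

The "seen" set deals with the A-to-B-to-A case, and the page limit deals with
the endless chain of unique pages, with no question marks involved in either.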


As far as I understood it, Robert wasn't talking about feeding a
question-mark URL to a search engine, he was talking about using it for

> From: Robert Day

> All we want to do is add something typically: ?id=LSMRT
> but they are refusing. They claim "technical reasons"
> but a cynic might take the view that they just don't
> want people to know how good/bad they are in delivering
> ROI.

I can't see any technical reason why this should be the case (I program web
server extensions, so I'm up on the relevant techie stuff). It's very
unlikely a question-mark URL would in any way "jeopardize their system".

It's possible that Looksmart have made a mistake in building their systems
so that they genuinely can't store a URL containing a question mark - but
you have to wonder. Smells like bs.

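So, what to do? I'd recommend trying 'extra path info'. If your web server
is Apache (this works with some other web servers too) then you're allowed
to stick extra stuff on the end of the URL, after an extra slash. (Apache 2.x
rejects this for plain files by default; setting the AcceptPathInfo directive
to On allows it.)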

For example, if the page is:
then you can use this URL:
and the web server will backtrack until it finds baker.html. The
'From-Looksmart' part is the 'extra path info' - it'll show up in your log
files as usual.
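
The backtracking can be pictured with a short Python sketch. This is only an
illustration of the idea, not Apache's actual code: the function name, the
set of "existing files" and the site layout (baker.html sitting at the top
level) are assumptions made for the example.

```python
def split_path_info(request_path, existing_files):
    """Return (file_path, extra_path_info), or (None, None) if no file matches.

    Drops trailing path segments one at a time until what is left names a
    real file; whatever was dropped is the 'extra path info'.
    """
    segments = request_path.strip("/").split("/")
    for end in range(len(segments), 0, -1):
        candidate = "/" + "/".join(segments[:end])
        if candidate in existing_files:
            rest = segments[end:]
            extra = "/" + "/".join(rest) if rest else ""
            return candidate, extra
    return None, None

files = {"/baker.html"}  # hypothetical site with a single static page
print(split_path_info("/baker.html/From-Looksmart", files))
# ('/baker.html', '/From-Looksmart')
print(split_path_info("/baker.html", files))
# ('/baker.html', '')
```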

Note that this method will only work if your website consists of static HTML
files - if your pages are created from a database, then this method won't
work.

If you're not sure, then type it into the address bar of your browser and
give it a try, there's nothing to lose.
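
Once the tagged URL works, counting the visits is a matter of scanning the
access log for it. Here is a minimal Python sketch, assuming the server writes
the Common Log Format; the sample log lines, the IP addresses and the
'/From-Looksmart' tag on a top-level baker.html are invented for the example.

```python
import re

# Pulls the requested path out of the quoted request part of a log line,
# e.g. "GET /baker.html/From-Looksmart HTTP/1.0".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+) HTTP/[\d.]+"')

def count_tagged_hits(log_lines, tag="/From-Looksmart"):
    """Count requests whose path ends with the tracking tag."""
    hits = 0
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(1).endswith(tag):
            hits += 1
    return hits

# Made-up sample lines in Common Log Format.
SAMPLE_LOG = [
    '192.0.2.1 - - [17/Jul/2002:11:28:01 -0500] "GET /baker.html/From-Looksmart HTTP/1.0" 200 5120',
    '192.0.2.2 - - [17/Jul/2002:11:29:10 -0500] "GET /baker.html HTTP/1.0" 200 5120',
    '192.0.2.3 - - [17/Jul/2002:11:31:45 -0500] "GET /baker.html/From-Looksmart HTTP/1.1" 200 5120',
]

print(count_tagged_hits(SAMPLE_LOG))  # 2
```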

Hope that was some help,

