Re: Tracking and Looksmart

From: Andrew Ellam <andy.ellam_at_2-minute-website.com>
Date: Wed 17 Jul 2002 11:28:01 -0500

"Darin Velin" <darinve_at_microsoft.com> wrote:

> It is very possible that Looksmart does experience technical
> difficulties with your request. The tracking code you wish to include
> contains a "?" and most search engines spiders and robots cannot read
> past this, it throws them into a continuous loop that breaks down their
> system. Looksmart could be refusing to do this for you because they do
> not want jeopardize their system.

Erm, that's partly correct, but the reason seems wrong - unless I'm
misunderstanding completely, which is quite frankly always a possibility :)

It's true that most web spiders don't index pages which contain a question
mark in the URL (except Google, which does).

But the reason isn't that "it throws them into a continuous loop that breaks
down their system", it's that URLs containing question marks are usually
interfaces to a program rather than a static page. The part after the
question mark is a set of parameters which are passed to the program.

So if the URL was http://www.example.com?article=192&printable=Y (not a real
URL, I made it up) then it might be a database backed website, and maybe
you're asking the database to return article number 192, in printable
format.
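To make the "set of parameters" idea concrete, here is a minimal sketch in Python (my choice of language for illustration, nothing to do with how any particular site is built) that pulls the made-up example URL apart using the standard library. The variable names are just for this example.

```python
from urllib.parse import urlsplit, parse_qs

# The made-up example URL from above.
url = "http://www.example.com?article=192&printable=Y"

parts = urlsplit(url)

# Everything after the "?" is the query string.
query_string = parts.query

# parse_qs turns it into a dict of parameter name -> list of values,
# which is roughly what the program behind the web server receives.
params = parse_qs(query_string)

print(query_string)
print(params)
```

The program behind the URL would then decide what to do with `article` and `printable`; the web server itself just hands them over.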

And the parameters don't necessarily relate to what content you're being
sent, they might get the web server to actually *do* something - process a
payment, make a cup of tea (maybe) - perhaps as the result of a web form.

So the initial reasons why search engines didn't crawl URLs containing
question marks were that 1) it might be a strain on the program or database
running behind the web server, and 2) it might have some unexpected
side-effect (such as a transfer of funds, or the tea ruining the carpet).

Google has changed this policy, I suspect largely because so many sites now
use content management systems to generate every page from a database -
which often means all their page URLs contain question marks.

Also, the convention has now emerged that question-mark URLs aren't used for
pages with potentially harmful side-effects (such as transferring money) -
those go through form submissions using the HTTP POST method instead, which
crawlers don't follow.

As far as I can see, this issue hasn't got much to do with continuous loops,
because there's no reason why question-mark URLs should be any more likely
to generate infinite loops than normal web pages (which actually create them
quite a lot: page A links to B, which links back to A).

There's a more interesting sort of infinite loop too: some webmasters have
written programs which, when visited, create an infinite chain of unique
pages, all slightly different, and all containing fake email addresses, in
order to disrupt spammers who use their own web-crawling programs to search
the web for email addresses. And you don't need to use question-mark URLs to
create these truly infinite loops of pages.

So if a search engine crashed each time it encountered a question-mark - or
an infinite loop of links - then it would crash all the time. The web
contains millions of such links - the search engines can cope.
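For the curious, here is a rough sketch of how a crawler can cope with both kinds of loop. It is an illustration under my own assumptions, not how any real search engine is written: the `crawl` function, the `get_links` callback and the `max_pages` budget are all hypothetical names. A set of already-seen pages handles the A-links-to-B-links-to-A case, and a page budget handles the endless chain of unique pages.

```python
from collections import deque

def crawl(start, get_links, max_pages=1000):
    """Breadth-first crawl that never visits a page twice and stops
    after max_pages pages, so neither kind of loop can trap it."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for target in get_links(page):
            if target not in seen:  # skip pages we already know about
                seen.add(target)
                queue.append(target)
    return visited

# Ordinary loop: A links to B, which links back to A (and on to C).
graph = {"A": ["B"], "B": ["A", "C"], "C": []}
print(crawl("A", lambda page: graph.get(page, [])))

# Spammer-trap style: every page links to a brand new unique page.
def trap(page):
    return [page + "/next"]

print(len(crawl("/trap", trap, max_pages=5)))
```

Neither case involves a question mark anywhere, which is the point: loops are a general crawling problem, not a query-string problem.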

But...

As far as I understood it, Robert wasn't talking about feeding a
question-mark URL to a search engine, he was talking about using it for
ad-tracking:


> From: Robert Day <rpday_at_btinternet.com>

> All we want to do is add something typically: ?id=LSMRT
> but they are refusing. They claim "technical reasons"
> but a cynic might take the view that they just don't
> want people to know how good/bad they are in delivering
> ROI.

I can't see any technical reason why this should be the case (I program web
server extensions, so I'm up on the relevant techie stuff). It's very
unlikely a question-mark URL would in any way "jeopardize their system".
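Just to show how little is involved, this is a minimal sketch of tacking Robert's `id=LSMRT` code onto a landing-page URL. The helper name `add_tracking` is made up for this example, and the landing page is a placeholder; the only real parts are the Python standard library calls.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def add_tracking(url, source_id):
    """Return url with an extra id=<source_id> query parameter,
    keeping any parameters the URL already has."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("id", source_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_tracking("http://www.example.com/able/baker.html", "LSMRT"))
print(add_tracking("http://www.example.com/?article=192", "LSMRT"))
```

From the directory's side the result is still just a string to store and link to.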

It's possible that Looksmart have made a mistake in building their systems
so that they genuinely can't store a URL containing a question mark - but
you have to wonder. Smells like bs.

So, what to do? I'd recommend trying 'extra path info'. If your web server
is Apache (this works with some other web servers too) then you're allowed
to stick extra stuff on the end of the URL, after an extra slash.

For example, if the page is:
http://www.example.com/able/baker.html
then you can use this URL:
http://www.example.com/able/baker.html/From-Looksmart
and the web server will backtrack until it finds baker.html. The
'From-Looksmart' part is the 'extra path info' - it'll show up in your log
files as usual.
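Here is a simplified model of that backtracking, written in Python purely as an illustration. It is not Apache's actual code, and `split_path_info` and `existing_files` are names I've made up: the function tries the full request path first, then drops one trailing segment at a time until it hits a file that exists, and treats whatever was dropped as the extra path info.

```python
def split_path_info(request_path, existing_files):
    """Return (file_path, extra_path_info) for a request path, or
    (None, "") if no prefix of the path matches an existing file."""
    segments = request_path.strip("/").split("/")
    # Try the longest candidate first, then backtrack one segment at a time.
    for cut in range(len(segments), 0, -1):
        candidate = "/" + "/".join(segments[:cut])
        if candidate in existing_files:
            extra = "/".join(segments[cut:])
            return candidate, ("/" + extra if extra else "")
    return None, ""

files = {"/able/baker.html"}
print(split_path_info("/able/baker.html/From-Looksmart", files))
print(split_path_info("/able/baker.html", files))
```

When going through the logs, the same split would let you count hits per source by grouping on the second value.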

Note that this method will only work if your website consists of static HTML
files - if your pages are created from a database, then this method won't
work.

If you're not sure, then type it into the address bar of your browser and
give it a try, there's nothing to lose.

Hope that was some help,
Andy.

--
http://2-Minute-Website.com/
Web design: inexpensive, straightforward, convenient.






Received on Wed Jul 17 2002 - 11:28:01 CDT


Copyright 1996-2007 The Online Advertising Discussion List, a division of ADASTRO Incorporated.
All Rights Reserved.