Re: Tracking and Looksmart

From: Steve Werby <steve-lists_at_befriend.com>
Date: Thu 18 Jul 2002 22:04:46 -0500

"Andrew Ellam" <andy.ellam_at_2-minute-website.com> wrote:
> It's true that most web spiders don't index pages which contain a question
> mark in the URL (except Google, which does).

In case anyone's interested, the part of a URL following the question mark
is called the query string.
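
For anyone who wants to see that split in practice, here is a minimal
sketch using Python's standard urllib.parse module on Andrew's made-up
URL. The helper name query_params is just an illustrative choice.

```python
from urllib.parse import urlsplit, parse_qs

def query_params(url):
    """Return the query string parameters of a URL as a dict of lists."""
    # urlsplit separates the URL into parts; .query is everything after
    # the question mark (and before any "#fragment").
    return parse_qs(urlsplit(url).query)

# Andrew's made-up example URL from this thread.
params = query_params("http://www.example.com?article=192&printable=Y")
print(params)
```

Each value is a list because the same parameter name may legally appear
more than once in a query string.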

> But the reason isn't that "it throws them into a continuous loop that
> breaks down their system", it's that URLs containing question marks are
> usually interfaces to a program rather than a static page. The part after
> the question mark is a set of parameters which are passed to the program.
>
> So if the URL was http://www.example.com?article=192&printable=Y (not a
> real URL, I made it up) then it might be a database backed website, and
> maybe you're asking the database to return article number 192, in
> printable format.
>
> And the parameters don't necessarily relate to what content you're being
> sent, they might get the web server to actually *do* something - process a
> payment, make a cup of tea (maybe) - perhaps as the result of a web form.
>
> So the initial reasons why search engines didn't crawl URLs containing
> question marks were that 1) it might be a strain on the program or
> database running behind the web server, and 2) it might have some
> unexpected side-effect (such as a transfer of funds, or the tea ruining
> the carpet).

Nothing an honored robots.txt file and smart programming couldn't prevent.
The primary reason is actually much simpler than that. Content on pages
containing a query string was traditionally dynamic, and a long time would
pass before most search engine spiders returned to a previously indexed
page. Since pages with query strings tended to be less static than pages
without them, many search engines didn't index them. And this made sense
because such pages were more likely to contain content at search time that
differed from content at index time, which would make the search engines
less effective and confuse users.
Today, a higher percentage of pages are database-driven and likely to
contain query strings. Sometimes the content on those pages is static,
sometimes it's dynamic. And today more and more dynamic, database-driven
sites do not rely on query strings at all. I imagine this isn't news
for most of you, but if you have questions about it feel free to ask.

> Google has changed this policy, I suspect largely because so many sites
> now use content management systems to generate every page from a
> database - which often means all their page URLs contain question marks.

That's part of it - a higher percentage of pages with query strings, more
effective methods of determining whether content is static or dynamic and a
realization that pages without query strings could just as easily be dynamic
(even database-driven) as those with query strings. Most of the sites I
build are dynamic or database-driven, yet do not have query strings, thanks
to my expertise with Apache's mod_rewrite and some PHP skills which are not
hard to learn.
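
To illustrate the idea (this is not my actual mod_rewrite configuration,
just a sketch of the same mapping written in Python), a clean-looking path
can be translated behind the scenes into the query-string URL the program
really uses. The /article/<number> layout and the article.php script name
are made up for the example.

```python
import re

# Hypothetical clean-URL layout: /article/<number>, optionally followed
# by /printable. Neither the layout nor article.php is a real site's.
CLEAN_URL = re.compile(r"^/article/(\d+)(/printable)?/?$")

def rewrite(path):
    """Map a clean path to an internal query-string URL, or None if no match."""
    match = CLEAN_URL.match(path)
    if match is None:
        return None
    internal = "/article.php?article=" + match.group(1)
    if match.group(2):
        internal += "&printable=Y"
    return internal

print(rewrite("/article/192/printable"))
```

A spider only ever sees the clean path, so it has no way to tell from the
URL alone that the page is generated by a program.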

> Also, the convention has now emerged that question-mark URLs aren't used
> for pages with potentially harmful side-effects (such as transferring
> money) - a different system is used for those.

It's really just a matter of programmers today writing better code than
programmers of yesterday and clients who are more aware of technical issues.
Matt Wright's FormMail was probably one of the most widely used scripts on
the web, yet it's known to be one of the buggiest and has been taken
advantage of by many, many spammers. In many ecommerce sites in the early
years of the web, it was possible to change the price you'd pay on checkout
simply by changing the total price passed in the query string. Times have
changed. Now, competent programmers do not rely on data passed in a query
string for such operations.
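
As a rough sketch of what that looks like, assuming a made-up catalog and
made-up parameter names (item, qty), the checkout code reads only the item
and quantity from the request and looks the price up on the server. Any
price or total the client sends is simply ignored.

```python
from urllib.parse import parse_qs

# Hypothetical catalog: prices in cents, stored on the server only.
PRICES = {"widget": 1999, "gadget": 4950}

def checkout_total(query_string):
    """Compute the order total in cents from server-side prices.

    Only the item name and quantity are taken from the request; a
    "total" or "price" parameter, if present, is never consulted.
    """
    params = parse_qs(query_string)
    item = params.get("item", [""])[0]
    if item not in PRICES:
        raise ValueError("unknown item")
    quantity = int(params.get("qty", ["1"])[0])
    if quantity < 1:
        raise ValueError("quantity must be at least 1")
    return PRICES[item] * quantity

# A tampered "total" in the query string has no effect on the result.
print(checkout_total("item=widget&qty=2&total=1"))
```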

--
Steve Werby
President, Befriend Internet Services LLC
http://www.befriend.com/






Received on Thu Jul 18 2002 - 22:04:46 CDT

