Re: ONLINE-ADS>> Weird Log Data
Chip Canty (ccanty_at_shell1.shore.net)
Fri, 4 Dec 1998 09:07:34 -0600 (CST)
TREBOR_at_ANIMEIGO.COM WROTE:
> First, a properly configured robots.txt file will prevent
> most well-behaved spiders from accessing the URLs. They
> check it before retrieving the target URL.
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Well, yes and no--depends on how you define "well-behaved."
Inktomi's spider is an example of an otherwise reputable
robot that spiders first and consults your robots.txt file
afterwards.
And don't assume that even the best-known spiders are
well-behaved. We've identified one spider from another major
SE that apparently has chosen to ignore your robots.txt file
entirely.
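
For reference, here is a minimal sketch of the check a well-behaved
spider is expected to make before it requests a URL. It uses Python's
standard urllib.robotparser module. The robots.txt contents, the
/redirect/ path, the "ExampleBot" name and the example.com URLs are
all assumptions made up for illustration; nothing here prevents a
spider from skipping the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the /redirect/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /redirect/
"""


def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to request url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/redirect/ad1",
                "http://www.example.com/index.html"):
        print(url, may_fetch("ExampleBot", url))
```

A spider that fetches first and reads robots.txt afterwards, or never
reads it at all, simply never runs a check like this, which is why the
file alone is not a reliable safeguard.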
TREBOR_at_ANIMEIGO.COM WROTE:
> Second, make sure that all the encoded data is encoded as
> arguments to the URL, not in the URL itself.
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Good point. Richard's problem stemmed from the fact that
Excite was echoing his entire URL--several of them, in
fact--within arguments to its own URLs.
<MODERATOR'S NOTE>
Actually, that is not quite right. I don't think it was
anything Excite did; I think it was the way we created the
HTML banner. The values in the form were full URLs, which
were really redirects through our server. All Excite can
track (at least with our HTML banner, anyway) is clicks on
the banner itself, not the values selected in the form. Of
course, the spider didn't know these were values in a form;
it just saw URLs and followed them. That is also why Excite
was confused when I brought it to their attention: they
never saw any of the activity because it didn't happen
on their server.
The other way to set up the banner would have been to make
each value in the form a variable, and then have the form
pass that variable to a CGI script, which then does the
redirect. We did it the other way because it allowed us to
initiate the redirect by having the user just select a menu
option. The CGI script approach would require the user to
select a menu item and then hit a "submit" button. Basically,
we wanted to redirect the user in one step rather than two.
</MODERATOR'S NOTE>
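
The CGI alternative described in the moderator's note could look
roughly like the sketch below: the form submits a short code rather
than a full URL, and the script looks up the destination and issues
the redirect. The parameter name "dest", the codes, and the
example.com destinations are hypothetical; this is not the script
that was actually used.

```python
from urllib.parse import parse_qs

# Hypothetical short codes and destinations; the form would carry only
# the codes, so a spider reading the banner HTML sees no followable URLs.
DESTINATIONS = {
    "news": "http://www.example.com/news/",
    "sports": "http://www.example.com/sports/",
}
FALLBACK = "http://www.example.com/"


def redirect_response(query_string):
    """Build the CGI response headers that redirect to the chosen page."""
    params = parse_qs(query_string)
    # "dest" is an assumed form field name; missing values fall back safely.
    choice = params.get("dest", [""])[0]
    target = DESTINATIONS.get(choice, FALLBACK)
    return "Status: 302 Found\r\nLocation: " + target + "\r\n\r\n"


if __name__ == "__main__":
    print(redirect_response("dest=news"), end="")
```

Because the destinations live only on the server, the banner markup
contains nothing that looks like a link, at the cost of the extra
"submit" step mentioned in the note.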
TREBOR_at_ANIMEIGO.COM WROTE:
> Finally, keep logs by user_agent, and you can easily
> determine if you're being spidered, unless the spider is
> seriously nasty and is masquerading as a browser.
>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Many do. But in Richard's case it looked to me as though he
fell victim not to a rogue spider, but to an off-line
browser (or browser plug-in) that innocently prefetched and
cached all the URLs that it recognized. Excite's practice
of echoing his full URLs apparently left him vulnerable to
this.
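
As a rough illustration of the "keep logs by user_agent" suggestion,
the sketch below tallies requests per user agent. It assumes the
server writes the user agent as the last quoted field on each log
line (as in the combined log format); the sample lines and the
"ExampleSpider" name are made up.

```python
import re
from collections import Counter

# Assumption: the user agent is the final double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping each user agent string to its hit count."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample log lines for illustration only.
SAMPLE = [
    '1.2.3.4 - - [04/Dec/1998:09:00:01 -0600] "GET /r/a HTTP/1.0" 302 0 "-" "ExampleSpider/1.0"',
    '1.2.3.4 - - [04/Dec/1998:09:00:02 -0600] "GET /r/b HTTP/1.0" 302 0 "-" "ExampleSpider/1.0"',
    '5.6.7.8 - - [04/Dec/1998:09:01:10 -0600] "GET /r/a HTTP/1.0" 302 0 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for agent, hits in count_user_agents(SAMPLE).most_common():
        print(hits, agent)
```

An agent that requests every redirect URL within a few seconds stands
out in a tally like this, although a spider or prefetching tool that
reports a browser's user agent string will not.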
Chip Canty ccanty_at_waypages.com
WAYPAGES PRODUCTIONS www.waypages.com
132 Adams St. #10 800-997-5476
Newton, MA 02458 617-964-9996
fax 617-964-9989