Tuesday, May 6, 2008

Understanding Website Tracking

Most web sites track their visitors for any of a host of reasons. What the operators do with that information depends on their motives and intent. I track visitors to this site using a free service called Statcounter. My motives are to understand my audience better. One example of this is my article "Browser Wars part Deux". Others track for security, or ad revenue, or more sinister purposes. We as an audience may choose to agree or to disagree with how we are tracked and why.

There is a lot of misunderstanding out there about security and privacy on the Internet and during our browsing experience. Trying to understand the ins and outs of all of this is technical and a bit arcane. Most people don't have the time, patience or the background for it. I do. It's part of my job. Information security has been part of my job for most of my career.

With security and privacy, most of the time people get upset about the wrong things. It's not that their concern is misplaced; it's just that people aren't that good at estimating risks.

Sometimes we choose to give up a bit of our privacy for things like free email accounts. Most people believe that companies like Google use their free email as a vehicle to deliver ads. Automated analysis of your messages suggests something you might want. The most successful companies doing this will be the ones that are effective (suggesting something you really want or need) and not offensive or pushy.

Occasionally, these services get it wrong and the results can be quite funny. Because this blog is hosted by Google, I use a Google account just for it. A lot of the emails going through relate to scouting and astronomy. Google, through Gmail, displays a line at the top of my inbox informing me of interesting products and services. Sometimes it tells me about things relating to camping or astronomy. Regularly, it tells me about astrology, or doom and gloom sites (Nostradamus), and other weirdness. I don't mind it as it's not too pushy.

Website visitors are typically tracked by IP addresses and "cookies". Many people get upset about companies tracking their browsing habits. It's seen as an invasion of privacy and has been the focus of a wide debate. The other part of the problem is the use of these techniques by spammers and criminals.

As with most things, organizations that are the most aggressive in their use of technology tend to cross a line and become the focus of intense debates such as this. The debate then focuses on the use, abuse, and perceived abuse of the technology which then becomes a question of trust. This is no different from your real world business choices.

The first thing people need to know is that not all tracking is bad. In fact, some is necessary. To tell the difference, you need to understand how tracking works. There are several methods described below. If you don't want all the nitty-gritty, read the italic paragraphs in each section.

People are tracked by means of IP addresses, Cookies, Web Bugs, and by coordinating the use of these things. Cookies and Web Bugs can even track you as you move a laptop from one place to another (e.g. home, work, hot spots).

IP addresses

Basically, IP addresses by themselves aren't all that good for tracking people because they can be shared and because they don't remain constant.
If you want to see your IP, go to What's my IP address.

That may change in the future as we move to a newer standard for IP addresses called IPv6, which is intended to allow every device to have its own unique IP.


If you want to know more about how this works read the points below. Otherwise skip a bit.
  • IP stands for Internet Protocol. IP addresses are how computers know how to talk to each other. When you visit this blog, your browser needs an IP address. You type mangsbatpage.433rd.com and your browser asks the Internet's Domain Name System (DNS) to return an address. Today it sent back 72.14.207.121. Because this is a server at Google, it's unlikely to change too often. The computer I'm writing this on also has an IP address. It's dynamic and assigned by a home firewall/router. It's also private and can't be seen on the Internet. Finally, there is the one that my firewall/router has. That one is assigned by my ISP dynamically and changes from time to time. It can be seen on the Internet. That one is also shared by other computers in my house.
  • IP addresses were intended to be unique. In practice there aren't enough of them and they get shared and reused. As a result, they aren't in themselves all that good at tracking people.
    • As I mentioned, some IP addresses are dynamic and change over time.
    • Other IPs are fixed and represent many people. Large companies typically funnel all of their employee browsing through a few IP addresses. While most Internet Service Providers assign individual IP addresses to customers, AOL is (or was) a counterexample and operated much like a large company.
    • IP addresses in and of themselves aren't a great indicator of individual behaviour.
  • There should be a healthy privacy debate around IPv6.
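
The private versus public distinction in the points above can be checked with a few lines of Python. This is only a minimal sketch using the standard ipaddress module. The address 192.168.1.10 is an assumption, a typical example of what a home router might hand out; 72.14.207.121 is the address quoted above. The function name describe_address is made up for illustration.

```python
import ipaddress


def describe_address(text: str) -> str:
    """Label an address as private (home/office network) or public (visible on the Internet)."""
    addr = ipaddress.ip_address(text)
    kind = "private" if addr.is_private else "public"
    return f"IPv{addr.version} {kind}"


# Hypothetical address of the kind a home firewall/router assigns.
print(describe_address("192.168.1.10"))
# The address the DNS lookup in the post returned.
print(describe_address("72.14.207.121"))
```

A web site only ever sees the public address, which is why everyone behind the same router or company gateway looks like one visitor.
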
Cookies

Cookies are more commonly used for tracking and as a result they are both abused and misunderstood. Cookies are essentially a way of associating information with a name to provide a memory for web servers. That's actually needed because in their basic form web servers can't tell one page request (or one person) from another. Cookies that are very specific and restricted facilitate transactions. Cookies that are broad and unrestricted are open to abuse.

If you want to know more about cookies read the points below:
  • Web servers are "stateless", which is just a fancy way of saying that they have no memory from one page to another. This is fine if the site is purely informational. If there is some kind of transaction happening, the site must have a memory. You wouldn't want it any other way.
  • Cookies can be restricted to specific sites. So-called secure cookies are generally a good thing. For example, when you bank on the Internet, you most likely use cookies. When you sign on to a secure web site, the server sends you a cookie called a secure session cookie. It's really just an enormous random number. Each time you click a new button or move to a new page within that site, that number is how the web site knows how to connect the dots between the actions on each page. Of course, if someone were to get this number they could impersonate your session. There's a whole host of things done to prevent this.
    • That's one reason why the banks encrypt their sessions.
    • The random number must be long and unpredictable enough to prevent guessing.
    • The random number is only good for a short while. When it expires, you must login again.
    • Some web sites change the random number from page to page.
  • Shopping cart sites use cookies to keep track of what's in your cart. They are similar to what the bank does but you might not need to log in.
  • Wikipedia has an article on cookies.
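
To make the secure session cookie idea concrete, here is a rough sketch in Python of how a server might create one. It is not how any particular bank does it. The cookie name session_id, the 15 minute lifetime, and the function name new_session_cookie are all assumptions chosen for illustration; the modules secrets and http.cookies are part of the Python standard library.

```python
import secrets
from http.cookies import SimpleCookie


def new_session_cookie(max_age_seconds: int = 900) -> SimpleCookie:
    """Build a cookie holding a hard-to-guess session number (illustrative only)."""
    cookie = SimpleCookie()
    # 32 random bytes (256 bits) from the operating system's secure random source.
    cookie["session_id"] = secrets.token_urlsafe(32)
    # Only send the cookie over encrypted connections.
    cookie["session_id"]["secure"] = True
    # Keep the cookie out of reach of scripts running in the page.
    cookie["session_id"]["httponly"] = True
    # After this many seconds the cookie expires and a new login is needed.
    cookie["session_id"]["max-age"] = max_age_seconds
    cookie["session_id"]["path"] = "/"
    return cookie


cookie = new_session_cookie()
# Prints the header line the server would send back to the browser.
print(cookie.output())
```

The browser returns the same value with every later request to that site, which is what gives a stateless server its memory for the length of the session.
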
Web Bugs

Another way you can be tracked is by the use of so-called "web bugs". These are references to invisible files hidden in a web page that are associated with a unique number. By embedding the same bug in emails and different web pages along with additional reference information, you can be tracked. Unlike cookies and IP addresses, I'm not aware of a clear need for web bugs beyond tracking. In fact there are lots of examples of abuses using this technology.

If you want to know more about web bugs read the points below:
  • Web bugs are often image files such as JPEGs that are drawn with a 1 pixel x 1 pixel size. There are other methods and a broader description can be found on Wikipedia.
  • The reference number is usually passed as an argument after the file name (after the ? in the URL).
  • The bug is really the reference and not the file.
  • Tracking is possible because the name of the file and the unique number appear in the logs of the server that delivers the file each time a page or email containing the bug is loaded.
  • Several email products can disable web bugs.
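
The points above can be sketched in Python. Everything named here is hypothetical: the host tracker.example, the file pixel.gif, the argument names id and page, and both function names are made up for illustration. The first function builds the invisible image reference that would be embedded in a page or email; the second shows how the reference number can be read back out of the request recorded by the tracking server.

```python
import html
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical tracker address, for illustration only.
BUG_URL = "http://tracker.example/pixel.gif"


def web_bug_tag(visitor_id: str, page: str) -> str:
    """Build a 1 pixel x 1 pixel image reference carrying a unique number."""
    query = urlencode({"id": visitor_id, "page": page})
    src = html.escape(f"{BUG_URL}?{query}", quote=True)
    return f'<img src="{src}" width="1" height="1" alt="">'


def read_bug_request(request_path: str) -> dict:
    """Pull the tracking arguments out of a requested path as logged by the server."""
    query = urlsplit(request_path).query
    return {key: values[0] for key, values in parse_qs(query).items()}


tag = web_bug_tag("12345", "newsletter-may")
print(tag)
print(read_bug_request("/pixel.gif?id=12345&page=newsletter-may"))
```

The image itself is irrelevant. The same id turning up in requests from different pages and emails is what lets the tracker connect them.
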
Tips
  • If you are doing transactions like banking, you might want to shut down your browser and start it up from scratch to do your banking and shopping. When done, shut down and restart again. This will minimize the possibility of information accidentally leaking between different web sites.
  • Most browsers allow you to select which cookies you'll allow to be set. You can also block entire sites and networks. The downside is that the frequent interactions when the browser asks you about each cookie can be highly annoying. There are also cookie managers that allow you to remove and block unwanted cookies.
  • Browsers often allow you to control the loading of offsite image files, which limits web bugs.
  • JavaScript can be even more dangerous in what it can steal from your computer. I strongly recommend people use tools like Firefox's NoScript add-on. This allows you to permit only specific sites to use scripts.
