Just finished reading John Battelle’s The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture. Thought-provoking in a number of ways, and I’m sure I’ll milk it for a few blog posts. It really is one of the better books I’ve read in a while, and a funny aside–the title doesn’t lend itself to searching very well. The Amazon book search I did on the title “Search” didn’t put it on the first page of results. I had to go back and type in the author’s name.
Anyway, reading it right after reading the report, I found several passages that spoke to the issues raised regarding web surveys and what they represent. The first passage recounts the aha moment that Bill Gross had for his paid search engine GoTo.com, which presaged the Google AdWords system:
Put simply, it’s not the quantity of traffic, Gross realized, it’s the quality. Any business would be willing to pay a lot more than seven to ten cents a click for the right traffic. That realization became Gross’s eureka moment—a moment that, more than any other, spawned today’s Internet advertising economy. For every single online business (even, it turns out, portals) undifferentiated traffic is worth very little, but specific traffic, traffic with an intent to act in relation to a business’s goods or services [Battelle’s emphasis], is worth quite a lot. Gross realized that businesses will pay quite a bit to acquire the right kind of traffic. All he had to do was build an engine that created intentional traffic.
This notion of intentional traffic can be an important one for survey results, much as it is for advertising. By and large, the people who answer a site survey are those who come with an intent to act in relation to the site’s content. Harley is absolutely right that you can’t take survey results and apply them to traffic overall, because a large part of almost any web site’s traffic is going to be non-intentional traffic driven largely (I’d guess) by inappropriate search results or simple curiosity generated through media coverage or blogosphere links. In the last MIT OCW evaluation report [9.0 MB!], I noted that 51% of the traffic to MIT OCW was what I call “one and dones”–single-page visits (page 20). Now, due to a quirk in how our site is instrumented, not all of these can be dismissed as unintentional traffic (we can’t afford to instrument all the PDFs on the site, so if a visitor comes to a Lecture Note page and downloads four or five lecture notes, this still only registers as a single-page visit). I’d guess, though, from what I’ve seen of other site metrics, that this is fairly representative of traffic to other sites.
There are other cuts at web metric data that can be helpful in defining intentional traffic as well. Returning visitors are another indication. On page 14 of the above report, I note that about 40% of our survey respondents indicate they are new visitors, whereas our analytics would indicate this figure is more like 70%. This indicates just how much bias there is in the responses, but it’s not necessarily all bad news. Yes, most first-time visitors won’t complete a survey, but on the other hand, most of their answers probably wouldn’t be particularly useful. The portion of first-time visitors who do take the time to complete the survey are an interesting population, because they represent on some level traffic with enough intention in relation to the content to invest in answering questions.
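To make the bias concrete, the two figures above can be turned into simple over/under-representation ratios. This is just a sketch using the numbers from the report (40% new visitors among respondents, 70% among all traffic); nothing here is from the report’s own methodology.

```python
# Hypothetical sketch: quantifying survey respondent bias using the
# two figures cited in the post (not the report's own calculation).

survey_new_share = 0.40      # share of survey respondents who are new visitors
analytics_new_share = 0.70   # share of all site traffic that is new visitors

# Ratio of each group's share among respondents to its share in traffic;
# 1.0 would mean the group is represented in proportion to its traffic.
new_visitor_ratio = survey_new_share / analytics_new_share
returning_ratio = (1 - survey_new_share) / (1 - analytics_new_share)

print(f"New visitors: {new_visitor_ratio:.2f}x their traffic share")
print(f"Returning visitors: {returning_ratio:.2f}x their traffic share")
```

On these numbers, returning visitors show up among respondents at roughly twice their share of traffic, while new visitors are represented at a bit over half theirs, which is one way of stating the bias in a single figure.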
One of my favorite new metrics coming off of our site is the number of zip downloads. A few months back, we set up a system that generates and makes available, for each course, zip packages of the online course content that can be downloaded to users’ local computers. We’ve been serving up somewhere between 200,000 and 300,000 of these a month this year, and they are as close as we’ll ever come to a “purchase” on our site. It’s the activity on the site that most directly indicates visitor intent. We also have Amazon links in portions of the site, and eventually, when this is rolled out to the entire site, I expect the number of related texts purchased to be an interesting measure of visitor intent.
All of which is a long way of ending up where I ended up in the last related post. No, survey results don’t represent overall site traffic, but the trick is to figure out the site metrics that can help identify what populations they do represent. They can, I think, provide a clear picture of the types of benefits a site produces and for whom, and when carefully coupled with web metric data, produce some reasonable estimates of the volume of those benefits. To a certain extent, they can indicate why visitors who might become repeat intentional traffic do not. I’d love to hear from anyone out there who’s tried to address these issues, and learn more about approaches that have been tried.
The OpenFiction Project clearly doesn’t have an audience that is either big enough or interested enough to support the use of the forum at this point. Because all the registrations are spammers and I don’t have the bandwidth to keep deleting them, I’ve disabled the forum for now. If I use the site for teaching at some future point, I may re-enable the forum, but in the meantime, if you are interested in using it (for your writing group or other related activity) and you are willing to put forth some effort in moderating the forum, let me know and I’ll reactivate it.
I have a Google News search feed for OCW set up, so I’ve been getting a fair amount of news reporting on Ohio Championship Wrestling. I’m not much of a pro wrestling fan, but they do have an anthem we might put to good use, if only they’d make it available through CC.
Here are a few other things that OCW means:
Orange County Wheelmen (http://www.ocw.org/)
Ozark Cooperative Warehouse (http://www.ozarkcoop.com/home/)
Oxford Cycle Workshop (http://www.oxfordcycleworkshop.org.uk/ocw.php)
Ozark Creative Writers (http://www.ozarkcreativewriters.org/)
Oregon Christian Writers (http://www.oregonchristianwriters.org/)
Ontario Combinatorics Workshop (http://www.rmc.ca/academic/conference/ocw/index_e.html)
Sweet sassy molassy!!!
tOFP’s been listed on Self Made Scholar, and I’ve been getting good traffic off of it. It’s a nice site with other writing resources listed. Check it out.
…that I weep at the sight of actual data about OER and how they are viewed and used? A survey of Japanese attitudes toward OpenCourseWares in Japan from goo Research via What Japan Thinks. A personal favorite:
Q9: What should be the scope of the universities that open up their lecture materials? (Sample size=1,050)
• Just well-known public and private universities 17.2%
• As many public universities as possible 14.2%
• As many private universities as possible 3.4%
• As many public and private universities as possible 64.8%
Or lighthouse cleaning, or… Finally reached a mini-milestone for tOFP. My son is currently not falling asleep unless I’m sitting in the room (he does permit me to work on my laptop, thankfully), and so I’ve finally managed to finish backtagging all the posts from before I moved to WordPress. If I remember to tag this post, there should be nothing listed as “Uncategorized.”
Diane Harley’s group at UC Berkeley has come out with another great report supporting better evaluation of open educational resources. If you report data from your project to stakeholders of any kind, or are on the receiving end of project data, this report is a helpful look at the usefulness and limitations of web surveys and transaction log analysis (or web metrics).
Harley and crew were able to link survey respondents to their respective transaction logs, and thus determine how representative the respondents were with respect to overall site usage (and the answer in non-math terms is “not very”). If I’m reading the report right, the surveys they used received extremely low response rates, an order of magnitude lower than those we’ve done for MIT OCW (~0.2% as opposed to our 3–6%), so it’s possible the reliability measures may vary also, but the basic point is well made: You can’t expect the people who complete your survey to represent your overall site traffic.
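The order-of-magnitude gap in response rates is easy to see if you run both rates against the same traffic figure. The rates below come from the post; the visitor count is purely illustrative.

```python
# Sketch: how response rate drives respondent counts. The rates come
# from the post (~0.2% vs. 3-6%); the visitor figure is hypothetical.
visitors = 100_000

for label, rate in [("Harley et al. surveys", 0.002),
                    ("MIT OCW (low end)", 0.03),
                    ("MIT OCW (high end)", 0.06)]:
    print(f"{label}: ~{round(visitors * rate):,} respondents")
```

At a 0.2% rate you’d need fifteen to thirty times the traffic to collect the same number of responses, which is one reason the reliability of the two sets of surveys might differ.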
What Harley doesn’t address, and I think may be the next step in understanding survey results, is what they might usefully represent. One quote:
These findings confirm our fear about survey response bias; the few users who bothered to respond to the surveys are demonstrably different from the average site visitors. Since the results show that the respondents are non-representative on these three behavioral measures, we determined that it would be unwise for us to draw any conclusions from the survey about the characteristics of the site visitors overall.
I’m not sure there is cause for fear here. No, the survey results don’t represent all of the traffic to the site, but I’m not sure that information is worth having anyway. A web site survey is like conducting interviews of people passing through an art exhibit placed in a public pedestrian thoroughfare. Some may have heard of the exhibit and are coming specifically to see it, some may have just been passing by and became interested, a few may have just stopped to glance at a single piece, and a great many are just passing through on their way to lunch. It’s neither possible nor advisable to interview everybody, and only the ones most interested in the exhibit are going to sit for an interview anyway.
The good news is that these are the people you are most likely to learn from anyway. There is of course the danger you’ll only hear good news because you’re only asking the choir to sing (or something like that). You also won’t learn anything about those who choose not to stop, but for that you need a different tool. The question is, what population do survey responses really represent?
I did a calculation a while back on traffic to the MIT OCW site in October 2004 (page 19 of the ’04 Findings Report). That month, we had 417,598 visitors. Based on the survey data, I did a back-of-the-envelope calculation that we had a core user base of about 50,000. The data I based that particular calculation on suffers from some of the bias that Harley identifies, but if I recall, there was at least a rough correspondence between survey and web metric data on this point. I’m sure someone with a stronger stats background, and armed with Harley’s methodology, could do an even better job of this kind of thing to better identify the core group that survey results do represent well. If anyone out there is interested in this problem, I’m happy to work on it with them.
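The shape of that back-of-the-envelope calculation can be sketched as below. The post doesn’t spell out the exact method, so the core-user fraction here is a hypothetical stand-in chosen only to show how a survey-derived share scales the traffic figure; only the visitor count comes from the report.

```python
# Illustrative sketch only: the core-user fraction is hypothetical,
# standing in for whatever share the survey data actually suggested.

monthly_visitors = 417_598   # October 2004 figure cited in the post
core_user_fraction = 0.12    # hypothetical survey-derived share

core_user_base = round(monthly_visitors * core_user_fraction)
print(f"Estimated core user base: ~{core_user_base:,}")
```

A fraction of about 12% lands near the ~50,000 figure mentioned above, but the real work, as the post says, is in estimating that fraction defensibly from the survey and log data.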
The other reason to suspect that surveys represent at least some stable portion of a user population is that the numbers appear to be very consistent over the years we’ve done our surveys, with observable trends in figures such as educational role and satisfaction with breadth, depth, and quality. I completely agree with Harley that there are problems with extrapolating survey figures out to all traffic to a site. At best, survey figures will always represent a subset of traffic. My hope is that over time we can better describe that subset.
On the eve of his passing, I think it appropriate to take a more scholarly look at Captain America’s place in our culture, courtesy of Harvard Law.
The OpenFiction Project had a good month, with 6,657 visits. I haven’t seen much activity on the forum, but then I haven’t been very good about approving new accounts on the forum either, as nearly all are spam, and I am spending all of my energy addressing the Wilms Kids forum administration needs. If anyone out there is interested in acting as administrator for tOFP forum, I’d be glad for the help.
Another 55 downloads of tOFP [ Print ] went out the virtual door last month. No matter how many get downloaded, it doesn’t seem to be impacting the traffic too much.
February 2007 (PDF)