Friday, December 29, 2006

I was talking with my partner the other evening about information, and the hurdles websites usually put in the way of its distribution.

For example, a site she is involved with produces research reports, almost exclusively as PDFs. This is not necessarily bad, per se, but only a very few search engines index the contents of PDF files, so this research is not searchable: it cannot be discovered serendipitously by a student or policy wonk who happens to be searching with something other than, say, Google.

But, even worse, this charitable research group wants to know who is downloading its files, so all links to the PDFs actually go to a form which asks the reader for a few minor bits of information (where they are located, what professional field they work in) before redirecting them to the file. BAM. No spiders: search engine crawlers don't fill in forms. So none of these files are available even on the big search engines.

Maybe there are cases where you want to keep your research from getting the widest coverage, but for a charity that's pretty unlikely, especially since part of its mission is to distribute the research results as widely as possible.

But what about libraries? This is the crux of my current complaint/whine.

I was just visiting the National Library of Australia (NLA). They have this fabulous chart from Jacob Le Maire, the gent who discovered the route around Cape Horn and into the Pacific Ocean, and who incidentally died for his trouble because the Dutch East India Company didn't want anyone to know about it. Yes, the NLA has the chart. Clearly they have a lovely high-quality scan of it somewhere. But no, you can't find it online.

So why the hell do they have the damn thing, if they don't let anyone see it?

I mean, let's look at this logically: the taxpayers of Australia have paid to purchase this artifact, to preserve it, and to have it digitized. The Library exists to share knowledge and information, again paid for by the taxpayers. The chart itself is clearly long out of copyright, not even Australian, and is likely registered as a world treasure for its historic value to the whole of human culture. What possible justification could there be for it not to be online?

I can think of one: the archival scan would undoubtedly be large, and the cost in bandwidth might be high. (There are clear and simple solutions to this problem related to IP addresses, but we won't go into it here.)

Stupidities like this ("we have to hide away our data because, well, just because") drive me absolutely bonkers. Take the BBC. If you're a UK citizen you can freely and easily download its content, because after all your licence fees helped create it. Everyone else in the world is SOL.

That's inane. Whatcha gonna do wid it now that you have it? Oh, sure, about 1-2% of the content can be boxed and sold in about 3 years. But the news broadcasts? The commercials? Get real.

So you don't want to pay for the bandwidth? Fine; make it available via YouTube or BitTorrent or some other system. It shouldn't cost you more than a tiny fraction of your current bandwidth to seed the files out into the internet.

