I recently attended a lecture by Professor Lewis Lancaster in which he described his work with what he calls ‘Long Data’. Best if I first quote from a 2011 paper about the process:
Blue Dots
This project integrates the Chinese Buddhist Canon, the Koryo version (Tripitaka Koreana), into the AVIE system. This version of the Buddhist Canon is inscribed as UNESCO World Heritage and enshrined at Haeinsa, Korea. The 166,000 pages of rubbings from the wooden printing blocks constitute the oldest complete set of the corpus in print format. Divided into 1,514 individual texts, the version has a complexity that is challenging since the texts represent translations from Indic languages into Chinese over a 1,000-year period (2nd-11th centuries). This is the world’s largest single corpus, containing over 50 million glyphs, and it was digitized and encoded by Prof Lew Lancaster and his team in a project that started in the 70s.
OK, so a young academic who studies East Asian languages is put in charge of documenting a 166,000-page book that was translated from ancient Indian languages into Chinese over 1,000 years. The biggest book, the oldest copy. For the first 6 or so years he reads it. Reaching some kind of crisis, as I think you would, he decides that reading it is not going to get anywhere. So he talks to Samsung and they help him digitise it. He feeds it into a computer and adds metadata behind each glyph (where it sits on what page, and so on).
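As I understand it, the metadata turns every glyph into a small addressable record. Something like this minimal sketch in Python, where the field names are entirely my own guesses and not the project’s actual schema:

```python
from dataclasses import dataclass

@dataclass
class Glyph:
    """One glyph from the digitised canon, plus its positional metadata.

    Field names are illustrative guesses, not the project's real schema.
    """
    char: str      # the Chinese character itself
    text_no: int   # which of the 1,514 texts it belongs to
    page: int      # page (rubbing) number within that text
    line: int      # line on the page
    slot: int      # position within the line
    century: int   # rough century of the translation (2nd to 11th)

# The whole corpus then becomes one very long list of these records,
# fifty-million-plus of them, which can be filtered and counted at will.
corpus: list[Glyph] = []
```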
So now you have 50,000,000+ glyphs stored in memory. (Which reminds me a bit of ‘The Nine Billion Names of God’.) What do you do with that? At this point I was a little cross about taking a text and chopping it up into ‘Big Data’ cubes, but he seemed to be an honest prof so I waited for the explanation.
He had the computer render every glyph as a blue dot, and then one glyph, one word, he made red wherever it appears. So he can see patterns. He can ‘feel it’. He picks another word, feels how the two are ranged, closely or far apart, intuits a problem. He asks the computer to plot a graph of how often those words appear together over the 1,000 years of transcription. There are peaks as new words are developed and then discarded. There are two peaks that seem oddly similar, but 200 years apart. The computer has found that pages were accidentally jumbled up about 500 years ago.
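The mechanics behind that graph are, I imagine, mostly counting. A sketch of what it might look like, assuming the Glyph records from above and two hypothetical target characters:

```python
from collections import Counter

def cooccurrence_by_century(corpus, a, b):
    """Count pages on which glyphs a and b both appear, bucketed by century."""
    pages_with_a, pages_with_b, century_of_page = set(), set(), {}
    for g in corpus:
        key = (g.text_no, g.page)
        century_of_page[key] = g.century
        if g.char == a:
            pages_with_a.add(key)
        if g.char == b:
            pages_with_b.add(key)
    counts = Counter()
    for key in pages_with_a & pages_with_b:
        counts[century_of_page[key]] += 1
    return counts

# Plot the counts against century and you get the peaks Lancaster described;
# two peaks with the same shape, centuries apart, is just the signature you
# would expect if a block of pages had been shuffled out of place.
```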
AVIE is the 3D environment built first at Kunst Kamp; a bigger version now runs at City University of Hong Kong. The Blue Dots were fed into the system so that Lancaster could walk through the book, touch any dot and read the glyph right there.
It is charming, and rather like a short story by Borges. But it is also like the infamous German video artist who, when asked about his work, said ‘It is 4 minutes long and in colour’. The computer has seen the book as a whole; it has not seen what the book says. I think that when Lancaster sadly dies, the computer will fall mute. It’s a jotting; the structure is in the brain of the man who knows which way to walk and what to touch.
There is a monk in Korea, I am sure, who can walk among the actual printing blocks that hold the book and see it – just as Lancaster sees it now. He doesn’t know that the pages are mixed up. He might not really care.
There were other examples and discussion, but one part really got me thinking – about Obama’s first presidential speech, how it was seen as far more effective than his second, and how a computer analysis found that in it he employed a rising repetition: a circle of introduction, point, point, point, re-phrase, affirm, affirm, affirm, summarise. This circle apparently can be found in The Book as well.
When Lancaster asked the audience how else the data could be visualised, it was immediately obvious to me that (a) Obama was using church sermon patterns that (b) you would also expect in a religious text and (c) are found also in the epic poems of antiquity, because (d) it is easier to memorise text if it is sung, because (e) the part of the brain that handles music is a long-term storage processor. Which is why we teach children with songs. Do Re Mi.
That is, you can sing songs you heard and recited years ago, and will until you die, because that is how the brain lays down text for long-term storage – connected with tonal ‘metadata’. Even the profoundly senile can sing a song. Music soothes the savage breast, but it also parses language and, dare I say, the kind of vague and intuitive information that ‘Big Data’ is supposed to offer.
I stuck up my hand and asked – wouldn’t it be better to sonify the data, since music recognition is a powerful pattern-recognition system? He kind of looked like I’d said rubber baby buggy bumpers. It’s a hunch, prof, it’s just a long shot, that bird songs and big data have more in common than you think. People used to track game and find water and know when winter was coming because of nothing in particular but everything at once. Maybe that’s what brains do that computers can’t.
Before the Dean got too anxious I told an anecdote about Silliac and LeapFrog, which made it all about computers again, which made it alright.
But I still got that hunch.
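For the record, the sonification I had in mind is nothing clever: take the same per-century counts and map them onto pitches, so the ear can do the pattern matching the eye does on the graph. A toy sketch, every bit of it my own assumption and nothing Lancaster’s team has built:

```python
def sonify(counts_by_century, base_note=48):
    """Map per-century co-occurrence counts onto MIDI note numbers.

    A purely speculative toy: higher counts become higher pitches across a
    two-octave range. Play the notes in century order and let the ear hunt
    for the echo that the eye spotted in the graph.
    """
    peak = max(counts_by_century.values(), default=1)
    notes = []
    for century in sorted(counts_by_century):
        ratio = counts_by_century[century] / peak
        notes.append(base_note + round(24 * ratio))
    return notes

# e.g. melody = sonify(cooccurrence_by_century(corpus, "佛", "法"))
#      (any two glyphs will do; these are just placeholders)
```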