John Henry was an Audiobook-Readin' Man
You might remember the story of old John Henry. He built rail lines, and could work harder and faster than any man alive. When the company brought in a steam-driven rail driving machine, though, they announced that they were going to fire all of the human rail workers. John Henry stepped up and challenged that machine.
Challenged it, and beat it.
And then dropped over dead.
Keep that in mind as you read this.
Roy Blount, Jr., the president of the Authors Guild, wrote an editorial in the New York Times on February 25th, arguing that the text-to-speech feature of Amazon's new Kindle 2 electronic book reading device violates the intellectual property rights of the authors he represents, because it provides the functional equivalent of an audiobook without anyone paying for audiobook rights.
The crux of Blount's argument is that it's critical to set a precedent now, because the text-to-speech is an audio performance of the book, and even if the digital vocalization is now lousy, it won't always be.
Not surprisingly, authors who have more willingly entered the 21st century, such as Cory Doctorow, John Scalzi, Neil Gaiman, and Wil Wheaton, have attacked Blount's argument with gusto. Wil even provides an amusing side-by-side audio comparison (MP3) of himself and the Mac's "Alex" voice reading a section of his new book Sunken Treasure.
For Scalzi, Gaiman, and Wheaton, the crux of the argument is that Blount's concerns are worse than silly, because nobody would mistake the text-to-speech for real voice acting. (Doctorow, as is his practice, focuses on the legal aspect of Blount's argument, finding it more than wanting.)
My take on this? They're all wrong (well, probably not Cory)... and they're all right, too. That is, Blount is right about the technology but wrong in his conclusions, while Scalzi, Gaiman, Wheaton, et al. are wrong about the problem but right about the proper response. Blount is wrong because he's just trying to hold back the tide, fighting a battle that was lost long ago. The 21st-century digital writers are wrong because they've forgotten the Space Invaders rule: Aim at where your target will be, not at where it is.
Text-to-speech is laughably bad now for reading books aloud.
Text-to-speech could very well be the primary way people consume audiobooks within a decade.
At present, text-to-speech systems that go from ASCII to audio follow a few pronunciation conventions, but otherwise have no way of interpreting the text to work out where the emphasis belongs. For the kinds of uses that current text-to-speech systems typically see, that's good enough. For reading books, especially fiction, it isn't.
But it's not hard to imagine what would be needed to make text-to-speech good enough for books, too. In order to give the right vocalization to the words it's reading, an "AutoAudio Book" would have to have one of three characteristics:
- It could have been told in detail how to emphasize certain words and phrases, probably through some kind of XML-based markup standard. Call it DRML, or Dramatic Reading Markup Language. Given the existence of other voice control standards (such as the W3C's Speech Synthesis Markup Language, SSML, and Pronunciation Lexicon Specification, PLS), such a standard isn't hard to imagine. It would take some pre-processing of the text files, though, to really make it work.
- At the other end of the spectrum, it could actually understand what it's reading, and be able to provide emphasis based on what is going on in the story (basically, what you or I would do).
- Somewhere in the middle would be a system that has a number of standard emphasis heuristics, and can take a raw text file and, after a little just-in-time processing, offer an audio version that would by no means be as good as a real voice actor, but would, for most people, be good enough.
The DRML version is possible now -- hell, I had DOS apps back in the 1990s that would let me add markers to a text file to tell primitive text-to-speech software how to read it. The "understand what it's reading" version, conversely, remains some time off; frankly, that's pretty close to a real AI, and if those are available for something as prosaic as an ebook reader, we have bigger disruptions to worry about.
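To make the markup idea concrete, here's a rough sketch of what reading a DRML file might involve. DRML doesn't exist, so everything specific here is my own invention for illustration: the `passage` and `say` tags, the `style` attribute, and the `reading_plan` function. The only real thing in it is Python's standard XML parser.

```python
import xml.etree.ElementTree as ET

def reading_plan(drml: str) -> list[tuple[str, str]]:
    """Flatten hypothetical DRML into (style, text) pairs, in reading order."""
    plan: list[tuple[str, str]] = []

    def walk(node: ET.Element, style: str) -> None:
        # Text directly inside this element is read in the current style.
        if node.text and node.text.strip():
            plan.append((style, " ".join(node.text.split())))
        for child in node:
            # A <say style="..."> element (invented tag) overrides the style
            # for everything nested inside it.
            child_style = child.get("style", style) if child.tag == "say" else style
            walk(child, child_style)
            # Text following the child reverts to the enclosing style.
            if child.tail and child.tail.strip():
                plan.append((style, " ".join(child.tail.split())))

    walk(ET.fromstring(drml), "neutral")
    return plan

sample = (
    "<passage>Challenged it, and beat it. "
    '<say style="somber">And then dropped over dead.</say> '
    "Keep that in mind.</passage>"
)
for style, text in reading_plan(sample):
    print(f"{style:>8}: {text}")
```

The point is that the hard part isn't the software, it's the pre-processing: somebody (or something) has to decide which sentences get which style before the reader ever sees the file.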
But the "emphasis heuristics" scenario strikes me as just on the edge of possible. There would have to be some level of demand -- such as would arguably be demonstrated by the success of the Kindle 2 and its offspring. More importantly, it would require a dedicated effort to create the necessary heuristics; amusingly, Blount's editorial has probably done more than anything else to make irritated geeks want to figure out how to do just that. It would probably also need a more powerful processor in the ebook reader; that's the kind of incentive that might make Intel want to underwrite the aforementioned irritated geeks.
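For a sense of what the crudest possible "emphasis heuristics" might look like, here's a minimal sketch that takes raw text and emits simplified SSML. The two rules (treat `*starred*` words as moderate emphasis and ALL-CAPS words as strong emphasis) are assumptions I picked because they're easy to code, not anything a real ebook reader does, and the output omits the version and namespace attributes a complete SSML document would carry. The function names are mine.

```python
import re
from xml.sax.saxutils import escape

# Illustrative rules only: *word* -> moderate emphasis, WORD -> strong emphasis.
STARRED = re.compile(r"^\*(.+)\*([.,;:!?]*)$")
SHOUTED = re.compile(r"^([A-Z]{2,})([.,;:!?]*)$")

def mark_token(token: str) -> str:
    """Wrap a single whitespace-delimited token in an SSML emphasis tag if a rule fires."""
    m = STARRED.match(token)
    if m:
        return f'<emphasis level="moderate">{escape(m.group(1))}</emphasis>{m.group(2)}'
    m = SHOUTED.match(token)
    if m:
        return f'<emphasis level="strong">{m.group(1)}</emphasis>{m.group(2)}'
    return escape(token)  # escape &, <, > so the output stays well-formed

def to_ssml(text: str) -> str:
    """Convert plain text to simplified SSML, one <p> per blank-line-separated paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p]
    body = "".join(
        "<p>" + " ".join(mark_token(t) for t in p.split()) + "</p>"
        for p in paragraphs
    )
    return f"<speak>{body}</speak>"

print(to_ssml("I said *no*, and I meant NO!"))
```

Rules like these would misfire constantly (acronyms, for a start), which is exactly why a real system would need many more heuristics and a way to correct them.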
One can easily imagine a scenario in which we see a kind of "wiki-emphasis" editing, allowing tech-attuned readers, upon encountering a poorly-read section of an AutoAudio Book, to update it and upload the bugfix, thereby improving the heuristics. (Of course, that would undoubtedly result in orthographic edit-wars and dialect forking. But I digress.)
Ultimately, Blount's fears that a super text-to-speech system could undermine the market for professional audiobooks really have more to do with economic choices than technical ones. The requisite technologies are either here but expensive or just on the horizon, and the combination of technological pathways and legal precedent (as Doctorow describes) makes the scenario of good-enough book-reading systems all but certain. But that doesn't guarantee that the market for audiobooks goes away. The history of online music is illustrative here, I think: when the music companies were ignorant or stubborn, music sharing proliferated; when music companies finally figured out that it was smart to sell the music online at a low price, music sharing dropped off considerably.
The more that the book industry tries to fight book-reading systems, the more likely it is that these systems (whether for Kindles, or iPhones, or Googlephones, or whatever) will start to crowd out commercial audiobooks. The more that the book industry sees this as an opportunity -- keeping audiobook prices low, for example, or maybe providing ebooks with DRML "hinting" for a dollar more than the plain ebook -- the more likely it is that book-reading systems will be seen as a curiosity, not a competitor.
None of these scenarios may be very heartening for authors, unfortunately. Sorry about that.
At least you're not likely to keel over and die competing with an automated audiobook.