I’m hoping that Amazon doesn’t actually put this into action:
Method and apparatus for programmatically substituting synonyms into distributed text content. A synonym substitution mechanism may programmatically replace selected words in textual data with synonyms for the selected words. The modification to an excerpt performed by the synonym substitution mechanism may not significantly alter the meaning of the excerpt to a human reader. By replacing one or more selected words in an excerpt with synonyms for the words, illicit copies of the excerpt may be recognized by comparing a copy of the excerpt to the original. Particular permutations of synonym substitutions may be provided in excerpts to particular requestors. The particular permutations may be recorded and used to determine a requestor as the source of a copy of the excerpt. Synonym substitution may make programmatic excerpt chaining difficult by substituting different synonyms for the same word(s) in an overlapping portion of two adjacent excerpts.
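For the curious, the mechanism in that abstract can be sketched in a few lines. This is only an illustration under loud assumptions: the `SYNONYMS` table, the `fingerprint` and `identify` names, and the use of a hash of the requestor ID to choose variants are all hypothetical stand-ins, not anything taken from the patent, which talks about recording the permutations handed to each requestor.

```python
import hashlib

# Hypothetical synonym table; a real system would need a much larger,
# context-aware one to avoid mangling the prose.
SYNONYMS = {
    "fear": ["fear", "dread"],
    "offer": ["offer", "give"],
    "happy": ["happy", "glad"],
}

def fingerprint(text, requestor_id):
    """Return (marked_text, choices) for one requestor.

    For every word that has an entry in SYNONYMS, a variant is picked
    deterministically from the requestor ID and the word position.
    `choices` is the permutation of variant indexes that was used.
    """
    words = text.split()
    choices = []
    for i, word in enumerate(words):
        variants = SYNONYMS.get(word)
        if variants is None:
            continue
        digest = hashlib.sha256(f"{requestor_id}:{i}".encode()).digest()
        pick = digest[0] % len(variants)
        words[i] = variants[pick]
        choices.append(pick)
    return " ".join(words), tuple(choices)

def identify(leaked_text, original, requestor_ids):
    """Return the first requestor whose marked copy equals the leaked text."""
    for rid in requestor_ids:
        if fingerprint(original, rid)[0] == leaked_text:
            return rid
    return None
```

With only three substitutable words there are just eight possible permutations, so distinct requestors will collide quickly; a scheme like this only distinguishes customers when the text has many substitutable words.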
The dangers are obvious, albeit entertaining:
“We have nothing to fear, but apprehension itself.”
“I have nothing to offer, but blood, toil, tears, and elbow grease.”
“We few, we happy few, we unofficial association of brothers.”
“I am a jelly donut.”
So much for textual analysis or the linguistic turn.
[Hat-tip to John Scalzi]
24 comments
October 29, 2009 at 8:01 am
Barry
It was mentioned by Tom Clancy, as being thought up by Jack Ryan, for use in CIA documents:
http://en.wikipedia.org/wiki/Canary_trap
It’s probably very old; this is simply (!) a way of automating it.
October 29, 2009 at 8:37 am
dave
Cartographers do it all the time, though I think the difference there is that they’re tampering with their OWN work. The implication here that authors’ texts would be subject to unauthorised synonymisation is… interesting, to say the least.
October 29, 2009 at 9:01 am
ucblockhead
It is a bit different from what mapmakers do. They make one “error” and then look to see if that error shows up in a competitor’s product. Every single map from a particular mapmaker has the same error. This works because in order to figure out what the error is, you have to do all the work of finding out what the actual map should look like. If you’ve done this, you’ve done all the work of making a map yourself and there’s no point in copying someone else’s.
What Amazon is proposing is that each electronic book they sell will have its own unique set of errors, so what I buy is just slightly different from what you buy. The idea being that if a text with a certain error shows up on a pirate website, they know which customer to blame. It is amazingly stupid, because anyone who wants to pirate just needs to get their hands on more than one copy and average the results to make their own untraceable version.
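A rough sketch of that comparison, assuming the two copies differ only by single-word substitutions so that the words line up position by position (the `differing_slots` name is made up for illustration):

```python
def differing_slots(copy_a, copy_b):
    """List (position, word_in_a, word_in_b) wherever two copies disagree.

    Assumes both copies have the same number of words, i.e. every
    substitution swapped exactly one word for one word.
    """
    a, b = copy_a.split(), copy_b.split()
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Two hypothetical customers' copies of the same sentence.
print(differing_slots(
    "nothing to fear but fear itself",
    "nothing to dread but fear itself",
))
```

Every position that shows up in the list is a marked word. Positions where both copies happen to carry the same variant stay hidden, so more copies expose more of the marks.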
October 29, 2009 at 9:44 am
Ivan Ivanovich Renko
Beyond that– this is someone’s prose. Some writer has laboured over it, edited it, revised it… made it perfect (by his or her lights).
One would think that to have it changed by anyone or anything would raise the author’s hackles.
October 29, 2009 at 10:04 am
herbert browne
Agreed, Ivan… and in a similar vein, there seems to be an expansion of misunderstanding of idiomatic expressions (not to mention irony & sarcasm) in public (& private) exchanges… perhaps a cultural thing, that’s come about by routing calls for clarification, assistance, etc to “English as a 2nd language” locales. ^..^
October 29, 2009 at 10:09 am
teofilo
It’s an interesting idea given that tracking errors and changes like this is standard methodology in the study of medieval manuscripts and the like. Here, though, rather than look for the replication of inadvertent errors as a way of tracking influences and copying over time, Amazon is proposing to deliberately insert errors, then track their propagation. It seems really unlikely to be implemented, and even less likely to work, for all the reasons already given in this thread. But it’s still interesting from that methodological perspective.
October 29, 2009 at 10:54 am
Vance
It’s my understanding that historians always insert at least one subtle but distinctive misstatement of fact in each chapter — I believe Turnitin.com tracks these automatically.
October 29, 2009 at 11:48 am
andrew
I guess this means that Amazon is basically unserious about providing access to books in a manner useful to anyone serious about reading the texts they are buying, or rather paying to access at Amazon’s discretion.
October 29, 2009 at 11:56 am
andrew
At least they do seem to be considering some of the potential problems:
So if they implement this and you want an unaltered text just make sure you’re buying a license to access the particular types of content that are protected from being copy protected.
October 29, 2009 at 12:16 pm
dave
Vance, no, those are what we in the trade call “mistakes”. And nobody is supposed to know about them….
October 29, 2009 at 1:36 pm
Anderson
They could achieve the same result by replacing a random word with the same word in a subtly different font — annoying the reader perhaps, but not depriving the reader of the author’s intent. (Don’t fucking cite Barthes at me, you know what I mean.)
October 29, 2009 at 5:20 pm
Pala
Sounds Borgesian.
October 29, 2009 at 7:14 pm
Jason B.
This could play hell with my attempts to teach my Comp students about the difference between quoting and paraphrasing.
!@#$ me.
October 29, 2009 at 8:58 pm
rja
Hmm. That plus this and the dissertation will write itself.
November 1, 2009 at 4:49 am
Indiana Joe
I’m wondering how they can avoid violating copyright laws. By intentionally altering the text, they would be creating and then distributing a derivative work.
November 1, 2009 at 12:27 pm
micah
“They could achieve the same result by replacing a random word with the same word in a subtly different font”
Not really; the whole point of this is supposed to be that it’s a marker which persists even if the ebook format is broken (at which point it’d be easy to figure out which fonts things were in).
I mean, it’s a stupid idea that doesn’t actually work, but I’m not sure there are any non-stupid ways of doing what they want to do.
November 1, 2009 at 1:08 pm
silbey
“Not really; the whole point of this is supposed to be that it’s a marker which persists even if the ebook format is broken (at which point it’d be easy to figure out which fonts things were in).”
Sure, but all you need are two copies of the ebook to do a difference-comparison and you have the same result.
November 1, 2009 at 1:45 pm
micah
Well, yes, that’s one of the reasons why it’s a stupid idea. I’m just saying that the font-changing thing doesn’t solve the problem that this is trying to solve at all–if all you want is to be able to differentiate between uncracked instances of the ebook, you can do that much more easily with metadata.
November 1, 2009 at 3:06 pm
Anderson
“I mean, it’s a stupid idea that doesn’t actually work, but I’m not sure there are any non-stupid ways of doing what they want to do.”
Okay, but *my* stupid idea doesn’t change the author’s words, at least.
Seems to me that once the content of a book is on the internet somewhere, the genie has left the bottle, and everything else is desperate, doomed conjurations to get it back in.
November 4, 2009 at 6:54 pm
Jon H
“I mean, it’s a stupid idea that doesn’t actually work, but I’m not sure there are any non-stupid ways of doing what they want to do.”
The only way I can think that it would work is if there were some way for Amazon-provided ebook viewing software to repair the text to its original before it is displayed.
The distributed etext would be corrupted, therefore, but the intention would be for authorized users to view it in uncorrupted form. I suppose they could try to get around the copyright issue of the modified text through a) throwing their market weight around and b) arguing that the corrupted form is simply a packaging format, and is no more an unauthorized derivative work than an encrypted gzipped file of the original text would be.
November 4, 2009 at 8:10 pm
micah
That would work fine until someone figured out the synonym algorithm you were using. Effectively it’s security by obscurity.
November 7, 2009 at 8:44 am
engels
The applications are endless. Music: every downloaded Eroica score will have a few octaves switched here and there. Hey, who’s going to notice?
I’m not sure why but I find the fact that someone is even contemplating doing this incredibly depressing.
November 7, 2009 at 1:13 pm
engels
Come to think of it, wouldn’t the best thing to do be to subtly change the names of the characters? So if you’re having a conversation about Pride and Prejudice and someone starts talking about Mr Dorsey’s remark to Elizabeth Bendit then you will realise that you are dealing with a Copyright Thief and can report them to the appropriate authorities.
November 8, 2009 at 2:39 pm
nnyhav
That decides it, any future endeavor in real estate development in bedroom communities shall operate under the company name Dormitive Properties.