________________________________________________________________________________

CyberWire Dispatch // (c) Copyright 1999 // November 30

Jacking in from the "Sticks and Stones" Port:

By Suelette Dreyfus
Special Correspondent
CyberWire Dispatch

"Semantic Forests" doesn't mean much to the average person. But if you say it in concert with the words "automatic voice telephone interception" and "U.S. National Security Agency" to a computational linguist, you might just witness the physical manifestations of the word "fear."

Words are funny things, often so imprecise. Two people can have a telephone conversation about sex, without ever mentioning the word. And when the artist formerly known as Prince sang a song about "cream," he wasn't talking about a dairy product.

All this linguistic imprecision has largely protected our voice conversations from the prying ears of governments. Until now.

Or, more particularly, it protected us until 15 April, 1997 - the date the NSA lodged a secret patent application at the US Patent Office. Of course, the content of the NSA patent was not made public for two years, since the Patent Office keeps patent applications secret until they are approved, which in this case was August 10, 1999.

What is so worrying about patent number 5,937,422? The NSA is believed to be the largest and by far the most well-funded spy agency in the world, a Microsoft of Spookdom. This document provides the first hard evidence that the NSA appears to be well on its way to creating eavesdropping software capable of listening to millions of international telephone calls a day. Automatically.

Patents are sometimes simply ambit claims, legal handcuffs on what often amounts to little more than theory. Not in this case. This is real. The U.S. Department of Defense has developed the NSA's patent ideas into a real software program, called "Semantic Forests," which it has been lab testing for at least two years.
Two important reports to the European Parliament, in 1998 and 1999, and Nicky Hager's 1996 book "Secret Power" reveal that the NSA intercepts international faxes and emails. At the time, this revelation upset a great number of people, no doubt including the European companies which lost competitive tenders to American corporations not long after the NSA found its post-Cold War "new economy" calling: economic espionage.

Voice telephone calls, however, well, that is another story. Not even the world's most technically advanced spy agency has the ability to do massive telephone interception and automatically massage the content looking for particular words, and presumably topics. Or so said a comprehensive recent report to the European Parliament.

In April 1999, a report commissioned by the Parliament's Office of Scientific and Technological Options Assessment (STOA) concluded that "effective voice 'wordspotting' systems do not exist" and "are not in use".

The tricky bit there is "do not exist". Maybe these systems haven't been deployed en masse, but it is looking increasingly like they do actually exist, probably in some form which may be closer to the more powerful topic spotting.

Do The Math
============

There are two new pieces of evidence to support this, and added together, they raise some fairly explosive questions about exactly what the NSA is doing with the millions of international phone calls it intercepts every day in its electronic eavesdropping web commonly known as Echelon.

First, the NSA's shiny new patent describes a method of "automatically generating a topic description for text and sorting text by topic." Sound like a sophisticated web search engine? That's because it is. This is a search engine designed to trawl through "machine transcribed speech," in the words of the patent application. Think computers automatically typing up words falling from human lips. Now think of a powerful search engine trawling through those words. Now sweat...
Maybe the spy agency only wants to transcribe the BBC Radio World News, but I don't think so. The patent contains a few more linguistic clues about the NSA's intent - little golden Easter eggs buried in the legal long grass.

The "Background to the Invention" section of every patent application is the place where the intellectual property lawyers desperately try to wave away everyone else's right to claim anything even remotely touching on the patent. In this section, the NSA attorneys observed there has been "growing interest" in automatically identifying topics in "unconstrained speech."

Only a lawyer could make talking sound so painful. "Unconstrained speech" means human conversation. Maybe it's been "unconstrained" by the likelihood of being automatically transcribed for real time topic searching.

Here's the part where the imprecision of words - particularly spoken words - comes in. Machine transcribed conversations are raw, and very hard to analyze automatically with software. Many experts thought the NSA couldn't go driftnet fishing in the content of everyone's international phone calls because the technology to transcribe and analyze those calls was too young.

However, if the NSA didn't have the technology to do automatic transcription of speech, why would it have patented a sifting method which, by its very own words, is aimed at transcripts of human speech?

As Australian cryptographer Julian Assange, who discovered the DoD and patent papers while investigating NSA capabilities, observed: "Why make tires if you don't have a car? Maybe we haven't seen the car yet, but we can infer that it exists by all the tires and roads."

One of the top American cryptographers, Bruce Schneier, also believes the NSA already has machine transcription capability. "One of the Holy Grails of the NSA is the ability to automatically search through voice traffic," Schneier said. "They would have expended considerable effort on this capability, and this research indicates at least some of it has been fruitful."

Second, two Department of Defense academic papers show the U.S. developed a real software program, called "Semantic Forests," to implement the patented method. Published as part of the Text REtrieval Conference (TREC) in 1997 and 1998, the Semantic Forests papers show the program has one main purpose: "performing retrieval on the output of automatic speech-to-text (speech recognition) systems." In other words, the U.S. built this software *specifically* to sift through computer-transcribed human speech.

If that doesn't send a chill down your spine, read on. The DoD's second prime purpose for Semantic Forests was to "explore rapid prototyping" of this information retrieval system. That statement was written in 1997.

There's also an unambiguous link between Semantic Forests and the NSA patent. It's human, and its name is Patrick Schone. Schone appears on the NSA patent documents as an inventor and on the Semantic Forests papers as an author, and he works at Ft. Meade, NSA's headquarters. Specifically, he works in the DoD's "Speech Research Branch" which just happens to be located at, you guessed it, Ft. Meade.

Very Clever Fish
================

The NSA and the DoD refused to comment on the patent or Semantic Forests respectively. Not surprising, really, but no matter, since the Semantic Forests papers speak for themselves.

The papers reveal a software program which, while somewhat raw a year ago, was advancing quickly in its ability to fish relevant data out of various document pools, including those based on speech. For example, in one set of tests, the scientists increased the average precision rate for finding relevant documents per query from 19% to 27% in just one year, from 1997 to 1998. Tests in 1998 on another set of documents, in the "Spoken Document Retrieval" pool, were turning up similar figures of around 20-23 per cent.
The team also discovered that a little hand-fiddling in the software reaped large rewards. According to the 1998 TREC paper: "When we supplemented the topic lists for all the queries (by hand) to contain additional words from the relevant documents, our average precision at the number of relevant documents went from 28% to 50%."

The truth is that Schone and his colleagues have created a truly clever invention. They have done some impressive research. What a shame all this creativity and laborious testing is going to be used for such dark, Orwellian purposes.

Let's work on the mental image of that dark landscape. The NSA sucks down phone calls, emails - all sorts of communications to its satellite bases. Its computers sift through the data looking for information which might interest the U.S. or, if the Americans happen to be feeling generous that day, their allies.

Now, whenever NSA agents want to find out about you, they pull up a slew of details about you on their database. And not just the run-of-the-mill gumshoe detective stuff like your social security number and address, but the telephone number of every person you call regularly, and everything you have said when making those calls to 1-900-Lick-Me from your hotel room on those stopovers in Cleveland.

And here's the real scary stuff: The NSA likely already has a file on many of us. It's not a traditional manila file with your name typed neatly on the front. It's the ability to reference you, or anyone who matches your patterns of behavior and contacts, in the NSA's databases. Now, or in the near future, this file may not just include who you are, but what you *say*.

British Member of the European Parliament Glyn Ford is one of the few politicians around who is truly concerned with the individual's right to privacy.
A driving force behind the European Parliament's STOA panel's two year investigation into electronic communications, Ford is worried that the NSA possesses technologies that are "potentially very dangerous" to privacy and yet faces no controls over its activities.

The Australian aboriginal activist and lawyer Noel Pearson once said that the British gave three great things to the world: tea, cricket and common law. If unchecked, the NSA and its sister spy agencies in the UK/USA agreement may use this technology to lead an assault on the most important of those gifts, and the common law tenet "innocent until proven guilty" may be the first casualty.

How ironic: one Blair wrote '1984' as fiction, and another is helping to make it fact.

= = = = = = = = = = = = = = = =

An Australian-American writer, Suelette Dreyfus was educated in the UK and US, studied at Oxford University and Columbia University in New York, where she won the prestigious Teichmann Prize for excellence and originality in writing. She is the author of Underground, the first book about Australian computer hacking, available at

= = = = = = = = = = = = = = = = =

EDITOR'S NOTE: CyberWire Dispatch, with an Internet circulation estimated at more than 600,000, is now developing plans for a once-a-week e-mail publication. Every week, one of five well-known investigative reporters will file for CWD. If you think your company or organization would be interested in more information about establishing a sponsorship relationship with CyberWire Dispatch, please contact Lewis Z. Koch at lzkoch@wwa.com.

===================

To subscribe to CWD, send a message to:

Majordomo@vorlon.mit.edu

No subject needed. In the first line of the message put:

Subscribe CWD

To remove yourself from this list, send a message to:

Majordomo@vorlon.mit.edu

No subject needed.
In the first line of the message put:

Unsubscribe CWD

________________________________________________________________________________