THE WAY WE LIVE NOW: 8-10-03; Sexed Texts
By Charles McGrath (NYT) 1113 words
Men — as we know now, thanks to investigators like Dr. John Gray — are from Mars, women from Venus. On our respective planets we, or our ancestors, learned to do certain things differently: shop, argue, deploy the TV clicker. To this ever-expanding list we must now add writing. Not writing in the literal sense of making marks on a page — though clearly there are vast differences there as well (legibility must be more prized on Venus) — but writing as linguistic expression. This is slightly different from conversation, in which, as Deborah Tannen, another of the scholars in the Venus-Mars debate, has taught us, the differences between men and women are so vast as to be almost unbridgeable without years of therapy.
In writing, on the other hand, men and women ostensibly use the same language, but according to a recent article in The Boston Globe, they do so in ways that immediately reveal which sex is doing the writing. A team of Israeli scientists, the Globe article reports, punched into a computer some 600 published documents and devised an algorithm that could predict with 80 percent accuracy the sex of the author.
Let’s try this at home. Here are two passages chosen more or less at random from current magazines: Passage A: “I was dating this guy who came from a very wealthy family, and I always felt a little uncomfortable about my humble roots. For his parents’ 25th-wedding anniversary, the family had planned a black-tie party at a ritzy hotel. I was nervous about it, but Alex told me he had everything under control. Before the event, he took me shopping and bought me a beautiful gown. The night of the party, he even rented a limo so we could arrive in style. Alex was a perfect gentleman and treated me like a princess the entire night. He even waltzed with me!”
Passage B: “Ironwood RC-660. . . . Smokejumpers swear by it. You can finally haul that 1,800-pound keg. Whatever the emergency, this American Gladiators-looking tank can handle it. Options include a 75-gallon liquid tank and bullet-resistant enclosure. . . . Honda FourTrax Rancher AT GPScape. . . . No other ATV offers a longer name or a standard GPS system, which helps determine if you’re ripping through Amazon rain forests, shredding the Sahara or tearing up a neighbor’s lawn.”
A no-brainer, right? A is from Venus, B is from Mars. Yes, but not for the reasons you think. When the Israeli stylometricians, as they call themselves, study a text, they scrub it clean of everything that’s “topic specific”: in other words, no “gown,” no “princess,” no “keg,” no “bullet-resistant.” This is how sophisticated language analysts work these days. They ignore the obvious stuff and concentrate instead on the seemingly unobtrusive little tics that the writer and reader barely notice. The process is a little like identifying Tom Wolfe by ignoring his suits and his spats and concentrating instead on his socks, but it gets results. Seven years ago, for example, Donald Foster, the Vassar English professor and self-styled “forensic linguist,” fingered Joe Klein as the author of “Primary Colors” from Klein’s use of punctuation and adverbs.
Similarly, what the gender-identifying algorithm picks up on is that women are apparently far more likely than men to use personal pronouns: “I,” “you” and “she” especially. Men, on the other hand, prefer so-called determiners (“a,” “the,” “that,” “these”) along with numbers and quantifiers like “more” and “some.” What this suggests, according to Moshe Koppel, an author of the Israeli project, is that women are more comfortable talking or thinking about people and relationships, while men prefer to contemplate things.
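The kind of feature described here can be sketched in a few lines: count how often each class of function word appears, relative to the length of the text. This is a minimal illustration under stated assumptions, not Koppel’s algorithm. The word lists are hypothetical and contain only the examples McGrath quotes, the function names are invented for the sketch, and a real classifier would weigh many such rates together.

```python
import re
from collections import Counter

# Hypothetical word lists: only the examples named in the article.
# The Israeli team's actual feature set is not given here and is much larger.
PRONOUNS = {"i", "you", "she"}
DETERMINERS = {"a", "the", "that", "these"}
QUANTIFIERS = {"more", "some"}
CLASSES = ("pronouns", "determiners", "quantifiers", "numbers")


def tokenize(text: str) -> list[str]:
    """Lowercase words (keeping apostrophes) and digit runs such as '1,800'."""
    return re.findall(r"[a-z'’]+|\d[\d,]*", text.lower())


def feature_rates(text: str) -> dict[str, float]:
    """Occurrences of each function-word class per 100 tokens.

    Matching is by bare word form only, so "that" is counted as a
    determiner even when it introduces a clause ("I said that ...").
    """
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        if tok in PRONOUNS:
            counts["pronouns"] += 1
        elif tok in DETERMINERS:
            counts["determiners"] += 1
        elif tok in QUANTIFIERS:
            counts["quantifiers"] += 1
        elif tok[0].isdigit():
            counts["numbers"] += 1
    total = len(tokens)
    # Guard against empty input: report zero rates rather than divide by zero.
    return {c: (100 * counts[c] / total if total else 0.0) for c in CLASSES}


if __name__ == "__main__":
    print(feature_rates("I was dating this guy who came from a very wealthy family"))
    print(feature_rates("You can finally haul that 1,800-pound keg."))
```

Applied to the opening words of Passage B, this returns 12.5 per 100 tokens for each of pronouns, determiners and numbers: the lone “you” scores exactly as heavily as the “that” and the “1,800.”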
But from the same magazine where I found Passage B, I could also have selected the following: “As the sun sets on a spectacularly gorgeous Miami day, a crowd of people strolling along the Atlantic Ocean coastline are overwhelmed with the same feeling. They’ve gathered to witness a once-in-a-lifetime scene as a beauty crawls out of the frigid ocean water onto the warm sand. Given the attention, you’d assume the passers-by may have stumbled onto a real, live mermaid. This event, however, was far more memorable — a Carmen Electra photo shoot.” That writer certainly sounds like a people person to me. And how about this, from the ostensibly Venusian magazine: “Hardware detailing is really big this season, and the buckles make these jeans a little edgy and rock and roll.” Kind of thingy, wouldn’t you say?
Tannen suggests that children’s conversational styles begin to differ almost as soon as children begin to socialize, and linguistic differences may go back even earlier. When my daughter was an infant, my wife kept a detailed scrapbook recording her development and proudly noted that by 22 months, for instance, she had already mastered most of the subordinating conjunctions: “when,” “if,” “because” and even “unless.” When our son came along, three years later, my wife was alarmed to discover that he had little interest in conjunctions, other than “and,” but had instead amassed a formidable inventory of nouns, starting with “lawn mower.” Both children, thank goodness, are now happy and well adjusted, but had we known enough at the time, we probably could have turned them into test cases. She, presumably, was a Venusian, interested in relationships; he was a Martian, collecting information.
But what planet are those Israeli stylometricians from, spending so much effort trying to prove something that they could have learned from looking at bylines and author photographs? It would be surprising if our prose did not reveal something about who we are (something more interesting, in fact, than our sexes), and the place to look is precisely at those “topic specific” references that the programmers have so scientifically ignored. You like waltzing; I like A.T.V.’s. Once we get that established, then maybe we can start to communicate.
(1) Author McGrath is correct in his lukewarm assessment of the Israeli scientists’ research. The “stylometricians” say they can identify, with 80% accuracy, the gender of the author. If pure guesswork produces 50% accuracy, how much of an improvement have they really achieved?
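To put question (1) in numbers: with only two possible labels, guessing is right about half the time, but only if the test set holds equal numbers of texts by men and by women. That balance is an assumption here; the Globe report as described does not say. The sketch below, with hypothetical function names, expresses 80 percent accuracy as a gain over that chance baseline, using the article’s approximate figure of 600 documents purely for illustration.

```python
def gain_over_chance(accuracy: float, chance: float = 0.5) -> float:
    """Improvement over a chance baseline, in percentage points.

    chance=0.5 assumes a balanced two-way task (an assumption, not a reported figure).
    """
    return round(100 * (accuracy - chance), 6)


def expected_correct(n_docs: int, rate: float) -> int:
    """Number of documents labeled correctly at a given success rate."""
    return round(n_docs * rate)


if __name__ == "__main__":
    n = 600  # "some 600" documents, per the Globe report; approximate
    print("by guessing:", expected_correct(n, 0.5))
    print("by the algorithm:", expected_correct(n, 0.8))
    print("gain:", gain_over_chance(0.8), "percentage points")
```

On these assumptions the algorithm labels about 480 documents correctly where a coin toss would manage about 300, a gain of 30 percentage points, and it still mislabels roughly 120.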
(2) Another reason the Israeli study is underwhelming: the words that are supposedly more frequent in the writing of one gender or the other are themselves determined by content, that is, by what the writer is writing about. McGrath notes that “sophisticated language analysts… ignore the obvious stuff [i.e., the content words] and concentrate instead on the seemingly unobtrusive little tics that the writer and reader barely notice.” He’s right: this is indeed one of the fundamental methods of forensic linguistic analysis.
So the stylometricians ignored content words, but they focused on words whose frequency is itself content-determined. Not much of an improvement, methodology-wise. As McGrath’s counterexamples suggest, a personal essay by a man would very likely be full of personal pronouns, whereas a piece of descriptive writing by a woman would contain few of them.
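(3) The study introduces further confusion by comparing apples and oranges (or, as the British like to say, chalk and cheese): in a linguistic sense, the personal pronouns don’t contrast with the “so-called determiners.” In other words, a writer never uses one as opposed to the other; nobody writes “the” where “I” would have done. Forensic authorship analysis rests squarely on contrastive choice, where a writer picks between alternatives that say the same thing, such as “toward” versus “towards,” or “the car that I bought” versus “the car which I bought.” No contrast, no conclusion.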
(4) I have the highest respect for the work of linguist Deborah Tannen, but her conclusions on children’s conversational styles are irrelevant to any assessment of the validity of the Israeli study, which examined printed works.
(5) Don Foster may be a “self-styled forensic linguist,” but he is not a forensic linguist. He draws all kinds of indirect literary parallels on the basis of inferred puns and allusions, subconscious references and associations, and other matters that real linguists do not deal with. He psychologizes about his subjects.
Also, as Professor Gerald McMenamin has pointed out (in Forensic Linguistics: Advances in Forensic Stylistics), Foster becomes fascinated with his own media glory; he changes his conclusions according to circumstances (instead of gathering data to test a hypothesis); he gets the linguistics wrong; and, worst of all, he takes credit for inventing a field that was already hundreds of years old and had been used in many legal cases. In reality, Foster is a classic case of being in the right place at the right time: his attribution of the poem “A Funeral Elegy” to Shakespeare was in the news just when a lot of people were wondering who had written Primary Colors.
Forensic linguistics, correctly practiced, is part art and part science. As Roger Shuy (the academic “dean” of forensic linguistics) has pointed out, it is good linguistics practiced within a legal context. What I report to my clients is not literary or abstruse. It involves specific linguistic data and my impartial evaluation of them.
(6) If anybody knows of linguistic features that reliably (i.e., independently of style, context, and content) differentiate male from female language, please e-mail me at firstname.lastname@example.org and enlighten me.