
Re: Documentation Metrics



Sandy Harris wrote:
> 
> "David C. Merrill, Ph.D." wrote:
> 
> > I am working on the set of metrics to be used in reviewing our documents.
> 
> One thing I'd wonder about is whether any useful metrics can be
> generated automatically.
> 
> There's a whole literature on readability indexes based on statistical
> analysis of things like words per sentence and letters per word. Some of
> the key work was done by Lorinda Cherry and others on the Writers' Workbench
> project at Bell Labs.
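> 
> Even without the Workbench tools, the basic numbers are easy to get.
> Something like this (untested, off the top of my head; plain text on
> stdin) would be a first cut:
> 
>   #!/usr/bin/perl -w
>   # Rough first cut at basic text statistics: reads plain text on
>   # stdin, prints average words per sentence and letters per word.
>   use strict;
> 
>   my ($sentences, $words, $letters) = (0, 0, 0);
>   while (<STDIN>) {
>       my @ends = /[.!?]+/g;              # crude sentence count
>       $sentences += @ends;
>       for my $w (split) {
>           next unless $w =~ /[A-Za-z]/;  # skip bare punctuation, numbers
>           $words++;
>           my @l = $w =~ /[A-Za-z]/g;
>           $letters += @l;
>       }
>   }
>   printf "words/sentence: %.1f\n", $words / ($sentences || 1);
>   printf "letters/word:   %.1f\n", $letters / ($words || 1);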
> 
> There was a Reader's Workbench project at one point -- at the U of Utah,
> I think -- with an ex-Bell Labs person from the Programmer's Workbench
> (make and the ancestors of CVS) project involved. Does anyone know where
> that went? Is the software available somewhere? Did they publish papers?
> 
> There are other things one could measure.
> 
> Frequency of technical terms (a first cut at a definition: words not
> found in some general-purpose dictionary), or of such terms minus a
> standard list (Linux, ipchains, RFC, ...), or of terms neither on the
> list nor in the glossary (oops!).
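> 
> That check is pretty mechanical too. A rough, untested sketch follows;
> /usr/share/dict/words and whitelist.txt are just placeholder names for
> whatever dictionary and standard list we settle on:
> 
>   #!/usr/bin/perl -w
>   # First cut at "technical term" frequency: words in the document that
>   # appear in neither the system dictionary nor a local whitelist.
>   use strict;
> 
>   my %known;
>   for my $list ('/usr/share/dict/words', 'whitelist.txt') {
>       open my $fh, '<', $list or next;
>       while (<$fh>) { chomp; $known{lc $_} = 1 }
>       close $fh;
>   }
> 
>   my %unknown;
>   while (<STDIN>) {
>       for my $w (/([A-Za-z][A-Za-z-]*)/g) {
>           $unknown{lc $w}++ unless $known{lc $w};
>       }
>   }
>   for my $w (sort { $unknown{$b} <=> $unknown{$a} } keys %unknown) {
>       printf "%5d %s\n", $unknown{$w}, $w;
>   }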
> 
> Another variant would use not a standard English dictionary, but one
> of the dictionaries developed for use with non-native speakers.
> http://www.boeing.com/assocproducts/sechecker/se.html
> 
> Frequency of words that indicate rhetorical structure -- therefore,
> however, whereas, except, ... -- or of constructions that reference
> other parts of the text -- either pronouns such as 'it' or 'this', or
> non-specific nouns that refer back to more exact descriptions. In
> many contexts, phrases like 'the device' or 'the interrupt' function
> this way.
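> 
> Counting the connective words is the same sort of thing; in the sketch
> below (untested) the qw() list is only a starting point, not a real
> inventory:
> 
>   #!/usr/bin/perl -w
>   # Count words that signal rhetorical structure, per 1000 words.
>   use strict;
> 
>   my %connective = map { $_ => 1 }
>       qw(therefore however whereas except because although thus hence);
> 
>   my ($total, $hits) = (0, 0);
>   while (<STDIN>) {
>       for my $w (map { lc } /([A-Za-z]+)/g) {
>           $total++;
>           $hits++ if $connective{$w};
>       }
>   }
>   printf "%.1f connectives per 1000 words\n", 1000 * $hits / ($total || 1);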
> 
> Frequency of various whateverML tags, and their level of nesting.
> Nested lists inside a table structure under a level six heading?
> Methinks I see a problem. One H1 tag followed by 14 K of text with
> only two links in it? That's problematic too.
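> 
> A crude tag census is easy enough as well. The sketch below is
> regex-level only (untested), so it will miscount comments, unclosed and
> self-closing tags, but it gives per-tag counts and a maximum nesting
> depth:
> 
>   #!/usr/bin/perl -w
>   # Crude tag census for HTML/SGML-ish source: per-tag counts plus the
>   # deepest nesting seen.  A real parser would do better.
>   use strict;
> 
>   my (%count, @stack);
>   my $maxdepth = 0;
>   while (<STDIN>) {
>       while (m{<(/?)(\w+)[^>]*>}g) {
>           my ($close, $tag) = ($1, lc $2);
>           if ($close) {
>               pop @stack if @stack;
>           } else {
>               $count{$tag}++;
>               push @stack, $tag;
>               $maxdepth = @stack if @stack > $maxdepth;
>           }
>       }
>   }
>   print "max nesting depth: $maxdepth\n";
>   print "$count{$_}\t$_\n"
>       for sort { $count{$b} <=> $count{$a} } keys %count;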
> 
> Measuring such things precisely and figuring out all the implications
> is a big project. I'd guess there are half a dozen potential theses
> in it. On the other hand, an afternoon of Perl hacking might be enough
> to provide some interesting results.
> 
> My guess would be that at least some of the objectively, automatically
> measurable statistical properties of text would correlate with some
> of the judgements we make -- clear vs. confusing, basic vs. advanced,
> etc.
> 
> I'd love to have a tool that tells me that, compared to some sample
> that covers related docs (say, HowTos for administrators) and that
> users rate as well-written, my docs are measurably different in
> specific ways.
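> 
> The comparison step itself is simple once the numbers exist -- e.g.
> something like this (untested) for one metric against one sample of
> reference values, one value per line:
> 
>   #!/usr/bin/perl -w
>   # Toy comparison: reads one metric value per line for the reference
>   # sample (say, words/sentence for each well-rated HOWTO) and reports
>   # how far my document's value is from the sample mean, in standard
>   # deviations.
>   use strict;
> 
>   my $mine = shift;
>   die "usage: $0 my-value < sample-values\n" unless defined $mine;
>   my @v;
>   my $sum = 0;
>   while (<STDIN>) { chomp; $sum += $_; push @v, $_ }
>   die "no sample values\n" unless @v;
>   my $mean = $sum / @v;
>   my $var = 0;
>   $var += ($_ - $mean) ** 2 for @v;
>   my $sd = sqrt($var / @v);
>   printf "sample mean %.2f, sd %.2f; mine %.2f (%+.1f sd)\n",
>          $mean, $sd, $mine, $sd ? ($mine - $mean) / $sd : 0;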

This all sounds interesting, but I would rather start with a more
pragmatic approach for now. I wish I had the time to investigate this,
just to satisfy my curiosity.

Regards,

-- 
David C. Merrill, Ph.D.
Linux Documentation Project
Collection Editor & Coordinator
www.LinuxDoc.org

