Extenders Market Thriving

Continued from Page 68

... Systems Inc. projects its revenue growth for the year to double.

Once a little-known technology, DOS extenders have come to infiltrate some of the most popular PC software applications. Lotus Development Corp. tapped Rational Systems' 286 DOS extender for 1-2-3 3.0, and both Ashton-Tate and DataEase International Inc. turned to the Rational product for versions of their databases.

Even though support from commercial software developers is strong, many customers are eying Windows or OS/2 Presentation Manager for more long-term solutions to the DOS memory crunch.

Still, many of the corporate and vertical-market developers that make up 80 percent of Phar Lap's customer base are not in this camp, said Richard Smith, president of Phar Lap in Cambridge, Mass.

The majority of these vertical-market software developers do not have the time or experience to learn graphical-user-interface programming, Smith said.

"They know about finite element analysis or rendering, but often they're not systems-type or GUI-type programmers," Smith said.

What's more, users of these specialized applications often have no need for Windows or OS/2 because these extra layers tend to diminish performance, and application-switching capabilities are useless on a dedicated computer, users said.

Take, for example, Wasatch Computer Technology Inc., a developer of high-end graphics software, which has no plans to migrate its package (which includes the Phar Lap 386 DOS extender) over to Windows. Neither Windows nor OS/2 offers the 32-bit support that Wasatch requires to generate fast graphics, company officials said.

"We need 32-bit code so we can manage 16M-byte pieces of data with reasonable speed," said Mike Ware, president of Wasatch Computer in Salt Lake City. "Windows doesn't have what we want."

I/F Builder: New DDE Support

Continued from Page 68

... processors and graphics packages, Gardner claimed. This integration would come in handy, for example, in deriving the names and addresses for a direct-mail letter drafted and printed in a Windows word processor from a mainframe customer database, he explained.

The DDE support could also be harnessed to quickly generate and update a Windows spreadsheet with mainframe numerical data to create graphs and charts, he added.

"All of these functions can be integrated under the same interface so they appear to the user as one application," Gardner said.

One company turning to I/F Builder for these purposes is Information Sciences Inc. (InSci) in Montvale, N.J. InSci has used I/F Builder to create a PC-based Windows 3.0 front end to its mainframe Human Resource Management System, according to Laura Hills, InSci's vice president of product management. Called InSciVision, the Windows interface vastly reduced the number of function keys, codes and commands users need to know to navigate through the mainframe application.

"Instead of having to memorize a series of transaction codes, what users can do is click on an icon of a file folder that is labeled with the function they wish to perform," Hills said.

The interface, she said, speeds training, makes the system easier to use, and reduces keystroke errors and frustration.

"This allows users to rely more on their knowledge of human resources than their ability to remember codes and transactions," she said.

Due by the end of this month, I/F Builder 2.1 will be priced at $17,500. A run-time version, called I/F Manager, is sold separately for $395 per workstation.

The software can be used to create Windows 3.0 interfaces for a variety of host mainframe systems, including the IBM 3080, 3090, 4300 and 9370 running the MVS, VM/SP or DOS/VSE operating systems.

Viewpoint Systems can be reached at (415) 578-1591.

APPLICATION DEVELOPMENT: APPLIED INTELLIGENCE

Converting the Printed Word to Machine-Readable Text

[Photo: Martin]

This is the fourth article in a series on text management and its influence in the [...] environment.

Although managing information via computers will certainly give corporations a competitive edge throughout the 1990s, much of the data to be retrieved exists only in printed form. Before this information can be harnessed, it must be transformed into a machine-readable format.

There are two ways to convert printed matter into machine-readable form: by manually rekeying data via a word processor or by tapping new optical-scanning technology.

Optical-scanning equipment captures a printed page as a bit-mapped image, which is then converted into ASCII using optical character recognition (OCR) software. The software converts the bit-mapped page image into ASCII characters by matching the patterns of the page image against patterns stored in the software.

Storage of a bit-mapped image requires up to 1M bytes of memory, compared with a page of ASCII text, which requires less than 3,000 bytes.

Current OCR software can recognize a variety of typefaces and font sizes, handle typeset text and flag unrecognizable characters.

Although optical-scanning systems have advanced over the last few years, OCR software is not yet 100 percent accurate, and there may be conversion errors and characters that it can't recognize.

Despite these limitations, though, OCR software provides considerable benefits, primarily in the area of data-searching capabilities. Bit-mapped page images generated by optical scanners are not searchable based on text.

For example, consider the image of a page that discusses pricing in a purchasing-system reference manual. To allow users to search for any text that refers to pricing, a key word would have to be attached to the page image.

If the image was processed by OCR software, on the other hand, the resulting file could be accessed by word searches. This would let the user find the page by using a search request for any word on it.

[Figure: Machine-Readable Conversion Process, showing documents and optical character recognition. Illustration: John Avakian]

A key requirement is the ability to manage the structure of text. For example, a user might wish to use different fonts for different kinds of textual material or format a document differently for printing than for screen display.

A technology known as document-structure markup allows for this kind of flexibility. Markup is a scheme of tags that are interspersed throughout the document file. The tags convey information about the document's structure and appearance. Markup can indicate horizontal and vertical spacing, page breaks, lists, type fonts and point sizes.

Markup could also be used to mark separate sections within text for easy identification. In one scenario, a markup scheme for software reference manuals could indicate hardware implementations and version numbers, user interface sections and technical sections.

Markup can be employed for procedural purposes, such as describing how to format text on a page, or to describe what is being formatted. Using the latter approach, documents are not tied to a specific display medium such as the printed page.

For example, paragraphs might be marked with <PARA> at the beginning and a corresponding end tag at the end. When the document is printed, the style guide used might indicate that <PARA> means to skip a line and include no indentation. However, if the document is being displayed on a screen, a different style guide could be used to indicate that <PARA> means to skip no lines and indent five spaces.

The best-known markup language is Standard Generalized Markup Language (SGML). SGML tags are independent of any specific word-processing package, allowing for easy transfers between packages and the text collection.

Next week I will discuss how text is indexed and queried.

The concepts in this article are described in a new volume, Text Management, of The James Martin Report Series. For more information on this volume, call (800) 242-1240 or (617) 639-1958. For information on seminars, contact Technology Transfer Institute, 741 10th St., Santa Monica, Calif. 90402, (213) 394-8305 (in the United States and Canada). In Europe, contact Savant, 2 New St., Carnforth, Lancs., LA5 9BX United Kingdom, (0524) 734 505.
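The style-guide idea from the markup discussion above can be illustrated with a short sketch. It is only an illustration under stated assumptions, not how any particular SGML product works: the `</PARA>` end tag, the `ParaStyle` structure and the `render` function are hypothetical names introduced here. The two style guides encode the article's own example: skip a line with no indentation for print, skip no lines and indent five spaces for screen.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ParaStyle:
    """What one style guide says a <PARA> element means (hypothetical structure)."""
    blank_lines_before: int
    indent: int


# The two style guides from the article's example.
PRINT_STYLE = ParaStyle(blank_lines_before=1, indent=0)   # skip a line, no indentation
SCREEN_STYLE = ParaStyle(blank_lines_before=0, indent=5)  # skip no lines, indent five spaces

# Assumption: each paragraph is wrapped as <PARA>...</PARA>.
PARA_PATTERN = re.compile(r"<PARA>(.*?)</PARA>", re.DOTALL)


def render(document: str, style: ParaStyle) -> str:
    """Lay out every tagged paragraph according to the given style guide."""
    lines = []
    for match in PARA_PATTERN.finditer(document):
        text = " ".join(match.group(1).split())  # collapse internal whitespace
        lines.extend([""] * style.blank_lines_before)
        lines.append(" " * style.indent + text)
    return "\n".join(lines)


# One tagged source document, two presentations.
TAGGED = (
    "<PARA>Prices are listed by part number.</PARA>"
    "<PARA>Discounts apply to bulk orders.</PARA>"
)

print_output = render(TAGGED, PRINT_STYLE)
screen_output = render(TAGGED, SCREEN_STYLE)

print(print_output)
print(screen_output)
```

The tagged text never changes; only the style guide passed to `render` does, which is the sense in which descriptive markup keeps a document from being tied to one display medium.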