From lpsmail@access.digex.net Thu Nov 9 15:48:57 1995
Date: Thu, 9 Nov 1995 11:11:54 CST
From: Shipment
Reply to: Discussion of Government Document Issues
To: Multiple recipients of list GOVDOC-L
Subject: ADNOTES: COUNCIL: SCOUT FOR GOVT INFO IN THE WEB

----------------------------Original message----------------------------

THE MESSAGE BELOW IS FROM ADMINISTRATIVE NOTES, VOL. 16, #15 (Nov. 15, 1995).

SCOUT FOR GOVERNMENT INFORMATION ON THE WEB

Remarks by Raeann Dossett
Internet Specialist, Electronic Transition Staff
at the Fall 1995 Depository Library Council Meeting

Scout is the working name for a group of Internet tools that will present a web-based interface for searching Federal government information on the Internet. We aren't using this designation to introduce jargon into the project, but to keep this aspect of the larger Pathway Services project distinct from the other components Maggie delineated for you. I hope the term Scout is useful in that regard this morning.

If you are familiar with web tools such as Lycos or InfoSeek, you already have a good idea of what Scout is going to do. Those tools let you search the Internet, especially the World Wide Web, using keywords and various operators, such as Boolean operators. As a piece of software, Scout will look similar to these existing tools. Behind that familiar appearance, however, Scout will deal with quite a different set of information.

Instead of trying to index the entire Internet broadly, we will focus Scout specifically on government information on the Internet. In general, this means restricting its activities to sites in the .gov and .mil domains (e.g., www.access.gpo.gov, www.navy.mil). I say "in general" because we will make exceptions for official government sites housed at educational (.edu) or other sites outside the .gov and .mil domains.
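The scoping rule just described (crawl .gov and .mil hosts, plus a hand-maintained list of official sites elsewhere) amounts to a simple URL filter. The following is a minimal Python sketch of that rule under stated assumptions, not GPO's actual software; the function name `in_scope` and the exception hostname are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Domains crawled by default, per the remarks.
SCOPE_SUFFIXES = (".gov", ".mil")

# Hypothetical, manually maintained exceptions: official government sites
# hosted outside .gov/.mil. This hostname is a placeholder, not a real site.
EXCEPTION_HOSTS = {"agency-mirror.example.edu"}


def in_scope(url: str) -> bool:
    """Return True if the URL's host falls within the crawl scope."""
    # urlsplit(...).hostname is already lowercased; strip any trailing dot.
    host = (urlsplit(url).hostname or "").rstrip(".")
    if host in EXCEPTION_HOSTS:
        return True
    # Requiring the leading dot means a host like "examplegov" is not matched.
    return any(host.endswith(suffix) for suffix in SCOPE_SUFFIXES)


if __name__ == "__main__":
    for u in ("http://www.access.gpo.gov/", "http://www.navy.mil/",
              "http://www.lycos.com/", "http://agency-mirror.example.edu/docs"):
        print(u, in_scope(u))
```

Keeping the exceptions as an explicit list mirrors the remarks: the domain suffix handles the general case, and exceptions are added by hand.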
Here is a sketch of the software pieces that will make up Scout:

* A web crawler or a broker/gatherer. This is the tool that will go out on the World Wide Web at our direction, obtain information about the content of Internet sites, and bring that information back home. That content will be:
  -- documents or files: ASCII, SGML, GILS, HTML, PDF
  -- directory-level information for groups of documents or files available through various Internet protocols: FTP, gopher, WAIS, HTTP
* That information will be filtered through another piece of software that will discern the file types and create a fielded database, with fields such as:
  -- Titles
  -- URLs
  -- Keywords
  -- Originating agency
* A database search engine. It will perform Boolean searches and will also allow natural-language queries. You will be able to limit your search to a particular field, such as title, originating agency, or URL.
* Finally, because Scout is a web-based tool, it will provide an active link to each information resource: click on the URL and you go there.

Beyond restricting the scope of Scout to government information, and going into those domains as deeply as possible in search of actual content, we will be able to enhance the records in this database manually to make it more useful to depository libraries. We will be able to add fields, such as a title tracing field for products that have historically been supplied in paper format. We will also be able to augment existing fields, for instance by providing keywords in addition to those created automatically.

Currently, we have a prototype of Scout running at GPO, and we're evaluating additional software packages. Since Maggie and I are on a tight deadline (we have only 10 months left at GPO), we're proceeding on this project as fast as possible. We hope to put the system through a beta test within the next six weeks. You all will be able to see it as soon as we complete the requisition process for the software and work out the major functional kinks.
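The filtering and search pieces in the sketch of Scout's components (software that discerns file types and builds a fielded database, and a search engine that runs Boolean queries limited to one field) can be illustrated with a small in-memory model. This is a hedged Python sketch, not the software GPO evaluated. The names `Record`, `make_record`, and `search`, the extension table, and the sample records are all hypothetical. File types are guessed from the URL extension only, and natural-language queries are not modeled.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical extension-to-type table (an illustrative subset only).
FILETYPES = {".txt": "ASCII", ".sgml": "SGML", ".htm": "HTML",
             ".html": "HTML", ".pdf": "PDF"}


@dataclass
class Record:
    """One fielded record, using the fields named in the remarks."""
    title: str
    url: str
    agency: str  # originating agency
    filetype: str
    keywords: list = field(default_factory=list)


def make_record(title, url, agency, keywords=()):
    """Build a record, guessing the file type from the URL's extension."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return Record(title, url, agency, FILETYPES.get(suffix, "unknown"),
                  list(keywords))


def tokens(text):
    """Lowercase alphanumeric words, so URLs split on '/', '.', and so on."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(records, terms, field_name="title", mode="AND"):
    """Boolean AND/OR search limited to one field (title, agency, url...)."""
    wanted = [t.lower() for t in terms]
    hits = []
    for rec in records:
        value = getattr(rec, field_name)
        if isinstance(value, list):  # the keywords field
            value = " ".join(value)
        found = [t in tokens(value) for t in wanted]
        matched = all(found) if mode == "AND" else any(found)
        if matched:
            hits.append(rec)
    return hits


# Invented sample records; the URL paths are placeholders.
records = [
    make_record("Federal Budget Summary",
                "http://www.access.gpo.gov/example/budget.pdf",
                "Office of Management and Budget", ["budget", "fiscal"]),
    make_record("Navy Ship Fact Sheet",
                "http://www.navy.mil/example/facts.html",
                "Department of the Navy", ["ships", "aircraft"]),
]

if __name__ == "__main__":
    for rec in search(records, ["navy"], field_name="agency"):
        print(rec.title, rec.filetype)
```

Tokenizing every field the same way keeps field-limited searches uniform: a URL search for "gpo" works because the URL is split into words such as "access", "gpo", and "gov".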
I hope this gives Council more information about this aspect of the Pathway Project. We would welcome comments from Council, and from members of the audience, on both our plans for the system and its usefulness as it evolves over the next few months.