DATA STANDARDS

After user requirements, the most important task of the Working Group is the establishment of data standards. The preparation of data is typically the most expensive individual component of an information system. In addition, to obtain the greatest efficiencies and most cost-effective benefits of open architecture systems, it is essential that data exist in standard formats that support the full range of searching, displaying, and printing requirements the new system will have to meet. Standard data formats are also needed for timely data transfer and exchange, to make it easier to migrate older data to the new technologies that will inevitably emerge, and to help ensure access to historical data. Finally, openly established standards would facilitate the integration of legislative information, such as bills and committee reports, with data available from commercial sources. It would be productive to seek input from private companies that use and market legislative data produced by Congress. See discussion below under DATA SOURCES: COMMERCIAL AND NON-GOVERNMENT.
Various official as well as de facto standards are acceptable. The standard beginning to emerge as the most promising within the Legislative Branch is the Standard Generalized Markup Language (SGML), which is both an official U.S. and international standard. GPO is currently playing an important role in establishing SGML as a usable standard for both Legislative and Executive Branch data. Although SGML offers the most flexibility and capability for meeting the requirements of a sophisticated information system, it is also among the most demanding and possibly the most expensive to implement. New tools are being introduced regularly in the technology marketplace that promise, over time, to reduce the cost of SGML and make it easier to use, yet it is important not to underestimate the cost of this standard.
Both the Clerk of the House and the Senate Computer Center have undertaken projects to evaluate and possibly implement SGML. The Congressional Research Service is also evaluating SGML for its reports for Congress, and the GAO has long used a version of SGML to produce its reports.
An early task of the Working Group should be to establish a subteam of representatives from various Legislative Branch organizations, and possibly commercial users of this data (see discussion below under the heading DATA SOURCES: COMMERCIAL AND NON-GOVERNMENT), to analyze and make recommendations regarding data format standards. If the cost-effective implementation of such recommendations requires changes in legislative procedures, the Working Group will need to be advised as early as possible. Also, because it will take time to establish and to implement these recommendations, the subteam will need to make recommendations for interim standards.4 Finally, the group will need to establish a process for modifying standards on a regular basis as new data and new retrieval requirements emerge.
One of the challenges for the Working Group will be to find cost-effective procedures for coding data according to the standard without placing an excessive burden on any one organization. One of the options made possible by a standard such as SGML is that codes could be established for a variety of data elements, such as bill number, speaker name, and subject, and for a wide range of functions, such as printing, data display, and retrieval. The resulting groups of codes, sometimes referred to as a "common tag set", could be used by all Legislative Branch agencies for tagging the data for the functions with which they are most concerned. Agencies might then collaborate on tagging some of the data, and thereby share the cost and eliminate duplication in this expensive process. It will still be necessary, however, for one organization to be responsible for the technical maintenance of a shared tag set and its associated formats and standards.
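To make the idea of a common tag set more concrete, the following is a minimal sketch in Python, offered only as an illustration and not as a proposed design. The tag names (billno, speaker, subject), the mapping of tags to functions, and the sample record are all hypothetical and invented for this example; a real tag set would be defined by the subteam. The sketch also relies on Python's general-purpose HTMLParser to read the angle-bracket markup, which is a simplification and not a full SGML parser.

```python
from html.parser import HTMLParser

# Hypothetical common tag set: each tag name maps to the functions
# (printing, display, retrieval) assumed to depend on it.
COMMON_TAG_SET = {
    "billno": {"printing", "display", "retrieval"},
    "speaker": {"display", "retrieval"},
    "subject": {"retrieval"},
}


class TagCollector(HTMLParser):
    """Collect the text of elements whose tag belongs to the common tag set."""

    def __init__(self):
        super().__init__()
        self.current = None  # shared tag currently open, if any
        self.found = []      # list of (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in COMMON_TAG_SET:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.append((self.current, data.strip()))


def elements_for(function, text):
    """Return the tagged elements that a given function would use."""
    parser = TagCollector()
    parser.feed(text)
    parser.close()
    return [(tag, value) for tag, value in parser.found
            if function in COMMON_TAG_SET[tag]]


# Made-up sample record; <note> is not part of the shared tag set.
SAMPLE = (
    "<record><billno>H.R. 1</billno><speaker>Mr. Smith</speaker>"
    "<subject>Appropriations</subject><note>local use only</note></record>"
)

if __name__ == "__main__":
    print(elements_for("printing", SAMPLE))
    print(elements_for("retrieval", SAMPLE))
```

In this sketch, an agency concerned only with printing would read the single element it needs, while a retrieval system would read all three, yet both work from the same tagged record. Elements outside the shared set, such as the hypothetical note element, are simply ignored.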