Canada's Digital Collections

PROJECTS FOR LIBRARIES AND ARCHIVES
                                       

Introduction
The capabilities of the Internet to provide multimedia, interactivity, and rapid transfer of data suggest an incredible potential for libraries and archives: ready access to resources by a wide range of users, resource sharing, and preservation of collections all become possible through digital archives and WWW publishing.

Industry Canada is helping organizations realize this potential through Canada’s Digital Collections (CDC) program.

The program is designed to help young Canadians develop both entrepreneurial and technological skills. It focuses on youth who have left school, students seeking employment to fund their education, and recent graduates seeking work experience.


What is a Canada’s Digital Collections Project?
Digital conversion or digitizing can be described as the process of creating an electronic version of a physical item. The item can be a book, report, letter, index, manuscript, photograph, map, etc.

The process of creating the electronic version can be broken down into five stages:

  1. Identification—identifying a collection or resource to be digitized
  2. Digitization—the existing resource is converted to electronic format
  3. Processing—the electronic document is "cleaned up"
  4. Preservation—the electronic resource is stored on a disk or hard drive
  5. Display—the new electronic resource is set up for display and/or retrieval

Digital content can be created using different methods:

  • an existing resource can be converted to electronic format
  • an existing electronic resource can be manipulated or expanded

A digital project can consist of either one or a combination of these methods.

The Canada’s Digital Collections program is designed to train young Canadians as they transfer existing collections into digital form to be made available on the Internet. Industry Canada contracts with project managers (e.g., Canadian libraries, archives, and museums) to manage teams of youth in the production of these digital collections.


Identifying the Content for a CDC Project
Do you have a collection?
The first consideration when identifying content for a CDC project is determining whether a collection is available for Internet publication. Manuscripts, print material, photographs, physical objects, or a combination of these may be digitized to create a collection portraying Canadian heritage or stories of academic, scientific or business interest.

Keep in mind that, even if the original collection is housed in several locations, the nature of the Internet makes the gathering together of this information into a single source uniquely possible. Industry Canada encourages partnering among organizations to create digital collections.

Copyright
You can never assume copyright clearance. Once you have identified a collection for digitization, take the necessary steps to ensure that you have clearance before you submit your proposal. This can be a thorny issue, particularly if you are planning to use the Internet to provide access to your resource. Many project teams have to spend a portion of their time getting copyright clearance.
What comprises the collection?
A digital collection may comprise a combination of materials:
  • text
  • images
  • photographs of physical objects
  • sounds
  • database or index
  • video images*

* Though proposals including video images will be considered, CDC does not recommend using video images, which tend to create large files and may present difficulties of transmission over the Internet.

Keep in mind when submitting a proposal that any of these elements may contribute to a collection, but the materials should be organized in such a way that they tell a story of local, regional, or national significance.

Is the collection suitable for a digitization project?
Preservation of materials and providing access to Canadian information are both key elements of CDC projects. The following criteria indicate the types of resources particularly viable for digitization:
  • the collection is in fragile condition
  • the collection is valuable or rare
  • the collection is an in-demand special resource not readily available elsewhere
  • the collection reflects some area of expertise or specialty at your institution or in your region
  • there are restricted hours of access to the collection
  • restricted hours affect users who cannot easily get to the library or archives
  • a large population of users cannot get to the library or archives to make use of the collection

Technical Considerations
Hardware and Software
The two primary technical concerns are hardware and software. These are some of the questions you can ask yourself as you consider your existing configuration.
  • What hardware and software are currently available for the project?
  • Will the existing configuration handle the material to be digitized?
  • If not, what additional hardware or software is needed?
      • additional computers
      • more disk space
      • more memory
      • additional peripherals (e.g., scanners, modems, mic/recording equipment, sound cards and speakers)
      • additional software (e.g., scanning software, OCR software, paint programs, HTML editing software)
  • Is an Internet connection necessary?
  • Is a LAN (Local Area Network) required?

The approach you take to digitizing collection content will largely determine the hardware and software required for the project:

  • manual keying of data—requires minimal hardware and software, but this process is very slow, time consuming, labour intensive, and costly. Manual keying of data will not suffice if you have images or sounds you want to digitize.
  • document imaging—a scanner creates a digital image of a document, and stores the image as a graphic. This process works well for both images and text, but text cannot be edited or searched once it is stored as an image.
  • optical character recognition (OCR)—a scanner creates a digital image of a document and special software converts each element on the page to digital text. The advantage of OCR is that text can later be indexed and searched.
  • sounds—using a microphone or direct audio input and special software, you can record sounds and encode them as wav, RealAudio or another sound file format.
Hardware
Computers

Digitizing can be done on both Macintosh and PC systems. While faster, newer machines will perform better and make your project more productive, it is possible to use 486 computers effectively. Scanner software will work with a 386 processor and 4 MB RAM, but this is not recommended for projects requiring a high volume of scanning. A Pentium 75 MHz PC or equivalent Macintosh is highly recommended for scanning.

HTML work can be accomplished on a 386 or higher PC or equivalent Mac. If you choose to use a 386 for HTML work, choose your editing program with some consideration. Some HTML editors require higher-end systems in order to function efficiently.

RAM

8 MB RAM is the absolute minimum for scanning software to function. However, image manipulation, in general, and OCR, in particular, require a lot of memory: ideally, the machine used for scanning and OCR would have a minimum of 16 MB RAM (32 MB recommended).

Hard disk

Scanner packages recommend anywhere from 6–20 MB available disk space; additionally, images and files will need to be stored and backed up locally prior to uploading to the server. Make sure all computers have ample free disk space.

A 1.0 GB hard drive is recommended for machines used for scanning.
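
To get a rough sense of how quickly scans fill a disk, the size of an uncompressed image can be estimated from its dimensions, resolution, and colour depth. The sketch below is illustrative only: the page sizes, the 300 dpi resolution, the colour depths, and the function names are assumptions chosen for the example, not CDC requirements, and files saved in compressed formats will be smaller.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size in bytes of one uncompressed scan (file headers ignored)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def scans_that_fit(free_bytes, scan_bytes):
    """Number of whole scans of a given size that fit in the free space."""
    return free_bytes // scan_bytes


# Assumed examples: a letter-size text page in 1-bit black and white,
# and a 4 x 6 inch photograph in 24-bit colour, both scanned at 300 dpi.
page = uncompressed_scan_bytes(8.5, 11, 300, 1)
photo = uncompressed_scan_bytes(4, 6, 300, 24)

print(page)    # 1051875 bytes, about 1 MB
print(photo)   # 6480000 bytes, about 6.5 MB

# Treating 1.0 GB as 1,000,000,000 bytes (an assumption), an empty drive
# would hold this many uncompressed colour photographs:
print(scans_that_fit(1_000_000_000, photo))  # 154
```

Working copies, backups, and the software itself share the same disk, so the space actually available for scans will be lower than this estimate.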

Modems

A 14.4 kbps modem is the minimum requirement; 28.8 kbps is recommended for transferring the large number of files your project will generate to the server. If you do not have a LAN, it is advantageous to have more than one modem to ensure that the process of uploading files is efficient, and so that all students have the opportunity to familiarize themselves with the Internet and Internet software.

Scanners

Although you can get by with older computers, modems, etc., keep in mind that the quality of your scanner will determine the quality of images you produce. If a significant portion of your collection is images, get a good scanner. A quality flatbed scanner is crucial for OCR and for scanning images. Hand-held scanners are difficult to master and may require several passes of larger documents to create a single image. Make sure that the scanner is supported by the OCR software you select.

Your scanner will require a device driver (usually included in the scanner software), and a SCSI card and cable. Make sure the SCSI card is appropriate for your scanner.

Networks

Although you can certainly run a digitizing project on stand-alone computers and without a local area network, it is far more convenient to have a network. OCR and imaging can produce large files that are not easily moved from machine to machine. In the case where you have one scanner and you need to move the files to other machines to manipulate and work on them, a local area network is very helpful.

Sound Cards, Microphones and Speakers

If your project includes digitization of sounds, you may need to upgrade your existing system with a multimedia package.

Software
WWW Browsers

Your team will need WWW browsers, such as Netscape and Internet Explorer, both to familiarize themselves with websites and archival resources on the WWW, and to preview their own documents. It is highly recommended that completed pages also be previewed on a text-only browser, such as Lynx, to ensure ease of use and navigability for users who do not have access to graphical browsers. Every machine being used for development should have browser software installed.
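
One thing a text-only preview tends to reveal is images with no alternative text. As a complement to previewing in Lynx (not a replacement for it), a small script can list such images before pages are uploaded. This is a minimal sketch using Python's standard `html.parser` module; the class name, function name, and sample page are hypothetical, and the check flags empty `alt` values as well as missing ones.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> tag whose alt text is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "img":
            attr_map = dict(attrs)
            if not (attr_map.get("alt") or "").strip():
                self.missing.append(attr_map.get("src", "(no src)"))


def images_missing_alt(html_text):
    """Return the src values of images that lack alternative text."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


# Hypothetical page fragment for illustration.
sample = '<BODY><IMG SRC="map.gif"><IMG SRC="logo.gif" ALT="CDC logo"></BODY>'
print(images_missing_alt(sample))  # ['map.gif']
```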

FTP (File Transfer Protocol)

FTP software such as WS_FTP (Windows) or Fetch (Macintosh) should be installed on every machine that has a modem.

E-mail

Team members will communicate with CDC staff and technical support people largely via e-mail. E-mail will also be essential if members of your team are working in different physical locations, or if some members are working independently. There are many acceptable e-mail programs available.

HTML Editor

There are many programs available for creating HTML documents (many can be downloaded as freeware or shareware). The program you choose depends on your needs. Netscape Navigator Gold comes packaged with a WYSIWYG (What You See Is What You Get) HTML editor. This package can be useful, but keep in mind that a sound understanding of HTML is a valuable skill even with this type of editor. (The Navigator Gold editor works best with more than 8 MB of RAM.) Other good WYSIWYG software includes Claris Home Page and Adobe PageMill. A reasonably priced HTML editor is HotDog Pro, a software package that can be downloaded from the Internet.

Scanner software

Make sure the scanner software you purchase is suitable for your scanner. Usually, a scanner is shipped with appropriate software.

Drawing or Paint Programs

While there are many packages for image manipulation, many teams use Paint Shop Pro, a low-cost shareware program that can save image files in formats suitable for the Internet. Adobe Photoshop is an excellent software package for image manipulation and runs on either Macintosh or IBM-compatible computers. Another good paint program is Corel Photo-Paint. CorelDraw 6 and Adobe Illustrator are both excellent vector editors.

OCR (Optical Character Recognition)

There is a wide range of OCR software available. Two popular packages are TextBridge Pro and OmniPage Pro. The Pro versions of these packages often have more features than the "lite" versions. Particularly useful in the Pro versions is a "batch" mode that allows the convenience of scanning a series of pages and performing the OCR in a batch at a later time.

Sound

For digitizing sound, the desired quality will be critical in determining the package you select. Often, the audio card in your computer will come with some basic software for capturing a phrase or two. Sound Forge (for Windows) is one package that some audio archiving projects have used.

Internet Connection
You will need to determine in advance if a dedicated Internet connection is required for the project or whether a dial-up account with a modem will suffice.

For most projects, a dial-up account with a modem will work. Keep in mind that it will take a number of hours to move your files to the server, especially if you are creating a large number of image files.
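
To judge whether a dial-up account will suffice, it can help to estimate the upload time for the whole collection at the modem speeds discussed under Modems. The sketch below is a rough calculation only: the 100 MB collection size and the 0.8 efficiency factor (an allowance for protocol overhead and line conditions) are assumptions for illustration, and real transfers may be slower.

```python
def upload_hours(total_bytes, modem_bps, efficiency=0.8):
    """Estimated hours to upload total_bytes over a modem.

    modem_bps is the rated speed in bits per second; efficiency is an
    assumed fraction of that speed available for file data.
    """
    bytes_per_second = modem_bps * efficiency / 8
    return total_bytes / bytes_per_second / 3600


collection_bytes = 100_000_000  # assumed 100 MB of pages and images

print(f"{upload_hours(collection_bytes, 28_800):.1f} hours")  # 9.6 hours at 28.8 kbit/s
print(f"{upload_hours(collection_bytes, 14_400):.1f} hours")  # 19.3 hours at 14.4 kbit/s
```

If the estimate runs to many working days, uploading in stages throughout the project, or arranging a faster connection, may be worth considering.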

Technical Support
Once you have determined your hardware and software requirements, it is equally important to consider technical support. Do you have in-house technical expertise available for the project duration, or will you require external consultants for training and troubleshooting?

If your staff has little or no experience with digital projects and a very basic knowledge of the Internet, you will need expertise to train and guide your project team. Your staff will have expertise in terms of your collection and subject matter, and pairing up with a local multimedia consulting firm is a good solution for providing the technical expertise required. Additionally, hiring team members with computer experience will minimize the amount of support and training needed.


Project Management and Team Creation
General Issues
Taking on a digital project can seem overwhelming, particularly if you are not a technological expert. However, you can organize your project and team to ensure that you do not have to be one.

Small libraries and archives, in particular, should plan so that the project does not take over the entire staff’s working hours. Careful planning can keep your commitment to between one-half and a full day per week.

The role of the library or the archive is primarily as custodian of the content. The most important role you can play is to ensure that the team of young people understands the content and the tasks to be completed.

A number of team structures will work effectively. Your team may work as peers, or you may choose to designate a project manager. Regardless of whether you designate a project manager, it is vital that one person document and keep track of the materials, status of the project, filenames, etc. It is recommended that you consider designating a project manager, as this simple step at the outset of your project can ensure careful monitoring of the work, time lines and quality control through its duration.

Team Roles
Although every project will require different skill sets and roles, the following are examples of the roles and contributions various members might make to a digital project team:
Roles of Host Organization
  • Provide physical space
  • Provide day-to-day project supervision: ensure the team is present and productive
  • Provide collection to be digitized
  • Provide expertise on content
  • Provide extracted database records, or train students to extract database records as required
  • Provide computers and scanner
  • Assist in purchasing software
  • Review site, proofread, and sign off content
Roles of Consulting Multimedia Firm
  • Consult on overall project design
  • Recommend work flow
  • Establish benchmarks for digitizing
  • Provide core training in basic skills: Internet basics, basics of design, scanning, image manipulation, OCR, HTML, FTP
  • Supervise and guide programmers on parsing text; creating database design; loading data to a database; writing the Web-to-database queries
  • Consult on design of site; you may wish to hire the firm to provide professional graphic design of home page elements, such as backgrounds, buttons, additional images, and logos
  • Review site and sign off technical work
Roles of CDC Project Team
Project Manager
  • Organize the team and supervise members
  • Prepare reports
  • Monitor time lines and quality control
  • Communicate with other partners
  • Ensure that content is available, organized, and that copyright is upheld
  • Ensure overall site development
Programmer
  • Help set up hardware and software
  • Serve as resource to the team
  • May take the product produced by other team members and create an on-line database, develop indexes or search tools as needed
Multimedia Assistant(s)
  • Work with Project Manager on content and digitization
  • Participate in site development
  • Perform scanning and OCR
  • Write HTML
  • Assist in developing databases, indexes, or search tools as needed
  • Upload the collection files to server
Hiring "At Risk" Youth
A number of projects have been successful at recruiting and hiring "at risk" youth. These include youth who have left school early and are having difficulty entering the job market, and those who are under-achieving and experiencing difficulties at school. To recruit these youth, talk to teachers at schools where there are "at risk" students and to agencies that work with youth. This population is unlikely to see an advertisement at an employment centre, or to respond to one. Working with youth who are at risk can be tremendously rewarding, but allow more time for project completion. Teachers and youth workers may be able to advise on how to ensure that the youth have a positive experience and that the project work gets done.

Preparing for the CDC Project
Time Lines
Well before the project is underway, you will need to develop a time line for completion of the work. Set aside ample time at the project start for purchasing hardware and software, setting up computers and scanners, and installing software. Allow youth time for familiarizing themselves with your content as well as the WWW. Training and storyboarding will also take up significant time, as will administrative tasks. Make sure to leave at least two weeks at the end of the project for final revisions, cleanup and refinement. Proofing all the web pages once they are uploaded is important, and once all the materials are on the website, you will begin to see ways to improve and enhance the site and improve its usability.

A very general guideline for inexperienced staff digitizing and creating WWW pages is that approximately 2 pages of text can be designed, OCR’d, proofed, converted to HTML, and uploaded per worker per day (this estimate includes training).

Another good rule of thumb is that an average of 5 images per worker per hour can be digitized and enhanced. However, keep in mind that image maps, complex design and layout, complex indexes, databases, etc., will reduce the number of items you can expect to scan and the overall size of the site you can expect by the project completion date.
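
The two rules of thumb above can be combined into a rough production estimate. This is a minimal sketch, not a planning tool: the function name, the 7-hour working day, and the sample collection size are assumptions, and the result covers digitizing work only. Setup, storyboarding, administration, and the final two weeks of revisions should be added separately.

```python
import math

PAGES_PER_WORKER_DAY = 2     # text pages, per the guideline above (includes training)
IMAGES_PER_WORKER_HOUR = 5   # images digitized and enhanced, per the guideline above


def estimate_working_days(text_pages, images, workers, hours_per_day=7):
    """Rough number of working days for a team to digitize a collection.

    hours_per_day is an assumption; adjust it to your team's schedule.
    """
    text_worker_days = text_pages / PAGES_PER_WORKER_DAY
    image_worker_days = images / (IMAGES_PER_WORKER_HOUR * hours_per_day)
    return math.ceil((text_worker_days + image_worker_days) / workers)


# Hypothetical collection: 120 pages of text and 700 images, team of 4.
print(estimate_working_days(120, 700, workers=4))  # 20
```

Image maps, complex layouts, indexes, and databases will push the real figure higher, so treat the output as a lower bound.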

Training the Project Team
One of the goals of the CDC program is that all team members acquire new multimedia skills. While the required skills will vary from project to project, a typical team requires the following:
  • Introduction to and overview of creating digital content
  • Training on basic Internet skills (e-mail, ftp, World Wide Web)
  • Training in basic computer skills (using Windows or Mac), if required
  • Storyboarding or content organization
  • Scanning software and hardware
  • Image manipulation
  • OCR (Optical Character Recognition) software
  • HTML (HyperText Markup Language) and HTML editing software

If your project includes a database, team members may require specialized training, such as using database programs or learning Perl and mSQL.

Your Storyboard
One of the CDC program’s mandates is to create Canadian stories of local, regional, or national significance. It is easy to forget, when you are faced with the complicated process of creating an on-line database, that your website should ultimately tell a story.

In the rush to get a project up and moving, it is easy to overlook the big picture. What is the story or theme of your site? What are you trying to convey to the people who visit the site? By creating a storyboard, you will be able to describe clearly to the project team what their tasks are. This storyboard will guide them in all levels of the project, from graphic design, to page layout, to determining searchable fields in a database. The story you tell will determine the overall organization of your website.

Most libraries and archives have specialized collections of photographs, manuscripts, monographs, articles and other materials. It is tempting to believe the collection is of such importance that a project to sequentially scan an entire set of materials and put them on the Web will be useful. However, the intent of the CDC program is to put up materials about Canada that are of interest to the general public. It is imperative to provide context, an overview, and some background materials to demonstrate why these materials are of interest and how they might be used.

One important factor in determining your storyboard is whether you have the resources to keep adding to the site once the CDC project is completed. If so, you need to plan and organize the site to allow for maintenance. If you are not planning to continue adding to the site, take care in creating links to other websites as these will change frequently.

The CDC Design Guidelines provide a useful reference for the storyboarding and design of your website. The purpose of these guidelines is to define the site structure, site look & feel, page content, credit and graphics requirements for all CDC projects.


Conclusion
Not only do schools, communities, and young people benefit from CDC projects, but libraries and archives do as well. Being involved in a CDC project provides the ability to make our uniquely Canadian collections more accessible to a wider audience than ever before. In addition, most digital projects have the added advantage of gathering materials that may not have been previously related or easily searchable.

Libraries and archives also benefit from the acquisition of new technological and managerial skills; the forging of new partnerships at local, regional, and national levels; and the opportunity to be creative and innovative in a time of severe fiscal restraint. Most important, the enhancements to our services and our collections afforded by Canada’s Digital Collections program ultimately can only benefit our users.


by Darlene Fichter
January 1997

Updated November 24, 2000
