Introduction
The capabilities of the Internet to provide
multimedia, interactivity, and rapid transfer of data suggest an incredible potential for
libraries and archives: ready access to resources by a wide range of users, resource
sharing, and preservation of collections all become possible through digital archives and
WWW publishing. Industry Canada is helping
organizations realize this potential through Canada's Digital Collections (CDC)
program.
The program is designed to aid young Canadians in
developing both entrepreneurial and technological skills, focusing on youth who have
left school, on students seeking employment to fund their education, and on recent
graduates seeking work experience.

What is a Canada's Digital
Collections Project?
Digital conversion or digitizing can be
described as the process of creating an electronic version of a physical item. The item
can be a book, report, letter, index, manuscript, photograph, map, etc.
The process of creating the electronic version can be broken down
into five stages:
- Identification: identifying a collection or
resource to be digitized
- Digitization: the existing resource is converted
to electronic format
- Processing: the electronic document is
"cleaned up"
- Preservation: the electronic resource is stored
on a disk or hard drive
- Display: the new electronic resource is set up
for display and/or retrieval
Digital content can be created using different methods:
- an existing resource can be converted to electronic format
- an existing electronic resource can be manipulated or
expanded
A digital project can consist of either one or a
combination of these methods.
The Canada's Digital Collections program is designed
to train young Canadians as they transfer existing collections into digital form to be
made available on the Internet. Industry Canada contracts with project managers (e.g.,
Canadian libraries, archives, museums, etc.) to manage teams of youth in the production of
these digital collections.

Identifying the Content for a CDC
Project
Do you have a collection?
The first consideration when identifying
content for a CDC project is determining whether a collection is available for Internet
publication. Manuscripts, print material, photographs, physical objects, or a combination
of these may be digitized to create a collection portraying Canadian heritage or stories
of academic, scientific or business interest. Keep
in mind that, even if the original collection is housed in several locations, the nature
of the Internet makes the gathering together of this information into a single source
uniquely possible. Industry Canada encourages partnering among organizations to create
digital collections.

Copyright
You can never assume copyright clearance.
Once you have identified a collection for digitization, take the necessary steps to ensure
that you have clearance before you submit your proposal. This can be a thorny issue,
particularly if you are planning to use the Internet to provide access to your resource.
Many project teams have to spend a portion of their time getting copyright clearance.

What comprises the collection?
A digital collection may comprise a
combination of the following materials:
- text
- images
- photographs of physical objects
- sounds
- database or index
- video images*
* Though proposals including video images will be
considered, CDC does not recommend using video images, which tend to create large files
and may present difficulties of transmission over the Internet.
Keep in mind when submitting a proposal that any of these
elements may contribute to a collection, but the materials should be organized in such a
way that they tell a story of local, regional, or national significance.

Is the collection suitable for a digitization
project?
Preserving materials and providing
access to Canadian information are both key elements of CDC projects. The following
criteria indicate the types of resources particularly viable for digitization:
- the collection is in fragile condition
- the collection is valuable or rare
- the collection is an in-demand special resource not readily
available elsewhere
- the collection reflects some area of expertise or specialty
at your institution or in your region
- there are restricted hours of access to the collection
- restricted hours affect users who cannot easily get to the
library or archives
- a large population of users cannot get to the library or
archives to make use of the collection

Technical Considerations
Hardware and Software
The two primary technical concerns are
hardware and software. These are some of the questions you can ask yourself as you
consider your existing configuration.
- What hardware and software are currently available for the
project?
- Will the existing configuration handle the material to be
digitized?
- If not, what additional hardware or software is needed?
  - additional computers
  - more disk space
  - more memory
  - additional peripherals (e.g., scanners, modems,
    microphones and recording equipment, sound cards and speakers)
  - additional software (e.g., scanning software, OCR software,
    paint programs, HTML editing software)
- Is an Internet connection necessary?
- Is a LAN (Local Area Network) required?
The approach you take to digitizing collection content will largely determine the
hardware and software required for the project:
- manual keying of data: requires minimal hardware and software, but this
process is very slow, time consuming, labour intensive, and costly. Manual keying of data
will not suffice if you have images or sounds you want to digitize.
- document imaging: a scanner creates a digital image of a document, and stores
the image as a graphic. This process works well for both images and text, but text cannot
be edited or searched once it is stored as an image.
- optical character recognition (OCR): a scanner creates a digital image of a
document and special software converts each element on the page to digital text. The
advantage of OCR is that text can later be indexed and searched.
- sounds: using a microphone or direct audio input and special software, you
can record sounds and encode them as WAV, RealAudio or another sound file format.

Hardware
Computers
Digitizing can be done on both Macintosh and PC systems. While faster, newer
machines will perform better and make your project more productive, it is possible to use
486 computers effectively. Scanner software will work with a 386 processor and 4 MB RAM,
but this is not recommended for projects requiring a high volume of scanning. A
Pentium 75 MHz PC or equivalent Macintosh is highly recommended for scanning.
HTML work can be accomplished on a 386 or higher PC or
equivalent Mac. If you choose to use a 386 for HTML work, choose your editing program with
some consideration. Some HTML editors require higher-end systems in order to function
efficiently.
RAM
8 MB RAM is the absolute minimum for scanning software to function. However,
image manipulation, in general, and OCR, in particular, require a lot of memory: ideally,
the machine used for scanning and OCR would have a minimum of 16 MB RAM (32 MB
recommended).
Hard disk
Scanner packages recommend anywhere from 6 to 20 MB of available disk space;
additionally, images and files will need to be stored and backed up locally prior to
uploading to the server. Make sure all computers have ample free disk space.
A 1.0 GB hard drive is recommended for machines used for
scanning.
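To see why ample disk space matters, the uncompressed size of a scan can be estimated from its dimensions, resolution and colour depth. The sketch below is illustrative only: the function name, the letter-size page and the 300 dpi setting are assumptions rather than figures from this guide, and files saved in compressed formats will be smaller.

```python
def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed size of one scanned image, in bytes."""
    # Pixel count is (width in pixels) x (height in pixels).
    pixels = round(width_in * dpi) * round(height_in * dpi)
    # Each pixel takes bits_per_pixel bits; 8 bits make one byte.
    return pixels * bits_per_pixel // 8


# Hypothetical example: a letter-size page (8.5 x 11 inches) at 300 dpi.
colour = scan_size_bytes(8.5, 11, 300, 24)   # 24-bit colour
bitonal = scan_size_bytes(8.5, 11, 300, 1)   # 1-bit black and white

print(f"24-bit colour: {colour / 1_000_000:.1f} MB")
print(f"1-bit bitonal: {bitonal / 1_000_000:.1f} MB")
```

Running the estimate for a few representative items in your collection, before scanning begins, gives a rough idea of how quickly working drives and backups will fill.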
Modems
A 14.4 kbps modem is the minimum requirement; 28.8 kbps is recommended for
transferring to the server the large number of files your project will generate. If you do
not have a LAN, it is advantageous to have more than one modem to ensure that the process
of uploading files is efficient, and so that all students have the opportunity to
familiarize themselves with the Internet and Internet software.
Scanners
Although you can get by with older computers, modems, etc., keep in mind that the
quality of your scanner will determine the quality of images you produce. If a significant
portion of your collection is images, get a good scanner. A quality flatbed scanner is
crucial for OCR and for scanning images. Hand-held scanners are difficult to master and
may require several passes of larger documents to create a single image. Make sure that
the scanner is supported by the OCR software you select.
Your scanner will require a device driver (usually included
in the scanner software), and a SCSI card and cable. Make sure the SCSI card is
appropriate for your scanner.
Networks
Although you can certainly run a digitizing project on stand-alone computers and
without a local area network, it is far more convenient to have a network. OCR and imaging
can produce large files that are not easily moved from machine to machine. In the case
where you have one scanner and you need to move the files to other machines to manipulate
and work on them, a local area network is very helpful.
Sound Cards, Microphones and Speakers
If your project includes digitization of sounds, you may
need to upgrade your existing system with a multimedia package.

Software
WWW Browsers
Your team will need WWW browsers, such as Netscape and Internet
Explorer, both to familiarize themselves with websites and archival resources on the
WWW, and to preview their own documents. It is highly recommended that completed pages
also be previewed on a text-only browser, such as Lynx, to ensure ease of use and
navigability for users who do not have access to graphical browsers. Every machine being
used for development should have browser software installed.
FTP (File Transfer Protocol)
FTP software such as WS-FTP (Windows) or Fetch
(Macintosh) should be installed on every machine that has a modem.
E-mail
Team members will communicate with CDC staff and technical support people largely
via e-mail. E-mail will also be essential if members of your team are working in different
physical locations, or if some members are working independently. There are many
acceptable e-mail programs available.
HTML Editor
There are many programs available for creating HTML documents (many
can be downloaded as freeware or shareware). The program you choose depends on your needs.
Netscape Navigator Gold comes packaged with a WYSIWYG (What You See Is What You
Get) HTML editor. This package can be useful, but keep in mind that a sound understanding of
HTML is a valuable skill even for this type of editor. (The Navigator Gold editor works
best with more than 8 MB of RAM.) Other good WYSIWYG software includes Claris Home Page
and Adobe PageMill. A reasonably priced HTML editor is HotDog Pro, a
software package that can be downloaded from the Internet.
Scanner software
Make sure the scanner software you purchase is suitable for your
scanner. Usually, a scanner is shipped with appropriate software.
Drawing or Paint Programs
While there are many packages for image manipulation, many teams use
Paint Shop Pro, a low-cost shareware program that can save image files in formats
suitable for the Internet. Adobe Photoshop is an excellent software package for
image manipulation and runs on either Macintosh or IBM-compatible computers. Another good paint
program is Corel Photo-Paint. CorelDraw 6 and Adobe Illustrator are
both excellent vector editors.
OCR (Optical Character Recognition)
There is a wide range of OCR software available. Two
popular packages are TextBridge Pro and OmniPage Pro. The Pro versions
of these packages often have more features than the "lite" versions.
Particularly useful in the Pro versions is a "batch" mode that allows the
convenience of scanning a series of pages and performing the OCR in a batch at a later
time.
Sound
For digitizing sound, the desired quality will be critical in determining the
package you select. Often, the audio card with your computer will come with some basic
software for capturing a phrase or two. Sound Forge (for Windows) is one package
that some audio archiving projects have used.

Internet Connection
You will need to determine in advance if a
dedicated Internet connection is required for the project or whether a dial-up account
with a modem will suffice.
For most projects, a
dial-up account with a modem will work. Keep in mind that it will take a number of hours
to move your files to the server, especially if you are creating a large number of image
files.
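The upload time can be estimated before the project starts. The following is a minimal sketch, assuming modem speeds of 14,400 and 28,800 bits per second; the function name, the 100 MB collection size and the 80 percent efficiency factor (an allowance for protocol overhead and line noise) are illustrative assumptions, not measured values.

```python
def upload_hours(total_bytes, modem_bps, efficiency=0.8):
    """Estimate hours needed to upload total_bytes over a dial-up modem.

    modem_bps is the nominal line speed in bits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    bytes_per_second = modem_bps * efficiency / 8
    return total_bytes / bytes_per_second / 3600


# Hypothetical collection: 100 MB of image and HTML files.
collection_bytes = 100_000_000

for speed in (14_400, 28_800):
    hours = upload_hours(collection_bytes, speed)
    print(f"{speed} bps: about {hours:.1f} hours")
```

If the estimate runs to many hours, uploading in batches as pages are completed, rather than all at once at the end, keeps the telephone line from becoming a bottleneck.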

Technical Support
Once you have determined your hardware and
software requirements, it is equally important to consider technical support. Do you have
in-house technical expertise available for the project duration, or will you require
external consultants for training and troubleshooting?
If your staff has little or no experience with digital projects and only a very basic
knowledge of the Internet, you will need expertise to train and guide your project team.
Your staff will have expertise in your collection and subject matter, and pairing
up with a local multimedia consulting firm is a good way to provide the technical
expertise required. Additionally, hiring team members with computer experience will
minimize the amount of support and training needed.

Project Management and Team Creation
General Issues
Taking on a digital project can seem
overwhelming, particularly if you are not a technological expert. However, you can
organize your project and team to ensure that you do not have to be one.
Small libraries and archives, in particular, should plan so that the
project does not take over the entire staff's working hours. Careful planning can
keep your commitment to between one-half and a full day per week.
The role of the library or the archive is primarily as
custodian of the content. The most important role you can play is to ensure that the team
of young people understands the content and the tasks to be completed.
A number of team structures will work effectively. Your
team may work as peers, or you may choose to designate a project manager. Regardless of
whether you designate a project manager, it is vital that one person document and keep
track of the materials, status of the project, filenames, etc. It is recommended that you
consider designating a project manager, as this simple step at the outset of your project
can ensure careful monitoring of the work, time lines and quality control through its
duration.
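One way to keep track of materials, status and filenames is a simple log with one record per item, marked with the stage it has reached. The sketch below is only an illustration: the class names, fields and sample filenames are hypothetical, and a spreadsheet or paper log organized the same way would serve equally well.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """The five stages of creating an electronic version of an item."""
    IDENTIFICATION = 1
    DIGITIZATION = 2
    PROCESSING = 3
    PRESERVATION = 4
    DISPLAY = 5


@dataclass
class Item:
    """One item in the collection and the file it is stored in."""
    description: str
    filename: str
    stage: Stage = Stage.IDENTIFICATION

    def advance(self):
        """Move the item to the next stage; DISPLAY is the last stage."""
        if self.stage is not Stage.DISPLAY:
            self.stage = Stage(self.stage.value + 1)


def summarize(items):
    """Count how many items are at each stage."""
    counts = {stage.name: 0 for stage in Stage}
    for item in items:
        counts[item.stage.name] += 1
    return counts


# Hypothetical log entries.
log = [
    Item("Main street photograph", "photo001.gif"),
    Item("Town council minutes, page 1", "minutes01.htm"),
]
log[0].advance()  # the photograph has now been scanned
print(summarize(log))
```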

Team Roles
Although every project will require different
skill sets and roles, the following are examples of the roles and contributions
various members might make to a digital project team:

Roles of Host Organization
- Provide physical space
- Provide day-to-day project supervision: ensure the team is
present and productive
- Provide collection to be digitized
- Provide expertise on content
- Provide extracted database records, or train students to
extract database records as required
- Provide computers and scanner
- Assist in purchasing software
- Review site, proofread, and sign off content

Roles of Consulting Multimedia Firm
- Consult on overall project design
- Recommend work flow
- Establish benchmarks for digitizing
- Provide core training in basic skills: Internet basics, basics of
design, scanning, image manipulation, OCR, HTML, FTP
- Supervise and guide programmers on parsing text; creating
database design; loading data to a database; writing the Web-to-database queries
- Consult on design of site; you may wish to hire the firm to
provide professional graphic design of home page elements, such as backgrounds, buttons,
additional images, and logos
- Review site and sign off technical work

Roles of CDC Project Team
Project Manager
- Organize the team and supervise members
- Prepare reports
- Monitor time lines and quality control
- Communicate with other partners
- Ensure that content is available, organized, and that
copyright is upheld
- Ensure overall site development

Programmer
- Help set up hardware and software
- Serve as resource to the team
- May take the product produced by other team members and
create an on-line database, develop indexes or search tools as needed

Multimedia Assistant(s)
- Work with Project Manager on content and digitization
- Participate in site development
- Perform scanning and OCR
- Write HTML
- Assist in developing databases, indexes, or search tools as
needed
- Upload the collection files to server

Hiring "At Risk" Youth
A number of projects have been successful at
recruiting and hiring "at risk" youth. These include youth who have left school
early and are having difficulty entering the job market, or those who are under-achievers
and experiencing difficulties at school. To recruit these youth, talk to teachers
at schools where there are "at risk" students, and to agencies that work with
youth. This population is unlikely to see an advertisement at an employment centre,
or to respond to one. Working with youth who are at risk can be tremendously
rewarding, but allow more time for project completion. Teachers and youth workers may
be able to provide advice on how to make the experience positive for the youth while
ensuring that the project work gets done.

Preparing for the CDC Project
Time Lines
Well before the project is underway, you will
need to develop a time line for completion of the work. Set aside ample time at the
project start for purchasing hardware and software, setting up computers and scanners, and
installing software. Allow youth time for familiarizing themselves with your content as
well as the WWW. Training and storyboarding will also take up significant time, as will
administrative tasks. Make sure to leave at least two weeks at the end of the project for
final revisions, cleanup and refinement. Proofing all the web pages once they are uploaded
is important, and once all the materials are on the website, you will begin to see ways to
improve and enhance the site and improve its usability.
A very general guideline for inexperienced staff digitizing and creating WWW
pages is that approximately 2 pages of text can be designed, OCR'd, proofed,
converted to HTML, and uploaded per worker per day (this estimate includes training).
Another good rule of thumb is that an average of 5 images
per worker per hour can be digitized and enhanced. However, keep in mind that image maps,
complex design and layout, complex indexes, databases, etc., will reduce the number of
items you can expect to scan and the overall size of the site you can expect by the
project completion date.
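These rules of thumb can be turned into a rough time line estimate. The sketch below uses the two rates given above (2 pages of text per worker per day and 5 images per worker per hour); the 7-hour working day, the function names and the sample collection size are assumptions to adjust for your own project, and the result excludes setup, storyboarding and the final two weeks of revisions.

```python
import math

PAGES_PER_WORKER_DAY = 2      # text pages designed, OCR'd, proofed, uploaded
IMAGES_PER_WORKER_HOUR = 5    # images digitized and enhanced


def estimate_worker_days(text_pages, images, hours_per_day=7.0):
    """Estimate total worker-days of digitizing for a collection.

    hours_per_day is an assumed length of the working day.
    """
    text_days = text_pages / PAGES_PER_WORKER_DAY
    image_days = images / (IMAGES_PER_WORKER_HOUR * hours_per_day)
    return text_days + image_days


def working_days_for_team(worker_days, team_size):
    """Spread the work evenly over the team and round up to whole days."""
    return math.ceil(worker_days / team_size)


# Hypothetical collection: 200 pages of text and 700 images, team of 4.
total = estimate_worker_days(200, 700)
print(f"Worker-days of digitizing: {total:.0f}")
print(f"Working days for a team of 4: {working_days_for_team(total, 4)}")
```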

Training the Project Team
One of the goals of the CDC program is that
all team members acquire new multimedia skills. While the required skills will vary from
project to project, a typical team requires the following:
- Introduction to and overview of creating digital content
- Training on basic Internet skills (e-mail, ftp, World Wide
Web)
- Training in basic computer skills, if required: using Windows
or Mac
- Storyboarding or content organization
- Scanning software and hardware
- Image manipulation
- OCR (Optical Character Recognition) software
- HTML (HyperText Markup Language) and HTML editing software
If your project includes a database, team members may
require specialized training, such as using database programs or learning Perl and mSQL.

Your Storyboard
One of the CDC program's mandates is to
create Canadian stories of local, regional, or national significance. It is easy to
forget, when you are faced with the complicated process of creating an on-line database,
that your website should ultimately tell a story. In
the rush to get a project up and moving, it is easy to overlook the big picture. What is
the story or theme of your site? What are you trying to convey to the people who visit the
site? By creating a storyboard, you will be able to describe clearly to the project team
what their tasks are. This storyboard will guide them in all levels of the project, from
graphic design, to page layout, to determining searchable fields in a database. The story
you tell will determine the overall organization of your website.
Most libraries and archives have specialized collections of
photographs, manuscripts, monographs, articles and other materials. It is tempting to
believe the collection is of such importance that a project to sequentially scan an entire
set of materials and put them on the Web will be useful. The intent of the CDC program is
to put up materials about Canada that are of interest to the general public. It is
imperative to provide context, an overview, and some background materials to demonstrate
why these materials are of interest and how they might be used.
One important factor in determining your storyboard is
whether you have the resources to keep adding to the site once the CDC project is
completed. If so, you need to plan and organize the site to allow for maintenance. If you
are not planning to continue adding to the site, take care in creating links to other
websites as these will change frequently.
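If the site will not be maintained, it helps to know exactly which pages point to outside websites. The following sketch, using only the Python standard library, lists the external links in a page so they can be reviewed before the project ends. The class and function names and the sample HTML are illustrative assumptions; the sketch does not check whether a link still works.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkCollector(HTMLParser):
    """Collect href values of <a> tags that point to other websites."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "a":
            return
        for name, value in attrs:
            # Links with an http or https scheme leave the local site;
            # relative links such as "page2.html" do not.
            if name == "href" and value and urlparse(value).scheme in ("http", "https"):
                self.links.append(value)


def external_links(html_text):
    """Return the external links found in a page of HTML."""
    collector = ExternalLinkCollector()
    collector.feed(html_text)
    return collector.links


# Hypothetical page content.
page = '<A HREF="http://example.org/a">Archive</A> <a href="page2.html">Next</a>'
print(external_links(page))
```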
The CDC Design Guidelines provide a useful reference for
the storyboarding and design of your website. The purpose of these guidelines is to define
the site structure, site look and feel, page content, credit and graphics requirements
for all CDC projects.

Conclusion
Not only do schools, communities, and young
people benefit from CDC projects, but libraries and archives do as well. Being involved in
a CDC project provides the ability to make our uniquely Canadian collections more
accessible to a wider audience than ever before. In addition, most digital projects have
the added advantage of gathering materials that may not have been previously related or
easily searchable. Libraries and archives also
benefit from the acquisition of new technological and managerial skills; the forging of
new partnerships at local, regional, and national levels; and the opportunity to be
creative and innovative in a time of severe fiscal restraint. Most important, the
enhancements to our services and our collections afforded by the Canada's Digital
Collections program ultimately can only benefit our users.

by Darlene Fichter
January 1997

Updated November 24, 2000