the news.answers WWW archive
Usenet newsgroup news.answers
The Usenet newsgroup
news.answers
is a repository for periodic informational postings
(also called Frequently Asked Questions postings, or FAQs)
from other newsgroups.
It is a moderated newsgroup.
If you want to post on news.answers, you must follow the
guidelines.
Instead of posting on news.answers you can use the rtfm-mail-server.
See the file
faq-server-help
for details.
A big thanks to the news.answers moderation team
<news-answers-request@mit.edu> who organize the whole thing.
Without them this archive wouldn't exist and faq-readership
would be a lot more limited.
More info about faqs can be found in
Finding and Writing FAQs and Periodic Postings
and
FAQ maintenance Aids.
Archive maintainers
The news.answers archive
of the Computer Science Department,
Utrecht University, the Netherlands is maintained by
Henk Penning.
Jaap Romers did the first version of the html-conversion
and took care of the wais-indexing.
I added the search-by-archive-name and search-by-newsgroup
and maintain the news.answers ftp-archive on which the
contents of the WWW archive are based.
We generate some statistics.
How it is done
The whole WWW news.answers archive is generated daily
from scratch from an ftp archive of news.answers.
The faqs are converted to html, leaving them textually intact,
with one exception: each faq is preceded by a small
note in red explaining the status of the
document (archived usenet posting) and referring readers to the faq
author(s) for matters concerning the content of the faq.
We attempt to convert text that looks like an http/ftp/gopher-url
into a selectable link, displayed as the original text.
The primary header is stripped to Subject:- and Newsgroups:-lines.
The '*.answers' are stripped from the newsgroups,
except from articles posted only in '*.answers'.
Generation of the html-faqs and -link-files is done by two small
Perl programs and takes about 10 minutes.
The ftp archive is also updated daily from the Usenet spool tree.
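The header reduction described above can be sketched roughly as follows. This is only an illustration in Python, not the Perl programs used for the archive; the function name reduce_header and the sample article are made up for the example.

```python
from email import message_from_string
from typing import List, Tuple


def reduce_header(article: str) -> Tuple[List[str], str]:
    """Return (kept header lines, body) for one archived posting.

    Hypothetical helper: only Subject: and Newsgroups: survive, and
    '*.answers' groups are dropped unless they are the only groups.
    """
    msg = message_from_string(article)
    subject = msg.get("Subject", "")
    groups = [g.strip() for g in msg.get("Newsgroups", "").split(",") if g.strip()]

    # Keep the non-answers groups; fall back to the full list when the
    # article was posted only in '*.answers'.
    others = [g for g in groups if not g.endswith(".answers")]
    kept = others if others else groups

    header = ["Subject: " + subject, "Newsgroups: " + ",".join(kept)]
    # The body (including any secondary header such as Archive-name:)
    # is returned untouched.
    return header, msg.get_payload()


if __name__ == "__main__":
    sample = (
        "From: someone@example.org\n"
        "Subject: Example FAQ\n"
        "Newsgroups: comp.lang.perl,comp.answers,news.answers\n"
        "Summary: not kept\n"
        "\n"
        "Archive-name: example/faq\n"
    )
    lines, body = reduce_header(sample)
    print("\n".join(lines))
```

The body is passed through unchanged, in line with leaving the faqs textually intact.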
Why it is done this way
Since maintaining a WWW archive of news.answers is not exactly
in my job description, if it can't be automated, I can't do it.
Leaving faqs textually intact is done because not all authors
like a split-up of their faq.
I can't determine (automagically) who does and who doesn't.
The note in red was added after
much hesitation. I don't want to offend authors by sticking
my words into their faq. However, from time to time, readers
were confused as to the responsibilities of the author and those
of the archivist. The note hopes to clear that up in a neutral way.
It is meant to be like a stamp in a library book.
Turning text into selectable urls is done because it makes
the html'ed faqs a lot more useful.
The substitution can almost always be done automatically.
However, if the url is embedded in url-like text,
the generated href is too long.
Sometimes the href is too short because
the url looks too much like the surrounding text!
I feel entitled to a few mistakes because the substitution
doesn't change the textual representation of the author's text.
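A minimal sketch of this kind of substitution, in Python rather than the Perl actually used. The pattern, the trailing-punctuation rule and the name linkify are assumptions made for the example; the real programs may decide differently where a url ends.

```python
import html
import re

# Assumed pattern: a scheme followed by anything up to whitespace,
# angle brackets or a double quote.
URL_RE = re.compile(r"(?:https?|ftp|gopher)://[^\s<>\"]+")

# Punctuation that usually belongs to the sentence, not the url.
TRAILING = ".,;:!?)'"


def linkify(text: str) -> str:
    """Wrap url-like text in <a href>, showing the original text."""
    out = []
    pos = 0
    for match in URL_RE.finditer(text):
        url = match.group(0).rstrip(TRAILING)
        end = match.start() + len(url)
        # Text before the url is escaped but otherwise left as it is.
        out.append(html.escape(text[pos:match.start()]))
        shown = html.escape(url)
        out.append('<a href="%s">%s</a>' % (shown, shown))
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(linkify("See ftp://ftp.example.org/pub/faq/, or ask the author."))
```

A heuristic like this shows both kinds of mistake mentioned above: an href that is too long when url-like text follows the url directly, and one that is too short when the url itself ends in punctuation.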
Stripping the primary header to Subject:- and Newsgroups:-lines
is done because the header is so very messy.
The header facilitates transmission of the faq on Usenet
and access by news-readers.
It has little to do with the content (body) of the faq.
The '*.answers' newsgroups are stripped because they
are too big to help in searching, except for articles
posted only in those groups.
The Subject:-line can be used as some sort of title and the
Newsgroups:-line can be used to access the by-newsgroup hierarchy.
The Summary:-line is left out because it is often repeated in the
Subject or the first part of the faq.
The news.answers ftp archive
Even before the creation of news.answers, members of the department
tried to keep up an ftp archive of faqs posted on Usenet.
The introduction of the Usenet newsgroup news.answers
made maintaining a proper faq archive look easy.
Adding faqs to the archive was simply a matter of scanning
/usr/spool/news/news/answers and copying files.
However, as it turned out with news.answers, deleting faqs
has become the main problem.
Files become obsolete because faqs acquire a new archive-name,
or worse, faqs simply die.
In the archive, 10% of the faqs are more than three months old.
Some of the most popular faqs (sex, puzzles)
have not been posted for over a year.
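Finding candidates for deletion could be sketched like this. It is an illustration only: it assumes that a file's modification time reflects the last posting of the faq, and the name stale_files and the 90-day cutoff are chosen for the example, not taken from the archive scripts.

```python
import os
import time
from typing import List, Optional, Tuple


def stale_files(root: str, max_age_days: int = 90,
                now: Optional[float] = None) -> Tuple[List[str], float]:
    """Return (paths older than max_age_days, fraction of all files).

    Hypothetical helper; assumes mtime equals the last posting date.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            total += 1
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    fraction = len(stale) / total if total else 0.0
    return sorted(stale), fraction
```

A report like this only lists candidates. Whether a faq has been renamed or has really died still has to be decided some other way.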
Other news.answers archives
Other news.answers archives on WWW are:
penning@cs.uu.nl
Wed Jun 27 11:22:48 CEST 2007