Core Services Cumulative Monthly Report, June 2004
- Refining and Improving Core Infrastructure
- New storage model.
- Sep: Snavely began researching the Linux Logical
Volume Management system as a potential management suite for NAS
storage. Marsh and Snavely are both maintaining lists of promising
technologies for NAS service to seed more focused research.
- Jan-Mar: Marsh set up a small out-of-band gigabit ethernet
storage network for developing proof-of-concept for the NAS model.
- Apr: Snavely did preliminary benchmarking on the test storage
network and found promising results. Marsh and Snavely developed a
benchmarking plan.
- May: Marsh began to set up a NAS benchmarking environment
using scripts provided by Snavely and with assistance from Farber.
- May: Marsh identified a new storage vendor and RAID system
that is being used successfully in other campus departments.
- June: Marsh continued benchmarking setup work and revised
some testing methods to ensure accuracy.
- Redundant web services.
- Oct: Marsh and Snavely made plans to try to use the
storage subsystem originally purchased for dlps10 on quik.
- Jan-Mar: Marsh upgraded quik to Fedora Core 1 and installed
on it the storage subsystem originally purchased for dlps10.
- Jan-Mar: Snavely developed a process for synchronizing
production data from dlps10 to quik and began network transfers, but
began to discover problems with the storage subsystem, and started a
series of tests.
- Apr: Marsh and Snavely conducted additional tests on the
storage subsystem, concluded it was faulty, and began the process of
replacement with the vendor.
- May: Marsh shipped the faulty unit back to the vendor for
repair, but it was damaged in transit. The vendor was able to confirm
the fault and supply a workaround--that we simply not use a particular
drive bay. The unit was returned and will be used for a different
project.
- Integration of COSIGN single sign-on web authentication.
- Jan-Mar: Snavely coordinated meetings with Prettyman and
Steinberg to inventory existing web applications that use
authentication and develop a transition plan for moving to COSIGN.
- Apr: Snavely presented the transition plan to NISC, and it
was approved.
- June: Snavely met with representatives from NISC to adjust
the implementation timeline and discuss the provision of self-service
for non-umich patrons.
- Production server replacements/upgrades.
- Sep: Marsh installed and configured dlps10, a
replacement server for dlps4. Marsh, with occasional assistance from
Snavely, spent a substantial amount of time troubleshooting persistent
problems with configuring the storage subsystem before putting the
storage online.
- Oct: Marsh transitioned HTI services from dlps4 to
dlps10, but after several days of full production load, Marsh and
Snavely found problems with the storage subsystem and had to revert HTI
service back to dlps4. Marsh began the purchase process for a Sun
storage subsystem (at substantially higher cost) that should offer
better support and allow the transition to dlps10 to happen as soon as
possible.
- Oct: Marsh, in coordination with CAEN, formulated an
upgrade plan for servers that are still running Solaris 2.6, which
CAEN will no longer support starting in 2004.
- Nov-Dec: Marsh transitioned HTI services from dlps4 to
dlps10, using newly-installed Sun storage.
- Nov-Dec: Marsh upgraded dlps5 to Solaris 8 and supported
Steinberg in migrating proxy services to ting, a replacement server.
dlps5 is online as a hot spare.
- Jan-Mar: In coordination with Systems and Web Services, Marsh
upgraded betty to Solaris 8, supported the transition of WebZ to betty,
and then upgraded myrna to Solaris 8 and supported the transition of
library web service to myrna. hank was then upgraded to Solaris 8 and
left online as a hot spare.
- Jan-Mar: Snavely and Marsh upgraded dlps6 to Solaris 8.
- June: Marsh ordered replacement servers for dlps7 and myrna
and additional storage for dlps8 and quik.
- Development server replacements/upgrades.
- Sep: Marsh upgraded ferment to Red Hat 9.
- Oct: Marsh and Snavely spent a substantial amount of
time pursuing what appeared to be a problem with a storage subsystem on
sangria, but what turned out to be a bug in file system diagnostics.
Marsh upgraded all potentially affected servers to newer versions of
the diagnostic software.
- DHCP/DNS services.
- Sep: Marsh, working with ITCS, transitioned DNS
service from longjing to plumwine.
- Oct: Marsh, working with DSS, formulated a plan to
cut over DHCP services from stella to sake during the upcoming holiday
break.
- Image and OCR processing workflows.
- Sep: Snavely resolved a problem with DVD-R volume
name recognition on lemonade, the new loading workstation, and
signalled collection maintainers that we are ready to begin accepting
tests of page image and continuous tone material submitted on DVD-R
media.
- Sep: Snavely resolved reliability problems with data
loading on lemonade related to the use of ssh, incorporated compression
into the load process to marginally reduce bandwidth requirements, and
added two additional CD-ROM drives for a total of two DVD-ROM drives
and three CD-ROM drives.
- Sep: Snavely began configuring Samba-based file
service on martini, but encountered persistent authentication problems.
- Oct: Snavely coordinated the purchase and
installation of the library's first Gigabit Ethernet (GbE) switch for
improved file service performance on martini. Initial testing confirmed
substantial performance improvements, topping out at approximately
22MB/s sustained throughput.
- Oct: Snavely resolved the authentication problems
with Samba and worked with Hall to begin live testing of file service
on martini.
- Oct: Snavely took bill-ko, the old data loading
workstation, offline.
- Nov-Dec: Snavely began to develop revised OCR server
configuration and scripting to support a load-balanced workflow.
- Jan-Mar: Snavely finished setup work and scripting and, in
coordination with Hall, activated the new load-balanced workflow.
- Jan-Mar: Snavely developed scripts to load page images to the
OCR server from Patterson's existing load processes.
- Apr: Snavely revised scripts used for contone image
processing to handle contone page images.
- Digital object integrity.
- Cost model.
- Documentation.
- Jan-Mar: Marsh and Snavely developed an outline of
documentation to be gradually populated.
- Jan-Mar: Marsh updated existing Linux and Perl
installation documentation.
- Security assessment.
- Technology trends awareness.
- Support and Development for Access Systems
- Library management system implementation.
- Oct: Marsh reinstalled Solaris 9 on gracie after a
security incident. Marsh took this opportunity to implement several new
security measures, including the Sunscreen host-based firewall, to
substantially increase overall system security.
- Dec: Marsh ordered the new development server, storage, and
two CPU/memory boards to upgrade the current server to production
capacity.
- Jan-Mar: Marsh purchased a rack and coordinated the
installation of power, networking, and a repurposed terminal server to
the new ALDC space.
- Jan-Mar: Marsh added two CPU/memory boards to gracie,
bringing it to production capacity.
- Jan-Mar: Marsh repurposed dlps4 as mabel, a dedicated
training server.
- Jan-Mar: Snavely configured CVS for version control of files
related to OPAC customization.
- Apr: Marsh installed and configured clyde, a dedicated
development server. Snavely configured the storage subsystem.
- May: Marsh coordinated moving gracie to the Arbor Lakes Data
Center, with support from Prettyman and Rothman.
- May: Marsh adjusted Sun hardware support agreements to
include clyde and reflect hardware upgrades on gracie.
- May: Marsh installed several Perl modules for use during
conversion.
- June: Snavely coordinated OPAC stress testing, with
assistance from Goldberg, Marsh, and Steinberg.
- June: Snavely installed and configured CVS on clyde, moved
the central repository to clyde, and revised the CVS configuration on
gracie and mabel to support an automated release process for changes
from the central repository.
- DLXS.
- Persistent URLs.
- Institutional repository initiative.
- Oct: Snavely reviewed Ottaviani's draft of our
institutional repository findings and contributed several comments for
changes.
- Oct: Marsh researched requirements for installing
DSpace on sambuca, discovering some difficulties and long chains of
dependencies. Snavely and Ottaviani agreed to postpone DSpace
installation until after the holidays, while in the meantime putting
sambuca to use as a NAS test server.
- Jan-Mar: Marsh installed a vanilla DSpace instance on sambuca
for testing.
- Apr: Snavely added several user accounts to the DSpace server.
- Collaborating with Other Areas of the Library and the University
- Library/Core Services coordination.
- Integration with campus authentication.
- Miscellaneous/Unplanned
- Sep: Marsh installed the Perl module Net::Z3950, the Yaz
tool kit, and Zebra for the adjunct files project.
- Jan-Mar: Marsh installed and configured a new terminal server
in room 10, and moved the existing terminal server to ALDC.
- Jan-Mar: Snavely and Blanco spent a significant amount of time
reloading statistics and reworking statistics automation due to a
problem with tabulation that occurred in the fall.
- Apr: Snavely and Blanco resumed normal statistics processing.
- May: Marsh began testing ITCS mail with new, higher storage
quotas in preparation for shutting down CS-administered email service.
- May: Marsh began testing mod_throttle configuration for
potential use in production. If successful, this software would prevent
the overuse/abuse of public text services we frequently observe.
- June: Marsh reconfigured the subsystem that was originally
purchased for dlps4 for use with the Shoah Visual History Foundation
cache server.
- June: Snavely began meeting with Stephenson and Willett to
identify areas of improvement for potential compliance with
developing guidelines for trusted digital repositories.
- System Performance
- Central Campus server environment
- betty.umdl (MIRLYNWeb server): Service was degraded (users
may have seen slow performance or errors) on Monday, June 14 from
11:25am to 1:15pm due to problems with the underlying WebZ software.
- coffee.umdl (Oracle server for access control): Service was
degraded (users were given warnings) from Sunday, June 6 at 8:00pm to
Monday, June 7 at 12:00pm due to an expired SSL server certificate.
- gin.umdl (WebCheckout server): No down time
- gracie.umdl (Aleph server): Down for several brief windows
from Wednesday, June 23 to Friday, June 25 to apply patches and test
hardware failover.
- myrna.umdl (Library Web server): No down time
- opal.umdl (SilverPlatter server): Taken out of service
- tequila.umdl (Metalib and SFX server): No down time
- ting.umdl (library proxy server): No down time
- North Campus server environment: All services were
unreachable on Thursday, June 10 from 2:10pm to 2:50pm and on Tuesday,
June 29 from 5:15am to 11:00am due to unplanned network outages; all
text and image services were down from Wednesday, June 30 at 11:00pm to
Thursday, July 1 at 11:00am due to a database server hang and
subsequent recovery from minor data corruption.
- dlps5.umdl (backup library proxy server): No additional
down time
- dlps6.umdl (Oracle server for access control and usage
statistics): Service was degraded (users were given warnings) from
Sunday, June 6 at 8:00pm to Monday, June 7 at 12:00pm due to an expired
SSL server certificate.
- dlps7.umdl (Numeric Data server): No additional down time
- dlps8.umdl (non-public collections): No additional down time
- dlps9.umdl (Image Services): No additional down time
- dlps10.umdl (primary web server for DLPS and public
collections): No additional down time