nitrc:NITRC System Backup and Archiving Process

From NITRC Wiki

NITRC's Production Environment

Monitored 24x7x365

The www.nitrc.org Production environment is housed at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD). This production-level machine room gives the NITRC Administrator access to a hotline, available 24x7x365, for reporting critical system problems (i.e., hardware, system software, and network problems). When a problem ticket is opened, an established triage system assigns it the appropriate priority. The facility also has room-wide UPS and secured key-card access. The UCSD systems operations team is committed to responding within 4 business hours of receiving a problem ticket. If the problem is not resolved within 8 business hours, an escalation procedure is implemented according to the NITRC Administrator's priority level: NITRC Administrators and Technical Leads at UCSD coordinate their activities to ensure the problem is resolved.
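
The response and escalation windows above are measured in business hours, which this page does not define. As a minimal sketch, assuming a Monday to Friday, 09:00 to 17:00 business day with no holidays, the following Python shows how elapsed business hours for a ticket could be computed and compared against the 4-hour response and 8-hour escalation limits. The function names, the business-day window, and the simple responded/resolved flags are hypothetical; the real procedure also weighs the ticket's priority level, which this sketch ignores.

```python
from datetime import datetime, time, timedelta

# Hypothetical business-day window; the NITRC page does not define one.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def business_hours_between(start: datetime, end: datetime) -> float:
    """Hours in [start, end) that fall on Mon-Fri between 09:00 and 17:00."""
    if end <= start:
        return 0.0
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4; holidays are ignored
            window_open = datetime.combine(day, BUSINESS_START)
            window_close = datetime.combine(day, BUSINESS_END)
            overlap_start = max(start, window_open)
            overlap_end = min(end, window_close)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


def ticket_status(
    opened: datetime,
    now: datetime,
    responded: bool,
    resolved: bool,
    response_limit: float = 4.0,    # Production: respond within 4 business hours
    escalation_limit: float = 8.0,  # Production: escalate after 8 business hours
) -> str:
    """Classify an open ticket against the response and escalation limits."""
    if resolved:
        return "resolved"
    elapsed = business_hours_between(opened, now)
    if elapsed >= escalation_limit:
        return "escalate"
    if not responded and elapsed >= response_limit:
        return "response overdue"
    return "within SLA"


if __name__ == "__main__":
    opened = datetime(2008, 9, 19, 15, 0)  # a Friday afternoon
    now = datetime(2008, 9, 22, 11, 0)     # the following Monday morning
    print(business_hours_between(opened, now), ticket_status(opened, now, False, False))
```

Run as a script, this prints `4.0 response overdue`: two hours on Friday plus two on Monday, with the weekend skipped. For the Staging environment, the same check could be called with `response_limit=8` and `escalation_limit=8`, treating one business day as eight hours under the assumed window.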

Backup and Archiving Process

Daily backups of the Production environment cover the Gforge and Wiki databases, the CVS and SCM repositories, and file uploads. These backups are kept for 14 days on a separate primary backup server. Each week, one backup is copied for longer-term storage to a server that is not co-located with the primary backup server; this server is at the Atkinson Hall facility, where UCSD maintains a large farm of replicated data storage.
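
The rotation described above can be expressed as a small planning function. This is a minimal sketch, not NITRC's actual tooling: it assumes one backup per calendar day, and it assumes (hypothetically) that the Sunday backup is the one copied off-site each week, since the page does not say which day is chosen.

```python
from datetime import date, timedelta

PRIMARY_RETENTION_DAYS = 14  # daily backups kept on the primary backup server
WEEKLY_COPY_WEEKDAY = 6      # hypothetical: Sunday's backup goes off-site


def plan_rotation(backup_dates: list[date], today: date):
    """Split daily backups into (keep, prune, copy_offsite) lists."""
    cutoff = today - timedelta(days=PRIMARY_RETENTION_DAYS)
    # Keep the 14 most recent days, counting today as day one.
    keep = sorted(d for d in backup_dates if d > cutoff)
    # Anything older has aged out of the primary backup server.
    prune = sorted(d for d in backup_dates if d <= cutoff)
    # One backup per week is copied to the non-co-located long-term server.
    copy_offsite = sorted(d for d in backup_dates if d.weekday() == WEEKLY_COPY_WEEKDAY)
    return keep, prune, copy_offsite


if __name__ == "__main__":
    backups = [date(2008, 9, 1) + timedelta(days=i) for i in range(23)]  # Sept 1-23
    keep, prune, offsite = plan_rotation(backups, today=date(2008, 9, 23))
    print(len(keep), len(prune), offsite)
```

With backups for September 1 through 23, 2008, this keeps September 10 to 23 (14 backups), prunes September 1 to 9, and marks the Sunday backups of September 7, 14, and 21 for off-site copying. A real job would copy each weekly backup only once rather than re-listing every Sunday it still sees.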

Planned and Unplanned Outages

Outages tend to be transient, and a Web site of NITRC's scope can be repopulated within 6 hours. In the rare event of a severe outage, such as a network failure caused by unplanned road construction, the Staging environment will be used as the Production environment, with its databases repopulated from the latest backup.
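
Repopulating the databases starts with locating the latest backup. The page does not describe how backups are named or stored, so the sketch below assumes a hypothetical naming scheme, `nitrc-db-YYYYMMDD.sql.gz`, and picks the newest file that matches it. Loading the dump into the Gforge and Wiki databases is outside the scope of this sketch.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Hypothetical file-name pattern for a daily database dump.
BACKUP_NAME = re.compile(r"^nitrc-db-(\d{8})\.sql\.gz$")


def latest_backup(names: Iterable[str]) -> Optional[str]:
    """Return the newest backup file name, or None if none match the pattern."""
    newest: Optional[Tuple[datetime, str]] = None
    for name in names:
        match = BACKUP_NAME.match(name)
        if not match:
            continue  # skip unrelated files in the backup directory
        # strptime also rejects impossible dates such as 20081340.
        stamp = datetime.strptime(match.group(1), "%Y%m%d")
        if newest is None or stamp > newest[0]:
            newest = (stamp, name)
    return newest[1] if newest else None


if __name__ == "__main__":
    print(latest_backup(["nitrc-db-20080922.sql.gz", "README", "nitrc-db-20080923.sql.gz"]))
```

In practice the names would come from listing the backup directory, for example with `os.listdir()`.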

NITRC's Staging Environment

The NITRC team also maintains a Staging/Acceptance Test environment, located in a separate machine room facility with secure access at UCSD. UCSD system administrators troubleshoot issues remotely when a NITRC Administrator brings them to their attention. Problems are preferably reported via problem tickets; UCSD is committed to responding within 8 business hours of receiving a problem ticket. If the problem is not resolved within one (1) business day of the initial request, an escalation procedure may be implemented according to the NITRC Administrator's priority level.

NITRC Source Code

The NITRC source code repository is housed on a CVS service at UCSD. The entire CVS tree is backed up daily. These backups are kept for 14 days on a separate primary backup server, and each week one backup is copied for longer-term storage to a server that is not co-located with the primary backup server. In the event of an outage, the entire CVS directory can be retrieved from the latest backup.
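
The page does not say what tool produces the CVS backups. One plausible approach, sketched here only as an assumption, is to archive the whole repository directory into a dated, gzip-compressed tar file with Python's standard `tarfile` module, and to unpack it again when the tree must be retrieved. The `cvs-YYYYMMDD.tar.gz` naming and both function names are hypothetical.

```python
import tarfile
import tempfile
from datetime import date
from pathlib import Path


def archive_cvs_tree(cvsroot: Path, backup_dir: Path, day: date) -> Path:
    """Write the whole CVS tree to a dated, gzip-compressed tar file."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"cvs-{day:%Y%m%d}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        # arcname keeps member paths relative, so the tree can be restored anywhere.
        tar.add(str(cvsroot), arcname=cvsroot.name)
    return target


def restore_cvs_tree(archive: Path, dest: Path) -> Path:
    """Unpack an archive produced by archive_cvs_tree into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: refuse absolute paths and links that escape dest.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "cvsroot"
        (root / "nitrc").mkdir(parents=True)
        (root / "nitrc" / "index.php,v").write_text("head 1.1;\n")
        archive = archive_cvs_tree(root, Path(tmp) / "backups", date(2008, 9, 23))
        restored = restore_cvs_tree(archive, Path(tmp) / "restore")
        print(sorted(p.name for p in restored.rglob("*")))
```

A production job would also need to avoid archiving while commits are in progress, which this sketch does not handle.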

NITRC Systems

All NITRC environments use UCSD Rocks, UCSD's cluster deployment infrastructure, which makes the installation and configuration of hundreds of servers manageable and scalable. Rocks allows the NITRC team to develop install procedures for servers with specific functionality, so that those servers can be deployed in a highly automated manner. As a result, full-system backups are not required. After a server failure (e.g., a software failure, or a new server replacing failed hardware), the entire server can be rebuilt using these automated methods (i.e., a Rocks base system install followed by the customized NITRC Gforge installation). Content on the server can then be restored from the application-specific backups.

This page was last modified 17:58, 23 September 2008.