NITRC System Backup and Archiving Process
From NITRC Wiki
NITRC's Production Environment
Monitored 24x7x365
The www.nitrc.org Production environment is housed at the San Diego Supercomputer Center (SDSC) at the University of California, San Diego (UCSD). This production-level machine room provides hotline services to the NITRC Administrator 24x7x365 for reporting critical system problems (i.e., hardware, system software, and network). When a problem ticket is opened, an established triage system assigns it the appropriate priority. The facility also has room-wide UPS and secured key card access. The UCSD systems operation team is committed to responding within 4 business hours of receipt of a problem ticket. Should the problem not be resolved within 8 business hours, an escalation procedure based on the NITRC Administrator's priority level will be implemented, whereby NITRC Administrators and Technical Leads at UCSD coordinate activities to ensure resolution.
Backup and Archiving Process
The Production environment is backed up daily, including the Gforge and Wiki databases, the CVS and SCM repositories, and file uploads. These backups are kept for 14 days on a separate primary backup server. Every week, a backup is also copied for longer-term storage to a server at the Atkinson Hall facility, which is not co-located with the primary backup server and where UCSD maintains a large farm of replicated data storage.
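The retention rule described above can be illustrated with a minimal Python sketch. It is not NITRC's actual backup tooling: the function names, the choice of Sunday as the weekly copy day, and the treatment of a backup exactly 14 days old as still retained are all assumptions made for illustration. Only the 14-day window and the weekly off-site copy come from the text.

```python
from datetime import date, timedelta

# From the text: daily backups are kept for 14 days on the primary backup server.
DAILY_RETENTION_DAYS = 14


def expired_daily_backups(backup_dates, today):
    """Return, oldest first, the backup dates that fall outside the retention window.

    Assumption: a backup taken exactly 14 days ago is still retained.
    """
    cutoff = today - timedelta(days=DAILY_RETENTION_DAYS)
    return sorted(d for d in backup_dates if d < cutoff)


def is_weekly_copy_day(day, weekly_weekday=6):
    """Return True when that day's backup should also be copied off-site.

    Assumption: the weekly copy runs on Sunday (date.weekday() == 6);
    the text only says "every week".
    """
    return day.weekday() == weekly_weekday


if __name__ == "__main__":
    today = date(2024, 1, 31)
    # Twenty consecutive daily backups, ending today.
    backups = [today - timedelta(days=n) for n in range(20)]
    for old in expired_daily_backups(backups, today):
        print("would prune:", old.isoformat())
    print("copy off-site today:", is_weekly_copy_day(today))
```

A pruning job would run after each daily backup completes, so that the primary backup server always holds roughly two weeks of restore points, while the weekly copy held elsewhere provides the longer-term history.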
Planned and Unplanned Outages
Outages tend to be transient, and a Web site of NITRC's scope can be re-populated within 6 hours. In the rare event of a severe outage, such as a network failure caused by unplanned road construction, the Staging environment will be used as the Production environment, with databases re-populated from the latest backup.
NITRC's Staging Environment
The NITRC team also maintains a Staging/Acceptance Test environment, located in a separate machine room facility with secure access at UCSD. UCSD system administrators troubleshoot issues remotely when a NITRC Administrator brings them to their attention. The preferred method for reporting problems is via problem tickets; UCSD is committed to responding within 8 business hours of receipt of a problem ticket. Should the problem not be resolved within one (1) business day of the initial request, an escalation procedure based on the NITRC Administrator's priority level may be implemented.
NITRC Source Code
The NITRC source code repository is housed on a CVS service at UCSD. The entire CVS tree is backed up daily. These backups are kept for 14 days on a separate primary backup server. Every week, a backup is also copied for longer-term storage to a server that is not co-located with the primary backup server. In the event of an outage, the entire CVS directory can be retrieved from the latest backup.
NITRC Systems
All NITRC environments use UCSD Rocks, UCSD's cluster deployment infrastructure, which makes the installation and configuration of hundreds of servers manageable and scalable. Rocks allows the NITRC team to develop install procedures for servers with specific functionality, so that they can be deployed in a highly automated manner. Backups of entire systems are therefore not required. After a server failure (i.e., a software failure or a new server replacing failed hardware), the entire server can be rebuilt using these automated methods (i.e., the Rocks base system install and the customized NITRC Gforge installation). Content on the servers can then be refreshed from application-specific backups.




