Maintenance¶
This guide covers ongoing maintenance procedures for LabID production deployments.
Prerequisites¶
Before setting up maintenance procedures, ensure you have completed the full Installation, Web Server, and Configuration setup.
Quick Reference Commands¶
Service Status Check¶
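The commands below are a sketch, assuming the systemd unit names used elsewhere in this guide (labid, labid-celery-worker, nginx, postgresql, rabbitmq-server):

```shell
# Check the status of the core LabID-related services
sudo systemctl status labid nginx postgresql rabbitmq-server

# List the running Celery worker units
systemctl list-units --type=service --state=running | grep labid
```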
Updates and Upgrades¶
Update LabID Server¶
For detailed server update instructions including database backups and migration procedures, see the dedicated update guide.
Quick update process:
sudo -u labid -i
cd /opt/labid
git pull origin production  # or check out a specific tag/commit
conda env update -n labid -f requirements/environment.yml --prune
conda activate labid
# It is recommended to take a database dump before updating the server;
# the backup makes it easy to roll back to the previous version if needed.
cd labid
./scripts/migrations/update.sh
sudo systemctl restart labid
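After the restart, a quick sanity check can confirm the deployment came back up; the health endpoint below is the one polled by the health-check script later in this guide:

```shell
# Confirm the service is active and the API responds
sudo systemctl is-active labid
curl -f http://localhost:8000/api/health/
```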
Update LabID UI¶
For detailed UI update instructions, see the dedicated update guide.
Quick update process:
sudo -u labid -i
cd /opt/labid-ui
git pull origin production # or specific tag/commit
# Update dependencies if package.json changed
npm install
# Rebuild the UI
npm run build
# No restart needed - nginx serves static files from /opt/labid-ui/dist/
Security Updates¶
- Keep system packages updated: sudo apt update && sudo apt upgrade
- Monitor Django security releases
- Update Python dependencies regularly
- Rotate secrets and API keys periodically
Backup Strategy¶
Database Backups¶
Automated Daily Backups¶
Create a backup script /opt/labid/scripts/backup_db.sh:
#!/bin/bash
BACKUP_DIR="/backup/labid"
DATE=$(date +%Y%m%d_%H%M%S)
DB_NAME="labid_prod"
DB_USER="labid"
# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"
# Create database backup
pg_dump -h localhost -U $DB_USER $DB_NAME | gzip > "$BACKUP_DIR/labid_$DATE.sql.gz"
# Retention: keep 30 days of daily backups
find "$BACKUP_DIR" -name "labid_*.sql.gz" -mtime +30 -delete
# Log backup completion
echo "$(date): Database backup completed - labid_$DATE.sql.gz" >> /var/log/labid/backup.log
Make it executable and set up cron job:
chmod +x /opt/labid/scripts/backup_db.sh
# Add to crontab for daily 2 AM backups
sudo crontab -e
# Add: 0 2 * * * /opt/labid/scripts/backup_db.sh
Manual Backup¶
For manual backups before updates:
# Pre-update backup
pg_dump -h localhost -U labid labid_prod | gzip > /backup/labid_$(date +%Y%m%d).bfupdate.backup.gz
# Post-update backup
pg_dump -h localhost -U labid labid_prod | gzip > /backup/labid_$(date +%Y%m%d).afupdate.backup.gz
File System Backups¶
Storage Directory Backup¶
#!/bin/bash
# /opt/labid/scripts/backup_files.sh
BACKUP_DIR="/backup/labid-files"
SOURCE_DIR="/opt/labid/labid/storage"
DATE=$(date +%Y%m%d)
# Incremental backup using rsync
rsync -av --link-dest="$BACKUP_DIR/latest" "$SOURCE_DIR/" "$BACKUP_DIR/$DATE/"
# Update latest symlink
rm -f "$BACKUP_DIR/latest"
ln -s "$DATE" "$BACKUP_DIR/latest"
# Retention: keep 7 days of file backups
find "$BACKUP_DIR" -maxdepth 1 -type d -name "2*" -mtime +7 -exec rm -rf {} \;
Configuration Backup¶
# Backup configuration files
tar -czf /backup/labid-config_$(date +%Y%m%d).tar.gz \
/opt/labid/labid/.env \
/etc/systemd/system/labid*.service \
/etc/nginx/sites-available/labid.conf \
/etc/logrotate.d/labid
Monitoring and Health Checks¶
System Health Monitoring¶
Create a health check script /opt/labid/scripts/health_check.sh:
#!/bin/bash
LOG_FILE="/var/log/labid/health_check.log"
ERROR_COUNT=0
# Function to log with timestamp
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S'): $1" >> "$LOG_FILE"
}
# Check database connectivity
if ! sudo -u labid psql -h localhost -U labid -d labid_prod -c "SELECT 1;" > /dev/null 2>&1; then
log_message "ERROR: Database connection failed"
((ERROR_COUNT++))
else
log_message "INFO: Database connection OK"
fi
# Check RabbitMQ status
if ! systemctl is-active --quiet rabbitmq-server; then
log_message "ERROR: RabbitMQ service not running"
((ERROR_COUNT++))
else
log_message "INFO: RabbitMQ service OK"
fi
# Check Django application
if ! curl -f http://localhost:8000/api/health/ > /dev/null 2>&1; then
log_message "ERROR: Django application not responding"
((ERROR_COUNT++))
else
log_message "INFO: Django application OK"
fi
# Check Celery workers
WORKER_COUNT=$(systemctl list-units --type=service --state=running | grep labid-celery-worker | wc -l)
if [ $WORKER_COUNT -lt 3 ]; then
log_message "ERROR: Only $WORKER_COUNT Celery workers running (expected 3+)"
((ERROR_COUNT++))
else
log_message "INFO: Celery workers OK ($WORKER_COUNT running)"
fi
# Check disk space
DISK_USAGE=$(df /opt/labid | awk 'NR==2 {print $5}' | sed 's/%//')
if [ $DISK_USAGE -gt 85 ]; then
log_message "WARNING: Disk usage high ($DISK_USAGE%)"
fi
# Send alert if errors detected
if [ $ERROR_COUNT -gt 0 ]; then
log_message "ALERT: $ERROR_COUNT errors detected"
# Send email alert (configure sendmail or similar)
# echo "LabID Health Check Failed: $ERROR_COUNT errors" | mail -s "LabID Alert" admin@your-labid.com
fi
exit $ERROR_COUNT
Set up monitoring cron job:
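For example, to run the check every five minutes (the interval is a suggestion; adjust it to your needs):

```shell
chmod +x /opt/labid/scripts/health_check.sh
sudo crontab -e
# Add: */5 * * * * /opt/labid/scripts/health_check.sh
```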
Log Management¶
Log Rotation Configuration¶
Ensure log rotation is properly configured (/etc/logrotate.d/labid):
/var/log/labid/*.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 644 labid labid
postrotate
systemctl reload labid-gunicorn > /dev/null 2>&1 || true
endscript
}
Log Monitoring¶
Monitor critical log patterns:
# Monitor for errors in Django logs
tail -f /var/log/labid/django.log | grep -i error
# Monitor Celery worker logs
journalctl -u labid-celery-worker -f
# Monitor Nginx errors
tail -f /var/log/nginx/error.log
Performance Optimization¶
Database Maintenance¶
Regular VACUUM and ANALYZE¶
Create maintenance script /opt/labid/scripts/db_maintenance.sh:
#!/bin/bash
sudo -u labid psql -h localhost -U labid -d labid_prod << EOF
VACUUM ANALYZE;
REINDEX DATABASE labid_prod;
EOF
Schedule weekly maintenance:
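For example, via cron early on Sunday mornings (the schedule is an assumption; pick a low-traffic window):

```shell
chmod +x /opt/labid/scripts/db_maintenance.sh
sudo crontab -e
# Add: 0 3 * * 0 /opt/labid/scripts/db_maintenance.sh
```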
Connection Pooling¶
Monitor database connections:
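A sketch using PostgreSQL's pg_stat_activity view to see how many connections the LabID database holds and in what state:

```shell
# Count connections to the LabID database, grouped by state
sudo -u labid psql -h localhost -U labid -d labid_prod \
  -c "SELECT count(*), state FROM pg_stat_activity WHERE datname = 'labid_prod' GROUP BY state;"
```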
Consider implementing pgbouncer for connection pooling in high-load environments.
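A minimal /etc/pgbouncer/pgbouncer.ini sketch (the pool sizes and port are illustrative, not tuned values); the application would then connect to port 6432 instead of 5432:

```ini
[databases]
labid_prod = host=127.0.0.1 port=5432 dbname=labid_prod

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 100
```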
Disaster Recovery¶
Backup Restoration¶
Database Restoration¶
# Stop application services
sudo systemctl stop labid
# Restore database from backup
sudo -u postgres dropdb labid_prod
sudo -u postgres createdb labid_prod
# Recreate the role only if it no longer exists, and replace 'password' with the real one
sudo -u postgres psql -d labid_prod -c "CREATE USER labid WITH PASSWORD 'password';"
sudo -u postgres psql -d labid_prod -c "GRANT ALL PRIVILEGES ON DATABASE labid_prod TO labid;"
# Restore from backup
zcat /backup/labid_20240201.sql.gz | sudo -u labid psql -h localhost -U labid -d labid_prod
# Start services
sudo systemctl start labid
File Restoration¶
# Restore storage files
rsync -av /backup/labid-files/latest/ /opt/labid/labid/storage/
# Fix permissions
sudo chown -R labid:labid /opt/labid/labid/storage/
Service Recovery¶
Restart All Services¶
# Complete service restart
sudo systemctl restart rabbitmq-server
sudo systemctl restart postgresql
sudo systemctl restart labid
sudo systemctl restart nginx
Reset Celery Workers¶
# Clear Celery queues if corrupted
sudo systemctl stop 'labid-celery-*'
sudo rabbitmqctl purge_queue celery
sudo systemctl start 'labid-celery-*'
Monitoring and Alerting¶
Sentry Integration¶
LabID includes built-in support for Sentry error tracking and performance monitoring. To enable Sentry:
1. Set up a Sentry project at sentry.io and obtain your DSN
2. Configure Sentry in your .env file
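The exact variable names depend on LabID's settings module; as a sketch, assuming the integration reads a SENTRY_DSN variable and an optional traces sample rate:

```shell
# Hypothetical variable names - check the LabID configuration reference
SENTRY_DSN=https://examplePublicKey@o0.ingest.sentry.io/0
SENTRY_TRACES_SAMPLE_RATE=0.1
```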
3. Restart LabID services (e.g. sudo systemctl restart labid)
Sentry will automatically:
- Capture application errors and exceptions
- Monitor performance metrics
- Track releases using LabID version information
- Include request context and user data for debugging
For more advanced Sentry configuration options, refer to the Sentry Python documentation.