UESPWiki:Administrator Noticeboard/Archives/Upcoming Hardware Changes
This is an archive of past UESPWiki:Administrator Noticeboard/Archives discussions. Do not edit the contents of this page, except for maintenance such as updating links.
This post is to make everyone aware of a number of significant changes coming up on the hardware side of the site. I've been working with iWeb to arrange several upgrades at the same time that fit within our budget and can be done in a manner that permits zero downtime.
Currently these hardware changes include:
- Private Rack -- Moving our six servers to a private rack with two spaces to spare for future servers. The two main reasons for this are bandwidth consolidation and performance. Currently, our main Internet-facing servers (squid1 and files1) have a monthly bandwidth cap of 1500GB. While this sounds like a lot, we've come very close to exceeding it in the past few months. Exceeding the monthly cap can be very expensive and we've had monthly bills of several hundred dollars in past years. By moving to a private rack we now get 10000GB/month, which does not include any internal traffic between the servers. There should also be a small performance improvement from the private rack, as all internal server traffic would use 1000Mbps compared to the mixed 10/100Mbps now, and the physical distance between servers drops from hundreds of feet to several. The external connection to the Internet is also increased to 100Mbps for all servers. I'm not sure exactly how much of a performance increase to expect; it may not even be noticeable at the current traffic levels, but it will definitely help when the traffic increases in a few months due to Skyrim.
- RAID -- Add RAID1 to the database and files server for redundancy purposes. This will significantly reduce the amount of downtime if/when we see another hard drive failure on those servers. There isn't enough benefit compared to the cost to similarly add it to the remaining content and squid servers. The content servers are already mirrored/load balanced and if the squid goes down there are simple ways to temporarily redirect traffic to one of the content servers.
- Hard Drive for Backup -- Increase the hard drive size of content3 for backup purposes. This will just make it easier to set up rotating backups in one place without having to juggle free disk space.
- CPU/RAM -- Upgrade all servers to the same CPU and RAM (i3 Dual Core 2.93 GHz and 4GB). More of both can always be used in some manner and it makes load balancing the content servers much easier.
- Site Cost -- The monthly cost of the site's servers will increase from the current $400/month to over $700/month and I'll be pre-purchasing all servers for 24 months to get a discount. Fortunately, the ad revenue has been very good this past year so we shouldn't have any problem supporting it for the next few years at least.
- Future Thoughts -- Hardware-wise this setup should be good for a significant increase in traffic. A conservative guess would be at least 5x the current traffic, which is unlikely to occur in the near future. Given the trends of past ES title releases I would expect a traffic peak of about 2x the current traffic at Skyrim's release at the end of the year (so on the order of 2 million page views/day). We still have two free spaces in the private rack and more servers outside the rack can always be obtained as needed.
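As a rough check on the bandwidth figures in the list above, the monthly caps can be converted into an average sustained rate. This is only a sketch: it assumes a 30-day month and decimal gigabytes (1 GB = 10^9 bytes), neither of which is stated in the post, and `average_mbps` is a made-up helper name.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month


def average_mbps(gb_per_month: float) -> float:
    """Average rate in Mbps that would exactly use up a monthly cap.

    Assumes decimal units: 1 GB = 1e9 bytes, 1 Mbps = 1e6 bits/second.
    """
    bits = gb_per_month * 1e9 * 8
    return bits / SECONDS_PER_MONTH / 1e6


old_cap_mbps = average_mbps(1500)    # current cap on squid1/files1
new_cap_mbps = average_mbps(10000)   # private rack cap

print(f"old cap: {old_cap_mbps:.2f} Mbps average")
print(f"new cap: {new_cap_mbps:.2f} Mbps average")
```

On those assumptions the old 1500GB cap works out to about 4.6Mbps averaged over the month and the new 10000GB cap to about 31Mbps, which is well below the 100Mbps external connection, so the monthly cap rather than the link speed would remain the limit to watch.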
The plan is to get six completely new servers installed in the private rack, which will let me set up and switch over each server one by one with minimal downtime. I'm aiming for zero downtime and only a limited amount of time when the site is set to read-only (partially or fully) so the change to the new servers should be transparent to most visitors. The process of switching over likely won't begin for a few weeks and I'll post something here whenever I plan on setting something to read-only. I have an arbitrary amount of time to keep the old servers so I always have the option to switch back to the original server should issues pop up.
If anyone has any requests for additional hardware or features just let me know in the next few days since the deal with iWeb hasn't been finalized yet. -- Daveh 17:19, 28 January 2011 (UTC)
- Sounds brilliant! The private rack should make a big difference - anybody who doesn't know our current setup should look at the Servers page and imagine how much data is zipping around between the different boxes. I imagine the RAM upgrade will make a big difference too. rpeh •T•C•E• 17:42, 28 January 2011 (UTC)
Update -- 7 Feb 2011: I finalized the deal with iWeb last week and received the servers on Friday. I'll be setting up the servers this week and beginning to switch them over in roughly the following order:
- files1 -- Best to do this first as it makes setting up the new content servers easier. There will be a short window of time where uploads will be disabled.
- content3 -- Nothing is currently relying on this host so it can be switched over easily.
- content1/2 -- These are similarly easy to switch by just updating the Squid cache to point to the new hosts.
- squid1 -- Once it is set up it is a simple matter of changing the DNS entry for www.uesp.net and letting the change propagate naturally over the next 24 hours.
- db1 -- The trickiest one, which will require a short window with the wiki/forum set to read-only when the switch is made.
I'll post here when each server switch is occurring and any issues that appear can be posted here as well. I'll make a server switch every couple of days to permit any issues to reveal themselves and make it easier to diagnose their source. -- Daveh 01:08, 8 February 2011 (UTC)
Issue Note: 9 Feb 2011 -- In case anyone just noticed some site issues (images/skins not loading, sessions invalid, etc.), that was just me playing around with NFS on the new servers and files1, which caused the NFS shares on content1/2/3 to become invalid. All should be well now. -- Daveh 01:43, 10 February 2011 (UTC)
Files1 Update: 14 Feb 2011 -- I plan on beginning the switch for files1 to the new hardware sometime tonight if everything goes well. I'll lock Wiki uploads for a short period but that should be the only noticeable effect. I'll update here as it goes and if anyone notices any issues you can record them here as well. -- Daveh 22:35, 14 February 2011 (UTC)
- Wiki uploads disabled now. -- Daveh 23:39, 14 February 2011 (UTC)
- Re-enabled uploads. There is an issue with content1/2 accessing the lockd server on newfiles1 I have to figure out first. -- Daveh 00:02, 15 February 2011 (UTC)
- Figured out the issue and disabled Wiki uploads again. PHP session files on content1/2 are running from newfiles1. -- Daveh 01:12, 15 February 2011 (UTC)
- Switched shares on content1/2 to newfiles1. Switched directories on files1 to mounts from newfiles1. -- Daveh 01:15, 15 February 2011 (UTC)
- Changed DNS entries for maps/skins/images/files to point to the newfiles1 server. Propagation will take ~24 hours to fully switch. -- Daveh 01:18, 15 February 2011 (UTC)
- Re-enabled Wiki uploads and did a quick test. Everything seems to be in order but let me know if you see anything out of the ordinary. -- Daveh 01:32, 15 February 2011 (UTC)
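One way to watch a DNS change like the one above take effect is to ask the local resolver which address each hostname currently returns. This is a minimal sketch, assuming Python 3.9+; the hostnames and the 192.0.2.x / 198.51.100.x addresses are placeholders from the documentation ranges, not the site's real values, and `resolve_ipv4` and `switch_status` are hypothetical helper names. A local lookup only reflects one resolver's cache, not propagation everywhere.

```python
import socket
from typing import Callable, Iterable


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is (address, port).
    return {info[4][0] for info in infos}


def switch_status(
    hostnames: Iterable[str],
    new_ip: str,
    resolver: Callable[[str], set[str]] = resolve_ipv4,
) -> dict[str, bool]:
    """Map each hostname to True if it already resolves to new_ip."""
    return {name: new_ip in resolver(name) for name in hostnames}


# Offline demonstration with a fake resolver (placeholder data only).
fake_dns = {
    "images.example.org": {"192.0.2.10"},    # already on the new server
    "files.example.org": {"198.51.100.7"},   # still on the old server
}
status = switch_status(fake_dns, "192.0.2.10", resolver=lambda name: fake_dns[name])
print(status)  # {'images.example.org': True, 'files.example.org': False}
```

Passing the resolver in as a parameter keeps the check testable without network access; leaving it at the default performs real lookups through the operating system's resolver.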
Content2 Update: 17 Feb 2011 -- I plan on switching over content2 to the new server tonight. This will be a very simple procedure which just requires squid1 to be restarted so there should be no site interruption besides a few seconds of downtime. -- Daveh 19:33, 17 February 2011 (UTC)
- Content2 switched over. I noted a few minor issues in the error log, which may be new or may only be visible now; I'll be keeping an eye on them. -- Daveh 01:00, 18 February 2011 (UTC)
Content1 Update: 19 Feb 2011 -- Content1 was just switched to the new server. -- Daveh 15:44, 19 February 2011 (UTC)
Squid1 Update: 21 Feb 2011 -- I plan on switching to the new squid1 server sometime today. Barring any issue there should be no service interruption. -- Daveh 16:32, 21 February 2011 (UTC)
- DNS entries for www.uesp.net changed to point to the new squid1 server. It will slowly begin to take over all requests in the next day or two. -- Daveh 17:00, 21 February 2011 (UTC)
Db1 Update: 23 Feb 2011 -- I plan on switching db1 sometime tonight if everything works out. This will require the Wiki and forums to be put into read-only mode for a while (up to an hour at most) while the switch is made. -- Daveh 17:07, 23 February 2011 (UTC)
- Locking Wiki/forums shortly. -- Daveh 00:29, 24 February 2011 (UTC)
- Test edit from content3. -- Daveh 00:41, 24 February 2011 (UTC)
- Test edit from content2. -- Daveh 00:42, 24 February 2011 (UTC)
- Test edit from content1. -- Daveh 00:42, 24 February 2011 (UTC)
- Wiki/forums re-enabled. -- Daveh 00:44, 24 February 2011 (UTC)
- Changed database host address for all secondary services (maps, EQWiki, Blog, DaveWiki). Shut down old db1 and tested all sites to make sure they are still working. Barring any issues we should be good. -- Daveh 01:02, 24 February 2011 (UTC)
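The length of the read-only window can be estimated from the timestamps in the entries above. A small sketch, assuming the signature timestamps mark the start and end of the lock (the first entry says "shortly", so the real window may be a little shorter); `minutes_between` is a hypothetical helper and the month names assume an English locale.

```python
from datetime import datetime

# Matches the wiki signature style, e.g. "00:29, 24 February 2011".
FMT = "%H:%M, %d %B %Y"


def minutes_between(start: str, end: str) -> float:
    """Minutes elapsed between two signature-style timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


# "Locking Wiki/forums shortly" to "Wiki/forums re-enabled".
db1_window = minutes_between("00:29, 24 February 2011", "00:44, 24 February 2011")
print(db1_window)  # 15.0
```

By this estimate the wiki and forums were read-only for roughly 15 minutes at most, well inside the hour that was announced.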
Summary: 28 Feb 2011 -- All servers have been successfully switched over and appear to be running fine so I canceled the old servers yesterday. There is still a lot of work left on the new servers to set up monitoring, backups, documentation, and other minor services. On a slightly related note, the site had its highest daily traffic ever this weekend at around 950k Wiki page views (or around 8 million files) each day on Saturday and Sunday, beating the previous record by a decent 10%. Despite the high traffic volume the new servers didn't appear to notice it, which means we should be good for the upcoming traffic spike for Skyrim. -- Daveh 01:13, 1 March 2011 (UTC)
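The traffic figures in the summary can be put side by side with the earlier Skyrim estimate. This is back-of-the-envelope arithmetic only: all inputs are the rounded numbers quoted in this thread, and the variable names are made up for illustration.

```python
record_views = 950_000            # page views/day on the record weekend
record_files = 8_000_000          # files served/day on the record weekend
record_gain = 0.10                # record beat the previous one by about 10%
skyrim_peak_estimate = 2_000_000  # earlier guess for the Skyrim release peak

previous_record = record_views / (1 + record_gain)
files_per_view = record_files / record_views
peak_vs_record = skyrim_peak_estimate / record_views

print(f"previous record: about {previous_record:,.0f} page views/day")
print(f"files per page view: about {files_per_view:.1f}")
print(f"estimated Skyrim peak: about {peak_vs_record:.1f}x the record day")
```

With these rounded inputs the previous record comes out near 864k page views/day, each page view pulls roughly 8.4 files, and the estimated Skyrim peak is a little over twice the record weekend.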