How to use CVS with a website

Caveat: I am not a professional web developer. Do not take my word as gospel. This is just another "cvs trick" that I've worked out, and it may well count as abuse of CVS. YMMV and all that. If you have suggestions, comments, and/or refinements, you can find me in #cvs on irc.freenode.net.

There are a lot of people out there who are building great websites, but who aren't using any sort of version-control software. They make occasional backups, and they end up with a half-dozen copies of the website on various development machines, all of 'em slightly different. When they get several people involved, people end up changing the same files and stomping on each other's changes.

This is all exactly the sort of thing that CVS is supposed to be good for. And since HTML is just text, it lends itself to version control using CVS.

Part of the process of setting up CVS to help you manage a website is to decide on, oddly enough, the process. Do you want to update the website as soon as you commit? Do you have a 'testing area' that you want to update right away, and a 'production site' that you will update once you're happy with the testing area?

Prerequisites

You need a CVS repository set up and accessible.

I like to use cvs-over-ssh as my means of accessing my repository remotely. This, of course, means that one of the first things I do on a box (after shutting down nearly all of the services) is to install ssh and get sshd running. If you're doing your development on the same machine as your repository and your website, you have a much simpler situation, but you should be able to figure that out yourself -- and if not, well, the solutions presented here will work just the same.

You should have ssh running among all of the machines that you'll be using, including the machine holding the CVS repository, and the machine(s) hosting the website.

You should also be using key-based authentication (authorized_keys) between the machines. At a minimum this means from the machine holding the CVS repository to the machine(s) hosting the website, and back again, because the cvs update that runs on the website machine has to log in to the repository. With keys in place we don't have to worry about passwords, which do sort of mess up the whole no-user-interaction automation thing we're trying to set up.
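Both hops have to work without a prompt: R logs into W to trigger the update, and the cvs update on W logs back into R. A minimal check, assuming OpenSSH, with keys installed via ssh-keygen plus ssh-copy-id where available, or by appending the public key to ~/.ssh/authorized_keys by hand. BatchMode=yes makes ssh fail instead of asking for a password. The function name check_batch_login is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "$1" (user@host) accepts a
# non-interactive, key-based login. BatchMode=yes makes OpenSSH fail
# instead of prompting for a password or passphrase.
check_batch_login() {
    if ssh -n -o BatchMode=yes "$1" true; then
        echo "ok: $1"
    else
        echo "FAILED: $1 (check authorized_keys)" >&2
        return 1
    fi
}

# On R:  check_batch_login user@W
# On W:  check_batch_login user@R
```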

It wouldn't hurt to make sure that your web-server doesn't serve the CVS directories (every checked-out directory has one, and its Root file records the full CVSROOT, including your account name and the repository's host and path). It also wouldn't hurt to make sure that either each directory of your website in CVS has an index.html file, or your web-server doesn't list the contents of directories without one. Some people think that letting this sort of infrastructure show through indicates an unprofessional-appearing website. It's probably best not to antagonize such folks.
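The index.html half of that check is easy to script. A small sketch, assuming a POSIX find and that every page directory should hold a file named exactly index.html (other index names such as index.htm would need adding). It lists every directory with no index.html and skips the CVS administrative directories. missing_index is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print each directory under "$1" (default: the
# current working copy) that lacks an index.html. CVS directories are
# pruned, so neither they nor anything inside them is reported.
missing_index() {
    find "${1:-.}" -type d -name CVS -prune -o -type d -print |
    while IFS= read -r dir; do
        [ -f "$dir/index.html" ] || printf '%s\n' "$dir"
    done
}

# Usage, from the top of a working copy:  missing_index
```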

Process One: Immediate Update

This is for the situation where you don't want much of a delay between a cvs commit and the web-page updating with the new changes: for example, a personal or family page, a hobbyist or informational site, or a non-retail site without restricted access. If you're trying to conduct business or need to restrict access to your website, you probably want to add an explicit QA step.

The quick test: if just anyone could download your entire site and associated data, would this be a problem for you? If not, you can probably get by with the immediate update; if so, then you definitely need a multi-step deployment process. If you don't care now but you will care later, then go with the multi-step approach.

Nomenclature

Let's start by defining three (logical) machines: the web-site machine (W), the cvs repository machine (R), and the local development machine (L).

Our CVSROOT environment variable should be set to :ext:user@R:/path/to/repository, where user is our account name on R and /path/to/repository is just that (the path to your repository), and CVS_RSH should be set to ssh. Both CVSROOT and CVS_RSH should be set on L and W.
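For a Bourne-style shell, the settings might look like this. The account, host, and path are the placeholders from the text, and ssh is assumed as the transport. They would normally go in a start-up file such as ~/.profile.

```shell
#!/bin/sh
# Placeholders: "user", "R" and the repository path from the text above.
CVSROOT=:ext:user@R:/path/to/repository
# Tell CVS to reach the :ext: repository through ssh.
CVS_RSH=ssh
export CVSROOT CVS_RSH
```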

The desired flow is as follows:

You, the developer, working on L, check out a working copy of the repository (from R) with cvs checkout webmodule. You edit some of the files, and decide that you've made the changes you want and it's time to deploy. So you upload the files (to R) with cvs commit -m "Updated pages to implement change request #Q73." and then bring up your browser to check on the new website on W.

Upon commit to R, we automatically update the page on W so that when you point your browser at the new pages, they're there. Perhaps there will be a slight delay, but there shouldn't be anything significant.

So, how do we do this?

Implementation

We take advantage of the "administrative files" of CVS, specifically, the loginfo file. To make things simple, we're going to write a pretty simplistic shell-script, because that allows us to easily test the system by hand, and it offers a place to further extend your system, should you find the need.

Here's the shell-script:

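#!/bin/sh
#
# On very fast machines/networks the update can start before this commit
# has released its locks; if so, uncomment the next line.
#sleep 5

# -n: don't read the log message CVS pipes in; -d/-P: add new directories,
# prune empty ones. CVS_RSH is set inline because a non-interactive ssh
# session on W may not read the profile that normally sets it.
ssh -n user@W 'cd /var/www/html/yoursite && CVS_RSH=ssh cvs -q update -d -P' > /dev/null 2>&1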

Simple, eh?

On R, you created your repository at /path/to/repository; that directory should also contain a CVSROOT directory. We will want our script close by, so make a directory /path/to/repository/bin, put the shell-script there, and make it executable. I named my script "stage.sh", but it doesn't really matter what you call it.
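Installing the script can be scripted too. A sketch, assuming stage.sh sits in the current directory (or is passed as the first argument) and REPO points at the repository. install_stage_script is a hypothetical name. The chmod matters because loginfo runs the script as whichever user is committing, so every committer needs read and execute permission.

```shell
#!/bin/sh
# Hypothetical helper, run on R. REPO defaults to the placeholder path.
REPO=${REPO:-/path/to/repository}

# Copy the script into $REPO/bin and make it readable and executable
# by everyone, since each committer's cvs server process runs it.
install_stage_script() {
    mkdir -p "$REPO/bin" &&
    cp "${1:-stage.sh}" "$REPO/bin/stage.sh" &&
    chmod 755 "$REPO/bin/stage.sh"
}

# Usage:  install_stage_script ./stage.sh
```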

At this point, you should test your script. On L, make a trivial change to your working copy and commit it. Bring up a browser and verify that the change has not yet appeared on the website, then run your script on R. Wait a bit, and then refresh the browser. If the script ran without complaining or asking for a password, and the change made it to the website, then all is well, and you can proceed. If not, there's a problem that needs to be dealt with before you try to continue.
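Because stage.sh throws its output away, a failed run is silent. For the manual test it can help to run a variant of the same remote update with its output visible. A sketch, assuming the placeholder host and path from the script (override WEB_HOST and SITE_DIR to match yours). stage_verbose is hypothetical. With BatchMode=yes, a missing key shows up as an error rather than a password prompt.

```shell
#!/bin/sh
# Hypothetical debugging helper, run on R. Placeholders; match your script.
WEB_HOST=${WEB_HOST:-user@W}
SITE_DIR=${SITE_DIR:-/var/www/html/yoursite}

# Run the remote update with output shown, then report ssh's exit status
# (which is the remote command's status when the login succeeds).
stage_verbose() {
    ssh -n -o BatchMode=yes "$WEB_HOST" \
        "cd '$SITE_DIR' && CVS_RSH=ssh cvs update -d -P"
    status=$?
    echo "stage exit status: $status"
    return "$status"
}
```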

Check out the administrative module "CVSROOT" with "cvs checkout CVSROOT" (sans quotes). In the CVSROOT directory, edit the file "loginfo", and add to the bottom of the file the line (adjusting it to match where you put your script):

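DEFAULT /path/to/repository/bin/stage.sh > /dev/null 2>&1 &

(The & is significant: it lets the update run in the background, so that the current commit can complete and release the locks on the repository. The redirection keeps the background job from holding the commit's connection open until the update finishes.) Then commit the file with "cvs commit loginfo"; CVS only uses the loginfo stored in the repository, so the hook takes effect once it is committed.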
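If you set this up on several repositories, appending the hook can be made idempotent. A sketch, run inside the checked-out CVSROOT directory. add_loginfo_hook is hypothetical, and the hook line discards the script's output so the background job does not hold the commit's connection open.

```shell
#!/bin/sh
# Hypothetical helper, run on L inside the checked-out CVSROOT directory:
# append the hook to loginfo unless an identical line is already present.
HOOK='DEFAULT /path/to/repository/bin/stage.sh > /dev/null 2>&1 &'

add_loginfo_hook() {
    file=${1:-loginfo}
    # -x: whole-line match, -F: treat the hook as a fixed string.
    grep -qxF "$HOOK" "$file" 2>/dev/null || printf '%s\n' "$HOOK" >> "$file"
}

# Usage:  add_loginfo_hook && cvs commit -m "Update website on commit" loginfo
```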

Then test again... Bring up a browser, point it at a page in your website, make a trivial change (on L), commit, then refresh your browser. You should see the change!

Process Two: Staging and Production

This is for the situation where you want to test your changes before updating your "live" or "production" website: any website where bugs and broken pages would do serious harm, such as a commercial website or one providing access to sensitive data.

The quick test applies here in the opposite sense: if just anyone could download your entire site and associated data, and that /would/ be a problem for you, then you want the multi-step deployment process, which lets you add a QA step to catch any problems with your website before it goes live.

Again, I must emphasize that I do not have sufficient experience in setting up or running a commercial website to be considered an expert in this domain. This is all use-at-your-own-risk stuff here, and while I am confident that, used properly, it will work, this is just a guide based on how I think you could run such a site. Naturally, feedback is welcome.

Nomenclature

Let's start by adding a machine to our collection, and defining four (logical) machines: the production web-site machine (W), the staging web-site machine (S), the cvs repository machine (R), and the local development machine (L).

As before, our CVSROOT environment variable should be set to :ext:user@R:/path/to/repository, where user is our account name on R and /path/to/repository is just that (the path to your repository), and CVS_RSH should be set to ssh. CVSROOT and CVS_RSH should be set on L, S, and W.

The desired flow is as follows:

You, the developer, working on L, check out a working copy of the repository (from R) with cvs checkout webmodule. You edit some of the files, and decide that you've made the changes you want and it's time to deploy. So you upload the files (to R) with cvs commit -m "Updated pages to implement change request #Q73." and then bring up your browser to check on the updated page on the staging website S.

Upon commit to R, we automatically update the page on S so that when you point your browser at the new pages, they're there. Perhaps there will be a slight delay, but there shouldn't be anything significant.

When the updated page passes the QA tests, you take some sort of action to update the production website on W.
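The promotion step is left open here. One possible approach, offered as an assumption rather than the method: tag exactly the revisions that passed QA on S with a movable tag, then update W to that tag. Tagging from S's working copy, rather than the head of the repository, means commits made after QA started are not promoted. STAGE_HOST, PROD_HOST, SITE_DIR, the tag name PRODUCTION, and promote are all hypothetical.

```shell
#!/bin/sh
# One possible promotion step (an assumption, not the only way).
# Hosts, path and tag name are placeholders; override via the environment.
STAGE_HOST=${STAGE_HOST:-user@S}
PROD_HOST=${PROD_HOST:-user@W}
SITE_DIR=${SITE_DIR:-/var/www/html/yoursite}
TAG=${TAG:-PRODUCTION}

promote() {
    # 1. On S, tag the revisions checked out there (the ones QA tested);
    #    -F moves the tag if it already exists.
    ssh -n "$STAGE_HOST" \
        "cd '$SITE_DIR' && CVS_RSH=ssh cvs -q tag -F $TAG" || return 1
    # 2. On W, update to the tagged revisions; -r makes the tag sticky
    #    in W's working copy.
    ssh -n "$PROD_HOST" \
        "cd '$SITE_DIR' && CVS_RSH=ssh cvs -q update -d -P -r $TAG"
}
```

Run it by hand once QA signs off. One limitation: a file deleted from the repository after an earlier promotion still carries the old tag, so W keeps serving it until the tag is removed from that file.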

Leftover bits

Getting Started

[Importing your web-pages into the repository]
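A possible first import, run on L from the top of the existing site. cvs import requires a vendor tag and a release tag; yoursite and start are arbitrary placeholders, and import_site is hypothetical. The -W options mark common image types as binary (-k 'b') so CVS does not apply keyword expansion or line-ending conversion to them; extend the list to match your site.

```shell
#!/bin/sh
# Hypothetical helper: import the site in "$1" (default: current dir) as
# the module "webmodule". Vendor/release tags are placeholders.
import_site() {
    (cd "${1:-.}" &&
     cvs import -m "Initial import of website" \
         -W "*.gif -k 'b'" -W "*.jpg -k 'b'" -W "*.png -k 'b'" \
         webmodule yoursite start)
}

# Afterwards, move the original directory aside and work from a
# fresh checkout:  cvs checkout webmodule
```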

$Id: cvs_web.html,v 1.6 2005/08/04 06:23:06 stremler Exp $