# HG changeset patch # User Bryan O'Sullivan # Date 1240558025 25200 # Node ID 1a0a78e197c3ef91f2b8dbada59421d2ef729bd5 # Parent ef53d025f4100d5ce4d159aad97b651b43a9d7b4 Incorporate feedback from Greg Lindahl. diff -r ef53d025f410 -r 1a0a78e197c3 en/ch04-daily.xml --- a/en/ch04-daily.xml Thu Apr 23 22:24:02 2009 -0700 +++ b/en/ch04-daily.xml Fri Apr 24 00:27:05 2009 -0700 @@ -1,6 +1,6 @@ - + Mercurial in daily use @@ -673,6 +673,162 @@ track of our progress with each file as we go. + + + More useful diffs + + The default output of the hg + diff command is backwards compatible with the + regular diff command, but this has some + drawbacks. + + Consider the case where we use hg + rename to rename a file. + + &interaction.ch04-diff.rename.basic; + + The output of hg diff above + obscures the fact that we simply renamed a file. The hg diff command accepts an option, + or , to use a newer + diff format that displays such information in a more readable + form. + + &interaction.ch04-diff.rename.git; + + This option also helps with a case that can otherwise be + confusing: a file that appears to be modified according to + hg status, but for which + hg diff prints nothing. This + situation can arise if we change the file's execute + permissions. + + &interaction.ch04-diff.chmod; + + The normal diff command pays no attention + to file permissions, which is why hg + diff prints nothing by default. If we supply it + with the option, it tells us what really + happened. + + &interaction.ch04-diff.chmod.git; + + + + Which files to manage, and which to avoid + + Revision control systems are generally best at managing text + files that are written by humans, such as source code, where the + files do not change much from one revision to the next. Some + centralized revision control systems can also deal tolerably + well with binary files, such as bitmap images. + + For instance, a game development team will typically manage + both its source code and all of its binary assets (e.g. 
geometry + data, textures, map layouts) in a revision control + system. + + Because it is usually impossible to merge two conflicting + modifications to a binary file, centralized systems often + provide a file locking mechanism that allows a user to say + "I am the only person who can edit this + file." + + Compared to a centralized system, a distributed revision + control system changes some of the factors that guide decisions + over which files to manage and how. + + For instance, a distributed revision control system cannot, + by its nature, offer a file locking facility. There is thus no + built-in mechanism to prevent two people from making conflicting + changes to a binary file. If you have a team where several + people may be editing binary files frequently, it may not be a + good idea to use Mercurial&emdash;or any other distributed + revision control system&emdash;to manage those files. + + When storing modifications to a file, Mercurial usually + saves only the differences between the previous and current + versions of the file. For most text files, this is extremely + efficient. However, some files (particularly binary files) are + laid out in such a way that even a small change to a file's + logical content results in many or most of the bytes inside the + file changing. For instance, compressed files are particularly + susceptible to this. If the differences between successive + versions of a file are always large, Mercurial will not be able + to store the file's revision history very efficiently. This can + affect both local storage needs and the amount of time it takes + to clone a repository. + + To get an idea of how this could affect you in practice, + suppose you want to use Mercurial to manage an OpenOffice + document. OpenOffice stores documents on disk as compressed zip + files. Edit even a single letter of your document in OpenOffice, + and almost every byte in the entire file will change when you + save it. 
Now suppose that file is 2MB in size. Because most of + the file changes every time you save, Mercurial will have to + store all 2MB of the file every time you commit, even though + from your perspective, perhaps only a few words are changing + each time. A single frequently-edited file that is not friendly + to Mercurial's storage assumptions can easily have an outsized + effect on the size of the repository. + + Even worse, if both you and someone else edit the OpenOffice + document you're working on, there is no useful way to merge your + work. In fact, there isn't even a good way to tell what the + differences are between your respective changes. + + There are thus a few clear recommendations about specific + kinds of files to be very careful with. + + + + Files that are very large and incompressible, e.g. ISO + CD-ROM images, will by virtue of sheer size make clones over + a network very slow. + + + Files that change a lot from one revision to the next + may be expensive to store if you edit them frequently, and + conflicts due to concurrent edits may be difficult to + resolve. + + + + + + Backups and mirroring + + Since Mercurial maintains a complete copy of history in each + clone, everyone who uses Mercurial to collaborate on a project + can potentially act as a source of backups in the event of a + catastrophe. If a central repository becomes unavailable, you + can construct a replacement simply by cloning a copy of the + repository from one contributor, and pulling any changes they + may not have seen from others. + + It is simple to use Mercurial to perform off-site backups + and remote mirrors. Set up a periodic job (e.g. via the + cron command) on a remote server to pull + changes from your master repositories every hour. This will + only be tricky in the unlikely case that the number of master + repositories you maintain changes frequently, in which case + you'll need to do a little scripting to refresh the list of + repositories to back up. 
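The periodic pull job described above could be sketched roughly as follows. This is a minimal illustration, not part of the patch: the function names, the mirror paths, and the `mirrors` mapping are all hypothetical, and the sketch assumes that `hg` is on the `PATH` and that each mirror was already created with `hg clone`. It could be invoked hourly from cron.

```python
import subprocess


def build_pull_command(mirror_path, source):
    """Return the argv that pulls `source` into the mirror at `mirror_path`.

    Uses hg's global --repository option so the job does not need to chdir.
    """
    return ["hg", "pull", "--repository", mirror_path, source]


def refresh_mirrors(mirrors, run=subprocess.call):
    """Pull every mirror once and return the mirror paths whose pull failed.

    `mirrors` maps a local mirror path to the master repository it tracks
    (a hypothetical layout chosen for this sketch). `run` is injectable so
    the logic can be exercised without Mercurial installed. Any nonzero exit
    status is treated as a failure, which is an assumption worth checking
    against your version of hg.
    """
    failed = []
    for mirror_path, source in sorted(mirrors.items()):
        if run(build_pull_command(mirror_path, source)) != 0:
            failed.append(mirror_path)
    return failed


if __name__ == "__main__":
    # Hypothetical mirror list; regenerate it by script if your set of
    # master repositories changes often.
    example = {"/backup/myrepo": "ssh://hg.example.com/myrepo"}
    for path in refresh_mirrors(example):
        print("pull failed for", path)
```

Keeping the list of repositories in one mapping means that the "little scripting" needed when masters come and go is confined to whatever generates that mapping.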
+ + If you perform traditional backups of your master + repositories to tape or disk, and you want to back up a + repository named myrepo, use hg + clone -U myrepo myrepo.bak to create a + clone of myrepo before you start your + backups. The option doesn't check out a + working directory after the clone completes, since that would be + superfluous and make the backup take longer. + + If you then back up myrepo.bak instead + of myrepo, you will be guaranteed to have a + consistent snapshot of your repository that won't be pushed to + by an insomniac developer in mid-backup. +
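The snapshot-then-backup sequence above might be scripted along these lines. This is only a sketch under stated assumptions: the helper names and the `backup` callback are hypothetical stand-ins for whatever tape or disk backup tool you already use, and the snapshot directory is assumed not to exist yet when the clone starts.

```python
import subprocess


def build_snapshot_command(repo, snapshot):
    """Return the argv for `hg clone -U repo snapshot`.

    -U is the option used in the text: the clone gets the full history
    but no checked-out working directory.
    """
    return ["hg", "clone", "-U", repo, snapshot]


def snapshot_then_backup(repo, snapshot, backup, run=subprocess.check_call):
    """Clone `repo` to `snapshot`, then hand the snapshot to `backup`.

    `backup` is a hypothetical callable that archives a directory.
    `run` defaults to subprocess.check_call, so a failed clone raises
    before any backup is attempted.
    """
    run(build_snapshot_command(repo, snapshot))
    backup(snapshot)
    return snapshot


if __name__ == "__main__":
    # Stand-in backup step that only reports what it would archive.
    snapshot_then_backup(
        "myrepo", "myrepo.bak", backup=lambda path: print("archiving", path)
    )
```

The backup step only ever sees the snapshot, never the live repository, which is what gives the consistency guarantee described above.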