diff en/hgext.tex @ 224:34943a3d50d6
Start writing up extensions. Begin with inotify.
author: Bryan O'Sullivan <bos@serpentine.com>
date: Tue, 15 May 2007 16:24:20 -0700
parents: 4c9b9416cd23
children: eef2171243e8
--- a/en/hgext.tex Tue May 15 14:55:54 2007 -0700
+++ b/en/hgext.tex Tue May 15 16:24:20 2007 -0700
@@ -1,6 +1,208 @@
 \chapter{Adding functionality with extensions}
 \label{chap:hgext}
+While the core of Mercurial is quite complete from a functionality
+standpoint, it's deliberately shorn of fancy features. This approach
+of preserving simplicity keeps the software easy to deal with for both
+maintainers and users.
+
+However, Mercurial doesn't box you in with an inflexible command set:
+you can add features to it as \emph{extensions} (sometimes known as
+\emph{plugins}). We've already discussed a few of these extensions in
+earlier chapters.
+\begin{itemize}
+\item Section~\ref{sec:tour-merge:fetch} covers the \hgext{fetch}
+ extension; this combines pulling new changes and merging them with
+ local changes into a single command, \hgcmd{fetch}.
+\item The \hgext{bisect} extension adds an efficient pruning search
+ for changes that introduced bugs, and we documented it in
+ section~\ref{sec:undo:bisect}.
+\item In chapter~\ref{chap:hook}, we covered several extensions that
+ are useful for hook-related functionality: \hgext{acl} adds access
+ control lists; \hgext{bugzilla} adds integration with the Bugzilla
+ bug tracking system; and \hgext{notify} sends notification emails on
+ new changes.
+\item The Mercurial Queues patch management extension is so invaluable
+ that it merits two chapters and an appendix all to itself.
+ Chapter~\ref{chap:mq} covers the basics;
+ chapter~\ref{chap:mq-collab} discusses advanced topics; and
+ appendix~\ref{chap:mqref} goes into detail on each command.
+\end{itemize}
+
+In this chapter, we'll cover some of the other extensions that are
+available for Mercurial, and briefly touch on some of the machinery
+you'll need to know about if you want to write an extension of your
+own.
+\begin{itemize} +\item In section~\ref{sec:hgext:inotify}, we'll discuss the + possibility of \emph{huge} performance improvements using the + \hgext{inotify} extension. +\end{itemize} + +\section{Improve performance with the \hgext{inotify} extension} +\label{sec:hgext:inotify} + +Are you interested in having some of the most common Mercurial +operations run as much as a hundred times faster? Read on! + +Mercurial has great performance under normal circumstances. For +example, when you run the \hgcmd{status} command, Mercurial has to +scan almost every directory and file in your repository so that it can +display file status. Many other Mercurial commands need to do the +same work behind the scenes; for example, the \hgcmd{diff} command +uses the status machinery to avoid doing an expensive comparison +operation on files that obviously haven't changed. + +Because obtaining file status is crucial to good performance, the +authors of Mercurial have optimised this code to within an inch of its +life. However, there's no avoiding the fact that when you run +\hgcmd{status}, Mercurial is going to have to perform at least one +expensive system call for each managed file to determine whether it's +changed since the last time Mercurial checked. For a sufficiently +large repository, this can take a long time. + +To put a number on the magnitude of this effect, I created a +repository containing 150,000 managed files. I timed \hgcmd{status} +as taking ten seconds to run, even when \emph{none} of those files had +been modified. + +Many modern operating systems contain a file notification facility. +If a program signs up to an appropriate service, the operating system +will notify it every time a file of interest is created, modified, or +deleted. On Linux systems, the kernel component that does this is +called \texttt{inotify}. + +Mercurial's \hgext{inotify} extension talks to the kernel's +\texttt{inotify} component to optimise \hgcmd{status} commands. 
The +extension has two components. A daemon sits in the background and +receives notifications from the \texttt{inotify} subsystem. It also +listens for connections from a regular Mercurial command. The +extension modifies Mercurial's behaviour so that instead of scanning +the filesystem, it queries the daemon. Since the daemon has perfect +information about the state of the repository, it can respond with a +result instantaneously, avoiding the need to scan every directory and +file in the repository. + +Recall the ten seconds that I measured plain Mercurial as taking to +run \hgcmd{status} on a 150,000 file repository. With the +\hgext{inotify} extension enabled, the time dropped to 0.1~seconds, a +factor of \emph{one hundred} faster. + +Before we continue, please pay attention to some caveats. +\begin{itemize} +\item The \hgext{inotify} extension is Linux-specific. Because it + interfaces directly to the Linux kernel's \texttt{inotify} + subsystem, it does not work on other operating systems. +\item It should work on any Linux distribution that was released after + early~2005. Older distributions are likely to have a kernel that + lacks \texttt{inotify}, or a version of \texttt{glibc} that does not + have the necessary interfacing support. +\item Not all filesystems are suitable for use with the + \hgext{inotify} extension. Network filesystems such as NFS are a + non-starter, for example, particularly if you're running Mercurial + on several systems, all mounting the same network filesystem. The + kernel's \texttt{inotify} system has no way of knowing about changes + made on another system. Most local filesystems (e.g.~ext3, XFS, + ReiserFS) should work fine. +\end{itemize} + +The \hgext{inotify} extension is not yet shipped with Mercurial as of +May~2007, so it's a little more involved to set up than other +extensions. But the performance improvement is worth it! 
+
+The extension currently comes in two parts: a set of patches to the
+Mercurial source code, and a library of Python bindings to the
+\texttt{inotify} subsystem.
+\begin{note}
+ There are \emph{two} Python \texttt{inotify} binding libraries. One
+ of them is called \texttt{pyinotify}, and is packaged by some Linux
+ distributions as \texttt{python-inotify}. This is \emph{not} the
+ one you'll need, as it is too buggy and inefficient to be practical.
+\end{note}
+To get going, it's best to already have a functioning copy of
+Mercurial installed.
+\begin{note}
+ If you follow the instructions below, you'll be \emph{replacing} and
+ overwriting any existing installation of Mercurial that you might
+ already have, using the latest ``bleeding edge'' Mercurial code.
+ Don't say you weren't warned!
+\end{note}
+\begin{enumerate}
+\item Clone the Python \texttt{inotify} binding repository. Build and
+ install it.
+ \begin{codesample4}
+ hg clone http://hg.kublai.com/python/inotify
+ cd inotify
+ python setup.py build --force
+ sudo python setup.py install --skip-build
+ cd ..
+ \end{codesample4}
+\item Clone the \dirname{crew} Mercurial repository. Clone the
+ \hgext{inotify} patch repository so that Mercurial Queues will be
+ able to apply patches to your copy of the \dirname{crew} repository.
+ \begin{codesample4}
+ hg clone http://hg.intevation.org/mercurial/crew
+ hg clone crew inotify
+ hg clone http://hg.kublai.com/mercurial/patches/inotify inotify/.hg/patches
+ \end{codesample4}
+\item Make sure that you have the Mercurial Queues extension,
+ \hgext{mq}, enabled. If you've never used MQ, read
+ section~\ref{sec:mq:start} to get started quickly.
+\item Go into the \dirname{inotify} repo, and apply all of the
+ \hgext{inotify} patches using the \hgopt{qpush}{-a} option to the
+ \hgcmd{qpush} command.
+ \begin{codesample4}
+ cd inotify
+ hg qpush -a
+ \end{codesample4}
+ If you get an error message from \hgcmd{qpush}, you should not
+ continue. Instead, ask for help.
+\item Build and install the patched version of Mercurial.
+ \begin{codesample4}
+ python setup.py build --force
+ sudo python setup.py install --skip-build
+ \end{codesample4}
+\end{enumerate}
+Once you've built a suitably patched version of Mercurial, all you
+need to do to enable the \hgext{inotify} extension is add an entry to
+your \hgrc.
+\begin{codesample2}
+ [extensions]
+ inotify =
+\end{codesample2}
+When the \hgext{inotify} extension is enabled, Mercurial will
+automatically and transparently start the status daemon the first time
+you run a command that needs status in a repository. It runs one
+status daemon per repository.
+
+The status daemon is started silently, and runs in the background. If
+you look at a list of running processes after you've enabled the
+\hgext{inotify} extension and run a few commands in different
+repositories, you'll thus see a few \texttt{hg} processes sitting
+around, waiting for updates from the kernel and queries from
+Mercurial.
+
+The first time you run a Mercurial command in a repository when you
+have the \hgext{inotify} extension enabled, it will run with about the
+same performance as a normal Mercurial command. This is because the
+status daemon needs to perform a normal status scan so that it has a
+baseline against which to apply later updates from the kernel.
+However, \emph{every} subsequent command that does any kind of status
+check should be noticeably faster on repositories of even fairly
+modest size. Better yet, the bigger your repository is, the greater a
+performance advantage you'll see. The \hgext{inotify} daemon makes
+status operations almost instantaneous on repositories of all sizes!
+
+If you like, you can manually start a status daemon using the
+\hgcmd{inserve} command. This gives you slightly finer control over
+how the daemon ought to run. This command will of course only be
+available when the \hgext{inotify} extension is enabled.
+
+When you're using the \hgext{inotify} extension, you should notice
+\emph{no difference at all} in Mercurial's behaviour, with the sole
+exception of status-related commands running a whole lot faster than
+they used to. You should specifically expect that commands will not
+print different output; neither should they give different results.
+If either of these situations occurs, please report a bug.
 %%% Local Variables: 
 %%% mode: latex
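The chapter text above explains why inotify helps. A plain `hg status` must make at least one system call per managed file on every run. The daemon, by contrast, does one baseline scan and then answers queries from change information the kernel pushes to it.

The Python sketch below models that difference under stated assumptions. It is not Mercurial's dirstate code or the inotify extension's implementation. The names `take_snapshot`, `scan_status`, `NotifiedStatus`, `on_event`, and `demo` are hypothetical. Comparing only size and modification time is a simplification. `on_event` is called by hand in place of real kernel notifications.

```python
import os
import tempfile
from typing import Dict, Iterable, Set, Tuple

# Hypothetical snapshot format: path -> (size, mtime in ns) recorded at the last check.
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(paths: Iterable[str]) -> Snapshot:
    """Record size and modification time for every managed file."""
    snap: Snapshot = {}
    for p in paths:
        st = os.lstat(p)
        snap[p] = (st.st_size, st.st_mtime_ns)
    return snap


def scan_status(snapshot: Snapshot) -> Set[str]:
    """Full scan: one lstat() system call per managed file, every time it runs."""
    changed: Set[str] = set()
    for p, recorded in snapshot.items():
        try:
            st = os.lstat(p)
        except FileNotFoundError:
            changed.add(p)  # deleted since the snapshot was taken
            continue
        if (st.st_size, st.st_mtime_ns) != recorded:
            changed.add(p)
    return changed


class NotifiedStatus:
    """Daemon-style cache: one baseline scan, then change events keep it current."""

    def __init__(self, snapshot: Snapshot) -> None:
        # Baseline scan: this first query costs about as much as a plain scan.
        self.dirty = scan_status(snapshot)

    def on_event(self, path: str) -> None:
        # Hypothetical hook; a real daemon would call this for each inotify event.
        self.dirty.add(path)

    def status(self) -> Set[str]:
        # Answering a query touches no files at all.
        return set(self.dirty)


def demo() -> Tuple[Set[str], Set[str]]:
    """Modify one of two files and compare what both approaches report."""
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for name in ("a.txt", "b.txt"):
            p = os.path.join(d, name)
            with open(p, "w") as f:
                f.write("x")
            paths.append(p)
        snap = take_snapshot(paths)
        cache = NotifiedStatus(snap)  # nothing has changed yet
        with open(paths[0], "a") as f:
            f.write("more")  # a.txt grows from 1 to 5 bytes
        cache.on_event(paths[0])  # stand-in for the kernel's notification
        scanned = {os.path.basename(p) for p in scan_status(snap)}
        cached = {os.path.basename(p) for p in cache.status()}
        return scanned, cached


if __name__ == "__main__":
    print(demo())  # prints ({'a.txt'}, {'a.txt'})
```

Both approaches agree on the result. The scan pays a system call per file on every query. The cache pays that cost once, at the baseline, and afterwards does work proportional only to the number of change events. This is the trade-off the chapter describes for large repositories.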