diff en/filenames.tex @ 133:1e013fbe35f7
Lots of filename related content. A little more command reference
work.
Added a script to make sure commands are exhaustively documented.
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Fri, 29 Dec 2006 17:54:14 -0800 |
parents | |
children | 7f07aca44938 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/en/filenames.tex Fri Dec 29 17:54:14 2006 -0800
@@ -0,0 +1,306 @@
+\chapter{File names and pattern matching}
+\label{chap:names}
+
+Mercurial provides mechanisms that let you work with file names in a
+consistent and expressive way.
+
+\section{Simple file naming}
+
+Mercurial uses a unified piece of machinery ``under the hood'' to
+handle file names. Every command behaves uniformly with respect to
+file names. The way in which commands work with file names is as
+follows.
+
+If you explicitly name real files on the command line, Mercurial works
+with exactly those files, as you would expect.
+\interaction{filenames.files}
+
+When you provide a directory name, Mercurial will interpret this as
+``operate on every file in this directory and its subdirectories''.
+Mercurial traverses the files and subdirectories in a directory in
+alphabetical order. When it encounters a subdirectory, it will
+traverse that subdirectory before continuing with the current
+directory.
+\interaction{filenames.dirs}
+
+\section{Running commands without any file names}
+
+Mercurial's commands that work with file names have useful default
+behaviours when you invoke them without providing any file names or
+patterns. What kind of behaviour you should expect depends on what
+the command does. Here are a few rules of thumb you can use to
+predict what a command is likely to do if you don't give it any names
+to work with.
+\begin{itemize}
+\item Most commands will operate on the entire working directory.
+  This is what the \hgcmd{add} command does, for example.
+\item If the command has effects that are difficult or impossible to
+  reverse, it will force you to explicitly provide at least one name
+  or pattern (see below). This protects you from accidentally
+  deleting files by running \hgcmd{remove} with no arguments, for
+  example.
+\end{itemize}
+
+It's easy to work around these default behaviours if they don't suit
+you.
+If a command normally operates on the whole working directory,
+you can invoke it on just the current directory and its subdirectories
+by giving it the name ``\dirname{.}''.
+\interaction{filenames.wdir-subdir}
+
+Along the same lines, some commands normally print file names relative
+to the root of the repository, even if you're invoking them from a
+subdirectory. Such a command will print file names relative to your
+subdirectory if you give it explicit names. Here, we're going to run
+\hgcmd{status} from a subdirectory, and get it to operate on the
+entire working directory while printing file names relative to our
+subdirectory, by passing it the output of the \hgcmd{root} command.
+\interaction{filenames.wdir-relname}
+
+\section{Telling you what's going on}
+
+The \hgcmd{add} example in the preceding section illustrates something
+else that's helpful about Mercurial commands. If a command operates
+on a file that you didn't name explicitly on the command line, it will
+usually print the name of the file, so that you will not be surprised
+by what's going on.
+
+The principle here is that of \emph{least surprise}. If you've exactly
+named a file on the command line, there's no point in repeating it
+back at you. If Mercurial is acting on a file \emph{implicitly},
+because you provided no names, or a directory, or a pattern (see
+below), it's safest to tell you what it's doing.
+
+For commands that behave this way, you can silence them using the
+\hggopt{-q} option. You can also get them to print the name of every
+file, even those you've named explicitly, using the \hggopt{-v}
+option.
+
+\section{Using patterns to identify files}
+
+In addition to working with file and directory names, Mercurial lets
+you use \emph{patterns} to identify files. Mercurial's pattern
+handling is expressive.
+
+On Unix-like systems (Linux, MacOS, etc.), the job of matching file
+names to patterns normally falls to the shell.
+On these systems, you
+must explicitly tell Mercurial that a name is a pattern. On Windows,
+the shell does not expand patterns, so Mercurial will automatically
+identify names that are patterns, and expand them for you.
+
+To provide a pattern in place of a regular name on the command line,
+the mechanism is simple:
+\begin{codesample2}
+  syntax:patternbody
+\end{codesample2}
+That is, a pattern is identified by a short text string that says what
+kind of pattern this is, followed by a colon, followed by the actual
+pattern.
+
+Mercurial supports two kinds of pattern syntax. The most frequently
+used is called \texttt{glob}; this is the same kind of pattern
+matching used by the Unix shell, and should be familiar to Windows
+command prompt users, too.
+
+When Mercurial does automatic pattern matching on Windows, it uses
+\texttt{glob} syntax. You can thus omit the ``\texttt{glob:}'' prefix
+on Windows, but it's safe to use it, too.
+
+The \texttt{re} syntax is more powerful; it lets you specify patterns
+using regular expressions, also known as regexps.
+
+By the way, in the examples that follow, notice that I'm careful to
+wrap all of my patterns in quote characters, so that they won't get
+expanded by the shell before Mercurial sees them.
+
+\subsection{Shell-style \texttt{glob} patterns}
+
+This is an overview of the kinds of patterns you can use when you're
+matching on glob patterns.
+
+The ``\texttt{*}'' character matches any string, within a single
+directory.
+\interaction{filenames.glob.star}
+
+The ``\texttt{**}'' pattern matches any string, and crosses directory
+boundaries. It's not a standard Unix glob token, but it's accepted by
+several popular Unix shells, and is very useful.
+\interaction{filenames.glob.starstar}
+
+The ``\texttt{?}'' pattern matches any single character.
+\interaction{filenames.glob.question}
+
+The ``\texttt{[}'' character begins a \emph{character class}. This
+matches any single character within the class.
+The class ends with a
+``\texttt{]}'' character. A class may contain multiple \emph{range}s
+of the form ``\texttt{a-f}'', which is shorthand for
+``\texttt{abcdef}''.
+\interaction{filenames.glob.range}
+If the first character after the ``\texttt{[}'' in a character class
+is a ``\texttt{!}'', it \emph{negates} the class, making it match any
+single character not in the class.
+
+A ``\texttt{\{}'' begins a group of subpatterns, where the whole group
+matches if any subpattern in the group matches. The ``\texttt{,}''
+character separates subpatterns, and ``\texttt{\}}'' ends the group.
+\interaction{filenames.glob.group}
+
+\subsubsection{Watch out!}
+
+Don't forget that if you want to match a pattern in any directory, you
+should not be using the ``\texttt{*}'' match-any token, as this will
+only match within one directory. Instead, use the ``\texttt{**}''
+token. This small example illustrates the difference between the two.
+\interaction{filenames.glob.star-starstar}
+
+\subsection{Regular expression matching with \texttt{re} patterns}
+
+Mercurial accepts the same regular expression syntax as the Python
+programming language (it uses Python's regexp engine internally).
+This is based on the Perl language's regexp syntax, which is the most
+popular dialect in use (it's also used in Java, for example).
+
+I won't discuss Mercurial's regexp dialect in any detail here, as
+regexps are not often used. Perl-style regexps are in any case
+already exhaustively documented on a multitude of web sites, and in
+many books. Instead, I will focus here on a few things you should
+know if you find yourself needing to use regexps with Mercurial.
+
+A regexp is matched against an entire file name, relative to the root
+of the repository. In other words, even if you're already in
+subdirectory \dirname{foo}, if you want to match files under this
+directory, your pattern must start with ``\texttt{foo/}''.
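As a rough illustration of this rule, here is a small sketch that uses Python's \texttt{re} module directly. It is not Mercurial's own matching code: the helper \texttt{matching\_files} and the file names are made up for the example. The pattern is applied to each path as it is written relative to the repository root, whatever the current directory happens to be.

```python
import re


def matching_files(pattern, repo_paths):
    """Return the repository-relative paths that the regexp matches.

    Hypothetical helper for illustration only; this is not Mercurial's code.
    re.match is used, so the pattern is tried at the start of each path.
    """
    regex = re.compile(pattern)
    return [path for path in repo_paths if regex.match(path)]


# Made-up paths, all written relative to the repository root.
paths = ["foo/bar.c", "foo/sub/baz.c", "qux/bar.c", "README"]

# Even if the current directory is foo, the pattern has to spell out "foo/".
in_foo = matching_files(r"foo/.*\.c", paths)

# A pattern written as if it were relative to foo finds nothing.
relative_to_foo = matching_files(r"bar\.c", paths)

print(in_foo)
print(relative_to_foo)
```

Under these assumptions, the first pattern selects the two files below \dirname{foo}, and the second selects none, because no repository-relative path begins with \texttt{bar.c}.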
+
+One thing to note, if you're familiar with Perl-style regexps, is that
+Mercurial's are \emph{rooted}. That is, a regexp starts matching
+against the beginning of a string; it doesn't look for a match
+anywhere within the string. To match anywhere in a string, start
+your pattern with ``\texttt{.*}''.
+
+\section{Filtering files}
+
+Not only does Mercurial give you a variety of ways to specify files;
+it lets you further winnow those files using \emph{filters}. Commands
+that work with file names accept two filtering options.
+\begin{itemize}
+\item \hggopt{-I}, or \hggopt{--include}, lets you specify a pattern
+  that file names must match in order to be processed.
+\item \hggopt{-X}, or \hggopt{--exclude}, gives you a way to
+  \emph{avoid} processing files, if they match this pattern.
+\end{itemize}
+You can provide multiple \hggopt{-I} and \hggopt{-X} options on the
+command line, and intermix them as you please. Mercurial interprets
+the patterns you provide using glob syntax by default (but you can use
+regexps if you need to).
+
+You can read a \hggopt{-I} filter as ``process only the files that
+match this filter''.
+\interaction{filenames.filter.include}
+The \hggopt{-X} filter is best read as ``process only the files that
+don't match this pattern''.
+\interaction{filenames.filter.exclude}
+
+\section{Ignoring unwanted files and directories}
+
+XXX.
+
+\section{Case sensitivity}
+\label{sec:names:case}
+
+If you're working in a mixed development environment that contains
+both Linux (or other Unix) systems and Macs or Windows systems, you
+should keep in the back of your mind the knowledge that they treat the
+case (``N'' versus ``n'') of file names in incompatible ways. This is
+not very likely to affect you, and it's easy to deal with if it does,
+but it could surprise you if you don't know about it.
+
+Operating systems and filesystems differ in the way they handle the
+\emph{case} of characters in file and directory names.
+There are
+three common ways to handle case in names.
+\begin{itemize}
+\item Completely case insensitive. Uppercase and lowercase versions
+  of a letter are treated as identical, both when creating a file and
+  during subsequent accesses. This is common on older DOS-based
+  systems.
+\item Case preserving, but insensitive. When a file or directory is
+  created, the case of its name is stored, and can be retrieved and
+  displayed by the operating system. When an existing file is being
+  looked up, its case is ignored. This is the standard arrangement on
+  Windows and MacOS. The names \filename{foo} and \filename{FoO}
+  identify the same file. This treatment of uppercase and lowercase
+  letters as interchangeable is also referred to as \emph{case
+    folding}.
+\item Case sensitive. The case of a name is significant at all times.
+  The names \filename{foo} and \filename{FoO} identify different files. This
+  is the way Linux and Unix systems normally work.
+\end{itemize}
+
+On Unix-like systems, it is possible to have any or all of the above
+ways of handling case in action at once. For example, if you use a
+USB thumb drive formatted with a FAT32 filesystem on a Linux system,
+Linux will handle names on that filesystem in a case preserving, but
+insensitive, way.
+
+\subsection{Safe, portable repository storage}
+
+Mercurial's repository storage mechanism is \emph{case safe}. It
+translates file names so that they can be safely stored on both case
+sensitive and case insensitive filesystems. This means that you can
+use normal file copying tools to transfer a Mercurial repository onto,
+for example, a USB thumb drive, and safely move that drive and
+repository back and forth between a Mac, a PC running Windows, and a
+Linux box.
+
+\subsection{Detecting case conflicts}
+
+When operating in the working directory, Mercurial honours the naming
+policy of the filesystem where the working directory is located.
+If
+the filesystem is case preserving, but insensitive, Mercurial will
+treat names that differ only in case as the same.
+
+An important aspect of this approach is that it is possible to commit
+a changeset on a case sensitive (typically Linux or Unix) filesystem
+that will cause trouble for users on case insensitive (usually Windows
+and MacOS) filesystems. If a Linux user commits changes to two files, one
+named \filename{myfile.c} and the other named \filename{MyFile.C},
+they will be stored correctly in the repository. And in the working
+directories of other Linux users, they will be correctly represented
+as separate files.
+
+If a Windows or Mac user pulls this change, they will not initially
+have a problem, because Mercurial's repository storage mechanism is
+case safe. However, once they try to \hgcmd{update} the working
+directory to that changeset, or \hgcmd{merge} with that changeset,
+Mercurial will spot the conflict between the two file names that the
+filesystem would treat as the same, and forbid the update or merge
+from occurring.
+
+\subsection{Fixing a case conflict}
+
+If you are using Windows or a Mac in a mixed environment where some of
+your collaborators are using Linux or Unix, and Mercurial reports a
+case folding conflict when you try to \hgcmd{update} or \hgcmd{merge},
+the procedure to fix the problem is simple.
+
+Just find a nearby Linux or Unix box, clone the problem repository
+onto it, and use Mercurial's \hgcmd{rename} command to change the
+names of any offending files or directories so that they will no
+longer cause case folding conflicts. Commit this change, \hgcmd{pull}
+or \hgcmd{push} it across to your Windows or MacOS system, and
+\hgcmd{update} to the revision with the non-conflicting names.
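To decide which names need renaming, it can help to list the names that collide once case is ignored. The following is a minimal sketch in Python, not anything that ships with Mercurial: the helper \texttt{case\_collisions} is hypothetical, and it approximates case folding with \texttt{str.lower()}, which may not agree with every filesystem's rules in every corner case.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only in case.

    Hypothetical helper, not part of Mercurial. Names are compared after
    str.lower(), a simple approximation of how a case insensitive
    filesystem folds names.
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups in which two or more names fold to the same key.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Made-up list of tracked file names, relative to the repository root.
tracked = ["myfile.c", "MyFile.C", "README", "src/util.c"]
conflicts = case_collisions(tracked)
print(conflicts)
```

Each group in the result holds names that a case insensitive filesystem would treat as one file; renaming all but one name in each group removes the conflict.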
+
+The changeset with case-conflicting names will remain in your
+project's history, and you still won't be able to \hgcmd{update} your
+working directory to that changeset on a Windows or MacOS system, but
+you can continue development unimpeded.
+
+\begin{note}
+  Prior to version~0.9.3, Mercurial did not use a case safe repository
+  storage mechanism, and did not detect case folding conflicts. If
+  you are using an older version of Mercurial on Windows or MacOS, I
+  strongly recommend that you upgrade.
+\end{note}
+
+%%% Local Variables:
+%%% mode: latex
+%%% TeX-master: "00book"
+%%% End: