Mercurial > hgbook
diff en/ch07-filenames.xml @ 658:b90b024729f1
WIP DocBook snapshot that all compiles. Mirabile dictu!
author | Bryan O'Sullivan <bos@serpentine.com> |
---|---|
date | Wed, 18 Feb 2009 00:22:09 -0800 |
parents | en/ch07-filenames.tex@f72b7e6cbe90 |
children | 21c62e09b99f |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/en/ch07-filenames.xml Wed Feb 18 00:22:09 2009 -0800 @@ -0,0 +1,392 @@ +<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : --> + +<chapter id="chap:names"> + <title>File names and pattern matching</title> + + <para>Mercurial provides mechanisms that let you work with file + names in a consistent and expressive way.</para> + + <sect1> + <title>Simple file naming</title> + + <para>Mercurial uses a unified piece of machinery <quote>under the + hood</quote> to handle file names. Every command behaves + uniformly with respect to file names. The way in which commands + work with file names is as follows.</para> + + <para>If you explicitly name real files on the command line, + Mercurial works with exactly those files, as you would expect. + <!-- &interaction.filenames.files; --></para> + + <para>When you provide a directory name, Mercurial will interpret + this as <quote>operate on every file in this directory and its + subdirectories</quote>. Mercurial traverses the files and + subdirectories in a directory in alphabetical order. When it + encounters a subdirectory, it will traverse that subdirectory + before continuing with the current directory. <!-- + &interaction.filenames.dirs; --></para> + + </sect1> + <sect1> + <title>Running commands without any file names</title> + + <para>Mercurial's commands that work with file names have useful + default behaviours when you invoke them without providing any + file names or patterns. What kind of behaviour you should + expect depends on what the command does. Here are a few rules + of thumb you can use to predict what a command is likely to do + if you don't give it any names to work with.</para> + <itemizedlist> + <listitem><para>Most commands will operate on the entire working + directory. This is what the <command role="hg-cmd">hg + add</command> command does, for example.</para> + </listitem> + <listitem><para>If the command has effects that are difficult or + impossible to reverse, it will force you to explicitly + provide at least one name or pattern (see below). This + protects you from accidentally deleting files by running + <command role="hg-cmd">hg remove</command> with no + arguments, for example.</para> + </listitem></itemizedlist> + + <para>It's easy to work around these default behaviours if they + don't suit you. If a command normally operates on the whole + working directory, you can invoke it on just the current + directory and its subdirectories by giving it the name + <quote><filename class="directory">.</filename></quote>. <!-- + &interaction.filenames.wdir-subdir; --></para> + + <para>Along the same lines, some commands normally print file + names relative to the root of the repository, even if you're + invoking them from a subdirectory. Such a command will print + file names relative to your subdirectory if you give it explicit + names. Here, we're going to run <command role="hg-cmd">hg + status</command> from a subdirectory, and get it to operate on + the entire working directory while printing file names relative + to our subdirectory, by passing it the output of the <command + role="hg-cmd">hg root</command> command. <!-- + &interaction.filenames.wdir-relname; --></para> + + </sect1> + <sect1> + <title>Telling you what's going on</title> + + <para>The <command role="hg-cmd">hg add</command> example in the + preceding section illustrates something else that's helpful + about Mercurial commands. If a command operates on a file that + you didn't name explicitly on the command line, it will usually + print the name of the file, so that you will not be surprised + what's going on.</para> + + <para>The principle here is of <emphasis>least + surprise</emphasis>. If you've exactly named a file on the + command line, there's no point in repeating it back at you. If + Mercurial is acting on a file <emphasis>implicitly</emphasis>, + because you provided no names, or a directory, or a pattern (see + below), it's safest to tell you what it's doing.</para> + + <para>For commands that behave this way, you can silence them + using the <option role="hg-opt-global">-q</option> option. You + can also get them to print the name of every file, even those + you've named explicitly, using the <option + role="hg-opt-global">-v</option> option.</para> + + </sect1> + <sect1> + <title>Using patterns to identify files</title> + + <para>In addition to working with file and directory names, + Mercurial lets you use <emphasis>patterns</emphasis> to identify + files. Mercurial's pattern handling is expressive.</para> + + <para>On Unix-like systems (Linux, MacOS, etc.), the job of + matching file names to patterns normally falls to the shell. On + these systems, you must explicitly tell Mercurial that a name is + a pattern. On Windows, the shell does not expand patterns, so + Mercurial will automatically identify names that are patterns, + and expand them for you.</para> + + <para>To provide a pattern in place of a regular name on the + command line, the mechanism is simple:</para> + <programlisting>syntax:patternbody</programlisting> + <para>That is, a pattern is identified by a short text string that + says what kind of pattern this is, followed by a colon, followed + by the actual pattern.</para> + + <para>Mercurial supports two kinds of pattern syntax. The most + frequently used is called <literal>glob</literal>; this is the + same kind of pattern matching used by the Unix shell, and should + be familiar to Windows command prompt users, too.</para> + + <para>When Mercurial does automatic pattern matching on Windows, + it uses <literal>glob</literal> syntax. You can thus omit the + <quote><literal>glob:</literal></quote> prefix on Windows, but + it's safe to use it, too.</para> + + <para>The <literal>re</literal> syntax is more powerful; it lets + you specify patterns using regular expressions, also known as + regexps.</para> + + <para>By the way, in the examples that follow, notice that I'm + careful to wrap all of my patterns in quote characters, so that + they won't get expanded by the shell before Mercurial sees + them.</para> + + <sect2> + <title>Shell-style <literal>glob</literal> patterns</title> + + <para>This is an overview of the kinds of patterns you can use + when you're matching on glob patterns.</para> + + <para>The <quote><literal>*</literal></quote> character matches + any string, within a single directory. <!-- + &interaction.filenames.glob.star; --></para> + + <para>The <quote><literal>**</literal></quote> pattern matches + any string, and crosses directory boundaries. It's not a + standard Unix glob token, but it's accepted by several popular + Unix shells, and is very useful. <!-- + &interaction.filenames.glob.starstar; --></para> + + <para>The <quote><literal>?</literal></quote> pattern matches + any single character. <!-- + &interaction.filenames.glob.question; --></para> + + <para>The <quote><literal>[</literal></quote> character begins a + <emphasis>character class</emphasis>. This matches any single + character within the class. The class ends with a + <quote><literal>]</literal></quote> character. A class may + contain multiple <emphasis>range</emphasis>s of the form + <quote><literal>a-f</literal></quote>, which is shorthand for + <quote><literal>abcdef</literal></quote>. <!-- + &interaction.filenames.glob.range; --> If the first character + after the <quote><literal>[</literal></quote> in a character + class is a <quote><literal>!</literal></quote>, it + <emphasis>negates</emphasis> the class, making it match any + single character not in the class.</para> + + <para>A <quote><literal>{</literal></quote> begins a group of + subpatterns, where the whole group matches if any subpattern + in the group matches. The <quote><literal>,</literal></quote> + character separates subpatterns, and <quote>\texttt{}}</quote> + ends the group. <!-- &interaction.filenames.glob.group; + --></para> + + <sect3> + <title>Watch out!</title> + + <para>Don't forget that if you want to match a pattern in any + directory, you should not be using the + <quote><literal>*</literal></quote> match-any token, as this + will only match within one directory. Instead, use the + <quote><literal>**</literal></quote> token. This small + example illustrates the difference between the two. <!-- + &interaction.filenames.glob.star-starstar; --></para> + + </sect3> + </sect2> + <sect2> + <title>Regular expression matching with <literal>re</literal> + patterns</title> + + <para>Mercurial accepts the same regular expression syntax as + the Python programming language (it uses Python's regexp + engine internally). This is based on the Perl language's + regexp syntax, which is the most popular dialect in use (it's + also used in Java, for example).</para> + + <para>I won't discuss Mercurial's regexp dialect in any detail + here, as regexps are not often used. Perl-style regexps are + in any case already exhaustively documented on a multitude of + web sites, and in many books. Instead, I will focus here on a + few things you should know if you find yourself needing to use + regexps with Mercurial.</para> + + <para>A regexp is matched against an entire file name, relative + to the root of the repository. In other words, even if you're + already in subbdirectory <filename + class="directory">foo</filename>, if you want to match files + under this directory, your pattern must start with + <quote><literal>foo/</literal></quote>.</para> + + <para>One thing to note, if you're familiar with Perl-style + regexps, is that Mercurial's are <emphasis>rooted</emphasis>. + That is, a regexp starts matching against the beginning of a + string; it doesn't look for a match anywhere within the + string. To match anywhere in a string, start your pattern + with <quote><literal>.*</literal></quote>.</para> + + </sect2> + </sect1> + <sect1> + <title>Filtering files</title> + + <para>Not only does Mercurial give you a variety of ways to + specify files; it lets you further winnow those files using + <emphasis>filters</emphasis>. Commands that work with file + names accept two filtering options.</para> + <itemizedlist> + <listitem><para><option role="hg-opt-global">-I</option>, or + <option role="hg-opt-global">--include</option>, lets you + specify a pattern that file names must match in order to be + processed.</para> + </listitem> + <listitem><para><option role="hg-opt-global">-X</option>, or + <option role="hg-opt-global">--exclude</option>, gives you a + way to <emphasis>avoid</emphasis> processing files, if they + match this pattern.</para> + </listitem></itemizedlist> + <para>You can provide multiple <option + role="hg-opt-global">-I</option> and <option + role="hg-opt-global">-X</option> options on the command line, + and intermix them as you please. Mercurial interprets the + patterns you provide using glob syntax by default (but you can + use regexps if you need to).</para> + + <para>You can read a <option role="hg-opt-global">-I</option> + filter as <quote>process only the files that match this + filter</quote>. <!-- &interaction.filenames.filter.include; + --> The <option role="hg-opt-global">-X</option> filter is best + read as <quote>process only the files that don't match this + pattern</quote>. <!-- &interaction.filenames.filter.exclude; + --></para> + + </sect1> + <sect1> + <title>Ignoring unwanted files and directories</title> + + <para>XXX.</para> + + </sect1> + <sect1 id="sec:names:case"> + <title>Case sensitivity</title> + + <para>If you're working in a mixed development environment that + contains both Linux (or other Unix) systems and Macs or Windows + systems, you should keep in the back of your mind the knowledge + that they treat the case (<quote>N</quote> versus + <quote>n</quote>) of file names in incompatible ways. This is + not very likely to affect you, and it's easy to deal with if it + does, but it could surprise you if you don't know about + it.</para> + + <para>Operating systems and filesystems differ in the way they + handle the <emphasis>case</emphasis> of characters in file and + directory names. There are three common ways to handle case in + names.</para> + <itemizedlist> + <listitem><para>Completely case insensitive. Uppercase and + lowercase versions of a letter are treated as identical, + both when creating a file and during subsequent accesses. + This is common on older DOS-based systems.</para> + </listitem> + <listitem><para>Case preserving, but insensitive. When a file + or directory is created, the case of its name is stored, and + can be retrieved and displayed by the operating system. + When an existing file is being looked up, its case is + ignored. This is the standard arrangement on Windows and + MacOS. The names <filename>foo</filename> and + <filename>FoO</filename> identify the same file. This + treatment of uppercase and lowercase letters as + interchangeable is also referred to as <emphasis>case + folding</emphasis>.</para> + </listitem> + <listitem><para>Case sensitive. The case of a name is + significant at all times. The names <filename>foo</filename> + and {FoO} identify different files. This is the way Linux + and Unix systems normally work.</para> + </listitem></itemizedlist> + + <para>On Unix-like systems, it is possible to have any or all of + the above ways of handling case in action at once. For example, + if you use a USB thumb drive formatted with a FAT32 filesystem + on a Linux system, Linux will handle names on that filesystem in + a case preserving, but insensitive, way.</para> + + <sect2> + <title>Safe, portable repository storage</title> + + <para>Mercurial's repository storage mechanism is <emphasis>case + safe</emphasis>. It translates file names so that they can + be safely stored on both case sensitive and case insensitive + filesystems. This means that you can use normal file copying + tools to transfer a Mercurial repository onto, for example, a + USB thumb drive, and safely move that drive and repository + back and forth between a Mac, a PC running Windows, and a + Linux box.</para> + + </sect2> + <sect2> + <title>Detecting case conflicts</title> + + <para>When operating in the working directory, Mercurial honours + the naming policy of the filesystem where the working + directory is located. If the filesystem is case preserving, + but insensitive, Mercurial will treat names that differ only + in case as the same.</para> + + <para>An important aspect of this approach is that it is + possible to commit a changeset on a case sensitive (typically + Linux or Unix) filesystem that will cause trouble for users on + case insensitive (usually Windows and MacOS) users. If a + Linux user commits changes to two files, one named + <filename>myfile.c</filename> and the other named + <filename>MyFile.C</filename>, they will be stored correctly + in the repository. And in the working directories of other + Linux users, they will be correctly represented as separate + files.</para> + + <para>If a Windows or Mac user pulls this change, they will not + initially have a problem, because Mercurial's repository + storage mechanism is case safe. However, once they try to + <command role="hg-cmd">hg update</command> the working + directory to that changeset, or <command role="hg-cmd">hg + merge</command> with that changeset, Mercurial will spot the + conflict between the two file names that the filesystem would + treat as the same, and forbid the update or merge from + occurring.</para> + + </sect2> + <sect2> + <title>Fixing a case conflict</title> + + <para>If you are using Windows or a Mac in a mixed environment + where some of your collaborators are using Linux or Unix, and + Mercurial reports a case folding conflict when you try to + <command role="hg-cmd">hg update</command> or <command + role="hg-cmd">hg merge</command>, the procedure to fix the + problem is simple.</para> + + <para>Just find a nearby Linux or Unix box, clone the problem + repository onto it, and use Mercurial's <command + role="hg-cmd">hg rename</command> command to change the + names of any offending files or directories so that they will + no longer cause case folding conflicts. Commit this change, + <command role="hg-cmd">hg pull</command> or <command + role="hg-cmd">hg push</command> it across to your Windows or + MacOS system, and <command role="hg-cmd">hg update</command> + to the revision with the non-conflicting names.</para> + + <para>The changeset with case-conflicting names will remain in + your project's history, and you still won't be able to + <command role="hg-cmd">hg update</command> your working + directory to that changeset on a Windows or MacOS system, but + you can continue development unimpeded.</para> + + <note> + <para> Prior to version 0.9.3, Mercurial did not use a case + safe repository storage mechanism, and did not detect case + folding conflicts. If you are using an older version of + Mercurial on Windows or MacOS, I strongly recommend that you + upgrade.</para> + </note> + + </sect2> + </sect1> +</chapter> + +<!-- +local variables: +sgml-parent-document: ("00book.xml" "book" "chapter") +end: +-->