view en/ch07-filenames.xml @ 658:b90b024729f1

WIP DocBook snapshot that all compiles. Mirabile dictu!
author Bryan O'Sullivan <bos@serpentine.com>
date Wed, 18 Feb 2009 00:22:09 -0800
parents en/ch07-filenames.tex@f72b7e6cbe90
children 21c62e09b99f

<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

<chapter id="chap:names">
  <title>File names and pattern matching</title>

  <para>Mercurial provides mechanisms that let you work with file
    names in a consistent and expressive way.</para>

  <sect1>
    <title>Simple file naming</title>

    <para>Mercurial uses a unified piece of machinery <quote>under the
	hood</quote> to handle file names, so every command behaves
      uniformly with respect to file names.  Here is how commands
      treat the names you give them.</para>

    <para>If you explicitly name real files on the command line,
      Mercurial works with exactly those files, as you would expect.
      <!-- &interaction.filenames.files; --></para>

    <para>When you provide a directory name, Mercurial will interpret
      this as <quote>operate on every file in this directory and its
	subdirectories</quote>. Mercurial traverses the files and
      subdirectories in a directory in alphabetical order.  When it
      encounters a subdirectory, it will traverse that subdirectory
      before continuing with the current directory. <!--
      &interaction.filenames.dirs; --></para>
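    <para>The traversal order described above can be modelled with a
      short recursive walk.  This is a minimal sketch in Python, not
      Mercurial's implementation; the helper name
      <literal>walk_in_order</literal> is hypothetical, and it assumes
      that names are compared as plain strings.</para>

```python
import os


def walk_in_order(directory: str, prefix: str = "") -> list[str]:
    """List files under `directory` in the order described in the text.

    Entries are visited in sorted (alphabetical) order, and each
    subdirectory is walked completely before the remaining entries of
    the current directory. Returned paths use '/' and are relative to
    the starting directory.
    """
    result = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        rel = prefix + name
        if os.path.isdir(full):
            # Descend into the subdirectory before moving on.
            result.extend(walk_in_order(full, rel + "/"))
        else:
            result.append(rel)
    return result
```

    <para>For a tree containing <filename>a/c/d.txt</filename>,
      <filename>a/z.txt</filename>, <filename>b.txt</filename> and
      <filename>c.txt</filename>, the sketch returns them in exactly
      that order, because everything under <filename
      class="directory">a</filename> is visited before
      <filename>b.txt</filename>.</para>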

  </sect1>
  <sect1>
    <title>Running commands without any file names</title>

    <para>Mercurial's commands that work with file names have useful
      default behaviours when you invoke them without providing any
      file names or patterns.  What kind of behaviour you should
      expect depends on what the command does.  Here are a few rules
      of thumb you can use to predict what a command is likely to do
      if you don't give it any names to work with.</para>
    <itemizedlist>
      <listitem><para>Most commands will operate on the entire working
	  directory. This is what the <command role="hg-cmd">hg
	    add</command> command does, for example.</para>
      </listitem>
      <listitem><para>If the command has effects that are difficult or
	  impossible to reverse, it will force you to explicitly
	  provide at least one name or pattern (see below).  This
	  protects you from accidentally deleting files by running
	  <command role="hg-cmd">hg remove</command> with no
	  arguments, for example.</para>
      </listitem></itemizedlist>

    <para>It's easy to work around these default behaviours if they
      don't suit you.  If a command normally operates on the whole
      working directory, you can invoke it on just the current
      directory and its subdirectories by giving it the name
      <quote><filename class="directory">.</filename></quote>. <!--
      &interaction.filenames.wdir-subdir; --></para>

    <para>Along the same lines, some commands normally print file
      names relative to the root of the repository, even if you're
      invoking them from a subdirectory.  Such a command will print
      file names relative to your subdirectory if you give it explicit
      names.  Here, we're going to run <command role="hg-cmd">hg
	status</command> from a subdirectory, and get it to operate on
      the entire working directory while printing file names relative
      to our subdirectory, by passing it the output of the <command
	role="hg-cmd">hg root</command> command. <!--
      &interaction.filenames.wdir-relname; --></para>

  </sect1>
  <sect1>
    <title>Telling you what's going on</title>

    <para>The <command role="hg-cmd">hg add</command> example in the
      preceding section illustrates something else that's helpful
      about Mercurial commands.  If a command operates on a file that
      you didn't name explicitly on the command line, it will usually
      print the name of the file, so that you won't be surprised by
      what's going on.</para>

    <para>The principle here is one of <emphasis>least
	surprise</emphasis>.  If you've exactly named a file on the
      command line, there's no point in repeating it back at you.  If
      Mercurial is acting on a file <emphasis>implicitly</emphasis>,
      because you provided no names, or a directory, or a pattern (see
      below), it's safest to tell you what it's doing.</para>

    <para>For commands that behave this way, you can silence them
      using the <option role="hg-opt-global">-q</option> option.  You
      can also get them to print the name of every file, even those
      you've named explicitly, using the <option
	role="hg-opt-global">-v</option> option.</para>

  </sect1>
  <sect1>
    <title>Using patterns to identify files</title>

    <para>In addition to working with file and directory names,
      Mercurial lets you use <emphasis>patterns</emphasis> to identify
      files.  Mercurial's pattern handling is expressive.</para>

    <para>On Unix-like systems (Linux, MacOS, etc.), the job of
      matching file names to patterns normally falls to the shell.  On
      these systems, you must explicitly tell Mercurial that a name is
      a pattern.  On Windows, the shell does not expand patterns, so
      Mercurial will automatically identify names that are patterns,
      and expand them for you.</para>

    <para>To provide a pattern in place of a regular name on the
      command line, the mechanism is simple:</para>
    <programlisting>syntax:patternbody</programlisting>
    <para>That is, a pattern is identified by a short text string that
      says what kind of pattern this is, followed by a colon, followed
      by the actual pattern.</para>
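    <para>Telling a pattern from a plain name amounts to checking for a
      known prefix before the first colon.  The sketch below is an
      illustration only: the helper <literal>split_pattern</literal>
      and the <literal>plain</literal> label are hypothetical, and it
      recognises just the two syntaxes described in this
      chapter.</para>

```python
KNOWN_SYNTAXES = ("glob", "re")


def split_pattern(name: str) -> tuple[str, str]:
    """Split 'syntax:patternbody' into (syntax, patternbody).

    A name whose text before the first colon is not a known syntax is
    returned unchanged, labelled 'plain' (a hypothetical label).
    """
    syntax, sep, body = name.partition(":")
    if sep and syntax in KNOWN_SYNTAXES:
        return syntax, body
    return "plain", name


print(split_pattern("glob:*.c"))    # ('glob', '*.c')
print(split_pattern("src/main.c"))  # ('plain', 'src/main.c')
```

    <para>Only the first colon is significant, so a regexp body that
      itself contains a colon survives intact, and a name such as
      <filename>notes:todo.txt</filename>, whose prefix is not a known
      syntax, is treated as an ordinary name.</para>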

    <para>Mercurial supports two kinds of pattern syntax.  The most
      frequently used is called <literal>glob</literal>; this is the
      same kind of pattern matching used by the Unix shell, and should
      be familiar to Windows command prompt users, too.</para>

    <para>When Mercurial does automatic pattern matching on Windows,
      it uses <literal>glob</literal> syntax.  You can thus omit the
      <quote><literal>glob:</literal></quote> prefix on Windows, but
      it's safe to use it, too.</para>

    <para>The <literal>re</literal> syntax is more powerful; it lets
      you specify patterns using regular expressions, also known as
      regexps.</para>

    <para>By the way, in the examples that follow, notice that I'm
      careful to wrap all of my patterns in quote characters, so that
      they won't get expanded by the shell before Mercurial sees
      them.</para>

    <sect2>
      <title>Shell-style <literal>glob</literal> patterns</title>

      <para>Here is an overview of the constructs you can use in glob
	patterns.</para>

      <para>The <quote><literal>*</literal></quote> character matches
	any string, within a single directory. <!--
	&interaction.filenames.glob.star; --></para>

      <para>The <quote><literal>**</literal></quote> pattern matches
	any string, and crosses directory boundaries.  It's not a
	standard Unix glob token, but it's accepted by several popular
	Unix shells, and is very useful. <!--
	&interaction.filenames.glob.starstar; --></para>

      <para>The <quote><literal>?</literal></quote> pattern matches
	any single character. <!--
	&interaction.filenames.glob.question; --></para>

      <para>The <quote><literal>[</literal></quote> character begins a
	<emphasis>character class</emphasis>.  This matches any single
	character within the class.  The class ends with a
	<quote><literal>]</literal></quote> character.  A class may
	contain multiple <emphasis>range</emphasis>s of the form
	<quote><literal>a-f</literal></quote>, which is shorthand for
	<quote><literal>abcdef</literal></quote>. <!--
	&interaction.filenames.glob.range; --> If the first character
	after the <quote><literal>[</literal></quote> in a character
	class is a <quote><literal>!</literal></quote>, it
	<emphasis>negates</emphasis> the class, making it match any
	single character not in the class.</para>

      <para>A <quote><literal>{</literal></quote> begins a group of
	subpatterns, where the whole group matches if any subpattern
	in the group matches.  The <quote><literal>,</literal></quote>
	character separates subpatterns, and <quote><literal>}</literal></quote>
	ends the group. <!-- &interaction.filenames.glob.group;
	--></para>
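      <para>To make these rules concrete, here is a minimal sketch that
	translates a glob into a Python regular expression and matches
	it against a whole slash-separated path.  It is not Mercurial's
	own translator: the function names are hypothetical, and it
	ignores backslash escapes and nested groups.</para>

```python
import re


def glob_to_regex(pattern: str) -> str:
    """Translate a glob into a regexp string (simplified sketch)."""
    out = []
    i = 0
    depth = 0  # how many {...} groups are currently open
    while i != len(pattern):
        c = pattern[i]
        if pattern.startswith("**", i):
            out.append(".*")          # '**' may cross directory boundaries
            i += 2
            continue
        if c == "*":
            out.append("[^/]*")       # '*' stays within one directory
        elif c == "?":
            out.append(".")           # '?' matches any single character
        elif c == "[":
            j = i + 1
            if pattern.startswith("!", j):
                j += 1                # skip the negation marker
            end = pattern.find("]", j + 1)  # a ']' right after '[' is literal
            if end == -1:
                out.append(re.escape(c))    # unterminated class: literal '['
            else:
                body = pattern[i + 1:end]
                if body.startswith("!"):
                    body = "^" + body[1:]   # '[!...]' negates the class
                elif body.startswith("^"):
                    body = "\\" + body      # keep a leading '^' literal
                out.append("[" + body + "]")
                i = end               # resume after the closing ']'
        elif c == "{":
            depth += 1
            out.append("(?:")         # start of a group of alternatives
        elif c == "," and depth:
            out.append("|")           # separates subpatterns in a group
        elif c == "}" and depth:
            depth -= 1
            out.append(")")
        else:
            out.append(re.escape(c))  # everything else is literal
        i += 1
    return "".join(out)


def glob_match(pattern: str, path: str) -> bool:
    """True if the whole of `path` matches the glob `pattern`."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None


print(glob_to_regex("*.{c,h}"))          # [^/]*\.(?:c|h)
print(glob_match("*.c", "src/foo.c"))    # False
print(glob_match("**.c", "src/foo.c"))   # True
```

      <para>The translation makes the key distinction visible:
	<quote><literal>*</literal></quote> becomes
	<literal>[^/]*</literal>, which can never consume a slash,
	while <quote><literal>**</literal></quote> becomes
	<literal>.*</literal>, which can.</para>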

      <sect3>
	<title>Watch out!</title>

	<para>Don't forget that if you want to match a pattern in any
	  directory, you should not be using the
	  <quote><literal>*</literal></quote> match-any token, as this
	  will only match within one directory.  Instead, use the
	  <quote><literal>**</literal></quote> token.  This small
	  example illustrates the difference between the two. <!--
	  &interaction.filenames.glob.star-starstar; --></para>
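	<para>The difference comes down to whether the wildcard may
	  match a <quote><literal>/</literal></quote>.  The sketch below
	  assumes that <literal>*.py</literal> translates to the regexp
	  <literal>[^/]*\.py</literal> and <literal>**.py</literal> to
	  <literal>.*\.py</literal>, and applies both to a few invented
	  repository-relative names using Python's
	  <literal>re</literal> module.</para>

```python
import re

paths = ["foo.py", "lib/bar.py", "lib/sub/baz.py"]

# '*.py' as a regexp: the wildcard may not cross a '/'.
star = re.compile(r"[^/]*\.py")
# '**.py' as a regexp: the wildcard may cross '/' boundaries.
starstar = re.compile(r".*\.py")

star_hits = [p for p in paths if star.fullmatch(p)]
starstar_hits = [p for p in paths if starstar.fullmatch(p)]
print(star_hits)      # ['foo.py']
print(starstar_hits)  # ['foo.py', 'lib/bar.py', 'lib/sub/baz.py']
```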

      </sect3>
    </sect2>
    <sect2>
      <title>Regular expression matching with <literal>re</literal>
	patterns</title>

      <para>Mercurial accepts the same regular expression syntax as
	the Python programming language (it uses Python's regexp
	engine internally). This is based on the Perl language's
	regexp syntax, which is the most popular dialect in use (it's
	also used in Java, for example).</para>

      <para>I won't discuss Mercurial's regexp dialect in any detail
	here, as regexps are not often used.  Perl-style regexps are
	in any case already exhaustively documented on a multitude of
	web sites, and in many books.  Instead, I will focus here on a
	few things you should know if you find yourself needing to use
	regexps with Mercurial.</para>

      <para>A regexp is matched against an entire file name, relative
	to the root of the repository.  In other words, even if you're
	already in the subdirectory <filename
	  class="directory">foo</filename>, if you want to match files
	under this directory, your pattern must start with
	<quote><literal>foo/</literal></quote>.</para>
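      <para>Here is that rule in miniature, using Python's
	<literal>re</literal> module and a hypothetical helper,
	<literal>repo_relative</literal>, to turn a name typed in a
	subdirectory into the repository-relative form that the regexp
	actually sees.  The directory layout is invented.</para>

```python
import posixpath
import re


def repo_relative(repo_root: str, cwd: str, name: str) -> str:
    """Hypothetical helper: express `name`, typed in `cwd`, relative to the repo root."""
    return posixpath.relpath(posixpath.join(cwd, name), repo_root)


repo_root = "/home/user/project"      # invented layout
cwd = "/home/user/project/foo"        # we are working inside foo/
target = repo_relative(repo_root, cwd, "bar.c")
print(target)                                        # foo/bar.c

# The regexp is applied to 'foo/bar.c', not to 'bar.c'.
print(re.match(r"bar\.c", target) is not None)      # False
print(re.match(r"foo/bar\.c", target) is not None)  # True
```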

      <para>One thing to note, if you're familiar with Perl-style
	regexps, is that Mercurial's are <emphasis>rooted</emphasis>.
	That is, a regexp starts matching against the beginning of a
	string; it doesn't look for a match anywhere within the
	string.  To match anywhere in a string, start your pattern
	with <quote><literal>.*</literal></quote>.</para>
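      <para>Python's own <literal>re</literal> module draws the same
	distinction: <literal>re.match</literal> is anchored at the
	start of the string, much like Mercurial's rooted regexps,
	while <literal>re.search</literal> looks for a match anywhere.
	The file name below is made up.</para>

```python
import re

name = "src/util/strings.c"

# Rooted: the pattern must match from the first character of the name.
print(re.match(r"util/", name) is not None)     # False
# Prefixing '.*' lets the match begin anywhere in the name.
print(re.match(r".*util/", name) is not None)   # True
# An unrooted search, for contrast, finds 'util/' without the prefix.
print(re.search(r"util/", name) is not None)    # True
```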

    </sect2>
  </sect1>
  <sect1>
    <title>Filtering files</title>

    <para>Not only does Mercurial give you a variety of ways to
      specify files; it lets you further winnow those files using
      <emphasis>filters</emphasis>.  Commands that work with file
      names accept two filtering options.</para>
    <itemizedlist>
      <listitem><para><option role="hg-opt-global">-I</option>, or
	  <option role="hg-opt-global">--include</option>, lets you
	  specify a pattern that file names must match in order to be
	  processed.</para>
      </listitem>
      <listitem><para><option role="hg-opt-global">-X</option>, or
	  <option role="hg-opt-global">--exclude</option>, gives you a
	  way to <emphasis>avoid</emphasis> processing files, if they
	  match this pattern.</para>
      </listitem></itemizedlist>
    <para>You can provide multiple <option
	role="hg-opt-global">-I</option> and <option
	role="hg-opt-global">-X</option> options on the command line,
      and intermix them as you please.  Mercurial interprets the
      patterns you provide using glob syntax by default (but you can
      use regexps if you need to).</para>

    <para>You can read a <option role="hg-opt-global">-I</option>
      filter as <quote>process only the files that match this
	filter</quote>. <!-- &interaction.filenames.filter.include;
      --> The <option role="hg-opt-global">-X</option> filter is best
      read as <quote>process only the files that don't match this
	pattern</quote>. <!-- &interaction.filenames.filter.exclude;
      --></para>
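    <para>One way to picture several <option
	role="hg-opt-global">-I</option> and <option
	role="hg-opt-global">-X</option> options working together is as
      a two-stage filter: a file is kept if it matches at least one
      include pattern (when any are given) and no exclude pattern.
      The sketch below assumes that reading; the helper
      <literal>filter_files</literal> is hypothetical, and it uses
      Python's <literal>fnmatch</literal> for brevity, whose
      <quote><literal>*</literal></quote>, unlike Mercurial's, also
      matches <quote><literal>/</literal></quote>.</para>

```python
from fnmatch import fnmatchcase


def filter_files(files, includes=(), excludes=()):
    """Keep files matching any include pattern (if given) and no exclude pattern.

    Hypothetical helper; fnmatchcase's '*' also matches '/', unlike a Mercurial glob.
    """
    kept = []
    for f in files:
        if includes and not any(fnmatchcase(f, p) for p in includes):
            continue  # includes were given, but this file matches none
        if any(fnmatchcase(f, p) for p in excludes):
            continue  # this file matches an exclude pattern
        kept.append(f)
    return kept


files = ["README", "src/main.c", "src/main.h", "tests/test_main.c"]
print(filter_files(files, includes=["src/*"]))
# ['src/main.c', 'src/main.h']
print(filter_files(files, includes=["*.c"], excludes=["tests/*"]))
# ['src/main.c']
```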

  </sect1>
  <sect1>
    <title>Ignoring unwanted files and directories</title>

    <para>XXX.</para>

  </sect1>
  <sect1 id="sec:names:case">
    <title>Case sensitivity</title>

    <para>If you're working in a mixed development environment that
      contains both Linux (or other Unix) systems and Macs or Windows
      systems, you should keep in the back of your mind the knowledge
      that they treat the case (<quote>N</quote> versus
      <quote>n</quote>) of file names in incompatible ways.  This is
      not very likely to affect you, and it's easy to deal with if it
      does, but it could surprise you if you don't know about
      it.</para>

    <para>Operating systems and filesystems differ in the way they
      handle the <emphasis>case</emphasis> of characters in file and
      directory names.  There are three common ways to handle case in
      names.</para>
    <itemizedlist>
      <listitem><para>Completely case insensitive.  Uppercase and
	  lowercase versions of a letter are treated as identical,
	  both when creating a file and during subsequent accesses.
	  This is common on older DOS-based systems.</para>
      </listitem>
      <listitem><para>Case preserving, but insensitive.  When a file
	  or directory is created, the case of its name is stored, and
	  can be retrieved and displayed by the operating system.
	  When an existing file is being looked up, its case is
	  ignored.  This is the standard arrangement on Windows and
	  MacOS.  The names <filename>foo</filename> and
	  <filename>FoO</filename> identify the same file.  This
	  treatment of uppercase and lowercase letters as
	  interchangeable is also referred to as <emphasis>case
	    folding</emphasis>.</para>
      </listitem>
      <listitem><para>Case sensitive.  The case of a name is
	  significant at all times. The names <filename>foo</filename>
	  and <filename>FoO</filename> identify different files.  This is the way Linux
	  and Unix systems normally work.</para>
      </listitem></itemizedlist>
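    <para>The last two policies differ only in how a lookup treats
      case.  This toy model, a hypothetical class rather than any real
      filesystem's code, stores each entry under a case-folded key
      while remembering the spelling it was created with, which is
      what <quote>case preserving, but insensitive</quote> means.
      Python's <literal>str.casefold</literal> stands in for the
      filesystem's folding rules.</para>

```python
class CasePreservingDir:
    """Toy model of a case-preserving but case-insensitive directory."""

    def __init__(self):
        self._entries = {}  # case-folded name -> name as first created

    def create(self, name: str) -> str:
        # Insensitive: an existing entry that folds the same way is reused.
        # Preserving: otherwise the spelling we were given is recorded.
        return self._entries.setdefault(name.casefold(), name)

    def lookup(self, name: str):
        """Return the stored spelling of `name`, or None if absent."""
        return self._entries.get(name.casefold())

    def listing(self) -> list[str]:
        return sorted(self._entries.values())


d = CasePreservingDir()
d.create("foo")
print(d.lookup("FoO"))  # foo  (same file, original case kept)
d.create("FoO")         # no new entry is made
print(d.listing())      # ['foo']
```

    <para>A case sensitive directory would instead key its entries on
      the exact name, so <filename>foo</filename> and
      <filename>FoO</filename> would become two separate
      entries.</para>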

    <para>On Unix-like systems, it is possible to have any or all of
      the above ways of handling case in action at once.  For example,
      if you use a USB thumb drive formatted with a FAT32 filesystem
      on a Linux system, Linux will handle names on that filesystem in
      a case preserving, but insensitive, way.</para>

    <sect2>
      <title>Safe, portable repository storage</title>

      <para>Mercurial's repository storage mechanism is <emphasis>case
	  safe</emphasis>.  It translates file names so that they can
	be safely stored on both case sensitive and case insensitive
	filesystems.  This means that you can use normal file copying
	tools to transfer a Mercurial repository onto, for example, a
	USB thumb drive, and safely move that drive and repository
	back and forth between a Mac, a PC running Windows, and a
	Linux box.</para>
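      <para>One way to make stored names case safe is to encode
	uppercase letters so that two distinct names can never collide
	once case is folded.  The sketch below is modelled on the
	escaping Mercurial's store is understood to use, in which an
	uppercase letter becomes an underscore followed by its
	lowercase form and a literal underscore is doubled; the real
	encoding also escapes other characters, and the function names
	here are hypothetical.</para>

```python
import string


def encode_case_safe(name: str) -> str:
    """Encode `name` so that it contains no uppercase ASCII letters.

    '_' is doubled and 'A'..'Z' become '_' plus the lowercase letter;
    other characters pass through unchanged (a simplification).
    """
    out = []
    for c in name:
        if c == "_":
            out.append("__")
        elif c in string.ascii_uppercase:
            out.append("_" + c.lower())
        else:
            out.append(c)
    return "".join(out)


def decode_case_safe(encoded: str) -> str:
    """Invert encode_case_safe; assumes well-formed input."""
    out = []
    chars = iter(encoded)
    for c in chars:
        if c == "_":
            nxt = next(chars)  # '_' is always followed by one more char
            out.append("_" if nxt == "_" else nxt.upper())
        else:
            out.append(c)
    return "".join(out)


for name in ["myfile.c", "MyFile.C", "my_file.c"]:
    print(name, "->", encode_case_safe(name))
# myfile.c -> myfile.c
# MyFile.C -> _my_file._c
# my_file.c -> my__file.c
```

      <para>Because the encoded names contain no uppercase letters, a
	case folding filesystem still keeps
	<filename>myfile.c</filename> and
	<filename>MyFile.C</filename> apart, and the original names
	can be recovered exactly.</para>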

    </sect2>
    <sect2>
      <title>Detecting case conflicts</title>

      <para>When operating in the working directory, Mercurial honours
	the naming policy of the filesystem where the working
	directory is located.  If the filesystem is case preserving,
	but insensitive, Mercurial will treat names that differ only
	in case as the same.</para>

      <para>An important aspect of this approach is that it is
	possible to commit a changeset on a case sensitive (typically
	Linux or Unix) filesystem that will cause trouble for users on
	case insensitive (usually Windows and MacOS) filesystems.  If a
	Linux user commits changes to two files, one named
	<filename>myfile.c</filename> and the other named
	<filename>MyFile.C</filename>, they will be stored correctly
	in the repository.  And in the working directories of other
	Linux users, they will be correctly represented as separate
	files.</para>

      <para>If a Windows or Mac user pulls this change, they will not
	initially have a problem, because Mercurial's repository
	storage mechanism is case safe.  However, once they try to
	<command role="hg-cmd">hg update</command> the working
	directory to that changeset, or <command role="hg-cmd">hg
	  merge</command> with that changeset, Mercurial will spot the
	conflict between the two file names that the filesystem would
	treat as the same, and forbid the update or merge from
	occurring.</para>
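      <para>The check that blocks such an update or merge can be
	pictured as grouping file and directory names by their
	case-folded form and reporting any group with more than one
	member.  This is a hypothetical sketch of that idea, not
	Mercurial's code; <literal>str.lower</literal> stands in for the
	filesystem's case folding.</para>

```python
from collections import defaultdict


def find_case_conflicts(paths):
    """Return groups of names (files or directories) that differ only in case."""
    groups = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        # Check every directory prefix as well as the full file name.
        for i in range(1, len(parts) + 1):
            name = "/".join(parts[:i])
            groups[name.lower()].add(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)


print(find_case_conflicts(["myfile.c", "MyFile.C", "README"]))
# [['MyFile.C', 'myfile.c']]
print(find_case_conflicts(["Docs/a.txt", "docs/b.txt"]))
# [['Docs', 'docs']]
```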

    </sect2>
    <sect2>
      <title>Fixing a case conflict</title>

      <para>If you are using Windows or a Mac in a mixed environment
	where some of your collaborators are using Linux or Unix, and
	Mercurial reports a case folding conflict when you try to
	<command role="hg-cmd">hg update</command> or <command
	  role="hg-cmd">hg merge</command>, the procedure to fix the
	problem is simple.</para>

      <para>Just find a nearby Linux or Unix box, clone the problem
	repository onto it, and use Mercurial's <command
	  role="hg-cmd">hg rename</command> command to change the
	names of any offending files or directories so that they will
	no longer cause case folding conflicts.  Commit this change,
	<command role="hg-cmd">hg pull</command> or <command
	  role="hg-cmd">hg push</command> it across to your Windows or
	MacOS system, and <command role="hg-cmd">hg update</command>
	to the revision with the non-conflicting names.</para>

      <para>The changeset with case-conflicting names will remain in
	your project's history, and you still won't be able to
	<command role="hg-cmd">hg update</command> your working
	directory to that changeset on a Windows or MacOS system, but
	you can continue development unimpeded.</para>

      <note>
	<para>Prior to version 0.9.3, Mercurial did not use a case
	  safe repository storage mechanism, and did not detect case
	  folding conflicts.  If you are using an older version of
	  Mercurial on Windows or MacOS, I strongly recommend that you
	  upgrade.</para>
      </note>

    </sect2>
  </sect1>
</chapter>

<!--
local variables: 
sgml-parent-document: ("00book.xml" "book" "chapter")
end:
-->