
<!-- vim: set filetype=docbkxml shiftwidth=2 autoindent expandtab tw=77 : -->

<chapter id="chap.undo">
  <?dbhtml filename="finding-and-fixing-mistakes.html"?>
  <title>Finding and fixing mistakes</title>

  <para>To err might be human, but to really handle the consequences
    well takes a top-notch revision control system.  In this chapter,
    we'll discuss some of the techniques you can use when you find
    that a problem has crept into your project.  Mercurial has some
    highly capable features that will help you to isolate the sources
    of problems, and to handle them appropriately.</para>

  <sect1>
    <title>Erasing local history</title>

    <sect2>
      <title>The accidental commit</title>

      <para>I have the occasional but persistent problem of typing
	rather more quickly than I can think, which sometimes results
	in me committing a changeset that is either incomplete or
	plain wrong.  In my case, the usual kind of incomplete
	changeset is one in which I've created a new source file, but
	forgotten to <command role="hg-cmd">hg add</command> it.  A
	<quote>plain wrong</quote> changeset is not as common, but no
	less annoying.</para>

    </sect2>
    <sect2 id="sec.undo.rollback">
      <title>Rolling back a transaction</title>

      <para>In section <xref linkend="sec.concepts.txn"/>, I mentioned
	that Mercurial treats each modification of a repository as a
	<emphasis>transaction</emphasis>.  Every time you commit a
	changeset or pull changes from another repository, Mercurial
	remembers what you did.  You can undo, or <emphasis>roll
	  back</emphasis>, exactly one of these actions using the
	<command role="hg-cmd">hg rollback</command> command.  (See
	section <xref linkend="sec.undo.rollback-after-push"/> for an
	important caveat about the use of this command.)</para>

      <para>Here's a mistake that I often find myself making:
	committing a change in which I've created a new file, but
	forgotten to <command role="hg-cmd">hg add</command>
	it.</para>

      &interaction.rollback.commit;

      <para>Looking at the output of <command role="hg-cmd">hg
	  status</command> after the commit immediately confirms the
	error.</para>

      &interaction.rollback.status;

      <para>The commit captured the changes to the file
	<filename>a</filename>, but not the new file
	<filename>b</filename>.  If I were to push this changeset to a
	repository that I shared with a colleague, the chances are
	high that something in <filename>a</filename> would refer to
	<filename>b</filename>, which would not be present in their
	repository when they pulled my changes.  I would thus become
	the object of some indignation.</para>

      <para>However, luck is with me&emdash;I've caught my error
	before I pushed the changeset.  I use the <command
	  role="hg-cmd">hg rollback</command> command, and Mercurial
	makes that last changeset vanish.</para>

      &interaction.rollback.rollback;

      <para>Notice that the changeset is no longer present in the
	repository's history, and the working directory once again
	thinks that the file <filename>a</filename> is modified.  The
	commit and rollback have left the working directory exactly as
	it was prior to the commit; the changeset has been completely
	erased.  I can now safely <command role="hg-cmd">hg
	  add</command> the file <filename>b</filename>, and rerun my
	commit.</para>

      &interaction.rollback.add;

    </sect2>
    <sect2>
      <title>The erroneous pull</title>

      <para>It's common practice with Mercurial to maintain separate
	development branches of a project in different repositories.
	Your development team might have one shared repository for
	your project's <quote>0.9</quote> release, and another,
	containing different changes, for the <quote>1.0</quote>
	release.</para>

      <para>Given this, you can imagine that the consequences could be
	messy if you had a local <quote>0.9</quote> repository, and
	accidentally pulled changes from the shared <quote>1.0</quote>
	repository into it.  At worst, you could be paying
	insufficient attention, and push those changes into the shared
	<quote>0.9</quote> tree, confusing your entire team (but don't
	worry, we'll return to this horror scenario later).  However,
	it's more likely that you'll notice immediately, because
	Mercurial will display the URL it's pulling from, or you will
	see it pull a suspiciously large number of changes into the
	repository.</para>

      <para>The <command role="hg-cmd">hg rollback</command> command
	will work nicely to expunge all of the changesets that you
	just pulled.  Mercurial groups all changes from one <command
	  role="hg-cmd">hg pull</command> into a single transaction,
	so one <command role="hg-cmd">hg rollback</command> is all you
	need to undo this mistake.</para>

    </sect2>
    <sect2 id="sec.undo.rollback-after-push">
      <title>Rolling back is useless once you've pushed</title>

      <para>The value of the <command role="hg-cmd">hg
	  rollback</command> command drops to zero once you've pushed
	your changes to another repository.  Rolling back a change
	makes it disappear entirely, but <emphasis>only</emphasis> in
	the repository in which you perform the <command
	  role="hg-cmd">hg rollback</command>.  Because a rollback
	eliminates history, there's no way for the disappearance of a
	change to propagate between repositories.</para>

      <para>If you've pushed a change to another
	repository&emdash;particularly if it's a shared
	repository&emdash;it has essentially <quote>escaped into the
	  wild,</quote> and you'll have to recover from your mistake
	in a different way.  If you push a changeset somewhere, roll it
	back locally, and then pull from the repository you pushed to,
	the changeset will reappear in your repository.</para>

      <para>(If you absolutely know for sure that the change you want
	to roll back is the most recent change in the repository that
	you pushed to, <emphasis>and</emphasis> you know that nobody
	else could have pulled it from that repository, you can roll
	back the changeset there, too, but you really should not count
	on this working reliably.  If you do this, sooner or
	later a change really will make it into a repository that you
	don't directly control (or have forgotten about), and come
	back to bite you.)</para>

    </sect2>
    <sect2>
      <title>You can only roll back once</title>

      <para>Mercurial stores exactly one transaction in its
	transaction log; that transaction is the most recent one that
	occurred in the repository. This means that you can only roll
	back one transaction.  If you expect to be able to roll back
	one transaction, then its predecessor, this is not the
	behaviour you will get.</para>

      &interaction.rollback.twice;

      <para>Once you've rolled back one transaction in a repository,
	you can't roll back again in that repository until you perform
	another commit or pull.</para>
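
      <para>The single-slot behaviour is easier to see in a toy
	model.  The sketch below is not how Mercurial is implemented;
	the <literal>ToyRepo</literal> class and its method names are
	made up for illustration.  It only assumes what the text above
	says: each commit or pull is one transaction, and only the
	most recent transaction is remembered.</para>

<programlisting><![CDATA[
```python
class ToyRepo:
    """Toy model: history is a list; only the latest transaction can be undone."""

    def __init__(self):
        self.history = []
        self._undo = None  # single slot: history length before the last transaction

    def transaction(self, changesets):
        """Record one commit (one changeset) or one pull (possibly many)."""
        self._undo = len(self.history)  # overwrites whatever was remembered before
        self.history.extend(changesets)

    def rollback(self):
        """Undo the most recent transaction; return False if there is none."""
        if self._undo is None:
            return False
        del self.history[self._undo:]
        self._undo = None  # the slot is now empty until the next transaction
        return True


repo = ToyRepo()
repo.transaction(["commit a"])
repo.transaction(["pulled 1", "pulled 2", "pulled 3"])  # one pull, one transaction
print(repo.rollback(), repo.history)  # the whole pull disappears
print(repo.rollback(), repo.history)  # a second rollback has nothing to undo
```
]]></programlisting>

      <para>In this model, the second call to
	<literal>rollback</literal> fails because the slot was emptied
	by the first one, which mirrors the behaviour described
	above.</para>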

    </sect2>
  </sect1>
  <sect1>
    <title>Reverting the mistaken change</title>

    <para>If you make a modification to a file, and decide that you
      really didn't want to change the file at all, and you haven't
      yet committed your changes, the <command role="hg-cmd">hg
	revert</command> command is the one you'll need.  It looks at
      the changeset that's the parent of the working directory, and
      restores the contents of the file to their state as of that
      changeset. (That's a long-winded way of saying that, in the
      normal case, it undoes your modifications.)</para>

    <para>Let's illustrate how the <command role="hg-cmd">hg
	revert</command> command works with yet another small example.
      We'll begin by modifying a file that Mercurial is already
      tracking.</para>

    &interaction.daily.revert.modify;

    <para>If we don't
      want that change, we can simply <command role="hg-cmd">hg
	revert</command> the file.</para>

      &interaction.daily.revert.unmodify;

    <para>The <command role="hg-cmd">hg revert</command> command
      provides us with an extra degree of safety by saving our
      modified file with a <filename>.orig</filename>
      extension.</para>

    &interaction.daily.revert.status;

    <para>Here is a summary of the cases that the <command
	role="hg-cmd">hg revert</command> command can deal with.  We
      will describe each of these in more detail in the section that
      follows.</para>
    <itemizedlist>
      <listitem><para>If you modify a file, it will restore the file
	  to its unmodified state.</para>
      </listitem>
      <listitem><para>If you <command role="hg-cmd">hg add</command> a
	  file, it will undo the <quote>added</quote> state of the
	  file, but leave the file itself untouched.</para>
      </listitem>
      <listitem><para>If you delete a file without telling Mercurial,
	  it will restore the file to its unmodified contents.</para>
      </listitem>
      <listitem><para>If you use the <command role="hg-cmd">hg
	    remove</command> command to remove a file, it will undo
	  the <quote>removed</quote> state of the file, and restore
	  the file to its unmodified contents.</para>
      </listitem></itemizedlist>

    <sect2 id="sec.undo.mgmt">
      <title>File management errors</title>

      <para>The <command role="hg-cmd">hg revert</command> command is
	useful for more than just modified files.  It lets you reverse
	the results of all of Mercurial's file management
	commands&emdash;<command role="hg-cmd">hg add</command>,
	<command role="hg-cmd">hg remove</command>, and so on.</para>

      <para>If you <command role="hg-cmd">hg add</command> a file,
	then decide that in fact you don't want Mercurial to track it,
	use <command role="hg-cmd">hg revert</command> to undo the
	add.  Don't worry; Mercurial will not modify the file in any
	way.  It will just <quote>unmark</quote> the file.</para>

      &interaction.daily.revert.add;

      <para>Similarly, if you ask Mercurial to <command
	  role="hg-cmd">hg remove</command> a file, you can use
	<command role="hg-cmd">hg revert</command> to restore it to
	the contents it had as of the parent of the working
	directory.</para>

      &interaction.daily.revert.remove;

      <para>This works just as well for a file that you deleted by
	hand, without telling Mercurial (recall that in Mercurial
	terminology, this kind of file is called
	<quote>missing</quote>).</para>

      &interaction.daily.revert.missing;

      <para>If you revert a <command role="hg-cmd">hg copy</command>,
	the copied-to file remains in your working directory
	afterwards, untracked.  Since a copy doesn't affect the
	copied-from file in any way, Mercurial doesn't do anything
	with the copied-from file.</para>

      &interaction.daily.revert.copy;

      <sect3>
	<title>A slightly special case: reverting a rename</title>

	<para>If you <command role="hg-cmd">hg rename</command> a
	  file, there is one small detail that you should remember.
	  When you <command role="hg-cmd">hg revert</command> a
	  rename, it's not enough to provide the name of the
	  renamed-to file, as you can see here.</para>

	&interaction.daily.revert.rename;

	<para>As you can see from the output of <command
	    role="hg-cmd">hg status</command>, the renamed-to file is
	  no longer identified as added, but the
	  renamed-<emphasis>from</emphasis> file is still removed!
	  This is counter-intuitive (at least to me), but it's easy
	  to deal with.</para>

	&interaction.daily.revert.rename-orig;

	<para>So remember, to revert a <command role="hg-cmd">hg
	    rename</command>, you must provide
	  <emphasis>both</emphasis> the source and destination
	  names.</para>

	<!-- TODO: the output doesn't look like it will be
	  removed! -->

	<para>(By the way, if you rename a file, then modify the
	  renamed-to file, then revert both components of the rename,
	  when Mercurial restores the file that was removed as part of
	  the rename, it will be unmodified. If you need the
	  modifications in the renamed-to file to show up in the
	  renamed-from file, don't forget to copy them over.)</para>

	<para>These fiddly aspects of reverting a rename arguably
	  constitute a small bug in Mercurial.</para>

      </sect3>
    </sect2>
  </sect1>
  <sect1>
    <title>Dealing with committed changes</title>

    <para>Consider a case where you have committed a change
      <emphasis>a</emphasis>, and another change
      <emphasis>b</emphasis> on top of it; you then realise that
      change <emphasis>a</emphasis> was incorrect.  Mercurial lets
      you <quote>back out</quote> an entire changeset automatically,
      and provides building blocks that let you reverse part of a
      changeset by hand.</para>

    <para>Before you read this section, here's something to keep in
      mind: the <command role="hg-cmd">hg backout</command> command
      undoes changes by <emphasis>adding</emphasis> history, not by
      modifying or erasing it.  It's the right tool to use if you're
      fixing bugs, but not if you're trying to undo some change that
      has catastrophic consequences.  To deal with those, see section
      <xref linkend="sec.undo.aaaiiieee"/>.</para>

    <sect2>
      <title>Backing out a changeset</title>

      <para>The <command role="hg-cmd">hg backout</command> command
	lets you <quote>undo</quote> the effects of an entire
	changeset in an automated fashion.  Because Mercurial's
	history is immutable, this command <emphasis>does
	  not</emphasis> get rid of the changeset you want to undo.
	Instead, it creates a new changeset that
	<emphasis>reverses</emphasis> the effect of the to-be-undone
	changeset.</para>

      <para>The operation of the <command role="hg-cmd">hg
	  backout</command> command is a little intricate, so let's
	illustrate it with some examples.  First, we'll create a
	repository with some simple changes.</para>

      &interaction.backout.init;

      <para>The <command role="hg-cmd">hg backout</command> command
	takes a single changeset ID as its argument; this is the
	changeset to back out.  Normally, <command role="hg-cmd">hg
	  backout</command> will drop you into a text editor to write
	a commit message, so you can record why you're backing the
	change out.  In this example, we provide a commit message on
	the command line using the <option
	  role="hg-opt-backout">-m</option> option.</para>

    </sect2>
    <sect2>
      <title>Backing out the tip changeset</title>

      <para>We're going to start by backing out the last changeset we
	committed.</para>

      &interaction.backout.simple;

      <para>You can see that the second line from
	<filename>myfile</filename> is no longer present.  Taking a
	look at the output of <command role="hg-cmd">hg log</command>
	gives us an idea of what the <command role="hg-cmd">hg
	  backout</command> command has done.</para>

      &interaction.backout.simple.log;

      <para>Notice that the new changeset
	that <command role="hg-cmd">hg backout</command> has created
	is a child of the changeset we backed out.  It's easier to see
	this in figure <xref
	  linkend="fig.undo.backout"/>, which presents a graphical
	view of the change history.  As you can see, the history is
	nice and linear.</para>

      <informalfigure id="fig.undo.backout">
	<mediaobject><imageobject><imagedata
				    fileref="images/undo-simple.png"/></imageobject><textobject><phrase>XXX
	      add text</phrase></textobject><caption><para>Backing out
	      a change using the <command role="hg-cmd">hg
		backout</command>
	      command</para></caption></mediaobject>

      </informalfigure>

    </sect2>
    <sect2>
      <title>Backing out a non-tip change</title>

      <para>If you want to back out a change other than the last one
	you committed, pass the <option
	  role="hg-opt-backout">--merge</option> option to the
	<command role="hg-cmd">hg backout</command> command.</para>

      &interaction.backout.non-tip.clone;

      <para>This makes backing out any changeset a
	<quote>one-shot</quote> operation that's usually simple and
	fast.</para>

      &interaction.backout.non-tip.backout;

      <para>If you take a look at the contents of
	<filename>myfile</filename> after the backout finishes, you'll
	see that the first and third changes are present, but not the
	second.</para>

      &interaction.backout.non-tip.cat;

      <para>As the graphical history in figure <xref
	  linkend="fig.undo.backout-non-tip"/> illustrates, Mercurial
	actually commits <emphasis>two</emphasis> changes in this kind
	of situation (the box-shaped nodes are the ones that Mercurial
	commits automatically).  Before Mercurial begins the backout
	process, it first remembers what the current parent of the
	working directory is.  It then backs out the target changeset,
	and commits that as a changeset.  Finally, it merges back to
	the previous parent of the working directory, and commits the
	result of the merge.</para>

      <!-- TODO: to me it looks like mercurial doesn't commit the
	second merge automatically! -->

      <informalfigure id="fig.undo.backout-non-tip">
	<mediaobject><imageobject><imagedata
				    fileref="images/undo-non-tip.png"/></imageobject><textobject><phrase>XXX
	      add text</phrase></textobject><caption><para>Automated
	      backout of a non-tip change using the <command
		role="hg-cmd">hg backout</command>
	      command</para></caption></mediaobject>
      </informalfigure>

      <para>The result is that you end up <quote>back where you
	  were</quote>, only with some extra history that undoes the
	effect of the changeset you wanted to back out.</para>

      <sect3>
	<title>Always use the <option
	    role="hg-opt-backout">--merge</option> option</title>

	<para>In fact, since the <option
	    role="hg-opt-backout">--merge</option> option will do the
	  <quote>right thing</quote> whether or not the changeset
	  you're backing out is the tip (i.e. it won't try to merge if
	  it's backing out the tip, since there's no need), you should
	  <emphasis>always</emphasis> use this option when you run the
	  <command role="hg-cmd">hg backout</command> command.</para>

      </sect3>
    </sect2>
    <sect2>
      <title>Gaining more control of the backout process</title>

      <para>While I've recommended that you always use the <option
	  role="hg-opt-backout">--merge</option> option when backing
	out a change, the <command role="hg-cmd">hg backout</command>
	command lets you decide how to merge a backout changeset.
	Taking control of the backout process by hand is something you
	will rarely need to do, but it can be useful to understand
	what the <command role="hg-cmd">hg backout</command> command
	is doing for you automatically.  To illustrate this, let's
	clone our first repository, but omit the backout change that
	it contains.</para>

      &interaction.backout.manual.clone;

      <para>As with our
	earlier example, we'll commit a third changeset, then back out
	its parent, and see what happens.</para>

      &interaction.backout.manual.backout;

      <para>Our new changeset is again a descendant of the changeset
	we backed out; it's thus a new head, <emphasis>not</emphasis>
	a descendant of the changeset that was the tip.  The <command
	  role="hg-cmd">hg backout</command> command was quite
	explicit in telling us this.</para>

      &interaction.backout.manual.log;

      <para>Again, it's easier to see what has happened by looking at
	a graph of the revision history, in figure <xref
	  linkend="fig.undo.backout-manual"/>.  This makes it clear
	that when we use <command role="hg-cmd">hg backout</command>
	to back out a change other than the tip, Mercurial adds a new
	head to the repository (the change it committed is
	box-shaped).</para>

      <informalfigure id="fig.undo.backout-manual">
	<mediaobject><imageobject><imagedata
				    fileref="images/undo-manual.png"/></imageobject><textobject><phrase>XXX
	      add text</phrase></textobject><caption><para>Backing out
	      a change using the <command role="hg-cmd">hg
		backout</command>
	      command</para></caption></mediaobject>

      </informalfigure>

      <para>After the <command role="hg-cmd">hg backout</command>
	command has completed, it leaves the new
	<quote>backout</quote> changeset as the parent of the working
	directory.</para>

      &interaction.backout.manual.parents;

      <para>Now we have two isolated sets of changes.</para>

      &interaction.backout.manual.heads;

      <para>Let's think about what we expect to see as the contents of
	<filename>myfile</filename> now.  The first change should be
	present, because we've never backed it out.  The second change
	should be missing, as that's the change we backed out.  Since
	the history graph shows the third change as a separate head,
	we <emphasis>don't</emphasis> expect to see the third change
	present in <filename>myfile</filename>.</para>

      &interaction.backout.manual.cat;

      <para>To get the third change back into the file, we just do a
	normal merge of our two heads.</para>

      &interaction.backout.manual.merge;

      <para>Afterwards, the graphical history of our repository looks
	like figure
	<xref linkend="fig.undo.backout-manual-merge"/>.</para>

      <informalfigure id="fig.undo.backout-manual-merge">
	<mediaobject><imageobject><imagedata
				    fileref="images/undo-manual-merge.png"/></imageobject><textobject><phrase>XXX
	      add text</phrase></textobject><caption><para>Manually
	      merging a backout change</para></caption></mediaobject>

      </informalfigure>

    </sect2>
    <sect2>
      <title>Why <command role="hg-cmd">hg backout</command> works as
	it does</title>

      <para>Here's a brief description of how the <command
	  role="hg-cmd">hg backout</command> command works.</para>
      <orderedlist>
	<listitem><para>It ensures that the working directory is
	    <quote>clean</quote>, i.e. that the output of <command
	      role="hg-cmd">hg status</command> would be empty.</para>
	</listitem>
	<listitem><para>It remembers the current parent of the working
	    directory.  Let's call this changeset
	    <literal>orig</literal>.</para>
	</listitem>
	<listitem><para>It does the equivalent of a <command
	      role="hg-cmd">hg update</command> to sync the working
	    directory to the changeset you want to back out.  Let's
	    call this changeset <literal>backout</literal>.</para>
	</listitem>
	<listitem><para>It finds the parent of that changeset.  Let's
	    call that changeset <literal>parent</literal>.</para>
	</listitem>
	<listitem><para>For each file that the
	    <literal>backout</literal> changeset affected, it does the
	    equivalent of a <command role="hg-cmd">hg revert -r
	      parent</command> on that file, to restore it to the
	    contents it had before that changeset was
	    committed.</para>
	</listitem>
	<listitem><para>It commits the result as a new changeset.
	    This changeset has <literal>backout</literal> as its
	    parent.</para>
	</listitem>
	<listitem><para>If you specify <option
	      role="hg-opt-backout">--merge</option> on the command
	    line, it merges with <literal>orig</literal>, and commits
	    the result of the merge.</para>
	</listitem></orderedlist>
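
      <para>The steps above can be traced in a toy model.  This is a
	sketch under heavy simplifying assumptions, not Mercurial's
	implementation: there is a single file, a changeset is just a
	parent pointer plus the set of lines in that file, and the
	merge is a naive three-way merge on sets.  The
	<literal>Repo</literal> class and the changeset names are
	invented for the example.  Step 1 has no counterpart here,
	because the model has no uncommitted working directory
	state.</para>

<programlisting><![CDATA[
```python
class Repo:
    """Toy history: one file, each changeset stores that file's set of lines."""

    def __init__(self):
        self.parents = {}    # changeset id -> tuple of parent ids
        self.contents = {}   # changeset id -> frozenset of lines
        self.working_parent = None

    def commit(self, cid, lines, parents=None):
        if parents is None:
            parents = () if self.working_parent is None else (self.working_parent,)
        self.parents[cid] = tuple(parents)
        self.contents[cid] = frozenset(lines)
        self.working_parent = cid
        return cid

    def backout(self, target, merge=False):
        # Assumes the target changeset has a parent (it is not the root).
        orig = self.working_parent                  # step 2: remember "orig"
        parent = self.parents[target][0]            # steps 3-4: find "parent"
        reverted = self.contents[parent]            # step 5: restore parent's contents
        new = self.commit("backout-" + target, reverted,
                          parents=(target,))        # step 6: child of "backout"
        if merge and orig != target:                # step 7: merge with "orig"
            base = self.contents[target]            # common ancestor of both heads
            ours, theirs = self.contents[new], self.contents[orig]
            merged = (ours & theirs) | (ours - base) | (theirs - base)
            new = self.commit("merge-" + target, merged, parents=(new, orig))
        return new


repo = Repo()
repo.commit("c1", {"first change"})
repo.commit("c2", {"first change", "second change"})
repo.commit("c3", {"first change", "second change", "third change"})
tip = repo.backout("c2", merge=True)
print(sorted(repo.contents[tip]))  # ['first change', 'third change']
```
]]></programlisting>

      <para>Because the backout changeset is committed as a child of
	the changeset being backed out, the merge in the last step has
	a well-defined common ancestor to work from.  In this model
	that is what lets the third change survive while the second
	one disappears.</para>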

      <para>An alternative way to implement the <command
	  role="hg-cmd">hg backout</command> command would be to
	<command role="hg-cmd">hg export</command> the
	to-be-backed-out changeset as a diff, then use the <option
	  role="cmd-opt-patch">--reverse</option> option to the
	<command>patch</command> command to reverse the effect of the
	change without fiddling with the working directory.  This
	sounds much simpler, but it would not work nearly as
	well.</para>

      <para>The reason that <command role="hg-cmd">hg
	  backout</command> does an update, a commit, a merge, and
	another commit is to give the merge machinery the best chance
	to do a good job when dealing with all the changes
	<emphasis>between</emphasis> the change you're backing out and
	the current tip.</para>

      <para>If you're backing out a changeset that's 100 revisions
	back in your project's history, the chances that the
	<command>patch</command> command will be able to apply a
	reverse diff cleanly are not good, because intervening changes
	are likely to have <quote>broken the context</quote> that
	<command>patch</command> uses to determine whether it can
	apply a patch (if this sounds like gibberish, see section <xref
	  linkend="sec.mq.patch"/> for a
	discussion of the <command>patch</command> command).  Also,
	Mercurial's merge machinery will handle files and directories
	being renamed, permission changes, and modifications to binary
	files, none of which <command>patch</command> can deal
	with.</para>

    </sect2>
  </sect1>
  <sect1 id="sec.undo.aaaiiieee">
    <title>Changes that should never have been</title>

    <para>Most of the time, the <command role="hg-cmd">hg
	backout</command> command is exactly what you need if you want
      to undo the effects of a change.  It leaves a permanent record
      of exactly what you did, both when committing the original
      changeset and when you cleaned up after it.</para>

    <para>On rare occasions, though, you may find that you've
      committed a change that really should not be present in the
      repository at all.  For example, it would be very unusual, and
      usually considered a mistake, to commit a software project's
      object files as well as its source files.  Object files have
      almost no intrinsic value, and they're <emphasis>big</emphasis>,
      so they increase the size of the repository and the amount of
      time it takes to clone or pull changes.</para>

    <para>Before I discuss the options that you have if you commit a
      <quote>brown paper bag</quote> change (the kind that's so bad
      that you want to pull a brown paper bag over your head), let me
      first discuss some approaches that probably won't work.</para>

    <para>Since Mercurial treats history as accumulative&emdash;every
      change builds on top of all changes that preceded it&emdash;you
      generally can't just make disastrous changes disappear.  The one
      exception is when you've just committed a change, and it hasn't
      been pushed or pulled into another repository.  That's when you
      can safely use the <command role="hg-cmd">hg rollback</command>
      command, as I detailed in section <xref
	linkend="sec.undo.rollback"/>.</para>

    <para>After you've pushed a bad change to another repository, you
      <emphasis>could</emphasis> still use <command role="hg-cmd">hg
	rollback</command> to make your local copy of the change
      disappear, but it won't have the consequences you want.  The
      change will still be present in the remote repository, so it
      will reappear in your local repository the next time you
      pull.</para>

    <para>If a situation like this arises, and you know which
      repositories your bad change has propagated into, you can
      <emphasis>try</emphasis> to get rid of the change from
      <emphasis>every</emphasis> one of those repositories.  This is,
      of course, not a satisfactory solution: if you miss even a
      single repository while you're expunging, the change is still
      <quote>in the wild</quote>, and could propagate further.</para>

    <para>If you've committed one or more changes
      <emphasis>after</emphasis> the change that you'd like to see
      disappear, your options are further reduced. Mercurial doesn't
      provide a way to <quote>punch a hole</quote> in history, leaving
      the surrounding changesets intact.</para>

    <para>XXX This needs filling out.  The
      <literal>hg-replay</literal> script in the
      <literal>examples</literal> directory works, but doesn't handle
      merge changesets.  Kind of an important omission.</para>

    <sect2>
      <title>Protect yourself from <quote>escaped</quote>
	changes</title>

      <para>If you've committed some changes to your local repository
	and they've been pushed or pulled somewhere else, this isn't
	necessarily a disaster.  You can protect yourself ahead of
	time against some classes of bad changeset.  This is
	particularly easy if your team usually pulls changes from a
	central repository.</para>

      <para>By configuring some hooks on that repository to validate
	incoming changesets (see chapter <xref linkend="chap.hook"/>),
	you can
	automatically prevent some kinds of bad changeset from being
	pushed to the central repository at all.  With such a
	configuration in place, some kinds of bad changeset will
	naturally tend to <quote>die out</quote> because they can't
	propagate into the central repository.  Better yet, this
	happens without any need for explicit intervention.</para>

      <para>For instance, an incoming change hook that verifies that a
	changeset will actually compile can prevent people from
	inadvertently <quote>breaking the build</quote>.</para>
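
      <para>A build check is too project-specific to show here, so
	the sketch below swaps in a simpler validation: it refuses
	incoming changesets that touch files that look like build
	products.  It is written as an in-process Python hook and
	assumes a Mercurial version in which file names and messages
	are byte strings.  The function names and the suffix list are
	my own choices for illustration.  One plausible way to enable
	it would be an entry such as
	<literal>pretxnchangegroup.no_objects =
	  python:/path/to/hooks.py:reject_object_files</literal> in the
	<literal>[hooks]</literal> section of the central repository's
	configuration, where the path is hypothetical.</para>

<programlisting><![CDATA[
```python
# Illustrative suffixes for files that should not be committed (an assumption).
FORBIDDEN_SUFFIXES = (b".o", b".obj", b".pyc")


def forbidden_files(filenames):
    """Return the file names that look like build products."""
    return [name for name in filenames if name.endswith(FORBIDDEN_SUFFIXES)]


def reject_object_files(ui, repo, node=None, **kwargs):
    """Hook body: a true return value makes the hook fail, so the
    incoming group of changesets is rejected."""
    first = repo[node].rev()               # first incoming changeset
    for rev in range(first, len(repo)):    # ...through the newest one
        bad = forbidden_files(repo[rev].files())
        if bad:
            ui.warn(b"rejecting changeset %d: %s\n" % (rev, b", ".join(bad)))
            return True
    return False
```
]]></programlisting>

      <para>Keeping the actual check in a small pure function such as
	<literal>forbidden_files</literal> makes it possible to test
	the policy without a repository at hand.</para>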

    </sect2>
  </sect1>
  <sect1 id="sec.undo.bisect">
    <title>Finding the source of a bug</title>

    <para>While it's all very well to be able to back out a changeset
      that introduced a bug, this requires that you know which
      changeset to back out.  Mercurial provides an invaluable
      command, called <command role="hg-cmd">hg bisect</command>, that
      helps you to automate this process and accomplish it very
      efficiently.</para>

    <para>The idea behind the <command role="hg-cmd">hg
	bisect</command> command is that a changeset has introduced
      some change of behaviour that you can identify with a simple
      binary test.  You don't know which piece of code introduced the
      change, but you know how to test for the presence of the bug.
      The <command role="hg-cmd">hg bisect</command> command uses your
      test to direct its search for the changeset that introduced the
      code that caused the bug.</para>

    <para>Here are a few scenarios to help you understand how you
      might apply this command.</para>
    <itemizedlist>
      <listitem><para>The most recent version of your software has a
	  bug that you remember wasn't present a few weeks ago, but
	  you don't know when it was introduced.  Here, your binary
	  test checks for the presence of that bug.</para>
      </listitem>
      <listitem><para>You fixed a bug in a rush, and now it's time to
	  close the entry in your team's bug database.  The bug
	  database requires a changeset ID when you close an entry,
	  but you don't remember which changeset you fixed the bug in.
	  Once again, your binary test checks for the presence of the
	  bug.</para>
      </listitem>
      <listitem><para>Your software works correctly, but runs 15%
	  slower than the last time you measured it.  You want to know
	  which changeset introduced the performance regression.  In
	  this case, your binary test measures the performance of your
	  software, to see whether it's <quote>fast</quote> or
	  <quote>slow</quote>.</para>
      </listitem>
      <listitem><para>The sizes of the components of your project that
	  you ship exploded recently, and you suspect that something
	  changed in the way you build your project.</para>
      </listitem></itemizedlist>

    <para>From these examples, it should be clear that the <command
	role="hg-cmd">hg bisect</command> command is not useful only
      for finding the sources of bugs.  You can use it to find any
      <quote>emergent property</quote> of a repository (anything that
      you can't find from a simple text search of the files in the
      tree) for which you can write a binary test.</para>

    <para>We'll introduce a little bit of terminology here, just to
      make it clear which parts of the search process are your
      responsibility, and which are Mercurial's.  A
      <emphasis>test</emphasis> is something that
      <emphasis>you</emphasis> run when <command role="hg-cmd">hg
	bisect</command> chooses a changeset.  A
      <emphasis>probe</emphasis> is what <command role="hg-cmd">hg
	bisect</command> runs to tell whether a revision is good.
      Finally, we'll use the word <quote>bisect</quote>, as both a
      noun and a verb, to stand in for the phrase <quote>search using
	the <command role="hg-cmd">hg bisect</command>
	command</quote>.</para>

    <para>One simple way to automate the searching process would be
      simply to probe every changeset.  However, this scales poorly.
      If it took ten minutes to test a single changeset, and you had
      10,000 changesets in your repository, the exhaustive approach
      would take on average 35 <emphasis>days</emphasis> to find the
      changeset that introduced a bug.  Even if you knew that the bug
      was introduced by one of the last 500 changesets, and limited
      your search to those, you'd still be looking at over 40 hours to
      find the changeset that introduced your bug.</para>

    <para>What the <command role="hg-cmd">hg bisect</command> command
      does is use its knowledge of the <quote>shape</quote> of your
      project's revision history to perform a search in time
      proportional to the <emphasis>logarithm</emphasis> of the number
      of changesets to check (the kind of search it performs is called
      a dichotomic search).  With this approach, searching through
      10,000 changesets will take less than three hours, even at ten
      minutes per test (the search will require about 14 tests).
      Limit your search to the last hundred changesets, and it will
      take only about an hour (roughly seven tests).</para>
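
    <para>The arithmetic behind these figures is easy to reproduce.
      The following sketch assumes a strictly linear history, in
      which each test halves the range of candidates; a real search
      over a branchy history may need a test more or less.  The names
      <literal>exhaustive_minutes</literal> and
      <literal>bisect_tests</literal> are invented for this
      illustration.</para>

    <programlisting>import math

MINUTES_PER_TEST = 10

def exhaustive_minutes(changesets):
    # A linear scan finds the culprit, on average, halfway through.
    return changesets / 2 * MINUTES_PER_TEST

def bisect_tests(changesets):
    # Each test halves the candidates, so about log2(n) tests are needed.
    return math.ceil(math.log2(changesets))

for n in (10000, 500, 100):
    linear_hours = exhaustive_minutes(n) / 60
    bisect_hours = bisect_tests(n) * MINUTES_PER_TEST / 60
    print("%6d changesets: %7.1f hours exhaustive, %2d tests (%.1f hours) bisecting"
          % (n, linear_hours, bisect_tests(n), bisect_hours))
</programlisting>

    <para>Under these assumptions, 10,000 changesets need 14 tests and
      100 changesets need 7, matching the estimates above.</para>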

    <para>The <command role="hg-cmd">hg bisect</command> command is
      aware of the <quote>branchy</quote> nature of a Mercurial
      project's revision history, so it has no problems dealing with
      branches, merges, or multiple heads in a repository.  It can
      prune entire branches of history with a single probe, which is
      how it operates so efficiently.</para>

    <sect2>
      <title>Using the <command role="hg-cmd">hg bisect</command>
	command</title>

      <para>Here's an example of <command role="hg-cmd">hg
	  bisect</command> in action.</para>

      <note>
	<para>In versions 0.9.5 and earlier of Mercurial, <command
	    role="hg-cmd">hg bisect</command> was not a core command:
	  it was distributed with Mercurial as an extension. This
	  section describes the built-in command, not the old
	  extension.</para>
      </note>

      <para>Now let's create a repository, so that we can try out the
	<command role="hg-cmd">hg bisect</command> command in
	isolation.</para>

      &interaction.bisect.init;

      <para>We'll simulate a project that has a bug in it in a
	simple-minded way: create trivial changes in a loop, and
	nominate one specific change that will have the
	<quote>bug</quote>.  This loop creates 35 changesets, each
	adding a single file to the repository. We'll represent our
	<quote>bug</quote> with a file that contains the text <quote>i
	  have a gub</quote>.</para>

      &interaction.bisect.commits;

      <para>The next thing that we'd like to do is figure out how to
	use the <command role="hg-cmd">hg bisect</command> command.
	We can use Mercurial's normal built-in help mechanism for
	this.</para>

      &interaction.bisect.help;

      <para>The <command role="hg-cmd">hg bisect</command> command
	works in steps.  Each step proceeds as follows.</para>
      <orderedlist>
	<listitem><para>You run your binary test.</para>
	  <itemizedlist>
	    <listitem><para>If the test succeeded, you tell <command
		  role="hg-cmd">hg bisect</command> by running the
		<command role="hg-cmd">hg bisect --good</command>
		command.</para>
	    </listitem>
	    <listitem><para>If it failed, run the <command
		  role="hg-cmd">hg bisect --bad</command>
		command.</para></listitem></itemizedlist>
	</listitem>
	<listitem><para>The command uses your information to decide
	    which changeset to test next.</para>
	</listitem>
	<listitem><para>It updates the working directory to that
	    changeset, and the process begins again.</para>
	</listitem></orderedlist>
      <para>The process ends when <command role="hg-cmd">hg
	  bisect</command> identifies a unique changeset that marks
	the point where your test transitioned from
	<quote>succeeding</quote> to <quote>failing</quote>.</para>
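
      <para>On a linear history, the loop described above amounts to
	a binary search.  The following is a minimal sketch of that
	idea, not Mercurial's actual implementation, which must also
	cope with branches and merges.  The function
	<literal>bisect</literal>, the callback
	<literal>is_bad</literal>, and the choice of revision 27 as
	the first bad revision are all invented for
	illustration.</para>

      <programlisting>def bisect(good, bad, is_bad):
    """Return (first bad revision, revisions tested) on a linear history."""
    tested = []
    # Stop once the known-good and known-bad revisions are adjacent.
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if is_bad(mid):
            bad = mid      # the transition is at or before mid
        else:
            good = mid     # the transition is after mid
    return bad, tested

FIRST_BAD = 27  # hypothetical revision that introduced the bug

culprit, tested = bisect(10, 34, lambda rev: rev >= FIRST_BAD)
print(culprit, tested)
</programlisting>

      <para>Bracketed between revision 10 (good) and revision 34
	(bad), this sketch tests revisions 22, 28, 25, 26 and 27, and
	reports revision 27 after five tests.</para>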

      <para>To start the search from a clean slate, we first run the
	<command role="hg-cmd">hg bisect --reset</command> command,
	which discards any state left over from an earlier
	search.</para>

      &interaction.bisect.search.init;

      <para>In our case, the binary test we use is simple: we check to
	see if any file in the repository contains the string <quote>i
	  have a gub</quote>.  If it does, this changeset contains the
	change that <quote>caused the bug</quote>.  By convention, a
	changeset that has the property we're searching for is
	<quote>bad</quote>, while one that doesn't is
	<quote>good</quote>.</para>

      <para>Most of the time, the revision to which the working
	directory is synced (usually the tip) already exhibits the
	problem introduced by the buggy change, so we'll mark it as
	<quote>bad</quote>.</para>

      &interaction.bisect.search.bad-init;

      <para>Our next task is to nominate a changeset that we know
	<emphasis>doesn't</emphasis> have the bug; the <command
	  role="hg-cmd">hg bisect</command> command will
	<quote>bracket</quote> its search between the first pair of
	good and bad changesets.  In our case, we know that revision
	10 didn't have the bug.  (I'll have more words about choosing
	the first <quote>good</quote> changeset later.)</para>

      &interaction.bisect.search.good-init;

      <para>Notice that this command printed some output.</para>
      <itemizedlist>
	<listitem><para>It told us how many changesets it must
	    consider before it can identify the one that introduced
	    the bug, and how many tests that will require.</para>
	</listitem>
	<listitem><para>It updated the working directory to the next
	    changeset to test, and told us which changeset it's
	    testing.</para>
	</listitem></itemizedlist>

      <para>We now run our test in the working directory.  We use the
	<command>grep</command> command to see if our
	<quote>bad</quote> file is present in the working directory.
	If it is, this revision is bad; if not, this revision is
	good.</para>

      &interaction.bisect.search.step1;

      <para>This test looks like a perfect candidate for automation,
	so let's turn it into a shell function.</para>
      &interaction.bisect.search.mytest;

      <para>We can now run an entire test step with a single command,
	<literal>mytest</literal>.</para>

      &interaction.bisect.search.step2;

      <para>A few more invocations of our canned test step command,
	and we're done.</para>

      &interaction.bisect.search.rest;

      <para>Even though our repository contains 35 changesets, the
	<command role="hg-cmd">hg bisect</command> command let us find
	the changeset that introduced our <quote>bug</quote> with only
	five tests.  Because the number of tests that the <command
	  role="hg-cmd">hg bisect</command> command performs grows
	logarithmically with the number of changesets to search, the
	advantage that it has over the <quote>brute force</quote>
	search approach increases with every changeset you add.</para>

    </sect2>
    <sect2>
      <title>Cleaning up after your search</title>

      <para>When you're finished using the <command role="hg-cmd">hg
	  bisect</command> command in a repository, you can use the
	<command role="hg-cmd">hg bisect --reset</command> command to
	drop the information it was using to drive your search.  That
	information doesn't take up much space, so it doesn't matter
	if you forget to run this command.  However, <command
	  role="hg-cmd">hg bisect</command> won't let you start a new
	search in that repository until you do a <command
	  role="hg-cmd">hg bisect --reset</command>.</para>

      &interaction.bisect.search.reset;

    </sect2>
  </sect1>
  <sect1>
    <title>Tips for finding bugs effectively</title>

    <sect2>
      <title>Give consistent input</title>

      <para>The <command role="hg-cmd">hg bisect</command> command
	requires that you correctly report the result of every test
	you perform.  If you tell it that a test failed when it really
	succeeded, it <emphasis>might</emphasis> be able to detect the
	inconsistency.  If it can identify an inconsistency in your
	reports, it will tell you that a particular changeset is both
	good and bad. However, it can't do this perfectly; it's about
	as likely to report the wrong changeset as the source of the
	bug.</para>
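
      <para>To see how a single wrong answer can silently derail a
	search, consider this small simulation.  It assumes a linear
	history and a plain binary search, which is a simplification
	of what Mercurial does; the functions
	<literal>honest</literal> and <literal>one_slip</literal>,
	and the revision numbers, are made up for the example.</para>

      <programlisting>def bisect(good, bad, answer):
    """Binary search between a good and a bad revision, trusting answer()."""
    while bad - good > 1:
        mid = (good + bad) // 2
        if answer(mid) == "bad":
            bad = mid
        else:
            good = mid
    return bad

FIRST_BAD = 27  # hypothetical revision that introduced the bug

def honest(rev):
    return "bad" if rev >= FIRST_BAD else "good"

def one_slip(rev):
    # The operator mistakenly reports revision 28 as good.
    if rev == 28:
        return "good"
    return honest(rev)

print(bisect(10, 34, honest))    # 27
print(bisect(10, 34, one_slip))  # 29
</programlisting>

      <para>The honest search reports revision 27.  With one slip, the
	search reports revision 29 instead, and nothing in the
	process signals that anything went wrong.</para>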

    </sect2>
    <sect2>
      <title>Automate as much as possible</title>

      <para>When I started using the <command role="hg-cmd">hg
	  bisect</command> command, I tried a few times to run my
	tests by hand, on the command line.  This is an approach that
	I, at least, am not suited to.  After a few tries, I found
	that I was making enough mistakes that I was having to restart
	my searches several times before finally getting correct
	results.</para>

      <para>My initial problems with driving the <command
	  role="hg-cmd">hg bisect</command> command by hand occurred
	even with simple searches on small repositories; if the
	problem you're looking for is more subtle, or the number of
	tests that <command role="hg-cmd">hg bisect</command> must
	perform increases, the likelihood of operator error ruining
	the search is much higher.  Once I started automating my
	tests, I had much better results.</para>

      <para>The key to automated testing is twofold:</para>
      <itemizedlist>
	<listitem><para>always test for the same symptom, and</para>
	</listitem>
	<listitem><para>always feed consistent input to the <command
	      role="hg-cmd">hg bisect</command> command.</para>
	</listitem></itemizedlist>
      <para>In my tutorial example above, the <command>grep</command>
	command tests for the symptom, and the <literal>if</literal>
	statement takes the result of this check and ensures that we
	always feed the same input to the <command role="hg-cmd">hg
	  bisect</command> command.  The <literal>mytest</literal>
	function marries these together in a reproducible way, so that
	every test is uniform and consistent.</para>
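
      <para>If you'd rather script your test step in Python than in
	the shell, the same pairing of a symptom check with a fixed
	report might look something like the sketch below.  The names
	<literal>verdict</literal> and <literal>test_step</literal>
	are invented here, and the sketch assumes that
	<command>hg</command> is on your search path.</para>

      <programlisting>import os
import subprocess

MARKER = "i have a gub"

def verdict(root="."):
    """Return "bad" if any file under root contains MARKER, else "good"."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Don't descend into Mercurial's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                if MARKER in handle.read():
                    return "bad"
    return "good"

def test_step(root="."):
    """Run one bisect step: test the working directory, then tell hg."""
    result = verdict(root)
    print("this revision is", result)
    # The same result always maps to the same flag, so the input to
    # hg bisect stays consistent from one step to the next.
    return subprocess.call(["hg", "bisect", "--" + result], cwd=root)
</programlisting>

      <para>Keep a script like this outside the repository's working
	directory: it contains the marker string itself, so a copy
	inside the tree would make every revision test as
	bad.</para>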

    </sect2>
    <sect2>
      <title>Check your results</title>

      <para>Because the output of a <command role="hg-cmd">hg
	  bisect</command> search is only as good as the input you
	give it, don't take the changeset it reports as the absolute
	truth.  A simple way to cross-check its report is to manually
	run your test at each of the following changesets:</para>
      <itemizedlist>
	<listitem><para>The changeset that it reports as the first bad
	    revision.  Your test should still report this as
	    bad.</para>
	</listitem>
	<listitem><para>The parent of that changeset (either parent,
	    if it's a merge). Your test should report this changeset
	    as good.</para>
	</listitem>
	<listitem><para>A child of that changeset.  Your test should
	    report this changeset as bad.</para>
	</listitem></itemizedlist>
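
      <para>The three checks above are simple enough to write down
	as a small helper.  This is only a sketch: the function
	<literal>cross_check</literal> and the
	<literal>is_bad</literal> callback are hypothetical, and
	stand in for however you run your real test at a given
	revision.</para>

      <programlisting>def cross_check(first_bad, parents, children, is_bad):
    """Return a list of complaints; an empty list means the report holds up."""
    complaints = []
    if not is_bad(first_bad):
        complaints.append("reported changeset %s tests good" % first_bad)
    for rev in parents:
        if is_bad(rev):
            complaints.append("parent %s tests bad" % rev)
    for rev in children:
        if not is_bad(rev):
            complaints.append("child %s tests good" % rev)
    return complaints

def is_bad(rev):
    # Hypothetical linear history in which revision 27 introduced the bug.
    return rev >= 27

print(cross_check(27, [26], [28], is_bad))  # []
print(cross_check(29, [28], [30], is_bad))  # ['parent 28 tests bad']
</programlisting>

      <para>A bad parent, as in the second call, tells you that the
	true culprit lies earlier in history than the changeset that
	was reported.</para>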

    </sect2>
    <sect2>
      <title>Beware interference between bugs</title>

      <para>It's possible that your search for one bug could be
	disrupted by the presence of another.  For example, let's say
	your software crashes at revision 100, and worked correctly at
	revision 50.  Unknown to you, someone else introduced a
	different crashing bug at revision 60, and fixed it at
	revision 80.  This could distort your results in one of
	several ways.</para>

      <para>It is possible that this other bug completely
	<quote>masks</quote> yours, which is to say that it occurs
	before your bug has a chance to manifest itself.  If you can't
	avoid that other bug (for example, it prevents your project
	from building), and so can't tell whether your bug is present
	in a particular changeset, the <command role="hg-cmd">hg
	  bisect</command> command cannot help you directly.  Instead,
	you can mark a changeset as untested by running <command
	  role="hg-cmd">hg bisect --skip</command>.</para>

      <para>A different problem could arise if your test for a bug's
	presence is not specific enough.  If you check for <quote>my
	  program crashes</quote>, then both your crashing bug and an
	unrelated crashing bug that masks it will look like the same
	thing, and mislead <command role="hg-cmd">hg
	  bisect</command>.</para>

      <para>Another useful situation in which to use <command
	  role="hg-cmd">hg bisect --skip</command> is if you can't
	test a revision because your project was in a broken and hence
	untestable state at that revision, perhaps because someone
	checked in a change that prevented the project from
	building.</para>
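
      <para>An automated test can make this three-way distinction
	itself, instead of lumping <quote>won't build</quote> in
	with <quote>bad</quote>.  The sketch below is one way to do
	so; <literal>classify</literal> is an invented name, and the
	two stand-in commands merely exit with a fixed status in
	place of your real build and test commands.</para>

      <programlisting>import subprocess
import sys

def classify(build_cmd, test_cmd):
    """Return "skip" if the build fails, else "good" or "bad" from the test."""
    if subprocess.call(build_cmd) != 0:
        # An unbuildable revision says nothing about the bug either way.
        return "skip"
    return "good" if subprocess.call(test_cmd) == 0 else "bad"

# Stand-ins for real build and test commands (hypothetical).
OK = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]

print(classify(FAIL, OK))  # skip
print(classify(OK, FAIL))  # bad
print(classify(OK, OK))    # good
</programlisting>

      <para>Each result corresponds directly to one of the
	<option>--skip</option>, <option>--bad</option> and
	<option>--good</option> options of <command
	  role="hg-cmd">hg bisect</command>.</para>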

    </sect2>
    <sect2>
      <title>Bracket your search lazily</title>

      <para>Choosing the first <quote>good</quote> and
	<quote>bad</quote> changesets that will mark the end points of
	your search is often easy, but it bears a little discussion
	nevertheless.  From the perspective of <command
	  role="hg-cmd">hg bisect</command>, the <quote>newest</quote>
	changeset is conventionally <quote>bad</quote>, and the older
	changeset is <quote>good</quote>.</para>

      <para>If you're having trouble remembering when a suitable
	<quote>good</quote> change was, so that you can tell <command
	  role="hg-cmd">hg bisect</command>, you could do worse than
	testing changesets at random.  Just remember to eliminate
	contenders that can't possibly exhibit the bug (perhaps
	because the feature with the bug isn't present yet) and those
	where another problem masks the bug (as I discussed
	above).</para>

      <para>Even if you end up <quote>early</quote> by thousands of
	changesets or months of history, you will only add a handful
	of tests to the total number that <command role="hg-cmd">hg
	  bisect</command> must perform, thanks to its logarithmic
	behaviour.</para>

    </sect2>
  </sect1>
</chapter>

<!--
local variables:
sgml-parent-document: ("00book.xml" "book" "chapter")
end:
-->
author	Dongsheng Song <dongsheng.song@gmail.com>
date	Thu, 12 Mar 2009 15:53:01 +0800