Documentation for matchit.vim

Introduction

In Vim, as in plain vi, the percent key, %, jumps the cursor from a brace, bracket, or paren to its match. (For details, type :help % from within Vim.) The characters it recognizes can be configured with the 'matchpairs' option (:help 'matchpairs'). The script matchit.vim extends this in several ways:
  • You can match whole words, such as if and endif, not just single characters.
  • You can define groups with more than two words, such as if, else, endif. Banging on the % will cycle from the if to the first else, the next else, ..., the closing endif, and back to the opening if. Nested structures are skipped.
  • By default, words inside comments are ignored, unless the cursor is inside a comment when you type %. If the only thing you want to do is modify the behavior of % so that it treats comments this way, you can
    :let b:match_words = &matchpairs
    and source the script. See the description of comments for details.
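To make the "cycle through the group, skip nested structures" behavior concrete, here is a minimal Python sketch. It is not part of matchit.vim, which is written in Vim script and searches the buffer directly (see the detailed description of the script). The sketch makes three simplifying assumptions: the keywords have already been pulled out of the text into a list, comments are ignored, and a stack pairs each if with its own else and endif lines. The function name percent_targets is hypothetical.

```python
from typing import Dict, List, Tuple


def percent_targets(tokens: List[str],
                    opener: str = "if",
                    middles: Tuple[str, ...] = ("else",),
                    closer: str = "endif") -> Dict[int, int]:
    """Map the index of each keyword to the index that % would jump to."""
    targets: Dict[int, int] = {}
    stack: List[List[int]] = []  # indices of the keywords in each open block
    for i, tok in enumerate(tokens):
        if tok == opener:
            stack.append([i])  # start a new block, possibly nested
        elif tok in middles and stack:
            stack[-1].append(i)  # an else belongs to the innermost open block
        elif tok == closer and stack:
            block = stack.pop() + [i]  # the complete block: if, else, ..., endif
            # Each keyword jumps to the next one; the endif wraps around to the if.
            for here, there in zip(block, block[1:] + block[:1]):
                targets[here] = there
    return targets


# The inner if/endif (indices 2 and 3) is skipped by the outer cycle.
print(percent_targets(["if", "else", "if", "endif", "else", "endif"]))
# {2: 3, 3: 2, 0: 1, 1: 4, 4: 5, 5: 0}
```

Unmatched else or endif keywords are simply left out of the result. This sketch does not model what the real script does in that case.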

Currently, the following languages are supported: Ada, Csh, DTD, Entity, Essbase, Fortran, HTML, LaTeX, Pascal, SGML, Shell, Tcsh, Vim, XML.
To support a new language, see below.

Credits

This script was started by Raul Segura Acevedo. Support for comments was added by Douglas Potts. Support for back references and other improvements were made by Benji Fisher, who is the current maintainer (as of April, 2000). Johannes Zellner added support for many languages. Suggestions for improvement, bug reports, and support for additional languages were contributed by Jordi-Albert Batalla, Neil Bird, Mark Collett, Stephen Wall, and Johannes Zellner.

If you use this script and like it, type :help uganda from within Vim.

Installation

First, download matchit.vim. For these instructions, I assume that you save it as $VIM/matchit.vim. (This is appropriate on single-user systems. For multi-user systems, something like $HOME/vim/matchit.vim would be more reasonable.) There is nothing magic about the directory or the name of the file. Note that most users only have to follow Step 2 below.
  1. (optional) If you want to "test drive" before you commit yourself, do
    :source $VIM/matchit.vim
    within Vim. Then open a file with one of the supported file types and start banging on the % key. If you already have the file open, you will have to trigger the autocommand that defines b:match_words, so do
    :set ft=<file type>
    where <file type> is the current file type: tex, html, vim, or whatever.
  2. Add the line
    :source $VIM/matchit.vim
    to your vimrc file. (If you do not have a vimrc file, read :help vimrc within Vim.) It should start working the next time you start Vim. If you are impatient and do not want to restart Vim, see the previous step.
  3. (troubleshooting) What if nothing happens? Check the file matchit.vim and look for a line like
    au FileType html ...
    for the file type you are using. If your file type is not yet supported, go to the section on supporting a new language. If it is supported, return to the file you were editing and type
    :echo b:match_words<CR>
    where <CR> means pressing Enter (a carriage return). If this is not set as indicated in the autocommand, try
    :filetype on
    :set ft=<file type>

    You should consider adding a :filetype on or :syntax on line to your vimrc file (or gvimrc file). If b:match_words is still not set, check that the autocommand has been defined:
    :au Matchit FileType <file type>
    This should show an autocommand to set b:match_words. If it does not, try sourcing matchit.vim again. If the autocommand is there but b:match_words is still not set, I am stumped. Maybe you should replace FileType html in the autocommand with BufRead,BufNewFile *.html or whatever. Maybe it is time to ask for help.

Configuration

There are several variables (and one highlight group) that govern the behavior of matchit.vim. Note that these are variables local to the buffer, not options, so use :let to define them, not :set. Some of these variables have values that matter; for others, it only matters whether the variable has been defined. All of these can be defined in the autocommand that defines b:match_words or "on the fly."

To support a new language

In order for matchit.vim to support a new language, you must define a suitable pattern for b:match_words. You may also want to set some of the other configuration variables, as described above. If you figure out how to support a new language, please send me a copy so that I can include it with future versions of matchit.vim. If your language has a complicated syntax, or many keywords, you will need to know something about Vim's regular expressions, which are very well documented: :help pattern. If you need help, see the section on support, below.
  1. The format for b:match_words is similar to that of the 'matchpairs' option: it is a colon (:)-separated list of groups; each group is a comma (,)-separated list of patterns (regular expressions). It is OK to have only one group; the effect is undefined if a group has only one pattern. A simple example is
    :let b:match_words = '\<if\>,\<endif\>:\<while\>,\<continue\>,\<break\>,\<endwhile\>'
    (In Vim regular expressions, '\<' and '\>' denote word boundaries. Thus 'if' matches the end of "endif" but '\<if\>' does not.) Then banging on the % key will bounce the cursor between "if" and the matching "endif", and from "while" to any matching "continue" or "break", then to the matching "endwhile" and back to the "while".
  2. Once you have defined the appropriate value of b:match_words, you will probably want to have this set automatically each time you edit the appropriate file type. The usual way of doing this is by adding an autocommand, either within the function Match_autocommands() in the file matchit.vim or in some file that is sourced after you source matchit.vim. Continuing the example above, the autocommand should be something like
    :autocmd Matchit FileType myft let b:match_words =
    \ '\<if\>,\<endif\>:\<while\>,\<continue\>,\<break\>,\<endwhile\>'

    If your version of Vim does not support the FileType autocommand event then you can use BufNewFile,BufRead *.myft instead of FileType myft. If you are using Vim 5.3 or earlier then the line continuation here will not work: you will have to modify this to make it a single line, and make similar modifications to matchit.vim.
  3. Be careful that your initial pattern does not match your final pattern. See the example above for the use of word-boundary expressions.
  4. It is usually better to use '.\{-}' (as many as necessary) instead of '.*' (as many as possible). For example, in the string "<tag>label</tag>", '<.*>' matches the whole string whereas '<.\{-}>' and '<[^>]*>' match "<tag>" and "</tag>".
  5. If "if" is to be paired with "end if" (Note the space!) then word boundaries are not enough. Instead, define a regular expression notend that will match anything but "end" and use it as follows:
    let notend = '\(^\s*\|[^d\t ]\s\+\)'
    let b:match_words = notend . '\<if\>,\<end\s\+if\>'
    This is a simplified version of what is done for Ada. For details, including how to do it while making notend a local variable, see the autocommand for Ada in matchit.vim. Similarly, you may want to define a start-of-line regular expression
    :let sol = '\(^\|;\)\s*'
    if keywords are only recognized after the start of a line or after a semicolon (;), with optional white space.
  6. In any group, the expressions '\1', '\2', ..., '\9' refer to parts of the initial pattern enclosed in escaped parentheses, '\(' and '\)'. These are referred to as back references, or backrefs. For example, '\<b\(o\+\)\>,\(h\)\1' means that "bo" pairs with "ho", "boo" pairs with "hoo", and so on. Note that '\1' does not refer to the '\(h\)' in this example. If the parentheses are nested, then '\d' refers to the d-th '\(' and everything up to and including the matching '\)'. In '\(nested\(parentheses\)\)', '\1' refers to everything and '\2' refers to '\(parentheses\)'.

    If you use a variable such as notend or sol (as in the previous paragraph) then remember to count any '\(' patterns in this variable.

    It should be possible to resolve back references from any pattern in the group. For example,
    :let b:match_words = '\(foo\)\(bar\),more\1,and\2,end\1\2'
    would not work because '\2' cannot be determined from "morefoo" and '\1' cannot be determined from "andbar". On the other hand,
    :let b:match_words = '\(\(foo\)\(bar\)\),\3\2,end\1'
    should work (and have the same effect as 'foobar,barfoo,endfoobar'), although this has not been thoroughly tested.

  7. (TODO) The special character '&' means "Put the cursor on the following character." For example, if the keyword "if" must occur at the start of the line, with optional white space, you might use the pattern '^\s*&if' so that the cursor will end on the "i" instead of at the start of the line. For another example, if HTML had only one tag then one could
    :let b:match_words = '<,>:<&tag>,<&/tag>'
    so that % can bounce between matching < and > pairs or (starting on "tag" or "/tag") between matching tags.
  8. If you are having trouble figuring out the appropriate definition of b:match_words then you can take advantage of the same information I use when debugging the script. This is especially true if you are not sure whether your patterns or my script are at fault! To make this more convenient, I have made the command :MatchDebug, which defines the variable b:match_debug and creates a Matchit menu. This menu makes it convenient to check the values of the variables described below. You will probably also want to read the detailed description below of what the script does.

    Defining the variable b:match_debug causes the script to set the following variables each time you hit the % key. Several of these are only defined if b:match_words includes backrefs.

    • b:match_pat: b:match_words with backrefs parsed
    • b:match_match: the bit of text that is recognized as a match
    • b:match_col: the cursor column of the start of the matching text
    • b:match_wholeBR: the comma-separated group of patterns that matches, with backrefs unparsed
    • b:match_iniBR: the first pattern in b:match_wholeBR
    • b:match_ini: the first pattern in b:match_wholeBR, with backrefs resolved from b:match_match
    • b:match_tail: the remaining patterns in b:match_wholeBR, with backrefs resolved from b:match_match
    • b:match_word: the pattern from b:match_wholeBR that matches b:match_match
    • b:match_table: The back reference '\'.d refers to the same thing as '\'.table[d] in b:match_word.
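The format rules in items 1 and 6 can be illustrated with a small Python sketch. It is not how matchit.vim does it, and it makes several assumptions:

- The helpers split_match_words, vim_to_python, and resolve_backrefs are hypothetical.
- It uses Python's re module in place of Vim's regex engine.
- It translates only the few Vim atoms used in these examples (\(, \), \|, \<, \>, \+).
- It resolves back references only from the initial pattern, inserting the captured text as an escaped literal.

```python
import re
from typing import List


def split_match_words(match_words: str) -> List[List[str]]:
    r"""Split a b:match_words value: ':' separates groups, ',' separates patterns."""
    return [group.split(",") for group in match_words.split(":")]


def vim_to_python(pat: str) -> str:
    r"""Translate only the Vim atoms used in these examples: \( \) \| \< \> \+."""
    for vim, py in ((r"\(", "("), (r"\)", ")"), (r"\|", "|"),
                    (r"\<", r"\b"), (r"\>", r"\b"), (r"\+", "+")):
        pat = pat.replace(vim, py)
    return pat


def resolve_backrefs(group: List[str], text: str) -> List[str]:
    r"""Return the group as Python patterns with \1..\9 filled in from `text`.

    `text` must match the first (initial) pattern of the group.
    """
    m = re.fullmatch(vim_to_python(group[0]), text)
    if m is None:
        raise ValueError("text does not match the initial pattern")

    def fill(pat: str) -> str:
        # Replace each \d by the text captured by group d, escaped as a literal.
        return re.sub(r"\\([1-9])",
                      lambda ref: re.escape(m.group(int(ref.group(1))) or ""),
                      pat)

    return [fill(vim_to_python(p)) for p in group]


group = split_match_words(r"\(\(foo\)\(bar\)\),\3\2,end\1")[0]
print(resolve_backrefs(group, "foobar"))
# ['((foo)(bar))', 'barfoo', 'endfoobar']
```

The output reproduces the claim in item 6 that '\(\(foo\)\(bar\)\),\3\2,end\1' behaves like 'foobar,barfoo,endfoobar'. Note that split_match_words cannot handle a literal comma or colon inside a pattern, which is the limitation listed under Known Bugs.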

Detailed description of the script

Here is an outline of what matchit.vim does each time you hit the % key, which is mapped to call the function Match_wrapper().
  1. If there are backrefs in b:match_words then the first step is to produce a version in which these back references have been eliminated; if there are no backrefs then this step is skipped. I refer to this process as parsing. For example, '\(foo\|bar\),end\1' is parsed to yield '\(foo\|bar\),end\(foo\|bar\)'. This can get tricky, especially if there are nested groups, but that has not yet come up in practice. I am a little concerned that, on some systems, this step may take too long. It seems wasteful to repeat the same parsing each time you hit the % key, but I have not yet figured out a way to avoid this while making sure that the original and parsed versions are kept synchronized. If debugging is turned on, the parsed version is saved as b:match_pat.
  2. Look for a word on the current line that matches the pattern just constructed. Include the patterns from the 'matchpairs' option. Unfortunately, this is a little complicated:
    1. Insist on a match that ends on or after the cursor. Prefer a match that includes the cursor position (that is, one that starts on or before the cursor).
    2. Prefer a match that starts as close to the cursor as possible.
    3. Prefer a match in b:match_words to a match in 'matchpairs'. If more than one pattern in b:match_words matches, choose the one that is listed first.
    It would probably be preferable to switch priorities 2 and 3, but that is very hard to do with regular expressions. (Let me know if you think of a way.)
    • Example: Suppose the pattern is '<,>:<tag>,</tag>' and the cursor is on or before the "<" in "a <tag> is born". The pattern '<' comes first, so it is preferred over '<tag>', which also matches. If the cursor is on the "t", however, then '<tag>' is preferred, because it matches a bit of text containing the cursor. If the two groups of patterns were reversed then '<' would never be preferred.
    • Example: Suppose the pattern is 'if,end if' (note the space!) and the text is "end if". If the cursor starts on the "if" then 'if' matches, which is probably not what you want, but if the cursor starts on the "end " then 'end if' is chosen.
    If there is no match, fall back on the usual behavior of %. If there is a match, move the cursor to its start. If debugging is turned on, the matched bit of text is saved as b:match_match and the cursor column of the start of the match is saved as b:match_col.
  3. Next, the script looks through b:match_words (original and parsed versions) for the group and pattern that match. If debugging is turned on, the group is saved as b:match_ini (the first pattern) and b:match_tail (the rest). If there are backrefs then, in addition, the matching pattern is saved as b:match_word and a table of translations is saved as b:match_table.
  4. If there are backrefs, these are determined from the matching pattern and b:match_match and substituted into each pattern in the matching group.
  5. The script decides whether to search forwards or backwards and calls the function Match_Busca() with appropriate arguments. This function pays attention to whether the cursor is in a comment and so on. It implements the usual algorithm for finding matching parentheses: it counts +1 each time the "opening" pattern is found and -1 each time the "closing" pattern is found, and stops when it reaches zero. "Opening" and "closing" are interpreted correctly depending on the direction of search, and "in between" patterns are treated appropriately.
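The counting idea in step 5 can be sketched in Python as follows. This illustrates the technique and is not the code of Match_Busca(). The function find_next is hypothetical. It only searches forward, starting from an opening or in-between keyword. It uses Python regular expressions, so \b stands in for Vim's \< and \>. Unlike the real script, it does not skip comments or strings.

```python
import re


def find_next(text: str, start: int, opener: str, middle: str, closer: str) -> int:
    """Return the offset that % would jump to, searching forward from `start`.

    Count +1 for each opening match and -1 for each closing match; an
    in-between or closing match at the starting depth ends the search.
    Returns -1 if the structure is unbalanced.
    """
    keyword = re.compile(f"(?P<open>{opener})|(?P<mid>{middle})|(?P<close>{closer})")
    depth = 0
    for m in keyword.finditer(text, start):
        if m.start() == start:
            continue  # the keyword under the cursor is where we started
        if m.lastgroup == "open":
            depth += 1  # entering a nested block
        elif m.lastgroup == "close":
            if depth == 0:
                return m.start()  # the closing keyword of our own block
            depth -= 1  # leaving a nested block
        elif depth == 0:
            return m.start()  # an in-between keyword of our own block
    return -1


text = "if a\n  if b\n  endif\nelse\nendif\n"
print(find_next(text, 0, r"\bif\b", r"\belse\b", r"\bendif\b") == text.index("else"))  # True
```

Searching backward from a closing keyword follows the same pattern with the roles of the opening and closing patterns swapped.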

Known Bugs and Limitations

Just because I know about a bug does not mean that it is on my todo list. I try to be responsive to reports of bugs that cause real problems. If it does not cause serious problems, or if there is a work-around, a bug may sit there for a while. Moral: if a bug (known or not) bothers you, let me know.
  • I think that the comma (,) and colon (:) are used backwards compared to the syntax for 'matchpairs'. I may change this in the next version.
  • Since the comma (,) and colon (:) are used to separate patterns, they cannot be used as part of a pattern. I suppose this could be solved by interpreting \, and \: as literal characters, but this might be harder than it sounds... (TODO: I will be adding ampersand (&) as another special character.)
  • The script does not recognize \\ as an escaped backslash. Thus \\1, \\(, and \\) cause problems. This is a problem for LaTeX, where it would be nice to add '\\(,\\)' to b:match_words.
  • The script may not treat ^ and $ as start- and end-of-line in all cases. A bug report would help in diagnosing this if, in fact, there is a problem.
  • It would be nice if \0 were recognized as the entire pattern. That is, it would be nice if 'foo,end\0' had the same effect as '\(foo\),end\1'. I may try to implement this in a future version. (This is not so easy to arrange as you might think.)

Getting Help

For most purposes, the best source of help is the Vim mailing list. Note that you must subscribe to the list before you can post questions. Why is this usually better than mailing me directly?
  • Someone may already have worked out the language you are trying to support. They may have neglected to send me the definitions, or I may not have gotten around to including them in matchit.vim, or you may not have the most recent version.
  • I may be on vacation, or way behind on my e-mail, or I may have given up supporting this script. The list is always there, and there are helpful people around the world (in many, if not all, time zones) who might help you out.
  • Posting questions to the list has the side effect of advertising this useful script (and thereby stroking my ego). For this reason, please include the URL when posting questions to the list.
  • I read the list, so you are as likely to reach me that way as you are by e-mailing me directly.
If, after all that, you still want to send me e-mail directly (compliments or bug reports, for example) then go ahead!

Change Log

  • 990112 (1999 Jan 12) Raul Segura Acevedo: original author.
  • 24-Feb-2000 pottsdl:
    Fixed the searching mechanism to check for the 'comment' syntax attribute, which keeps it from finding a false match 'inside' a comment.
  • March 8, 2000 Benji Fisher:
    I added comments, did a few things to simplify it (without changing the function, I hope) and modified the definitions of b:match_words for vim files, added a definition for LaTeX files.
  • March 10, 2000 Benji Fisher:
    I rewrote the function Busca() to make it non-recursive. I succeeded, but it is still very slow. (On a 900-line LaTeX document, it takes 2 or 3 seconds to go from \begin{document} to \end{document}. This is on a 400-MHz Pentium II.)
  • March 19, 2000 Benji Fisher:
    I cleaned things up: removed raw <CR> characters, removed redundancy from Busca(), and so on. I added the b:match_comment option. This speeds it up for LaTeX: on the sample file described above, it takes less than a second.
  • April 6, 2000 Benji Fisher:
    Minor changes: uncomment autocommands at end of file (some lines that had been commented out caused problems), added a test for b:match_words="" to the test for exists("b:match_words"), fixed the vim pattern for aug...aug END. Then posted this as the non-beta version.
  • Late April, 2000 Benji Fisher:
    Following a suggestion of Johannes Zellner, I implemented back references and posted this as the new beta version. This is a major change; I now consider this my own script, not merely a script with which I am tinkering. I posted this for beta testing. Since then, I have fixed a bug or two, played with the definitions of b:match_words, added the b:match_ignorecase option, and started the HTML documentation.
  • May 23, 2000 Benji Fisher:
    I finished the documentation and declared that beta testing was done for this version. Now, I can start thinking about improving the script again!
  • May 27, 2000 Benji Fisher:
    I fixed it so that b:match_match and b:match_col get set (assuming debugging is turned on) even if the script bails out and calls :normal! %. I also postponed the first time that the cursor is moved, eliminated the top_of_screen variable used in defining restore_screen, and fixed the ' mark. I implemented g% to go backwards and I implemented the option b:match_strings_like_comments.
  • July 15, 2000 Benji Fisher:
    I added support for DTD and updated XML, both thanks to Johannes Zellner.
  • November 22, 2000 Benji Fisher:
    I added support for <count>% in Normal mode, after Bram Moolenaar pointed out that this was broken. I have not figured out how to do this in Visual mode.

© benji@member.AMS.org
LAST MODIFICATION: "Fri, 29 Jun 2001 08:53:09 Eastern Daylight Time ()"