Data file formats for Tcl scripts

Abstract:

A Tcl script sometimes needs to save textual data on disk, and read it back. To avoid writing a parser for the data, you can use a few simple tricks that turn Tcl into a parser for free.

Introduction

A typical Tcl script stores its internal data in lists, dictionaries, and arrays. Suppose you want to write a Tcl application that can save its data on disk and read it back again. For example, the application could save a drawing project and load it back later. Writing the data from the running script to a file is not difficult: just use 'puts' to create a text file. But you also need a way to read the data back into a running script, which seems a lot harder.

You can choose to store the data in a binary form, or in a text file. This article looks only at textual data formats. We will look at a number of possible formats and how to parse them in Tcl. In particular, we will show some simple techniques that make text file parsing a lot easier.

A simple example

Suppose you have a simple drawing tool that places text and rectangle items on a canvas. To save the resulting pictures, you want a textual file format that must be easy to read, both by humans and by your drawing tool. The first and simplest file format that comes to mind looks something like this:

example_01/datafile.dat
    rectangle 10 10 150 50 2 blue
    rectangle 7 7 153 53 2 blue
    text 80 30 "Simple Drawing Tool" c red

The first two lines of this file represent the data for two blue, horizontally stretched rectangles with a line thickness of 2. The final line places a piece of red text, anchored at the center (hence the "c"), in the middle of the two rectangles.

Saving your data in a text file makes it easier to debug the application, because you can inspect the output to see if everything is correct. It also allows users to manually tinker with the saved data (which may be good or bad depending on your purposes).

When reading a data file in this format, you somehow need to parse the file and create data structures from it. To parse the file, you may be tempted to step through the file line by line, and use something like regexp to analyse the different pieces of each line. This is what such an implementation could look like:

example_01/parser.tcl
    canvas .c
    pack .c

    set fid [open "datafile.dat" r]
    # Read the file line by line; 'gets' returns -1 at the end of the file.
    while { [gets $fid line] >= 0 } {
       if { [regexp \
          {^rectangle +([0-9]+) +([0-9]+) +([0-9]+) +([0-9]+) +([0-9]+) +(.*)$} \
             $line dummy x1 y1 x2 y2 thickness color] } {
          .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color

       } elseif { [regexp \
          {^text +([0-9]+) +([0-9]+) +"([^"]*)" +([^ ]+) +(.*)$} \
             $line dummy x y txt anchor color] } {
          # The quotes sit outside the parentheses, so 'txt' does not include them.
          .c create text $x $y -text $txt -anchor $anchor -fill $color

       } elseif { [regexp {^ *$} $line] } {
          # Ignore blank lines

       } else {
          puts "error: unknown keyword."
       }
    }
    close $fid

We read one line at a time, and use regular expressions to find out what kind of data the line represents. By looking at the first word, we can distinguish between data for rectangles and data for text. The first word serves as a keyword: it tells us exactly what kind of data we are dealing with. We also parse the coordinates, color and other attributes of each item.

Grouping parts of the regular expression between parentheses allows us to retrieve the parsed results in the variables 'x1', 'x2', etc.

This looks like a simple enough implementation, assuming that you understand how regular expressions work. But I find it pretty hard to maintain. The regular expressions also make it hard to understand.

There is a more elegant solution, known as an "active data file". It is captured in a design pattern, originally written by Nat Pryce. It is based on a very simple suggestion: Instead of writing your own parser in Tcl (using regexp or other means), why not let the Tcl parser do all the work for you?

The Active File design pattern

To explain this design pattern, we continue the example of the simple drawing tool from the previous section. First, we write two procedures in Tcl, one that draws a rectangle, and one that draws text.

example_02/parser.tcl
    canvas .c
    pack .c

    proc d_rect {x1 y1 x2 y2 thickness color} {
       .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color
    }

    proc d_text {x y text anchor color} {
       .c create text $x $y -text $text -anchor $anchor -fill $color
    }

To make a picture on the canvas, we can now call these two procedures several times, once for each item we want to draw. To make the same picture as above, we need the following three calls:

example_02/datafile.dat
    d_rect 10 10 150 50 2 blue
    d_rect 7 7 153 53 2 blue
    d_text 80 30 "Simple Drawing Tool" c red

Does this look familiar? The code for calling our two procedures looks almost exactly like the data file we parsed earlier. The only difference is that the keywords have changed from 'rectangle' and 'text' to 'd_rect' and 'd_text', the names of our 2 procedures.

Now we come to the insight that makes this design pattern tick: to parse the data file, we treat it like a Tcl script. The heart of the pattern is that the data file actually consists of calls to Tcl procedures.

Parsing the data file is now extremely easy:

example_02/parser.tcl
    source "datafile.dat"

The built-in Tcl command source reads the file, parses it, and executes the commands in the file. Since we have implemented the procedures d_rect and d_text, the source command will automatically invoke the two procedures with the correct parameters. We will call d_rect and d_text the parsing procedures or parsing commands.

We do not need to do any further parsing. No regular expressions, no line-by-line loop, no opening and closing of files. Just one call to source does the trick.

The data file has become a Tcl script that can be executed. This is called an Active File because it contains executable commands, not just passive data. The Active File design pattern works in many scripting languages, but here we stick to Tcl.
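
Writing such a file from a running script is just as easy as reading it. The sketch below is an assumption about how one might do it, not part of the original tool. It builds every line with 'list', so that Tcl itself quotes text containing spaces, and then evaluates the result with 'eval' as a stand-in for 'source'. The names serialize_items and parsed are hypothetical, and the parsing procedures only record their arguments instead of drawing on a canvas, so the example runs without Tk.

```tcl
# Hypothetical helper: turn a list of items into the text of an Active File.
# Each item is itself a list: the parsing command followed by its arguments.
proc serialize_items {items} {
    set lines [list]
    foreach item $items {
        # The string form of a list is a well-formed Tcl command, so
        # "Simple Drawing Tool" is written as {Simple Drawing Tool}.
        lappend lines $item
    }
    return [join $lines \n]
}

# Stand-in parsing procedures that record what they are called with.
set parsed [list]
proc d_rect {x1 y1 x2 y2 thickness color} {
    global parsed
    lappend parsed [list rectangle $x1 $y1 $x2 $y2 $thickness $color]
}
proc d_text {x y text anchor color} {
    global parsed
    lappend parsed [list text $x $y $text $anchor $color]
}

set items [list \
    [list d_rect 10 10 150 50 2 blue] \
    [list d_text 80 30 "Simple Drawing Tool" c red]]

set data [serialize_items $items]
# To save:  set fid [open "datafile.dat" w]; puts $fid $data; close $fid
# To load:  source "datafile.dat"   (here we evaluate the text directly)
eval $data
puts $data
```

Building the lines with 'list' rather than with string concatenation means that spaces, braces and dollar signs in the data cannot break the file or be substituted when it is read back.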

Advantages of using the Active File pattern:

Disadvantages of using the Active File pattern:

Limitations of the Active File pattern:

Syntactic sugar

So far we have been able to come up with a very simple file format:

example_02/datafile.dat
    d_rect 10 10 150 50 2 blue
    d_rect 7 7 153 53 2 blue
    d_text 80 30 "Simple Drawing Tool" c red

And we have a very simple parser for it, using only two parsing procedures and the source command. Now, let's see how we can improve things.

When you look at large volumes of this kind of data, it is easy to get confused by all the command arguments. The first line contains the numbers 10 10 150 50 2, and it takes some training to quickly see the first two as a pair of coordinates, the next two as another pair, and the last one as the line thickness. We can make this easier to read for a programmer by introducing some additional text in the data:

example_03/datafile.dat
    d_rect from 10 10 to 150 50 thick 2 clr blue
    d_rect from 7 7 to 153 53 thick 2 clr blue
    d_text at 80 30 "Simple Drawing Tool" anchor c clr red

Prepositions like to and from, and argument names like thick and clr, make the data read more like a sentence (an English one in this example). To accommodate these new words, our parsing procedure needs some additional dummy arguments:

example_03/parser.tcl
    proc d_rect {"from" x1 y1 "to" x2 y2 "thick" thickness "clr" color} {
       .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color
    }

As you can see, the implementation does not change. The new arguments are not used in the procedure's body; their only purpose is to make the data more readable. I make it a habit to make the names of the procedure's parameters the same as the corresponding argument in the data file (e.g. from appears at the same place in the data file and in the parameter list). That way, I can quickly see how one maps to the other. I also learned on the Tcl'ers Wiki to put quotes around the parameter names to make them stand out. It makes no difference to Tcl itself, but it makes the procedure signature more readable.

Introducing dummy arguments for readability is called "syntactic sugar". We will see other ways of making data more readable.
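
Because the dummy arguments are ordinary parameters, the parsing procedure can also check them if you want to catch misplaced words. The following sketch is an assumption, not part of the original parser: it rejects a line whose sugar words are misspelled or out of order, and it records the rectangle in a hypothetical list called rects instead of drawing it, so that it runs without Tk.

```tcl
set rects [list]

proc d_rect {"from" x1 y1 "to" x2 y2 "thick" thickness "clr" color} {
    global rects
    # Compare each dummy argument with the word we expect at that position.
    foreach {actual expected} [list $from from $to to $thick thick $clr clr] {
        if {$actual ne $expected} {
            error "d_rect: expected \"$expected\" but got \"$actual\""
        }
    }
    lappend rects [list $x1 $y1 $x2 $y2 $thickness $color]
}

# A well-formed line is accepted...
d_rect from 10 10 to 150 50 thick 2 clr blue

# ...and a line with the sugar in the wrong place is reported.
set failed [catch {d_rect to 10 10 from 150 50 thick 2 clr blue} message]
puts $message
```

Whether this check is worth the extra lines depends on who edits the data files. For files that are only written by the tool itself, the unchecked version is probably enough.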

Option/value pairs

The Tk toolkit offers a set of widgets to create graphical interfaces. These widgets are configured with options and their values. The syntax for the configuration is simple (a dash, followed by the option name, followed by the value) and standardized (many other Tcl extensions use the same syntax for configuring their components).

With option/value pairs, our data file looks like this:

example_04/datafile.dat
    d_rect -x1 10 -y1 10 -x2 150 -y2 50 -thickness 2
    d_rect -thickness 2 -x1 7 -y1 7 -x2 153 -y2 53
    d_text -x 80 -y 30 -text "Simple Drawing Tool" -anchor c -color red

I have made the two 'd_rect' calls use a different ordering of their options, just to show you that this is now possible. To parse this data, we need to introduce the parsing of option/value pairs in the parsing procedures d_rect and d_text. Our first attempt is to use dummy arguments (similar to the syntactic sugar above):

    proc d_rect {opt1 x1 opt2 y1 opt3 x2 opt4 y2 opt5 thickness opt6 color} {
       .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color
    }

Again, the implementation of the procedure does not change, because it does not use any of its dummy arguments.

This solution will only work for the simplest of data formats. It has two major disadvantages:

Here is an implementation that solves both these problems, using Tcl arrays:

example_04/parser.tcl
    proc d_rect {args} {
       # First, specify some defaults.
       set arr(-thickness) 1
       set arr(-color) blue

       # Then, 'parse' the user-supplied options and values.
       array set arr $args

       # Create the rectangle.
       .c create rectangle $arr(-x1) $arr(-y1) $arr(-x2) $arr(-y2) \
          -width $arr(-thickness) -outline $arr(-color)
    }

Instead of a long list of parameters, the parsing procedure now only has one parameter called args, which captures all the actual arguments of the call. The parameters x1, y1 etc have disappeared. They are now handled by a local array.

The first part of the code sets the default values for some options. The second part parses option/value pairs from args. This is done very elegantly with the built-in 'array set' mechanism. It creates new entries in the array arr, using the option names (including the leading dash) as keys into the array, and the option values as the array values.

If the user does not specify -color in the call, we will use the default value of the arr(-color) entry that we set explicitly. If they do specify the color, their value overwrites the default. The final line in the procedure body is the same as in the previous implementations, except that it now uses array entries rather than procedure arguments.

If the user forgets to specify option -x1 in the call, the array entry for -x1 is not set (there is no default for it) and the call to create rectangle results in an error. This example shows that you can give default values to some options, making them optional, while leaving others mandatory by not specifying defaults for them. You may want to provide some user-friendly error message for such cases.
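
Such a user-friendly error message could be produced along the following lines. This is a minimal sketch under my own assumptions: the wording of the messages and the list called rects are hypothetical, and the procedure stores the rectangle instead of drawing it so that the example runs without Tk.

```tcl
set rects [list]

proc d_rect {args} {
    global rects
    # Defaults make -thickness and -color optional.
    array set arr {-thickness 1 -color blue}

    if {[llength $args] % 2 != 0} {
        error "d_rect: options and values must come in pairs"
    }
    array set arr $args

    # Report missing mandatory options by name.
    foreach opt {-x1 -y1 -x2 -y2} {
        if {![info exists arr($opt)]} {
            error "d_rect: missing mandatory option $opt"
        }
    }
    # Report options we do not know, such as a misspelled -colour.
    foreach opt [array names arr] {
        if {[lsearch -exact {-x1 -y1 -x2 -y2 -thickness -color} $opt] < 0} {
            error "d_rect: unknown option $opt"
        }
    }
    lappend rects [list $arr(-x1) $arr(-y1) $arr(-x2) $arr(-y2) \
        $arr(-thickness) $arr(-color)]
}

# Options may come in any order; -color falls back to its default.
d_rect -thickness 2 -x1 7 -y1 7 -x2 153 -y2 53

# A call without -x2 is reported by name.
catch {d_rect -x1 7 -y1 7 -y2 53} message
puts $message
```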

The best format is usually a combination

Now that we have seen some commonly known tricks for Tcl data files (Active File, syntactic sugar, option/value pairs), we can combine their advantages into a single data format. For the mandatory arguments, we should use fixed-position arguments, perhaps combined with dummy prepositions for readability (syntactic sugar). The optional arguments on the other hand, should be handled with the option/value pair mechanism, so that users can leave them out or change their positions in the call. The final format could then look something like this:

example_05/datafile.dat
    d_rect from 10 10 to 150 50 -thickness 2
    d_rect from 7 7 to 153 53 -color black
    d_text at 80 30 "Simple Drawing Tool" -anchor c -color red

assuming that 'blue' is the default color for all items.

As a personal convention, I usually write such commands on multiple lines as follows:

example_05/datafile.dat
    d_rect \
       from 10 10 \
       to 150 50 \
       -thickness 2
    d_rect \
       from 7 7 \
       to 153 53 \
       -thickness 2
    d_text \
       at 80 30 "Simple Drawing Tool" \
       -anchor c \
       -color red

I find it slightly more readable, but that's all a matter of personal taste (or in my case perhaps lack of taste :-).
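
A parsing procedure for this combined format could look like the sketch below. It is an assumption about one possible implementation: the mandatory coordinates are fixed-position parameters with dummy prepositions, and everything after them is collected in args and handled as option/value pairs. The list called rects is hypothetical and replaces the canvas, so the example runs without Tk.

```tcl
set rects [list]

proc d_rect {"from" x1 y1 "to" x2 y2 args} {
    global rects
    # Optional settings start from their defaults...
    array set opt {-thickness 1 -color blue}
    # ...and are overridden by any option/value pairs in the call.
    array set opt $args
    lappend rects [list $x1 $y1 $x2 $y2 $opt(-thickness) $opt(-color)]
}

# Both the one-line and the multi-line style are parsed the same way.
d_rect from 10 10 to 150 50 -thickness 2
d_rect \
    from 7 7 \
    to 153 53 \
    -color black
```

Forgetting a mandatory argument now makes Tcl itself complain about the wrong number of arguments, while the optional settings keep the flexibility of option/value pairs.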


More complicated data

So far, we have worked on a very simple example involving only rectangles and text on a canvas. The data format was easy to read and easy to parse using the Active File design pattern.

We will now move to a more complex data format, to explain more advanced techniques. This will make you an expert in Tcl data file formats.

The repository tool

I used to collect design patterns. I made a repository of patterns, each with a brief description and some properties. I also kept the names, authors and ISBN numbers of the books in which I found the patterns, as a reference to be able to look them up later. To keep track of all this information, I implemented a repository tool in Tcl. It had features to organize patterns into categories and levels, and to point from each pattern to the book and page number where it was described.

The input to the tool was a file that looked like this:

    # First, I describe some books in which you can find good design patterns
    # and programming idioms.  Each book, website or other source of patterns
    # is specified with the 'Source' keyword, followed by a unique tag and some
    # additional information.

    Source GOF {
      Design patterns
      Elements of reusable object-oriented software
      Gamma, Helm, Johnson, Vlissides
      Addison-Wesley, 1995
      0 201 63361 2
    }

    Source SYST {
      A system of patterns
      Pattern-oriented software architecture
      Buschmann, Meunier, Rohnert, Sommerlad, Stal
      Wiley, 1996
      0 471 95869 7
    }

    # Next, I describe some categories.  I want to group patterns
    # in categories so I can find them back more easily.  Each category
    # has a name (such as "Access control") and a short description.

    Category "Access control" {
       How to let one object control the access to one or more
       other objects.
    }

    Category "Distributed systems" {
       Distributing computation over multiple processes, managing
       communication between them.
    }

    Category "Resource handling" {
       Preventing memory leaks, managing resources.
    }

    Category "Structural decomposition" {
       To break monoliths down into independent components.
    }

    # Finally, I describe the patterns themselves.  Each of them has a name,
    # belongs to one or more categories, and occurs in one or more of the
    # pattern sources listed above.  Each pattern has a level, which can
    # be 'arch' (for architectural patterns), 'design' for smaller-scale
    # design patterns, or 'idiom' for language-specific patterns.

    Pattern "Broker" {
      Categories {"Distributed systems"}
      Level arch
      Sources {SYST:99}   ; # This means that this pattern is described in
                            # the book with tag 'SYST' on page 99.
      Info {
        Remote service invocations.
      }
    }

    Pattern "Proxy" {
      # This pattern fits in two categories:
      Categories {"Access control" "Structural decomposition::object"}
      Level design
      # Both these books talk about the Proxy pattern:
      Sources {SYST:263 GOF:207}
      Info {
        Communicate with a representative rather than with the
        actual object.
      }
    }

    Pattern "Facade" {
      Categories {"Access control" "Structural decomposition::object"}
      Sources {GOF:185}
      Level design
      Info {
        Group sub-interfaces into a single interface.
      }
    }

    Pattern "Counted Pointer" {
      Categories {"Resource handling"}
      Level idiom
      Sources {SYST:353}
      Info {
        Reference counting prevents memory leaks.
      }
    }

The descriptions of the patterns are short and pretty stupid, but that's OK for this example.

As you can see, this data file has a number of interesting new features:

You may think that this format is a lot more complicated than the one in our previous example, and that it is nearly impossible to write a clean parser for it in Tcl. It may not be obvious, but we can use the Active File pattern again, which makes the task a lot simpler. The parsing procedures are a bit more elaborate than before, but they are definitely not "complicated". The main trick is to use Tcl's uplevel command to "parse" and even "execute" the bodies of the structures.

Here's the part of my tool that parses a data file such as the one above:

    # We will internally store the data in these three lists:
    set l_patterns [list]
    set l_sources [list]
    set l_categories [list]

    # We also need a variable to keep track of the Pattern structure we are
    # currently in:
    set curPattern ""

    # This is the parsing procedure for the 'Source' keyword.
    # As you can see, the keyword is followed by an id (the unique tag for the
    # source), and some textual description of the source.
    proc Source {id info} {
       # Remember that we saw this source.
       global l_sources
       lappend l_sources $id

       # Remember the info of this source in a global array.
       global a_sources
       set a_sources($id,info) $info
    }

    # The parsing procedure for the 'Category' keyword is similar.
    proc Category {id info} {
       global l_categories
       lappend l_categories $id

       global a_categories
       set a_categories($id,info) $info
    }

    # This is the parsing procedure for the 'Pattern' keyword.
    # Since a 'Pattern' structure can contain sub-structures,
    # we use 'uplevel' to recursively handle those.
    proc Pattern {name args} {
       global curPattern
       set curPattern $name   ; # This will be used in the sub-structures,
                                # which are parsed next.
       global l_patterns
       lappend l_patterns $curPattern

       # We treat the final argument as a piece of Tcl code.
       # We execute that code in the caller's scope.  It contains calls
       # to 'Categories', 'Level' and other commands which implement
       # the sub-structures.
       # This is similar to how we use the 'source' command to parse the entire
       # data file.
       uplevel 1 [lindex $args end]

       # We're no longer inside a pattern body, so set curPattern to empty.
       set curPattern ""
    }

    # The parsing procedure for one of the sub-structures.  It is called
    # by 'uplevel' as we described in the comments above.
    proc Categories {categoryList} {
       global curPattern   ; # We access the global variable 'curPattern'
                             # to find out inside which structure we are.
       global a_patterns
       set a_patterns($curPattern,categories) $categoryList
    }

    # The following parsing procedures are for the other sub-structures
    # of the Pattern structure.

    proc Level {level} {
       global curPattern
       global a_patterns
       set a_patterns($curPattern,level) $level
    }

    proc Sources {sourceList} {
       global curPattern
       global a_patterns
       # We store the codes such as 'SYST:99' in a global array.
       # My implementation uses regular expressions to extract the source tag
       # and the page number from such a code (not shown here).
       set a_patterns($curPattern,sources) $sourceList
    }

    proc Info {info} {
       global curPattern
       global a_patterns
       set a_patterns($curPattern,info) $info
    }

At first sight, this may seem to take much more work than what we did for the simple canvas example. But think of the power of this technique. With only a few parsing procedures and by making clever use of the uplevel command, we can parse data files with intricate structure, containing comments, nested sub-structures and freeform textual data. Imagine writing a parser for this from scratch.

The data is parsed by the procedures such as Source, Pattern or Info. The parsed data is stored internally in three arrays, and we keep the IDs of all the structures in three lists. The nestedness of the data is handled by calls to uplevel, and by remembering in which 'scope' we currently are using the global variable curPattern.
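
The comment in the Sources procedure mentions that the tag and the page number are extracted from codes such as 'SYST:99' with regular expressions. That code is not shown, so the following is only a sketch of how it might be done. The procedure name parse_source_code and the assumption that a tag consists of letters, digits and underscores are mine. It uses lassign, which needs Tcl 8.5 or later.

```tcl
# Hypothetical helper: split a code such as "SYST:99" into tag and page.
proc parse_source_code {code} {
    if {![regexp {^([A-Za-z0-9_]+):([0-9]+)$} $code dummy tag page]} {
        error "malformed source code \"$code\", expected TAG:PAGE"
    }
    return [list $tag $page]
}

# The list that the 'Sources' keyword receives for the Proxy pattern.
foreach code {SYST:263 GOF:207} {
    lassign [parse_source_code $code] tag page
    puts "tag=$tag page=$page"
}
```

This is one of the few places where a regular expression still earns its keep: the Tcl parser splits the list into codes for us, and the expression only has to deal with one short, well-defined string.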

Note that this technique requires your data to follow Tcl syntax. This implies, among other things, that an opening curly brace must be placed at the end of a line, not at the beginning of the next line. This enforces a consistent syntax on your data, which is actually a Good Thing.
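
The effect of a misplaced brace is easy to try out. The sketch below is my own illustration, with a stripped-down Category procedure and the data held in two strings instead of a file. The first string follows the rule, the second one puts the brace on the next line, which makes Tcl see a complete command with too few arguments.

```tcl
set categories [list]
proc Category {id info} {
    global categories
    lappend categories $id
}

# Brace at the end of the line: the body is the second argument.
set good {Category Good {
   Some description.
}}

# Brace on the next line: 'Category Bad' is a complete command with
# only one argument, and the braced text becomes a separate command.
set bad {Category Bad
{
   Some description.
}}

eval $good
set failed [catch {eval $bad} message]
puts "failed=$failed: $message"
```

Wrapping the call to source in catch in the same way lets a tool report such mistakes to the user instead of stopping with a raw Tcl error.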

Recursive structures

In the pattern repository example, the structures of type Pattern contain sub-structures of other types such as Info and Sources. What happens when a structure contains sub-structures of the same type? In other words, how do we handle recursive structures?

Suppose, for example, that you want to describe the design of an object-oriented system, which is divided recursively into subsystems:

example_06/datafile.dat
    # Description of an object-oriented video game
    System VideoGame {
      System Maze {
        System Walls {
          Object WallGenerator
          Object TextureMapper
        }
        System Monsters {
          Object FightingEngine
          Object MonsterManipulator
        }
      }
      System Scores {
        Object ScoreKeeper
      }
    }

To keep track of which System structure we are currently in, it may seem that we need more than a single global variable like curPattern in the previous example. At any point during parsing, we can be inside many nested System structures, so we probably need some kind of stack, on which we push a value when we enter the System parsing procedure, and from which we pop again at the end of the procedure. We can make such a stack using a Tcl list.
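
For comparison, such an explicit stack could be sketched as follows. This is an illustration under my own assumptions, not code from the original tool: the variables systemStack and log are hypothetical, and the data is written inline instead of being read from a file.

```tcl
set systemStack [list]
set log [list]

proc System {name body} {
    global systemStack log
    lappend systemStack $name                       ;# push
    lappend log "enter [join $systemStack /]"
    uplevel 1 $body                                 ;# parse the sub-structures
    set systemStack [lrange $systemStack 0 end-1]   ;# pop
}

proc Object {name} {
    global systemStack log
    # The top of the stack is the immediate parent system.
    lappend log "object $name in [lindex $systemStack end]"
}

System VideoGame {
    System Maze {
        Object WallGenerator
    }
    System Scores {
        Object ScoreKeeper
    }
}
puts [join $log \n]
```

This works, and it has the advantage that the full path of enclosing systems is available at any time, but every parsing procedure has to keep the push and the pop in balance.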

But there is a way to avoid maintaining your own stack. It is again based on a very simple suggestion: When you need a stack, see if you can use the function call stack itself. Just store the variables locally in each function call, so that Tcl's call stack takes care of the recursion automatically.

When dealing with such recursive data, I usually implement my parsing procedures like this:

example_06/parser.tcl
    set currSystem ""

    proc System {name args} {
      # Instead of pushing the new system on the 'stack' of current systems,
      # we remember it in a local variable, which ends up on Tcl's
      # function call stack.
      global currSystem
      set oldSystem $currSystem
      set currSystem $name   ; # Thanks to this, all sub-structures called by
                               # 'uplevel' will know what the name of their
                               # immediate parent System is

      # Store the system in an internal data structure
      # (details not shown here)
      puts "Storing system '$currSystem'"

      # Execute the parsing procedures for the sub-systems
      uplevel 1 [lindex $args end]

      # Pop the system off the 'stack' again.  Restore the old system as if nothing happened.
      set currSystem $oldSystem
    }

    proc Object {name} {
      global currSystem
      # Store the object in the internal data structure of the current
      # system (details not shown here)
      puts "System '$currSystem' contains object '$name'"
    }

    source "datafile.dat"

We just store the name of the enclosing system in a local variable called oldSystem, and restore it when the procedure returns. Since Tcl calls the parsing procedures in a stack-based order, we do not need to push or pop anything explicitly.

Another example: a CGI library by Don Libes

The CGI library by Don Libes uses the Active File pattern to represent HTML documents. The idea is that you write a Tcl script that acts as an HTML document and generates pure HTML for you. The documents contain nested structures for bulleted lists, preformatted text and other HTML elements. The parsing procedures call uplevel to handle recursive sub-structures.

Here is a part of Don's code, showing you how he uses some of the tricks described in this article:

    # Output preformatted text.  This text must be surrounded by '<pre>' tags.
    # Since it can recursively contain other tags such as '<em>' or hyperlinks,
    # the procedure uses 'uplevel' on its final argument.
    proc cgi_preformatted {args} {
       cgi_put "<pre"
       cgi_close_proc_push "cgi_puts </pre>"

       if {[llength $args]} {
          cgi_put "[cgi_lrange $args 0 [expr [llength $args]-2]]"
       }
       cgi_puts ">"
       uplevel 1 [lindex $args end]
       cgi_close_proc
    }

    # Output a single list bullet.
    proc cgi_li {args} {
       cgi_put "<li"
       if {[llength $args] > 1} {
          cgi_put "[cgi_lrange $args 0 [expr [llength $args]-2]]"
       }
       cgi_puts ">[lindex $args end]"
    }

    # Output a bullet list.  It contains list bullets, represented
    # by calls to 'cgi_li' above.  Those calls are executed thanks
    # to 'uplevel'.
    proc cgi_bullet_list {args} {
       cgi_put "<ul"
       cgi_close_proc_push "cgi_puts </ul>"

       if {[llength $args] > 1} {
          cgi_put "[cgi_lrange $args 0 [expr [llength $args]-2]]"
       }
       cgi_puts ">"
       uplevel 1 [lindex $args end]

       cgi_close_proc
    }

I am not going to explain the fine details of this great library, but you can find out for yourself by downloading it from Don's homepage.
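
The underlying idea fits in a few lines. The sketch below is not Don's code and does not use the API of his library. The names emit, bullet_list and li are hypothetical, attributes are left out, and the output is collected in a string instead of being sent to a browser.

```tcl
set html ""

# Hypothetical output helper: append one line to the document.
proc emit {text} {
    global html
    append html $text \n
}

# A list is a nested structure: its body is executed with 'uplevel',
# so the 'li' and 'bullet_list' calls inside it run between the tags.
proc bullet_list {body} {
    emit "<ul>"
    uplevel 1 $body
    emit "</ul>"
}

proc li {text} {
    emit "<li>$text</li>"
}

# The "document" is a Tcl script, just like the data files above.
bullet_list {
    li "First item"
    li "Second item"
    bullet_list {
        li "Nested item"
    }
}
puts -nonewline $html
```

Here the parsing procedures produce output instead of filling data structures, but the mechanism is the same as in the pattern repository.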

Another example: Making Tcl look like C++

I once wrote a (very!) simple parser for C++ class implementations. Lazy as I am, I wrote the parser in Tcl, using many of the techniques in this article. It actually turned out to be too complicated to be of any use, but it shows how far you can go with the Active File pattern. Just look at this "data file" containing something that looks like very twisted C++ code:

    // The following is NOT C++, it is Tcl!!
    // Note the "documentation string" just before each class and method body.

    class myListElt: public CListElt, private FString {
      This is a documentation string for the class 'myListElt'.
      You can see multiple inheritance at work here.
    } {

    public:
       method int GetLength(void) {
          This is the documentation string for the GetLength method.
       } {
          // This is the final argument of the 'method' command.
          // It contains freeform text, so this is where I can write
          // pure C++ code, including the comment you are now reading.
          return myLength;
       }

       method char* GetString(void) {
          This is the documentation string for the GetString method.
       } {
          append(0);
          return (char*)data;
       }

    private:
       method virtual void privateMethod(short int p1, short int p2) {
          A private method with parameters.
       } {
          printf("Boo!  p1=%d, p2=%d\n", p1, p2);
       }
    }

    // We need a 'data' command for variables.  We cannot say 'int b'
    // without writing a command like 'int' for each available type.
    data short int b {This is the documentation string for 'b'.}
    data void* somePointer {This is the documentation string for 'somePointer'.}

    method void error(short int errNo, char* message) {
       This is a global library procedure, which reports an error message.
    } {
       cout << "Hey, there was an error (" << errNo << ") " << message << endl;
    }

    cpp_report

This example may be far-fetched, but it gives you an idea of the power of the Active File pattern. What you see is Tcl code, but it looks a lot like C++ code, and it can automatically generate documentation, class diagrams, programming references and of course compilable C++ code.

The parsing procedures such as method and class store the C++ implementation in internal Tcl data structures. Note that we need keywords like 'method' and 'data' which are not part of C++ itself. We need them here because they are the names of the parsing procedures that I wrote.

The call to cpp_report generates the resulting C++ code.
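
Two details of this data file deserve a closer look: the '//' comments and the 'public:' and 'private:' labels. To Tcl these are simply commands with unusual names. How the original tool handled them is not shown in this article, so the following is only a guess at one possible approach. The 'data' procedure is a simplified, hypothetical version, and the list called members is my own invention. The array entry _cpp(CLac) is borrowed from the parser fragment further on, where it appears to hold the current access level.

```tcl
array set _cpp {CLac private}
set members [list]

# A line starting with "//" is a call to a command named "//".
# Making that command do nothing turns such lines into comments.
proc // {args} {}

# "public:" and "private:" are commands too; they switch the access level.
proc public: {} {
    global _cpp
    set _cpp(CLac) public
}
proc private: {} {
    global _cpp
    set _cpp(CLac) private
}

# Simplified 'data' command: type words, then the name, then the doc string.
proc data {args} {
    global _cpp members
    set doc  [lindex $args end]
    set name [lindex $args end-1]
    set type [lrange $args 0 end-2]
    lappend members [list $_cpp(CLac) $type $name $doc]
}

// The following is NOT C++, it is Tcl!!
public:
data short int b {This is the documentation string for 'b'.}
private:
data void* somePointer {This is the documentation string for 'somePointer'.}
```

Note that the text after '//' is still parsed as command arguments. It must therefore not contain unbalanced braces or quotes, and anything between square brackets would be executed as a command.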

The following fragment from the parser gives you an idea of how you can bend the Tcl interpreter to make it read a file with C++-like syntax:

    # This is the parsing procedure for the 'class' keyword.
    # Arguments:
    # - class name
    # - list of inheritance specifications, optional
    # - comment block
    # - body block
    proc class {args} {
       global _cpp

       # split names from special characters like ':' ',' '*'
       set cargs [expand [lrange $args 0 [expr [llength $args] - 3]]]
       # -3 to avoid the comment block and the class body.

       # First process the name.
       set className [lindex $cargs 0]
       if { $_cpp(CL) == "" } {
          set _cpp(CL) $className   ; # This is like 'curPattern' in the
                                      # pattern repository example.
       } else {
          error "Class definition for $className: we are already inside class $_cpp(CL)"
       }

       # Then process the inheritance arguments.
       # Obviously, this is already a lot more complicated than in the
       # previous examples.
       set inhr [list]
       set mode beforeColon
       set restArgs [lrange $cargs 1 end]
       foreach arg $restArgs {
          if { $arg == ":" } {
             if { $mode != "beforeColon" } {
                error "Misplaced \":\" in declaration \"class $className $restArgs\""
             }
             set mode afterColon
          } elseif { $arg == "public" || $arg == "private" } {
             if { $mode != "afterColon" } {
                error "Misplaced \"$arg\" in declaration \"class $className $restArgs\""
             }
             set mode $arg
          } elseif { $arg == "," } {
             if { $mode != "afterInherit" } {
                error "Misplaced \",\" in declaration \"class $className $restArgs\""
             }
             set mode afterColon
          } else {
             if { $mode != "public" &&  $mode != "private" } {
                error "Misplaced \"$arg\" in declaration \"class $className $restArgs\""
             }
             if { ![IsID $arg] } {
                warning "$arg is not a valid C++ identifier..."
             }
             lappend inhr [list $mode $arg]
             set mode afterInherit
          }
       }

       if { $mode != "afterInherit"  &&  $mode != "beforeColon" } {
          error "Missing something at end of declaration \"class $className $restArgs\""
       }

       set _cpp(CLih) $inhr
       set _cpp(CLac) "private"

       # First execute the comment block.
       uplevel 1 [list syn_cpp_docClass [lindex $args [expr [llength $args] - 2]]]

       # Then execute the body, using 'uplevel'.
       uplevel 1 [list syn_cpp_bodyClass [lindex $args end]]

       set _cpp(CL) ""
       set _cpp(CLac) ""
       set _cpp(CLih) ""
    }

This is only part of the implementation, just to give you a general idea. I do not actually recommend that you parse C++ code this way, because it leads into all kinds of thorny problems. I just wanted to show an advanced example of the techniques we have described in this article.


Conclusion:

According to Perl's Larry Wall, one of the most important talents of a good programmer is laziness. Creative laziness, that is. This article makes two suggestions that both come down to the same thing: be lazy.

"Reuse" is not all about encapsulation and information hiding. Sometimes it's just about being lazy.