Mercurial > public > mercurial-scm > hg
annotate mercurial/minirst.py @ 9737:5f101af4a921
minirst: combine list parsing in one function
Bullet, option, field, and definition lists were parsed very similar
code. They are now parsed by a single function (splitparagraphs).
Some logic from the old parsing functions has been moved down to
formatblock. This simplifies the parsing while putting the logic where
it's really needed.
author | Martin Geisler <mg@lazybytes.net> |
---|---|
date | Fri, 06 Nov 2009 00:30:35 +0100 |
parents | 97d0d910fa5d |
children | f52c4f7a4732 |
rev | line source |
---|---|
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
1 # minirst.py - minimal reStructuredText parser |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
2 # |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
3 # Copyright 2009 Matt Mackall <mpm@selenic.com> and others |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
4 # |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
5 # This software may be used and distributed according to the terms of the |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
6 # GNU General Public License version 2, incorporated herein by reference. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
7 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
8 """simplified reStructuredText parser. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
9 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
10 This parser knows just enough about reStructuredText to parse the |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
11 Mercurial docstrings. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
12 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
13 It cheats in a major way: nested blocks are not really nested. They |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
14 are just indented blocks that look like they are nested. This relies |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
15 on the user to keep the right indentation for the blocks. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
16 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
17 It only supports a small subset of reStructuredText: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
18 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
19 - paragraphs |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
20 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
21 - definition lists (must use ' ' to indent definitions) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
22 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
23 - lists (items must start with '-') |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
24 |
9293
e48a48b754d3
minirst: parse field lists
Martin Geisler <mg@lazybytes.net>
parents:
9292
diff
changeset
|
25 - field lists (colons cannot be escaped) |
e48a48b754d3
minirst: parse field lists
Martin Geisler <mg@lazybytes.net>
parents:
9292
diff
changeset
|
26 |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
27 - literal blocks |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
28 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
29 - option lists (supports only long options without arguments) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
30 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
31 - inline markup is not recognized at all. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
32 """ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
33 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
34 import re, sys, textwrap |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
35 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
36 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
37 def findblocks(text): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
38 """Find continuous blocks of lines in text. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
39 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
40 Returns a list of dictionaries representing the blocks. Each block |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
41 has an 'indent' field and a 'lines' field. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
42 """ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
43 blocks = [[]] |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
44 lines = text.splitlines() |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
45 for line in lines: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
46 if line.strip(): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
47 blocks[-1].append(line) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
48 elif blocks[-1]: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
49 blocks.append([]) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
50 if not blocks[-1]: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
51 del blocks[-1] |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
52 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
53 for i, block in enumerate(blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
54 indent = min((len(l) - len(l.lstrip())) for l in block) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
55 blocks[i] = dict(indent=indent, lines=[l[indent:] for l in block]) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
56 return blocks |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
57 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
58 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
59 def findliteralblocks(blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
60 """Finds literal blocks and adds a 'type' field to the blocks. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
61 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
62 Literal blocks are given the type 'literal', all other blocks are |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
63 given type the 'paragraph'. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
64 """ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
65 i = 0 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
66 while i < len(blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
67 # Searching for a block that looks like this: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
68 # |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
69 # +------------------------------+ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
70 # | paragraph | |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
71 # | (ends with "::") | |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
72 # +------------------------------+ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
73 # +---------------------------+ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
74 # | indented literal block | |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
75 # +---------------------------+ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
76 blocks[i]['type'] = 'paragraph' |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
77 if blocks[i]['lines'][-1].endswith('::') and i+1 < len(blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
78 indent = blocks[i]['indent'] |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
79 adjustment = blocks[i+1]['indent'] - indent |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
80 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
81 if blocks[i]['lines'] == ['::']: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
82 # Expanded form: remove block |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
83 del blocks[i] |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
84 i -= 1 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
85 elif blocks[i]['lines'][-1].endswith(' ::'): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
86 # Partially minimized form: remove space and both |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
87 # colons. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
88 blocks[i]['lines'][-1] = blocks[i]['lines'][-1][:-3] |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
89 else: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
90 # Fully minimized form: remove just one colon. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
91 blocks[i]['lines'][-1] = blocks[i]['lines'][-1][:-1] |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
92 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
93 # List items are formatted with a hanging indent. We must |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
94 # correct for this here while we still have the original |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
95 # information on the indentation of the subsequent literal |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
96 # blocks available. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
97 if blocks[i]['lines'][0].startswith('- '): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
98 indent += 2 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
99 adjustment -= 2 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
100 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
101 # Mark the following indented blocks. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
102 while i+1 < len(blocks) and blocks[i+1]['indent'] > indent: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
103 blocks[i+1]['type'] = 'literal' |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
104 blocks[i+1]['indent'] -= adjustment |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
105 i += 1 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
106 i += 1 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
107 return blocks |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
108 |
9737
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
109 _bulletre = re.compile(r'- ') |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
110 _optionre = re.compile(r'^(--[a-z-]+)((?:[ =][a-zA-Z][\w-]*)? +)(.*)$') |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
111 _fieldre = re.compile(r':(?![: ])([^:]*)(?<! ):( +)(.*)') |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
112 _definitionre = re.compile(r'[^ ]') |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
113 |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
114 def splitparagraphs(blocks): |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
115 """Split paragraphs into lists.""" |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
116 # Tuples with (list type, item regexp, single line items?). Order |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
117 # matters: definition lists has the least specific regexp and must |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
118 # come last. |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
119 listtypes = [('bullet', _bulletre, True), |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
120 ('option', _optionre, True), |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
121 ('field', _fieldre, True), |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
122 ('definition', _definitionre, False)] |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
123 |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
124 def match(lines, i, itemre, singleline): |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
125 """Does itemre match an item at line i? |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
126 |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
127 A list item can be followed by an idented line or another list |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
128 item (but only if singleline is True). |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
129 """ |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
130 line1 = lines[i] |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
131 line2 = i+1 < len(lines) and lines[i+1] or '' |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
132 if not itemre.match(line1): |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
133 return False |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
134 if singleline: |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
135 return line2 == '' or line2[0] == ' ' or itemre.match(line2) |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
136 else: |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
137 return line2.startswith(' ') |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
138 |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
139 i = 0 |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
140 while i < len(blocks): |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
141 if blocks[i]['type'] == 'paragraph': |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
142 lines = blocks[i]['lines'] |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
143 for type, itemre, singleline in listtypes: |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
144 if match(lines, 0, itemre, singleline): |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
145 items = [] |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
146 for j, line in enumerate(lines): |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
147 if match(lines, j, itemre, singleline): |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
148 items.append(dict(type=type, lines=[], |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
149 indent=blocks[i]['indent'])) |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
150 items[-1]['lines'].append(line) |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
151 blocks[i:i+1] = items |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
152 break |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
153 i += 1 |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
154 return blocks |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
155 |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
156 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
157 def findsections(blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
158 """Finds sections. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
159 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
160 The blocks must have a 'type' field, i.e., they should have been |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
161 run through findliteralblocks first. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
162 """ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
163 for block in blocks: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
164 # Searching for a block that looks like this: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
165 # |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
166 # +------------------------------+ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
167 # | Section title | |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
168 # | ------------- | |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
169 # +------------------------------+ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
170 if (block['type'] == 'paragraph' and |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
171 len(block['lines']) == 2 and |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
172 block['lines'][1] == '-' * len(block['lines'][0])): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
173 block['type'] = 'section' |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
174 return blocks |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
175 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
176 |
9623
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
177 def inlineliterals(blocks): |
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
178 for b in blocks: |
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
179 if b['type'] == 'paragraph': |
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
180 b['lines'] = [l.replace('``', '"') for l in b['lines']] |
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
181 return blocks |
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
182 |
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
183 |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
184 def addmargins(blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
185 """Adds empty blocks for vertical spacing. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
186 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
187 This groups bullets, options, and definitions together with no vertical |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
188 space between them, and adds an empty block between all other blocks. |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
189 """ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
190 i = 1 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
191 while i < len(blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
192 if (blocks[i]['type'] == blocks[i-1]['type'] and |
9293
e48a48b754d3
minirst: parse field lists
Martin Geisler <mg@lazybytes.net>
parents:
9292
diff
changeset
|
193 blocks[i]['type'] in ('bullet', 'option', 'field', 'definition')): |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
194 i += 1 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
195 else: |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
196 blocks.insert(i, dict(lines=[''], indent=0, type='margin')) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
197 i += 2 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
198 return blocks |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
199 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
200 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
201 def formatblock(block, width): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
202 """Format a block according to width.""" |
9417
4c3fb45123e5
util, minirst: do not crash with COLUMNS=0
Martin Geisler <mg@lazybytes.net>
parents:
9293
diff
changeset
|
203 if width <= 0: |
4c3fb45123e5
util, minirst: do not crash with COLUMNS=0
Martin Geisler <mg@lazybytes.net>
parents:
9293
diff
changeset
|
204 width = 78 |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
205 indent = ' ' * block['indent'] |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
206 if block['type'] == 'margin': |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
207 return '' |
9735
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
208 if block['type'] == 'literal': |
9291
cd5b6a11b607
minirst: indent literal blocks with two spaces
Martin Geisler <mg@lazybytes.net>
parents:
9156
diff
changeset
|
209 indent += ' ' |
cd5b6a11b607
minirst: indent literal blocks with two spaces
Martin Geisler <mg@lazybytes.net>
parents:
9156
diff
changeset
|
210 return indent + ('\n' + indent).join(block['lines']) |
9735
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
211 if block['type'] == 'section': |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
212 return indent + ('\n' + indent).join(block['lines']) |
9735
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
213 if block['type'] == 'definition': |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
214 term = indent + block['lines'][0] |
9737
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
215 hang = len(block['lines'][-1]) - len(block['lines'][-1].lstrip()) |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
216 defindent = indent + hang * ' ' |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
217 text = ' '.join(map(str.strip, block['lines'][1:])) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
218 return "%s\n%s" % (term, textwrap.fill(text, width=width, |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
219 initial_indent=defindent, |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
220 subsequent_indent=defindent)) |
9735
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
221 initindent = subindent = indent |
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
222 if block['type'] == 'bullet': |
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
223 subindent = indent + ' ' |
9737
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
224 elif block['type'] == 'field': |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
225 m = _fieldre.match(block['lines'][0]) |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
226 if m: |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
227 key, spaces, rest = m.groups() |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
228 # Turn ":foo: bar" into "foo bar". |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
229 block['lines'][0] = '%s %s%s' % (key, spaces, rest) |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
230 subindent = indent + (2 + len(key) + len(spaces)) * ' ' |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
231 elif block['type'] == 'option': |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
232 m = _optionre.match(block['lines'][0]) |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
233 if m: |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
234 option, arg, rest = m.groups() |
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
235 subindent = indent + (len(option) + len(arg)) * ' ' |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
236 |
9737
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
237 text = ' '.join(map(str.strip, block['lines'])) |
9735
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
238 return textwrap.fill(text, width=width, |
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
239 initial_indent=initindent, |
97d0d910fa5d
minirst: remove unnecessary "elif:" statements
Martin Geisler <mg@lazybytes.net>
parents:
9623
diff
changeset
|
240 subsequent_indent=subindent) |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
241 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
242 |
9540
cad36e496640
help: un-indent help topics
Martin Geisler <mg@lazybytes.net>
parents:
9417
diff
changeset
|
243 def format(text, width, indent=0): |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
244 """Parse and format the text according to width.""" |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
245 blocks = findblocks(text) |
9540
cad36e496640
help: un-indent help topics
Martin Geisler <mg@lazybytes.net>
parents:
9417
diff
changeset
|
246 for b in blocks: |
cad36e496640
help: un-indent help topics
Martin Geisler <mg@lazybytes.net>
parents:
9417
diff
changeset
|
247 b['indent'] += indent |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
248 blocks = findliteralblocks(blocks) |
9623
32727ce029de
minirst: convert ``foo`` into "foo" upon display
Martin Geisler <mg@lazybytes.net>
parents:
9540
diff
changeset
|
249 blocks = inlineliterals(blocks) |
9737
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
250 blocks = splitparagraphs(blocks) |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
251 blocks = findsections(blocks) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
252 blocks = addmargins(blocks) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
253 return '\n'.join(formatblock(b, width) for b in blocks) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
254 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
255 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
256 if __name__ == "__main__": |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
257 from pprint import pprint |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
258 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
259 def debug(func, blocks): |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
260 blocks = func(blocks) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
261 print "*** after %s:" % func.__name__ |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
262 pprint(blocks) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
263 print |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
264 return blocks |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
265 |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
266 text = open(sys.argv[1]).read() |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
267 blocks = debug(findblocks, text) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
268 blocks = debug(findliteralblocks, blocks) |
9737
5f101af4a921
minirst: combine list parsing in one function
Martin Geisler <mg@lazybytes.net>
parents:
9735
diff
changeset
|
269 blocks = debug(splitparagraphs, blocks) |
9156
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
270 blocks = debug(findsections, blocks) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
271 blocks = debug(addmargins, blocks) |
c9c7e8cdac9c
minimal reStructuredText parser
Martin Geisler <mg@lazybytes.net>
parents:
diff
changeset
|
272 print '\n'.join(formatblock(b, 30) for b in blocks) |