From eecc2d45b94513ba95789dfe0ef58aeb8b029049 Mon Sep 17 00:00:00 2001
From: Yuan Fu
parent field: (child (grandchild (…))) +parent field: (node (child (…)))
child, grand, grand-grandchild, etc., are nodes that -begin at point. parent is the parent node of child. +
where node, child, etc, are nodes which begin at point. +parent is the parent of node. node is displayed in +bold typeface. field-names are field names of node and +child, etc.
-If there is no node that starts at point, i.e., point is in the middle -of a node, then the mode-line only displays the smallest node that -spans the position of point, and its immediate parent. +
If no node starts at point, i.e., point is in the middle of a node, +then the mode line displays the earliest node that spans point, and +its immediate parent.
-This minor mode doesn’t create parsers on its own. It simply uses the
-first parser in (treesit-parser-list) (see Using Tree-sitter Parser).
+
This minor mode doesn’t create parsers on its own. It uses the first
+parser in (treesit-parser-list) (see Using Tree-sitter Parser).
Sometimes, the source of a programming language could contain snippets
@@ -76,8 +75,22 @@
need to be assigned different parsers. Traditionally, this is achieved by using narrowing. While tree-sitter works with narrowing (see narrowing), the recommended way is
-instead to set regions of buffer text in which a parser will operate.
+instead to set regions of buffer text (i.e., ranges) in which a parser
+will operate. This section describes functions for setting and
+getting ranges for a parser.
+Lisp programs should call treesit-update-ranges to make sure
+the ranges for each parser are correct before using parsers in a
+buffer, and call treesit-language-at to figure out the language
+responsible for the text at some position. These two functions don’t
+work by themselves; they need major modes to set
+treesit-range-settings and
+treesit-language-at-point-function, which do the actual work.
+These functions and variables are explained in more detail towards the
+end of the section.
+
This function sets up parser to operate on ranges. The @@ -126,24 +139,6 @@
Like treesit-parser-set-included-ranges, this function sets
-the ranges of parser-or-lang to ranges. Conveniently,
-parser-or-lang could be either a parser or a language. If it is
-a language, this function looks for the first parser in
-(treesit-parser-list) for that language in the current buffer,
-and sets the ranges for it.
-
This function returns the ranges of parser-or-lang, like
-treesit-parser-included-ranges. And like
-treesit-set-ranges, parser-or-lang can be a parser or
-a language symbol.
-
This function matches source with query and returns the
@@ -166,57 +161,56 @@
treesit-query-error error if query is malformed.
This variable holds the list of range functions. Font-locking and -indenting code use functions in this list to set correct ranges for -a language parser before using it. -
-The signature of each function in the list should be: -
-(start end &rest _) -
where start and end specify the region that is about to be -used. A range function only needs to (but is not limited to) update -ranges in that region. -
-The functions in the list are called in order. -
This function is used by font-lock and indentation to update ranges -before using any parser. Each range function in -treesit-range-functions is called in-order. Arguments -start and end are passed to each range function. -
This function tries to figure out which language is responsible for
-the text at buffer position pos. Under the hood it just calls
-treesit-language-at-point-function.
-
Various Lisp programs use this function. For example, the indentation
-program uses this function to determine which language’s rule to use
-in a multi-language buffer. So it is important to provide
-treesit-language-at-point-function for a multi-language major
-mode.
-
Normally, in a set of languages that can be mixed together, there is a -major language and several embedded languages. A Lisp program usually -first parses the whole document with the major language’s parser, sets -ranges for the embedded languages, and then parses the embedded +
It should suffice for general Lisp programs to call the following two +functions in order to support program sources that mix multiple languages.
-Suppose we need to parse a very simple document that mixes -HTML, CSS and JavaScript: +
This function updates ranges for parsers in the buffer. It makes sure
+the parsers’ ranges are set correctly between beg and end,
+according to treesit-range-settings. If omitted, beg
+defaults to the beginning of the buffer, and end defaults to the
+end of the buffer.
+
For example, fontification functions use this function before querying +for nodes in a region. +
This function returns the language of the text at buffer position
+pos. Under the hood it calls
+treesit-language-at-point-function and returns its return
+value. If treesit-language-at-point-function is nil,
+this function returns the language of the first parser in the returned
+value of treesit-parser-list. If there is no parser in the
+buffer, it returns nil.
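A minimal sketch of how a Lisp program might combine the two functions described above. The helper name `my-node-at-point-in-its-language` is hypothetical and not part of Emacs; the sketch assumes the major mode has already set `treesit-range-settings` and `treesit-language-at-point-function`.

```elisp
;; Hypothetical helper, not part of Emacs.
(defun my-node-at-point-in-its-language ()
  "Return the node at point from the parser of the language at point.
Return nil if no language can be determined."
  ;; Make sure the ranges of every parser are current.  With no
  ;; arguments this covers the whole buffer.
  (treesit-update-ranges)
  ;; Ask which language is responsible for the text at point, then
  ;; look up the node in that language's parser.
  (let ((lang (treesit-language-at (point))))
    (when lang
      (treesit-node-at (point) lang))))
```

Updating ranges for the whole buffer is the simplest choice for a sketch; a program that only cares about a region would likely pass that region as beg and end instead.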
+
Normally, in a set of languages that can be mixed together, there is a +host language and one or more embedded languages. A Lisp +program usually first parses the whole document with the host +language’s parser, retrieves some information, sets ranges for the +embedded languages with that information, and then parses the embedded +languages. +
+Take a buffer containing HTML, CSS and JavaScript
+as an example. A Lisp program will first parse the whole buffer with
+an HTML parser, then query the parser for
+style_element and script_element nodes, which
+correspond to CSS and JavaScript text, respectively. Then
+it sets the range of the CSS and JavaScript parser to the
+ranges in which their corresponding nodes span.
+
Given a simple HTML document:
<html> @@ -225,8 +219,8 @@ </html>
We first parse with HTML, then set ranges for CSS -and JavaScript: +
a Lisp program will first parse with an HTML parser, then set +ranges for CSS and JavaScript parsers:
;; Create parsers. @@ -251,10 +245,76 @@ (treesit-parser-set-included-ranges js js-range)
We use a query pattern (style_element (raw_text) @capture)
-to find CSS nodes in the HTML parse tree. For how
-to write query patterns, see Pattern Matching Tree-sitter Nodes.
+
Emacs automates this process in treesit-update-ranges. A
+multi-language major mode should set treesit-range-settings so
+that treesit-update-ranges knows how to perform this process
+automatically. Major modes should use the helper function
+treesit-range-rules to generate a value that can be assigned to
+treesit-range-settings. The settings in the following example
+directly translate into operations shown above.
(setq-local treesit-range-settings
+            (treesit-range-rules
+             :embed 'javascript
+             :host 'html
+             '((script_element (raw_text) @capture))
+
+             :embed 'css
+             :host 'html
+             '((style_element (raw_text) @capture))))
+
This function is used to set treesit-range-settings. It +takes care of compiling queries and other post-processing, and outputs +a value that treesit-range-settings can have. +
+It takes a series of query-specs, where each query-spec is +a query preceded by zero or more pairs of keyword and +value. Each query is a tree-sitter query in either the +string, s-expression or compiled form, or a function. +
+If query is a tree-sitter query, it should be preceded by two
+:keyword value pairs, where the :embed keyword
+specifies the embedded language, and the :host keyword
+specifies the host language.
+
treesit-update-ranges uses query to figure out how to set
+the ranges for parsers for the embedded language. It queries
+query in a host language parser, computes the ranges in which
+the captured nodes span, and applies these ranges to embedded
+language parsers.
+
If query is a function, it doesn’t need any :keyword and +value pair. It should be a function that takes 2 arguments, +start and end, and sets the ranges for parsers in the +current buffer in the region between start and end. It is +fine for this function to set ranges in a larger region that +encompasses the region between start and end. +
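As a rough illustration of the function form, the sketch below sets the ranges of a CSS parser by hand. The function name `my-set-css-ranges` is hypothetical, and the sketch assumes that an HTML parser and the CSS grammar are available in the buffer. It deliberately queries the whole buffer rather than only the region between start and end, which the text above allows.

```elisp
;; Hypothetical range function, not part of Emacs.
(defun my-set-css-ranges (_start _end)
  "Set the ranges of the CSS parser from the HTML parse tree.
This sketch ignores _START and _END and recomputes the ranges for
the whole buffer."
  (let ((ranges (treesit-query-range
                 'html
                 '((style_element (raw_text) @capture)))))
    ;; Only touch the parser when there is CSS text; passing nil
    ;; would make the parser cover the whole buffer.
    (when ranges
      (treesit-parser-set-included-ranges
       (treesit-parser-create 'css) ranges))))

(setq-local treesit-range-settings
            (treesit-range-rules #'my-set-css-ranges))
```

Recomputing every range on each call keeps the sketch short; a real major mode would more likely restrict the query to the requested region and merge the result with the existing ranges.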
This variable helps treesit-update-ranges in updating the
+ranges for parsers in the buffer. It is a list of settings
+where the exact format of a setting is considered internal. You
+should use treesit-range-rules to generate a value that this
+variable can have.
+
This variable’s value should be a function that takes a single
+argument, pos, which is a buffer position, and returns the
+language of the buffer text at pos. This variable is used by
+treesit-language-at.
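A possible value for this variable in an HTML major mode could look like the sketch below. The function name `my-html-language-at-point` is hypothetical; the node types `script_element` and `style_element` are the ones used in the example earlier in this section, and the sketch assumes an HTML parser can be used in the buffer.

```elisp
;; Hypothetical function, not part of Emacs.
(defun my-html-language-at-point (pos)
  "Return the language of the text at POS in an HTML buffer."
  (let* ((node (treesit-node-at pos 'html))
         (parent (treesit-node-parent node)))
    ;; Text inside a script or style element is a raw_text node whose
    ;; parent is the corresponding element.
    (pcase (and parent (treesit-node-type parent))
      ("script_element" 'javascript)
      ("style_element" 'css)
      (_ 'html))))

(setq-local treesit-language-at-point-function
            #'my-html-language-at-point)
```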
+
treesit-major-mode-setup.
This function is used to set treesit-font-lock-settings. It takes care of compiling queries and other post-processing, and outputs a value that treesit-font-lock-settings accepts. Here’s an @@ -129,13 +129,18 @@ "(script_element) @font-lock-builtin-face")
This function takes a list of text or s-exp queries. Before each
-query, there are :keyword-value pairs that configure
-that query. The :lang keyword sets the query’s language and
-every query must specify the language. The :feature keyword
-sets the feature name of the query. Users can control which features
-are enabled with font-lock-maximum-decoration and
-treesit-font-lock-feature-list (see below).
+
This function takes a series of query-specs, where each +query-spec is a query preceded by multiple pairs of +:keyword and value. Each query is a tree-sitter +query in either the string, s-expression or compiled form. +
+For each query, the :keyword and value pairs add
+meta information to it. The :lang keyword declares
+query’s language. The :feature keyword sets the feature
+name of query. Users can control which features are enabled
+with font-lock-maximum-decoration and
+treesit-font-lock-feature-list (described below). These two
+keywords are mandatory.
Other keywords are optional:
@@ -148,7 +153,7 @@keepLisp programs mark patterns in the query with capture names (names +
Lisp programs mark patterns in query with capture names (names
that starts with @), and tree-sitter will return matched nodes
tagged with those same capture names. For the purpose of
fontification, capture names in query should be face names like
@@ -230,9 +235,10 @@
A list of settings for tree-sitter based font lock. The exact format
-of this variable is considered internal. One should always use
+of each setting is considered internal. One should always use
treesit-font-lock-rules to set this variable.
-
Multi-language major modes should provide range functions in
treesit-range-functions, and Emacs will set the ranges
diff --git a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
index 2fdb50df7c1..5ea1f9bc332 100644
--- a/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
+++ b/admin/notes/tree-sitter/html-manual/Parser_002dbased-Indentation.html
@@ -106,7 +106,8 @@
rule is applicable. Then Emacs passes the node to anchor, which
returns a buffer position. Emacs takes the column number of that
position, adds offset to it, and the result is the indentation
-column for the current line.
+column for the current line. offset can be an integer or a
+variable whose value is an integer.
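The following sketch shows both kinds of offset in one rule set. The language symbol `my-lang`, the node type `"block"`, and the variable `my-lang-indent-offset` are hypothetical names chosen for illustration.

```elisp
;; Hypothetical user option holding the indentation step.
(defvar my-lang-indent-offset 4
  "Number of columns to indent nested code in my-lang buffers.")

(setq-local treesit-simple-indent-rules
            '((my-lang
               ;; A closing brace lines up with its parent: integer offset.
               ((node-is "}") parent-bol 0)
               ;; Children of a block are indented by the value of a
               ;; variable, so users can customize it.
               ((parent-is "block") parent-bol my-lang-indent-offset))))
```

Using a variable for the offset means that changing `my-lang-indent-offset` takes effect without rebuilding the rules.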
The matcher and anchor are functions, and Emacs provides
convenient defaults for them.
@@ -117,8 +118,8 @@
position of the first non-whitespace character after the beginning of
the line. The argument node is the largest (highest-in-tree)
node that starts at that position; and parent is the parent of
-node. However, when that position is on a whitespace or inside
-a multi-line string, no node that starts at that position, so
+node. However, when that position is in whitespace or inside
+a multi-line string, no node can start at that position, so
node is nil. In that case, parent would be the
smallest node that spans that position.
This anchor is a function that is called with 3 arguments: node, parent, and bol, and returns the first non-whitespace character on the previous line. +
+point-min This anchor is a function that is called with 3 arguments: node, +parent, and bol, and returns the beginning of the buffer. +This is useful as the beginning of the buffer is always at column 0.
This function traverses the subtree of node (including
node itself), looking for a node for which predicate
returns non-nil. predicate is a regexp that is matched
-(case-insensitively) against each node’s type, or a predicate function
-that takes a node and returns non-nil if the node matches. The
-function returns the first node that matches, or nil if none
-does.
+against each node’s type, or a predicate function that takes a node
+and returns non-nil if the node matches. The function returns
+the first node that matches, or nil if none does.
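A short sketch of both predicate forms, assuming an HTML parser is available in the current buffer. Since the regexp is matched against the node type, the first call anchors it with `\\`` and `\\'` to match the whole type name.

```elisp
(let ((root (treesit-buffer-root-node 'html)))
  (list
   ;; Regexp predicate: the first node whose type is exactly
   ;; "script_element".
   (treesit-search-subtree root "\\`script_element\\'")
   ;; Function predicate: the first node of type "style_element".
   (treesit-search-subtree
    root
    (lambda (node)
      (equal (treesit-node-type node) "style_element")))))
```

Each element of the resulting list is either a node or nil, depending on whether the buffer contains such an element.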
By default, this function only traverses named nodes, but if all
is non-nil, it traverses all the nodes. If backward is
@@ -279,9 +278,9 @@
Like treesit-search-subtree, this function also traverses the
parse tree and matches each node with predicate (except for
-start), where predicate can be a (case-insensitive) regexp
-or a function. For a tree like the below where start is marked
-S, this function traverses as numbered from 1 to 12:
+start), where predicate can be a regexp or a function.
+For a tree like the below where start is marked S, this function
+traverses as numbered from 1 to 12:
12
@@ -336,8 +335,8 @@
It takes the subtree under root, and combs it so only the nodes that match predicate are left. Like previous functions, the predicate can be a regexp string that matches against each
-node’s type case-insensitively, or a function that takes a node and
-return non-nil if it matches.
+node’s type, or a function that takes a node and returns non-nil
+if it matches.
For example, for a subtree on the left that consist of both numbers and letters, if predicate is “letter only”, the returned tree