Add line-column tracking for tree-sitter

Add line-column tracking for tree-sitter parsers.  Copied from
comments in treesit.c:

   Technically we are supposed to send tree-sitter the line and
   column position of each edit.  But in practice we just send it
   dummy values, because tree-sitter doesn't use them for parsing
   and mostly just carries the line and column positions around and
   returns them when, e.g., reporting node positions[1].  This
   worked fine until we encountered grammars that actually
   utilize the line and column information for
   parsing (Haskell)[2].

   [1] https://github.com/tree-sitter/tree-sitter/issues/445
   [2] https://github.com/tree-sitter/tree-sitter/issues/4001

   So now we have to keep track of line and column positions and
   pass valid values to tree-sitter.  (It adds quite some
   complexity, but only linearly; one can ignore all the linecol
   stuff when trying to understand treesit code and then come
   back to it later.)  Eli convinced me to disable tracking by
   default, and only enable it for languages that need it.  So
   the buffer starts out not tracking linecol.  And when a
   parser is created, if the language is in
   treesit-languages-require-line-column-tracking, we enable
   tracking in the buffer, and enable tracking for the parser.
   To simplify things, once a buffer starts tracking linecol, it
   never disables tracking, even if parsers that need tracking
   are all deleted.  Likewise, a parser's tracking is decided at
   creation time: a parser that starts out tracking (or not
   tracking) stays that way, regardless of later changes to
   treesit-languages-require-line-column-tracking.
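The tracking policy can be sketched in C as a hypothetical, stripped-down model; it is not the Emacs implementation. The names `sketch_buffer`, `sketch_parser` and `create_parser` are illustrative, as is the plain string list standing in for treesit-languages-require-line-column-tracking.

The sketch borrows two details from the patch:

- An all-zero linecol (TREESIT_EMPTY_LINECOL) marks a buffer that isn't tracking.
- treesit_buf_tracks_linecol_p only checks whether the BEGV cache's bytepos is nonzero.

Seeding all three caches with the BOB linecol { 1, 1, 0 } when tracking starts is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same shape as struct ts_linecol: byte position, 1-based line,
   0-based byte column.  */
struct linecol
{
  ptrdiff_t bytepos;
  ptrdiff_t line;
  ptrdiff_t col;
};

/* Bytepos 0 never occurs in an Emacs buffer (BEG_BYTE is 1), so an
   all-zero linecol can mean "not tracking".  */
static const struct linecol EMPTY_LINECOL = { 0, 0, 0 };
static const struct linecol BOB_LINECOL = { 1, 1, 0 };

/* Hypothetical stand-ins for struct buffer and the parser object.  */
struct sketch_buffer
{
  struct linecol begv_cache, point_cache, zv_cache;
};

struct sketch_parser
{
  const char *language;
  bool tracks_linecol;		/* Fixed when the parser is created.  */
};

static struct sketch_buffer
make_buffer (void)
{
  struct sketch_buffer buf;
  buf.begv_cache = buf.point_cache = buf.zv_cache = EMPTY_LINECOL;
  return buf;
}

/* Like treesit_buf_tracks_linecol_p: only the BEGV cache is checked.  */
static bool
buffer_tracks_linecol (const struct sketch_buffer *buf)
{
  return buf->begv_cache.bytepos != 0;
}

static bool
language_in_list (const char *lang, const char *const *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (lang, list[i]) == 0)
      return true;
  return false;
}

/* REQUIRED/N stand in for the language list as it is at the moment
   the parser is created.  */
static struct sketch_parser
create_parser (struct sketch_buffer *buf, const char *lang,
	       const char *const *required, size_t n)
{
  struct sketch_parser parser = { lang, false };
  if (language_in_list (lang, required, n))
    {
      /* Start tracking in the buffer the first time it's needed.
	 Nothing ever resets the caches to EMPTY_LINECOL, so buffer
	 tracking is sticky.  */
      if (!buffer_tracks_linecol (buf))
	buf->begv_cache = buf->point_cache = buf->zv_cache = BOB_LINECOL;
      parser.tracks_linecol = true;
    }
  return parser;
}
```

Because nothing resets the caches, a buffer keeps tracking even after its Haskell parsers are gone. An existing parser's flag is likewise never revisited when the list changes.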

   To make calculating line/column positions fast, we store
   linecol caches for begv, point, and zv in the
   buffer (buf->ts_linecol_xxx); and in the parser object, we
   store linecol caches for the visible beg/end of that parser.

   In buffer editing functions, we need the linecol for
   start/old_end/new_end.  These can be calculated by scanning
   newlines (treesit_linecol_of_pos) from the buffer's point
   cache, which should always be near point.  And we usually
   set the calculated linecol of new_end back to the buffer's
   point cache.
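The newline scan that treesit_linecol_of_pos performs can be illustrated on a plain, gap-free string. This is a hypothetical sketch, not the Emacs code, and differs from it in three ways:

- It uses 0-based byte indices, whereas Emacs's BEG_BYTE is 1.
- It walks bytes one at a time instead of calling display_count_lines.
- It ignores narrowing.

As in struct ts_linecol, lines are 1-based and columns are 0-based byte offsets from the beginning of the line.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as struct ts_linecol.  Unlike Emacs, BYTEPOS is a
   0-based index into a plain string.  */
struct linecol
{
  ptrdiff_t bytepos;
  ptrdiff_t line;
  ptrdiff_t col;
};

/* Return the linecol of TARGET by scanning TEXT from CACHE, which must
   be valid (its line and col agree with its bytepos).  */
static struct linecol
linecol_of_pos (const char *text, ptrdiff_t target, struct linecol cache)
{
  assert (target >= 0 && cache.bytepos >= 0);
  struct linecol result = cache;
  result.bytepos = target;
  if (target >= cache.bytepos)
    {
      /* Forward: each newline we step over starts a new line.  */
      for (ptrdiff_t i = cache.bytepos; i < target; i++)
	{
	  if (text[i] == '\n')
	    {
	      result.line++;
	      result.col = 0;
	    }
	  else
	    result.col++;
	}
      return result;
    }
  /* Backward: each newline in [TARGET, CACHE) is a line we leave.  */
  ptrdiff_t crossed = 0;
  for (ptrdiff_t i = target; i < cache.bytepos; i++)
    if (text[i] == '\n')
      crossed++;
  result.line -= crossed;
  if (crossed == 0)
    /* Same line: just move the cached column back.  */
    result.col = cache.col - (cache.bytepos - target);
  else
    {
      /* Find the beginning of TARGET's line to compute the column.  */
      ptrdiff_t bol = target;
      while (bol > 0 && text[bol - 1] != '\n')
	bol--;
      result.col = target - bol;
    }
  return result;
}
```

The patch itself goes through treesit_count_lines, a wrapper around display_count_lines. That path also copes with the buffer gap and temporarily widens the buffer, and this sketch sidesteps both by working on a contiguous string.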

   We also need the linecol for the visible_beg/end of each
   parser, and for the buffer's begv/zv.  These positions are
   usually far from point, so scanning newlines from point to
   them would be inefficient; instead we keep caches for all of
   them (in either the parser object or the buffer).  And
   because they're far away and outside the changed region, we
   can calculate their change in line and column number by
   simply counting how many newlines are added/removed in the
   changed region (compute_new_linecol_by_change).
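Below is a hypothetical sketch of the bookkeeping that compute_new_linecol_by_change performs. The name adjust_linecol_for_change, its signature, and the 0-based byte positions in the example are illustrative, not the Emacs code.

The edit functions already compute the linecols of START, OLD_END (before the edit) and NEW_END (after it). From those, a later position's line shifts by NEW_END.line - OLD_END.line, which is the number of newlines added minus the number removed. Its column changes only if it sat on the same line as OLD_END. Like the text, the sketch assumes the cached position lies outside the changed region.

```c
#include <assert.h>
#include <stddef.h>

struct linecol
{
  ptrdiff_t bytepos;
  ptrdiff_t line;		/* 1-based.  */
  ptrdiff_t col;		/* 0-based byte offset from BOL.  */
};

/* An edit replaced [START, OLD_END) with [START, NEW_END).  OLD_END is
   the linecol before the edit, NEW_END the linecol after it.  Return
   CACHE updated for the edit.  CACHE must not lie strictly inside the
   replaced text.  */
static struct linecol
adjust_linecol_for_change (struct linecol cache, struct linecol start,
			   struct linecol old_end, struct linecol new_end)
{
  /* Text before the change is untouched.  */
  if (cache.bytepos <= start.bytepos)
    return cache;
  assert (cache.bytepos >= old_end.bytepos);
  struct linecol result = cache;
  /* Everything after OLD_END shifts by the same amounts.  */
  result.bytepos += new_end.bytepos - old_end.bytepos;
  result.line += new_end.line - old_end.line;
  /* Only the rest of OLD_END's line moves horizontally; later lines
     keep their columns.  */
  if (cache.line == old_end.line)
    result.col += new_end.col - old_end.col;
  return result;
}
```

For example, replacing "cd" in "ab\ncd\nef" with "x\nyz" moves the "f" at byte 7 (line 3, column 1) to byte 9 (line 4, column 1). No text between OLD_END and the cached position has to be scanned.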

* doc/lispref/parsing.texi (Using Parser): Mention line-column
tracking in manual.
* etc/NEWS: Add news.
* lisp/treesit.el:
(treesit-languages-need-line-column-tracking): New variable.
* src/buffer.c: Include treesit.h (for TREESIT_EMPTY_LINECOL).
(Fget_buffer_create):
(Fmake_indirect_buffer): Initialize new buffer fields.
(Fbuffer_swap_text): Add new buffer fields.
* src/buffer.h (ts_linecol): New struct.
(buffer): New buffer fields.
(BUF_TS_LINECOL_BEGV):
(BUF_TS_LINECOL_POINT):
(BUF_TS_LINECOL_ZV):
(SET_BUF_TS_LINECOL_BEGV):
(SET_BUF_TS_LINECOL_POINT):
(SET_BUF_TS_LINECOL_ZV): New inline functions.
* src/casefiddle.c (casify_region): Record linecol info.
* src/editfns.c (Fsubst_char_in_region):
(Ftranslate_region_internal):
(Ftranspose_regions): Record linecol info.
* src/insdel.c (insert_1_both):
(insert_from_string_1):
(insert_from_gap_1):
(insert_from_buffer):
(replace_range):
(del_range_2): Record linecol info.
* src/treesit.c (TREESIT_BOB_LINECOL):
(TREESIT_EMPTY_LINECOL):
(TREESIT_TS_POINT_1_0): New constants.
(treesit_debug_print_linecol):
(treesit_buf_tracks_linecol_p):
(restore_restriction_and_selective_display):
(treesit_count_lines):
(treesit_debug_validate_linecol):
(treesit_linecol_of_pos):
(treesit_make_ts_point):
(Ftreesit_tracking_line_column_p):
(Ftreesit_parser_tracking_line_column_p): New functions.
(treesit_tree_edit_1): Accept real TSPoint and pass to
tree-sitter.
(compute_new_linecol_by_change): New function.
(treesit_record_change_1): Rename from treesit_record_change,
handle linecol if tracking is enabled.
(treesit_linecol_maybe): New function.
(treesit_record_change): New wrapper around
treesit_record_change_1 that handles some boilerplate and sets
buffer state.
(treesit_sync_visible_region): Handle linecol if tracking is
enabled.
(make_treesit_parser): Set up the parser's linecol cache if tracking
is enabled.
(Ftreesit_parser_create): Enable tracking if the parser's
language requires it.
(Ftreesit__linecol_at):
(Ftreesit__linecol_cache_set):
(Ftreesit__linecol_cache): New functions for debugging and
testing.
(syms_of_treesit): New variable
Vtreesit_languages_require_line_column_tracking.
* src/treesit.h (Lisp_TS_Parser): New fields.
(TREESIT_BOB_LINECOL):
(TREESIT_EMPTY_LINECOL): New constants.
* test/src/treesit-tests.el (treesit-linecol-basic):
(treesit-linecol-search-back-across-newline):
(treesit-linecol-col-same-line):
(treesit-linecol-enable-disable): New tests.
* src/lisp.h: Declare display_count_lines.
* src/xdisp.c (display_count_lines): Remove static keyword.
Yuan Fu 2025-03-18 17:26:26 -07:00
parent 159e3a981e
commit 1897da0b59
GPG key ID: 56E19BC57664A442
13 changed files with 1086 additions and 69 deletions

doc/lispref/parsing.texi

@@ -419,6 +419,27 @@ tree-sitter can be activated. Major modes should check this value
when deciding whether to enable tree-sitter features.
@end defvar
@defvar treesit-languages-require-line-column-tracking
Emacs by default doesn't keep track of line and column numbers for
positions in a buffer. However, some language grammars use the line
and column information for parsing. If parsers of these languages are
created in a buffer, Emacs turns on line and column tracking and
reports this information to those parsers. Once a buffer starts
tracking line and column, it never stops doing so. And once a parser is
created as tracking or not tracking line and column, it stays that way
regardless of later changes to this variable.
This variable is a list of languages that require line and column
tracking. The vast majority of languages don't need line and column
information. So far, only Haskell is known to need it.
@findex treesit-tracking-line-column-p
@findex treesit-parser-tracking-line-column-p
You can use @code{treesit-tracking-line-column-p} and
@code{treesit-parser-tracking-line-column-p} to check whether a buffer
or a parser, respectively, is tracking line and column.
@end defvar
@cindex creating tree-sitter parsers
@cindex tree-sitter parser, creating
@defun treesit-parser-create language &optional buffer no-reuse tag

etc/NEWS

@@ -682,6 +682,23 @@ tree-sitter modes.
Users can customize this variable to add simple custom indentation rules
for tree-sitter major modes.
+++
*** New variable 'treesit-languages-require-line-column-tracking'
Now Emacs can optionally track line and column numbers for buffer edits
and send that information to tree-sitter parsers. Parsers of languages
in this list will receive line and column information. This is only
needed for very few languages. So far only Haskell is known to need it.
+++
*** New function 'treesit-tracking-line-column-p'
New function to check if a buffer is tracking line and column for buffer
edits.
+++
*** New function 'treesit-parser-tracking-line-column-p'
New function to check if a parser is receiving line and column
information.
+++
*** 'treesit-language-at-point-function' is now optional.
Multi-language major modes can rely on the default return value from

lisp/treesit.el

@@ -1229,7 +1229,11 @@ omitted, default END to BEG."
return rng
finally return nil))))
;;; Language
;; Defined in treesit.c. This is just to add some default values.
(defvar treesit-languages-need-line-column-tracking
'(haskell))
;; The entries are sorted by `sort-lines'.
(defvar treesit-language-display-name-alist

src/buffer.c

@@ -48,6 +48,10 @@ along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
#include "w32heap.h" /* for mmap_* */
#endif
#ifdef HAVE_TREE_SITTER
#include "treesit.h"
#endif
/* Work around GCC bug 109847
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109847
which causes GCC to mistakenly complain about
@@ -641,6 +645,13 @@ even if it is dead. The return value is never nil. */)
bset_width_table (b, Qnil);
b->prevent_redisplay_optimizations_p = 1;
#ifdef HAVE_TREE_SITTER
/* By default, use empty linecol, which means disable tracking. */
SET_BUF_TS_LINECOL_BEGV (b, TREESIT_EMPTY_LINECOL);
SET_BUF_TS_LINECOL_POINT (b, TREESIT_EMPTY_LINECOL);
SET_BUF_TS_LINECOL_ZV (b, TREESIT_EMPTY_LINECOL);
#endif
/* An ordinary buffer normally doesn't need markers
to handle BEGV and ZV. */
bset_pt_marker (b, Qnil);
@@ -867,6 +878,13 @@ Interactively, CLONE and INHIBIT-BUFFER-HOOKS are nil. */)
b->bidi_paragraph_cache = 0;
bset_width_table (b, Qnil);
#ifdef HAVE_TREE_SITTER
/* By default, use empty linecol, which means disable tracking. */
SET_BUF_TS_LINECOL_BEGV (b, TREESIT_EMPTY_LINECOL);
SET_BUF_TS_LINECOL_POINT (b, TREESIT_EMPTY_LINECOL);
SET_BUF_TS_LINECOL_ZV (b, TREESIT_EMPTY_LINECOL);
#endif
name = Fcopy_sequence (name);
set_string_intervals (name, NULL);
bset_name (b, name);
@@ -2618,6 +2636,13 @@ results, see Info node `(elisp)Swapping Text'. */)
bset_point_before_scroll (current_buffer, Qnil);
bset_point_before_scroll (other_buffer, Qnil);
#ifdef HAVE_TREE_SITTER
swapfield_ (ts_parser_list, Lisp_Object);
swapfield (ts_linecol_begv, struct ts_linecol);
swapfield (ts_linecol_point, struct ts_linecol);
swapfield (ts_linecol_zv, struct ts_linecol);
#endif
modiff_incr (&current_buffer->text->modiff, 1);
modiff_incr (&other_buffer->text->modiff, 1);
modiff_incr (&current_buffer->text->chars_modiff, 1);

src/buffer.h

@@ -220,6 +220,20 @@ extern ptrdiff_t advance_to_char_boundary (ptrdiff_t byte_pos);
/* Define the actual buffer data structures. */
/* This data structure stores the cache of a position and its line and
column number. The column number is counted in bytes. The line
number and column number don't respect narrowing. */
struct ts_linecol
{
/* The byte position. */
ptrdiff_t bytepos;
/* The line number of this position. */
ptrdiff_t line;
/* The column number (in bytes) of this position (0-based). Basically
the byte offset from BOL (or BOB). */
ptrdiff_t col;
};
/* This data structure describes the actual text contents of a buffer.
It is shared between indirect buffers and their base buffer. */
@@ -700,6 +714,25 @@ struct buffer
/* The interval tree containing this buffer's overlays. */
struct itree_tree *overlays;
/* Right now only tree-sitter makes use of this, so I don't want
non-tree-sitter build to pay for it. If something else can make
use of this, we can remove the gate. */
#ifdef HAVE_TREE_SITTER
/* Cache of line and column number of a position. Tree-sitter uses
this cache to calculate line and column of the beginning and end of
buffer edits. Stores three caches for BEGV, point, ZV,
respectively. All three are refreshed in buffer edit functions, so
they're always up-to-date (in the sense that the bytepos and
line/column number are in sync, not in the sense that the bytepos
is at the actual position of point/BEGV/ZV; indeed, most of the
time the bytepos is only near the actual position). All caches are
initialized to empty, meaning no linecol tracking for this
buffer. */
struct ts_linecol ts_linecol_begv;
struct ts_linecol ts_linecol_point;
struct ts_linecol ts_linecol_zv;
#endif
/* Changes in the buffer are recorded here for undo, and t means
don't record anything. This information belongs to the base
buffer of an indirect buffer. But we can't store it in the
@@ -1134,6 +1167,45 @@ BUFFER_CHECK_INDIRECTION (struct buffer *b)
}
}
#ifdef HAVE_TREE_SITTER
INLINE struct ts_linecol
BUF_TS_LINECOL_BEGV (struct buffer *buf)
{
return buf->ts_linecol_begv;
}
INLINE struct ts_linecol
BUF_TS_LINECOL_POINT (struct buffer *buf)
{
return buf->ts_linecol_point;
}
INLINE struct ts_linecol
BUF_TS_LINECOL_ZV (struct buffer *buf)
{
return buf->ts_linecol_zv;
}
INLINE void
SET_BUF_TS_LINECOL_BEGV (struct buffer *buf, struct ts_linecol linecol)
{
buf->ts_linecol_begv = linecol;
}
INLINE void
SET_BUF_TS_LINECOL_POINT (struct buffer *buf, struct ts_linecol linecol)
{
buf->ts_linecol_point = linecol;
}
INLINE void
SET_BUF_TS_LINECOL_ZV (struct buffer *buf, struct ts_linecol linecol)
{
buf->ts_linecol_zv = linecol;
}
#endif
/* This structure holds the default values of the buffer-local variables
that have special slots in each buffer.
The default value occupies the same slot in this structure

src/casefiddle.c

@@ -543,6 +543,12 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
#ifdef HAVE_TREE_SITTER
ptrdiff_t start_byte = CHAR_TO_BYTE (start);
ptrdiff_t old_end_byte = CHAR_TO_BYTE (end);
struct ts_linecol start_linecol
= treesit_linecol_maybe (start, start_byte,
BUF_TS_LINECOL_POINT (current_buffer));
struct ts_linecol old_end_linecol
= treesit_linecol_maybe (end, old_end_byte,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
ptrdiff_t orig_end = end;
@@ -565,8 +571,11 @@ casify_region (enum case_action flag, Lisp_Object b, Lisp_Object e)
update_compositions (start, end, CHECK_ALL);
}
#ifdef HAVE_TREE_SITTER
ptrdiff_t new_end = orig_end + added;
ptrdiff_t new_end_byte = CHAR_TO_BYTE (new_end);
treesit_record_change (start_byte, old_end_byte, new_end_byte,
start_linecol, old_end_linecol, new_end);
#endif
return orig_end + added;

src/editfns.c

@@ -2305,6 +2305,19 @@ Both characters must have the same length of multi-byte form. */)
= !NILP (BVAR (current_buffer, enable_multibyte_characters));
int fromc, toc;
#ifdef HAVE_TREE_SITTER
ptrdiff_t start_char = fix_position (start);
ptrdiff_t old_end_char = fix_position (end);
ptrdiff_t start_byte = CHAR_TO_BYTE (start_char);
ptrdiff_t old_end_byte = CHAR_TO_BYTE (old_end_char);
struct ts_linecol start_linecol
= treesit_linecol_maybe (start_char, start_byte,
BUF_TS_LINECOL_POINT (current_buffer));
struct ts_linecol old_end_linecol
= treesit_linecol_maybe (old_end_char, old_end_byte,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
restart:
validate_region (&start, &end);
@@ -2405,7 +2418,8 @@ Both characters must have the same length of multi-byte form. */)
if (changed > 0)
{
#ifdef HAVE_TREE_SITTER
treesit_record_change (start_byte, old_end_byte, old_end_byte,
start_linecol, old_end_linecol, old_end_char);
#endif
signal_after_change (changed,
last_changed - changed, last_changed - changed);
@@ -2592,6 +2606,15 @@ It returns the number of characters changed. */)
}
else
{
#ifdef HAVE_TREE_SITTER
struct ts_linecol linecol_cache
= BUF_TS_LINECOL_POINT (current_buffer);
struct ts_linecol start_linecol
= treesit_linecol_maybe (pos, pos_byte, linecol_cache);
struct ts_linecol old_end_linecol
= treesit_linecol_maybe (pos + 1, pos_byte + len,
start_linecol);
#endif
record_change (pos, 1);
while (str_len-- > 0)
*p++ = *str++;
@@ -2604,7 +2627,8 @@ It returns the number of characters changed. */)
modified buffer content manually, so we need to
notify tree-sitter manually. */
treesit_record_change (pos_byte, pos_byte + len,
pos_byte + len, start_linecol,
old_end_linecol, pos + 1);
#endif
}
characters_changed++;
@@ -4555,6 +4579,15 @@ ring. */)
start1_byte = CHAR_TO_BYTE (start1);
end2_byte = CHAR_TO_BYTE (end2);
#ifdef HAVE_TREE_SITTER
struct ts_linecol start_linecol
= treesit_linecol_maybe (start1, start1_byte,
BUF_TS_LINECOL_POINT (current_buffer));
struct ts_linecol old_end_linecol
= treesit_linecol_maybe (end2, end2_byte,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
/* Make sure the gap won't interfere, by moving it out of the text
we will operate on. */
if (start1 < gap && gap < end2)
@@ -4694,10 +4727,8 @@ ring. */)
}
#ifdef HAVE_TREE_SITTER
/* I don't think it's common to transpose two far-apart regions, so
amalgamating the edit into one should be fine. This is what the
signal_after_change below does, too. */
treesit_record_change (start1_byte, end2_byte, end2_byte,
start_linecol, old_end_linecol, end2);
#endif
signal_after_change (start1, end2 - start1, end2 - start1);

src/insdel.c

@@ -898,6 +898,12 @@ insert_1_both (const char *string,
if (NILP (BVAR (current_buffer, enable_multibyte_characters)))
nchars = nbytes;
#ifdef HAVE_TREE_SITTER
struct ts_linecol start_linecol
= treesit_linecol_maybe (PT, PT_BYTE,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
if (prepare)
/* Do this before moving and increasing the gap,
because the before-change hooks might move the gap
@@ -952,7 +958,9 @@ insert_1_both (const char *string,
#ifdef HAVE_TREE_SITTER
eassert (nbytes >= 0);
eassert (PT_BYTE >= 0);
treesit_record_change (PT_BYTE, PT_BYTE, PT_BYTE + nbytes,
start_linecol, start_linecol, PT + nchars);
#endif
adjust_point (nchars, nbytes);
@@ -1024,6 +1032,12 @@ insert_from_string_1 (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
= count_size_as_multibyte (SDATA (string) + pos_byte,
nbytes);
#ifdef HAVE_TREE_SITTER
struct ts_linecol start_linecol
= treesit_linecol_maybe (PT, PT_BYTE,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
/* Do this before moving and increasing the gap,
because the before-change hooks might move the gap
or make it smaller. */
@@ -1088,7 +1102,9 @@ insert_from_string_1 (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
#ifdef HAVE_TREE_SITTER
eassert (nbytes >= 0);
eassert (PT_BYTE >= 0);
treesit_record_change (PT_BYTE, PT_BYTE, PT_BYTE + nbytes,
start_linecol, start_linecol, PT + nchars);
#endif
adjust_point (nchars, outgoing_nbytes);
@@ -1101,7 +1117,8 @@ insert_from_string_1 (Lisp_Object string, ptrdiff_t pos, ptrdiff_t pos_byte,
GPT_ADDR (if not text_at_gap_tail).
Contrary to insert_from_gap, this does not invalidate any cache,
nor update any markers, nor record any buffer modification information
of any sort, with the single exception of notifying tree-sitter and
updating tree-sitter linecol cache. */
void
insert_from_gap_1 (ptrdiff_t nchars, ptrdiff_t nbytes, bool text_at_gap_tail)
{
@@ -1110,6 +1127,9 @@ insert_from_gap_1 (ptrdiff_t nchars, ptrdiff_t nbytes, bool text_at_gap_tail)
#ifdef HAVE_TREE_SITTER
ptrdiff_t ins_bytepos = GPT_BYTE;
struct ts_linecol start_linecol
= treesit_linecol_maybe (GPT, GPT_BYTE,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
GAP_SIZE -= nbytes;
@@ -1130,7 +1150,9 @@ insert_from_gap_1 (ptrdiff_t nchars, ptrdiff_t nbytes, bool text_at_gap_tail)
#ifdef HAVE_TREE_SITTER
eassert (nbytes >= 0);
eassert (ins_bytepos >= 0);
treesit_record_change (ins_bytepos, ins_bytepos, ins_bytepos + nbytes,
start_linecol, start_linecol, ins_bytepos + nbytes);
#endif
}
@@ -1193,6 +1215,9 @@ insert_from_buffer (struct buffer *buf,
#ifdef HAVE_TREE_SITTER
ptrdiff_t obyte = PT_BYTE;
struct ts_linecol start_linecol
= treesit_linecol_maybe (opoint, obyte,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
insert_from_buffer_1 (buf, charpos, nchars, inherit);
@@ -1203,7 +1228,9 @@ insert_from_buffer (struct buffer *buf,
eassert (PT_BYTE >= BEG_BYTE);
eassert (obyte >= BEG_BYTE);
eassert (PT_BYTE >= obyte);
treesit_record_change (obyte, obyte, PT_BYTE,
start_linecol, start_linecol, PT);
#endif
}
@@ -1494,6 +1521,16 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
if (nbytes_del <= 0 && inschars == 0)
return;
#ifdef HAVE_TREE_SITTER
struct ts_linecol start_linecol
= treesit_linecol_maybe (from, from_byte,
BUF_TS_LINECOL_POINT (current_buffer));
struct ts_linecol old_end_linecol
= treesit_linecol_maybe (to, to_byte,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
ptrdiff_t insbeg_bytes, insend_bytes;
ptrdiff_t insbytes;
unsigned char *insbeg_ptr;
@@ -1633,7 +1670,9 @@ replace_range (ptrdiff_t from, ptrdiff_t to, Lisp_Object new,
eassert (to_byte >= from_byte);
eassert (outgoing_insbytes >= 0);
eassert (from_byte >= 0);
treesit_record_change (from_byte, to_byte, from_byte + outgoing_insbytes,
start_linecol, old_end_linecol, from + inschars);
#endif
/* Relocate point as if it were a marker. */
@@ -1960,6 +1999,15 @@ del_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
nchars_del = to - from;
nbytes_del = to_byte - from_byte;
#ifdef HAVE_TREE_SITTER
struct ts_linecol start_linecol
= treesit_linecol_maybe (from, from_byte,
BUF_TS_LINECOL_POINT (current_buffer));
struct ts_linecol old_end_linecol
= treesit_linecol_maybe (to, to_byte,
BUF_TS_LINECOL_POINT (current_buffer));
#endif
/* Make sure the gap is somewhere in or next to what we are deleting. */
if (from > GPT)
gap_right (from, from_byte);
@@ -2019,7 +2067,8 @@ del_range_2 (ptrdiff_t from, ptrdiff_t from_byte,
#ifdef HAVE_TREE_SITTER
eassert (from_byte <= to_byte);
eassert (from_byte >= 0);
treesit_record_change (from_byte, to_byte, from_byte,
start_linecol, old_end_linecol, from);
#endif
return deletion;

src/lisp.h

@@ -4415,6 +4415,10 @@ extern void update_echo_area (void);
extern void truncate_echo_area (ptrdiff_t);
extern void redisplay (void);
extern ptrdiff_t count_lines (ptrdiff_t start_byte, ptrdiff_t end_byte);
extern ptrdiff_t display_count_lines (ptrdiff_t start_byte,
ptrdiff_t limit_byte,
ptrdiff_t count,
ptrdiff_t *byte_pos_ptr);
void set_frame_cursor_types (struct frame *, Lisp_Object);
extern void syms_of_xdisp (void);

src/treesit.c

@@ -307,18 +307,13 @@ init_treesit_functions (void)
in Emacs's use cases.
- Many tree-sitter functions take a TSPoint, which is basically a
line and column. Emacs uses a gap buffer and does not keep
information about the line and column positions in a buffer, so
it's hard for us to pass them to tree-sitter. Instead we just give
it dummy values. But certain languages, like Haskell, do need
the line and column positions to work right, so we added
optional line and column tracking. See the linecol section
below.
treesit.h has some commentary on the two main data structures for
the parser and node. treesit_sync_visible_region has some
@@ -350,8 +345,8 @@ init_treesit_functions (void)
Tree-sitter-related code in other files:
- src/alloc.c for gc for parser and node
- src/casefiddle.c, src/insdel.c, src/editfns.c for notifying
tree-sitter parser of buffer changes.
- lisp/emacs-lisp/cl-preloaded.el & data.c & lisp.h for parser and
node type.
- print.c for printing tree-sitter objects (node, parser, query).
@@ -406,7 +401,66 @@ init_treesit_functions (void)
from the user's POV, each buffer, regardless of indirect or not,
appears to have its own parser list. A discussion can be found in
bug#59693. Note that that discussion led to an earlier design, which
is different from the current one.
Line and column reporting to tree-sitter: technically we are supposed
to send tree-sitter the line and column position of each edit. But in
practice we just send it dummy values, because tree-sitter doesn't
use them for parsing and mostly just carries the line and column
positions around and returns them when e.g. reporting node
positions[1]. This worked fine until we encountered grammars that
actually utilize the line and column information for parsing
(Haskell)[2].
[1] https://github.com/tree-sitter/tree-sitter/issues/445
[2] https://github.com/tree-sitter/tree-sitter/issues/4001
So now we have to keep track of line and column positions and pass
valid values to tree-sitter. (It adds quite some complexity, but
only linearly; one can ignore all the linecol stuff when trying to
understand treesit code and then come back to it later.) Eli
convinced me to disable tracking by default, and only enable it for
languages that need it. So the buffer starts out not tracking
linecol. And when a parser is created, if the language is in
treesit-languages-require-line-column-tracking, we enable tracking in
the buffer, and enable tracking for the parser. To simplify things,
once a buffer starts tracking linecol, it never disables tracking,
even if parsers that need tracking are all deleted. Likewise, a
parser's tracking is decided at creation time: a parser that starts
out tracking (or not tracking) stays that way, regardless of later
changes to treesit-languages-require-line-column-tracking.
To make calculating line/column positions fast, we store linecol
caches for begv, point, and zv in the buffer (buf->ts_linecol_xxx);
and in the parser object, we store linecol caches for the visible
beg/end of that parser.
In buffer editing functions, we need the linecol for
start/old_end/new_end. These can be calculated by scanning newlines
(treesit_linecol_of_pos) from the buffer's point cache, which should
always be near point. And we usually set the calculated linecol of
new_end back to the buffer's point cache.
We also need the linecol for the visible_beg/end of each parser, and
for the buffer's begv/zv. These positions are usually far from point,
so scanning newlines from point to them would be inefficient; instead
we keep caches for all of them (in either the parser object or the
buffer). And because they're far away and outside the changed region,
we can calculate their change in line and column number by simply
counting how many newlines are added/removed in the changed region
(compute_new_linecol_by_change). */
/*** Constants */
/* A linecol_cache that points to BOB, this is always valid. */
const struct ts_linecol TREESIT_BOB_LINECOL = { 1, 1, 0 };
/* An uninitialized linecol. */
const struct ts_linecol TREESIT_EMPTY_LINECOL = { 0, 0, 0 };
const TSPoint TREESIT_TS_POINT_1_0 = { 1, 0 };
/*** Initialization */
@@ -864,6 +918,241 @@ loaded or the file name couldn't be determined, return nil. */)
}
/*** Linecol functions */
#define TREESIT_DEBUG_LINECOL false
void treesit_debug_print_linecol (struct ts_linecol);
void
treesit_debug_print_linecol (struct ts_linecol linecol)
{
printf ("{ line=%td col=%td bytepos=%td }\n", linecol.line, linecol.col, linecol.bytepos);
}
/* Returns true if BUF tracks linecol. */
bool treesit_buf_tracks_linecol_p (struct buffer *buf)
{
return BUF_TS_LINECOL_BEGV (buf).bytepos != 0;
}
static void
restore_restriction_and_selective_display (Lisp_Object record)
{
save_restriction_restore (Fcar (record));
BVAR (current_buffer, selective_display) = Fcdr (record);
return;
}
/* Similar to display_count_lines, but behaves differently when
searching backwards: when it finds a newline, it stops at the newline
and returns the count as normal (display_count_lines stops after the
newline and subtracts one from the count). When searching forward, it
stops at the position after the newline. Another difference is that
this function disregards narrowing, so it works on bytepos outside of
the visible range. */
static ptrdiff_t
treesit_count_lines (ptrdiff_t start_byte,
ptrdiff_t limit_byte, ptrdiff_t count,
ptrdiff_t *byte_pos_ptr)
{
/* I don't think display_count_lines signals, so the unwind-protect
technically isn't necessary. Also, treesit_count_lines isn't
supposed to signal either, since it's used in functions that aren't
supposed to signal (treesit_record_change and friends). */
Lisp_Object record = Fcons (save_restriction_save (),
BVAR (current_buffer, selective_display));
specpdl_ref pdl_count = SPECPDL_INDEX ();
record_unwind_protect (restore_restriction_and_selective_display, record);
BVAR (current_buffer, selective_display) = Qnil;
labeled_restrictions_remove_in_current_buffer ();
Fwiden ();
ptrdiff_t counted = display_count_lines (start_byte, limit_byte,
count, byte_pos_ptr);
unbind_to (pdl_count, Qnil);
/* If searching backwards and we found COUNT newlines, countermand the
different logic in display_count_lines. */
if (count < 0 && limit_byte != *byte_pos_ptr)
{
counted += 1;
*byte_pos_ptr -= 1;
}
return counted;
}
static void
treesit_debug_validate_linecol (struct ts_linecol linecol)
{
eassert (linecol.bytepos <= Z_BYTE);
/* We can't use count_lines as ground truth because it respects
narrowing, and calling it with a bytepos outside of the visible
portion results in infloop. */
ptrdiff_t _unused;
ptrdiff_t true_line_count = treesit_count_lines (BEG_BYTE, linecol.bytepos,
Z_BYTE, &_unused) + 1;
eassert (true_line_count == linecol.line);
}
/* Calculate and return the line and column number of TARGET_BYTEPOS by
scanning newlines from CACHE. CACHE must be valid. */
static struct ts_linecol
treesit_linecol_of_pos (ptrdiff_t target_bytepos,
struct ts_linecol cache)
{
if (TREESIT_DEBUG_LINECOL)
{
treesit_debug_validate_linecol (cache);
}
/* When we finished searching for newlines between CACHE and
TARGET_POS, BYTE_POS_2 is at TARGET_POS, and BYTE_POS_1 is at the
previous newline. If TARGET_POS happens to be on a newline,
BYTE_POS_1 will be on that position. BYTE_POS_1 is used for
calculating the column. (If CACHE and TARGET_POS are in the same
line, BYTE_POS_1 is unset and we don't use it.) */
ptrdiff_t byte_pos_1 = 0;
ptrdiff_t byte_pos_2 = 0;
/* Number of lines between CACHE and TARGET_POS. */
ptrdiff_t line_delta = 0;
if (target_bytepos == cache.bytepos)
return cache;
/* Search forward. */
if (cache.bytepos < target_bytepos)
{
byte_pos_2 = cache.bytepos;
while (byte_pos_2 < target_bytepos)
{
ptrdiff_t counted = treesit_count_lines (byte_pos_2, target_bytepos,
1, &byte_pos_2);
if (counted > 0)
{
byte_pos_1 = byte_pos_2;
}
line_delta += counted;
}
eassert (byte_pos_2 == target_bytepos);
/* At this point, byte_pos_2 is at target_pos, and byte_pos_1 is
at the previous newline if we went across any. */
struct ts_linecol target_linecol;
target_linecol.bytepos = target_bytepos;
target_linecol.line = cache.line + line_delta;
/* If we moved across any newline, use the previous newline to
calculate the column; if we stayed at the same line, use the
cached column to calculate the new column. */
target_linecol.col = line_delta > 0
? target_bytepos - byte_pos_1
: target_bytepos - cache.bytepos + cache.col;
if (TREESIT_DEBUG_LINECOL)
{
treesit_debug_validate_linecol (target_linecol);
}
return target_linecol;
}
/* Search backward. */
byte_pos_2 = cache.bytepos;
while (byte_pos_2 > target_bytepos)
{
ptrdiff_t counted = treesit_count_lines (byte_pos_2, target_bytepos,
-1, &byte_pos_2);
line_delta -= counted;
}
eassert (byte_pos_2 == target_bytepos);
/* At this point, byte_pos_2 is at target_bytepos. */
struct ts_linecol target_linecol;
target_linecol.bytepos = target_bytepos;
target_linecol.line = cache.line + line_delta;
eassert (cache.line + line_delta > 0);
/* Calculate the column. */
if (line_delta == 0)
{
target_linecol.col = cache.col - (cache.bytepos - target_bytepos);
}
else
{
/* We need to find the previous newline in order to calculate the
column. */
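ptrdiff_t counted = treesit_count_lines (byte_pos_2, BEG_BYTE, -1, &byte_pos_2);
/* COUNTED == 1 means we stopped on the previous newline, so the line
starts one byte after it; otherwise we hit BEG_BYTE, which is itself
the line start. The parentheses matter: without them, + binds
tighter than == and the conditional tests BYTE_POS_2 + COUNTED. */
target_linecol.col
= target_bytepos - (byte_pos_2 + (counted == 1 ? 1 : 0));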
}
if (TREESIT_DEBUG_LINECOL)
{
treesit_debug_validate_linecol (target_linecol);
}
return target_linecol;
}
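The forward/backward scan in treesit_linecol_of_pos can be illustrated with a small standalone program. This is a minimal sketch under stated assumptions, not the Emacs implementation: it works on an in-memory byte string instead of a buffer, uses 1-based byte positions as Emacs does, counts columns in bytes, and replaces treesit_count_lines with a plain byte loop. The names struct linecol, byte_at, and linecol_of_pos are hypothetical stand-ins for struct ts_linecol and treesit_linecol_of_pos.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ts_linecol: 1-based byte
   position, 1-based line number, 0-based column counted in bytes.  */
struct linecol
{
  ptrdiff_t bytepos;
  ptrdiff_t line;
  ptrdiff_t col;
};

/* Return the byte at 1-based position POS of TEXT.  */
static char
byte_at (const char *text, ptrdiff_t pos)
{
  return text[pos - 1];
}

/* Return the linecol of TARGET in TEXT, computed from the valid
   linecol CACHE by scanning only the bytes between them.  */
static struct linecol
linecol_of_pos (const char *text, ptrdiff_t target, struct linecol cache)
{
  assert (target >= 1 && cache.bytepos >= 1);
  if (target == cache.bytepos)
    return cache;

  struct linecol result = { target, cache.line, cache.col };
  if (cache.bytepos < target)
    {
      /* Forward: each newline in [cache.bytepos, target) starts a new
         line, and the column is measured from the byte after the last
         such newline.  0 means no newline was crossed.  */
      ptrdiff_t line_start = 0;
      for (ptrdiff_t p = cache.bytepos; p < target; p++)
        if (byte_at (text, p) == '\n')
          {
            result.line++;
            line_start = p + 1;
          }
      result.col = (line_start == 0
                    ? cache.col + (target - cache.bytepos)
                    : target - line_start);
      return result;
    }

  /* Backward: each newline in [target, cache.bytepos) ends a line
     that comes before CACHE's line.  */
  for (ptrdiff_t p = target; p < cache.bytepos; p++)
    if (byte_at (text, p) == '\n')
      result.line--;

  if (result.line == cache.line)
    /* Same line: subtract the distance from the cached column.  */
    result.col = cache.col - (cache.bytepos - target);
  else
    {
      /* Different line: find the newline before TARGET; the line
         starts right after it, or at position 1 if there is none.  */
      ptrdiff_t p = target - 1;
      while (p >= 1 && byte_at (text, p) != '\n')
        p--;
      result.col = target - (p + 1);
    }
  return result;
}
```

The ERT tests further down map onto it directly; for example, with the text "\n " and a cache of line 2, column 1 at byte 3, asking for byte 1 crosses the newline backward and yields line 1, column 0.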
/* Return a TSPoint given POS and VISIBLE_BEG. VISIBLE_BEG must be
before POS. */
static TSPoint
treesit_make_ts_point (struct ts_linecol visible_beg,
struct ts_linecol pos)
{
TSPoint point;
if (visible_beg.line == pos.line)
{
point.row = 0;
point.column = pos.col - visible_beg.col;
eassert (point.column >= 0);
}
else
{
point.row = pos.line - visible_beg.line;
eassert (point.row > 0);
point.column = pos.col;
}
return point;
}
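treesit_make_ts_point converts an absolute linecol into the coordinates tree-sitter sees, which start at the parser's visible beginning: row 0 is the line holding VISIBLE_BEG, and only on that row are columns shifted. A minimal sketch of the same arithmetic follows; struct point is a hypothetical mirror of tree-sitter's TSPoint (its row and column fields) so the example compiles without linking libtree-sitter, and struct linecol and make_point are likewise illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ts_linecol.  */
struct linecol
{
  ptrdiff_t bytepos;
  ptrdiff_t line;
  ptrdiff_t col;
};

/* Hypothetical mirror of tree-sitter's TSPoint (row, column).  */
struct point
{
  uint32_t row;
  uint32_t column;
};

/* Convert POS to a point relative to VISIBLE_BEG, which must not be
   after POS.  */
static struct point
make_point (struct linecol visible_beg, struct linecol pos)
{
  struct point pt;
  assert (visible_beg.bytepos <= pos.bytepos);
  if (visible_beg.line == pos.line)
    {
      /* Same line as VISIBLE_BEG: column 0 is at VISIBLE_BEG.  */
      assert (pos.col >= visible_beg.col);
      pt.row = 0;
      pt.column = (uint32_t) (pos.col - visible_beg.col);
    }
  else
    {
      /* A later line: the row is relative, the column is not.  */
      assert (pos.line > visible_beg.line);
      pt.row = (uint32_t) (pos.line - visible_beg.line);
      pt.column = (uint32_t) pos.col;
    }
  return pt;
}
```

For instance, if the narrowed region starts at column 5 of line 1, a position at column 8 of line 1 maps to row 0, column 3, and a position at column 2 of line 3 maps to row 2, column 2.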
DEFUN ("treesit-tracking-line-column-p",
Ftreesit_tracking_line_column_p,
Streesit_tracking_line_column_p, 0, 1, 0,
doc : /* Return non-nil if BUFFER is tracking line and column.
Return nil otherwise. BUFFER defaults to the current buffer. */)
(Lisp_Object buffer)
{
struct buffer *buf = current_buffer;
if (!NILP (buffer))
{
CHECK_BUFFER (buffer);
buf = XBUFFER (buffer);
}
return treesit_buf_tracks_linecol_p (buf) ? Qt : Qnil;
}
DEFUN ("treesit-parser-tracking-line-column-p",
Ftreesit_parser_tracking_line_column_p,
Streesit_parser_tracking_line_column_p, 1, 1, 0,
doc : /* Return non-nil if PARSER is tracking line and column.
Return nil otherwise.*/)
(Lisp_Object parser)
{
CHECK_TS_PARSER (parser);
return XTS_PARSER (parser)->visi_beg_linecol.bytepos == 0 ? Qnil : Qt;
}
/*** Parsing functions */
static void
@ -879,34 +1168,147 @@ treesit_check_parser (Lisp_Object obj)
larger than UINT32_MAX. */
static inline void
treesit_tree_edit_1 (TSTree *tree, ptrdiff_t start_byte,
ptrdiff_t old_end_byte, ptrdiff_t new_end_byte)
ptrdiff_t old_end_byte, ptrdiff_t new_end_byte,
TSPoint start_point, TSPoint old_end_point,
TSPoint new_end_point)
{
eassert (start_byte >= 0);
eassert (start_byte <= old_end_byte);
eassert (start_byte <= new_end_byte);
TSPoint dummy_point = {0, 0};
eassert (start_byte <= UINT32_MAX);
eassert (old_end_byte <= UINT32_MAX);
eassert (new_end_byte <= UINT32_MAX);
TSInputEdit edit = {(uint32_t) start_byte,
(uint32_t) old_end_byte,
(uint32_t) new_end_byte,
dummy_point, dummy_point, dummy_point};
start_point, old_end_point, new_end_point};
ts_tree_edit (tree, &edit);
}
/* Update each parser's tree after the user made an edit. This
function does not parse the buffer and only updates the tree, so it
should be very fast. */
void
treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
ptrdiff_t new_end_byte)
/* Given a position at POS_LINECOL, and the linecol of a buffer change
(START_LINECOL, OLD_END_LINECOL, and NEW_END_LINECOL), compute the new
linecol for that position, then scan from this now valid linecol to
TARGET_BYTEPOS and return the linecol at TARGET_BYTEPOS.
When POS_LINECOL is outside of the range between START_LINECOL and
OLD_END_LINECOL, we can calculate the change in line and column
number of POS_LINECOL by simply counting how many newlines are
removed/added in the change. Once we have the up-to-date line and
column number at POS_LINECOL.bytepos, we can just scan to
TARGET_BYTEPOS to get a linecol for it. The assumption is that
TARGET_BYTEPOS is far from START_LINECOL, etc., but close to
POS_LINECOL. This way we avoid scanning long distances from
START_LINECOL, etc.
However, this optimization only works when POS_LINECOL is outside the
range between START_LINECOL and OLD_END_LINECOL. If not, we have
to scan from START_LINECOL or NEW_END_LINECOL to TARGET_BYTEPOS. */
static struct ts_linecol
compute_new_linecol_by_change (struct ts_linecol pos_linecol,
struct ts_linecol start_linecol,
struct ts_linecol old_end_linecol,
struct ts_linecol new_end_linecol,
ptrdiff_t target_bytepos)
{
struct ts_linecol new_linecol = { 0, 0, 0 };
/* 1. If the change starts at or after pos, pos isn't affected. */
if (start_linecol.bytepos >= pos_linecol.bytepos)
{
new_linecol = pos_linecol;
}
/* 2. When old_end (oe) is at or before pos, the difference between pos and
pos' is the difference between old_end and new_end (ne).
| | | | | |
s oe pos s oe pos
OR
| | | | |
s ne pos' s ne pos'
*/
else if (old_end_linecol.bytepos <= pos_linecol.bytepos)
{
ptrdiff_t line_delta = new_end_linecol.line - old_end_linecol.line;
new_linecol.line = pos_linecol.line + line_delta;
new_linecol.bytepos
= pos_linecol.bytepos + new_end_linecol.bytepos - old_end_linecol.bytepos;
/* Suppose # is text, | is cursor:
################
########|########|
oe pos
Now, if we insert something:
################
########|OOOOO
OOOOOOOOOO|########|
ne pos'
Clearly, col for pos' is just the col of new_end plus the
distance between old_end and pos. The same goes for deletion.
*/
if (old_end_linecol.line == pos_linecol.line)
{
eassert (old_end_linecol.col <= pos_linecol.col);
ptrdiff_t old_end_to_pos = pos_linecol.col - old_end_linecol.col;
new_linecol.col = new_end_linecol.col + old_end_to_pos;
}
else
{
new_linecol.col = pos_linecol.col;
}
}
/* 3. At this point, start < pos < old_end. We're kinda cooked; there
isn't much we can do other than scan the buffer from start or
new_end, preferring the one closer to target_bytepos. */
else if (target_bytepos - start_linecol.bytepos
< eabs (target_bytepos - new_end_linecol.bytepos))
{
new_linecol = treesit_linecol_of_pos (target_bytepos, start_linecol);
}
else
{
new_linecol = treesit_linecol_of_pos (target_bytepos, new_end_linecol);
}
/* Now new_linecol is a valid linecol, scan from it to target_bytepos. */
if (new_linecol.bytepos != target_bytepos)
{
new_linecol = treesit_linecol_of_pos (target_bytepos, new_linecol);
}
if (TREESIT_DEBUG_LINECOL)
treesit_debug_validate_linecol (new_linecol);
return new_linecol;
}
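Cases 1 and 2 of compute_new_linecol_by_change are pure arithmetic on the change's linecols, which is what makes them cheap. The sketch below isolates that arithmetic; shift_linecol and struct linecol are hypothetical names, and instead of rescanning in case 3 it simply reports failure, leaving the caller to scan from START_LINECOL or NEW_END_LINECOL as the function above does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ts_linecol.  */
struct linecol
{
  ptrdiff_t bytepos;
  ptrdiff_t line;
  ptrdiff_t col;
};

/* Recompute POS after the text between START and OLD_END was replaced
   by text ending at NEW_END.  Store the result in *OUT and return 1 for
   cases 1 and 2; return 0 for case 3 (POS inside the replaced text),
   where only a rescan can help.  */
static int
shift_linecol (struct linecol pos, struct linecol start,
               struct linecol old_end, struct linecol new_end,
               struct linecol *out)
{
  assert (start.bytepos <= old_end.bytepos);
  assert (start.bytepos <= new_end.bytepos);

  /* Case 1: the change begins at or after POS, so nothing before POS
     changed.  */
  if (start.bytepos >= pos.bytepos)
    {
      *out = pos;
      return 1;
    }

  /* Case 2: the change ends at or before POS, so POS moves by exactly
     as many bytes and lines as OLD_END did.  */
  if (old_end.bytepos <= pos.bytepos)
    {
      out->bytepos = pos.bytepos + (new_end.bytepos - old_end.bytepos);
      out->line = pos.line + (new_end.line - old_end.line);
      if (old_end.line == pos.line)
        {
          /* POS shared a line with OLD_END: keep the same distance
             from NEW_END.  */
          assert (pos.col >= old_end.col);
          out->col = new_end.col + (pos.col - old_end.col);
        }
      else
        out->col = pos.col;
      return 1;
    }

  /* Case 3: start < pos < old_end.  */
  return 0;
}
```

As a worked example, inserting "X\nY" at byte 2 of "abc\ndef" gives start = old_end = (line 1, col 1, byte 2) and new_end = (line 2, col 1, byte 5). The "c" that was at line 1, column 2, byte 3 moves to byte 6 and, because it shared a line with old_end, its column becomes 1 + (2 - 1) = 2 on line 2; the "e" on line 2 only moves down to line 3 and keeps column 1.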
/* Update each parser's tree after the user made an edit. This function
does not parse the buffer and only updates the tree, so it should be
very fast. If the caller knows there's no parser in the current
buffer, they can pass empty linecol for
START/OLD_END/NEW_END_linecol.
If the current buffer doesn't track linecol, start_linecol,
old_end_linecol, and new_end_linecol will be empty. In that case,
don't process linecols. */
static void
treesit_record_change_1 (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
ptrdiff_t new_end_byte,
struct ts_linecol start_linecol,
struct ts_linecol old_end_linecol,
struct ts_linecol new_end_linecol)
{
struct buffer *base_buffer = current_buffer;
if (current_buffer->base_buffer)
base_buffer = current_buffer->base_buffer;
Lisp_Object parser_list = BVAR (base_buffer, ts_parser_list);
bool buf_tracks_linecol = start_linecol.bytepos != 0;
FOR_EACH_TAIL_SAFE (parser_list)
{
CHECK_CONS (parser_list);
@ -916,16 +1318,22 @@ treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
/* See comment (ref:visible-beg-null) if you wonder why we don't
update visible_beg/end when tree is NULL. */
bool parser_tracks_linecol
= XTS_PARSER (lisp_parser)->visi_beg_linecol.bytepos != 0;
if (tree != NULL)
{
eassert (start_byte <= old_end_byte);
eassert (start_byte <= new_end_byte);
/* Think the recorded change as a delete followed by an
insert, and think of them as moving unchanged text back
and forth. After all, the whole point of updating the
tree is to update the position of unchanged text. */
ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
/* Before sending the edit to tree-sitter, we need to first
clip the beg/end to visible_beg and visible_end of the
parser. A tip for understanding the code below: think of the
recorded change as a delete followed by an insert, and
think of them as moving unchanged text back and forth.
After all, the whole point of updating the tree is to
update the position of unchanged text. */
const ptrdiff_t visible_beg = XTS_PARSER (lisp_parser)->visible_beg;
const ptrdiff_t visible_end = XTS_PARSER (lisp_parser)->visible_end;
eassert (visible_beg >= 0);
eassert (visible_beg <= visible_end);
@ -949,10 +1357,6 @@ treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
eassert (start_offset <= old_end_offset);
eassert (start_offset <= new_end_offset);
treesit_tree_edit_1 (tree, start_offset, old_end_offset,
new_end_offset);
XTS_PARSER (lisp_parser)->need_reparse = true;
/* VISIBLE_BEG/END records tree-sitter's range of view in
the buffer. We need to adjust them when tree-sitter's
view changes. */
@ -966,19 +1370,133 @@ treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
visi_beg_delta = (old_end_byte < visible_beg
? new_end_byte - old_end_byte : 0);
XTS_PARSER (lisp_parser)->visible_beg = visible_beg + visi_beg_delta;
XTS_PARSER (lisp_parser)->visible_end = (visible_end
+ visi_beg_delta
+ (new_end_offset
- old_end_offset));
const ptrdiff_t new_visible_beg = visible_beg + visi_beg_delta;
const ptrdiff_t new_visible_end
= (visible_end + visi_beg_delta
+ (new_end_offset - old_end_offset));
eassert (XTS_PARSER (lisp_parser)->visible_beg >= 0);
eassert (XTS_PARSER (lisp_parser)->visible_beg
<= XTS_PARSER (lisp_parser)->visible_end);
XTS_PARSER (lisp_parser)->visible_beg = new_visible_beg;
XTS_PARSER (lisp_parser)->visible_end = new_visible_end;
eassert (BEG_BYTE <= new_visible_beg);
eassert (new_visible_beg <= new_visible_end);
eassert (new_visible_end <= Z_BYTE);
/* (Optionally) calculate the point for start/old_end/new_end
to be sent to tree-sitter. Also update parser cache for
linecol. */
TSPoint start_point = TREESIT_TS_POINT_1_0;
TSPoint old_end_point = TREESIT_TS_POINT_1_0;
TSPoint new_end_point = TREESIT_TS_POINT_1_0;
if (parser_tracks_linecol)
{
eassert (buf_tracks_linecol);
struct ts_linecol old_visi_beg_linecol
= XTS_PARSER (lisp_parser)->visi_beg_linecol;
struct ts_linecol old_visi_end_linecol
= XTS_PARSER (lisp_parser)->visi_end_linecol;
const struct ts_linecol new_visi_beg_linecol
= compute_new_linecol_by_change (old_visi_beg_linecol,
start_linecol,
old_end_linecol,
new_end_linecol,
new_visible_beg);
const struct ts_linecol new_visi_end_linecol
= compute_new_linecol_by_change (old_visi_end_linecol,
start_linecol,
old_end_linecol,
new_end_linecol,
new_visible_end);
XTS_PARSER (lisp_parser)->visi_beg_linecol
= new_visi_beg_linecol;
XTS_PARSER (lisp_parser)->visi_end_linecol
= new_visi_end_linecol;
/* Now, calculate TSPoints and finally update the tree. */
struct ts_linecol new_begv_linecol
= XTS_PARSER (lisp_parser)->visi_beg_linecol;
old_end_point = treesit_make_ts_point (old_visi_beg_linecol,
old_end_linecol);
start_point = treesit_make_ts_point (new_begv_linecol,
start_linecol);
new_end_point = treesit_make_ts_point (new_begv_linecol,
new_end_linecol);
}
treesit_tree_edit_1 (tree, start_offset, old_end_offset,
new_end_offset, start_point, old_end_point,
new_end_point);
XTS_PARSER (lisp_parser)->need_reparse = true;
}
}
}
/* Return the linecol of POS, calculated from CACHE. But if there's no
parser in the current buffer, or line-column tracking is disabled,
skip calculation and return an empty linecol instead. */
struct ts_linecol
treesit_linecol_maybe (ptrdiff_t pos, ptrdiff_t pos_byte,
struct ts_linecol cache)
{
if (NILP (BVAR (current_buffer, ts_parser_list))
|| !treesit_buf_tracks_linecol_p (current_buffer))
return TREESIT_EMPTY_LINECOL;
return treesit_linecol_of_pos (pos_byte, cache);
}
/* Update each parser's tree after the user made an edit. This function
does not parse the buffer and only updates the tree, so it should be
very fast.
This is a wrapper over treesit_record_change_1 that does a bit more
boilerplate work: it (optionally) calculates linecol for new_end,
passes all the positions into treesit_record_change_1, which does the
real work, and finally (optionally) sets buffer's linecol cache to
new_end's linecol.
If NEW_END is next to NEW_END_BYTE in the arglist, caller might
accidentally swap them, so I placed NEW_END at the end of the
arglist.
If the current buffer doesn't track linecol, start_linecol and
old_end_linecol will be empty. In that case, don't process
linecols. */
void
treesit_record_change (ptrdiff_t start_byte, ptrdiff_t old_end_byte,
ptrdiff_t new_end_byte,
struct ts_linecol start_linecol,
struct ts_linecol old_end_linecol,
ptrdiff_t new_end)
{
struct ts_linecol new_end_linecol
= treesit_linecol_maybe (new_end, new_end_byte, start_linecol);
treesit_record_change_1 (start_byte, old_end_byte, new_end_byte,
start_linecol, old_end_linecol, new_end_linecol);
if (new_end_linecol.bytepos != 0)
{
const struct ts_linecol new_begv_linecol
= compute_new_linecol_by_change (BUF_TS_LINECOL_BEGV (current_buffer),
start_linecol,
old_end_linecol,
new_end_linecol,
BEGV_BYTE);
const struct ts_linecol new_zv_linecol
= compute_new_linecol_by_change (BUF_TS_LINECOL_ZV (current_buffer),
start_linecol,
old_end_linecol,
new_end_linecol,
ZV_BYTE);
SET_BUF_TS_LINECOL_BEGV (current_buffer, new_begv_linecol);
SET_BUF_TS_LINECOL_POINT (current_buffer, new_end_linecol);
SET_BUF_TS_LINECOL_ZV (current_buffer, new_zv_linecol);
}
}
static TSRange *treesit_make_ts_ranges (Lisp_Object, Lisp_Object,
uint32_t *);
@ -1034,6 +1552,7 @@ treesit_sync_visible_region (Lisp_Object parser)
{
TSTree *tree = XTS_PARSER (parser)->tree;
struct buffer *buffer = XBUFFER (XTS_PARSER (parser)->buffer);
const bool track_linecol = treesit_buf_tracks_linecol_p (buffer);
/* If we are setting visible_beg/end for the first time, we can skip
the offset acrobatics and updating the tree below. */
@ -1046,6 +1565,7 @@ treesit_sync_visible_region (Lisp_Object parser)
ptrdiff_t visible_beg = XTS_PARSER (parser)->visible_beg;
ptrdiff_t visible_end = XTS_PARSER (parser)->visible_end;
eassert (0 <= visible_beg);
eassert (visible_beg <= visible_end);
@ -1066,39 +1586,81 @@ treesit_sync_visible_region (Lisp_Object parser)
from ________|xxxx|__
to |xxxx|__________ */
struct ts_linecol visi_beg_linecol = track_linecol
? XTS_PARSER (parser)->visi_beg_linecol : TREESIT_EMPTY_LINECOL;
struct ts_linecol visi_end_linecol = track_linecol
? XTS_PARSER (parser)->visi_end_linecol : TREESIT_EMPTY_LINECOL;
struct ts_linecol buffer_begv_linecol = track_linecol
? treesit_linecol_of_pos (BUF_BEGV_BYTE (buffer), BUF_TS_LINECOL_BEGV (buffer))
: TREESIT_EMPTY_LINECOL;
struct ts_linecol buffer_zv_linecol = track_linecol
? treesit_linecol_of_pos (BUF_ZV_BYTE (buffer), BUF_TS_LINECOL_ZV (buffer))
: TREESIT_EMPTY_LINECOL;
if (track_linecol) eassert (visi_beg_linecol.bytepos == visible_beg);
/* 1. Make sure visible_beg <= BUF_BEGV_BYTE. */
if (visible_beg > BUF_BEGV_BYTE (buffer))
{
TSPoint point_new_end = track_linecol
? treesit_make_ts_point (buffer_begv_linecol, visi_beg_linecol)
: TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: insert at the beginning. */
treesit_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer));
treesit_tree_edit_1 (tree, 0, 0, visible_beg - BUF_BEGV_BYTE (buffer),
TREESIT_TS_POINT_1_0, TREESIT_TS_POINT_1_0,
point_new_end);
visible_beg = BUF_BEGV_BYTE (buffer);
visi_beg_linecol = buffer_begv_linecol;
eassert (visible_beg <= visible_end);
}
/* 2. Make sure visible_end = BUF_ZV_BYTE. */
if (visible_end < BUF_ZV_BYTE (buffer))
{
TSPoint point_start = track_linecol
? treesit_make_ts_point (visi_beg_linecol, visi_end_linecol)
: TREESIT_TS_POINT_1_0;
TSPoint point_new_end = track_linecol
? treesit_make_ts_point (visi_beg_linecol, buffer_zv_linecol)
: TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: insert at the end. */
treesit_tree_edit_1 (tree, visible_end - visible_beg,
visible_end - visible_beg,
BUF_ZV_BYTE (buffer) - visible_beg);
BUF_ZV_BYTE (buffer) - visible_beg,
point_start, point_start, point_new_end);
visible_end = BUF_ZV_BYTE (buffer);
visi_end_linecol = buffer_zv_linecol;
eassert (visible_beg <= visible_end);
}
else if (visible_end > BUF_ZV_BYTE (buffer))
{
TSPoint point_start = track_linecol
? treesit_make_ts_point (visi_beg_linecol, buffer_zv_linecol)
: TREESIT_TS_POINT_1_0;
TSPoint point_old_end = track_linecol
? treesit_make_ts_point (visi_beg_linecol, visi_end_linecol)
: TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: delete at the end. */
treesit_tree_edit_1 (tree, BUF_ZV_BYTE (buffer) - visible_beg,
visible_end - visible_beg,
BUF_ZV_BYTE (buffer) - visible_beg);
BUF_ZV_BYTE (buffer) - visible_beg,
point_start, point_old_end, point_start);
visible_end = BUF_ZV_BYTE (buffer);
visi_end_linecol = buffer_zv_linecol;
eassert (visible_beg <= visible_end);
}
/* 3. Make sure visible_beg = BUF_BEGV_BYTE. */
if (visible_beg < BUF_BEGV_BYTE (buffer))
{
TSPoint point_old_end = track_linecol
? treesit_make_ts_point (visi_beg_linecol, buffer_begv_linecol)
: TREESIT_TS_POINT_1_0;
/* Tree-sitter sees: delete at the beginning. */
treesit_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0);
treesit_tree_edit_1 (tree, 0, BUF_BEGV_BYTE (buffer) - visible_beg, 0,
TREESIT_TS_POINT_1_0, point_old_end,
TREESIT_TS_POINT_1_0);
visible_beg = BUF_BEGV_BYTE (buffer);
visi_beg_linecol = buffer_begv_linecol;
eassert (visible_beg <= visible_end);
}
eassert (0 <= visible_beg);
@ -1108,6 +1670,14 @@ treesit_sync_visible_region (Lisp_Object parser)
XTS_PARSER (parser)->visible_beg = visible_beg;
XTS_PARSER (parser)->visible_end = visible_end;
XTS_PARSER (parser)->visi_beg_linecol = visi_beg_linecol;
XTS_PARSER (parser)->visi_end_linecol = visi_end_linecol;
if (track_linecol)
{
eassert (visi_beg_linecol.bytepos == visible_beg);
eassert (visi_end_linecol.bytepos == visible_end);
}
/* Fix ranges so that the ranges stay within visible_end. Here we
try to do minimal work so that the ranges are minimally correct and
@ -1356,7 +1926,7 @@ treesit_read_buffer (void *parser, uint32_t byte_index,
Lisp_Object
make_treesit_parser (Lisp_Object buffer, TSParser *parser,
TSTree *tree, Lisp_Object language_symbol,
Lisp_Object tag)
Lisp_Object tag, bool tracks_linecol)
{
struct Lisp_TS_Parser *lisp_parser;
@ -1381,6 +1951,27 @@ make_treesit_parser (Lisp_Object buffer, TSParser *parser,
lisp_parser->need_to_gc_buffer = false;
lisp_parser->within_reparse = false;
eassert (lisp_parser->visible_beg <= lisp_parser->visible_end);
if (tracks_linecol)
{
struct buffer *old_buf = current_buffer;
set_buffer_internal (XBUFFER (buffer));
/* treesit_linecol_of_pos doesn't signal, so no need to
unwind-protect. */
lisp_parser->visi_beg_linecol
= treesit_linecol_of_pos (BEGV_BYTE, TREESIT_BOB_LINECOL);
lisp_parser->visi_end_linecol
= treesit_linecol_of_pos (ZV_BYTE, lisp_parser->visi_beg_linecol);
set_buffer_internal (old_buf);
}
else
{
lisp_parser->visi_beg_linecol = TREESIT_EMPTY_LINECOL;
lisp_parser->visi_end_linecol = TREESIT_EMPTY_LINECOL;
}
return make_lisp_ptr (lisp_parser, Lisp_Vectorlike);
}
@ -1698,13 +2289,28 @@ an indirect buffer. */)
always succeed. */
ts_parser_set_language (parser, lang);
const bool lang_need_linecol_tracking
= !NILP (Fmemq (remapped_lang,
Vtreesit_languages_require_line_column_tracking));
/* Create parser. Use the unmapped LANGUAGE symbol, so the nodes
created by this parser (and the parser itself) identify themselves
as the unmapped language. This makes the grammar mapping
completely transparent. */
Lisp_Object lisp_parser = make_treesit_parser (buf_orig,
parser, NULL,
language, tag);
language, tag,
lang_need_linecol_tracking);
/* Enable line-column tracking if this language requires it. */
if (lang_need_linecol_tracking && !treesit_buf_tracks_linecol_p (buf))
{
/* We can use TREESIT_BOB_LINECOL for begv and zv since these
caches don't need to always be in sync with BEGV and ZV. */
SET_BUF_TS_LINECOL_BEGV (buf, TREESIT_BOB_LINECOL);
SET_BUF_TS_LINECOL_POINT (buf, TREESIT_BOB_LINECOL);
SET_BUF_TS_LINECOL_ZV (buf, TREESIT_BOB_LINECOL);
}
/* Update parser-list. */
BVAR (buf, ts_parser_list) = Fcons (lisp_parser, BVAR (buf, ts_parser_list));
@ -4376,6 +4982,65 @@ nodes in the subtree, including NODE. */)
}
}
DEFUN ("treesit--linecol-at", Ftreesit__linecol_at,
Streesit__linecol_at, 1, 1, 0,
doc: /* Test buffer-local linecol cache.
Calculate the line and column at POS using the buffer-local cache,
return the line and column in the form of
(LINE . COL)
This is used for internal testing and debugging ONLY. */)
(Lisp_Object pos)
{
CHECK_FIXNUM (pos);
struct ts_linecol pos_linecol
= treesit_linecol_of_pos (CHAR_TO_BYTE (XFIXNUM (pos)),
BUF_TS_LINECOL_POINT (current_buffer));
return Fcons (make_fixnum (pos_linecol.line), make_fixnum (pos_linecol.col));
}
DEFUN ("treesit--linecol-cache-set", Ftreesit__linecol_cache_set,
Streesit__linecol_cache_set, 3, 3, 0,
doc: /* Set the linecol cache for the current buffer.
This is used for internal testing and debugging ONLY. */)
(Lisp_Object line, Lisp_Object col, Lisp_Object bytepos)
{
CHECK_FIXNUM (line);
CHECK_FIXNUM (col);
CHECK_FIXNUM (bytepos);
struct ts_linecol linecol;
linecol.line = XFIXNUM (line);
linecol.col = XFIXNUM (col);
linecol.bytepos = XFIXNUM (bytepos);
SET_BUF_TS_LINECOL_POINT (current_buffer, linecol);
return Qnil;
}
DEFUN ("treesit--linecol-cache", Ftreesit__linecol_cache,
Streesit__linecol_cache, 0, 0, 0,
doc: /* Return the buffer-local linecol cache for debugging.
Return a plist (:line LINE :col COL :bytepos BYTEPOS). This is
used for internal testing and debugging ONLY. */)
(void)
{
struct ts_linecol cache = BUF_TS_LINECOL_POINT (current_buffer);
Lisp_Object plist = (list4 (QCcol, make_fixnum (cache.col),
QCbytepos, make_fixnum (cache.bytepos)));
plist = Fcons (make_fixnum (cache.line), plist);
plist = Fcons (QCline, plist);
return plist;
}
#endif /* HAVE_TREE_SITTER */
DEFUN ("treesit-available-p", Ftreesit_available_p,
@ -4418,6 +5083,11 @@ syms_of_treesit (void)
DEFSYM (QCequal, ":equal");
DEFSYM (QCmatch, ":match");
DEFSYM (QCpred, ":pred");
DEFSYM (QCline, ":line");
DEFSYM (QCcol, ":col");
DEFSYM (QCpos, ":pos");
DEFSYM (QCbytepos, ":bytepos");
DEFSYM (Qnot_found, "not-found");
DEFSYM (Qsymbol_error, "symbol-error");
@ -4552,6 +5222,17 @@ applies to LANGUAGE-A will be redirected to LANGUAGE-B instead. */);
DEFSYM (Qtreesit_language_remap_alist, "treesit-language-remap-alist");
Fmake_variable_buffer_local (Qtreesit_language_remap_alist);
DEFVAR_LISP ("treesit-languages-require-line-column-tracking",
Vtreesit_languages_require_line_column_tracking,
doc:
/* A list of languages that need line-column tracking.
Most tree-sitter language grammars don't require line and column
tracking to work, but some languages do. When creating a parser, if the
language is in this list, Emacs enables line-column tracking for the
buffer and for that parser. Changing this list only affects parsers
created afterward. */);
Vtreesit_languages_require_line_column_tracking = Qnil;
staticpro (&Vtreesit_str_libtree_sitter);
Vtreesit_str_libtree_sitter = build_string ("libtree-sitter-");
staticpro (&Vtreesit_str_tree_sitter);
@ -4596,6 +5277,9 @@ applies to LANGUAGE-A will be redirected to LANGUAGE-B instead. */);
defsubr (&Streesit_language_abi_version);
defsubr (&Streesit_grammar_location);
defsubr (&Streesit_parser_tracking_line_column_p);
defsubr (&Streesit_tracking_line_column_p);
defsubr (&Streesit_parser_p);
defsubr (&Streesit_node_p);
defsubr (&Streesit_compiled_query_p);
@ -4649,6 +5333,10 @@ applies to LANGUAGE-A will be redirected to LANGUAGE-B instead. */);
defsubr (&Streesit_induce_sparse_tree);
defsubr (&Streesit_node_match_p);
defsubr (&Streesit_subtree_stat);
defsubr (&Streesit__linecol_at);
defsubr (&Streesit__linecol_cache);
defsubr (&Streesit__linecol_cache_set);
#endif /* HAVE_TREE_SITTER */
defsubr (&Streesit_available_p);
#ifdef WINDOWSNT

@ -26,6 +26,7 @@ along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>. */
#include <tree_sitter/api.h>
#include "lisp.h"
#include "buffer.h"
INLINE_HEADER_BEGIN
@ -97,6 +98,14 @@ struct Lisp_TS_Parser
(ref:visible-beg-null) in treesit.c for more explanation. */
ptrdiff_t visible_beg;
ptrdiff_t visible_end;
/* Caches the line and column number of VISIBLE_BEG. It's always
valid and matches VISIBLE_BEG (because it's updated at each buffer
edit). (It has to be, because in treesit_record_change, we need to
calculate the line/col offset of old_end_linecol, the exact reason
why is left as an exercise to the reader.) */
struct ts_linecol visi_beg_linecol;
/* Similar to VISI_BEG_LINECOL but caches VISIBLE_END. */
struct ts_linecol visi_end_linecol;
/* This counter is incremented every time a change is made to the
buffer in treesit_record_change. The node retrieved from this parser
inherits this timestamp. This way we can make sure the node is
@ -222,9 +231,21 @@ CHECK_TS_COMPILED_QUERY (Lisp_Object query)
INLINE_HEADER_END
extern void treesit_record_change (ptrdiff_t, ptrdiff_t, ptrdiff_t);
extern const struct ts_linecol TREESIT_BOB_LINECOL;
/* An uninitialized linecol. */
extern const struct ts_linecol TREESIT_EMPTY_LINECOL;
extern const TSPoint TREESIT_TS_POINT_1_0;
extern bool treesit_buf_tracks_linecol_p (struct buffer *);
extern struct ts_linecol linecol_offset (struct ts_linecol,
struct ts_linecol);
extern struct ts_linecol treesit_linecol_maybe (ptrdiff_t, ptrdiff_t,
struct ts_linecol);
extern void treesit_record_change (ptrdiff_t, ptrdiff_t, ptrdiff_t,
struct ts_linecol, struct ts_linecol,
ptrdiff_t);
extern Lisp_Object make_treesit_parser (Lisp_Object, TSParser *, TSTree *,
Lisp_Object, Lisp_Object);
Lisp_Object, Lisp_Object, bool);
extern Lisp_Object make_treesit_node (Lisp_Object, TSNode);
extern bool treesit_node_uptodate_p (Lisp_Object);

@ -1163,8 +1163,6 @@ static const char *decode_mode_spec (struct window *, int, int, Lisp_Object *);
static void display_menu_bar (struct window *);
static void display_tab_bar (struct window *);
static void update_tab_bar (struct frame *, bool);
static ptrdiff_t display_count_lines (ptrdiff_t, ptrdiff_t, ptrdiff_t,
ptrdiff_t *);
static void pint2str (register char *, register int, register ptrdiff_t);
static int display_string (const char *, Lisp_Object, Lisp_Object,
@ -29399,7 +29397,7 @@ count_lines (ptrdiff_t start_byte, ptrdiff_t end_byte)
found COUNT lines, or LIMIT_BYTE if we hit the limit before finding
COUNT lines. */
static ptrdiff_t
ptrdiff_t
display_count_lines (ptrdiff_t start_byte,
ptrdiff_t limit_byte, ptrdiff_t count,
ptrdiff_t *byte_pos_ptr)

@ -224,6 +224,84 @@
(kill-buffer base)
(kill-buffer indirect))))
;;; Linecol
(ert-deftest treesit-linecol-basic ()
"Tests for basic linecol synchronization."
(with-temp-buffer
(should (equal (treesit--linecol-cache)
'(:line 0 :col 0 :bytepos 0)))
(treesit--linecol-cache-set 1 0 1)
(should (equal (treesit--linecol-at (point))
'(1 . 0)))
(insert "\n")
;; Buffer content: a single newline.
(should (equal (treesit--linecol-at (point))
'(2 . 0)))
(treesit--linecol-cache-set 2 0 2)
(should (equal (treesit--linecol-cache)
'(:line 2 :col 0 :bytepos 2)))
(goto-char (point-min))
(should (equal (treesit--linecol-at (point))
'(1 . 0)))
(insert "0123456789")
;; Buffer content: ten chars followed by a newline.
(treesit--linecol-cache-set 1 0 1)
(should (equal (treesit--linecol-at (point))
'(1 . 10)))
(goto-char (point-max))
(should (equal (treesit--linecol-at (point))
'(2 . 0)))
(treesit--linecol-cache-set 1 5 6)
(should (equal (treesit--linecol-at (point))
'(2 . 0)))
(treesit--linecol-cache-set 2 0 12)
;; Position 6 is in the middle of the first line.
(should (equal (treesit--linecol-at 6)
'(1 . 5)))
;; Position 11 is at the end of the line.
(should (equal (treesit--linecol-at 11)
'(1 . 10)))))
(ert-deftest treesit-linecol-search-back-across-newline ()
"Search for newline backwards."
(with-temp-buffer
(insert "\n ")
(treesit--linecol-cache-set 2 1 3)
(should (equal (treesit--linecol-at (point)) '(2 . 1)))
(should (equal (treesit--linecol-at 2) '(2 . 0)))
(should (equal (treesit--linecol-at 1) '(1 . 0)))))
(ert-deftest treesit-linecol-col-same-line ()
"Test col calculation when cache and target pos are on the same line."
(with-temp-buffer
(insert "aaaaaa")
(treesit--linecol-cache-set 1 5 6)
(should (equal (treesit--linecol-at 6) '(1 . 5)))
(should (equal (treesit--linecol-at 2) '(1 . 1)))
(should (equal (treesit--linecol-at 1) '(1 . 0)))))
(ert-deftest treesit-linecol-enable-disable ()
"Test enabling/disabling linecol tracking."
(skip-unless (treesit-language-available-p 'json))
(with-temp-buffer
(let ((treesit-languages-require-line-column-tracking nil)
parser)
(setq parser (treesit-parser-create 'json))
(should (not (treesit-tracking-line-column-p)))
(should (not (treesit-parser-tracking-line-column-p parser)))
(setq treesit-languages-require-line-column-tracking '(json))
(setq parser (treesit-parser-create 'json nil t))
(should (treesit-tracking-line-column-p))
(should (treesit-parser-tracking-line-column-p parser)))))
;;; Tree traversal
(ert-deftest treesit-search-subtree ()