emacs/admin/notes/tree-sitter/treesit_record_change
2024-07-18 11:46:50 +02:00

NOTES ON TREESIT_RECORD_CHANGE
It is vital that Emacs informs tree-sitter of every change made to the
buffer, lest tree-sitter's parse tree become corrupted or out of sync.
Almost all buffer changes in Emacs are made through functions in
insdel.c (see below for exceptions), so I augmented the functions in
insdel.c with calls to treesit_record_change. Below is a manifest of
all the relevant functions in insdel.c as of Emacs 29:

Function                            Calls
----------------------------------------------------------------------
copy_text                           (*1)
insert                              insert_1_both
insert_and_inherit                  insert_1_both
insert_char                         insert
insert_string                       insert
insert_before_markers               insert_1_both
insert_before_markers_and_inherit   insert_1_both
insert_1_both                       treesit_record_change
insert_from_string                  insert_from_string_1
insert_from_string_before_markers   insert_from_string_1
insert_from_string_1                treesit_record_change
insert_from_gap_1                   treesit_record_change
insert_from_gap                     insert_from_gap_1
insert_from_buffer                  treesit_record_change
insert_from_buffer_1                (used by insert_from_buffer) (*2)
replace_range                       treesit_record_change
replace_range_2                     (caller needs to call treesit_r_c)
del_range                           del_range_1
del_range_1                         del_range_2
del_range_byte                      del_range_2
del_range_both                      del_range_2
del_range_2                         treesit_record_change

(*1) This function is used only to copy from string to string when
used outside of insdel.c, and when used inside insdel.c, the caller
calls treesit_record_change.
(*2) This function is a static function, and insert_from_buffer is its
only caller. So it should be fine to call treesit_record_change in
insert_from_buffer but not insert_from_buffer_1. I also left a
reminder comment.
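
The layering in the manifest above can be illustrated with a toy
model. The following is only a sketch, not the Emacs implementation:
struct toy_buffer and every toy_* function are hypothetical names
invented for this note, and the three-offset notification (start, old
end, new end) is an assumption modeled on tree-sitter's edit
description. It shows why only the lowest-level primitives need to
notify the parser: every wrapper funnels into one of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins; none of these are the real Emacs definitions.  */
enum { TOY_CAPACITY = 256 };

struct toy_buffer
{
  char text[TOY_CAPACITY];
  size_t len;
  /* Last change reported to the (imaginary) parser.  */
  size_t start, old_end, new_end;
  int changes_recorded;
};

/* Hypothetical analogue of treesit_record_change.  */
static void
toy_record_change (struct toy_buffer *b, size_t start,
                   size_t old_end, size_t new_end)
{
  b->start = start;
  b->old_end = old_end;
  b->new_end = new_end;
  b->changes_recorded++;
}

/* Insertion primitive: the only insertion code that touches the
   text, so the only one that notifies (compare insert_1_both).  */
static void
toy_insert_1 (struct toy_buffer *b, size_t pos, const char *s, size_t n)
{
  assert (pos <= b->len && b->len + n <= TOY_CAPACITY);
  memmove (b->text + pos + n, b->text + pos, b->len - pos);
  memcpy (b->text + pos, s, n);
  b->len += n;
  toy_record_change (b, pos, pos, pos + n);
}

/* Wrappers delegate to the primitive and never notify themselves
   (compare insert and insert_char).  */
static void
toy_insert (struct toy_buffer *b, size_t pos, const char *s)
{
  toy_insert_1 (b, pos, s, strlen (s));
}

static void
toy_insert_char (struct toy_buffer *b, size_t pos, char c)
{
  char s[2] = { c, '\0' };
  toy_insert (b, pos, s);
}

/* Deletion primitive (compare del_range_2).  */
static void
toy_del_range (struct toy_buffer *b, size_t from, size_t to)
{
  assert (from <= to && to <= b->len);
  memmove (b->text + from, b->text + to, b->len - to);
  b->len -= to - from;
  toy_record_change (b, from, to, from);
}
```

In this model, adding a new wrapper on top of toy_insert needs no
extra bookkeeping, whereas a new function that writes to b->text
directly would have to call toy_record_change itself.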
EXCEPTIONS
There are a couple of functions that replace characters in-place
rather than inserting/deleting. They are in casefiddle.c and editfns.c.
In casefiddle.c, do_casify_unibyte_region and
do_casify_multibyte_region modify the buffer, but they are static
functions and are called by casify_region, which calls
treesit_record_change. Other higher-level functions call
casify_region to do the work.
In editfns.c, subst-char-in-region and translate-region-internal might
replace characters in-place, so I made them call
treesit_record_change. transpose-regions uses memcpy to move text
around; it calls treesit_record_change too.
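
As a rough illustration of the casefiddle.c arrangement, here is a
sketch in which a static helper rewrites bytes in place and its single
public caller reports the change. All names (struct edit_log,
log_change, upcase_bytes, upcase_region) are hypothetical, the text is
assumed to be ASCII so the byte length never changes, and skipping the
notification when nothing changed is a simplification made for this
sketch, not a claim about what casify_region does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical record of the last change reported to a parser.  */
struct edit_log
{
  size_t start, old_end, new_end;
  int count;
};

static void
log_change (struct edit_log *log, size_t start,
            size_t old_end, size_t new_end)
{
  log->start = start;
  log->old_end = old_end;
  log->new_end = new_end;
  log->count++;
}

/* Static helper: overwrites bytes in place and does NOT notify.
   Returns nonzero if any byte actually changed.  */
static int
upcase_bytes (char *text, size_t start, size_t end)
{
  int changed = 0;
  for (size_t i = start; i < end; i++)
    {
      char up = (char) toupper ((unsigned char) text[i]);
      if (up != text[i])
        {
          text[i] = up;
          changed = 1;
        }
    }
  return changed;
}

/* Public entry point: the only caller of the helper, so it is the
   one place that has to report the in-place change.  The region
   keeps its length, hence old_end == new_end.  */
static void
upcase_region (char *text, size_t start, size_t end,
               struct edit_log *log)
{
  assert (start <= end);
  if (upcase_bytes (text, start, end))
    log_change (log, start, end, end);
}
```

The point of the sketch is that an in-place replacement still counts
as an edit: the region's extent is unchanged, but its content is not,
so the parser has to be told which span to reparse.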
I found these exceptions by grepping for signal_after_change and
checking each caller manually. Below are all the results as of Emacs 29,
with some comments on each one. Readers can use
(highlight-regexp "^[^[:space:]]+?\\.c:[[:digit:]]+:[^z-a]+?$" 'highlight)
to make things easier to read.
grep [...] --color=auto -i --directories=skip -nH --null -e signal_after_change *.c
callproc.c:789: calling prepare_to_modify_buffer and signal_after_change.
callproc.c:793: is one call to signal_after_change in each of the
callproc.c:800: signal_after_change hasn't. A continue statement
callproc.c:804: again, and this time signal_after_change gets called,
Not code.
callproc.c:820: signal_after_change (PT - nread, 0, nread);
callproc.c:863: signal_after_change (PT - process_coding.produced_char,
Both are called in call-process. I don't think we'll ever use
tree-sitter in call-process's stdio buffer, right? I didn't check
line-by-line, but it seems to only use insert_1_both and del_range_2.
casefiddle.c:558: signal_after_change (start, end - start - added, end - start);
Called in casify-region, calls treesit_record_change.
decompress.c:195: signal_after_change (data->orig, data->start - data->orig,
Called in unwind_decompress, uses del_range_2, an insdel function.
decompress.c:334: signal_after_change (istart, iend - istart, unwind_data.nbytes);
Called in zlib-decompress-region, uses del_range_2, an insdel function.
editfns.c:2139: signal_after_change (BEGV, size_a, ZV - BEGV);
Called in replace-buffer-contents, which calls del_range and
Finsert_buffer_substring, both are ok.
editfns.c:2416: signal_after_change (changed,
Called in subst-char-in-region, which either calls replace_range (an
insdel function) or modifies the buffer content by itself (and so needs
to call treesit_record_change).
editfns.c:2544: /* Reload as signal_after_change in last iteration may GC. */
Not code.
editfns.c:2604: signal_after_change (pos, 1, 1);
Called in translate-region-internal, which has three cases:
if (nc != oc && nc >= 0) {
  if (len != str_len) {
    replace_range()
  } else {
    while (str_len-- > 0)
      *p++ = *str++;
  }
}
else if (nc < 0) {
  replace_range()
}
replace_range is ok, but in the case where it manually modifies the
buffer content, it needs to call treesit_record_change.
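
The distinction between the two kinds of branch can be sketched with a
toy model. This is not the editfns.c code: struct toy_buf, toy_record,
toy_replace_range and toy_translate_at are hypothetical names, and the
model works on single bytes rather than multibyte characters. A
replacement of a different length goes through a replace primitive
that reports the edit itself; a same-length replacement is written in
place and must report the edit explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CAP = 64 };

/* Toy buffer; not the Emacs buffer structure.  */
struct toy_buf
{
  char text[CAP];
  size_t len;
  size_t start, old_end, new_end;   /* last recorded change */
  int records;
};

/* Hypothetical analogue of treesit_record_change.  */
static void
toy_record (struct toy_buf *b, size_t start,
            size_t old_end, size_t new_end)
{
  b->start = start;
  b->old_end = old_end;
  b->new_end = new_end;
  b->records++;
}

/* Analogue of replace_range: replaces [FROM, TO) with the N bytes
   at S and records the change itself.  */
static void
toy_replace_range (struct toy_buf *b, size_t from, size_t to,
                   const char *s, size_t n)
{
  assert (from <= to && to <= b->len);
  assert (b->len - (to - from) + n <= CAP);
  memmove (b->text + from + n, b->text + to, b->len - to);
  memcpy (b->text + from, s, n);
  b->len = b->len - (to - from) + n;
  toy_record (b, from, to, from + n);
}

/* Replace the single byte at POS with the N-byte string S.  */
static void
toy_translate_at (struct toy_buf *b, size_t pos,
                  const char *s, size_t n)
{
  assert (pos < b->len);
  if (n != 1)
    /* Length differs: the replace primitive records the change.  */
    toy_replace_range (b, pos, pos + 1, s, n);
  else if (b->text[pos] != s[0])
    {
      /* Same length: overwrite in place.  Nothing below this level
         notifies the parser, so it has to be done here.  */
      b->text[pos] = s[0];
      toy_record (b, pos, pos + 1, pos + 1);
    }
}
```

Forgetting the explicit toy_record call in the in-place branch would
leave the recorded state untouched while the text changes, which is
the kind of silent desynchronization these notes are guarding against.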
editfns.c:4779: signal_after_change (start1, end2 - start1, end2 - start1);
Called in transpose-regions. It just uses memcpy and doesn't use
insdel functions; it needs to call treesit_record_change.
fileio.c:4825: signal_after_change (PT, 0, inserted);
Called in insert_file_contents. Uses insert_1_both (very first in the
function); del_range_1 and del_range_byte (the optimized way to
implement replace when decoding isn't needed); del_range_byte and
insert_from_buffer (the optimized way used when decoding is needed);
decode_coding_gap or insert_from_gap_1 (I'm not sure of the condition
for this, but anyway it's safe). The function also calls memcpy and
memmove, but they are irrelevant: memcpy is used for decoding, and
memmove is moving stuff inside the gap for decode_coding_gap.
I'd love someone to verify this function, since it's so complicated
and large, but from what I can tell it's safe.
fns.c:3998: signal_after_change (XFIXNAT (beg), 0, inserted_chars);
Called in base64-decode-region, uses insert_1_both and del_range_both,
safe.
insdel.c:681: signal_after_change (opoint, 0, len);
insdel.c:696: signal_after_change (opoint, 0, len);
insdel.c:741: signal_after_change (opoint, 0, len);
insdel.c:757: signal_after_change (opoint, 0, len);
insdel.c:976: signal_after_change (opoint, 0, PT - opoint);
insdel.c:996: signal_after_change (opoint, 0, PT - opoint);
insdel.c:1187: signal_after_change (opoint, 0, PT - opoint);
insdel.c:1412: signal_after_change. */
insdel.c:1585: signal_after_change (from, nchars_del, GPT - from);
insdel.c:1600: prepare_to_modify_buffer and never call signal_after_change.
insdel.c:1603: region once. Apart from signal_after_change, any caller of this
insdel.c:1747: signal_after_change (from, to - from, 0);
insdel.c:1789: signal_after_change (from, to - from, 0);
insdel.c:1833: signal_after_change (from, to - from, 0);
insdel.c:2223:signal_after_change (ptrdiff_t charpos, ptrdiff_t lendel, ptrdiff_t lenins)
insdel.c:2396: signal_after_change (begpos, endpos - begpos - change, endpos - begpos);
I've checked all insdel functions. We can assume insdel functions are
all safe.
json.c:790: signal_after_change (PT, 0, inserted);
Called in json-insert, calls either decode_coding_gap or
insert_from_gap_1, both are safe. Calls memmove but it's for
decode_coding_gap.
keymap.c:2873: /* Insert calls signal_after_change which may GC. */
Not code.
print.c:219: signal_after_change (PT - print_buffer.pos, 0, print_buffer.pos);
Called in print_finish, calls copy_text and insert_1_both, safe.
process.c:6365: process buffer is changed in the signal_after_change above.
search.c:2763: (see signal_before_change and signal_after_change). Try to error
Not code.
search.c:2777: signal_after_change (sub_start, sub_end - sub_start, SCHARS (newtext));
Called in replace_match. Calls replace_range, upcase-region,
upcase-initials-region (both call casify_region in the end), safe.
Calls memcpy but it's for string manipulation.
textprop.c:1261: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1272: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1283: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1458: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1652: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1661: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1672: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1750: before changes are made and signal_after_change when we are done.
textprop.c:1752: and call signal_after_change before returning if MODIFIED. */
textprop.c:1764: signal_after_change (XFIXNUM (start),
textprop.c:1778: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1791: signal_after_change (XFIXNUM (start), XFIXNUM (end) - XFIXNUM (start),
textprop.c:1810: signal_after_change (XFIXNUM (start),
We don't care about text property changes.
Grep finished with 51 matches found at Wed Jun 28 15:12:23