mirror of
git://git.sv.gnu.org/emacs.git
synced 2026-02-16 17:24:23 +00:00
Copyright (C) 2008-2025 Free Software Foundation, Inc.
See the end of the file for license conditions.
This directory contains files intended to test various aspects of
Emacs's functionality. Please help add tests!
See the file file-organization.org for the details of the directory
structure and file-naming conventions.
For tests in the manual/ subdirectory, look there for separate README
files, or look for instructions in the test files themselves.
Emacs uses ERT, Emacs Lisp Regression Testing, for testing. See (info
"(ert)") or https://www.gnu.org/software/emacs/manual/html_node/ert/
for more information on writing and running tests.
Tests can be tagged by the developer. In this test directory, the
following tags are recognized:
* :expensive-test
The test needs a serious amount of time to run. It is not intended
to run on a regular basis by users. Instead, it runs on demand
only, or during regression tests.
* :nativecomp
The test runs only if Emacs is configured with Lisp native compiler
support.
* :unstable
The test is under development. It shall run on demand only.
The Makefile sets the environment variable $EMACS_TEST_DIRECTORY,
which points to this directory. This environment variable does not
exist when the tests are run outside make. The Makefile supports the
following targets:
* make check
Run all tests as defined in the directory. Expensive and unstable
tests are suppressed. The result of the tests for <filename>.el is
stored in <filename>.log.
* make check-maybe
Like "make check", but run only the tests for files which have
unresolved prerequisites.
* make check-expensive
Like "make check", but run also the tests marked as expensive.
* make check-all
Like "make check", but run all tests.
* make check-<dirname>
Like "make check", but run only the tests in test/<dirname>/*.el.
<dirname> is a relative directory path in which "/" is replaced by "-",
as in "check-src" or "check-lisp-net".
* make <filename> -or- make <filename>.log
Run all tests declared in <filename>.el. This includes expensive
tests. In the former case the output is shown on the terminal, in
the latter case the output is written to <filename>.log.
<filename> can be either a relative file name like
"lisp/files-tests", or a package name like "files-tests".
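The mapping from a test subdirectory to its "check-<dirname>" target
can be sketched in shell as follows. The function name check_target
is hypothetical and not part of the Makefile; it only illustrates the
"/" to "-" replacement described above.

```shell
# Hypothetical helper (not part of the Emacs Makefile): print the
# make target name for a directory given relative to test/.
check_target () {
    # Replace every "/" in the relative path with "-".
    printf 'check-%s\n' "$(printf '%s' "$1" | tr '/' '-')"
}

check_target src        # prints check-src
check_target lisp/net   # prints check-lisp-net
```

The printed name could then be passed to make, for example
"make check-lisp-net".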
ERT offers selectors, which make it possible to choose which test
cases run. The make variable $(SELECTOR) gives you a simple
means of using your own selectors. The ERT manual describes how
selectors are constructed; see (info "(ert)Test Selectors") or
https://www.gnu.org/software/emacs/manual/html_node/ert/Test-Selectors.html
You can also use the predefined selectors of the Makefile. "make <filename>
SELECTOR='$(SELECTOR_DEFAULT)'" runs all tests for <filename>.el
except the tests tagged as expensive or unstable. Other predefined
selectors are $(SELECTOR_EXPENSIVE) (run all tests except unstable
ones) and $(SELECTOR_ALL) (run all tests).
If your test file contains the tests "test-foo", "test2-foo" and
"test-foo-remote", and you want to run only the first two tests, you
can use a selector regexp (note that the "$" needs to be doubled to
protect against "make" variable expansion):
make <filename> SELECTOR='"foo$$"'
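To see why this regexp selects only the first two tests, the match can
be reproduced outside Emacs. This is a rough sketch: it uses grep's
extended regexps rather than Emacs regexps (the two agree for a
pattern as simple as "foo$"), and the helper name select_tests is
hypothetical. The "$" is doubled only on the make command line; the
regexp that ERT receives is "foo$".

```shell
# Hypothetical helper: keep only the names on stdin that match the
# extended regexp given as the first argument.
select_tests () {
    grep -E -- "$1"
}

# The three test names from the example above, one per line.
printf '%s\n' test-foo test2-foo test-foo-remote | select_tests 'foo$'
# prints test-foo and test2-foo, but not test-foo-remote
```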
In case you want to use the symbol name of a test as selector, you can
use it directly:
make <filename> SELECTOR='test-foo-remote'
Note that although the test files are always compiled (unless they set
no-byte-compile), the source files will be run when expensive or
unstable tests are involved, to give nicer backtraces. To run the
compiled version of a test use
make TEST_LOAD_EL=no ...
Some tests might take a long time to run. To summarize the
<nn> tests with the longest duration, call
make SUMMARIZE_TESTS=<nn> ...
The backtraces of failing tests are truncated to the default value of
'ert-batch-backtrace-right-margin'. To see more of the backtrace, use
make TEST_BACKTRACE_LINE_LENGTH=<nn> ...
The tests are run in batch mode by default; sometimes it's useful to
get precisely the same environment but run in interactive mode for
debugging. To do that, use
make TEST_INTERACTIVE=yes ...
Sometimes further settings are needed in order to run the tests in
batch mode. These can be given in the $EMACS_EXTRAOPT environment
variable, like
make ... EMACS_EXTRAOPT="--eval '(setopt ert-batch-print-length nil ert-batch-print-level nil)'"
By default, ERT test failure summaries are quite brief in batch
mode--only the names of the failed tests are listed. If the
$EMACS_TEST_VERBOSE environment variable is set and non-empty, the
failure summaries will also include the data from the failing test.
If the $EMACS_TEST_JUNIT_REPORT environment variable is set to a file
name, a JUnit test report is generated under this name.
Some of the tests require a remote temporary directory
(autorevert-tests.el, dnd-tests.el, eglot-tests.el, filenotify-tests.el,
shadowfile-tests.el and tramp-tests.el). By default, a mock-up
connection method is used (this might not be possible when running on
MS Windows). If you want to test a real remote connection, set
$REMOTE_TEMPORARY_FILE_DIRECTORY to a suitable value in order to
override the default value:
env REMOTE_TEMPORARY_FILE_DIRECTORY=/ssh:host:/tmp make ...
There are also continuous integration tests on
<https://hydra.nixos.org/jobset/gnu/emacs-trunk> (see
admin/notes/hydra) and <https://emba.gnu.org/emacs/emacs> (see
admin/notes/emba). Each environment provides an environment variable
that can be used to determine whether the tests are running in that
environment: $EMACS_HYDRA_CI indicates the hydra environment, and
$EMACS_EMBA_CI indicates the emba environment.
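A wrapper script could check these variables along the following
lines. This is only a sketch: the function name ci_environment is
hypothetical, and it assumes that each variable is set to a non-empty
value inside its environment and is unset or empty elsewhere.

```shell
# Hypothetical helper: report which CI environment, if any, is active.
# Assumption: the variable is non-empty inside the matching environment.
ci_environment () {
    if [ -n "${EMACS_HYDRA_CI:-}" ]; then
        echo hydra
    elif [ -n "${EMACS_EMBA_CI:-}" ]; then
        echo emba
    else
        echo local
    fi
}

ci_environment
```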
If tests in these environments take too long, and a core dump is
needed for further analysis, the environment variable
$EMACS_TEST_TIMEOUT can set a limit (in seconds) after which this
happens.
(Also, see etc/compilation.txt for compilation mode font lock tests
and etc/grep.txt for grep mode font lock tests.)
This file is part of GNU Emacs.
GNU Emacs is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
GNU Emacs is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.