README: Extending with your own regular expressions

2026-02-16 13:24:15 +00:00 · 2020-03-01 14:36:24 +01:00 · 2020-03-01 14:36:24 +01:00 · 644cdb9da4
commit 644cdb9da4
parent 76998327f0
1 changed files with 52 additions and 0 deletions
--- a/README.org
+++ b/README.org
@@ -156,6 +156,58 @@ vk@sherri ~tmp % guessfilename *mp4

 The =info.json= files are not removed or renamed.

+** Extending with your own regular expressions
+
+The script is structured as follows:
+
+- general header, command-line argument parser, ...
+- =handle_logging()=
+- =error_exit()=
+- =FileSizePlausibilityException()=
+- =class GuessFilename()=
+  - *a long list of regular expression definitions*
+  - =derive_new_filename_from_old_filename()=
+    - here, you can *add code to interpret the regular expressions*
+  - =derive_new_filename_from_content()=
+    - if you want to parse PDF content, add your code here
+  - =derive_new_filename_from_json_metadata()=
+    - this handles the JSON meta-data files generated by [[https://ytdl-org.github.io/youtube-dl/index.html][youtube-dl]] (see above)
+  - =handle_file()=
+    - the function that is run for all files; it probes for a new file name by trying the following functions in order until one of them returns a new name:
+      1. =derive_new_filename_from_old_filename()=
+      2. =derive_new_filename_from_content()=
+      3. =derive_new_filename_from_json_metadata()=
+      4. if none of them has returned a name, it prints a warning that no new name could be derived
+  - The rest of the class consists of helper functions, e.g., for parsing and querying:
+  - =adding_tags()=
+  - =split_filename_entities()=
+  - =contains_one_of()=
+  - =contains_all_of()=
+  - =fuzzy_contains_one_of()=
+  - =fuzzy_contains_all_of()=
+  - =has_euro_charge()=
+  - =get_euro_charge()=
+  - =get_euro_charge_from_context_or_basename()=
+  - =get_euro_charge_from_context()=
+  - =rename_file()=
+  - =get_datetime_string_from_named_groups()=
+  - =get_date_string_from_named_groups()=
+  - =get_datetime_description_extension_filename()=
+  - =get_date_description_extension_filename()=
+  - =NumToMonth()=
+  - =translate_ORF_quality_string_to_tag()=
+  - =get_file_size()=
+  - =warn_if_ORF_file_seems_to_small_according_to_duration_and_quality_indicator()=
+- =move_to_success_dir()=
+- =move_to_error_dir()=
+- =main()=
+
+For the most basic pattern matching, you just have to add regular
+expressions to the =GuessFilename()= class and add the regex matching
+code to =derive_new_filename_from_old_filename()=.
+
+Do not forget to add simple tests to =guessfilename_test.py= as well!
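
The two steps described above could look roughly like the following minimal sketch. It is an illustration under assumptions, not the script's actual code: the class name =GuessFilenameSketch=, the pattern =SCAN_REGEX=, the =scan_YYYY-MM-DD_description.ext= naming scheme and the simplified method signature are all hypothetical. Only the method name =derive_new_filename_from_old_filename()= is taken from the list above.

```python
import re
from typing import Optional


class GuessFilenameSketch:
    """Hypothetical stand-in for the real GuessFilename class."""

    # Step 1: add a regular expression definition to the class.
    # Hypothetical pattern for names like "scan_2020-03-01_invoice.pdf".
    SCAN_REGEX = re.compile(
        r"^scan_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
        r"_(?P<description>.+)\.(?P<extension>[A-Za-z0-9]+)$"
    )

    def derive_new_filename_from_old_filename(self, oldfilename: str) -> Optional[str]:
        """Return a new name if a known pattern matches, else None."""
        # Step 2: add the matching code that interprets the expression.
        match = self.SCAN_REGEX.match(oldfilename)
        if match:
            # Build the new name from the named groups of the match.
            return "{year}-{month}-{day} {description}.{extension}".format(
                **match.groupdict()
            )
        # No pattern matched: the caller can try the next strategy.
        return None


if __name__ == "__main__":
    guesser = GuessFilenameSketch()
    print(guesser.derive_new_filename_from_old_filename("scan_2020-03-01_invoice.pdf"))
    print(guesser.derive_new_filename_from_old_filename("holiday.jpg"))
```

Returning =None= for unmatched names mirrors the probing order described above, where the next function is tried whenever no new name comes back. A simple test for such a pattern would cover one matching and one non-matching file name.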
+
 * Related tools and workflows

 This tool is part of a tool-set which I use to manage my digital files