From 644cdb9da474046aceabb77b8048bde31dc9b6c6 Mon Sep 17 00:00:00 2001
From: Karl Voit
Date: Sun, 1 Mar 2020 14:36:24 +0100
Subject: [PATCH] README: Extending with your own regular expressions

---
 README.org | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/README.org b/README.org
index 828c256..44d8ff1 100644
--- a/README.org
+++ b/README.org
@@ -156,6 +156,58 @@ vk@sherri ~tmp % guessfilename *mp4
 
 The =info.json= files are not removed or renamed.
 
+** Extending with your own regular expressions
+
+The script is structured as follows:
+
+- general header, command-line argument parser, ...
+- =handle_logging()=
+- =error_exit()=
+- =FileSizePlausibilityException()=
+- =class GuessFilename()=
+  - *a long list of regular expression definitions*
+  - =derive_new_filename_from_old_filename()=
+    - here, you can *add code to interpret the regular expressions*
+  - =derive_new_filename_from_content()=
+    - if you want to parse PDF content, add your code here
+  - =derive_new_filename_from_json_metadata()=
+    - this handles the JSON metadata files generated by [[https://ytdl-org.github.io/youtube-dl/index.html][youtube-dl]] (see above)
+  - =handle_file()=
+    - the function that loops over all files probes for a new file name until one of these functions returns one:
+      1. =derive_new_filename_from_old_filename()=
+      2. =derive_new_filename_from_content()=
+      3. =derive_new_filename_from_json_metadata()=
+      4. if no name has been returned by this point, it prints a warning that no new name could be derived
+  - The rest of the class consists of a set of helper functions, e.g., for parsing and querying:
+    - =adding_tags()=
+    - =split_filename_entities()=
+    - =contains_one_of()=
+    - =contains_all_of()=
+    - =fuzzy_contains_one_of()=
+    - =fuzzy_contains_all_of()=
+    - =has_euro_charge()=
+    - =get_euro_charge()=
+    - =get_euro_charge_from_context_or_basename()=
+    - =get_euro_charge_from_context()=
+    - =rename_file()=
+    - =get_datetime_string_from_named_groups()=
+    - =get_date_string_from_named_groups()=
+    - =get_datetime_description_extension_filename()=
+    - =get_date_description_extension_filename()=
+    - =NumToMonth()=
+    - =translate_ORF_quality_string_to_tag()=
+    - =get_file_size()=
+    - =warn_if_ORF_file_seems_to_small_according_to_duration_and_quality_indicator()=
+- =move_to_success_dir()=
+- =move_to_error_dir()=
+- =main()=
+
+For the most basic pattern matching, you just have to add regular
+expressions to the =GuessFilename()= class and add the regex matching
+code to =derive_new_filename_from_old_filename()=.
+
+Do not forget to add simple tests to =guessfilename_test.py= as well!
+
 * Related tools and workflows
 
 This tool is part of a tool-set which I use to manage my digital files
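
Note on the patch: the README text describes a two-step recipe (add a regular expression to the class, then add the matching code to =derive_new_filename_from_old_filename()=) plus a simple test. A minimal sketch of what that might look like follows. It is not taken from the script: the class =GuessFilenameSketch=, the attribute =SCAN_REGEX=, the example file-name pattern, the output format, and the test class are all hypothetical, and the real method may take different arguments.

```python
import re
import unittest


class GuessFilenameSketch:
    """Hypothetical stand-in for the GuessFilename class; not the real code."""

    # Step 1: a regular expression defined on the class (the name is made up).
    # It matches file names such as "scan_20200301_invoice.pdf" and captures
    # the parts in named groups.
    SCAN_REGEX = re.compile(
        r"^scan_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})_"
        r"(?P<description>.+)\.(?P<extension>pdf)$"
    )

    def derive_new_filename_from_old_filename(self, oldfilename):
        """Return a new file name, or None if no pattern matches."""
        # Step 2: the matching code that interprets the regular expression.
        match = self.SCAN_REGEX.match(oldfilename)
        if match:
            return "{year}-{month}-{day} {description}.{extension}".format(
                **match.groupdict()
            )
        # Returning None lets the caller move on to the next strategy.
        return None


class TestScanRegex(unittest.TestCase):
    """Sketch of the kind of simple test the README asks for."""

    def test_scan_pattern(self):
        guesser = GuessFilenameSketch()
        self.assertEqual(
            guesser.derive_new_filename_from_old_filename(
                "scan_20200301_invoice.pdf"
            ),
            "2020-03-01 invoice.pdf",
        )

    def test_no_match_returns_none(self):
        guesser = GuessFilenameSketch()
        self.assertIsNone(
            guesser.derive_new_filename_from_old_filename("holiday.jpg")
        )
```

Returning =None= for a non-matching name mirrors the fall-through order described for =handle_file()=: when the old file name yields nothing, the content-based and JSON-based functions get their turn.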