README: Extending with your own regular expressions

Karl Voit 2020-03-01 14:36:24 +01:00
parent 76998327f0
commit 644cdb9da4


@@ -156,6 +156,58 @@ vk@sherri ~tmp % guessfilename *mp4
The =info.json= files are not removed or renamed.
** Extending with your own regular expressions
The script is structured as follows:
- general header, command-line argument parser, ...
- =handle_logging()=
- =error_exit()=
- =FileSizePlausibilityException()=
- =class GuessFilename()=
  - *a long list of regular expression definitions*
  - =derive_new_filename_from_old_filename()=
    - here, you can *add code to interpret the regular expressions*
  - =derive_new_filename_from_content()=
    - if you want to parse PDF content, add your code here
  - =derive_new_filename_from_json_metadata()=
    - this handles the JSON metadata files generated by [[https://ytdl-org.github.io/youtube-dl/index.html][youtube-dl]] (see above)
  - =handle_file()=
    - the function that loops over all files; it probes for a new file name until one of the following functions returns one:
      1. =derive_new_filename_from_old_filename()=
      2. =derive_new_filename_from_content()=
      3. =derive_new_filename_from_json_metadata()=
      4. if no function has returned a name by this point, it prints a warning that no new name could be derived
  - The rest of the class consists of helper functions, e.g., for parsing and querying:
    - =adding_tags()=
    - =split_filename_entities()=
    - =contains_one_of()=
    - =contains_all_of()=
    - =fuzzy_contains_one_of()=
    - =fuzzy_contains_all_of()=
    - =has_euro_charge()=
    - =get_euro_charge()=
    - =get_euro_charge_from_context_or_basename()=
    - =get_euro_charge_from_context()=
    - =rename_file()=
    - =get_datetime_string_from_named_groups()=
    - =get_date_string_from_named_groups()=
    - =get_datetime_description_extension_filename()=
    - =get_date_description_extension_filename()=
    - =NumToMonth()=
    - =translate_ORF_quality_string_to_tag()=
    - =get_file_size()=
    - =warn_if_ORF_file_seems_to_small_according_to_duration_and_quality_indicator()=
    - =move_to_success_dir()=
    - =move_to_error_dir()=
- =main()=
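The probing order used by =handle_file()= can be pictured with the following minimal sketch. It is not the script's actual code: the three =from_*= functions and =probe_new_name()= are hypothetical stand-ins, and the sketch assumes that each derivation function returns either a new name or =None=.

```python
import logging
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins for the three derive_* methods (assumption: each one
# returns a new file name, or None if it cannot derive one).
def from_old_filename(path: str) -> Optional[str]:
    return None

def from_content(path: str) -> Optional[str]:
    return "2020-03-01 renamed.pdf" if path.endswith(".pdf") else None

def from_json_metadata(path: str) -> Optional[str]:
    return None

def probe_new_name(
    path: str, derivers: Sequence[Callable[[str], Optional[str]]]
) -> Optional[str]:
    """Return the first new name that any deriver yields, trying them in order."""
    for derive in derivers:
        new_name = derive(path)
        if new_name:
            return new_name
    # No function returned a name: warn, as step 4 of the list describes.
    logging.warning("could not derive a new name for %s", path)
    return None

DERIVERS = [from_old_filename, from_content, from_json_metadata]
```

Calling =probe_new_name("scan.pdf", DERIVERS)= stops at the second stand-in, while a file that none of them recognises only triggers the warning.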
For the most basic pattern matching, add your regular expressions to
the =GuessFilename()= class and the corresponding regex matching
code to =derive_new_filename_from_old_filename()=.
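As a rough illustration of these two steps, here is a minimal, self-contained sketch. Everything in it is an assumption made for the example: =GuessFilenameSketch= is a stripped-down stand-in for the real class, =INVOICE_REGEX= and the =invoice_20200301_ACME.pdf= input pattern are invented, and the target name format is chosen only for demonstration.

```python
import re
from typing import Optional

class GuessFilenameSketch:
    """Hypothetical, stripped-down stand-in for the real GuessFilename class."""

    # Step 1: the new regular expression, defined next to the existing ones.
    # Invented input pattern: "invoice_20200301_ACME.pdf"
    INVOICE_REGEX = re.compile(
        r"^invoice_(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})"
        r"_(?P<vendor>[^_.]+)\.(?P<ext>pdf)$",
        re.IGNORECASE,
    )

    def derive_new_filename_from_old_filename(self, oldfilename: str) -> Optional[str]:
        # Step 2: the matching code that interprets the regular expression.
        match = self.INVOICE_REGEX.match(oldfilename)
        if match:
            # Named groups keep the formatting code readable.
            return "{year}-{month}-{day} {vendor} invoice.{ext}".format(
                **match.groupdict()
            )
        # Nothing matched; the caller can try the next strategy.
        return None
```

Named groups are used here so that the formatting step does not depend on the position of each group in the pattern.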
Do not forget to add simple tests to =guessfilename_test.py= as well!
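A simple test for a new pattern might look like the sketch below. It assumes that the test file is based on Python's =unittest= module; =DATED_SCAN_REGEX= and =derive_new_filename()= are hypothetical stand-ins for the code under test, so in the real test file you would call the corresponding =GuessFilename= method instead.

```python
import re
import unittest
from typing import Optional

# Hypothetical stand-in for the code under test (not part of the script).
DATED_SCAN_REGEX = re.compile(
    r"^scan_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\.(?P<ext>\w+)$"
)

def derive_new_filename(oldfilename: str) -> Optional[str]:
    match = DATED_SCAN_REGEX.match(oldfilename)
    if not match:
        return None
    return "{year}-{month}-{day} scan.{ext}".format(**match.groupdict())

class TestDatedScanPattern(unittest.TestCase):
    def test_matching_name_is_rewritten(self):
        self.assertEqual(
            derive_new_filename("scan_2020-03-01.png"), "2020-03-01 scan.png"
        )

    def test_unrelated_name_yields_none(self):
        self.assertIsNone(derive_new_filename("notes.txt"))

    def test_malformed_date_is_rejected(self):
        # A two-digit year must not match the four-digit year group.
        self.assertIsNone(derive_new_filename("scan_20-03-01.png"))
```

Covering at least one matching name and one non-matching name per pattern helps to catch expressions that are too greedy.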
* Related tools and workflows
This tool is part of a tool-set which I use to manage my digital files