* guessfilename.py

This Python script tries to come up with a new file name for each
file passed as a command-line argument.

It uses several methods: first, the current file name is analyzed
and any [[https://en.wikipedia.org/wiki/Iso_date][ISO date/timestamp]] and [[https://github.com/novoid/filetags/][filetags]] are re-used.
Second, if parsing the file name did not lead to a new file name, the
content of the file is analyzed. The following file types are
supported so far:
- PDF files
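
The first step, re-using a date/timestamp and tags from the current
file name, can be sketched roughly as follows. This is a minimal
illustration and not the code of the script: the function name
=split_filename= is made up, and the layout =<stamp> <description> --
<tags>.<extension>= is an assumption inferred from the example file
names in this README.

#+BEGIN_SRC python
import os
import re

# Assumed stamp formats: "YYYY-MM-DD" or "YYYY-MM-DDThh.mm[.ss]"
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}(?:T\d{2}\.\d{2}(?:\.\d{2})?)?)\s*")


def split_filename(filename):
    """Split a file name into stamp, description, tags and extension.

    Hypothetical helper for illustration only.
    """
    stem, extension = os.path.splitext(filename)
    # Assumption: tags follow a " -- " separator and are space-separated.
    stem, separator, tag_part = stem.partition(" -- ")
    tags = tag_part.split() if separator else []
    match = STAMP.match(stem)
    stamp = match.group(1) if match else None
    description = stem[match.end():] if match else stem
    return {
        "stamp": stamp,
        "description": description,
        "tags": tags,
        "extension": extension,
    }
#+END_SRC

A file name without a leading stamp yields =None= for the stamp, which
is the point where a content-based analysis could take over.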

The script accepts an arbitrary number of files (see your shell for
possible length limitations).

- *Target group*: users who are able to use command line tools and who
  are using tags in file names.
- Hosted on GitHub: https://github.com/novoid/guess-filename.py

** Why

I scan almost all of my paper mail. Many of those documents are sent
to me regularly. Such documents are bills or insurance information,
for example.

Being too lazy to name those files manually, which would also risk
many name variants for the same document type, I came up with a
method to derive file names from either the old file name (cues I
enter without knowing the exact target file name) or the file content.

Analyzing the content enables this script to recognize bills via
customer numbers or phone numbers, amounts to pay, and so on.
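
A content-based rule of this kind might look like the following
sketch. Everything in it is hypothetical and only meant to show the
idea: the customer number, the company name, the regular expression
and the resulting name pattern are invented, and the text is assumed
to have been extracted from the PDF file already.

#+BEGIN_SRC python
import re

# Invented values for illustration; adapt them to your own documents.
CUSTOMER_NUMBER = "1234567"
AMOUNT = re.compile(r"(?:Total|Amount due)\D*(\d+[.,]\d{2})")


def guess_from_content(text, datestamp):
    """Return a new file name for a recognized bill, or None."""
    if CUSTOMER_NUMBER not in text:
        return None  # not a document this rule knows about
    match = AMOUNT.search(text)
    if not match:
        return None  # recognized the sender but found no amount
    # Use a decimal comma to avoid an additional dot in the file name.
    amount = match.group(1).replace(".", ",")
    return f"{datestamp} Example Telecom bill {amount}EUR -- scan bill.pdf"
#+END_SRC

Returning =None= for unknown documents keeps such a rule harmless:
files that do not match are simply left as they are.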

** Usage

#+BEGIN_SRC sh :results output :wrap src
guessfilename --help
#+END_SRC

#+BEGIN_src
Usage:
    guessfilename [<options>] <list of files>

This little Python script tries to rename files according to pre-defined rules.

It does this with several methods: first, the current file name is analyzed and
any ISO date/timestamp and filetags are re-used. Secondly, if the parsing of the
file name did not lead to any new file name, the content of the file is analyzed.

You have to adapt the rules in the Python script to meet your requirements.
The default rule-set follows the filename convention described on
http://karl-voit.at/managing-digital-photographs/


:copyright: (c) by Karl Voit
:license: GPL v3 or any later version
:URL: https://github.com/novoid/guess-filename.py
:bugreports: via github or <tools@Karl-Voit.at>


Options:
  -h, --help     show this help message and exit
  -d, --dryrun   enable dryrun mode: just simulate what would happen, do not
                 modify files
  -v, --verbose  enable verbose mode
  -q, --quiet    enable quiet mode
  --version      display version and exit
#+END_src

** MediathekView
:PROPERTIES:
:CREATED:  [2018-05-10 Thu 17:03]
:END:

When downloading TV shows using [[https://github.com/mediathekview/MediathekView][MediathekView]], you should use the following download pattern:

- MediathekView v11:
  : %DT%d %s - %t - %T -ORIGINAL- %N.mp4

- MediathekView v13:
  - Einstellungen > Aufzeichnen und Abspielen > Set bearbeiten
    - [Set-Name] > Hilfsprogramme:
      - ffmpeg > Zieldateiname > =%DT%d %s - %t - %T -ORIGINALhd- %N.mp4=
      - ffmpeg > Schalter > =-user_agent "Mozilla" -i %f -c copy -bsf:a aac_adtstoasc **=

When applying =guessfilename= to the resulting files, you will get something like this:

#+BEGIN_EXAMPLE
   20180509T235000 ORF - ZIB 24 - Auswirkungen nach US-Aus für Atomdeal -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Auswirkungen-na__13976363__o__1735069995__s14297628_8__BCK1HD_23514710P_23540405P_Q4A.mp4  ...
       →  2018-05-09T23.51.47 ORF - ZIB 24 - Auswirkungen nach US-Aus für Atomdeal -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Hirntoter Bub plötzlich aufgewacht -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Hirntoter-Bub-p__13976363__o__5119815115__s14297631_1__BCK1HD_00045915P_00072303P_Q4A.mp4  ...
       →  2018-05-09T00.04.59 ORF - ZIB 24 - Hirntoter Bub plötzlich aufgewacht -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Meldungen -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Meldungen__13976363__o__1117657593__s14297632_2__BCK1HD_00072303P_00085816P_Q4A.mp4  ...
       →  2018-05-09T00.07.23 ORF - ZIB 24 - Meldungen -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Neuerung bei Filmfestspielen in Cannes -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Neuerung-bei-Fi__13976363__o__1941003027__s14297634_4__BCK1HD_00085816P_00111715P_Q4A.mp4  ...
       →  2018-05-09T00.08.58 ORF - ZIB 24 - Neuerung bei Filmfestspielen in Cannes -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Trumps CIA-Kandidatin umstritten -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Trumps-Kandidat__13976363__o__1488806017__s14297630_0__BCK1HD_00020922P_00045915P_Q4A.mp4  ...
       →  2018-05-09T00.02.09 ORF - ZIB 24 - Trumps CIA-Kandidatin umstritten -- lowquality.mp4

   20180509T235000 ORF - ZIB 24 - Wetter -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Wetter__13976363__o__2966973785__s14297635_5__BCK1HD_00111715P_00120000P_Q4A.mp4  ...
       →  2018-05-09T00.11.17 ORF - ZIB 24 - Wetter -- lowquality.mp4
#+END_EXAMPLE

As you can see, the start time of each chunk is extracted so that
the files sort in their correct temporal order.

Please note that this does not work for a show whose chunks cross
midnight: the date is always taken from the start of the show, whereas
the time is taken from the individual chunk. In the example above, the
chunks broadcast after midnight still carry the date 2018-05-09 and
therefore sort before the first chunk.
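
The midnight limitation can be reproduced with a few lines of Python.
The sketch below is not taken from the script; the function
=chunk_timestamp= and its =fix_midnight= switch are hypothetical. It
combines the date of the show start with the time of a chunk and
optionally applies one possible workaround: a chunk that seems to
start before the show itself is moved to the following day.

#+BEGIN_SRC python
from datetime import datetime, timedelta


def chunk_timestamp(show_start, chunk_time, fix_midnight=False):
    """Combine the show's start date with a chunk's time of day.

    show_start: leading stamp of the download, e.g. "20180509T235000"
    chunk_time: HHMMSS of the chunk, e.g. "000459"
    """
    start = datetime.strptime(show_start, "%Y%m%dT%H%M%S")
    time_of_day = datetime.strptime(chunk_time, "%H%M%S").time()
    stamp = datetime.combine(start.date(), time_of_day)
    # Hypothetical workaround, not part of the script: a chunk cannot
    # start before its show, so it must belong to the next day.
    if fix_midnight and stamp < start:
        stamp += timedelta(days=1)
    return stamp.strftime("%Y-%m-%dT%H.%M.%S")
#+END_SRC

With the values from the example above,
~chunk_timestamp("20180509T235000", "000459")~ returns
=2018-05-09T00.04.59=, whereas passing ~fix_midnight=True~ returns
=2018-05-10T00.04.59=. The workaround assumes that a show never runs
longer than 24 hours.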

** .info.json Meta-Data Files
:PROPERTIES:
:CREATED:  [2019-10-19 Sat 15:21]
:END:

If you download a media file together with its associated separate
=.info.json= file (the base names without file extension need to
match), this tool is able to parse the meta-data to derive a new file
name.

Currently, two meta-data formats are supported: ORF TVthek and
YouTube, both via http://rg3.github.io/youtube-dl/

: youtube-dl --write-info-json <URL>

This results, for example, in files like these:

#+BEGIN_VERSE
Durchbruch bei Brexit-Verhandlungen-14577219.info.json
Durchbruch bei Brexit-Verhandlungen-14577219.mp4
Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.info.json
Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.mp4
The Star7 PDA Prototype-Ahg8OBYixL0.info.json
The Star7 PDA Prototype-Ahg8OBYixL0.mp4
#+END_VERSE

Please note the associated =mp4= files as well as the =info.json=
files.

Applying =guessfilename= to these files looks like this:

#+BEGIN_EXAMPLE
vk@sherri ~tmp % guessfilename *mp4

   Durchbruch bei Brexit-Verhandlungen-14577219.mp4  ...
       →  2019-10-17T16.59.07 ORF - ZIB 17 00 - Durchbruch bei Brexit-Verhandlungen -- highquality.mp4

   Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.mp4  ...
       →  2019-10-17T17.01.44 ORF - ZIB 17 00 - Isolierte Familie: 58-jähriger Österreicher in U-Haft -- highquality.mp4

   The Star7 PDA Prototype-Ahg8OBYixL0.mp4  ...
       →  2007-09-13 youtube - The Star7 PDA Prototype - Ahg8OBYixL0.mp4
#+END_EXAMPLE

The =info.json= files are not removed or renamed.
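
For the YouTube case, deriving such a name from the meta-data could
look roughly like the sketch below. It is an illustration and not the
code of the script: the function name =name_from_info_json= is made
up, and it relies only on the =upload_date=, =extractor=, =title= and
=id= fields that youtube-dl writes to the =.info.json= file. The ORF
TVthek naming is not covered here.

#+BEGIN_SRC python
import json
import os


def name_from_info_json(media_path):
    """Derive a new file name from the sibling .info.json file.

    Returns None if no matching .info.json file exists.
    """
    base, extension = os.path.splitext(media_path)
    json_path = base + ".info.json"  # base names have to match
    if not os.path.isfile(json_path):
        return None
    with open(json_path, encoding="utf-8") as handle:
        info = json.load(handle)
    raw = info["upload_date"]  # youtube-dl stores this as "YYYYMMDD"
    day = f"{raw[:4]}-{raw[4:6]}-{raw[6:8]}"
    return f"{day} {info['extractor']} - {info['title']} - {info['id']}{extension}"
#+END_SRC

The function only reads the =.info.json= file and returns the proposed
name, so the meta-data file itself stays untouched.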

* Related tools and workflows

This tool is part of a tool-set which I use to manage my digital files
such as photographs. My workflows are described in [[http://karl-voit.at/managing-digital-photographs/][this blog posting]],
which you might like to read.

In short:

For *tagging*, please refer to [[https://github.com/novoid/filetags][filetags]] and its documentation.

See [[https://github.com/novoid/date2name][date2name]] for easily adding ISO *time-stamps or date-stamps* to
files.

For *easily naming and tagging* files within file browsers that allow
integration of external tools, see [[https://github.com/novoid/appendfilename][appendfilename]] and (once more)
[[https://github.com/novoid/filetags][filetags]].

Moving to the archive folders is done using [[https://github.com/novoid/move2archive][move2archive]].

Having tagged photographs gives you many advantages. For example, I
automatically [[https://github.com/novoid/set_desktop_background_according_to_season][choose my *desktop background image* according to the
current season]].

Files containing an ISO time/date-stamp get indexed by the
filename-module of [[https://github.com/novoid/Memacs][Memacs]].

-------------

[[http://www.jonasjberg.com/][Jonas Sjöberg]] took my idea and developed the much more advanced (and
thus a bit more complicated) [[https://github.com/jonasjberg/autonameow][autonameow]]. It uses rule-based renaming,
analyzes content of plain text, epub, pdf and rtf files, extracts
meta-data from many different file formats via [[https://www.sno.phy.queensu.ca/~phil/exiftool/][exiftool]] and so forth.

* Contribute!

I am looking for your ideas!

If you want to contribute to this cool project, please fork the
repository and send me your changes!


* Local Variables                                                  :noexport:
# Local Variables:
# mode: auto-fill
# mode: flyspell
# eval: (ispell-change-dictionary "en_US")
# End: