* guessfilename.py

This Python script tries to come up with a new file name for each
file given as a command line argument.

It does this in two steps. First, the current file name is
analyzed and any [[https://en.wikipedia.org/wiki/Iso_date][ISO date/timestamp]] and [[https://github.com/novoid/filetags/][filetags]] are re-used.
Second, if parsing the file name did not lead to a new file
name, the content of the file is analyzed. The following file types
are currently supported for content analysis:

- PDF files

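
The first step, re-using the date/timestamp and the filetags of an
existing file name, can be sketched as follows. This is only an
illustration and not the code of the script itself: the function name
=split_filename=, the =FilenameParts= tuple and the assumption that
tags follow a single " -- " separator are mine, derived from the
example file names further below.

#+BEGIN_SRC python
import os
import re
from typing import List, NamedTuple, Optional

# ISO date with an optional time part, e.g. "2018-05-09" or "2018-05-09T23.51.47"
DATESTAMP_REGEX = re.compile(r"\d{4}-\d{2}-\d{2}(?:T\d{2}\.\d{2}(?:\.\d{2})?)?")


class FilenameParts(NamedTuple):
    """Hypothetical container for the parts of a file name."""
    datestamp: Optional[str]  # None if the name does not start with a date
    description: str          # free text between datestamp and tags
    tags: List[str]           # filetags found after the " -- " separator
    extension: str            # including the leading dot, may be empty


def split_filename(filename: str) -> FilenameParts:
    """Split a file name into datestamp, description, tags and extension."""
    stem, extension = os.path.splitext(filename)
    # Assumption: everything after the first " -- " is a list of tags.
    stem, separator, tagstring = stem.partition(" -- ")
    tags = tagstring.split() if separator else []
    match = DATESTAMP_REGEX.match(stem)
    if match:
        datestamp = match.group(0)
        description = stem[match.end():].strip()
    else:
        datestamp = None
        description = stem.strip()
    return FilenameParts(datestamp, description, tags, extension)
#+END_SRC

A rule can then keep =datestamp= and =tags= and only replace the
description part of the name.
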
The script accepts an arbitrary number of files (see your shell for
possible length limitations).

- *Target group*: users who are able to use command line tools and who
  are using tags in file names.
- Hosted on GitHub: https://github.com/novoid/guess-filename.py

** Why

I scan almost all of my paper mail. Many of those documents, such as
bills or insurance information, are sent to me regularly.

I am too lazy to name those files manually, and doing so would likely
produce many variants for the same document type. So I came up with a
method to derive file names from either the old file name (cues I
enter without knowing the exact target file name) or the file content.

Analyzing the content enables this script to recognize bills via
customer numbers or phone numbers, amounts to pay, and so on.

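
To give an idea of what such a content-based rule can look like, here
is a minimal sketch. Everything in it is hypothetical: the customer
number, the wording of the bill, the resulting file name and the
function name =guess_name_from_content= are made up for illustration.
It also assumes that the text of the PDF file has already been
extracted into a string.

#+BEGIN_SRC python
import re
from typing import Optional

# Hypothetical values: adapt them to the documents you actually receive.
CUSTOMER_NUMBER = "K-1234567"
AMOUNT_REGEX = re.compile(r"Total:\s*(\d+[.,]\d{2})\s*EUR")


def guess_name_from_content(text: str, datestamp: str) -> Optional[str]:
    """Return a new file name if the text looks like the known bill."""
    if CUSTOMER_NUMBER not in text:
        return None  # not the document type this rule is written for
    amount = AMOUNT_REGEX.search(text)
    if amount is None:
        return None  # customer number found, but no amount to pay
    return f"{datestamp} Example Energy bill {amount.group(1)}EUR -- bill.pdf"
#+END_SRC

Returning =None= for anything that does not match lets several such
rules be tried one after another.
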
** Usage

#+BEGIN_SRC sh :results output :wrap src
guessfilename --help
#+END_SRC

#+BEGIN_src
Usage:
    guessfilename [<options>] <list of files>

This little Python script tries to rename files according to pre-defined rules.

It does this with several methods: first, the current file name is analyzed and
any ISO date/timestamp and filetags are re-used. Secondly, if the parsing of the
file name did not lead to any new file name, the content of the file is analyzed.

You have to adapt the rules in the Python script to meet your requirements.
The default rule-set follows the filename convention described on
http://karl-voit.at/managing-digital-photographs/


:copyright: (c) by Karl Voit
:license: GPL v3 or any later version
:URL: https://github.com/novoid/guess-filename.py
:bugreports: via github or <tools@Karl-Voit.at>


Options:
  -h, --help     show this help message and exit
  -d, --dryrun   enable dryrun mode: just simulate what would happen, do not
                 modify files
  -v, --verbose  enable verbose mode
  -q, --quiet    enable quiet mode
  --version      display version and exit
#+END_src

** MediathekView
:PROPERTIES:
:CREATED: [2018-05-10 Thu 17:03]
:END:

When downloading TV shows using [[https://github.com/mediathekview/MediathekView][MediathekView]], you should use the following download pattern:

- MediathekView v11:
  : %DT%d %s - %t - %T -ORIGINAL- %N.mp4

- MediathekView v13 (menu labels as shown in the German user interface):
  - Einstellungen (settings) > Aufzeichnen und Abspielen (record and play) > Set bearbeiten (edit set)
  - [Set-Name] > Hilfsprogramme (helper programs):
    - ffmpeg > Zieldateiname (target file name) > =%DT%d %s - %t - %T -ORIGINALhd- %N.mp4=
    - ffmpeg > Schalter (switches) > =-user_agent "Mozilla" -i %f -c copy -bsf:a aac_adtstoasc **=

When applying =guessfilename= to the resulting files, you will get something like this:

#+BEGIN_EXAMPLE
20180509T235000 ORF - ZIB 24 - Auswirkungen nach US-Aus für Atomdeal -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Auswirkungen-na__13976363__o__1735069995__s14297628_8__BCK1HD_23514710P_23540405P_Q4A.mp4 ...
→ 2018-05-09T23.51.47 ORF - ZIB 24 - Auswirkungen nach US-Aus für Atomdeal -- lowquality.mp4

20180509T235000 ORF - ZIB 24 - Hirntoter Bub plötzlich aufgewacht -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Hirntoter-Bub-p__13976363__o__5119815115__s14297631_1__BCK1HD_00045915P_00072303P_Q4A.mp4 ...
→ 2018-05-09T00.04.59 ORF - ZIB 24 - Hirntoter Bub plötzlich aufgewacht -- lowquality.mp4

20180509T235000 ORF - ZIB 24 - Meldungen -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Meldungen__13976363__o__1117657593__s14297632_2__BCK1HD_00072303P_00085816P_Q4A.mp4 ...
→ 2018-05-09T00.07.23 ORF - ZIB 24 - Meldungen -- lowquality.mp4

20180509T235000 ORF - ZIB 24 - Neuerung bei Filmfestspielen in Cannes -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Neuerung-bei-Fi__13976363__o__1941003027__s14297634_4__BCK1HD_00085816P_00111715P_Q4A.mp4 ...
→ 2018-05-09T00.08.58 ORF - ZIB 24 - Neuerung bei Filmfestspielen in Cannes -- lowquality.mp4

20180509T235000 ORF - ZIB 24 - Trumps CIA-Kandidatin umstritten -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Trumps-Kandidat__13976363__o__1488806017__s14297630_0__BCK1HD_00020922P_00045915P_Q4A.mp4 ...
→ 2018-05-09T00.02.09 ORF - ZIB 24 - Trumps CIA-Kandidatin umstritten -- lowquality.mp4

20180509T235000 ORF - ZIB 24 - Wetter -ORIGINAL- 2018-05-09_2350_tl_01_ZIB-24_Wetter__13976363__o__2966973785__s14297635_5__BCK1HD_00111715P_00120000P_Q4A.mp4 ...
→ 2018-05-09T00.11.17 ORF - ZIB 24 - Wetter -- lowquality.mp4
#+END_EXAMPLE

As you can see, the start time of each chunk is extracted so that
the files sort in their correct temporal order.

Please note that this does not work for a show whose chunks cross
midnight: the date is always taken from the start of the show, whereas
the time is taken from the start of the individual chunk. In the
example above, the chunks after midnight keep the date 2018-05-09 and
therefore sort before the first chunk of the show.

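
The derivation of the new name can be sketched like this. The regular
expression is derived only from the example file names above, the
mapping from =Q4A= to =lowquality= is the single case visible in those
examples, and the function name =guess_mediathek_name= is made up. The
=fix_midnight= switch is a possible workaround for the limitation
described above; it is a heuristic of this sketch and not something
the script is documented to do.

#+BEGIN_SRC python
import re
from datetime import datetime, timedelta
from typing import Optional

# "<show start> <title> -ORIGINAL- ..._<chunk start>P_<chunk end>P_<quality>.mp4"
MEDIATHEK_REGEX = re.compile(
    r"^(?P<start>\d{8}T\d{6}) (?P<title>.+?) -ORIGINAL(?:hd)?- "
    r".*_(?P<chunk>\d{6})\d{2}P_\d{8}P_(?P<quality>Q\w+)\.mp4$"
)

# Only this one code appears in the examples; other codes are unknown here.
QUALITY_TAGS = {"Q4A": "lowquality"}


def guess_mediathek_name(filename: str, fix_midnight: bool = False) -> Optional[str]:
    """Combine the date of the show with the start time of the chunk."""
    match = MEDIATHEK_REGEX.match(filename)
    if match is None:
        return None
    show_start = datetime.strptime(match.group("start"), "%Y%m%dT%H%M%S")
    chunk_time = datetime.strptime(match.group("chunk"), "%H%M%S").time()
    chunk_start = datetime.combine(show_start.date(), chunk_time)
    # Heuristic of this sketch: a chunk that seems to start before the
    # show itself is assumed to start after midnight on the next day.
    if fix_midnight and chunk_start < show_start:
        chunk_start += timedelta(days=1)
    name = f"{chunk_start:%Y-%m-%dT%H.%M.%S} {match.group('title')}"
    tag = QUALITY_TAGS.get(match.group("quality"))
    if tag:
        name += f" -- {tag}"
    return name + ".mp4"
#+END_SRC

With =fix_midnight= left at its default, the sketch reproduces the
names shown in the example, including the wrong date for chunks after
midnight.
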
** .info.json Meta-Data Files
:PROPERTIES:
:CREATED: [2019-10-19 Sat 15:21]
:END:

If you download a media file together with its associated separate
=.info.json= file (the base names without file extension need to
match), this tool is able to parse the meta-data to derive a new file
name.

Currently, two meta-data formats are supported: ORF TVthek and
YouTube, both via http://rg3.github.io/youtube-dl/

: youtube-dl --write-info-json <URL>

This results, for example, in files like these:

#+BEGIN_VERSE
Durchbruch bei Brexit-Verhandlungen-14577219.info.json
Durchbruch bei Brexit-Verhandlungen-14577219.mp4
Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.info.json
Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.mp4
The Star7 PDA Prototype-Ahg8OBYixL0.info.json
The Star7 PDA Prototype-Ahg8OBYixL0.mp4
#+END_VERSE

Please note that each =mp4= file has an associated =info.json= file.

Applying =guessfilename= to these files looks like this:

#+BEGIN_EXAMPLE
vk@sherri ~tmp % guessfilename *mp4

Durchbruch bei Brexit-Verhandlungen-14577219.mp4 ...
→ 2019-10-17T16.59.07 ORF - ZIB 17 00 - Durchbruch bei Brexit-Verhandlungen -- highquality.mp4

Isolierte Familie - 58-jähriger Österreicher in U-Haft-14577221.mp4 ...
→ 2019-10-17T17.01.44 ORF - ZIB 17 00 - Isolierte Familie: 58-jähriger Österreicher in U-Haft -- highquality.mp4

The Star7 PDA Prototype-Ahg8OBYixL0.mp4 ...
→ 2007-09-13 youtube - The Star7 PDA Prototype - Ahg8OBYixL0.mp4
#+END_EXAMPLE

The =info.json= files are not removed or renamed.

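
For the YouTube case, the lookup of the matching =.info.json= file and
the construction of the new name can be sketched as follows. The
fields =upload_date=, =title=, =id= and =extractor= are written by
youtube-dl; how the script itself uses them is an assumption on my
side, and the function name =guess_name_from_info_json= is made up.
The ORF TVthek names additionally contain a time and a quality tag,
which this sketch does not cover.

#+BEGIN_SRC python
import json
import os
from typing import Optional


def guess_name_from_info_json(media_path: str) -> Optional[str]:
    """Derive a new base name from the .info.json file next to a media file."""
    base, extension = os.path.splitext(media_path)
    info_path = base + ".info.json"  # base names have to match
    if not os.path.isfile(info_path):
        return None
    with open(info_path, encoding="utf-8") as handle:
        info = json.load(handle)
    upload_date = info.get("upload_date")  # youtube-dl writes "YYYYMMDD"
    title = info.get("title")
    video_id = info.get("id")
    extractor = info.get("extractor")
    if not (upload_date and len(upload_date) == 8 and title and video_id and extractor):
        return None  # meta-data incomplete: better not to guess
    datestamp = f"{upload_date[:4]}-{upload_date[4:6]}-{upload_date[6:]}"
    return f"{datestamp} {extractor} - {title} - {video_id}{extension}"
#+END_SRC

The function only returns the proposed name; renaming is left to the
caller so that a dry run stays possible.
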
* Related tools and workflows

This tool is part of a tool-set which I use to manage my digital files
such as photographs. My workflows are described in [[http://karl-voit.at/managing-digital-photographs/][this blog posting]],
which you might like to read.

In short:

For *tagging*, please refer to [[https://github.com/novoid/filetags][filetags]] and its documentation.

See [[https://github.com/novoid/date2name][date2name]] for easily adding ISO *time-stamps or date-stamps* to
files.

For *easily naming and tagging* files within file browsers that allow
integration of external tools, see [[https://github.com/novoid/appendfilename][appendfilename]] and, once more,
[[https://github.com/novoid/filetags][filetags]].

Moving files to the archive folders is done using [[https://github.com/novoid/move2archive][move2archive]].

Having tagged photographs gives you many advantages. For example, I
automatically [[https://github.com/novoid/set_desktop_background_according_to_season][choose my *desktop background image* according to the current season]].

Files containing an ISO time/date-stamp get indexed by the
filename-module of [[https://github.com/novoid/Memacs][Memacs]].

-------------

[[http://www.jonasjberg.com/][Jonas Sjöberg]] took my idea and developed the much more advanced (and
thus a bit more complicated) [[https://github.com/jonasjberg/autonameow][autonameow]]. It uses rule-based renaming,
analyzes content of plain text, epub, pdf and rtf files, extracts
meta-data from many different file formats via [[https://www.sno.phy.queensu.ca/%257Ephil/exiftool/][exiftool]] and so forth.

* Contribute!

I am looking for your ideas!

If you want to contribute to this cool project, please fork and
contribute!


* Local Variables :noexport:
# Local Variables:
# mode: auto-fill
# mode: flyspell
# eval: (ispell-change-dictionary "en_US")
# End: