Commit graph

180 commits

Author SHA1 Message Date
Karl Voit
cb1f43bfd7 moving from hard coded RegEx index to named groups: finished but with commented-out old info 2020-03-01 14:15:09 +01:00
Karl Voit
ab16535991 moving from hard coded RegEx index to named groups (ongoing) 2020-02-29 23:40:17 +01:00
Karl Voit
9b88d4852b moving from hard coded RegEx index to named groups (ongoing) 2020-02-29 22:56:50 +01:00
Karl Voit
6d043c8d2e moving from hard coded RegEx index to named groups (ongoing) 2020-02-29 19:14:27 +01:00
Karl Voit
c9ffea1e64 moving from hard coded RegEx index to named groups (ongoing) 2020-02-29 17:15:19 +01:00
Karl Voit
2e531be5c2 test_get_datetime_string_from_named_groups() and test_get_datetime_description_extension_filename() 2020-02-29 17:14:01 +01:00
Karl Voit
209534a397 added patterns for smart recorder (Android) 2020-02-29 11:48:57 +01:00
Karl Voit
fc90b97c4e README: added link to fs-curator 2020-02-20 09:53:11 +01:00
Karl Voit
1299de9ae9 NEWSPAPER1 pattern: Die Presse 2019-12-04 10:49:48 +01:00
Karl Voit
5f6328e2c1 added signal attachment pattern 2019-11-23 16:12:05 +01:00
Karl Voit
cf4e7a171e "Could not read PDF content": warning->info 2019-11-23 16:11:42 +01:00
Karl Voit
af90a2a2d2 README: fixed GitHub bug with VERSE environments 2019-10-19 16:02:44 +02:00
Karl Voit
2525a0dbf2 bugfixes for info.json + its support for ORF TVthek 2019-10-19 15:27:36 +02:00
Karl Voit
21a505eee3 added derive_new_filename_from_json_metadata handling for YouTube-dl 2019-10-19 14:10:45 +02:00
Karl Voit
0dbdc168ca re-ordered function definitions 2019-10-19 12:53:09 +02:00
Karl Voit
207728809d fixed issue with manually entered URL parsing 2019-10-19 12:15:24 +02:00
Karl Voit
49b1b6aba1 git ignore virtualenv 2019-10-19 11:28:41 +02:00
Karl Voit
9a0499b90f added PDF file patterns for Boox Max 2 exports 2019-10-10 13:44:30 +02:00
Karl Voit
9c7ff7f86a removed pytest call shell script and added misc things to gitignore 2019-10-10 13:09:09 +02:00
Karl Voit
a5b9d45865 Appended ORF MediathekView pattern variant with _sd_ 2019-09-30 10:10:48 +02:00
Karl Voit
c566b1d9e6 tests: added test_film_url_regex 2019-09-30 10:10:29 +02:00
Karl Voit
aaff6f253f adapted changed FILM_URL_REGEX; improved debugging and help texts 2019-09-21 10:35:41 +02:00
Karl Voit
5fc36d3e69 updated MEDIATHEKVIEW_LONG_WITH_DETAILED_TIMESTAMPS_REGEX which now may also contain characters (not just digits) in some parts I don't understand yet. 2019-09-03 14:23:51 +02:00
Karl Voit
e86f33a98f disabled size plausibility unit tests because feature was disabled 2019-09-03 14:23:05 +02:00
Karl Voit
1029d17146 added minimum duration check for plausibility check 2019-08-26 10:48:45 +02:00
Karl Voit
530d945ce1 added workaround for salary PDF files: PyPDF2 doesn't support new PDF encryption id:2019-05-24-guessfilename-salary 2019-05-24 17:32:53 +02:00
Karl Voit
1c65c523eb added handling for oekostrom bills 2019-05-05 17:16:47 +02:00
Karl Voit
40a010f6a6 added support for Android Bokeh photographs to IMG_INDEXGROUPS 2019-03-10 12:18:26 +01:00
Karl Voit
7c411ba4e6 adding pattern for MediathekView v13 2018-11-14 10:56:36 +01:00
Karl Voit
f37855945d fix for previous pattern 2018-11-01 22:20:18 +01:00
Karl Voit
fabfc6d29a added ORF Mediathek pattern when original filename is missing 2018-11-01 11:27:25 +01:00
Karl Voit
9650e813c3 updated download URL format 2018-07-06 08:32:14 +02:00
Karl Voit
0ee2ebf32c also accept http URLs (instead of https only) 2018-06-16 11:42:33 +02:00
Karl Voit
09bcc1acb5 added MEDIATHEKVIEW_RAW_REGEX_STRING for raw ORF MediathekView downloads as a fall-back when wget/curl download has to replace malfunctioning MediathekView 2018-06-15 21:12:00 +02:00
Karl Voit
085cbe156e fixed issue with ORF MediathekView chunk that spans over midnight 2018-06-10 22:43:47 +02:00
Karl Voit
890e70785f added plausibility size checks for ORF 2018-06-09 18:08:36 +02:00
Karl Voit
f079077dc7 added MEDIATHEKVIEW_LONG_WITHOUT_DETAILED_TIMESTAMPS_REGEX for ORF chunks without detailed time-stamps but with quality indicators 2018-06-09 16:02:46 +02:00
Karl Voit
9a9fac31d5 added MEDIATHEKVIEW_SHORT_REGEX and manual/interactive fall-back handling for truncated ORF file names 2018-06-09 14:56:20 +02:00
Karl Voit
83f62df07d Salary PDF file content extraction for salary net amount and month 2018-05-31 12:12:02 +02:00
Karl Voit
51c06d475a fix for the ORF/Mediathek pattern 2018-05-31 12:11:46 +02:00
Karl Voit
d3a42aa6ff fixed MEDIATHEKVIEW_LONG_REGEX so that chunk timestamps are optional 2018-05-21 13:57:59 +02:00
Karl Voit
c111ae0391 fixed issue with wrong MediathekView match element 2018-05-12 09:46:25 +02:00
Karl Voit
a50720e74f generalized MediathekView changes for arbitrary stations 2018-05-10 17:12:16 +02:00
Karl Voit
a15533ebc1 added new support for MediathekView/ORF 2018-05-10 17:07:37 +02:00
Karl Voit
f738a5b304 added EASY_SCREENSHOT regex 2018-05-05 16:36:41 +02:00
Karl Voit
ddc4e08584 added plausibility check for Tatort downloads 2018-04-22 19:16:04 +02:00
Karl Voit
51008c8ad5 fixed issue with Android screenshot and added JPEG to it 2018-04-01 14:26:12 +02:00
Karl Voit
428ed2683e fixed issue with Android screenshots 2018-04-01 14:19:00 +02:00
Karl Voit
ac6ddf95f8 added pattern for saved Signal jpeg attachments 2018-04-01 14:17:23 +02:00
Karl Voit
9f3acb3cf9 added MODET_REGEX 2018-03-27 20:15:41 +02:00