Python Help

In an effort to learn Python and make something useful for myself, I'm trying to create a project that semi-automates checking for adverts in TV programs I record on Tvheadend (yes, I know Tvh has a comskip plugin, but where's the fun in that!).


I've got the basic program working so it checks the recording folder, looks to see if the recording has already been processed (gets added to a text file when it has been), then runs comskip if it's a new program.


I'm trying to create two functions that I'm having problems with:

1. Checking to see if the recordings that have already been processed and added to the text file still exist on disk. If not, I want to remove the line from the text file.
2. Checking to see if a recording is currently active so I can ignore it until it's finished.


The relevant code bits I've got so far (which don't work):

Python:
import os
import fnmatch
import subprocess
from time import sleep

RECORDINGS = "PATH/TO/RECORDINGS"
VIDEOS = "OUTPUT FOLDER"
COMSKIP = "comskip"
HD_PROGRAMS = " HD "
ARG1 = "--ts"
ARG2 = "--quiet"
ARG3 = "--vdpau"
ARG4 = "--ini=comskip.ini"
ARG5 = "--output=OUTPUT FOLDER"
PROCESSED_FILES = "processed.txt"


def check_processed():
    """Checks to see if we have already processed the recording and added it to processed.txt"""
    try:
        with open(PROCESSED_FILES, "r") as processed_files:
            file_check = processed_files.readlines()
    except FileNotFoundError:
        with open(PROCESSED_FILES, "w+") as processed_files:
            file_check = processed_files.readlines()
    else:
        for entry in file_check:
            if str(file) in entry:
                return True  # The string is found
        return False  # The string does not exist in the file


def processed_exist_check():
    file = [f for f in os.listdir(RECORDINGS) if fnmatch.fnmatch(f, '*.ts')]
    for f in file:
        with open(PROCESSED_FILES, "r") as pf:
            lines = pf.readlines()
            for line in lines:
                if line not in f:
                    print(f"Can't find {line}")
                else:
                    print(f"Found {line}")

# This function keeps failing saying it can't find the files.


def delete_extra_files():
    """Deletes all the extra files comskip creates when scanning for adverts"""
    filename = os.path.splitext(file)[0]
    os.remove(f"{VIDEOS}{filename}.txt")
    os.remove(f"{VIDEOS}{filename}.edl")
    os.remove(f"{VIDEOS}{filename}.log")


def file_size_check():
    for video in os.listdir(RECORDINGS):
        if fnmatch.fnmatch(video, '*.ts'):
            size_of_file = [
                (video, os.stat(os.path.join(RECORDINGS, video)).st_size)
            ]
            # This just converts the file into MB, and is unnecessary for this project, but nice to know
            # print(size_of_file)
            # for f, s in size_of_file:
            #     print("{} : {}MB".format(f, round(s / (1024 * 1024), 3)))
            return size_of_file[0][1]  # This doesn't work properly, with or without the [0][1]
        else:
            pass


def is_recording():
    # Doesn't work
    print("Checking file size")
    first_check = file_size_check()
    print(first_check)
    sleep(5)
    print("Rechecking file size to ensure it's not still recording")
    second_check = file_size_check()
    print(second_check)
    if second_check > first_check:
        return True
    else:
        return False

# This function only ever seems to return the file size for the same file over and over so never skips files that are actively recording


# Get the video filenames and see if we've already processed them

processed_exist_check()

for file in os.listdir(RECORDINGS):
    if fnmatch.fnmatch(file, '*.ts'):
        if check_processed():
            pass
        elif is_recording():
            pass
        else:
            print(f"Processing {file}")
            result = subprocess.run([COMSKIP, ARG1, ARG2, ARG3, ARG4, ARG5, f"{RECORDINGS}{file}"])
            with open(PROCESSED_FILES, "a") as completed:
                completed.write(f"{file}\n")
                delete_extra_files()



Any ideas? I've spent the last day searching online and trying different ways, but it's clear I'm doing something wrong, as neither of these functions works properly.
 
Ok, I see what you're doing.
It would probably be better to check each individual file's size as you come to process it. Otherwise, as you have it at the moment, if any of the files are still being recorded that will stop any other files from being processed - I guess you don't want that but maybe there's a reason to do it.

So I would do something like:

Code:
def file_size_check(video):
    return os.stat(os.path.join(RECORDINGS, video)).st_size  # os.listdir() gives plain filename strings, so video can be joined to the folder path as-is

def is_recording(video):
    first_check = file_size_check(video)
    sleep(5)
    second_check = file_size_check(video)
    if second_check > first_check:
        return True
    else:
        return False

Then in the last section, where you call is_recording, pass the file to it:

for file in os.listdir(RECORDINGS):
    if fnmatch.fnmatch(file, '*.ts'):
        if check_processed():
            continue  # changed from pass: skip to the next file if this one has already been processed. pass does nothing, so any code after the if/elif chain would still run
        elif is_recording(file):
            continue  # if the file is still being recorded, skip to the next file

***I don't know Python so the syntax might need a bit of editing to get it to work
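
Pulling the suggestion above into one runnable piece, a minimal sketch might look like the following. The `directory` and `wait` parameters and the `finished_recordings` helper are assumptions added here so the check can be tried on any folder; the code in this thread uses the `RECORDINGS` constant and a fixed five-second sleep instead.

```python
import os
from time import sleep


def file_size_check(directory, video):
    """Return the size in bytes of one file inside directory."""
    return os.stat(os.path.join(directory, video)).st_size


def is_recording(directory, video, wait=5):
    """Return True if the file grew during the wait, so it is probably still being written."""
    first_check = file_size_check(directory, video)
    sleep(wait)
    second_check = file_size_check(directory, video)
    return second_check > first_check


def finished_recordings(directory, wait=5):
    """Hypothetical helper: list the .ts files whose size did not change during the wait."""
    finished = []
    for video in sorted(os.listdir(directory)):
        if not video.endswith(".ts"):
            continue  # not a recording, move on to the next file
        if is_recording(directory, video, wait):
            continue  # still growing, skip it this time round
        finished.append(video)
    return finished
```

Note that each file is checked in turn, so the total wait grows with the number of recordings, and a recording that happens to pause for longer than `wait` would be treated as finished.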
 
Thanks for that. The file size check is now working with:

Python:
def file_size_check(video):
    return os.stat(os.path.join(RECORDINGS, video)).st_size

I still can't get the processed_exist_check() to identify files that are no longer on disk and remove them from the text file as it always says it can't find the file.
 
I still can't get the processed_exist_check() to identify files that are no longer on disk and remove them from the text file as it always says it can't find the file.

processed_exist_check gets a list of all the files in the directory and then checks them against the contents of the text file. So if a file doesn't exist on disk, it won't be in the list and won't get checked against the text file.
You'll need 2 loops (probably best in 2 separate methods): one to loop through all the files on disk and compare them against the text file, and one to loop through all the file names in the text file and check if they still exist on disk.
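
As a rough illustration of those two checks, here is a minimal sketch. The function names and the `recordings_dir` / `processed_path` parameters are made up for this example and stand in for the `RECORDINGS` and `PROCESSED_FILES` constants; it also assumes the text file holds one filename per line.

```python
import fnmatch
import os


def read_processed(processed_path):
    """Return the filenames listed in the text file, without newlines."""
    try:
        with open(processed_path, "r") as pf:
            return [line.strip() for line in pf if line.strip()]
    except FileNotFoundError:
        return []  # nothing has been processed yet


def missing_from_disk(recordings_dir, processed_path):
    """Names in the text file whose recording no longer exists on disk."""
    on_disk = set(fnmatch.filter(os.listdir(recordings_dir), "*.ts"))
    return [name for name in read_processed(processed_path) if name not in on_disk]


def not_yet_processed(recordings_dir, processed_path):
    """Recordings on disk that are not listed in the text file yet."""
    processed = set(read_processed(processed_path))
    recordings = sorted(fnmatch.filter(os.listdir(recordings_dir), "*.ts"))
    return [name for name in recordings if name not in processed]
```

Stripping the newline before comparing matters here: a line read from the file still ends in "\n", so it never equals a bare filename from os.listdir().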
 
I've been playing around and have got a little further:

Python:
def processed_exist_check():
    file = [f for f in os.listdir(RECORDINGS) if fnmatch.fnmatch(f, '*.ts')]
    on_disk = ""
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        on_disk = file
        print(on_disk)
        for line in lines:
            no_newline = line.strip("\n")
            if no_newline in on_disk:
                print(f"FOUND {no_newline}")
            else:
                print(f"{no_newline} not found")

This seems to accurately report the correct results, but when I try and remove the line from the text file with:

Python:
def processed_exist_check():
    file = [f for f in os.listdir(RECORDINGS) if fnmatch.fnmatch(f, '*.ts')]
    on_disk = ""
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        on_disk = file
        print(on_disk)
        for line in lines:
            no_newline = line.strip("\n")
            if no_newline in on_disk:
                continue
            else:
                with open(PROCESSED_FILES, "w") as pf:
                    pf.write(line)

It ends up removing everything except the last entry. I think I need another for loop in the else: section but my attempts there don't seem to be working either as I end up with just the entire contents rewritten multiple times.
 
More minor edits:

Python:
def processed_exist_check():
    file = [f for f in os.listdir(RECORDINGS) if fnmatch.fnmatch(f, '*.ts')]
    on_disk = ""
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        on_disk = file
        # print(on_disk)
        for line in lines:
            if line.strip("\n") in on_disk:
                continue
            else:
                with open(PROCESSED_FILES, "w") as pf:
                    if line != on_disk:
                        print(f"deleting {line}")
                        pf.write(line)

The printout suggests it's deleting the correct lines, but again all it does is delete everything except the final line in the text file.
 
I think this section is the problem:

Code:
with open(PROCESSED_FILES, "w") as pf:
                    if line != on_disk:
                        print(f"deleting {line}")
                        pf.write(line)

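Opening the file with "w" empties it first, so that's going to overwrite the entire file content with the single line, I think? And because the open sits inside the loop, it happens again for each line, so only the last line written survives.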
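
One way around that, sketched under a few assumptions: read every line first, decide in memory which ones to keep, then open the file for writing a single time. The function name and its two parameters are invented for this example (they stand in for `PROCESSED_FILES` and `RECORDINGS`), and os.path.exists is used here as the "still on disk" test.

```python
import os


def remove_missing_entries(processed_path, recordings_dir):
    """Rewrite the text file, keeping only entries whose recording still exists.

    Returns the list of entries that were dropped.
    """
    with open(processed_path, "r") as pf:
        lines = pf.readlines()

    kept, dropped = [], []
    for line in lines:
        name = line.strip("\n")
        if os.path.exists(os.path.join(recordings_dir, name)):
            kept.append(line)  # keep the original line, newline included
        else:
            dropped.append(name)

    # Open for writing once, after the read has finished, so the file is
    # emptied a single time and every kept line goes back in.
    with open(processed_path, "w") as pf:
        pf.writelines(kept)
    return dropped
```

Because the write happens outside the loop, the number of missing recordings no longer affects what ends up in the file.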
 
Thanks again. I'm really struggling with this bit. I've tried making two methods to compare, but I'm clearly not getting the syntax right when trying to call it later to get it to actually work:

Python:
def on_disk():
    for vid in os.listdir(RECORDINGS):
        if fnmatch.fnmatch(vid, "*.ts"):
            return vid


def in_file():
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        for line in lines:
            return line

I also started looking at trying to enumerate the lines, and can correctly print out a list of the lines where the file still exists on disk, but once again can't get it to actually delete the lines that no longer exist:

Python:
def processed_exist_check():
    vid = [f for f in os.listdir(RECORDINGS) if fnmatch.fnmatch(f, '*.ts')]
    on_disk = vid
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        for number, line in enumerate(lines):
            proc_line = line.strip("\n")
            if proc_line in on_disk:
                print(number, proc_line)

I'm sure it shouldn't be this difficult but it's currently baffling me, though that's not necessarily saying much :)
 
A 'return' statement will only execute once. After a method has returned something it has done its job and won't continue running.
Looking at the on_disk() method, you have a return statement inside the loop. That means that the first time running through the loop with a filename which matches the "*.ts" check, it'll return that single file then stop executing regardless of how many more filenames are waiting to run through the loop.

A quick change to get your code working would be to create a variable at the start to store all the filenames, add each match to it inside the loop, then return the whole thing at the end:
Code:
def on_disk():
    #create a new list
    filenames = []
   
    for vid in os.listdir(RECORDINGS):
        if fnmatch.fnmatch(vid, "*.ts"):
            #if it matches, add this to the list
            filenames.append(vid)
           
    #return the whole list
    return filenames

However, I think it would be better to check the individual files when required, like this:

Code:
def on_disk(filename):
    for vid in os.listdir(RECORDINGS):
        if vid == filename:
            # if there's a match, the file exists, so return True
            return True
    # the loop finished without hitting the return above, so the file must not exist
    return False


def in_file(filename):
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        for line in lines:
            # strip the trailing newline before comparing, otherwise nothing matches
            if line.strip("\n") == filename:
                # filename exists in text document
                return True

        # loop completed without finding a match, filename must not exist in document
        return False


def processed_exist_check():
    #first 2 lines not needed - commented them out
    #vid = [f for f in os.listdir(RECORDINGS) if fnmatch.fnmatch(f, '*.ts')]
    #on_disk = vid
   
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        for number, line in enumerate(lines):
            proc_line = line.strip("\n")
            #pass the filename into the on_disk method to check if file exists
            if on_disk(proc_line):
                print(number, proc_line)
 
I still couldn't get the code to work yesterday when I tried but, with some of your modifications, along with a spark of inspiration in the early hours of this morning, I think I may now have fixed it:

Python:
def processed_exist_check():
    """Check to see if the processed files still exist on the disk; delete them from the list if not"""
    with open(PROCESSED_FILES, "r") as pf:
        lines = pf.readlines()
        with open(PROCESSED_FILES, "w") as removing:
            for number, line in enumerate(lines):
                proc_line = line.strip("\n")
                if on_disk(proc_line):
                    temp_num = number
                    if number in [temp_num]:
                        removing.write(line)

I've tested this a few times, and it does now appear to work, rewriting the text file with only the files that have been processed and still exist on disk.

Thanks for all the help.

Now to add a few more bells & whistles. :)
 