Python Woes...

RedvGreen · 20 Mar 2019 at 16:39

I am not a Python user or expert at all, and have been provided this script to scrape some video content to allow me to work offline, so please forgive my lack of knowledge and understanding here...

When I run the code below, it flags 'Invalid Syntax' on each of the apostrophes. If I remove one, it flags the next. If I add one, it still shows as a problem.

For info, I am running the script from cmdline:

python.exe download.py uberRequest.json

Any ideas please?

Code:

#!/usr/bin/env python
import json
import sys
import requests
import os
import argparse
from collections import OrderedDict

# Help text and arguments
parser = argparse.ArgumentParser(description='Download videos using this script.')
parser.add_argument('-q', choices=['SD', 'HD'], default='HD', help='Download quality selector; will default to HD (720p).')
parser.add_argument('uberRequest', metavar='uberRequest.json', help='uberRequest.json file, see instructions in git repo.')
args = parser.parse_args()

# Convert input to number
video_quality = 1
if args.q == 'SD':
    video_quality = 0

# Read uber response from file
with open(args.uberRequest, 'r') as json_data:
    uber = json.load(json_data)

print 'Course ': + uber['course']['name']
if not os.path.exists(uber['course']['name']):
    os.makedirs(uber['course']['name'])

is_looping = True
section_counter = 1

# Iterate sections
for section in uber['course']['childNodes']:
    
    print 'Section: ' + section['name']
    section_path = uber['course']['name'] + '/'  + str(section_counter) + '. ' + section['name']

    if not os.path.exists(section_path):
        os.makedirs(section_path)
    
    lesson_counter = 1

    # Iterate lessons
    for lesson in section['childNodes'][0]['learningObjects']:
    
        print 'Downloading: ' + lesson['metadata']['name']
        lesson_path = section_path + '/' + str(lesson_counter) + '. ' + lesson['metadata']['name']

        if not os.path.exists(lesson_path):
            os.makedirs(lesson_path)

        # Grab base URL
        baseUrl = lesson['metadata']['baseUrl']
                
        # Prepare request to script.json to grab video URLs and set cookies
        cookies = OrderedDict()
        for cookie in reversed(lesson['metadata']['cookies']):
            cookies[cookie['key']] = cookie['value']

        # Get video list
        response = requests.get(baseUrl+'/script.json', cookies=cookies)
        
        video_counter = 1

        if response.status_code == 200:
            # Iterate videos
            for video in json.loads(response.content)['slides']:
                # Download with title
                download_response = requests.get(baseUrl+video['video'][video_quality]['URI'], cookies=cookies)
                open(lesson_path + '/' + str(video_counter) + '. ' + video['title'].replace("/","_") + '.mp4', 'wb').write(download_response.content)

                video_counter = video_counter + 1
        else:
            print 'Request for video list failed, your cookies have probably expired.'
            is_looping = False
            break

        lesson_counter = lesson_counter + 1

    if not is_looping:
        # Break out of outer loop in the event of expired cookies
        break

    section_counter = section_counter + 1

Rroff · 20 Mar 2019 at 16:43

"for video in json.loads(response.content)['slides']:"

Are the brackets in the right place? It has been a while since I touched Python, so I'm not 100% sure on the syntax.

EDIT: Probably just looks weird to me as I don't work with dynamically typed languages much.

shine · 20 Mar 2019 at 22:22

This line is the problem:

Code:

print 'Course ': + uber['course']['name']

it should be:

Code:

print 'Course :' + uber['course']['name']

RedvGreen · 21 Mar 2019 at 09:02

Updated to the latest version of Python and ran it again:

Code:

C:\Python>python.exe download.py uberRequest.json
  File "download.py", line 24
    print 'Course: ' + uber['course']['name']
                   ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Course: ' + uber['course']['name'])?

C:\Python>

shine · 21 Mar 2019 at 09:13

Did you try the above suggestion in the SyntaxError?

Just looked at one of my json examples and I do wrap everything in brackets.

RedvGreen · 21 Mar 2019 at 09:29

shine said:
Did you try the above suggestion in the SyntaxError?

Just looked at one of my json examples and I do wrap everything in brackets.

After making the bracket changes and your requested change....

Code:

C:\Python>python.exe download.py uberRequest.json
  File "download.py", line 24
    print 'Course: ' + uber['course']['name']
                   ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Course: ' + uber['course']['name'])?

C:\Python>

Hades · 21 Mar 2019 at 09:37

One of the changes from the older Python 2.x to the newer 3.x is that the print statement changed.

Old:
print 'Hello world!'

New:
print('Hello world!')

Note the addition of the parentheses.

https://docs.python.org/3/whatsnew/3.0.html

EDIT: I would also highly recommend trying PyCharm as your code editor (or another similar IDE). There is a free community edition. It highlights syntax issues as you type your code rather than only at run time.

Also, as pointed out above, one of your statements has a colon outside the closing quote character. Apart from changing the print statements to use parentheses on the later version of Python, you need to fix that too.

Change this:
print 'Course ': + uber['course']['name']

To this:
print('Course: ' + uber['course']['name'])
And also change all of the other print statements to use parentheses if using Python 3.x
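
As a small illustration of the change described above, the sketch below uses a made-up course name (`course_name` is a placeholder, not a value from the real JSON file). The `from __future__` import is optional: it does nothing on Python 3 and lets the same code run on Python 2.6 or later.

```python
from __future__ import print_function  # no effect on Python 3; enables print() on Python 2.6+

# Placeholder standing in for uber['course']['name'] (assumed value).
course_name = 'Example Course'

# String concatenation, as used in the thread; both operands must be strings.
message = 'Course: ' + course_name
print(message)

# Passing several arguments lets print() add the separating space itself.
print('Course:', course_name)
```

Both calls should print the same line. The second form also accepts values that are not strings, such as the counters used later in the script.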

I've changed it for you. Try this:

Code:

#!/usr/bin/env python
import json
import sys
import requests
import os
import argparse
from collections import OrderedDict

# Help text and arguments
parser = argparse.ArgumentParser(description='Download videos using this script.')
parser.add_argument('-q', choices=['SD', 'HD'], default='HD',
                    help='Download quality selector; will default to HD (720p).')
parser.add_argument('uberRequest', metavar='uberRequest.json',
                    help='uberRequest.json file, see instructions in git repo.')
args = parser.parse_args()

# Convert input to number
video_quality = 1
if args.q == 'SD':
    video_quality = 0

# Read uber response from file
with open(args.uberRequest, 'r') as json_data:
    uber = json.load(json_data)

print('Course :' + uber['course']['name'])
if not os.path.exists(uber['course']['name']):
    os.makedirs(uber['course']['name'])

is_looping = True
section_counter = 1

# Iterate sections
for section in uber['course']['childNodes']:

    print('Section: ' + section['name'])
    section_path = uber['course']['name'] + '/' + str(section_counter) + '. ' + section['name']

    if not os.path.exists(section_path):
        os.makedirs(section_path)

    lesson_counter = 1

    # Iterate lessons
    for lesson in section['childNodes'][0]['learningObjects']:

        print('Downloading: ' + lesson['metadata']['name'])
        lesson_path = section_path + '/' + str(lesson_counter) + '. ' + lesson['metadata']['name']

        if not os.path.exists(lesson_path):
            os.makedirs(lesson_path)

        # Grab base URL
        baseUrl = lesson['metadata']['baseUrl']

        # Prepare request to script.json to grab video URLs and set cookies
        cookies = OrderedDict()
        for cookie in reversed(lesson['metadata']['cookies']):
            cookies[cookie['key']] = cookie['value']

        # Get video list
        response = requests.get(baseUrl + '/script.json', cookies=cookies)

        video_counter = 1

        if response.status_code == 200:
            # Iterate videos
            for video in json.loads(response.content)['slides']:
                # Download with title
                download_response = requests.get(baseUrl + video['video'][video_quality]['URI'], cookies=cookies)
                open(lesson_path + '/' + str(video_counter) + '. ' + video['title'].replace("/", "_") + '.mp4',
                     'wb').write(download_response.content)

                video_counter = video_counter + 1
        else:
            print('Request for video list failed, your cookies have probably expired.')
            is_looping = False
            break

        lesson_counter = lesson_counter + 1

    if not is_looping:
        # Break out of outer loop in the event of expired cookies
        break

    section_counter = section_counter + 1

RedvGreen · 21 Mar 2019 at 10:27

I don't know what you did that I didn't do, but it worked, so many thanks!! (SOMEWHAT...)...

Had to install the requests module...

Code:

python -m pip install requests

Which led onto the following naming issue...

Code:

C:\Python>python.exe download.py uberRequest.json
Course :MGT: Training Program (D01_01_HR_5400)
Traceback (most recent call last):
  File "download.py", line 28, in <module>
    os.makedirs(uber['course']['name'])
  File "C:\Python\lib\os.py", line 221, in makedirs
    mkdir(name, mode)
NotADirectoryError: [WinError 267] The directory name is invalid: 'MGT: Training Program (D01_01_HR_5400)'

C:\Python>

Is there a way to remove invalid characters from the name (namely the colon)? I would have thought the os.path module would have handled this?

touch · 21 Mar 2019 at 10:37

try:

mkdir(name.replace(":", ""),mode)
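
The traceback shows that `mkdir(name, mode)` sits inside the standard library's os.py, so the replacement above would have to be made there. A gentler option is probably to clean the name in download.py before it reaches `os.makedirs`. A minimal sketch, assuming a hypothetical helper called `sanitize_name` (not part of the original script) and the set of characters that Windows rejects in file and directory names:

```python
import re

# Characters Windows rejects in file and directory names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_name(name, replacement='_'):
    """Return name with characters that Windows rejects replaced (hypothetical helper)."""
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Names ending in a space or a dot also cause trouble on Windows, so trim them.
    return cleaned.rstrip(' .')


print(sanitize_name('MGT: Training Program (D01_01_HR_5400)'))
# prints: MGT_ Training Program (D01_01_HR_5400)
```

In the script, the cleaned value would be used wherever `uber['course']['name']`, `section['name']`, `lesson['metadata']['name']` and `video['title']` are turned into paths.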

RedvGreen · 21 Mar 2019 at 10:42

touch said:
try:

mkdir(name.replace(":", ""),mode)

Brilliant - I swapped that out in the os.py module and we are getting somewhere closer!!

Code:

C:\Python>python.exe download.py uberRequest.json
Course :MGT: Training Program (D01_01_HR_5400)
Section: Getting Started 
Downloading: Welcome to Your Course!
Traceback (most recent call last):
  File "download.py", line 72, in <module>
    'wb').write(download_response.content)
OSError: [Errno 22] Invalid argument: 'MGT: Training Program (D01_01_HR_5400)/1. Getting Started /1. Welcome to Your Course!/1. Welcome to Your Course!.mp4'

C:\Python>

Hades · 21 Mar 2019 at 10:55

I think it's a similar problem. Looking at line 72, it's actually a continuation of line 71. This is the code fragment:

Code:

open(lesson_path + '/' + str(video_counter) + '. ' + video['title'].replace("/", "_") + '.mp4',
                     'wb').write(download_response.content)

I think it's trying to open a file in the directory you created earlier in the script (the one you just corrected with the colon). So you'll probably need to replace the colon here too. Try this and see what happens:

Code:

open(lesson_path + '/' + str(video_counter) + '. ' + video['title'].replace("/", "_").replace(':', '') + '.mp4',
                     'wb').write(download_response.content)
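
The path in the earlier error message starts with 'MGT: Training Program', so the colon may be coming from the course name inside `lesson_path` rather than from the video title. One way to cover every component is to clean each piece before joining them. A rough sketch, where `clean` and `build_video_path` are hypothetical helpers and the names are copied from the error output above:

```python
import os


def clean(part):
    # Hypothetical helper mirroring the replacements used in the thread:
    # '/' becomes '_', ':' is dropped, surrounding whitespace is trimmed.
    return part.replace('/', '_').replace(':', '').strip()


def build_video_path(course, section, lesson, title, counters=(1, 1, 1)):
    """Join cleaned path components into the target .mp4 path."""
    section_no, lesson_no, video_no = counters
    return os.path.join(
        clean(course),
        '%d. %s' % (section_no, clean(section)),
        '%d. %s' % (lesson_no, clean(lesson)),
        '%d. %s.mp4' % (video_no, clean(title)),
    )


path = build_video_path(
    'MGT: Training Program (D01_01_HR_5400)',
    'Getting Started ',
    'Welcome to Your Course!',
    'Welcome to Your Course!',
)
print(path)
```

Using `os.path.join` also lets Python pick the path separator for the operating system it is running on.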

RedvGreen · 21 Mar 2019 at 11:07

We're getting closer to victory

Code:

Traceback (most recent call last):
  File "download.py", line 72, in <module>
    'wb').write(download_response.content)
OSError: [Errno 22] Invalid argument: 'MGT: Training Program (D01_01_HR_5400)/1. Getting Started /1. Welcome to Your Course!/1. Welcome to Your Course!.mp4'

Hades · 21 Mar 2019 at 11:20

That's the same error. Either you haven't applied the change I suggested, or the change is not working or is changing the wrong thing. Unfortunately I'm not able to run the code, for two reasons:

1) I don't have your JSON file.

2) I am on Linux instead of Windows. So the directory and file name validation won't be the same on my OS (note that the error is an OS error; it's Windows which is telling Python that something is wrong).

That part of the code seems to be trying to write a new file to the newly created directory, and Windows is telling it that it failed. That could be due to the colon or other invalid characters. Or it could be due to the directory not successfully being created further up the code. Did the directory actually get created?
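
One way to narrow this down without running the full download is to check each component of the failing path for characters that Windows does not accept in names. A rough sketch, assuming the path is copied from the error message and that `find_invalid_parts` is a hypothetical helper, not part of the script:

```python
# Characters Windows does not allow inside a single file or directory name.
INVALID_CHARS = set('<>:"|?*')


def find_invalid_parts(path):
    """Return (component, offending characters) for each bad part of a '/'-separated path."""
    problems = []
    for part in path.split('/'):
        bad = sorted(set(part) & INVALID_CHARS)
        if bad:
            problems.append((part, bad))
    return problems


# Path copied from the OSError message in the thread.
failing_path = ('MGT: Training Program (D01_01_HR_5400)/1. Getting Started /'
                '1. Welcome to Your Course!/1. Welcome to Your Course!.mp4')
print(find_invalid_parts(failing_path))
```

For the path in the traceback this reports only the first component, the course directory, which points at the colon in the course name rather than at the video title.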

I'm going to struggle to help much more without being on Windows. But if you want to email me your uberRequest.json file I can try it in Linux. I'm still relatively new to Python and it might take me a couple of days to find time. But it shouldn't be that hard. However, you probably shouldn't be emailing random strangers files from your company, so that may not be an option.

But can you also post the current code here again as there have been a few changes to it now.

RedvGreen · 21 Mar 2019 at 11:26

Thanks Hades - I am not auth'd to supply the JSON file unfortunately. You can rest assured that it is a massive file though.

Inspecting the JSON file, it does declare the name variable within it - I could attempt to edit this to circumvent the character issue perhaps...

Current code...

Code:

#!/usr/bin/env python
import json
import sys
import requests
import os
import argparse
from collections import OrderedDict

# Help text and arguments
parser = argparse.ArgumentParser(description='Download videos using this script.')
parser.add_argument('-q', choices=['SD', 'HD'], default='HD',
                    help='Download quality selector; will default to HD (720p).')
parser.add_argument('uberRequest', metavar='uberRequest.json',
                    help='uberRequest.json file, see instructions in git repo.')
args = parser.parse_args()

# Convert input to number
video_quality = 1
if args.q == 'SD':
    video_quality = 0

# Read uber response from file
with open(args.uberRequest, 'r') as json_data:
    uber = json.load(json_data)

print('Course :' + uber['course']['name'])
if not os.path.exists(uber['course']['name']):
    os.makedirs(uber['course']['name'])

is_looping = True
section_counter = 1

# Iterate sections
for section in uber['course']['childNodes']:

    print('Section: ' + section['name'])
    section_path = uber['course']['name'] + '/' + str(section_counter) + '. ' + section['name']

    if not os.path.exists(section_path):
        os.makedirs(section_path)

    lesson_counter = 1

    # Iterate lessons
    for lesson in section['childNodes'][0]['learningObjects']:

        print('Downloading: ' + lesson['metadata']['name'])
        lesson_path = section_path + '/' + str(lesson_counter) + '. ' + lesson['metadata']['name']

        if not os.path.exists(lesson_path):
            os.makedirs(lesson_path)

        # Grab base URL
        baseUrl = lesson['metadata']['baseUrl']

        # Prepare request to script.json to grab video URLs and set cookies
        cookies = OrderedDict()
        for cookie in reversed(lesson['metadata']['cookies']):
            cookies[cookie['key']] = cookie['value']

        # Get video list
        response = requests.get(baseUrl + '/script.json', cookies=cookies)

        video_counter = 1

        if response.status_code == 200:
            # Iterate videos
            for video in json.loads(response.content)['slides']:
                # Download with title
                download_response = requests.get(baseUrl + video['video'][video_quality]['URI'], cookies=cookies)
                open(lesson_path + '/' + str(video_counter) + '. ' + video['title'].replace("/", "_").replace(':', '') + '.mp4',
                     'wb').write(download_response.content)

                video_counter = video_counter + 1
        else:
            print('Request for video list failed, your cookies have probably expired.')
            is_looping = False
            break

        lesson_counter = lesson_counter + 1

    if not is_looping:
        # Break out of outer loop in the event of expired cookies
        break

    section_counter = section_counter + 1

RedvGreen · 21 Mar 2019 at 11:45

Removing the colon from the name parameter in the JSON file allows the script to run without issue ... absolutely facepalm here!

Thank you for all your help everyone

Hades · 21 Mar 2019 at 11:53

Great news