Python Woes...

RedvGreen · 20 Mar 2019 at 16:39

I am not a Python user or expert at all, and have been provided this script to scrape some video content to allow me to work offline, so please forgive my lack of knowledge and understanding here...

When I run the code below, it flags 'Invalid Syntax' on each of the apostrophes. If I remove one, it flags the next; if I add one, it still shows as a problem.

For info, I am running the script from cmdline:

python.exe download.py uberRequest.json

Any ideas please?

Code:

#!/usr/bin/env python
import json
import sys
import requests
import os
import argparse
from collections import OrderedDict

# Help text and arguments
parser = argparse.ArgumentParser(description='Download videos using this script.')
parser.add_argument('-q', choices=['SD', 'HD'], default='HD', help='Download quality selector; will default to HD (720p).')
parser.add_argument('uberRequest', metavar='uberRequest.json', help='uberRequest.json file, see instructions in git repo.')
args = parser.parse_args()

# Convert input to number
video_quality = 1
if args.q == 'SD':
    video_quality = 0

# Read uber response from file
with open(args.uberRequest, 'r') as json_data:
    uber = json.load(json_data)

print 'Course ': + uber['course']['name']
if not os.path.exists(uber['course']['name']):
    os.makedirs(uber['course']['name'])

is_looping = True
section_counter = 1

# Iterate sections
for section in uber['course']['childNodes']:
    
    print 'Section: ' + section['name']
    section_path = uber['course']['name'] + '/'  + str(section_counter) + '. ' + section['name']

    if not os.path.exists(section_path):
        os.makedirs(section_path)
    
    lesson_counter = 1

    # Iterate lessons
    for lesson in section['childNodes'][0]['learningObjects']:
    
        print 'Downloading: ' + lesson['metadata']['name']
        lesson_path = section_path + '/' + str(lesson_counter) + '. ' + lesson['metadata']['name']

        if not os.path.exists(lesson_path):
            os.makedirs(lesson_path)

        # Grab base URL
        baseUrl = lesson['metadata']['baseUrl']
                
        # Prepare request to script.json to grab video URLs and set cookies
        cookies = OrderedDict()
        for cookie in reversed(lesson['metadata']['cookies']):
            cookies[cookie['key']] = cookie['value']

        # Get video list
        response = requests.get(baseUrl+'/script.json', cookies=cookies)
        
        video_counter = 1

        if response.status_code == 200:
            # Iterate videos
            for video in json.loads(response.content)['slides']:
                # Download with title
                download_response = requests.get(baseUrl+video['video'][video_quality]['URI'], cookies=cookies)
                open(lesson_path + '/' + str(video_counter) + '. ' + video['title'].replace("/","_") + '.mp4', 'wb').write(download_response.content)

                video_counter = video_counter + 1
        else:
            print 'Request for video list failed, your cookies have probably expired.'
            is_looping = False
            break

        lesson_counter = lesson_counter + 1

    if not is_looping:
        # Break out of outer loop in the event of expired cookies
        break

    section_counter = section_counter + 1

RedvGreen · 21 Mar 2019 at 09:02

Updated to the latest version of Python and ran it again:

Code:

C:\Python>python.exe download.py uberRequest.json
  File "download.py", line 24
    print 'Course: ' + uber['course']['name']
                   ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Course: ' + uber['course']['name'])?

C:\Python>
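
For anyone hitting the same error: in Python 3, print is a function rather than a statement, so the Python 2 form (print 'text') is rejected with the SyntaxError shown above. A minimal sketch of the Python 3 form follows; the sample dictionary is made up for illustration and only stands in for the real uberRequest.json data.

```python
# Hypothetical stand-in for the data loaded from uberRequest.json
uber = {'course': {'name': 'Example Course'}}

# Python 3: print is a function, so the argument goes inside parentheses
message = 'Course: ' + uber['course']['name']
print(message)
```

Every print line in the script needs the same change, not only the first one the interpreter reports.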

RedvGreen · 21 Mar 2019 at 09:29

shine said:
Did you try the above suggestion in the SyntaxError?

Just looked at one of my json examples and I do wrap everything in brackets.

After making the bracket changes and your requested change....

Code:

C:\Python>python.exe download.py uberRequest.json
  File "download.py", line 24
    print 'Course: ' + uber['course']['name']
                   ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print('Course: ' + uber['course']['name'])?

C:\Python>

RedvGreen · 21 Mar 2019 at 10:27

I don't know what you did that I didn't do, but it worked, so many thanks!! (SOMEWHAT...)...

Had to install the requests module...

Code:

python -m pip install requests

Which led onto the following naming issue...

Code:

C:\Python>python.exe download.py uberRequest.json
Course :MGT: Training Program (D01_01_HR_5400)
Traceback (most recent call last):
  File "download.py", line 28, in <module>
    os.makedirs(uber['course']['name'])
  File "C:\Python\lib\os.py", line 221, in makedirs
    mkdir(name, mode)
NotADirectoryError: [WinError 267] The directory name is invalid: 'MGT: Training Program (D01_01_HR_5400)'

C:\Python>

Is there a way to remove invalid characters from the name (namely the colon)? I would have thought the os.path module would have handled this?
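
One possible approach, as a rough sketch: os.makedirs and os.path do not clean names for you, so the name has to be cleaned before it is passed in. The helper name sanitize_name and the exact character set below are assumptions for illustration (the set covers the characters Windows commonly rejects in file and directory names), not something taken from the original script.

```python
import re

# Characters that Windows does not accept in a file or directory name
# (assumed set for this sketch: < > : " / \ | ? *)
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*]')


def sanitize_name(name, replacement=''):
    """Return name with invalid characters replaced (hypothetical helper)."""
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Windows also has trouble with names ending in a space or a dot
    return cleaned.rstrip(' .')


course_dir = sanitize_name('MGT: Training Program (D01_01_HR_5400)')
print(course_dir)
```

The cleaned value would then be used wherever the script currently uses uber['course']['name'] to build a path.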

RedvGreen · 21 Mar 2019 at 10:42

touch said:
try:

mkdir(name.replace(":", ""),mode)

Brilliant - I swapped that out in the os.py module and we are getting somewhere closer!!

Code:

C:\Python>python.exe download.py uberRequest.json
Course :MGT: Training Program (D01_01_HR_5400)
Section: Getting Started 
Downloading: Welcome to Your Course!
Traceback (most recent call last):
  File "download.py", line 72, in <module>
    'wb').write(download_response.content)
OSError: [Errno 22] Invalid argument: 'MGT: Training Program (D01_01_HR_5400)/1. Getting Started /1. Welcome to Your Course!/1. Welcome to Your Course!.mp4'

C:\Python>
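
A likely reason for this second error: the change in os.py only affects the name of the directory that gets created, while the script still builds its paths from the original name, colon included, so open() is handed a path that does not match. A minimal sketch of the alternative, cleaning the name once in the script and reusing it for both makedirs and open, is below. The temporary directory and placeholder bytes are assumptions so the example can run anywhere.

```python
import os
import tempfile

course_name = 'MGT: Training Program (D01_01_HR_5400)'

# Clean the name once, in the script, instead of editing os.py
safe_course = course_name.replace(':', '')

# Hypothetical scratch location so the sketch does not touch real folders
base = tempfile.mkdtemp()

# The same cleaned name is used to create the directory...
section_path = os.path.join(base, safe_course, '1. Getting Started')
os.makedirs(section_path, exist_ok=True)

# ...and to open the file inside it, so the two always agree
video_path = os.path.join(section_path, '1. Welcome to Your Course!.mp4')
with open(video_path, 'wb') as video_file:
    video_file.write(b'placeholder bytes')  # stands in for download_response.content
```

This keeps the standard library untouched, which matters because a change in os.py affects every other Python program on the machine.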

RedvGreen · 21 Mar 2019 at 11:07

We're getting closer to victory

Code:

Traceback (most recent call last):
  File "download.py", line 72, in <module>
    'wb').write(download_response.content)
OSError: [Errno 22] Invalid argument: 'MGT: Training Program (D01_01_HR_5400)/1. Getting Started /1. Welcome to Your Course!/1. Welcome to Your Course!.mp4'

RedvGreen · 21 Mar 2019 at 11:26

Thanks Hades - I am not auth'd to supply the JSON file unfortunately, but you can rest assured that it is a massive file.

Inspecting the JSON file, it does declare the name variable within it - I could attempt to edit this to circumvent the character issue perhaps...

Current code...

Code:

#!/usr/bin/env python
import json
import sys
import requests
import os
import argparse
from collections import OrderedDict

# Help text and arguments
parser = argparse.ArgumentParser(description='Download videos using this script.')
parser.add_argument('-q', choices=['SD', 'HD'], default='HD',
                    help='Download quality selector; will default to HD (720p).')
parser.add_argument('uberRequest', metavar='uberRequest.json',
                    help='uberRequest.json file, see instructions in git repo.')
args = parser.parse_args()

# Convert input to number
video_quality = 1
if args.q == 'SD':
    video_quality = 0

# Read uber response from file
with open(args.uberRequest, 'r') as json_data:
    uber = json.load(json_data)

print('Course :' + uber['course']['name'])
if not os.path.exists(uber['course']['name']):
    os.makedirs(uber['course']['name'])

is_looping = True
section_counter = 1

# Iterate sections
for section in uber['course']['childNodes']:

    print('Section: ' + section['name'])
    section_path = uber['course']['name'] + '/' + str(section_counter) + '. ' + section['name']

    if not os.path.exists(section_path):
        os.makedirs(section_path)

    lesson_counter = 1

    # Iterate lessons
    for lesson in section['childNodes'][0]['learningObjects']:

        print('Downloading: ' + lesson['metadata']['name'])
        lesson_path = section_path + '/' + str(lesson_counter) + '. ' + lesson['metadata']['name']

        if not os.path.exists(lesson_path):
            os.makedirs(lesson_path)

        # Grab base URL
        baseUrl = lesson['metadata']['baseUrl']

        # Prepare request to script.json to grab video URLs and set cookies
        cookies = OrderedDict()
        for cookie in reversed(lesson['metadata']['cookies']):
            cookies[cookie['key']] = cookie['value']

        # Get video list
        response = requests.get(baseUrl + '/script.json', cookies=cookies)

        video_counter = 1

        if response.status_code == 200:
            # Iterate videos
            for video in json.loads(response.content)['slides']:
                # Download with title
                download_response = requests.get(baseUrl + video['video'][video_quality]['URI'], cookies=cookies)
                open(lesson_path + '/' + str(video_counter) + '. ' + video['title'].replace("/", "_").replace(':', '') + '.mp4',
                     'wb').write(download_response.content)

                video_counter = video_counter + 1
        else:
            print('Request for video list failed, your cookies have probably expired.')
            is_looping = False
            break

        lesson_counter = lesson_counter + 1

    if not is_looping:
        # Break out of outer loop in the event of expired cookies
        break

    section_counter = section_counter + 1
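
In the code above only the video title has its colon removed; the course, section and lesson names still go into the path unchanged. As a sketch of one way round that without editing the JSON file by hand, the names could be cleaned in the loaded data right after json.load. The helper names clean and clean_names, the character set, and the sample data are assumptions for illustration; the nesting follows the keys the script already uses.

```python
import re

# Assumed set of characters to drop from names used in paths
INVALID = re.compile(r'[<>:"/\\|?*]')


def clean(name):
    """Remove invalid characters and surrounding spaces (hypothetical helper)."""
    return INVALID.sub('', name).strip()


def clean_names(uber):
    """Clean course, section and lesson names in place and return the data."""
    course = uber['course']
    course['name'] = clean(course['name'])
    for section in course['childNodes']:
        section['name'] = clean(section['name'])
        for lesson in section['childNodes'][0]['learningObjects']:
            lesson['metadata']['name'] = clean(lesson['metadata']['name'])
    return uber


# Made-up sample shaped like the structure the script walks through
sample = {
    'course': {
        'name': 'MGT: Training Program (D01_01_HR_5400)',
        'childNodes': [{
            'name': 'Getting Started ',
            'childNodes': [{
                'learningObjects': [
                    {'metadata': {'name': 'Welcome to Your Course!'}},
                ],
            }],
        }],
    },
}

clean_names(sample)
print(sample['course']['name'])
```

Calling clean_names(uber) straight after the json.load line would leave the rest of the script as it is.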

RedvGreen · 21 Mar 2019 at 11:45

Removing the colon from the JSON file name parameter is allowing the script to run without issue ... absolutely facepalm here!

Thank you for all your help, everyone.