Looking for a few testers for my IMDB Scraper

Hi there,

Jus' looking for a favor from a few people.

Simply extract and run the program. Enter the directory where all your films are (e.g. Z:\Movies) and click Start. For testing purposes, it only works with .avi and .mkv files.

All it will do is get the information; it won't store it anywhere. All I need you to do is check that it's downloading the information and movie art OK. To do this, just select a film from the results list and the info and image (only part of the image, since it's big and not resized) will appear on the right-hand side. Because of the resolution of the images, there might be a delay when loading.

The main purpose of this is to see how it works against people's filename conventions. So if it doesn't work at all for you, could you tell me your filename convention? Eventually, it will work for all filenames via regex.

Download Link: http://www.twobeds.com/upload/userfiles/uploader/Programs/IMDB Grabber.zip

Thanks.
 
Works fine for movies, e.g. Alien Vs Predator.

Doesn't like episode numbers though, e.g. Heroes - Season 1 - E1 (didn't expect it to).

Works with: Family Guy - (episode name).

Also, the artwork is kind of cut off: the image is too big for the box and won't fill out if I make the window bigger.
 
Works fine for movies, e.g. Alien Vs Predator.

Doesn't like episode numbers though, e.g. Heroes - Season 1 - E1 (didn't expect it to).

Works with: Family Guy - (episode name).

Also, the artwork is kind of cut off: the image is too big for the box and won't fill out if I make the window bigger.

It's actually only supposed to work with movies; I will be creating a separate one for TV series later.

Yeah, the artwork will be cut off because it is quite high resolution and I haven't scaled it to fit into the box.
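A minimal sketch of the missing scaling step, written in Python for illustration rather than the tool's own C#. The function name and the choice not to enlarge small images are assumptions, not something taken from the program:

```python
def fit_within(img_w, img_h, box_w, box_h):
    """Return (width, height) scaled to fit the box, keeping the aspect ratio."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the smaller of the two ratios so both sides fit inside the box.
    # The 1.0 cap means small images are never enlarged (an assumption).
    scale = min(box_w / img_w, box_h / img_h, 1.0)
    return max(1, round(img_w * scale)), max(1, round(img_h * scale))
```

Dropping the 1.0 cap would let the artwork grow with the window when it is resized, which is what the tester above was expecting.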

I'll see if I can give this a go this evening :D

Does it work for movies within folders etc?

Essentially, it only works on filenames at the moment (not folder names), but I am working on an option for folders as well. It does, however, look in subdirectories.

For instance, my folder setup is like this:

Movies Folder \ [Movie Name] \ [Movie Name] ([Movie Year]).[Extension]
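For illustration, a subdirectory scan restricted to the two supported extensions could look something like this. It is a Python sketch, not the program's actual C# code, and the names `VIDEO_EXTENSIONS` and `find_video_files` are made up for the example:

```python
import os

# The two extensions the test build is said to support.
VIDEO_EXTENSIONS = {".avi", ".mkv"}

def find_video_files(root):
    """Yield paths of video files under root, including subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            # Compare extensions case-insensitively (.MKV counts too).
            if os.path.splitext(name)[1].lower() in VIDEO_EXTENSIONS:
                yield os.path.join(dirpath, name)
```

With the layout above, `find_video_files(r"Z:\Movies")` would yield each `[Movie Name] ([Movie Year]).[Extension]` file from inside its own folder.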

=======

For anyone that cares, what I am hoping to do with this is release it to other programmers as a .dll file so they can include it in their software if they want.
 
This is what I'm seeing:
1. Adding full-stops instead of spaces seems to be confusing things a tad
2. Part names are either not recognised or matched to the wrong film - 2010.mkv should be 2010 - The Final Odyssey, not some random Australian thing! Needs a selection box or similar to disambiguate. It also has that problem with remakes and films that share names.
3. I've got 6 films in a row recognised as The Incredible Hulk :confused: (Seems to be an issue with "The" in the file name)
4. Names with strings added on the end (720p, site names etc.) are pretty much failing.

FWIW, all my films are ripped/ fished up and dumped into a single folder, with no name changes or anything else really, never seen the need to sort them further than that.

If you can tweak the source to query http://www.fantasticfiction.co.uk (Audiobooks!), then I'd be extremely interested :) The files in this section are also much better organised too.
If nothing else, please provide a copy of the source. My coding isn't particularly good, but decent open source stuff comes along very rarely.

Cheers

-Leezer-
 
This is what I'm seeing:
1. Adding full-stops instead of spaces seems to be confusing things a tad

I already know of this issue and what's causing it.

2. Part names are either not recognised or matched to the wrong film - 2010.mkv should be 2010 - The Final Odyssey, not some random Australian thing! Needs a selection box or similar to disambiguate. It also has that problem with remakes and films that share names.

I am not sure what you mean by part names? At the moment it only works if the movie is in one file, so if you have a movie split up into parts, it won't work - the same as pretty much every IMDB grabber under the sun.

For the remakes: yeah, this will be an issue, and it is an issue in many IMDB grabbers; that's why I strongly suggest people put the year of the film in the filename. At the moment it just takes the first result, but it will be developed further to take more than one match into account.

3. I've got 6 films in a row recognised as The Incredible Hulk :confused: (Seems to be an issue with "The" in the file name)

That's because it is only searching for the "The" bit, and the first result is The Incredible Hulk. I am assuming that it's because of the full stops instead of spaces.

4. Names with strings added on the end (720p, site names etc.) are pretty much failing.

That shouldn't be happening tbh... but again, I think that's to do with the full stops.
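One way to deal with both the full stops and the trailing strings before searching is to normalise the filename first. The following is only a Python sketch; the tag list in `RELEASE_TAGS` is a made-up starting point, and nothing here is taken from the grabber's source:

```python
import re

# Hypothetical list of release tags; extend it to match your own collection.
RELEASE_TAGS = r"(?:720p|1080p|dvdrip|bdrip|brrip|xvid|x264|hdtv|bluray)"

def clean_title(filename):
    """Turn a raw filename into a search-friendly title."""
    # Remove the supported video extension, if present.
    stem = re.sub(r"\.(avi|mkv)$", "", filename, flags=re.IGNORECASE)
    # Treat full stops and underscores as word separators.
    stem = re.sub(r"[._]+", " ", stem)
    # Keep only the text before the first release tag.
    stem = re.split(r"\b" + RELEASE_TAGS + r"\b", stem,
                    maxsplit=1, flags=re.IGNORECASE)[0]
    # Collapse repeated spaces and trim stray separators.
    return re.sub(r"\s+", " ", stem).strip(" -")
```

For example, `clean_title("The.Incredible.Hulk.720p.x264-GROUP.mkv")` gives `"The Incredible Hulk"`, so the search would no longer see just "The".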

FWIW, all my films are ripped/ fished up and dumped into a single folder, with no name changes or anything else really, never seen the need to sort them further than that.

This is why most of them will be failing... if you have ever used any media center programs, you'll know that they can't grab the information successfully unless you change the names. The best way (that I know of) is to name them according to IMDB titles, like "300 (2006).avi".
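With that naming scheme, splitting the title from the year is a single regular expression. A Python sketch, assuming years between 1900 and 2099 and a hypothetical helper name `parse_title_year`:

```python
import os
import re

# Matches "Title (Year)" where the year is 19xx or 20xx.
TITLE_YEAR = re.compile(r"^(?P<title>.+?)\s*\((?P<year>(?:19|20)\d{2})\)$")

def parse_title_year(filename):
    """Split 'Title (Year).ext' into (title, year); year is None if absent."""
    stem = os.path.splitext(filename)[0]
    match = TITLE_YEAR.match(stem)
    if match:
        return match.group("title"), int(match.group("year"))
    # No year in the name: return the whole stem as the title.
    return stem, None
```

Here `parse_title_year("300 (2006).avi")` returns `("300", 2006)`, and the year can then be used to tell remakes apart.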

If you can tweak the source to query http://www.fantasticfiction.co.uk (Audiobooks!), then I'd be extremely interested :) The files in this section are also much better organised too.

That'll be a totally different project. I am currently sorting this one out, then doing a scraper for TV series, and eventually what I want to do is create a media center application that works "out of the box": it will do the fancy things that Vista Media Center does plus more (getting IMDB info, for instance), and it will have codecs integrated so... it just works.

If nothing else, please provide a copy of the source. My coding isn't particularly good, but decent open source stuff comes along very rarely.

The code is very messy at the moment, but I will eventually clean it up and post it. It's written in C# .NET.


===========

What I think I am going to do is create a little subsection where people can enter regular expressions describing their naming convention, so that it will work better with full stops etc.
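As a rough idea of how user-supplied conventions might plug in, here is a Python sketch in which each convention is a regex with named groups, tried in order. The patterns in `USER_PATTERNS` and the function name are illustrative assumptions only, not the planned design:

```python
import re

# Hypothetical user-supplied conventions, tried in order. Each pattern names
# what it captures with (?P<title>...) and, optionally, (?P<year>...).
USER_PATTERNS = [
    r"^(?P<title>.+?) \((?P<year>\d{4})\)$",        # Title (2006)
    r"^(?P<title>.+?)\.(?P<year>\d{4})(?:\..*)?$",  # Title.Words.2006.720p
    r"^(?P<title>.+)$",                             # fallback: whole name
]

def apply_conventions(stem, patterns=USER_PATTERNS):
    """Return (title, year) from the first pattern that matches the stem."""
    for pattern in patterns:
        match = re.match(pattern, stem)
        if match:
            groups = match.groupdict()
            # Full stops inside the captured title become spaces.
            title = groups["title"].replace(".", " ").strip()
            year = groups.get("year")
            return title, int(year) if year else None
    return stem, None
```

Trying the patterns in order lets a strict convention win over the catch-all, so `apply_conventions("The.Incredible.Hulk.2008.720p")` yields `("The Incredible Hulk", 2008)`.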
 
True, I've tagged everything manually (120,000 files collection, mostly audiobooks/ OTR, quite a few TV series, couple of hundred films) for years, as there's never been a decent scrapeable database made for OTR or audiobooks.

I'm using J.River Media Center ( http://www.jrmediacenter.com ), and as such, everything is stored in my database, not in file names or tags, which is a lot of the reason why this is falling over, so not your fault :)

How about a ratings algorithm for finding remakes/ less popular stuff?
I'm not sure how you're scraping at the minute (Need source ;) ), but it ought to be possible to read the number of votes cast on the IMDB rating. Then grab the film with the highest number of votes cast?


Other bits & bobs- You've got the data (In most cases at least!), how about an option to rename files from it?
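A rename option along those lines might be sketched like this in Python. The "Title (Year).ext" target format follows the convention recommended earlier in the thread; the function names and the dry-run default are assumptions for illustration:

```python
import os
import re

def target_name(path, title, year):
    """Build a 'Title (Year).ext' path in the same folder as the original."""
    directory, old_name = os.path.split(path)
    ext = os.path.splitext(old_name)[1].lower()
    # Drop characters that Windows does not allow in filenames.
    safe_title = re.sub(r'[<>:"/\\|?*]', "", title).strip()
    return os.path.join(directory, f"{safe_title} ({year}){ext}")

def rename_file(path, title, year, dry_run=True):
    """Rename the file, or only report the new path when dry_run is set."""
    new_path = target_name(path, title, year)
    # Never overwrite an existing file.
    if not dry_run and new_path != path and not os.path.exists(new_path):
        os.rename(path, new_path)
    return new_path
```

Defaulting to a dry run means a wrong match (such as six files all recognised as the same film) can be spotted before anything on disk changes.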

Keep up the good work :)

-Leezer-
 
True, I've tagged everything manually (120,000 files collection, mostly audiobooks/ OTR, quite a few TV series, couple of hundred films) for years, as there's never been a decent scrapeable database made for OTR or audiobooks.

I assume your audiobooks are MP3s? Not sure what OTR is... Also, how can you tag videos? Or do you mean just rename them?

I'm using J.River Media Center ( http://www.jrmediacenter.com ), and as such, everything is stored in my database, not in file names or tags, which is a lot of the reason why this is falling over, so not your fault :)

Fair nuff.

How about a ratings algorithm for finding remakes/ less popular stuff?
I'm not sure how you're scraping at the minute (Need source ;) ), but it ought to be possible to read the number of votes cast on the IMDB rating. Then grab the film with the highest number of votes cast?

Well, currently it's using Google, so it should be showing the most popular one at the top. Doing it the way you suggested would mean it would take ages to get the info for one film.


======

Source code: http://www.twobeds.com/upload/userfiles/uploader/Programs/IMDB Grabber.cs
 
I assume your audiobooks are MP3s? Not sure what OTR is... Also, how can you tag videos? Or do you mean just rename them?

Yep, all audiobooks are either MP3 or m4a (Audible).
OTR = old American radio. With some of it, the legalities are questionable outside the USA (radio from before 1970 or so had a default copyright period of 5 years in the USA; it was then changed to bring it in line with other bits and pieces. The debate is whether to apply the American copyright or EU/other).

FWIW (I hope this doesn't sound too much like an ad), Media Center provides a file-centric database, complete with any user fields you care to create. All info is stored in the database, and some can be written to tags.


==================

Source code wise, I've been thinking on the matter, and these are my current thoughts:
First run the search as you already do, but add a small routine to read the number of results. Then use a simple if statement on that count: if more than X films are found, retrieve the names of the top ten or so, then either scrape their ratings or present a dialog to choose the appropriate film.
Also, if you retrieve only the text of the page, that should cut down on load times when scraping ratings?
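The suggestion above could be sketched roughly as follows, in Python rather than the grabber's C#. The `Candidate` type, the `shortlist` name and the default threshold are all hypothetical, and the vote counts are assumed to have been scraped already:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    title: str
    year: int
    votes: int  # number of ratings cast; assumed to be scraped already

def shortlist(candidates: List[Candidate], threshold: int = 1,
              limit: int = 10) -> List[Candidate]:
    """Rank the top search results by votes when there are too many matches."""
    # Few enough results: keep the search engine's own order.
    if len(candidates) <= threshold:
        return list(candidates)
    # Otherwise take the first `limit` results and put the most-voted first.
    return sorted(candidates[:limit], key=lambda c: c.votes, reverse=True)
```

The caller can then either take the first entry automatically or show the whole list in a selection dialog. Only the shortlisted pages need their vote counts fetched, which limits the slowdown mentioned earlier.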

Cheers

-Leezer-
 