CSS crawl

Soldato
Joined
19 Jan 2005
Posts
2,722
Alright, does anyone know a relatively simple way of crawling a website looking for a specific CSS class?

Any time I search it seems to come up with a ton of different ways of optimising your CSS but none of finding where it occurs.

I've been using import.io to find lots of different things and it's ludicrously useful, but I'm struggling with this one. I have all the URLs I want to search, and basically I just want to find which ones have <div class="xxxx"> on them.

Anyone know? Thanks
 
Associate
Joined
10 Nov 2013
Posts
1,808
Do you have the source code? If so you can just do a 'find in files' in your text editor of choice.

If not, you could probably knock up a quick PHP script or something which would get the contents of a list of URLs and perform a string search for each one?
 
Soldato
OP
Joined
19 Jan 2005
Posts
2,722
Do you have the source code? If so you can just do a 'find in files' in your text editor of choice.

If not, you could probably knock up a quick PHP script or something which would get the contents of a list of URLs and perform a string search for each one?

No I don't. It's pretty much a blog, so there are about 900 articles that could have it in.

I don't know PHP so I can't do that. I had someone make me a script to do pretty much the same thing but it was hard coded to a specific piece of code and I can't change it and he's left now.

Do you know much about import.io? From what I've seen it does exactly what you've suggested, I just can't get it to drop the other stuff.

I want it to search for <div class="xxxx">, but that div always surrounds an image, and when you 'train' the program it doesn't drop the image. So now the API call is looking for the div and the image.

I've put in a number of examples, which it suggests doing, but it hasn't stripped out the differences.
 
Associate
Joined
10 Nov 2013
Posts
1,808
No I've never used import.io.
I'm not 100% sure what you're trying to achieve - do you want to strip out all images that are within a div of a certain class?

I had someone make me a script to do pretty much the same thing but it was hard coded to a specific piece of code and I can't change it and he's left now.

Do you have this code to hand?
 
Soldato
OP
Joined
19 Jan 2005
Posts
2,722
No I've never used import.io.
I'm not 100% sure what you're trying to achieve - do you want to strip out all images that are within a div of a certain class?



Do you have this code to hand?

No I don't.

I'm trying to return a list of URLs where this particular CSS class is used.
 
Associate
Joined
10 Nov 2013
Posts
1,808
OK, here's some PHP code that might do what you want. You'll need to supply your own URLs and class name; it should output the URLs that contain the class. I just wrote it on my phone, so apologies if it's not any good.

Code:
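<?php
// Pages to check; replace with your own URLs.
$urls = array("http://www.example.com/", "http://www.another.com");

foreach ($urls as $url)
{
    // Fetch the page source; file_get_contents() returns false on failure.
    $html = file_get_contents($url);
    if ($html === false)
    {
        continue; // could not fetch this URL, so skip it
    }
    // Look for the exact attribute text class="my-class".
    if (strpos($html, "class=\"my-class\"") !== false)
    {
        echo $url . " \n";
    }
}
?>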

Edit: that won't catch elements with multiple classes, e.g. class="my-class other", because it searches for the exact string class="my-class".
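
If the multiple-class case matters, one possible way round it is to parse each page and test the class attribute token by token instead of doing a plain string search. This is only a rough sketch, not tested against the actual blog: the function names html_has_div_class and urls_with_div_class are made up for illustration, and it assumes the class name contains no quotes or spaces and that PHP's DOM extension is enabled.

```php
<?php
// Hypothetical helper: true if any <div> in $html lists $class among its classes.
function html_has_div_class(string $html, string $class): bool
{
    if ($html === '') {
        return false; // nothing to parse
    }
    $doc = new DOMDocument();
    // Collect parser warnings quietly; real pages often contain sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return false;
    }
    $xpath = new DOMXPath($doc);
    // Pad the class list with spaces so "my-class" only matches as a whole token.
    // Assumes $class has no quotes or spaces in it.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    return $nodes !== false && $nodes->length > 0;
}

// Hypothetical wrapper: returns the URLs whose pages contain such a <div>.
function urls_with_div_class(array $urls, string $class): array
{
    $matches = [];
    foreach ($urls as $url) {
        $html = @file_get_contents($url); // false if the page could not be fetched
        if ($html !== false && html_has_div_class($html, $class)) {
            $matches[] = $url;
        }
    }
    return $matches;
}
```

You would then call something like urls_with_div_class($urls, "my-class") with the same $urls array as above and echo each URL it returns. Parsing is slower than strpos, but for around 900 pages the fetching will take most of the time anyway.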
 