Friends,
About a year ago, I set out to build one master adult video scraper to rule them all. One that would be flexible enough that I—or anyone—could add new sites with entirely different structures, without having to revisit the code. And now, after countless late nights, I’m excited to share the first public release (let’s call it a beta) of a project I’ve poured my heart into but can’t exactly put on my LinkedIn: Smutscrape!
Smutscrape scrapes videos with metadata from a rapidly growing list of sites (14 at present), including PornHub, XNXX, xHamster, Xvideos, SpankBang, and, perhaps revealing more about my specific kinks than anyone would have dared ask: IncestFlix/IncestGuru, FamilyPornTV, FamilyPornHD, Family-Sex, 9Vids, and Motherless.
You can of course grab individual videos, but the real magic happens when you set it to scrape every video in a set, whether that’s a specific tag, search query, performer, studio, channel, user’s uploads, playlist, etc. Whatever the mode, it gathers relevant metadata as it goes and writes it to .NFO files alongside the videos for richer information in your media manager of choice (particularly Stash, Jellyfin, or Plex).
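For anyone who hasn’t met .NFO sidecar files before, here is a minimal sketch of the idea. It is not Smutscrape’s actual code: the function names and the metadata keys (`title`, `studio`, `tags`, `actors`) are assumptions made for illustration, and the layout is the common Kodi-style `<movie>` document that NFO-aware media managers generally read.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_nfo(meta: dict) -> str:
    """Render a metadata dict as a Kodi-style <movie> NFO document."""
    root = ET.Element("movie")
    ET.SubElement(root, "title").text = meta["title"]
    if meta.get("studio"):
        ET.SubElement(root, "studio").text = meta["studio"]
    for tag in meta.get("tags", []):
        ET.SubElement(root, "tag").text = tag
    for name in meta.get("actors", []):
        actor = ET.SubElement(root, "actor")
        ET.SubElement(actor, "name").text = name
    ET.indent(root)  # pretty-print; requires Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def write_nfo(video_path: Path, meta: dict) -> Path:
    """Write the NFO next to the video, sharing its base name."""
    nfo_path = video_path.with_suffix(".nfo")
    nfo_path.write_text(build_nfo(meta), encoding="utf-8")
    return nfo_path
```

Calling `write_nfo(Path("clip.mp4"), {"title": "Example"})` would leave a `clip.nfo` beside the video, which is the pairing media managers look for.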
It’ll put the files wherever you like (local filesystem, SMB, or WebDAV share) while respecting the policy you set for handling filename collisions. It remembers each successfully scraped URL and skips that URL the next time it’s encountered, though you can also have it check for new metadata and refresh existing metadata in your library. It even manages rotating VPN exit nodes on a set interval if that’s how you roll, which I’d respect!
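To make the "remember what’s done" and "collision policy" ideas concrete, here is a small sketch under stated assumptions. The state file format (one URL per line) and the policy names `skip`, `overwrite`, and `rename` are hypothetical and chosen only for illustration; the real configuration options may be named and stored differently.

```python
from pathlib import Path
from typing import Optional, Set


def load_seen(state_file: Path) -> Set[str]:
    """Return the set of URLs recorded as already scraped."""
    if not state_file.exists():
        return set()
    return set(state_file.read_text(encoding="utf-8").split())


def mark_seen(state_file: Path, url: str) -> None:
    """Append a successfully scraped URL to the state file."""
    with state_file.open("a", encoding="utf-8") as fh:
        fh.write(url + "\n")


def resolve_collision(dest: Path, policy: str = "skip") -> Optional[Path]:
    """Return the path to write to, or None if the download should be skipped."""
    if not dest.exists() or policy == "overwrite":
        return dest
    if policy == "skip":
        return None
    if policy == "rename":
        # Try "name (1).ext", "name (2).ext", ... until a free name is found.
        n = 1
        while True:
            candidate = dest.with_name(f"{dest.stem} ({n}){dest.suffix}")
            if not candidate.exists():
                return candidate
            n += 1
    raise ValueError(f"unknown collision policy: {policy!r}")
```

A scrape loop would check `url in load_seen(state_file)` before fetching anything, and call `mark_seen` only after the file has landed at the path returned by `resolve_collision`.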
It’s written in Python, configured and customized via one main YAML configuration file, and then relies on separate YAML configurations for each supported site. Adding more sites is dead simple once you’ve a basic grasp of how web scraping works—using Web Inspector in your browser to determine CSS selectors for video streams and metadata elements is all there really is to it.
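As a rough picture of what "configuration-driven" means here, the sketch below keeps all site-specific knowledge in data and leaves the extraction code generic. Everything in it is an assumption for illustration: the config keys, the site name, and the simplified tag-plus-class rules, which stand in for real CSS selectors so the example runs on the standard library alone. The config is written as the plain dict a YAML loader would hand back.

```python
from html.parser import HTMLParser

# Hypothetical per-site config (what a YAML file might load into).
SITE_CONFIG = {
    "name": "example-site",
    "selectors": {
        "title": {"tag": "h1", "class": "video-title"},
        "stream": {"tag": "source", "class": "hd", "attribute": "src"},
    },
}


class FieldExtractor(HTMLParser):
    """Collect the first match for each configured field."""

    def __init__(self, selectors):
        super().__init__()
        self.selectors = selectors
        self.found = {}
        self._capturing = None  # field whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        for field, rule in self.selectors.items():
            if field in self.found or tag != rule["tag"]:
                continue
            if rule.get("class") and rule["class"] not in classes:
                continue
            if "attribute" in rule:
                # Field value lives in an attribute, e.g. <source src="...">.
                self.found[field] = attrs.get(rule["attribute"])
            else:
                # Field value is the element's text; collect until it closes.
                self._capturing = field
                self.found[field] = ""

    def handle_data(self, data):
        if self._capturing:
            self.found[self._capturing] += data

    def handle_endtag(self, tag):
        if self._capturing and tag == self.selectors[self._capturing]["tag"]:
            self.found[self._capturing] = self.found[self._capturing].strip()
            self._capturing = None


def extract(html, config):
    """Apply a site's selector rules to a page and return the matched fields."""
    parser = FieldExtractor(config["selectors"])
    parser.feed(html)
    parser.close()
    return parser.found
```

Supporting another site in this toy version means writing a new config dict, not new code, which is the same property the per-site YAML files are meant to provide.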
The interface is something I’m particularly proud of. Run it without arguments and you’ll get a beautiful terminal output with all supported sites, their modes, and examples. Run it with just the site argument and you’ll get a more detailed breakdown of the site: the download method, whether Selenium is required, and, for some sites (more soon), my own curated notes covering quirks, coverage gaps, modes unique to the site, and more. All of it is generated on the fly from the site configurations in the ./sites/ directory.
Check it out on GitHub: https://github.com/io-flux/smutscrape
This reminds me of my youth, when I’d write scripts to strip the images off BBSes, and then websites, overnight, via dial-up… Keep up the good work friend, you do God’s work.
I love Linux ISOs. Nice work.
That’s awesome! I love the ASCII art banner.
Definitely gonna check this out when I have the time! I love a good python tool and I love good porn, perfect combo