How It Works: Queue
Overview
The queue is more complex than it may seem at first glance. Beneath the unassuming list is a machine that integrates data from many sources and tries its best to make archiving artwork easy and low-effort. No machine is perfect, however, and understanding its mechanisms and design will make troubleshooting easier.
To begin, the queue is a first-in, first-out list. The images and links you add to it will be processed one by one in the order you added them, in batches of 10.
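The first-in, first-out batching described above can be sketched in a few lines. This is only an illustration, not Werehouse's actual code: the `take_batch` helper and the `item-N` names are hypothetical, and only the batch size of 10 comes from the text.

```python
from collections import deque
from typing import Deque, List

BATCH_SIZE = 10  # batch size stated in the text above


def take_batch(queue: Deque[str], size: int = BATCH_SIZE) -> List[str]:
    """Remove and return up to `size` items from the front of the queue."""
    batch: List[str] = []
    while queue and len(batch) < size:
        batch.append(queue.popleft())  # oldest item leaves first
    return batch


# 25 hypothetical queue items, added in order item-1 .. item-25.
queue = deque(f"item-{n}" for n in range(1, 26))

batches = []
while queue:
    batches.append(take_batch(queue))
```

With 25 items queued, this yields two full batches of 10 followed by a final batch of 5, and items never overtake one another.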
Supported Sites
Werehouse is able to archive images from all of these websites:
- FurAffinity
- e621 (and e926)
- Itaku
- Bluesky
- Twitter (the new name is an unhinged businessman’s pipe dream that I refuse to use)
- Mastodon servers (yes, specifically just Mastodon)
- Cohost
- Weasyl
- Inkbunny
- Telegram (public channels)
Future development includes plans for the following websites and protocols:
- ActivityPub (this one is complicated: it will require implementing a large portion of an ActivityPub server)
If there’s a site you’d like supported that isn’t on either of these lists, please open an issue on GitHub to request that it be added!
Unsupported Sites
Werehouse is not able to archive images from these websites:
- DeviantArt: Removed in 2025 because they've gone all-in on AI slop generation, and rewriting the scraper to support their requirement to download deviation media files with authentication was more annoying than just disabling support for the site entirely.
Procedure
Werehouse follows the same procedure each time it tries to process a queue item:
- Find Sources
If the queue item is an image, Werehouse first tries to find sources by asking FuzzySearch and Fluffle.xyz. If neither of them knows where the image comes from, Werehouse gives up. On the other paw, if the queue item is a link to a webpage, Werehouse assumes that link is the source and proceeds onwards. This step accepts a queue item and produces a list of links.
- Scrape Sources
Werehouse tries to download information about each of the potential sources it found. This includes obvious things, like the link to the image(s), but also many kinds of metadata, such as maturity level, dimensions, tags, the artist’s profile, and more. If a link refers to a post with multiple images (such as on Twitter or Weasyl), the scraped data includes information about all of the images. This step accepts a list of links and produces a list with one sub-list per link, each containing the scraped data about every image from that link.
- Fetch Images
Werehouse downloads the full-resolution images from every source (if they weren’t already downloaded). This step accepts a list of sub-lists of scraped image data and produces the same list, but with full-size images included.
- Duplicate Check
For each of the images, Werehouse computes a content hash of the image to see if something similar has already been archived. It also looks through all of the source links currently in the archive to see if the link has been saved before. If it finds a similar hash or a duplicate link, it stops and asks for help. This step accepts a list of sub-lists of scraped image data and, if there are no duplicates, passes it along unchanged.
- Add to Archive
The full-size images and all of the other information are saved to your archive. This step accepts a list of sub-lists of scraped image data. (It produces nothing because it is the final step.)
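To make the data flow between the five steps concrete, here is a rough sketch of the procedure as a chain of functions. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the reverse-search, scraping, and download functions are passed in as stand-ins for the real services, and an exact SHA-256 match stands in for the similarity check that Werehouse actually performs.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class QueueItem:
    kind: str   # "image" or "link" (hypothetical representation)
    value: str  # file name for an image, URL for a link


@dataclass
class ScrapedImage:
    source_link: str
    image_url: str
    data: Optional[bytes] = None  # filled in by fetch_images


@dataclass
class Archive:
    images: List[ScrapedImage] = field(default_factory=list)
    hashes: Set[str] = field(default_factory=set)
    links: Set[str] = field(default_factory=set)


class NoSources(Exception):
    """No source could be found for an image."""


class NeedsHelp(Exception):
    """A possible duplicate was found and the user must decide."""


def find_sources(item: QueueItem, reverse_search: Callable[[str], List[str]]) -> List[str]:
    if item.kind == "link":
        return [item.value]  # a link is assumed to be its own source
    links = reverse_search(item.value)
    if not links:
        raise NoSources(item.value)
    return links


def scrape_sources(links, scrape):
    # One sub-list per link, one entry per image in that post.
    return [scrape(link) for link in links]


def fetch_images(scraped, download):
    for group in scraped:
        for image in group:
            if image.data is None:  # skip images that were already downloaded
                image.data = download(image.image_url)
    return scraped


def duplicate_check(scraped, archive: Archive):
    for group in scraped:
        for image in group:
            digest = hashlib.sha256(image.data).hexdigest()  # exact match only
            if digest in archive.hashes or image.source_link in archive.links:
                raise NeedsHelp(image.source_link)
    return scraped


def add_to_archive(scraped, archive: Archive) -> None:
    for group in scraped:
        for image in group:
            archive.images.append(image)
            archive.hashes.add(hashlib.sha256(image.data).hexdigest())
            archive.links.add(image.source_link)


def process(item, archive, reverse_search, scrape, download) -> None:
    links = find_sources(item, reverse_search)
    scraped = scrape_sources(links, scrape)
    scraped = fetch_images(scraped, download)
    scraped = duplicate_check(scraped, archive)
    add_to_archive(scraped, archive)


# Stand-ins for the real services, for demonstration only.
def fake_reverse_search(name): return []
def fake_scrape(link): return [ScrapedImage(link, f"{link}/1.png"), ScrapedImage(link, f"{link}/2.png")]
def fake_download(url): return url.encode()

archive = Archive()
post = QueueItem("link", "https://example.com/post/1")
process(post, archive, fake_reverse_search, fake_scrape, fake_download)
```

Processing the same link a second time raises `NeedsHelp` in this sketch, which mirrors the point where the real queue stops and asks for input. An image that no reverse search recognizes raises `NoSources` instead.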
States
A queue item can be in one of these states:
- To Do: Newly added, ready to be processed.
- To Do (again): You just answered a request for help, and now the queue system needs to process it again.
- Needs Help: There are multiple ways to proceed, and your input is needed to pick which one.
- Error: The item is not archived, and Werehouse will not automatically try again.
- “Temporary” Error Wasn’t: A temporary error (such as a server being overloaded) occurred more than three times in a row, and Werehouse stopped trying (so as to not make the problem worse).
- Archived: The item has been archived.
- Discarded: When answering a request for help, you clicked the “Discard All” button. The record stays in your queue in case you reconsider, but it will be deleted by the “Clean Up” button.
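As an illustration of how these states relate to retrying, the sketch below models them as an enum and shows one way a retry limit could be enforced. The names `State`, `TemporaryError`, and `process_with_retries` are hypothetical, and retrying immediately in a loop is an assumption made to keep the example short; the text only says that Werehouse stops after a temporary error occurs more than three times in a row.

```python
from enum import Enum
from typing import Callable


class State(Enum):
    TO_DO = "To Do"
    TO_DO_AGAIN = "To Do (again)"
    NEEDS_HELP = "Needs Help"
    ERROR = "Error"
    TEMPORARY_ERROR_WASNT = "“Temporary” Error Wasn’t"
    ARCHIVED = "Archived"
    DISCARDED = "Discarded"


MAX_TEMPORARY_ERRORS = 3  # "more than three times in a row" means giving up


class TemporaryError(Exception):
    """A failure that may go away on its own, such as an overloaded server."""


def process_with_retries(attempt: Callable[[], None]) -> State:
    """Run `attempt` until it succeeds, fails permanently, or hits the retry limit."""
    failures = 0
    while True:
        try:
            attempt()
        except TemporaryError:
            failures += 1
            if failures > MAX_TEMPORARY_ERRORS:
                return State.TEMPORARY_ERROR_WASNT
        except Exception:
            # Any other failure is treated as permanent and is not retried.
            return State.ERROR
        else:
            return State.ARCHIVED
```

Under these assumptions, an attempt that keeps raising `TemporaryError` is tried four times in total before the item lands in the “Temporary” Error Wasn’t state, while any other exception moves it straight to Error.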