Good lord, I just quoted the Spice Girls. I need to sit down for a second.
OK, better now. Still, this will just be a jumble of thoughts more than a proper entry.
I’ve been playing some more with Spycyroll, and I think I’m making headway on adapting it for my purposes. Until today I’ve just been letting it accumulate posts without removing any; as a result, my aggregate page was well over 400K and growing. (The reason for this is related to the date issues I referred to earlier.) Tonight I got (slightly) brighter and realized all I had to do was put read files in one directory and unread files in another. Through the magic of Python, that took all of about five lines of code, and my aggregate page is now a much healthier 16K.
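For the curious, the read/unread split looks roughly like the sketch below. It is not the actual change to Spycyroll: the function name and the directory layout are made up for illustration, and it assumes each post is stored as its own file.

```python
import shutil
from pathlib import Path


def archive_read_posts(unread_dir, read_dir):
    """Move every post file from unread_dir into read_dir.

    Both directory names are hypothetical; returns the moved file names.
    """
    unread, read = Path(unread_dir), Path(read_dir)
    read.mkdir(parents=True, exist_ok=True)  # create the archive on first use
    moved = []
    for post in sorted(unread.iterdir()):
        if post.is_file():
            shutil.move(str(post), str(read / post.name))
            moved.append(post.name)
    return moved
```

The aggregate page would then be built from the unread directory only, which is what keeps it small.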
Holding on to all of those deleted files is still an issue. Because I can’t tell what items are no longer in a feed, it’s necessary to hold on to all of them. I’m thinking a database will be necessary, probably of MD5 checksums for each post, but I’m not comfortable enough with Python to start messing around with its database support yet.
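Here is a minimal sketch of the checksum idea, using an in-memory set where a real database would eventually go. The choice of fields to hash (title, link, description) and both function names are my assumptions, not anything Spycyroll or rssparser.py defines.

```python
import hashlib


def post_checksum(title, link, description):
    """Return an MD5 hex digest identifying one post (field choice is an assumption)."""
    joined = "\n".join((title, link, description))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def unseen_posts(posts, seen):
    """Return the posts whose checksum is not yet in `seen`.

    posts: iterable of (title, link, description) tuples.
    seen:  set of checksums, updated in place.
    """
    fresh = []
    for post in posts:
        digest = post_checksum(*post)
        if digest not in seen:
            seen.add(digest)
            fresh.append(post)
    return fresh
```

With something like this, only the checksums need to be kept around, not the full text of every post I’ve already read.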
I’ve also realized that although rssparser.py does nice resource retrieval (using If-Modified-Since/Last-Modified, If-None-Match/ETag, and Accept-Encoding: gzip), Spycyroll doesn’t take advantage of it. I’ll probably use the filesystem to store that information too in the interim.
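Storing that information on the filesystem could look something like the sketch below: one small JSON file per feed holding the last ETag and Last-Modified values, which get turned back into request headers on the next fetch. This does not use rssparser.py’s own interface; the function names and the JSON layout are hypothetical.

```python
import json
from pathlib import Path


def load_validators(cache_file):
    """Read the saved ETag/Last-Modified values for a feed, or {} if none exist."""
    path = Path(cache_file)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_validators(cache_file, etag=None, last_modified=None):
    """Store whichever validators the server sent with its last response."""
    validators = {}
    if etag:
        validators["etag"] = etag
    if last_modified:
        validators["last_modified"] = last_modified
    Path(cache_file).write_text(json.dumps(validators), encoding="utf-8")


def conditional_headers(validators):
    """Build request headers that let the server answer 304 Not Modified."""
    headers = {"Accept-Encoding": "gzip"}
    if "etag" in validators:
        headers["If-None-Match"] = validators["etag"]
    if "last_modified" in validators:
        headers["If-Modified-Since"] = validators["last_modified"]
    return headers
```

The headers can be handed to whatever does the fetching. A 304 response means the feed hasn’t changed and nothing needs to be parsed; a gzip-encoded body has to be decompressed before parsing.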
A few things to consider there, and there’s more, but I’m getting antsy to try some of this out….
It works, mostly. Now that I’m taking advantage of smart retrieval, the site links in the blogroll part of the page aren’t being filled in, because they’re pulled from the channel. Two steps forward….
MD5 is what AmphetaFrames uses (see my previous comment on “MMmMm. Spycy.”; I’m lazy). MD5 and per-item APIs will be added in a future version of AmphetaDesk. This missing functionality, which will give me the chance to do “don’t show me items I’ve seen before,” is the sole reason why AmphetaDesk isn’t a 1.0 product yet.