I've been tempted to look into API-based HN access, having scraped the front-page archive about two years ago.
One of the advantages of comments is that there's simply so much more text to work with. For the front page, there are at most 80 characters of context (often deliberately obtuse), as well as metadata (date, story position, votes, site, submitter).
I'd initially embarked on the project to find out what cities were mentioned most often on HN (in front-page titles), though it turned out to be a much more interesting project than I'd anticipated.
(I've somewhat neglected it for a while, though I'll occasionally spin it up to check on questions or ideas.)