A product development and problem-solving workshop [1] for middle school kids. It involves storytelling and hands-on exercises around ideation and solution development, culminating in building prototypes. I developed it to teach my children and their friends how to think about problems and create solutions. I am continuing to refine it, including offering it to different age groups.
Capturing and visualizing research knowledge is, for me personally, an exciting space. I feel that deep reading and absorbing content remains challenging because of the ever-increasing volume of published research, rudimentary reading apps (Google's PDF reader has only recently made it easy to look up references), and somewhat disconnected tools for reading and note-taking. Similar to the readers that piggyback on the PDF.js library, I've developed an app that helps me capture and organize personal research knowledge [1]. Additionally, visualizations and customizable contexts for notes help me recall and link information.
As a daily Zotero user, not really. The nicest thing I can say about it is, it has plugins and is FOSS. Maybe the new 7.0 release will blow me away, but I've been waiting for it to get out of beta forever.
More fundamentally, we need to stop disseminating scholarly work as PDFs, a format primarily designed for print. Plain HTML would be an improvement. Even better than HTML would be an extended variant with scholarly-specific semantic markup and universal, animated, explorable figures. Embedded notebooks would be cool, too, but disseminating data would still be a major challenge. (And I don't just mean storage/transfer; a lot of researchers are reluctant to share source data to the world.)
So I'm a researcher who almost always uses PDFs... Does HTML have the reproducibility that PDF promises? My feeling is that if I store a PDF, it'll look the same in a decade. But is HTML the same way? It seems to rely on the web browser and many other things... How would one manage things like images and GIFs? Is there a way to keep everything in one HTML file that's easily shareable and feels secure?
The potential to freeze an HTML page in time with minimal changes at render time is already there. [0] Such an ability can even be baked directly into the rendered HTML page so the viewer would be able to download a copy of the page as it is seen at a given time. Other archiving facilities, such as archive.org, take static snapshots of accessible pages if allowed by the publisher of the page and requested by anyone who wants to make that snapshot.
My point is that it is achievable in principle and in practice, even if it is not practiced as often as one would like.
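One piece of what single-file savers do can be sketched in a few lines: inlining local images as base64 data URIs, so a page and its images (PNGs, GIFs, etc.) travel together as one .html file. This is a minimal sketch under stated assumptions: images are referenced by relative paths in `<img src>` attributes, and the function name `inline_images` is hypothetical. A real tool such as SingleFile uses a proper parser and also inlines CSS, fonts, and other resources.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Simplified pattern for <img ... src="..."> (assumption: no full HTML parsing).
# Requiring whitespace before "src=" avoids matching attributes like data-src.
IMG_SRC = re.compile(r'(<img\b[^>]*?\ssrc=)(["\'])([^"\']+)\2', re.IGNORECASE)


def inline_images(html: str, base_dir: Path) -> str:
    """Replace relative <img src> paths with base64 data URIs (hypothetical helper)."""

    def to_data_uri(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        if src.startswith(("data:", "http://", "https://")):
            return match.group(0)  # leave remote or already-inlined images alone
        path = base_dir / src
        if not path.is_file():
            return match.group(0)  # missing file: keep the original reference
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        payload = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{payload}{quote}"

    return IMG_SRC.sub(to_data_uri, html)
```

The trade-off is size: base64 inflates each image by roughly a third, in exchange for a file that no longer depends on sibling resources.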
I like SingleFile, but it's not perfect. It usually works just fine, but will occasionally drop the ball depending on the type of JavaScript on the page.
For example, I once backed up a page using it, and while it got all the content, it did not grab the JavaScript necessary for the images to display correctly.
> Does HTML have the reproducibility that PDF promises? My feeling is that if I store a PDF, it'll look the same in a decade.
Feelings and promises are one thing; reality is another. PDF doesn't even look "the same" today. I seriously wonder how often folks who think PDF renders consistently from system to system step outside their bubble, and how diverse the setups they test on really are.
> is HTML the same way?
Well, the status quo for copy-and-paste in HTML isn't dogshit; it's comparatively trivial to find and use tools that can thoroughly and exhaustively search your collection (or even write your own); and HTML is a dead-simple plain-text format that, if worst comes to worst, you can read with your own eyes (unlike PDF, where you need to run a bunch of inscrutable code from a PostScript subset through an interpreter before you can do anything with it). So, no, I wouldn't call them the same.
Machines and humans can both easily use HTML/XML. Extracting information from PDFs is so much harder that there are deep learning products dedicated to doing it, and they still make mistakes.
I’d much rather have something akin to the CHM files where everything I need is in one file, easy to analyze, and has good readers.
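To make the machine-readability point concrete, here is a minimal sketch that pulls every link and its text out of an HTML page using only Python's standard-library `html.parser`. The class name `LinkCollector` is hypothetical; it is not how any particular tool does it, just an illustration that HTML structure is directly available to a program.

```python
from __future__ import annotations

from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> elements (hypothetical example)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> currently open, if any
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


if __name__ == "__main__":
    parser = LinkCollector()
    parser.feed('<p>See <a href="https://www.w3.org/TR/prov-overview/">PROV</a>.</p>')
    print(parser.links)  # [('https://www.w3.org/TR/prov-overview/', 'PROV')]
```

Getting the same pairs out of a PDF typically means reconstructing text order and link annotations from page layout, which is where the extraction errors mentioned above come from.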
I explored tools to convert PDF to HTML for the KnowledgeGarden app, but the results were not optimal: they suffered from non-standard layout and poor typesetting of equations. Publishers of scholarly articles generate web pages for papers, but these are not replicas of the PDF files.
Re. self-contained HTML (and slightly off-topic), look at TiddlyWiki, which keeps data, code, and layout all in one interactive HTML file that can be used locally or hosted. Its key highlights include extensibility, plugins, and a community of contributors, among others.
> As a daily Zotero user, not really. The nicest thing I can say about it is, it has plugins and is FOSS. Maybe the new 7.0 release will blow me away, but I've been waiting for it to get out of beta forever.
Can you elaborate where you think Zotero drops the ball?
One major issue with Zotero is the lack of Android support. They have been working on an Android version or app or something since forever.
Then there's the way you store the PDFs. If you want to sync between multiple computers, you have to either know how to work with WebDAV, know how to point Zotero at the location where you keep your PDFs, or (what they most certainly love) pay a lot of money for not much storage space on their system. That last option is what I don't like, because I just don't trust anyone these days. You get invested in a system and build your routine around it, only for them to shut it down or sell it or whatever, and then poof, you have to start over.
People keep calling Zotero FOSS, but if it were truly FOSS, there would be a much more transparent way for people to roll their own self-hosted Zotero server. Instead, what they have is a dump of an old version with next to zero documentation, and a bunch of stubborn people who have managed to get something working, but not quite.
I get that they are trying to make money, but I am sure they could do that and be more transparent.
The other thing is the reliance on so many plugins. While Zotero itself may last a while, who can say anything about the many devs behind the many plugins you end up relying on to make Zotero bow to your routine? I like ZotFile and a few others, but how long are they going to last? Also, after reinstalling my system, getting back to my routine is a huge pain, because I have to remember all the settings for each and every plugin I install. They should come up with a way to save all these settings and restore them, and no, don't do it through another plugin!
Here's a guide I found useful for setting up Zotero storage. In brief, it relies on ZotFile to flatten the storage (keep all PDFs in one directory) and on Better BibTeX.
It helped me get rid of exactly the pain with fresh installs that you mention. I realized that these two plugins give me most of the functionality I want.
I built a document-reading app on top of the in-browser PDF reader to help me curate, visualize, and recall personal knowledge as I read and annotate research papers; the app extracts data from documents, such as URLs and references, and makes it readily available to view or download - https://www.knowledgegarden.io
Thanks for the nice references. When I read documents, I look to organize and link information in a way that helps me recall its context. Another realization is that papers are often read with different 'hats': as a reader, as a reviewer, or as a writer. To help my own process, I built a document-reading app that helps me curate, visualize, and recall personal knowledge as I read and annotate research papers; the app also extracts data from documents, such as URLs and references - https://www.knowledgegarden.io
Approaches to organizing information and to retrieving it are often different. For example, we mostly read content in whatever fixed structure it comes in, be it a blog post or a scientific article, and note-taking tends to follow that structure. But we retrieve and consume information through non-linear, context-driven search.
Putting everything in a rigid hierarchical order has some benefits, namely familiarity to the author, but it incurs the cost of organizing the material and of the mental context switch away from the task at hand.
I've been researching ways to make information tagging and visual search easy and effective - at the source - in the narrow case of reading scholarly documents [1]. The goal is to avoid a prescribed organization format in favor of contextual tagging, visualization for personal analytics, and linking of concepts afterwards, in a way that reduces distraction from reading and understanding.
Nice idea and interesting space. I tried 'show counts by type_1 ordered by occurrences' and it showed a bar chart, as expected. But 'show statistical distributions' or 'show something surprising about the data' gave the response 'Vizly does not yet support this chart type.' Note that some of the standard tools, like Tableau or Observable, show basic charts over columns out of the box after loading a dataset, so the default tabular view could be augmented. I'm aware of at least one mature-looking tool in this space [1] and an interesting thesis [2]. Best wishes with building it out!
Happy Friday! I have built a document-reading app to help me curate, visualize, and recall my knowledge as I read and annotate research papers - https://www.KnowledgeGarden.io
The app also extracts data from documents, such as URLs, keywords, and references, and generates a downloadable PDF report with annotations and extracted data.
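As a rough illustration of the URL part of that extraction step, here is a minimal sketch over plain text already pulled from a document. It is not the app's actual implementation: the function name `extract_urls` and the deliberately simple regular expression are assumptions, and real-world URL matching has many more edge cases.

```python
from __future__ import annotations

import re

# Simplified pattern (assumption): an http(s) scheme followed by characters that
# are not whitespace, angle brackets, quotes, or closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in order of first appearance (hypothetical helper)."""
    seen: dict[str, None] = {}  # dicts preserve insertion order, so this dedupes in order
    for match in URL_PATTERN.finditer(text):
        url = match.group(0).rstrip(".,;:")  # drop sentence punctuation after a URL
        seen.setdefault(url, None)
    return list(seen)
```

For example, `extract_urls("See https://example.org/a. Also (https://example.org/b).")` returns `['https://example.org/a', 'https://example.org/b']`.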
This is really nice: the simplicity of the interface and the ability to read papers without sacrificing the readability of math equations are awesome. Congrats on publishing it. Do you mind sharing how you convert from PDF to HTML (if at all)? I'm building a tool for reading, curating, and visualizing personal knowledge [1]. Apps like yours and others mentioned in this thread are nice ways to discover papers and address blind spots during literature reviews.
Provenance, as a concept and a specification, is well established in the digital domain, as described by the W3C PROV specification (https://www.w3.org/TR/prov-overview/). The ability to trace, audit, and reproduce artifacts or processes is one application of provenance that aligns with the need for explainability in data analytics and data science/AI (XAI).
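To show what "trace" means in practice, here is a minimal sketch that records two of the core PROV relations, `used` (an activity consumed an entity) and `wasGeneratedBy` (an entity was produced by an activity), and walks them backwards to find everything an artifact depends on. The class and method names (`ProvGraph`, `record`, `lineage`) are hypothetical: PROV specifies a data model and serializations, not a Python API.

```python
from collections import defaultdict


class ProvGraph:
    """Tiny in-memory store of PROV-style 'used' and 'wasGeneratedBy' relations."""

    def __init__(self):
        self.used = defaultdict(set)  # activity -> entities it consumed
        self.generated_by = {}        # entity -> activity that produced it

    def record(self, activity, inputs, outputs):
        """Record one activity run: which entities it used and which it generated."""
        self.used[activity].update(inputs)
        for entity in outputs:
            self.generated_by[entity] = activity

    def lineage(self, entity):
        """Return every upstream entity the given entity transitively depends on."""
        upstream, stack = set(), [entity]
        while stack:
            current = stack.pop()
            activity = self.generated_by.get(current)
            if activity is None:
                continue  # a source entity: nothing recorded as producing it
            for source in self.used[activity]:
                if source not in upstream:
                    upstream.add(source)
                    stack.append(source)
        return upstream
```

For example, after `g.record("clean", {"raw.csv"}, {"clean.csv"})` and `g.record("train", {"clean.csv", "config.yaml"}, {"model.pkl"})`, calling `g.lineage("model.pkl")` returns the set `{"raw.csv", "clean.csv", "config.yaml"}`, which is exactly the audit question "what went into this model?".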
Using a multicolored pen has helped me mark important information and add my personal context. For example, during meetings or brainstorming, I'll mark up information using different colors (red for to-dos, green for new ideas, pencil for plain, sequential notes). I found a good one that has all the colors I need plus a mechanical pencil [1]. There is some overhead in deciding which color to use, but the value of having that context later far outweighs it.
Neat! Interesting to see your context management system. Thanks for sharing your notes. I like to sketch ideas, so I use a pencil so that I can erase and refine. Curious: how do you manage information from digital content, e.g., notes from reading technical/research articles or blog posts?
> how do you manage information from digital content, e.g., notes from reading technical/research articles or blog posts?
If I could always remember the right keywords to search for the articles I had in mind, I'd just do that.
For online content, most of the time I'll wait for a later 'cache miss' before any bookmarks/notes.
(I'd either come across content while procrastinating on HN/etc., or find it while looking for something during a task. The former is low-effort consumption. For the latter, it's hard to know whether it's going to be something I'll have difficulty finding later. IMO, it's not worth saving easy-to-Google things; lots of stuff is easy to Google for.)
For storing stuff, I prefer bookmarks to end up in pinboard, and notes to end up somewhere in org-mode. I've found the zetteldeft package to be useful for me. https://www.eliasstorms.net/zetteldeft/ (builds upon deft. https://jblevins.org/projects/deft/ ). - If in rare cases I find I want to remember some key idea or jargon without having to look it up, then I'll go to the effort of adding it to Anki.
Thanks for sharing! I was also curious if you apply your paper-pen markup style to digital content. Btw, is there a way to send private/direct messages here on HN? thanks.
[1] https://idea-launch-lab.github.io/