
It's less about "crimes" and more about a moral or ethical boundary that people feel is being crossed.

Yeah, think of it as a moral crime. Someone can avoid taxes completely legally, but that doesn't make it fair or right.

What is structured water?

Yeah okay... Surprised to see this as the top comment.

> Hexagonal water, also known as gel water, structured water, cluster water,[1] H3O2 or H3O2 is a term used in a marketing scam[2][3] that claims the ability to create a certain configuration of water that is better for the body.[4]

> The concept of hexagonal water clashes with several established scientific ideas. Although water clusters have been observed experimentally, they have a very short lifetime: the hydrogen bonds are continually breaking and reforming at timescales shorter than 200 femtoseconds.[7] This contradicts the hexagonal water model's claim that the particular structure of water consumed is the same structure used by the body.

https://en.wikipedia.org/wiki/Hexagonal_water


Though funnily enough, you can make real 'structured water' at home in your freezer. Making your ice crystals hexagonal is theoretically possible, but it's really, really hard to grow monocrystalline water ice. That might be a really interesting niche hobby, though.

See https://www.youtube.com/watch?v=VA710QYxEu0 for the latter.


Well yes, that’s in a solid state. Lots of crystals have hexagonal structures since it’s the optimal packing distribution.

If “structured water” just means that there are tiny ice crystals in water, sure that’s very plausible, but I doubt it would have much of an effect.

PS: Trying to grow crystals of different challenging structures does sound like an awesome hobby.


Oh, the pseudo-science 'structured water' is absolutely bonkers. I just went off on a mildly interesting tangent.

It's a so-called fourth phase of water (liquid, but with some crystalline organization) that supposedly grows on hydrophilic surfaces by absorbing ultraviolet and infrared light, and organizes into a honeycomb-like lattice similar to ice, but lacking the H+ binding layers that would make it rigid. It has higher viscosity than bulk water, and a net-negative charge.

Yes, it's a relatively recent concept (decades) pursued mostly by Gerald Pollack at University of Washington and not widely replicated, though there is some replication that has prompted critical review (https://pmc.ncbi.nlm.nih.gov/articles/PMC7404113/). It's also downstream of work by Albert Szent-Györgyi (Nobel prize for vitamin C) and Gilbert Ling. And, of course, there are a bunch of folks Pollack distances himself from commercializing the concept.

From the horse's mouth: https://www.pollacklab.org/research

If I had a coloring book for every person who cited wikipedia as a reliable source on cutting-edge science... I'd have Christmas presents for a bunch of people I don't know!


Hey, I'm really interested in your pipeline techniques. I've got some PDFs I need processed, but processing them in the cloud with the big providers requires redaction.

Wondering if a local model or a self-hosted one would work just as well.


I run llama.cpp with Qwen3-VL-8B-Instruct-Q4_K_S.gguf plus mmproj-F16.gguf for OCR and translation. I also run llama.cpp with Qwen3-Embedding-0.6B-GGUF for embeddings. Drupal 11 with ai_provider_ollama and a custom provider ai_provider_llama (heavily derived from ai_provider_ollama), with PostgreSQL and pgvector.

People on site scan the documents and upload them for archival. The directory monitor looks for new files in the archive directories, and once a new file is available, it is uploaded to Drupal. Once new content is created in Drupal, Drupal triggers the translation and embedding process through llama.cpp. Qwen3-VL-8B is also used for chat and RAG. The client is familiar with Drupal and CMSes in general and wanted to stay in a similar environment. If you are starting fresh, I would recommend looking at docling.
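For what it's worth, the "directory monitor looks for new files" step could be sketched roughly like this. The names (`scan_for_new_files`, the `upload` callback) are mine, not the actual pipeline's; the real upload would presumably POST to a Drupal REST/JSON:API endpoint.

```python
import time
from pathlib import Path

def scan_for_new_files(archive_dir, seen):
    """Return files in archive_dir not yet in `seen`, and mark them seen."""
    new_files = []
    for path in sorted(Path(archive_dir).glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files

def watch(archive_dir, upload, interval=5.0):
    """Poll archive_dir forever; hand each new file to the upload callback."""
    seen = set()
    while True:
        for path in scan_for_new_files(archive_dir, seen):
            upload(path)  # hypothetical: POST the file to Drupal here
        time.sleep(interval)
```

A real deployment would likely use inotify or a systemd path unit instead of polling, but the shape is the same.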


Are you linking any of the processes using the Drupal AI module suite?

Yes, they are all linked using Drupal's AI modules. I have an OpenCV application that removes the old paper look, enhances the contrast and fixes the orientation of the images before they hit llama.cpp for OCR and translation.
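The contrast-enhancement idea, stripped down to a toy pure-Python version: the real step presumably uses OpenCV routines (e.g. CLAHE) on full images, while `stretch_contrast` here just min-max rescales a flat list of grayscale values.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values (0-255) to span the full range.

    A toy stand-in for the OpenCV contrast-enhancement step described
    above; operates on a flat list of pixel intensities.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]
```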

Disclaimer: I'm an AI novice relative to many here. FWIW, last weekend I spent a couple of hours setting up self-hosted n8n with Ollama and gemma3:4b [EDIT: not Qwen-3.5], using PDF content extraction for my PoC. 100% local workflow, no runtime dependency on cloud providers. I doubt it'd scale very well (MacBook Air M4, measly 16 GB RAM), but it works as intended.

For those who wish to do OCR on photos, like receipts, or PDFs or anything really, Paperless-NGX works amazingly well and runs on a potato.

How do you extract the content? OCR? PDF to text, then feed into Qwen?

I tried something similar where I needed a bunch of tables extracted from a PDF over about 40 pages. It was crazy slow on my MacBook, and inaccurate.


If you have a basic ARM MacBook, GLM-OCR is the best single model I have found for OCR with good table extraction/formatting. It's a compact 0.9B-parameter model, so it'll run on systems with only 8 GB of RAM.

https://github.com/zai-org/GLM-OCR

Use mlx-vlm for inference:

https://github.com/zai-org/GLM-OCR/blob/main/examples/mlx-de...

Then you can run a single command to process your PDF:

  glmocr parse example.pdf

  Loading images: example.pdf
  Found 1 file(s)
  Starting Pipeline...
  Pipeline started!
  GLM-OCR initialized in self-hosted mode
  Using Pipeline (enable_layout=true)...

  === Parsing: example.pdf (1/1) ===
My test document contains scanned pages from a law textbook. It's two columns of text with a lot of footnotes. It took 60 seconds to process 5 pages on a MBP with M4 Max chip.

After it's done, you'll have a directory output/example/ that contains .md and .json files. The .md file will contain a markdown rendition of the complete document. The .json file will contain individual labeled regions from the document along with their transcriptions. If you get all the JSON objects with

  "label": "table"
from the JSON file, you can get an HTML-formatted table from each "content" section of these objects.
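That filtering step could look like this. I'm assuming, as a simplification, that the .json file is a flat list of region objects with "label" and "content" keys; the actual GLM-OCR output may nest things differently.

```python
import json

def extract_tables(json_text):
    """Collect the HTML 'content' of every region labeled as a table."""
    regions = json.loads(json_text)
    return [r["content"] for r in regions if r.get("label") == "table"]

# Illustrative input mimicking the labeled-region structure:
sample = json.dumps([
    {"label": "text",  "content": "Some paragraph."},
    {"label": "table", "content": "<table><tr><td>1</td></tr></table>"},
])
print(extract_tables(sample))
```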

It might still be inaccurate -- I don't know how challenging your original tables are -- but it shouldn't be terribly slow. The tables it produced for me were good.

I have also built more complex workflows that use a mixture of OCR-specialized models and general-purpose VLM models like Qwen 3.5, along with software to coordinate and reconcile operations, but GLM-OCR by itself is the best first thing to try locally.


Thanks! Just tried it on a 40-page PDF. It seems to work for single images, but the large PDF gives me connection timeouts.

I also get connection timeouts on larger documents, but it automatically retries and completes. All the pages are processed when I'm done. However, I'm using the Python client SDK for larger documents rather than the basic glmocr command line tool. I'm not sure if that makes a difference.

Yeah, looks like the CLI retries as well. I was able to get it working with a higher timeout.

Cool! For GLM-OCR, do you use "Option 2: Self-host with vLLM / SGLang" and in that case, am I correct that there is no internet connection involved and hence connection timeouts would be avoided entirely?

When you self-host, there's still a client/server relationship between your self-hosted inference server and the client that manages the processing of individual pages. You can get timeouts depending on the configured timeouts, the speed of your inference server, and the complexity of the pages you're processing. But you can let the client retry and/or raise the initial timeout limit if you keep running into timeouts.
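The retry behavior can be sketched generically like this. To be clear, this is not GLM-OCR's actual SDK API, just an illustration of retry-with-backoff wrapped around a per-page request.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, retry_on=(TimeoutError,)):
    """Call fn(), retrying with exponential backoff on timeout-like errors.

    Raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Raising the client's configured timeout has the same effect as a larger `base_delay` here: slower pages get more headroom before the client gives up.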

That said, this is already a small and fast model when hosted via MLX on macOS. If you run the inference server with a recent NVIDIA GPU and vLLM on Linux, it should be significantly faster. The big advantage of vLLM for OCR models is its continuous batching capability. Using other OCR models that I couldn't self-host on macOS, like DeepSeek 2 OCR or Chandra 2, vLLM gave dramatic throughput improvements on big documents via continuous batching when I processed 8-10 pages at a time. This is with a single 4090 GPU.


1. Correction: I'd planned to use Qwen-3.5 but ended up using gemma3:4b.

2. The n8n workflow passes a given binary pdf to gemma, which (based on a detailed prompt) analyzes it and produces JSON output.

See https://github.com/LinkedInLearning/build-with-ai-running-lo... if you want more details. :)


Python pdftools to convert the pages to images, and Tesseract to OCR them into text files. Fast, free, and it runs on CPU.
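A rough sketch of that pipeline, shelling out to poppler's pdftoppm and then Tesseract; it assumes both CLI tools are installed, and the helper names are mine.

```python
import subprocess
from pathlib import Path

def pdftoppm_cmd(pdf_path, out_prefix, dpi=300):
    """poppler's pdftoppm: rasterize each PDF page to out_prefix-NN.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), str(out_prefix)]

def tesseract_cmd(image_path, out_base):
    """tesseract writes its transcription to out_base.txt."""
    return ["tesseract", str(image_path), str(out_base)]

def ocr_pdf(pdf_path, work_dir):
    """Rasterize a PDF, OCR each page image, and return the joined text."""
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdftoppm_cmd(pdf_path, work / "page"), check=True)
    texts = []
    for png in sorted(work.glob("page*.png")):
        out_base = png.with_suffix("")  # tesseract appends .txt itself
        subprocess.run(tesseract_cmd(png, out_base), check=True)
        texts.append(out_base.with_suffix(".txt").read_text())
    return "\n".join(texts)
```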

Seconded, would also love to hear your story if you would be willing


Let's see, this is a low speed 2x16GB DDR4 kit for $300.

The closest option on the pcpartpicker chart was about $75 as a stable price. So that one's only a 4x increase.

Versus DDR5 where... it looks like a 5x increase to me? I'm seeing a jump from 200USD up to 1000USD. Edit: Oh there's an extra jump in the last month on the CAD version but not the USD version.


That was like $80 last year.

A few years ago I did a bit more of a crude flow.

Play the footage on a TV in a dark room. Place a 4K camera on a tripod and record the TV, feeding the TV's audio into the camera's audio port.

Worked perfectly.


Actually not a terrible way to go from interlaced to progressive footage, depending on the TV and camera.

I'd love to see some data for how much it has improved via this process in the last week


It would be the same as Kimi K2.5, the underlying model.


Surely this Musk project will happen in the time he says it will.


Musk has lots of experience with ignoring lack of experience; should speed the process up.


This is sarcasm, right?


It's probably because normal people usually don't buy routers; they get one included with their internet subscription. So the people who do buy them have a specific need that normal routers don't meet.


It's a travel router, which power users buy to get good connectivity away from home and the office. A hotel won't offer you that (and chances are they'll try to rip you off on their WiFi).


Assuming you can find an Ethernet port to supply it, that is. Most hotels don't make them easy to find and use, if they even have them.

More common is that you use the travel router to connect to the hotel WiFi and then share out that connection. It's slower than connecting directly, but it's great for family travel since you can name your travel SSID the same as your home network: all your usual devices will connect automatically, and will use any whole-connection VPN you have set up (most of the GL.iNets will do WireGuard, OpenVPN, and Tailscale straight out of the box, and they'll let you into LuCI or SSH to configure the underlying OpenWrt directly for anything else). And, of course, it's just one device for hotels that try to limit the number of devices you use.


As far as travel and hotels go, another huge benefit is that the router enables devices without captive portal support. On a recent trip I could use:

- the Fi base station for my dogs' trackers (huge for me)

- a FireTV stick (no need to trust that hotel streaming apps will clear your credentials like they claim)

Also I can WireGuard back home automatically for select IP ranges (no need to configure WireGuard separately on many of my devices)
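Selective routing like that is typically done through WireGuard's AllowedIPs setting, which controls which destination ranges get sent through the tunnel. A sketch with illustrative addresses and placeholder keys:

```ini
# wg0.conf on the travel router (all addresses are illustrative)
[Interface]
PrivateKey = <router-private-key>
Address = 10.8.0.2/32

[Peer]
PublicKey = <home-server-public-key>
Endpoint = home.example.net:51820
# Only traffic for the home LAN and VPN subnet goes through the tunnel;
# everything else uses the hotel uplink directly.
AllowedIPs = 192.168.1.0/24, 10.8.0.0/24
PersistentKeepalive = 25
```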


Yeah, and especially the satisfaction of making a user delighted to use your thing. Fixing bugs, making things faster, adding new features: for me personally, I do it because it feels really good when a customer loves using the thing I've built.

Whether I've done the manual coding work myself or prompted an LLM to make these things happen, I still chose what to work on and whether it was worthy of the users' time.


I use my wired sex toys usually. Ethernet works really well.


10G Ethernet has improved their performance so greatly.

