
I made a similar project: https://veneer.leftium.com

You can publish any publicly readable sheet/form (and append rows via a publicly available form):

- sample sheet: https://veneer.leftium.com/s.1RoVLit_cAJPZBeFYzSwHc7vADV_fYL...

- sample form: https://veneer.leftium.com/g.chwbD7sLmAoLe65Z8


How come you don't show the realtime transcription... in realtime?

I think it would make it feel even faster.

> the UX difference between streaming and offline STT is night and day. Words appearing while you're still talking completely changes the feedback loop. You catch errors in real time, you can adjust what you're saying mid-sentence, and the whole thing feels more natural. Going back to "record then wait" feels broken after that.

(https://hw.leftium.com/#/item/47149479)


I think realtime transcription actually hurts the UX when the transcript is polished afterward. In FreeFlow, the transcription output is fed to an LLM that polishes it in the context of where the text is being injected. This way we can go beyond naive transcription.

FreeFlow already feels extremely fast, and text being typed as I dictate is distracting, especially if the polishing phase later edits it.


I would delay polishing until right before delivery.

Eventually, I will add a polishing step to my own https://rift-transcription.vercel.app.

Right now, you can experience what true realtime streaming transcription feels like.

I plan to add two "levels" of polishing:

- Simple deterministic text replacements will be applied to both interim and final text.

- LLM polishing will only be applied right before delivery.

- It will be possible to undo one or both polishing steps. (Actually, even finer-grained undo: at the individual replacement-rule level.)
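A minimal sketch of the first polishing level with rule-level undo. (The rule shape and function names here are hypothetical illustrations, not RIFT's actual API.)

```typescript
// Hypothetical rule shape: each applied rule logs enough info to undo it.
type Rule = { find: RegExp; replace: string };
type Applied = { before: string; after: string; rule: Rule };

// Apply deterministic replacements in order, recording each step.
function applyRules(text: string, rules: Rule[]): { text: string; log: Applied[] } {
    const log: Applied[] = [];
    for (const rule of rules) {
        const after = text.replace(rule.find, rule.replace);
        if (after !== text) log.push({ before: text, after, rule });
        text = after;
    }
    return { text, log };
}

// Undo the last n applied rules by rewinding the log.
function undo(log: Applied[], n = 1): string {
    return log[Math.max(0, log.length - n)]?.before ?? "";
}

const rules: Rule[] = [
    { find: /\bteh\b/g, replace: "the" },
    { find: /\bi\b/g, replace: "I" },
];
const { text, log } = applyRules("i think teh UX matters", rules);
```

Because the replacements are deterministic and logged, they are cheap enough to apply to interim text, and any individual rule can be reverted without touching the others.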


That said, FreeFlow is open source for exactly this reason: everyone will have their own preference. If you would like to turn this behavior into a configurable preference, we'd happily accept a pull request.

I'm not familiar with screenshot drag. Does that copy the image into the target?

If you take a lot of screenshots, I highly recommend https://shottr.cc (nagware/freemium)

- Shows preview with buttons to copy to clipboard and/or save to file

- Can be configured to automatically copy/save (open app to preview last capture)

- Preview has tons of useful features like crop, annotations, color picker, ruler, OCR


This was shared on HN over a decade ago, but still stands the test of time: http://ciar.org/ttk/public/apigee.web_api.pdf


Thank you!


I think it's better to have the AI write scripts that extract the required data from logs, rather than shoving the entire log content directly into the AI's context.

An example of this: I had Claude analyze the hourly precipitation forecasts for an entire year across various cities. Claude saved the API results to .csv files, then wrote a (Python?) script to analyze the data and output only the 60-80% expected values. This avoided putting every hourly data point (8700+ hours in a year) into the context.
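The actual script was generated by Claude (probably in Python); here is a TypeScript sketch of the idea, with a hypothetical data shape, showing how thousands of values collapse to a couple of numbers before reaching the context:

```typescript
// Linear-interpolated percentile over a pre-sorted array.
function percentile(sorted: number[], p: number): number {
    const idx = (sorted.length - 1) * p;
    const lo = Math.floor(idx);
    const hi = Math.ceil(idx);
    return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

// Collapse a year of hourly values (8760 numbers) to the 60-80% band.
function summarize(hourlyPrecip: number[]): { p60: number; p80: number } {
    const sorted = [...hourlyPrecip].sort((a, b) => a - b);
    return { p60: percentile(sorted, 0.6), p80: percentile(sorted, 0.8) };
}
```

Only the two-number summary (not the raw hourly data) needs to enter the AI's context.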

Another example: at first, Claude struggled to extract a very long AI chat session to Markdown, so Claude only returned summaries of the chats. Later, after I installed the context mode MCP[1], Claude was able to extract the entire AI chat session verbatim, including all tool calls.

1. Sometimes?

2. Described above. I also built a tool that lets the dev/AI filter (browser dev console) logs to only the logs of interest: https://github.com/Leftium/gg?tab=readme-ov-file#coding-agen...

3. It would be interesting to combine your log compression with the scripting approach I described.

[1]: https://hw.leftium.com/#/item/47193064


This list is a little old, but I found some gems: https://web.archive.org/web/20191114220720if_/http://lazerwa...

I recall I enjoyed Hoplite, Data Wing, Mini Metro, Super Mario Run, and a few others.

---

You probably already know Apple Arcade curates a set of games. Many of the 'plus' versions of games have the ad/loot-box features stripped or set to "free."


A note on Super Mario Run: when this first came out, I tried to play it on an airplane and it didn’t work. There was some sort of phone-home check on launch to make sure it was a legit copy. When it couldn’t perform this check, the game wouldn’t load.

Things could have changed since then, as this was many years ago, but something to look out for and check if this is a concern.


# I think it's possible to architect around this. For example, here is one idea:

- make the game as functional as possible: the game state is stored in a serializable format, and new game states are generated by combining the current game state with events (player input, clock ticks, etc.)

- the serialized game state is much more accessible to the AI because it is in the same language the AI speaks: text. The AI can also simulate the game by sending synthetic events (player inputs, clock ticks, etc.)

- the functional serialized game architecture is also great for unit testing: a text-based game state + text-based synthetic events results in another text-based game state. Exactly what you want for unit tests. (Don't even need any mocks or harnesses!)

- the final step is rendering this game state. The part that AI has trouble with is saved for the very end. You probably want to verify the rendering and play-testing manually, but AI has been getting pretty decent at analyzing images (screenshots/renders).
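The architecture above can be sketched in a few lines. (The type and function names here are illustrative, not the tictactoe repo's actual API.)

```typescript
// State and events are plain serializable data.
type State = { count: number; log: string[] };
type GameEvent = { type: "tick" } | { type: "input"; key: string };

// Pure reducer: nextState = f(state, event). No hidden mutation.
function step(state: State, event: GameEvent): State {
    switch (event.type) {
        case "tick":
            return { ...state, count: state.count + 1 };
        case "input":
            return { ...state, log: [...state.log, event.key] };
    }
}

// Rendering is a separate, final step over the serialized state.
const render = (s: State) => `t=${s.count} keys=${s.log.join("")}`;

// "Unit test" with no mocks or harnesses: data in, data out.
const s0: State = { count: 0, log: [] };
const s1 = step(step(s0, { type: "tick" }), { type: "input", key: "x" });

// The state JSON round-trips cleanly, so an AI can read or replay it as text.
const serialized = JSON.stringify(s1);
```

An AI can drive `step` with synthetic events and inspect `serialized` directly, while the human only needs to eyeball the output of `render`.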

# Here is an example of a simple game developed with functional architecture: https://github.com/Leftium/tictactoe/blob/main/src/index.ts

- Yes, it's very simple, but the same concepts apply to more complex games

- Right now, there is only rendering to the terminal, but you could imagine other renderers for the browser and game engines


That's true - state serialization can definitely help.

> AI has been getting pretty decent at analyzing images (screenshots/renders).

I've found AI to be hit or miss on this - especially if the image is busy with lots of elements. They're really good at ad-hoc OCR but struggle more with 3d visualizations in a game that might be using WebGL.

For example, setting up the light sources (directional, lightmaps, etc) in my 3D chess game to ensure everything looked well-lit while also minimizing harsh specular reflections was something VLMs (tested with Claude and Gemini) failed pretty miserably at.

https://shahkur.specr.net


How does this compare to another beautiful way to frame time + c that really made sense to me:

- Everything is moving through space-time at c: c is not a limit; it's just the speed everything moves

- Things that don't appear to be moving in the physical dimensions have most or all of c spent in the time dimension

- Things that move very fast in the physical dimensions have little or none of c spent in the time dimension

- I think this is similar to your section explaining time dilation, but doesn't require rotation: https://lisajguo.substack.com/i/190415584/time-dilation
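One standard way to make the bullets above precise is via the invariant interval (sketched here in the usual notation; this is my framing, not necessarily the article's):

```latex
% From the invariant interval  c^2\,d\tau^2 = c^2\,dt^2 - |d\vec{x}|^2,
% divide by dt^2 and rearrange:
\left( c\,\frac{d\tau}{dt} \right)^2 + |\vec{v}\,|^2 = c^2
% "Speed through time" (c\,d\tau/dt) and speed through space |\vec{v}|
% always combine (in quadrature) to exactly c:
%   at rest:       d\tau/dt = 1, so all of c is spent in the time term
%   as |v| \to c:  d\tau/dt \to 0, so clocks nearly stop (time dilation)
```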

---

Other questions:

- Does this theory explain why we seem to only be able to travel through time in one direction? Why does the angle/direction of rotation (not) matter?


# My over-engineered console.log replacement is almost API/feature-stable: https://github.com/Leftium/gg

- Named `gg` for grep-ibility and ease of typing.

- However, Claude has been inserting most calls for me (and can now read back the client-side results without any dev interaction!)

- Here is how Claude used gg to fix a layout bug in itself (gg ships with an optional dev console): https://github.com/Leftium/gg/blob/main/references/gg-consol...

---

# I've been prototyping realtime streaming transcription UX: https://rift-transcription.vercel.app

- I really want to use a dictation app in addition to typing on a daily basis, but the current UX of all the apps I've tried is insufficient.

---

# https://veneer.leftium.com is a thin layer over Google forms + sheets

- If you can use Google forms, you can publish a nice-looking web site with an optional form

- Example: https://www.vivimil.com

- Example: https://veneer.leftium.com/s.1RoVLit_cAJPZBeFYzSwHc7vADV_fYL...

- DEMO (feel free to try the sign up feature): https://veneer.leftium.com/g.chwbD7sLmAoLe65Z8


Seems like this one is Windows-only (even though it's Tauri?)

And it's not local (uses a cloud-based transcription API)

It doesn't seem to be realtime streaming, either. To get the most connected typing experience, try showing results within a second of the first word spoken (not after the utterance is complete).
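Streaming STT engines (e.g. the browser's Web Speech API with `interimResults` enabled) emit a mix of interim and final chunks; the display logic is roughly this. (A simplified sketch with a hypothetical chunk shape, not any particular app's code.)

```typescript
// Interim chunks continually replace the tail of the display;
// final chunks are appended permanently.
type Chunk = { text: string; isFinal: boolean };

function mergeTranscript(
    finalSoFar: string,
    chunk: Chunk
): { final: string; display: string } {
    const final = chunk.isFinal ? finalSoFar + chunk.text : finalSoFar;
    const display = chunk.isFinal ? final : final + chunk.text;
    return { final, display };
}
```

The point is that `display` updates on every interim chunk, within the first word spoken, instead of waiting for the utterance to finalize.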

This HN comment captures why realtime streaming is important: https://hw.leftium.com/#/item/47149479

I've also been prototyping realtime streaming transcription with multimodal input: https://rift-transcription.vercel.app

