Great question. It is not limited to GDPR. The same output can be used to document sub-processors in your privacy notice and vendor disclosures, since it shows which third parties and services personal data flows to in your code.
Thanks for your question. I am one of the co-founders. It is the latter. We analyze the names of functions, methods, and variables to detect likely Personally Identifiable Information (PII), Protected Health Information (PHI), Cardholder Data (CHD), and authentication tokens using well-tuned patterns and language-specific rules. You can see the full list here: https://github.com/hounddogai/hounddog/blob/main/data-elemen...
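To give a rough feel for the name-based approach, here is a toy sketch (not our actual rule set, which is in the linked file): normalize an identifier's camelCase/snake_case into words, then match hypothetical per-category patterns against it.

```python
import re

# Hypothetical example patterns; the real, tuned rules live in the
# data-elements file linked above.
PATTERNS = {
    "PII": re.compile(r"\b(ssn|social security|date of birth|dob)\b"),
    "CHD": re.compile(r"\b(card number|cvv)\b"),
    "Auth": re.compile(r"\b(api key|access token|password)\b"),
}

def classify(identifier: str) -> list[str]:
    """Return the sensitivity categories an identifier name matches."""
    # Split camelCase at case boundaries, then treat underscores as spaces.
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier)
    words = words.replace("_", " ").lower()
    return [cat for cat, pat in PATTERNS.items() if pat.search(words)]

print(classify("userSsn"))            # ["PII"]
print(classify("stripe_card_number")) # ["CHD"]
```

The real engine layers per-language rules on top of this kind of matching, but the core idea is the same: identifier names are a strong signal for what data a variable holds.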
When we find a match, we trace that data through the codebase across different paths and transformations, including reassignment, helper functions, and nested calls. We then identify where the data ultimately ends up, such as third-party SDKs (e.g. Stripe, Datadog, OpenAI), exposures in API protocols like REST, GraphQL, or gRPC, as well as functions that write to logs or local storage. Here's a list of all supported data sinks: https://github.com/hounddogai/hounddog/blob/main/data-sinks....
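For intuition, the tracing step is essentially taint propagation. This is a deliberately simplified illustration (not our engine) using Python's `ast` module: taint spreads through reassignment, and a finding is reported when a tainted name reaches a known sink call. The names `ssn` and `log_info` are made up for the example.

```python
import ast

SENSITIVE = {"ssn"}        # names flagged by the detection step
SINKS = {"log_info"}       # hypothetical logging sink

code = """
ssn = get_user_ssn()
masked = ssn
log_info(masked)
"""

tainted = set(SENSITIVE)
findings = []
for node in ast.walk(ast.parse(code)):
    if isinstance(node, ast.Assign):
        # Reassignment: if the right-hand side mentions a tainted name,
        # the left-hand side becomes tainted too.
        rhs = {n.id for n in ast.walk(node.value) if isinstance(n, ast.Name)}
        if rhs & tainted:
            for target in node.targets:
                if isinstance(target, ast.Name):
                    tainted.add(target.id)
    elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        # Sink call receiving a tainted argument.
        if node.func.id in SINKS:
            args = {n.id for a in node.args for n in ast.walk(a)
                    if isinstance(n, ast.Name)}
            if args & tainted:
                findings.append((node.func.id, node.lineno))

print(findings)  # [('log_info', 4)] -- sensitive data reaching a log sink
```

A production tracer additionally has to handle helper functions, attribute access, nested calls, and multiple languages, but the flagged-source-to-sink shape of the analysis is the same.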
Most privacy frameworks, including GDPR and the US state privacy laws, require these flows to be documented, so we use your source code as the source of truth to keep privacy notices accurate and aligned with what the software is actually doing.
This whole issue makes me wonder whether the real problem is Amazon’s high cost structure rather than RTO policies alone. If employees are forced back into expensive offices just to justify those huge campus investments, maybe the better fix is shedding real estate, not tweaking attendance rules.
Occupied offices are more expensive to operate than vacant ones (more power, more HVAC, more janitorial). Those buildings might be expensive, but using them is more expensive.
This is awesome! Would be cool if these LLM visualizations were turned into teaching tools, like showing how attention moves during generation or how prompts shift the model’s output. Feels like that kind of interactive view could really help people get what’s going on under the hood.
This is really cool. A big YC advantage is instant access to 5,000-plus peer companies as warm leads. That built-in network makes early cross-selling much easier than starting cold.
CPU utilization alone is misleading. Pair it with per-core load average or run-queue length to see how threads are actually queuing. That view often reveals the real bottleneck, whether it is I/O, memory, or scheduling delays.
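A quick way to get that view on a Unix box, using only the standard library: divide the 1-minute load average by the core count. A sustained per-core load well above 1.0 means runnable (or, on Linux, uninterruptibly sleeping, i.e. I/O-blocked) tasks are queuing, even if raw CPU utilization looks moderate.

```python
import os

# 1/5/15-minute load averages; Unix only.
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count() or 1

per_core = load1 / cores
print(f"1-min load {load1:.2f} over {cores} cores -> {per_core:.2f} per core")
if per_core > 1.0:
    print("tasks exceed cores: look at I/O wait, memory pressure, scheduling")
```

For finer detail, `/proc/schedstat` or `vmstat`'s `r` column gives the instantaneous run-queue length rather than a decayed average.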
Very cool project. One option worth exploring is FlatGeobuf. It provides fast, indexed storage for large GeoJSON-style datasets and makes it easy to stream only the features you need per tile. It can slot neatly into a vector tile pipeline and improve both batch and real-time generation.
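The per-tile streaming works because FlatGeobuf's spatial index lets you query by bounding box, so each tile request only needs the bbox of that tile. Assuming the usual slippy-map z/x/y tile scheme (FlatGeobuf itself just takes a lon/lat bbox), the conversion is:

```python
import math

def tile_bbox(z: int, x: int, y: int) -> tuple[float, float, float, float]:
    """Lon/lat bounding box (min_lon, min_lat, max_lon, max_lat) of a
    z/x/y Web Mercator tile, suitable for a FlatGeobuf bbox query."""
    n = 2 ** z
    lon_min = x / n * 360.0 - 180.0
    lon_max = (x + 1) / n * 360.0 - 180.0
    # Inverse Web Mercator for the tile's top and bottom edges.
    lat_max = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    lat_min = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return (lon_min, lat_min, lon_max, lat_max)

print(tile_bbox(0, 0, 0))  # whole world: (-180, ~-85.05, 180, ~85.05)
```

Feed that bbox to whatever FlatGeobuf reader you use (the JS client can even do this over HTTP range requests, so you never download the whole file).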