I got the impression that Gretel will be in the loop when creating the synthetic DB. What's not clear is whether the Amplify model is trained locally or not. I ask since privacy and security were mentioned as motivations. Can you clarify who gets to see what in this process?
Great question! The short answer is that Gretel supports multiple deployment options, depending on your specific circumstances and needs. For a more detailed technical answer, I recommend joining our Discord and asking there; one of our engineers will follow up. Hope that's helpful!
https://grtl.ai/discord
If you’re a Vertex AI user, here’s how you can use Gretel to create high-quality synthetic tabular data to serve as training data for a classification model.
Gretel introduces Reinforcement Learning from Privacy Feedback (RLPF), a method that can be used to align large language models (LLMs) to improve generative quality while also making them more privacy-preserving. Language models leaking proprietary data or custom prompts is a problem that's currently plaguing many generative AI applications. We propose RLPF to mitigate some of these issues. We also suggest future directions to reduce bias, discrimination, and other harmful characteristics that might exist in today’s language models.
In case you missed it, several sessions for _synthesize2023 were announced this week. This event is free and open to all who are interested in learning about state-of-the-art applications for synthetic data and generative AI.
Here are some of the speaker highlights:
- Keynote speaker: Sridhar Ramaswamy, CEO and Cofounder at Neeva and n.xyz, former SVP of Engineering and Ads at Google
- Google research scientist Peter Kairouz will discuss how privacy-enhancing technologies (PETs) like synthetic data and federated learning are helping advance the science and safe application of foundation models.
- Illumina's Senior Director of Emerging Solutions, Pam Cheng, will highlight how synthetic data enables medical and life science research and product development.
- NVIDIA product manager Nyla Worker will demonstrate how to train a perception model using an SDK for creating 3D synthetic data.
Data sharing is central to modern business but entails risks. Synthetic data can enable data sharing while reducing the risk of privacy-compromising linkage attacks.
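To make the risk concrete, here is a toy sketch of how a linkage attack works: an attacker joins an "anonymized" release against public data on shared quasi-identifiers such as ZIP code, birth year, and gender. All names, columns, and records below are fabricated for illustration.

```python
# Toy linkage attack: re-identify records in an "anonymized" release
# by joining on quasi-identifiers. All data here is fabricated.

# Anonymized release: direct identifiers removed, but
# quasi-identifiers (zip, birth_year, gender) retained.
anonymized = [
    {"zip": "10001", "birth_year": 1985, "gender": "F", "diagnosis": "asthma"},
    {"zip": "10002", "birth_year": 1990, "gender": "M", "diagnosis": "diabetes"},
]

# Public voter roll with names and the same quasi-identifiers.
voter_roll = [
    {"name": "Alice", "zip": "10001", "birth_year": 1985, "gender": "F"},
    {"name": "Bob", "zip": "10002", "birth_year": 1990, "gender": "M"},
]

QUASI_IDS = ("zip", "birth_year", "gender")

def link(release, public):
    """Join two datasets on quasi-identifiers, re-attaching names."""
    index = {tuple(row[k] for k in QUASI_IDS): row["name"] for row in public}
    matches = {}
    for row in release:
        key = tuple(row[k] for k in QUASI_IDS)
        if key in index:
            matches[index[key]] = row["diagnosis"]
    return matches

print(link(anonymized, voter_roll))  # every record re-identified via the join
```

Synthetic data blunts this attack because synthetic records are sampled from a learned distribution rather than corresponding one-to-one with real individuals, so there is no real person behind a matched row.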
- Performance metrics for evaluating the quality of data
- How to interpret data quality scores
- Use cases for both low fidelity and high fidelity synthetic data
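To give a feel for what a data quality score measures, here is a minimal illustrative metric (not Gretel's actual scoring algorithm): compare each column's empirical distribution in the real vs. synthetic data using total variation distance, then average and rescale to a 0-100 score.

```python
# Illustrative fidelity metric (NOT Gretel's actual quality score):
# per-column total variation distance, averaged into a 0-100 score.
from collections import Counter

def column_tvd(real_col, synth_col):
    """Total variation distance between two empirical distributions."""
    real_freq = Counter(real_col)
    synth_freq = Counter(synth_col)
    categories = set(real_freq) | set(synth_freq)
    n_real, n_synth = len(real_col), len(synth_col)
    return 0.5 * sum(
        abs(real_freq[c] / n_real - synth_freq[c] / n_synth)
        for c in categories
    )

def fidelity_score(real_rows, synth_rows, columns):
    """Average per-column similarity, scaled to 0 (worst) .. 100 (best)."""
    tvds = [
        column_tvd([r[c] for r in real_rows], [s[c] for s in synth_rows])
        for c in columns
    ]
    return 100 * (1 - sum(tvds) / len(tvds))

real = [{"plan": "basic"}, {"plan": "basic"}, {"plan": "pro"}, {"plan": "pro"}]
synth = [{"plan": "basic"}, {"plan": "pro"}, {"plan": "pro"}, {"plan": "pro"}]
print(fidelity_score(real, synth, ["plan"]))  # 75.0
```

A high score on a metric like this indicates the synthetic data preserves the real data's statistical shape (high fidelity); lower scores can still be acceptable for low-fidelity use cases such as testing pipelines.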
“Data is the lifeblood of modern artificial intelligence. Getting the right data is both the most important and the most challenging part of building powerful AI. Collecting quality data from the real world is complicated, expensive and time-consuming. This is where synthetic data comes in.”
How we implemented a practical attack on a synthetic data model to validate its ability to protect sensitive information under different parameter settings.
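The teaser above doesn't spell out the attack, but one common check in this family (a hypothetical sketch, not necessarily the specific attack described) is testing whether the model memorized its training data: synthetic records that are exact or near copies of training records suggest sensitive rows could leak.

```python
# Hypothetical sketch of one simple privacy check (not the specific
# attack from the post): flag synthetic records that are exact or
# near copies of training records, a sign of memorization.

def hamming(a, b):
    """Number of fields on which two records differ."""
    return sum(x != y for x, y in zip(a, b))

def leaked_records(train, synthetic, max_diff=0):
    """Return synthetic rows within max_diff fields of a training row."""
    return [
        s for s in synthetic
        if any(hamming(s, t) <= max_diff for t in train)
    ]

train = [("10001", 1985, "asthma"), ("10002", 1990, "diabetes")]
synthetic = [("10001", 1985, "asthma"),   # exact copy -> flagged
             ("94107", 1972, "flu")]      # novel record -> fine

print(leaked_records(train, synthetic))
```

Sweeping the model's privacy-related parameter settings and re-running a check like this shows how those settings trade off memorization risk against fidelity.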