Do these transformations leave any discrepancy or signature in the video or audio that would be detectable by a machine? (Even very tiny discrepancies might be enough.) Someone could make a browser plugin to alert the user when video/audio has a good chance of being fake.
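If one piece of software can identify a video as fake, the software that generates the fake can be improved until that isn't the case anymore. This is actually how such fakes are already built: in a Generative Adversarial Network (GAN), a generator is trained against a detector (the "discriminator") until the detector can no longer tell its output from the real thing. Search for "Generative Adversarial Network" for more info and background.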
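To make that arms race concrete, here is a toy sketch in Python. It is not a real GAN (no neural networks, no gradients), just an illustration under made-up assumptions: each clip is reduced to a single hypothetical "artifact score", real clips score around 0, and fakes initially score around 3. The function name `adversarial_rounds` and all the numbers are invented for the example.

```python
import random
import statistics


def adversarial_rounds(rounds=30, n=2000, step=0.5, seed=0):
    """Toy detector-vs-generator loop; returns detector accuracy per round."""
    rng = random.Random(seed)
    fake_mean = 3.0  # assumption: fakes start with an obvious artifact
    history = []
    for _ in range(rounds):
        # Hypothetical 1-D "artifact scores" for real and fake clips.
        real = [rng.gauss(0.0, 1.0) for _ in range(n)]
        fake = [rng.gauss(fake_mean, 1.0) for _ in range(n)]

        # Detector: retrain by placing a threshold halfway between the
        # two sample means; anything above it is flagged as fake.
        threshold = (statistics.fmean(real) + statistics.fmean(fake)) / 2
        correct = sum(x <= threshold for x in real)
        correct += sum(x > threshold for x in fake)
        history.append(correct / (2 * n))

        # Generator: use the detector's boundary as feedback and shift
        # its output toward it, shrinking the artifact.
        fake_mean += step * (threshold - fake_mean)
    return history


if __name__ == "__main__":
    acc = adversarial_rounds()
    print(f"detector accuracy, first round: {acc[0]:.2f}")
    print(f"detector accuracy, last round:  {acc[-1]:.2f}")
```

The detector starts out catching most fakes, but every time it is retrained the generator uses its decision boundary to close the gap, so its accuracy drifts toward a coin flip. A detection plugin would be in the same position: any detector that is publicly available can be used as the training signal for the next generation of fakes.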