Hacker News: junyanz's comments

I actually did two internships there. The Seattle lab did ship many new features, like the content-aware fill and shake reduction introduced in Photoshop CC 2015. The researchers there also have a lot of freedom to explore directions not directly related to the products, and it turns out that many of those explorations become product features within a few years.



This is a brilliant idea. I guess it would be difficult to accomplish with this approach, though, as you need to train a neural net on tons of data (like 100k images, or millions), and we cannot find that many paintings with a consistent style.

Work like deep style transfer, or Prisma, can transfer the style of one painting to an existing user photo. But you cannot use it as a painting tool for creating new content.


Thanks!

There's got to be a way, although it might be incestuous: use deep style transfer and/or Prisma to massively increase the body of work by transforming other work into that style, and then use that as training data for this...? Then I guess the artistry is in filtering those images, but that's a lot of images...

OOOOOHHHH WAIT. Remember how there's that dude who gets shown surveillance images from the middle east, and a computer watches his brain for the faster-than-thought responses to there being things in those images? That same trick MIGHT work for artistic sensibilities, but the response might not be identifiable enough.


We are working on something similar to your idea. We generate sketch images from real images automatically and train a model on the sketch images. So ideally, if a user draws the left wheel of a bicycle, the system will produce the entire bicycle sketch. We will release this 'sketch' feature in a few days and hope it will help users sketch objects better.

As you said, one can also apply other filters like Prisma.


Neat! Who's "we"? I'd like to read more.


Sure. The current generative models cannot produce good details, and the generated images are often low resolution (e.g. 64x64). In the paper, we tried to enhance the low-res result by stealing the high-res details from the original photo. But in general, there is not much you can do.

On the other hand, in recent years we have seen dramatic improvements in image quality from these generative models. Overall I think this is a promising and exciting direction.


Yes! In 1948, Shannon proposed using a Markov chain to build a statistical model of the sequences of letters in English text; such a model can then be used to generate random text given some existing text. (http://www.cs.princeton.edu/courses/archive/spr05/cos126/ass...). Here is a GitHub implementation: https://github.com/jsvine/markovify. Deep models like LSTMs/RNNs can probably produce better results.
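Shannon's scheme fits in a few lines. This is a from-scratch order-1 word-level sketch, not the markovify library; the corpus string and function names are just illustrative:

```python
# Shannon-style Markov text generation: record which word follows each word,
# then sample a random walk through that table.
import random

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = {}
    for cur, nxt in zip(words, words[1:]):
        chain.setdefault(cur, []).append(nxt)
    return chain

def generate(chain, start, length=10, seed=0):
    """Random-walk the chain from `start`, emitting up to `length` words."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: the last word never had a successor
            break
        out.append(rng.choice(followers))
    return " ".join(out)

chain = build_chain("the cat sat on the mat and the cat ran")
print(generate(chain, "the"))
```

Repeated bigrams in the corpus make their continuations proportionally more likely, which is exactly the order-1 statistics Shannon described; higher-order versions just key the table on word tuples instead of single words.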


Some fun examples of text generation using LSTM/RNN (and a good overview of RNNs for sequences): http://karpathy.github.io/2015/05/21/rnn-effectiveness/#fun-...


According to a talk by Max Tegmark[0] (and its associated paper[1]), neural nets (particularly LSTMs) might be inherently better at this sort of thing due to the way they model mutual information.

Markov models are best suited to situations where an observation k-steps in the past gives exponentially less information about the present[2] (decaying according to something like λ^k for 0 <= λ < 1). Intuitively, the amount of context imparted by a word or phrase decays somewhat more slowly. That is, if I know the previous five words, I can make a good prediction about the next one, and likely the next one, and slightly less likely the one after that, whereas in a Markovian setting my confidence in my predictions should decay much more quickly.

So in answer to the grandparent, such a thing should be reasonably straightforward to build if it doesn't exist already, and it may offer improvements over a similar model based on Markov chains.

---

0. https://www.youtube.com/watch?v=5MdSE-N0bxs

1. https://arxiv.org/abs/1606.06737

2. Why is this? Lin & Tegmark offer details in the paper, but it comes from the fact that the singular values of the transition matrix are all less than or equal to one (an aperiodic & ergodic transition matrix has only one singular value equal to one), and so the other singular vectors fall away exponentially quickly, with the exponent's base being their corresponding singular value.


It sounds like Tegmark is pointing out a pretty obvious and deliberately designed property of LSTMs... the entire point of them is to avoid exponentially decaying / exploding gradients and allow propagation of information over longer time-scales.


Check out this rather entertaining talk from GitHub universe about the use of an LSTM to generate a film script: https://www.youtube.com/watch?v=W0bVyxi38Bc

and the short film they made, using that script: https://www.youtube.com/watch?v=LY7x2Ihqjmc

(disclosure: i work for github on events/AV)


His subsequent colleagues fired it at Usenet

https://en.wikipedia.org/wiki/Mark_V._Shaney


You are absolutely right. I believe this deep learning technique is a fancy way of mixing many, many images automatically given a user's guidance.


Do you think a similar technique could work for generating 3D models?

For example, it's not hard to imagine future organic sculpting packages (e.g. ZBrush) having this type of tech integrated. Perhaps in-game character sculpting systems as well.


It's possible. But 3D data (3D models, videos) is much more difficult to model with a deep neural net. While most researchers have focused on modeling 2D images in recent years, there is some work on 3D. For example, here is a project on modeling 3D objects like chairs and tables: https://arxiv.org/abs/1411.5928


Thanks for the reply. As coincidence would have it, this appeared on HN just a couple hours after, referencing the same paper:

https://news.ycombinator.com/item?id=12581420

Appears to be fundamentally 2D, but the interpolation between orientations gives it a sort of meta 3D aspect.


Thanks for sharing our work. Check out the full video: https://m.youtube.com/watch?v=9c4z6YsBGQ0

This work is a deep learning extension of our previous average image project: https://m.youtube.com/watch?v=1QgL_aPPCpM. See the New Yorker article for details: http://www.newyorker.com/tech/elements/out-of-many-one

I guess deep learning might be a better way to blend millions of images for creating new visual content.


Thank you all for the interest. The interactive image generation interface, as well as the trained models, is available at https://github.com/junyanz/iGAN. It is still under active development; let me know if you find bugs or would like to suggest new features.

