DTLC-GAN
1. Generative Adversarial Image
Synthesis with Decision Tree
Latent Controller (CVPR2018)
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
NTT Communication Science Laboratories, NTT
Corporation
Presenter: Seitaro Shinagawa (NAIST/RIKEN)
2018/8/24 2018ⒸSeitaro Shinagawa AHC-lab NAIST
※Figures are quoted from the authors’ paper and poster
[project page]:http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/dtlc-gan/
2. Self-introduction
Favorite model(?): Tay
Interest: interaction between humans and machines
Research topic: dialog-based image generation
1989 Born in Sapporo
2009-2015 Tohoku Univ.
2015- NAIST (Ph.D. student)
3. Summary
In the image generation task, DTLC-GAN divides the latent variable into a
controllable, tree-structured part and an uncontrollable part.
(demo image)
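The tree-structured part of the latent variable can be sketched as follows: each layer of the tree contributes a one-hot block, and only the children of the node selected in the previous layer may be active. A minimal sketch of the idea (hypothetical helper, not the authors' code; `ks` lists the branching factor per layer):

```python
import random

def sample_tree_code(ks):
    """Sample a hierarchical (tree-structured) latent code.

    ks: branching factors per layer, e.g. [3, 2] means 3 first-layer
    categories, each with 2 children. Returns one list per layer;
    each layer is one-hot, and the active entry in layer l is always
    a child of the active entry in layer l-1.
    """
    code, parent, n_nodes = [], 0, 1
    for k in ks:
        n_nodes *= k                 # total nodes in this layer
        layer = [0.0] * n_nodes
        child = random.randrange(k)  # pick one child of the current node
        parent = parent * k + child  # index of the selected node
        layer[parent] = 1.0          # one-hot within this layer
        code.append(layer)
    return code
```

Concatenating these blocks with the noise vector z gives the generator input; the tree constraint is what lets deeper layers refine, rather than override, the choice made by shallower layers.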
5. What is a “good representation”?
Yoshua Bengio, Aaron Courville, and Pascal Vincent, 2014
Representation Learning: A Review and New Perspectives
“In the case of probabilistic models, a good representation is often one that
captures the posterior distribution of the underlying explanatory factors for the
observed input. A good representation is also one that is useful as input to a
supervised predictor.”
A good representation is:
• composed of explanatory factors
• a good input for training a new predictor
• independently controllable
Emmanuel Bengio et al., 2017
Independently Controllable Features
“... assume that there are factors of variation underlying the observations coming
from an interactive environment that are “independently controllable.” ... ”
In summary, in a “good representation” each element of the latent vector
captures an independent meaning or concept.
6. Unsupervised disentanglement
Supervised learning requires annotation!
We are exhausted by annotation!
Some annotation tasks are difficult because the annotations are noisy!
Unsupervised learning can reduce annotation cost!
Previous works: InfoGAN [Chen+, NIPS2016], beta-VAE [Higgins+, ICLR2017]
My concern: how does the tree structure help in image generation?
Horii-san's introduction to InfoGAN at the 1st Kansai NIPS reading group is
recommended (↓ these slides are in Japanese)
https://www.slideshare.net/takato_horii/nips-horii
7. Related work: InfoGAN [Chen+, NIPS2016]
(Diagram: the generator Gen takes (z, c) and produces a fake image; the
discriminator Dis outputs real/fake and the predicted code c’.)
• c: discrete latent code
• z: vector derived from random noise
• c’: predicted latent code
The point for disentanglement: maximize the mutual information I(c; G(z, c)),
i.e., learn to make c and G(z, c) highly correlated.
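In practice the mutual information term is maximized via a variational lower bound: for a categorical code, up to the constant entropy H(c), maximizing I(c; G(z, c)) reduces to minimizing the cross-entropy between the sampled code c and the auxiliary head's prediction c'. A minimal sketch (function name and logit representation are illustrative, not InfoGAN's actual code):

```python
import math

def info_loss(c_index, q_logits):
    """Negative log-likelihood -log q(c | G(z, c)) of the true
    categorical code index under the auxiliary head's softmax.
    Minimizing this (plus the constant H(c)) maximizes the
    variational lower bound on I(c; G(z, c))."""
    m = max(q_logits)                           # stable softmax
    exps = [math.exp(v - m) for v in q_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[c_index])
```

When the auxiliary head is uninformative (uniform logits), the loss equals log k for a k-way code; it shrinks as the head learns to recover c from the generated image, which is exactly the "highly correlated" behavior the slide describes.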
21. MNIST comparison with InfoGAN
(Figure: generated digit grids; one category code is set to 1 (ON) and the
others to 0 (OFF); the noise is fixed in each row.)
In DTLC-GAN, layer l=1 captures the digit class and layer l=2 the font style,
whereas InfoGAN failed to capture the digit class.
22. Image quality comparison
Adversarial Accuracy: accuracy of two classifiers, one trained on generated
images and one on real images
Adversarial Divergence: KL divergence between the two classifiers' output
distributions
The image quality of DTLC-GAN is not worse than that of the other methods
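The adversarial divergence above boils down to a KL divergence between the two classifiers' predictive distributions. A generic sketch (the averaging over the test set used in the evaluation is omitted):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete output distributions, e.g.
    a classifier trained on generated images (p) vs. one trained
    on real images (q). Lower means the generated images train a
    classifier that behaves like the real-data one."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))
```

KL is zero only when the two distributions agree, so a small adversarial divergence indicates that generated images carry roughly the same class information as real ones.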
23. Effectiveness of curriculum with CIFAR-10
Weakly supervised setting: the first layer is composed of known labels
(the nodes for known labels are fixed)
Evaluation metric: structural similarity (SSIM) between two images generated
from different latent codes, over 50,000 randomly sampled pairs
(all previous-layer codes and the noise value are fixed)
SSIM should become higher as the evaluated layer gets lower (deeper),
since deeper codes should make only small changes
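SSIM itself can be sketched in its simplest single-window form (the evaluation above applies it over many sampled pairs; the constants c1 and c2 follow the common defaults for pixel intensities in [0, 1], and the windowed, Gaussian-weighted variant used in practice is omitted):

```python
def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM between two images given as flat lists of
    pixel intensities in [0, 1]; 1.0 means identical statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                       # means
    vx = sum((a - mx) ** 2 for a in x) / n                # variances
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Identical images score 1.0, so a layer whose code flips produce near-identical images (high SSIM) is only making fine-grained changes, which is the signature of a well-formed hierarchy.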
24. w/o curriculum result
The latent codes of all layers have low SSIM,
so we cannot find a hierarchical structure...
25. w/ curriculum result (proposed)
SSIM becomes larger for lower-layer codes!
The latent codes are well hierarchically organized!
(plot annotations: start of l=3 training, start of l=4 training)
26. Continuous codes result with 3D Faces
k1 = 5 (discrete), k2 = 1 (continuous); the continuous code is swept from -1 to 1
Layer 2 expresses the angle of the face
27. Image retrieval with CelebA
Top-3 images retrieved by the L2 distance between the predicted labels
ĉ1, ĉ2, ĉ3 of the query and of the candidate images in the database.
Using lower (deeper) predicted labels in the L2 distance, more and more
suitable images appear!?
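The retrieval step above is a nearest-neighbour search on the predicted codes. A generic sketch (the codes would be the concatenated ĉ1..ĉl predicted by the auxiliary network; the helper name is illustrative):

```python
def retrieve_top3(query_code, db_codes):
    """Return the indices of the 3 database entries whose predicted
    latent codes are closest to the query's in L2 distance."""
    def l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(range(len(db_codes)),
                    key=lambda i: l2(query_code, db_codes[i]))
    return ranked[:3]
```

Because deeper codes encode finer attributes, including them in the distance narrows the match from coarse categories to visually similar images, which is the effect the slide observes.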
28. Conclusion
DTLC-GAN can be seen as an extension of InfoGAN
The latent code acquires a hierarchical structure
The HCMI loss and curriculum learning help to obtain an
interpretable (disentangled) representation
The generated results are as good as those of other GAN methods
My comments and questions
It is useful that DTLC-GAN can handle large changes and
small changes separately, in a few stages
The tree structure has to be defined in advance
• Is a progressive-growing style possible?