DTLC-GAN
1. Generative Adversarial Image
Synthesis with Decision Tree
Latent Controller (CVPR2018)
Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino
NTT Communication Science Laboratories, NTT
Corporation
Presenter: Seitaro Shinagawa (NAIST/RIKEN)
2018/8/24 2018ⒸSeitaro Shinagawa AHC-lab NAIST
※Figures are quoted from the authors’ paper and poster
[project page]:http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/dtlc-gan/
2. Self-introduction
Favorite model(?): Tay
Interest: interaction between humans and machines
Research topic: dialog-based image generation
1989 Born in Sapporo
2009-2015 Tohoku Univ.
2015- NAIST (Ph.D. student)
3. Summary
In the image generation task, DTLC-GAN divides the latent variable into a
controllable, tree-structured part and an uncontrollable part.
(demo image)
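The tree-structured part of the latent variable can be sketched as follows: each layer of the tree contributes a one-hot block, and only the children of the node selected in the previous layer may be active. A minimal sketch of the idea (hypothetical helper, not the authors' code; `ks` lists the branching factor per layer):

```python
import random

def sample_tree_code(ks):
    """Sample a hierarchical (tree-structured) latent code.

    ks: branching factors per layer, e.g. [3, 2] means 3 first-layer
    categories, each with 2 children. Returns one list per layer;
    each layer is one-hot, and the active entry in layer l is always
    a child of the active entry in layer l-1.
    """
    code, parent, n_nodes = [], 0, 1
    for k in ks:
        n_nodes *= k                 # total nodes in this layer
        layer = [0.0] * n_nodes
        child = random.randrange(k)  # pick one child of the current node
        parent = parent * k + child  # index of the selected node
        layer[parent] = 1.0          # one-hot within this layer
        code.append(layer)
    return code
```

Concatenating these blocks with the noise vector z gives the generator input; the tree constraint is what lets deeper layers refine, rather than override, the choice made by shallower layers.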
5. What is a “good representation”?
Yoshua Bengio, Aaron Courville, and Pascal Vincent, 2014
Representation Learning: A Review and New Perspectives
“In the case of probabilistic models, a good representation is often one that
captures the posterior distribution of the underlying explanatory factors for the
observed input. A good representation is also one that is useful as input to a
supervised predictor.”
A good representation is:
• composed of explanatory factors
• a good input for training a new predictor
• independently controllable
Emmanuel Bengio et al., 2017
Independently Controllable Features
“... assume that there are factors of variation underlying the observations coming
from an interactive environment that are “independently controllable.” ... ”
In summary, in a “good representation” each element of the latent vector
captures an independent meaning or concept.
6. Unsupervised disentanglement
Supervised learning requires annotation!
We are exhausted by annotation!
Some annotation tasks are difficult because the annotations are noisy!
Unsupervised learning can reduce annotation cost!
Previous works: InfoGAN [Chen+, NIPS2016], beta-VAE [Higgins+, ICLR2017]
My concern: how does the tree structure help in image generation?
Horii-san's introduction to InfoGAN at the 1st Kansai NIPS reading group is
recommended (↓ these slides are in Japanese)
https://www.slideshare.net/takato_horii/nips-horii
7. Related work: InfoGAN [Chen+, NIPS2016]
(Diagram: the generator Gen takes (z, c) and produces a fake image; the
discriminator Dis outputs real/fake and the predicted code c’.)
• c: discrete latent code
• z: vector derived from random noise
• c’: predicted latent code
The point for disentanglement: maximize the mutual information I(c; G(z, c)),
i.e., learn to make c and G(z, c) highly correlated.
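In practice the mutual information term is maximized via a variational lower bound: for a categorical code, up to the constant entropy H(c), maximizing I(c; G(z, c)) reduces to minimizing the cross-entropy between the sampled code c and the auxiliary head's prediction c'. A minimal sketch (function name and logit representation are illustrative, not InfoGAN's actual code):

```python
import math

def info_loss(c_index, q_logits):
    """Negative log-likelihood -log q(c | G(z, c)) of the true
    categorical code index under the auxiliary head's softmax.
    Minimizing this (plus the constant H(c)) maximizes the
    variational lower bound on I(c; G(z, c))."""
    m = max(q_logits)                           # stable softmax
    exps = [math.exp(v - m) for v in q_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -math.log(probs[c_index])
```

When the auxiliary head is uninformative (uniform logits), the loss equals log k for a k-way code; it shrinks as the head learns to recover c from the generated image, which is exactly the "highly correlated" behavior the slide describes.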
21. MNIST comparison with InfoGAN
(Figure: generated digit grids; one category code is set to 1 (ON) and the
others to 0 (OFF); the noise is fixed in each row.)
In DTLC-GAN, layer l=1 captures the digit class and layer l=2 the font style,
whereas InfoGAN failed to capture the digit class.
22. Image quality comparison
Adversarial Accuracy: accuracy of two classifiers, one trained on generated
images and one on real images
Adversarial Divergence: KL divergence between the two classifiers' output
distributions
The image quality of DTLC-GAN is not worse than that of the other methods
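The adversarial divergence above boils down to a KL divergence between the two classifiers' predictive distributions. A generic sketch (the averaging over the test set used in the evaluation is omitted):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete output distributions, e.g.
    a classifier trained on generated images (p) vs. one trained
    on real images (q). Lower means the generated images train a
    classifier that behaves like the real-data one."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))
```

KL is zero only when the two distributions agree, so a small adversarial divergence indicates that generated images carry roughly the same class information as real ones.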
23. Effectiveness of curriculum with CIFAR-10
Weakly supervised setting: the first layer is composed of known labels
(the nodes for known labels are fixed)
Evaluation metric: structural similarity (SSIM) between two images generated
from different latent codes, over 50,000 randomly sampled pairs
(all previous-layer codes and the noise value are fixed)
SSIM should become higher as the evaluated layer gets lower (deeper),
since deeper codes should make only small changes
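SSIM itself can be sketched in its simplest single-window form (the evaluation above applies it over many sampled pairs; the constants c1 and c2 follow the common defaults for pixel intensities in [0, 1], and the windowed, Gaussian-weighted variant used in practice is omitted):

```python
def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM between two images given as flat lists of
    pixel intensities in [0, 1]; 1.0 means identical statistics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                       # means
    vx = sum((a - mx) ** 2 for a in x) / n                # variances
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Identical images score 1.0, so a layer whose code flips produce near-identical images (high SSIM) is only making fine-grained changes, which is the signature of a well-formed hierarchy.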
24. w/o curriculum result
The latent codes of all layers have low SSIM,
so we cannot find a hierarchical structure...
25. w/ curriculum result (proposed)
SSIM becomes larger for lower-layer codes!
The latent codes are well hierarchically organized!
(plot annotations: start of l=3 training, start of l=4 training)
26. Continuous codes result with 3D Faces
k1 = 5 (discrete), k2 = 1 (continuous); the continuous code is swept from -1 to 1
Layer 2 expresses the angle of the face
27. Image retrieval with CelebA
Top-3 images retrieved by the L2 distance between the predicted labels
ĉ1, ĉ2, ĉ3 of the query and of the candidate images in the database.
Using lower (deeper) predicted labels in the L2 distance, more and more
suitable images appear!?
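The retrieval step above is a nearest-neighbour search on the predicted codes. A generic sketch (the codes would be the concatenated ĉ1..ĉl predicted by the auxiliary network; the helper name is illustrative):

```python
def retrieve_top3(query_code, db_codes):
    """Return the indices of the 3 database entries whose predicted
    latent codes are closest to the query's in L2 distance."""
    def l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    ranked = sorted(range(len(db_codes)),
                    key=lambda i: l2(query_code, db_codes[i]))
    return ranked[:3]
```

Because deeper codes encode finer attributes, including them in the distance narrows the match from coarse categories to visually similar images, which is the effect the slide observes.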
28. Conclusion
DTLC-GAN can be seen as an extension of InfoGAN
The latent code acquires a hierarchical structure
The HCMI loss and curriculum learning help to obtain an
interpretable (disentangled) representation
The generated results are as good as those of other GAN methods
My comments and questions
It is useful that DTLC-GAN can handle large changes and
small changes separately, in a few stages
The tree structure has to be defined in advance
• Is a progressive-growing style possible?