## Friday, April 19, 2019

## Tuesday, April 16, 2019

### Convolution(1)

- How to compute the size of the Conv layer?

Supoose:

- S:= # Stride
- N := output size
- F:= Filter size
- P:=Padding size

Think about all these

**heads**
$W+2P=(F-1)+(N-1)*S+1$

which gives

$N=\frac{W+2P-F}{S}+1$

## Sunday, April 7, 2019

## Friday, April 5, 2019

### Graph Convolution Neural Nets? Are you out of your mind? How can you do this on a graph?

Recently I am working on reproducing the paper Graph Convolutional Matrix Completion . A natural question to ask is what is

Convolution at first is defined for continuous space, later maybe signal processing guy has extended this idea to discrete space(say image processing, the convolutional kernel you used in convolutional neural network). But what about graphs?

The first idea may appeared in Joan Bruna's this paper. Then extended by Defferrard et al(2016).

The version I encountered is Thomas N. Kipf et al.(2017), which I will introduce here.

The basic setting is here:

where $U$ is the matrix of eigenvectors of the

$g_\theta'(\Lambda) \approx \sum_{k=0}^K \theta_k^{'}T_k(\tilde{\Lambda})$

where $T_k(x)$ is the Chebyshev polynomials. $T_0(x)=1, T_1(x)=x, T_k(x)=2xT_{k-1}(x)-T_{k-2}(x) $

$\tilde{\Lambda}=\frac{2}{\lambda_{max}} \Lambda - I_N$, $\lambda_{max}$ is the largest eigenvalue of $L$

They limit the $K=1$, and assume $\lambda_{max}\approx2$

Now:

$g_{\theta'}*x\approx \theta_0'x+\theta_1^{'}(L-I_N)x=\theta_0^{'}x-\theta_1^{'}D^{-\frac{1}{2}}AD^{-\frac{1}{2}}x$

Further, they assume $\theta=\theta_0^{'}=-\theta_1^{'}$

Apply

**convolution**on graph.Convolution at first is defined for continuous space, later maybe signal processing guy has extended this idea to discrete space(say image processing, the convolutional kernel you used in convolutional neural network). But what about graphs?

The first idea may appeared in Joan Bruna's this paper. Then extended by Defferrard et al(2016).

The version I encountered is Thomas N. Kipf et al.(2017), which I will introduce here.

The basic setting is here:

- $x\in R^N$
- Convolutional kernel/filter $g_\theta=diag(\theta), \theta\in R^N$ lies in the Fourier domain

where $U$ is the matrix of eigenvectors of the

**normalized graph Laplacian**$L=I_N-D^{-1/2}AD^{-1/2}=U \Lambda U^T$**They are not satisfied, yet. Computing $g_\theta$ is expensive, so they use an approximation derived in Hammond et al.(2011), where:**

$g_\theta'(\Lambda) \approx \sum_{k=0}^K \theta_k^{'}T_k(\tilde{\Lambda})$

where $T_k(x)$ is the Chebyshev polynomials. $T_0(x)=1, T_1(x)=x, T_k(x)=2xT_{k-1}(x)-T_{k-2}(x) $

$\tilde{\Lambda}=\frac{2}{\lambda_{max}} \Lambda - I_N$, $\lambda_{max}$ is the largest eigenvalue of $L$

They limit the $K=1$, and assume $\lambda_{max}\approx2$

Now:

$g_{\theta'}*x\approx \theta_0'x+\theta_1^{'}(L-I_N)x=\theta_0^{'}x-\theta_1^{'}D^{-\frac{1}{2}}AD^{-\frac{1}{2}}x$

Further, they assume $\theta=\theta_0^{'}=-\theta_1^{'}$

Apply

**re-normalization trick**to prevent exploding/vanishing gradients problem:- $\tilde{A}=A+I_N$
- $\tilde{D}_{ii}=\sum_j \tilde{A}_{ij}$
- $I_N+D^{-\frac{1}{2}}AD^{-\frac{1}{2}} \rightarrow \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}}$

**Matrix representation:**

Suppose $X\in R^{N \times C} $ is the signal/input, where C is the dim of input channels/features. $\Theta\in R^{C \times F }$ a matrix of filter parameters. $Z \in R^{N \times F}$ is the convolved

input matrix.

$Z=\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X \Theta $

Subscribe to:
Posts (Atom)