# Categorical variable reduction using a NN

```python
from __future__ import print_function

import random

import numpy as np
import pandas as pd
import tensorflow as tf

from tensorflow.contrib import layers
from tensorflow.contrib import learn
from sklearn.preprocessing import LabelEncoder
```

My dataset looks like the following. It has 2 independent variables ('X1' & 'X2') and 1 dependent variable ('label'). 'X1' is the categorical variable. I want to create an embedding vector for it and run a simple linear regression to predict 'label' using TensorFlow. I could use any other method, but since linear regression is the easiest to understand, I'm trying that.

```python
df = pd.DataFrame({
    'X1': np.array(["A", "A", "B", "C", "B", "C", "B", "C", "C", "B",
                    "A", "B", "A", "C", "A", "A", "C"]),
    'X2': np.array([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                    7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1]),
    'label': np.array([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                       2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
})
```

# For the variable 'X1', I am creating integer levels.

```python
encoder = LabelEncoder()
encoder.fit(df.X1.values)
X = encoder.transform(df.X1.values)
```
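`LabelEncoder` just maps each distinct string to an integer index (here A→0, B→1, C→2). The same mapping can be checked with plain numpy via `np.unique` (a small sketch; numpy stands in for sklearn here):

```python
import numpy as np

x1 = np.array(["A", "A", "B", "C", "B", "C", "B", "C", "C", "B",
               "A", "B", "A", "C", "A", "A", "C"])

# np.unique with return_inverse=True gives the sorted categories and,
# for each element, the index of its category -- the same integer codes
# LabelEncoder produces for sorted string labels.
levels, codes = np.unique(x1, return_inverse=True)
print(levels)      # ['A' 'B' 'C']
print(codes[:5])   # [0 0 1 2 1]
```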

# Recreating the dependent-variable list.

```python
y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
```

# Setting the hyperparameters

```python
training_epochs = 5
learning_rate = 1e-3
cardinality = len(np.unique(X))  # number of distinct categories (3 here)
embedding_size = 2
input_X_size = 1
n_hidden = 10
```

# Setting up the variables:

```python
embeddings = tf.Variable(tf.random_uniform([cardinality, embedding_size], -1.0, 1.0))

h = tf.Variable(tf.truncated_normal((embedding_size + len(df.X1), n_hidden), stddev=0.1))

W_out = tf.get_variable(name='out_w', shape=[n_hidden],
                        initializer=tf.contrib.layers.xavier_initializer())
```

# The embedding:

```python
embedded_chars = tf.nn.embedding_lookup(embeddings, x)
embedded_chars = tf.reshape(embedded_chars, [-1])
embedded_chars = embedded_chars + np.array([3.3, 4.4, 5.5, 6.71, 6.93, 4.168, 9.779, 6.182, 7.59, 2.167,
                                            7.042, 10.791, 5.313, 7.997, 5.654, 9.27, 3.1])
```
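It may help to trace the shapes here with numpy (numpy fancy indexing mimics the TF lookup, shapes only): looking up 17 indices in a `[3, 2]` embedding table yields a `[17, 2]` tensor, and `reshape(..., [-1])` flattens it to rank 1:

```python
import numpy as np

embeddings = np.random.uniform(-1.0, 1.0, size=(3, 2))  # [cardinality, embedding_size]
codes = np.array([0, 0, 1, 2, 1, 2, 1, 2, 2, 1, 0, 1, 0, 2, 0, 0, 2])

looked_up = embeddings[codes]   # fancy indexing ~ tf.nn.embedding_lookup
print(looked_up.shape)          # (17, 2)

flat = looked_up.reshape(-1)    # ~ tf.reshape(..., [-1])
print(flat.shape, flat.ndim)    # (34,) 1  -- rank 1, not rank 2
```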

# Multiplying by the hidden layer:

```python
layer_1 = tf.matmul(embedded_chars, h)
layer_1 = tf.nn.relu(layer_1)
out_layer = tf.matmul(layer_1, W_out)
```

# Define the loss and optimizer

```python
cost = tf.reduce_sum(tf.pow(out_layer - y, 2)) / (2 * n_samples)
```
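Note that `n_samples` is never defined in the snippet; presumably it should be the number of rows, `len(y)` = 17. The cost itself is just a scaled sum of squared errors; in numpy terms (a sketch with zero predictions for illustration, not the model's actual output):

```python
import numpy as np

y = np.asarray([1.7, 2.76, 2.09, 3.19, 1.694, 1.573, 3.366, 2.596, 2.53, 1.221,
                2.827, 3.465, 1.65, 2.904, 2.42, 2.94, 1.3])
n_samples = len(y)  # 17 -- this is what n_samples should be bound to

pred = np.zeros_like(y)  # placeholder predictions, just to show the formula
cost = np.sum((pred - y) ** 2) / (2 * n_samples)
print(cost)
```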

# Run the graph

```python
init = tf.global_variables_initializer()
```

```python
# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(training_epochs):
        avg_cost = 0.

        _, c = sess.run([optimizer, cost],
                        feed_dict={x: X, y: Y})
    print("Ran without Error")
```

When I run the code, I get the following error:

ValueError: Shape must be rank 2 but is rank 1 for 'MatMul_1' (op: 'MatMul') with input shapes: [17], [19,10].

Thank you!


## 1 Answer

You are multiplying matrices, so the dimensions have to match. The problem is this line:

```python
embedded_chars = tf.reshape(embedded_chars, [-1])
```

`reshape(..., [-1])` flattens the lookup result to a rank-1 tensor, but `tf.matmul` requires both of its arguments to be rank 2, hence the error.
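One way to make the shapes line up (my sketch of a possible fix, not part of the original answer): keep the embedding output rank 2 as `[17, embedding_size]`, append `X2` as an extra feature column instead of adding it elementwise, and size the hidden weights `[embedding_size + 1, n_hidden]`. Checking the shape arithmetic with numpy (the TF version would use `tf.reshape`, `tf.concat`, and `tf.matmul`):

```python
import numpy as np

n_rows, embedding_size, n_hidden = 17, 2, 10

looked_up = np.random.randn(n_rows, embedding_size)  # rank-2 embedding output
x2 = np.random.randn(n_rows, 1)                      # X2 as a column vector

features = np.concatenate([looked_up, x2], axis=1)   # [17, 3] ~ tf.concat
h = np.random.randn(embedding_size + 1, n_hidden)    # [3, 10] hidden weights

layer_1 = features @ h                               # rank 2 x rank 2: works
print(layer_1.shape)                                 # (17, 10)
```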
