A toy example - Word2Vec using torch

Posted on June 6, 2016

lua, torch, word2vec, toy example

Word2Vec has gained a lot of popularity. The beauty of this model is that word vectors are positioned in the vector space such that words sharing common contexts in the corpus lie close to one another.

In the continuous bag of words (CBOW) model, the idea is to predict the center word given its surrounding context words. Below is toy code for a word2vec (CBOW) model that learns word embeddings.

-- A toy code demonstrating word2vec (CBOW)
require 'torch'
require 'nn'


--[[
    Dataset:
    I like cricket
    I play flute
--]]

-- construct vocabulary map
vocab = {}
vocab['I'] = 1
vocab['like'] = 2
vocab['cricket'] = 3
vocab['play'] = 4
vocab['flute'] = 5

-- make dataset
dataset = {}
input1 = torch.Tensor{vocab['I'], vocab['cricket']}
output1 = torch.Tensor{vocab['like']}
input2 = torch.Tensor{vocab['I'], vocab['flute']}
output2 = torch.Tensor{vocab['play']}
dataset[1] = {input1, output1}
dataset[2] = {input2, output2}
function dataset:size() return 2 end -- nn.StochasticGradient requires the dataset to define size()

-- define constants
vocabSize = 5
wordEmbeddingSize = 10
learningRate = 0.01
nepocs = 10

-- model creation
model = nn.Sequential()
model:add(nn.LookupTable(vocabSize, wordEmbeddingSize))
model:add(nn.Mean()) -- averaging words in the input; loses the word order
model:add(nn.Linear(wordEmbeddingSize, vocabSize)) -- project to |v| size representation
model:add(nn.LogSoftMax()) -- log-probabilities over the vocabulary; pairs with ClassNLLCriterion
print(model)

-- define criterion function (loss function)
criterion = nn.ClassNLLCriterion()

-- define trainer
trainer = nn.StochasticGradient(model, criterion)
trainer.learningRate = learningRate
trainer.maxIteration = nepocs

print("Word Lookup before learning")
print(model.modules[1].weight)

-- train model
trainer:train(dataset)

-- get the learned word embeddings
print("Word Lookup after learning")
print(model.modules[1].weight)
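
Since the rows of the lookup table are the learned word vectors (indexed by the ids in the vocab map), a quick sanity check is to compare two of them with cosine similarity. The helpers below are a minimal sketch added for illustration, not part of the original snippet, and with only two training sentences the resulting number is of course just a toy check.

-- Illustration only (assumed helpers, not in the original snippet):
-- compare two learned embeddings with cosine similarity.
function wordVector(word)
    return model.modules[1].weight[vocab[word]] -- row of the LookupTable weight matrix
end

function cosineSimilarity(a, b)
    return torch.dot(a, b) / (a:norm() * b:norm())
end

-- e.g. similarity between 'cricket' and 'flute'
print(cosineSimilarity(wordVector('cricket'), wordVector('flute')))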