## Syntagmatic Word Relationships: Conditional Entropy and Mutual Information

### Introduction

As I mentioned in my previous post, there are two types of word associations: paradigmatic and syntagmatic. In this post, I will discuss syntagmatic relationships and how to apply concepts from information theory to mine this type of relationship. We will concentrate on the ideas of entropy (specifically conditional entropy) and mutual information.

### Conditional Entropy

For two words, word 1 and word 2, conditional entropy measures how unpredictable the occurrence of word 1 remains once we know whether word 2 is present or absent. It is calculated by taking each of the four combinations of presence and absence of the two words, multiplying the joint probability of that combination by the log of the corresponding conditional probability of word 1 given word 2, summing the four products, and negating the result. See the equation below:

Conditional Entropy Formula: H(W1|W2) = - P(W1=1,W2=1)*LOG2[P(W1=1|W2=1)] - P(W1=0,W2=1)*LOG2[P(W1=0|W2=1)] - P(W1=1,W2=0)*LOG2[P(W1=1|W2=0)] - P(W1=0,W2=0)*LOG2[P(W1=0|W2=0)]

Where: H(W1|W2) <= H(W1)

Conditional Probability: P(W1|W2) = P(W1,W2) / P(W2)

Joint Probability of 2 discrete random variables: P(W1,W2), the probability that both events hold together (their intersection, not their union)

Conditional entropy can never exceed the entropy of word 1: H(W1|W2) is always less than or equal to H(W1). In other words, knowing whether the second word occurs can only make the occurrence of word 1 more predictable or leave it unchanged; on average, it can never make it more unpredictable.
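To make the formula concrete, here is a minimal Python sketch. It assumes each document is represented as a set of words and estimates every probability by simple counting, with no smoothing. The function name `conditional_entropy` and the toy corpus are my own illustrations, not part of any library.

```python
import math

def conditional_entropy(docs, w1, w2):
    """Estimate H(W1|W2) in bits from a list of documents (sets of words).

    Assumption: probabilities are plain relative frequencies over documents.
    """
    n = len(docs)
    h = 0.0
    for y in (True, False):          # W2 present / absent
        docs_y = [d for d in docs if (w2 in d) == y]
        for x in (True, False):      # W1 present / absent
            count_xy = sum(1 for d in docs_y if (w1 in d) == x)
            if count_xy == 0:
                continue             # treat 0 * log(0) as 0
            p_joint = count_xy / n            # P(W1=x, W2=y)
            p_cond = count_xy / len(docs_y)   # P(W1=x | W2=y)
            h -= p_joint * math.log2(p_cond)
    return h

# Hypothetical toy corpus, one set of words per document
docs = [
    {"the", "cat", "meows"},
    {"the", "dog", "barks"},
    {"cat", "meows"},
    {"dog", "barks"},
]

print(conditional_entropy(docs, "meows", "cat"))  # 0.0
print(conditional_entropy(docs, "meows", "the"))  # 1.0
```

In this toy corpus, "meows" appears in exactly the documents that contain "cat", so knowing about "cat" removes all uncertainty (0 bits). Knowing about "the" tells us nothing, so the conditional entropy stays at the full 1 bit of H(W1).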

### Some definitions:

- H(W1|W2) denotes the conditional entropy of word 1 given word 2
- W1=x and W2=y denote the occurrence or absence of word 1 and word 2 (x is an element of {0,1}) (y is an element of {0,1}). The value 1 represents the presence of the word and 0 represents its absence
- P(W1=x|W2=y) denotes the conditional probability that word 1 takes the value x given that word 2 takes the value y
- P(W1=x,W2=y) denotes the joint probability that word 1 takes the value x and word 2 takes the value y

### Mutual Information

Mutual information is used to measure the reduction in randomness of word 1 that results from knowing whether the second word occurs or is absent. Because it is symmetric, I(W1;W2) = I(W2;W1), and is expressed in the same units for every pair, it can be used to compare the strength of the relationship between any two pairs of words in the document vocabulary.

Mutual Information = I(W1;W2) = H(W1) - H(W1|W2)
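As a rough illustration of how this could be used to rank candidate words, here is a small self-contained Python sketch. It again assumes documents are sets of words and that probabilities are estimated by counting without smoothing. The helper names (`entropy`, `conditional_entropy`, `mutual_information`) and the toy corpus are hypothetical.

```python
import math

def entropy(docs, w):
    """H(W) in bits for the presence/absence of word w."""
    p = sum(1 for d in docs if w in d) / len(docs)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def conditional_entropy(docs, w1, w2):
    """H(W1|W2) in bits, estimated from document counts."""
    n = len(docs)
    h = 0.0
    for y in (True, False):
        docs_y = [d for d in docs if (w2 in d) == y]
        for x in (True, False):
            count_xy = sum(1 for d in docs_y if (w1 in d) == x)
            if count_xy:  # skip empty cells: 0 * log(0) is taken as 0
                h -= (count_xy / n) * math.log2(count_xy / len(docs_y))
    return h

def mutual_information(docs, w1, w2):
    """I(W1;W2) = H(W1) - H(W1|W2)."""
    return entropy(docs, w1) - conditional_entropy(docs, w1, w2)

# Hypothetical toy corpus, one set of words per document
docs = [
    {"the", "cat", "meows"},
    {"the", "dog", "barks"},
    {"cat", "meows"},
    {"dog", "barks"},
]

# Rank every other word by its mutual information with "meows"
vocab = sorted(set().union(*docs) - {"meows"})
ranked = sorted(vocab, key=lambda w: mutual_information(docs, "meows", w), reverse=True)
for w in ranked:
    print(w, round(mutual_information(docs, "meows", w), 3))
```

One thing worth noticing in the output: mutual information is high both for words that tend to appear together with "meows" (such as "cat") and for words that reliably appear without it (such as "barks"), since either kind of knowledge reduces uncertainty. An uninformative word like "the" scores 0.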