May 16, 2013

Compare similarity of two strings in terms of percentage

In this post, I show how to check the similarity between two strings and express it as a percentage, a form that end users can read easily. A simple formula gives the match in %. The examples below cover two kinds of input:

  • One is a string array
  • The other is a sentence string that is split to make an array
The formula to check similarity is:

Similarity (%) = 100 * (commonItems * 2) / (total items in string1 + total items in string2)
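
As a quick check of the formula, take the array example used in the code below: the first list has 6 items, the second has 3, and 3 items are common to both. Written this way, the formula is the Sørensen-Dice coefficient scaled to a percentage. Plugging in the numbers gives:

```latex
\text{Similarity (\%)} = 100 \cdot \frac{2 \cdot 3}{6 + 3} = \frac{600}{9} \approx 66.67
```

Two identical lists give 100%, and two lists with nothing in common give 0%.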

Let's start coding. Here I show the code logic in both C# and VB.NET.

------
C# :
------
----------------------
String as Array:
----------------------
// Requires: using System; using System.Linq;
// The formula to calculate the similarity between two strings is:
// Similarity (%) = 100 * (commonItems * 2) / (total items in string1 + total items in string2)

int list1Length, list2Length, commonItemLength;

var list1 = new string[] { "1", "2", "3", "4", "5", "6" };
var list2 = new string[] { "2", "3", "4" };
var commonList = list1.Intersect(list2); // distinct items present in both arrays

list1Length = list1.Length;
list2Length = list2.Length;
commonItemLength = commonList.Count();

// 100.0 forces floating-point division; with integers only, the fraction would be truncated.
double similarity = 100.0 * (commonItemLength * 2) / (list1Length + list2Length);
Console.WriteLine("Similarity b/w strings: " + Math.Round(similarity, 2) + "%");
----------------------------
String as Sentence:
----------------------------
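// Requires: using System; using System.Linq;
// The formula to calculate the similarity between two strings is:
// Similarity (%) = 100 * (commonItems * 2) / (total items in string1 + total items in string2)

int list1Length, list2Length, commonItemLength;

string string1 = "jedan, dva, tri, cetri, PET, sest45 sedamytyty osam";
string string2 = "dva, cetri, pet, sedam88 dvadeset osamdeset";

// Split on commas, then trim each item so that " dva" and "dva" compare as equal.
var string1Split = string1.Split(',').Select(item => item.Trim()).ToArray();
var string2Split = string2.Split(',').Select(item => item.Trim()).ToArray();

// Intersect is case-sensitive by default, so "PET" and "pet" are treated as different items.
var commonList = string1Split.Intersect(string2Split);

list1Length = string1Split.Length;
list2Length = string2Split.Length;
commonItemLength = commonList.Count();

// 100.0 forces floating-point division; with integers only, the fraction would be truncated.
double similarity = 100.0 * (commonItemLength * 2) / (list1Length + list2Length);
Console.WriteLine("Similarity b/w strings: " + Math.Round(similarity, 2) + "%");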
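
The sentence logic above can also be wrapped in a reusable method. The following is only a minimal sketch under a few assumptions of mine: items are trimmed, empty items are dropped, duplicates are counted once, and the comparison ignores case. The names SimilarityDemo, Tokenize and Percent are hypothetical and not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimilarityDemo
{
    // Hypothetical helper: splits on commas, trims whitespace, drops empty
    // items, and stores the result in a case-insensitive set (no duplicates).
    public static HashSet<string> Tokenize(string text)
    {
        return new HashSet<string>(
            text.Split(',').Select(item => item.Trim()).Where(item => item.Length > 0),
            StringComparer.OrdinalIgnoreCase);
    }

    // Similarity (%) = 100 * (commonItems * 2) / (items in first + items in second)
    public static double Percent(string first, string second)
    {
        var set1 = Tokenize(first);
        var set2 = Tokenize(second);

        int total = set1.Count + set2.Count;
        if (total == 0)
        {
            return 0.0; // assumption: two empty inputs are reported as 0% similar
        }

        int common = set1.Intersect(set2, StringComparer.OrdinalIgnoreCase).Count();
        return 100.0 * (common * 2) / total; // 100.0 keeps the division in floating point
    }

    public static void Main()
    {
        string string1 = "jedan, dva, tri, cetri, PET, sest45 sedamytyty osam";
        string string2 = "dva, cetri, pet, sedam88 dvadeset osamdeset";
        Console.WriteLine("Similarity b/w strings: " + Percent(string1, string2) + "%");
    }
}
```

With these assumptions, the two sample sentences share three items (dva, cetri, pet) out of 6 + 4, so the method reports 60%. If "PET" and "pet" should count as different items, StringComparer.Ordinal can be used in both places instead.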

--------------
VB.NET :
--------------
----------------------
String as Array:
----------------------
' Requires: Imports System, Imports System.Linq
' The formula to calculate the similarity between two strings is:
' Similarity (%) = 100 * (commonItems * 2) / (total items in string1 + total items in string2)

Dim list1Length As Integer, list2Length As Integer, commonItemLength As Integer

Dim list1 = New String() {"1", "2", "3", "4", "5", "6"}
Dim list2 = New String() {"2", "3", "4"}
Dim commonList = list1.Intersect(list2) ' distinct items present in both arrays

list1Length = list1.Length
list2Length = list2.Length
commonItemLength = commonList.Count()

' In VB.NET the / operator always performs floating-point division.
Dim similarity As Double = 100 * (commonItemLength * 2) / (list1Length + list2Length)
' Use & for concatenation; + would try to convert the text to a number and fail.
Console.WriteLine("Similarity b/w strings: " & Math.Round(similarity, 2) & "%")
----------------------------
String as Sentence:
----------------------------
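' Requires: Imports System, Imports System.Linq
' The formula to calculate the similarity between two strings is:
' Similarity (%) = 100 * (commonItems * 2) / (total items in string1 + total items in string2)

Dim list1Length As Integer, list2Length As Integer, commonItemLength As Integer

Dim string1 As String = "jedan, dva, tri, cetri, PET, sest45 sedamytyty osam"
Dim string2 As String = "dva, cetri, pet, sedam88 dvadeset osamdeset"

' Split on commas, then trim each item so that " dva" and "dva" compare as equal.
Dim string1Split = string1.Split(","C).Select(Function(item) item.Trim()).ToArray()
Dim string2Split = string2.Split(","C).Select(Function(item) item.Trim()).ToArray()

' Intersect is case-sensitive by default, so "PET" and "pet" are treated as different items.
Dim commonList = string1Split.Intersect(string2Split)

list1Length = string1Split.Length
list2Length = string2Split.Length
commonItemLength = commonList.Count()

' In VB.NET the / operator always performs floating-point division.
Dim similarity As Double = 100 * (commonItemLength * 2) / (list1Length + list2Length)
' Use & for concatenation; + would try to convert the text to a number and fail.
Console.WriteLine("Similarity b/w strings: " & Math.Round(similarity, 2) & "%")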