Friday, September 28, 2018

Language generation in Keras

https://machinelearningmastery.com/how-to-develop-a-word-level-neural-language-model-in-keras/
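The linked tutorial builds a word-level neural language model. A minimal, hypothetical sketch of the data-preparation step in plain Python (illustrative text and names, no Keras dependency assumed):

```python
# Sketch: turn raw text into (input word, next word) training pairs,
# roughly the data-prep step a word-level language model needs.
text = "the quick brown fox jumps over the lazy dog"
words = text.split()

# Build a word -> integer index vocabulary
vocab = {w: i for i, w in enumerate(sorted(set(words)))}
encoded = [vocab[w] for w in words]

# Each pair teaches the model: given this word, predict the next one
pairs = [(encoded[i], encoded[i + 1]) for i in range(len(encoded) - 1)]
print(len(vocab))   # 8 unique words
print(len(pairs))   # 8 training pairs
```

In the tutorial these integer sequences are then fed to an Embedding + LSTM network with a softmax output over the vocabulary.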

Saturday, September 15, 2018

Language to language translation

https://medium.com/@ageitgey/machine-learning-is-fun-part-5-language-translation-with-deep-learning-and-the-magic-of-sequences-2ace0acca0aa

Thursday, September 13, 2018

Word2vec visualization

link
https://labsblog.f-secure.com/2018/01/30/nlp-analysis-of-tweets-using-word2vec-and-t-sne/

Better one for skip-gram
http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
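The core of the skip-gram model described in the tutorial is predicting context words from a center word. A hedged, illustrative sketch of generating those (center, context) training pairs in plain Python:

```python
# Sketch: generate skip-gram (center, context) pairs from a token list.
# `window` is how many words on each side count as context.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["I", "like", "deep", "learning"], window=1)
print(pairs)
# [('I', 'like'), ('like', 'I'), ('like', 'deep'), ('deep', 'like'),
#  ('deep', 'learning'), ('learning', 'deep')]
```

word2vec trains a shallow network on such pairs; the learned hidden-layer weights become the word vectors that t-SNE then projects to 2-D for visualization.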

Wednesday, September 12, 2018

Clustering Evaluation Methodologies

Purity
Normalized Mutual Information (and other mutual-information-based scores)
Rand index / Adjusted Rand index
Jaccard coefficient
Fowlkes-Mallows index (Fowlkes-Mallows scores in scikit-learn)
Dunn index
Silhouette Coefficient
Calinski-Harabasz Index
Homogeneity, completeness and V-measure

Clustering Evaluation

https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
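Purity, the simplest of the metrics above, credits each cluster with its majority true class and divides by the total number of points (as described in the Stanford IR-book chapter linked above). An illustrative implementation in plain Python:

```python
from collections import Counter

# Sketch: purity = (sum over clusters of the majority-class count) / N.
# `clusters` and `labels` are parallel lists: predicted cluster id and
# true class label for each data point.
def purity(clusters, labels):
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, []).append(y)
    correct = sum(Counter(members).most_common(1)[0][1]
                  for members in by_cluster.values())
    return correct / len(labels)

# 3 clusters over 6 points: cluster 0 is 2/3 "x", cluster 1 pure, cluster 2 pure
print(purity([0, 0, 0, 1, 1, 2], ["x", "x", "y", "y", "y", "z"]))  # 5/6
```

Note that purity is trivially maximized by putting every point in its own cluster, which is why it is usually reported alongside NMI or the Rand index.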

Fix for Python NLTK error: Resource 'tokenizers/punkt/english.pickle' not found

import nltk
nltk.download('punkt')  # downloads the Punkt sentence-tokenizer models

Monday, August 27, 2018

How to Run Parallel Data Analysis in Python using Dask Dataframes

Here is the link

https://towardsdatascience.com/trying-out-dask-dataframes-in-python-for-fast-data-analysis-in-parallel-aa960c18a915
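Dask is not assumed installed here, so the following is only a stdlib sketch of the idea the article describes: split the data into partitions, analyze the partitions in parallel, and combine the partial results. Dask's dataframe API does this transparently; `concurrent.futures` is used below as a stand-in.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a large dataset split into partitions (Dask does this
# automatically for dataframes that don't fit in memory).
data = list(range(1_000))

def analyze(partition):
    # stand-in for a per-partition aggregation (e.g. a partial sum)
    return sum(partition)

n = 4
parts = [data[i::n] for i in range(n)]  # 4 interleaved partitions
with ThreadPoolExecutor(max_workers=n) as pool:
    total = sum(pool.map(analyze, parts))  # combine partial results
print(total)  # 499500, same as sum(data)
```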

Wednesday, August 15, 2018

Python DataFrame: remove non-ASCII characters from a text column

df.text.replace({r'[^\x00-\x7F]+': ''}, regex=True, inplace=True)
# text is the column name in DataFrame df; the regex deletes runs of
# characters outside the ASCII range 0x00-0x7F
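The same regex works on a plain string outside pandas; an illustrative example:

```python
import re

# Strip runs of non-ASCII characters (here the accented e's) from a string
s = "caf\u00e9 r\u00e9sum\u00e9 ok"
clean = re.sub(r'[^\x00-\x7F]+', '', s)
print(clean)  # "caf rsum ok"
```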

Monday, May 21, 2018

Figure settings in matplotlib

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the key-generation timing data
os.chdir('E:/backup folder/paper')
test_data = pd.read_csv('keygenerationtime.csv')
Users = test_data['Users']
Time = test_data['Time']
x2 = np.arange(len(Users))

# Line plot of key-generation time vs. number of users
plt.plot(x2, Time, linewidth=2, color='black')
plt.xticks(x2, Users)
plt.ylabel('Time (secs)', fontsize=14)
plt.xlabel('Number of Users', fontsize=14)
plt.subplots_adjust(bottom=0.2)
fig = plt.gcf()
fig.set_size_inches(5, 5)
fig.savefig('Fig3.png', dpi=200)
plt.show()

#----------------------------------------------
# Bar plots of encryption (Fig4) and decryption (Fig5) time vs. file size;
# the two plots differ only in the input CSV and the output filename.
for csv_name, fig_name in [('encryptiontime.csv', 'Fig4.png'),
                           ('decryptiontime.csv', 'Fig5.png')]:
    test_data = pd.read_csv(csv_name)
    File = test_data['File']
    Time = test_data['Time']
    x2 = np.arange(len(File))
    plt.bar(x2, Time, color='black', width=0.5)
    plt.xticks(x2, File)
    plt.ylabel('Time (secs)', fontsize=14)
    plt.xlabel('File Size (KB)', fontsize=14)
    plt.subplots_adjust(bottom=0.2)
    fig = plt.gcf()
    fig.set_size_inches(5, 5)
    fig.savefig(fig_name, dpi=200)
    plt.show()

#------------------------------------------
# Grouped bar plot: upload/download turnaround times for three schemes
test_data = pd.read_csv('turnaround.csv')
File = test_data['File']
Ses_up = test_data['a']
up1 = test_data['b']
up2 = test_data['c']
Ses_down = test_data['d']
down1 = test_data['e']
down2 = test_data['f']

# Available hatch patterns, useful for black-and-white printing
patterns = ["/", "\\", "|", "-", "+", "x", "o", "O", ".", "*"]
width = 0.15

pos = list(range(len(Ses_up)))
fig, ax = plt.subplots(figsize=(5, 5))
plt.bar(pos, Ses_up, width, alpha=1, color='black',
        hatch="/",  # the hatch defines the fill pattern
        label=File[0])
plt.bar([p + width for p in pos], up1, width, alpha=0.5,
        color='w', hatch="\\", label=File[1])
plt.bar([p + width * 2 for p in pos], up2, width, alpha=0.5,
        color='k', hatch='-', label=File[2])
plt.bar([p + width * 3 for p in pos], Ses_down, width, alpha=0.7,
        color='black', hatch="//", label=File[3])
plt.bar([p + width * 4 for p in pos], down1, width, alpha=0.5,
        color='w', hatch="...", label=File[4])
plt.bar([p + width * 5 for p in pos], down2, width, alpha=0.5,
        color='white', hatch="///", label=File[5])

ax.set_ylabel('Time (secs)', fontsize=14)
ax.set_xlabel('File Size (KB)', fontsize=14)
# Center each tick under the middle of its group of six bars
ax.set_xticks([p + 2.5 * width for p in pos])
ax.set_xticklabels(File)
plt.legend(['SeSPHR T-up', '[14] T-up', '[27] T-up',
            'SeSPHR T-down', '[14] T-down', '[27] T-down'],
           loc='upper left')
fig.savefig('Fig6.png', dpi=200)

plt.show()

Wednesday, April 4, 2018

Softmax vs sigmoid in the last dense layer

If the predicted probabilities of all classes must sum to 1 (single-label, multi-class classification), use softmax. If each class needs a separate, independent probability, as in multi-label classification, use sigmoid.

https://www.depends-on-the-definition.com/classifying-genres-of-movies-by-looking-at-the-poster-a-neural-approach/
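An illustrative comparison of the two activations in plain Python (stdlib only, no Keras assumed):

```python
import math

# Softmax: normalizes logits into probabilities that sum to 1 across classes
def softmax(xs):
    exps = [math.exp(x) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

# Sigmoid: squashes each logit to (0, 1) independently of the others
def sigmoid(x):
    return 1 / (1 + math.exp(-x))

logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print(sum(probs))                    # sums to 1 (up to float error)
print([sigmoid(x) for x in logits])  # independent per-label probabilities
```

With softmax, raising one class's score lowers all the others; with sigmoid, each output moves on its own, which is exactly what multi-label problems need.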