Friday, August 18, 2017

Multi-class vs Multi-label classification

Found clear definitions in this Stack Exchange answer:
https://stats.stackexchange.com/questions/11859/what-is-the-difference-between-multiclass-and-multilabel-problem/168945#168945

Multiclass classification means a classification task with more than two classes; e.g., classify a set of images of fruits which may be oranges, apples, or pears. Multiclass classification makes the assumption that each sample is assigned to one and only one label: a fruit can be either an apple or a pear but not both at the same time.
Multilabel classification assigns to each sample a set of target labels. This can be thought of as predicting properties of a data point that are not mutually exclusive, such as the topics that are relevant to a document. A text might be about any of religion, politics, finance or education at the same time, or none of these.
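
To make the difference concrete, here is a minimal sketch of how the targets could be encoded in each case. The helper names `encode_multiclass` and `encode_multilabel` are made up for illustration; the label lists come from the examples above.

```python
# Label vocabularies taken from the fruit and document examples above.
fruit_classes = ["orange", "apple", "pear"]
topics = ["religion", "politics", "finance", "education"]

def encode_multiclass(label, classes):
    """Multiclass: each sample gets exactly one class index."""
    return classes.index(label)

def encode_multilabel(labels, classes):
    """Multilabel: each sample gets a 0/1 flag per possible label."""
    return [1 if c in labels else 0 for c in classes]

y_fruit = encode_multiclass("apple", fruit_classes)
y_doc = encode_multilabel({"politics", "finance"}, topics)
y_none = encode_multilabel(set(), topics)

print(y_fruit)  # 1
print(y_doc)    # [0, 1, 1, 0]
print(y_none)   # [0, 0, 0, 0]
```

The multiclass target is a single value, while the multilabel target is a vector that may contain several ones or none at all.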

Sunday, August 6, 2017

python: removing non-ASCII characters from text

yourstring = yourstring.encode('ascii', 'ignore').decode('ascii')
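
A small sketch of the same one-liner wrapped in a function, plus a variant that keeps the base letter of accented characters. The function names are my own, and the second variant is an optional extra that relies on `unicodedata.normalize` from the standard library.

```python
import unicodedata

def strip_non_ascii(text):
    """Drop every character that is not plain ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")

def fold_to_ascii(text):
    """Decompose accented letters first, so 'é' becomes 'e' instead of vanishing."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(strip_non_ascii("café"))  # caf
print(fold_to_ascii("café"))    # cafe
```

Which one to use depends on whether losing the accented letter entirely is acceptable for the text at hand.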

python: newspaper articles collection

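import newspaper

# Build the source without caching, so previously seen articles are included
et_paper = newspaper.build('http://cnn.com/', memoize_articles=False)

# Uncomment to print the URL of every article found
# for article in et_paper.articles:
#     print(article.url)

# Number of articles found
print(et_paper.size())

# Print the category URLs detected on the site
for category in et_paper.category_urls():
    print(category)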


Wednesday, August 2, 2017

Python: selecting a column based on the values in another column and saving it to a CSV file

Example:
index, column1_name,column2_name
1, asdb, 2
2, asdfsf, 1
3, asdasdfasd, 1
4, dgfg, 2


Results will be
asdb
dgfg




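import pandas as pd

# Read the data
dataframe = pd.read_csv('labeled_data.csv')

# List the column names with their positions
for i, col in enumerate(dataframe.columns):
    print(i, col)

# Keep only the rows where column2_name equals 2
newData = dataframe.loc[dataframe['column2_name'] == 2]
# Select the column to export
newData2 = newData['column1_name']
# Write the values only (no index, no header line)
newData2.to_csv("foo2.csv", index=False, header=False)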
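
To try the selection without a file on disk, here is a self-contained sketch that runs on the small example table from above. The inline string stands in for the CSV file, and `skipinitialspace=True` is my addition, needed because the example has a blank after each comma.

```python
import io
import pandas as pd

# Inline copy of the example table; in practice this would be a file path.
raw = """index, column1_name,column2_name
1, asdb, 2
2, asdfsf, 1
3, asdasdfasd, 1
4, dgfg, 2
"""

# skipinitialspace=True drops the blank after each comma, so the header
# is read as 'column1_name' rather than ' column1_name'.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

# Row filter and column selection in a single .loc call
selected = df.loc[df["column2_name"] == 2, "column1_name"]

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = selected.to_csv(index=False, header=False)
print(csv_text)
```

Doing the row filter and the column pick in one `.loc` call avoids the intermediate DataFrame, but the two-step version gives the same values.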

Python: reading the headers of a CSV file

import pandas as pd

# Read the data
hate_speech = pd.read_csv('labeled_data.csv')

# Print each column name with its position

for i, col in enumerate(hate_speech.columns):
    print(i, col)
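
If only the header row is needed, loading the whole file with pandas may be more than necessary. A rough alternative using the standard library `csv` module is sketched below; the inline sample stands in for the real file, and its contents are made up for illustration.

```python
import csv
import io

# Stand-in for open('labeled_data.csv', newline=''); the contents are illustrative.
sample = io.StringIO("index,column1_name,column2_name\n1,asdb,2\n")

reader = csv.reader(sample)
headers = next(reader)  # the first row holds the column names

for i, col in enumerate(headers):
    print(i, col)
```

This reads only the first line, which can matter for large files.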