Commit eec607dc authored by manetta

Merge branch 'master' of gitlab.constantvzw.org:algolit/algolit

parents 797d90bf 43d7addb
Part I
Chapter I
It was a bright cold day in April, and the clocks were striking thirteen. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him. The hallway smelt of boiled cabbage and old rag mats. At one end of it a coloured poster, too large for indoor display, had been tacked to the wall. It depicted simply an enormous face, more than a metre wide: the face of a man of about forty-five, with a heavy black moustache and ruggedly handsome features. Winston made for the stairs. It was no use trying the lift. Even at the best of times it was seldom working, and at present the electric current was cut off during daylight hours. It was part of the economy drive in preparation for Hate Week. The flat was seven flights up, and Winston, who was thirty-nine and had a varicose ulcer above his right ankle, went slowly, resting several times on the way. On each landing, opposite the lift shaft, the poster with the enormous face gazed from the wall. It was one of those pictures which are so contrived that the eyes follow you about when you move. BIG BROTHER IS WATCHING YOU, the caption beneath it ran.
Inside the flat a fruity voice was reading out a list of figures which had something to do with the production of pig-iron. The voice came from an oblong metal plaque like a dulled mirror which formed part of the surface of the right-hand wall. Winston turned a switch and the voice sank somewhat, though the words were still distinguishable. The instrument (the telescreen, it was called) could be dimmed, but there was no way of shutting it off completely. He moved over to the window: a smallish, frail figure, the meagreness of his body merely emphasized by the blue overalls which were the uniform of the Party. His hair was very fair, his face naturally sanguine, his skin roughened by coarse soap and blunt razor blades and the cold of the winter that had just ended.
Outside, even through the shut window-pane, the world looked cold. Down in the street little eddies of wind were whirling dust and torn paper into spirals, and though the sun was shining and the sky a harsh blue, there seemed to be no colour in anything, except the posters that were plastered everywhere. The black-moustachio’d face gazed down from every commanding corner. There was one on the house-front immediately opposite. BIG BROTHER IS WATCHING YOU, the caption said, while the dark eyes looked deep into Winston’s own. Down at street level another poster, torn at one corner, flapped fitfully in the wind, alternately covering and uncovering the single word INGSOC. In the far distance a helicopter skimmed down between the roofs, hovered for an instant like a bluebottle, and darted away again with a curving flight. It was the police patrol, snooping into people's windows. The patrols did not matter, however. Only the Thought Police mattered. Behind Winston's back the voice from the telescreen was still babbling away about pig-iron and the overfulfilment of the Ninth Three-Year Plan. The telescreen received and transmitted simultaneously. Any sound that Winston made, above the level of a very low whisper, would be picked up by it; moreover, so long as he remained within the field of vision which the metal plaque commanded, he could be seen as well as heard. There was of course no way of knowing whether you were being watched at any given moment. How often, or on what system, the Thought Police plugged in on any individual wire was guesswork. It was even conceivable that they watched everybody all the time. But at any rate they could plug in your wire whenever they wanted to. You had to live - did live, from habit that became instinct - in the assumption that every sound you made was overheard, and, except in darkness, every movement scrutinised.
Winston kept his back turned to the telescreen. It was safer; though, as he well knew, even a back can be revealing. A kilometre away the Ministry of Truth, his place of work, towered vast and white above the grimy landscape. This, he thought with a sort of vague distaste - this was London, chief city of Airstrip One, itself the third most populous of the provinces of Oceania. He tried to squeeze out some childhood memory that should tell him whether London had always been quite like this. Were there always these vistas of rotting nineteenth-century houses, their sides shored up with baulks of timber, their windows patched with cardboard and their roofs with corrugated iron, their crazy garden walls sagging in all directions? And the bombed sites where the plaster dust swirled in the air and the willowherb straggled over the heaps of rubble; and the places where the bombs had cleared a larger patch and there had sprung up sordid colonies of wooden dwellings like chicken-houses? But it was no use, he could not remember: nothing remained of his childhood except a series of bright-lit tableaux, occurring against no background and mostly unintelligible.
-1.004391
I was guiltless, but I had indeed drawn down a horrible curse upon my head, as mortal as that of crime.
-1.006065
We sat late.
-1.007096
said the old man.
-1.028012
The agonies of remorse poison the luxury there is otherwise sometimes found in indulging the excess of grief.
-1.036734
Blasted as thou wert, my agony was still superior to thine, for the bitter sting of remorse will not cease to rankle in my wounds until death shall close them forever.
-1.045306
Man, you shall repent of the injuries you inflict.
-1.050177
In one corner, near a small fire, sat an old man, leaning his head on his hands in a disconsolate attitude.
-1.055839
I saw him on the point of repeating his blow, when, overcome by pain and anguish, I quitted the cottage, and in the general tumult escaped unperceived to my hovel.
-1.060125
Her mild eyes seemed incapable of any severity or guile, and yet she has committed a murder.
-1.063721
Never did I behold a vision so horrible as his face, of such loathsome yet appalling hideousness.
-1.085048
But since the murderer has been discovered--The murderer discovered!
-1.085386
I may die, but first you, my tyrant and tormentor, shall curse the sun that gazes on your misery.
-1.085582
More miserable than man ever was before, why did I not sink into forgetfulness and rest?
-1.090738
Yet I am certainly unjust.
-1.097396
This sound disturbed an old woman who was sleeping in a chair beside me.
-1.099525
Chapter 12 I lay on my straw, but I could not sleep.
-1.109658
Saying this, he suddenly quitted me, fearful, perhaps, of any change in my sentiments.
-1.136478
The poor victim, who on the morrow was to pass the awful boundary between life and death, felt not, as I did, such deep and bitter agony.
-1.1452
Alas!
-1.1452
Alas!
-1.1452
Alas!
-1.147289
I paused.
-1.148148
Was there no injustice in this?
-1.148477
Yet mine shall not be the submission of abject slavery.
-1.154061
Why did you form a monster so hideous that even YOU turned from me in disgust?
-1.154612
A sister or a brother can never, unless indeed such symptoms have been shown early, suspect the other of fraud or false dealing, when another friend, however strongly he may be attached, may, in spite of himself, be contemplated with suspicion.
-1.155108
My swelling heart involuntarily pours itself out thus.
-1.164905
I am the assassin of those most innocent victims; they died by my machinations.
-1.172458
I shall no longer see the sun or stars or feel the winds play on my cheeks.
-1.181726
In a thousand spots the traces of the winter avalanche may be perceived, where trees lie broken and strewed on the ground, some entirely destroyed, others bent, leaning upon the jutting rocks of the mountain or transversely upon other trees.
-1.198924
Or whither does your senseless curiosity lead you?
-1.20918
The poor that stopped at their door were never driven away.
-1.215336
Mans yesterday may neer be like his morrow; Nought may endure but mutability!
-1.230786
Devil, cease; and do not poison the air with these sounds of malice.
-1.240057
Elizabeth had caught the scarlet fever; her illness was severe, and she was in the greatest danger.
-1.240959
I writhed under his words, yet dared not exhibit the pain I felt.
-1.245818
I passed the night wretchedly.
-1.248401
His jaws opened, and he muttered some inarticulate sounds, while a grin wrinkled his cheeks.
-1.253795
Felix trembled violently as he said this.
-1.256616
I remained silent.
-1.263614
I was a poor, helpless, miserable wretch; I knew, and could distinguish, nothing; but feeling pain invade me on all sides, I sat down and wept.
-1.282275
He became the victim of its weakness.
-1.285592
Fear overcame me; I dared no advance, dreading a thousand nameless evils that made me tremble, although I was unable to define them.
-1.302078
Fiend that thou art!
-1.311546
Let the cursed and hellish monster drink deep of agony; let him feel the despair that now torments me.
-1.3167
The wet wood which I had placed near the heat dried and itself became inflamed.
-1.330364
A frightful selfishness hurried me on, while my heart was poisoned with remorse.
-1.335277
I was alone; none were near me to dissipate the gloom and relieve me from the sickening oppression of the most terrible reveries.
-1.343951
I sat down, and a silence ensued.
-1.36678
As my sickness quitted me, I was absorbed by a gloomy and black melancholy that nothing could dissipate.
-1.370631
I then paused, and a cold shivering came over me.
-1.385116
I do not fear to die, she said; that pang is past.
-1.388903
Tears, unrestrained, fell from my brothers eyes; a sense of mortal agony crept over my frame.
-1.402591
I shall die.
-1.404472
His limbs were nearly frozen, and his body dreadfully emaciated by fatigue and suffering.
-1.410327
Cursed, cursed be the fiend that brought misery on his grey hairs and doomed him to waste in wretchedness!
-1.412762
What do these sounds portend?
-1.41817
Geneva, March 18, 17--.
-1.425913
Again do I vow vengeance; again do I devote thee, miserable fiend, to torture and death.
-1.431747
Had my eyes deceived me?
-1.450231
They spurn and hate me.
-1.460187
Then, overcome by fatigue, I lay down among some straw and fell asleep.
-1.500029
And do not you fear the fierce vengeance of my arm wreaked on your miserable head?
-1.545846
Man!
-1.578741
Justine shook her head mournfully.
-1.579585
I paused; at length he spoke, in broken accents: Unhappy man!
-1.590892
How often did I imprecate curses on the cause of my being!
-1.611354
But now crime has degraded me beneath the meanest animal.
-1.622701
During the whole of this wretched mockery of justice I suffered living torture.
-1.647614
Wherefore not?
-1.648735
Shall I not then hate them who abhor me?
-1.654698
I never beheld anything so utterly destroyed.
-1.668905
My own agitation and anguish was extreme during the whole trial.
-1.674818
I remained motionless.
-1.683465
He threatened excommunication and hell fire in my last moments if I continued obdurate.
-1.686961
I, a miserable wretch, haunted by a curse that shut up every avenue to enjoyment.
-1.715548
The monster continued to utter wild and incoherent self-reproaches.
-1.731914
There he lies, white and cold in death.
-1.735126
Miserable himself that he may render no other wretched, he ought to die.
-1.775426
Oh, Frankenstein!
-1.804502
Why did I not die?
-1.822475
I could have torn him limb from limb, as the lion rends the antelope.
-1.823312
Why did I not then expire!
-1.860791
Leave me; I am inexorable.
-1.879978
I never saw a man in so wretched a condition.
-1.881685
I pitied Frankenstein; my pity amounted to horror; I abhorred myself.
-1.894665
I do refuse it, I replied; and no torture shall ever extort a consent from me.
-1.975587
Poor little fellow!
-1.97756
Or rather, stay, that I may trample you to dust!
-1.993792
I have endured incalculable fatigue, and cold, and hunger; do you dare destroy my hopes? Begone!
-10.273834
Begone!
-10.51603
Miserable, unhappy wretch!
-10.930867
Despair!
-12.273019
Ugly wretch!
-2.012111
Chapter 16 Cursed, cursed creator!
-2.070458
I did confess, but I confessed a lie.
-2.156648
Thus I might proclaim myself a madman, but not revoke the sentence passed upon my wretched victim.
-2.208422
Oh, not abhorred!
-2.255191
No guilt, no mischief, no malignity, no misery, can be found comparable to mine.
-2.27524
Poor, poor girl, is she the accused?
-2.345074
What a miserable night I passed!
-2.426911
I did not yet entirely know the fatal effects of this miserable deformity.
-2.487342
Geneva, May 12th, 17--.
-2.625245
Frankenstein
-2.679246
I never could survive so horrible a misfortune.
-2.722253
Nay, then I was not miserable.
-2.848968
No; I am not so selfish.
-2.911875
I am interrupted.
-3.021047
Poor William!
-3.069702
I trembled.
-3.091565
R.W.
-3.114359
Clerval!
-3.402281
Oh, no!
-3.683603
Suddenly a heavy storm of rain descended.
-3.789021
Poor girl!
-4.030508
You may hate, but beware!
-4.177622
I knocked.
-4.350249
I thought (foolish wretch!)
-4.381934
He struggled violently.
-4.595895
I am malicious because I am miserable.
-4.733167
Abhorred monster!
-4.976811
Poor Clerval!
-5.525419
Unfeeling, heartless creator!
-5.878955
No!
-6.044624
Do not despair.
-6.321552
I trembled violently, apprehending some dreadful misfortune.
-6.501103
Do not fear.
-7.414885
Wretch!
-8.231735
Scoffing devil!
-8.805615
Begone, vile insect!
-9.783166
Wretched devil!
-9.875328
Hypocritical fiend!
-9.878561
Hideous monster!
@@ -3,8 +3,8 @@
 '''
 This script applies a trained model to textfiles.
-It splits the text in sentences and predicts a sentiment score for each of the sentences.
-The score & sentence are saved in a file, ordered from small to big scores
+It splits the text in sentences and predicts a sentiment score for each of the sentences / or for each of the words
+The score & sentence/word are saved in a file, ordered from small to big scores
 clean your text using adapting_the_reading_glasses.py
 '''
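The docstring describes the whole pipeline: split a text into sentences, score each one with the trained model, and archive the results ordered by score. A minimal, hypothetical sketch of that flow (not part of the commit), assuming the script's own text_to_sentiment() and archive() helpers and NLTK's punkt sentence splitter:

# Hypothetical sketch: wires together the steps the docstring names.
# text_to_sentiment() and archive() are the helpers defined further
# down in this script; the punkt model must have been fetched once
# via nltk.download('punkt').
import nltk.data

splitter = nltk.data.load('tokenizers/punkt/english.pickle')

def score_textfile(path, out_path):
    with open(path) as infile:
        sentences = splitter.tokenize(infile.read())
    # one (score, sentence) pair per sentence
    scored = [(text_to_sentiment(s), s) for s in sentences]
    for score, sentence in sorted(scored):  # from small to big scores
        archive('%f\n%s\n' % (score, sentence), out_path)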
@@ -13,15 +13,10 @@ clean your text using adapting_the_reading_glasses.py
 import numpy as np
 import pandas as pd
 import re
-import time
-import random
-import os, sys
 import nltk
 import nltk.data
 from sklearn.linear_model import SGDClassifier
-from sklearn.model_selection import train_test_split
-from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score
 from sklearn.externals import joblib
@@ -30,16 +25,6 @@ from sklearn.externals import joblib
 ### WRITING
 ### ------------
-def write(sentence):
-    words = sentence.split(" ")
-    for word in words:
-        for char in word:
-            sys.stdout.write('%s' % char)
-            sys.stdout.flush()
-            time.sleep(0.2)
-        sys.stdout.write(" ")
-        sys.stdout.flush()
 def archive(sentence, filename):
     with open(filename, "a") as destination:
         destination.write(sentence)
@@ -71,47 +56,6 @@ def load_embeddings(filename):
     return pd.DataFrame(arr, index=labels, dtype='f')
-### Load Lexicon of POSITIVE and NEGATIVE words
-### -------------------------------------------
-def load_lexicon(filename):
-    '''
-    Load a file from Bing Liu's sentiment lexicon
-    (https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html), containing
-    English words in Latin-1 encoding.
-    One file contains a list of positive words, and the other contains
-    a list of negative words. The files contain comment lines starting
-    with ';' and blank lines, which should be skipped.
-    '''
-    lexicon = []
-    with open(filename, encoding='latin-1') as infile:
-        for line in infile:
-            line = line.rstrip()
-            if line and not line.startswith(';'):
-                lexicon.append(line)
-    return lexicon
-### See the sentiment that this classifier predicts for particular words
-### --------------------------------------------------------------------
-def vecs_to_sentiment(vecs):
-    # predict_log_proba gives the log probability for each class
-    predictions = model.predict_log_proba(vecs)
-    # To see an overall positive vs. negative classification in one number,
-    # we take the log probability of positive sentiment minus the log
-    # probability of negative sentiment.
-    return predictions[:, 1] - predictions[:, 0]
-### Use sentiment function to see some examples of its predictions on the test data
-### --------------------------------------------------------------------------------
-def words_to_sentiment(words):
-    vecs = embeddings.loc[words].dropna()
-    log_odds = vecs_to_sentiment(vecs)
-    return pd.DataFrame({'sentiment': log_odds}, index=vecs.index)
 ### Combine sentiments for word vectors into an overall sentiment score by averaging them
 ### --------------------------------------------------------------------------------------
 TOKEN_RE = re.compile(r"\w.*?\b")
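The comment inside the removed vecs_to_sentiment() explains the score as a difference of log probabilities, i.e. the log odds of positive versus negative sentiment. A tiny worked example with an invented probability pair shows the scale of the numbers:

# Invented example, not from the commit: a word the classifier
# considers 90% negative / 10% positive.
import numpy as np

log_probs = np.log(np.array([[0.9, 0.1]]))  # [[log P(neg), log P(pos)]]
score = log_probs[:, 1] - log_probs[:, 0]   # log P(pos) - log P(neg)
print(score)  # [-2.19722458], the same log-odds scale as the scores above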
@@ -123,20 +67,6 @@ def text_to_sentiment(text):
     sentiments = words_to_sentiment(tokens)
     return sentiments['sentiment'].mean()
-### Use Pandas to make a table of names, their predominant ethnic background, and the predicted sentiment score
-### ------------------------------------------------------------------------------------------------------------
-def name_sentiment_table():
-    frames = []
-    for group, name_list in sorted(NAMES_BY_ETHNICITY.items()):
-        lower_names = [name.lower() for name in name_list]
-        sentiments = words_to_sentiment(lower_names)
-        sentiments['group'] = group
-        frames.append(sentiments)
-    # Put together the data we got from each ethnic group into one big table
-    return pd.concat(frames)
 ### ------------------------------------------
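The kept text_to_sentiment() path tokenizes a sentence with TOKEN_RE, looks up a score per word, and averages. Traced by hand on one of the scored sentences above (the per-word scores here are invented for illustration):

import re
import pandas as pd

TOKEN_RE = re.compile(r"\w.*?\b")  # the same pattern as in the script

tokens = [t.casefold() for t in TOKEN_RE.findall("Begone, vile insect!")]
print(tokens)  # ['begone', 'vile', 'insect']

# Invented per-word log-odds, standing in for words_to_sentiment(tokens):
sentiments = pd.DataFrame({'sentiment': [-8.2, -6.9, -3.1]}, index=tokens)
print(sentiments['sentiment'].mean())  # about -6.07: one score per sentence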
@@ -144,165 +74,17 @@ def name_sentiment_table():
 ### LOAD EMBEDDINGS
 #playsound('/path/to/file/you/want/to/play.wav')
-embeddings = load_embeddings('data/glove.42B.300d.txt')
+embeddings = load_embeddings('data/glove.840B.300d.txt')
+#embeddings = load_embeddings('data/glove.42B.300d.txt')
 #embeddings = load_embeddings('data/glovesample.txt')
 #playsound('/path/to/file/you/want/to/play.wav')
 filename = 'data/1984_all_stripped.txt'
+#filename = 'data/1984_fragment.txt'
 #filename = 'data/frankenstein_for_machines.txt'
 pos_output = filename.replace('.txt','_pos.txt')
 neg_output = filename.replace('.txt','_neg.txt')
+scored_words = filename.replace('.txt','_scored_words.txt')
-### Welcome & choice
-### ----------------
-# rows = embeddings.shape[0]
-# columns = embeddings.shape[1]
-# pos_words = load_lexicon('data/positive-words.txt')
-# neg_words = load_lexicon('data/negative-words.txt')
-# ### CLEAN UP positive and negative words
-# ### ------------------------------------
-# #the data points here are the embeddings of these positive and negative words.
-# #We use the Pandas .loc[] operation to look up the embeddings of all the words.
-# pos_vectors = embeddings.loc[pos_words]
-# neg_vectors = embeddings.loc[neg_words]
-# #Some of these words are not in the GloVe vocabulary, particularly the misspellings such as "fancinating".
-# #Those words end up with rows full of NaN to indicate their missing embeddings, so we use .dropna() to remove them.
-# pos_vectors = embeddings.loc[pos_words].dropna()
-# neg_vectors = embeddings.loc[neg_words].dropna()
-# print("\t\tTidied up, you see that each word is represented by exactly 300 points in the vector landscape: \n", pos_vectors[:5], "\n")
-# #time.sleep(10)
-# len_pos = len(pos_vectors)
-# len_neg = len(neg_vectors)
-# '''
-# Now we make arrays of the desired inputs and outputs.
-# The inputs are the embeddings, and the outputs are 1 for positive words and -1 for negative words.
-# We also make sure to keep track of the words they're labeled with, so we can interpret the results.
-# '''
-# vectors = pd.concat([pos_vectors, neg_vectors])
-# targets = np.array([1 for entry in pos_vectors.index] + [-1 for entry in neg_vectors.index])
-# labels = list(pos_vectors.index) + list(neg_vectors.index)
-# ### TRAINING & TESTING
-# ### ___________________
-# '''
-# Using the scikit-learn train_test_split function, we simultaneously separate the input vectors,
-# output values, and labels into training and test data, with 10% of the data used for testing.
-# '''
-# train_vectors, test_vectors, train_targets, test_targets, train_labels, test_labels = \
-#     train_test_split(vectors, targets, labels, test_size=0.1, random_state=0)
-# '''
-# Now we make our classifier, and train it by running the training vectors through it for 100 iterations.
-# We use a logistic function as the loss, so that the resulting classifier can output the probability
-# that a word is positive or negative.
-# '''
-# model = SGDClassifier(loss='log', random_state=0, n_iter=100)
-# model.fit(train_vectors, train_targets)
-# '''
-# ### EVALUATION - Finetuning the scoring
-# ### ____________________________________
-# We evaluate the classifier on the test vectors.
-# It predicts the correct sentiment for sentiment words outside of its training data 95% of the #time.
-# Precision: (also called positive predictive value) is the fraction of relevant instances among the retrieved instances:
-# -> When it predicts yes, how often is it correct?
-# Recall: (also known as sensitivity) is the fraction of relevant instances that have been retrieved over
-# the total amount of relevant instances: how many instances did the classifier classify correctly?
-# Confusion Matrix: True Positives  | False Negatives
-#                   False Positives | True Negatives
-# '''
-# confusion_matrix = (confusion_matrix(model.predict(test_vectors), test_targets))
-# #print("confusion matrix", confusion_matrix)
-# cm = np.split(confusion_matrix, 2, 1)
-# print("\t\tLet's ", blue("evaluate our findings!\n"))
-# #time.sleep(4)
-# print("\t\tFor each of 10% of test words, we predict their overall sentiment.\n")
-# #time.sleep(4)
-# print("\t\tWe compare our results to the given labels.\n")
-# #time.sleep(4)
-# print("\t\tFor this test we scored the following:\n")
-# #time.sleep(4)
-# TP = cm[0][0]
-# FP = cm[0][1]
-# TN = cm[1][1]
-# FN = cm[1][0]
-# print("\t\tWe matched ", green(str(TP))," items correctly as positive words.\n")
-# #time.sleep(4)
-# print("\t\tThese are also called ", red("True Positives."))
-# #time.sleep(4)
-# print("\n")
-# print("\t\tWe mismatched ", green(str(FP))," items, we labeled them incorrectly as positive words.\n")
-# #time.sleep(4)
-# print("\t\tThese are also called ", red("False Positives."))
-# #time.sleep(4)
-# print("\n")
-# print("\t\tWe matched ", green(str(TN))," items, we labeled them correctly as negative words.\n")
-# #time.sleep(4)
-# print("\t\tThese are also called ", red("True Negatives."))
-# #time.sleep(4)
-# print("\n")
-# print("\t\tWe mismatched ", green(str(FN))," items, we labeled them incorrectly as negative words.\n")
-# #time.sleep(4)
-# print("\t\tThese are also called ", red("False Negatives."))
-# #time.sleep(4)
-# print("\n")
-# ### QUESTION::: How to map weights features/outcome numbers back to original words??]
-# ### QUESTION: how to show examples of TP/FP/TN/FN???
-# #print("Weights assigned to features: ", model.coef_)
-# accuracy_score = (accuracy_score(model.predict(test_vectors), test_targets))
-# print("\t\tOur accuracy score is ", accuracy_score)
-# '''
-# ### Predicted sentiment for Particular Word
-# ### ________________________________________
-# Let's use the function vecs_to_sentiment(vecs) and words_to_sentiment(words) above to see the sentiment that this classifier predicts for particular words,
-# to see some examples of its predictions on the test data.
-# '''
-# # Show 20 examples from the test set
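The block removed above trained and evaluated the classifier inline: build +1/-1 targets for the lexicon embeddings, split 90/10 with train_test_split, fit an SGDClassifier with logistic loss, then read accuracy and the confusion matrix. A self-contained sketch of that procedure under a current scikit-learn (loss='log_loss' and max_iter replace the old loss='log' and n_iter; the random vectors are stand-ins for the GloVe rows of the lexicon words, not the commit's data):

# Hypothetical stand-ins: in the script, 'vectors' are GloVe embeddings
# of Bing Liu's lexicon words and 'targets' are 1 (positive) / -1 (negative).
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 300))
targets = np.where(rng.random(1000) < 0.5, 1, -1)

# 10% of the data held out for testing, as in the removed code
train_v, test_v, train_t, test_t = train_test_split(
    vectors, targets, test_size=0.1, random_state=0)

model = SGDClassifier(loss='log_loss', max_iter=100, random_state=0)
model.fit(train_v, train_t)

predictions = model.predict(test_v)
print(accuracy_score(test_t, predictions))
# scikit-learn orders rows/columns by sorted label (-1, then 1), with
# true labels on rows and predictions on columns:
# [[true negatives,  false positives],
#  [false negatives, true positives]]
print(confusion_matrix(test_t, predictions))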