wzg, William Goldberg

I think this lab could have been shorter (maybe need to include fewer examples/experiments for each tool). This took a long time to slog through and put it all on a webpage. But, also, really cool tools and tasks, and interesting look at machine learning/AI.

Part 1

This graph shows the usage of the city name "Dakar" over time, in French versus English. Senegal, of which Dakar is the capital, was part of the French colonial empire and remains part of the Francophone world, which is why the graph shows far more mentions of Dakar in French than in English. The English mentions are creeping up, however, as globalization runs its course and Anglophones take an interest in Dakar. I used the :corpus selection operator, which compares ngrams across different languages or datasets.

This graph shows the comparative use of 'pie' and 'cake' over time, including their various inflections, like 'caked' and 'pies.' I used the inflection keyword feature to set up this graph. Cake, it turns out, is for some strange reason more popular, or at least more written about, than pie.

Part 2

I chose to use "The Three Musketeers" by Alexandre Dumas. Below is the word cloud for the work.

I really like the 'Bubblelines' tool provided by Voyant, pictured below. It provides a solid visualization of when a certain word or phrase appears throughout the work. Pictured is the 'Bubbleline' for the most popular word in the novel, "D'Artagnan." It's interesting to see the sections of the book where the protagonist's name doesn't appear.

I also thought the 'Phrases' tool was pretty cool. With this tool, you're able to take a word, and see in which phrases it appears. I took the name "Aramis" and saw what words usually appeared alongside it. The results are below. It's an interesting tool because without even knowing the work, you can see there is a relationship between Aramis and Porthos, as their names are often in the same phrases.

Part 3

I put in "That pie smells awful good." It scored zero, because "awful" and "good" cancelled each other out. Any human would recognize that as a positive sentence; "awful" is being used as an intensifier there, which is tricky for a word-by-word scorer. I also tried "I'm laughing so much it hurts!" Again, "laughing" and "hurts" cancel each other out, because the program doesn't understand the language, only the word list. "Hurts" needs context to be read as part of an idiom rather than as literal pain.
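A word-by-word scorer of the kind Sentimood uses can be sketched in a few lines. The lexicon values below are my own illustrative stand-ins, not the actual AFINN scores; the point is just to show how opposite-signed words cancel, and how the "comparative" value reported later is the score divided by the word count.

```python
# Bag-of-words scorer in the style of Sentimood/AFINN. The lexicon is a tiny
# illustrative stand-in with made-up values, not the real AFINN word list.
LEXICON = {"good": 3, "awful": -3, "laughing": 1, "hurts": -2}

def score(sentence):
    """Sum per-word lexicon scores; unknown words count as zero."""
    words = [w.strip(".,!?'\"").lower() for w in sentence.split()]
    total = sum(LEXICON.get(w, 0) for w in words)
    # Sentimood also reports a "comparative" value: score over word count.
    comparative = round(total / len(words), 4)
    return total, comparative

print(score("That pie smells awful good."))  # (0, 0.0) -- the words cancel
```

With a scorer like this, "awful" (-3) and "good" (+3) sum to zero no matter what the sentence actually means, which is exactly the failure mode described above.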

"Predator" is neither negative nor positive. I would probably designate it as negative, but I can see why it's neutral as well. "Sadistic" is also not designated as negative. I would change that.

For the most part, the two sentiment analyzers were in agreement, though they measure somewhat different aspects of "sentiment." Sentimood, to its credit, was able to identify this paragraph from the New York Times as generally negative in its content. At the same time, the commercial analyzer correctly identified it as "without sentiment," or in other words, objectively written.

A further example of the two analyzers agreeing is a recent lead paragraph from the NY Times. As you can see, both analyzers read it as negative (I suppose correctly).

"You may have seen some scary headlines about the hidden dangers of scented candles, including those claiming that certain types (particularly candles made of paraffin wax) are 'toxic' and can release harmful, cancer-causing chemicals into the air."

Sentimood:
"Score -4
Comparative -0.6666
1 positive: certain...
3 negative: scary,harmful,cancer..."

Commercial analyzer:
"This text is Negative with a confidence of a 94 percent. The polarities detected in it are in disagreement. The text is subjective and without irony."

The two analyzers disagree over the sentence "Americans worship Mammon." Sentimood gives a neutral zero rating, while the commercial analyzer deems it very positive.

The two analyzers also disagree over the sentence "Through war, famine, disease, sickness, and death, I will still love her." Sentimood incorrectly pegs it as having a negative mood, probably because of all of the bad words. The commercial analyzer actually does a good job of rating it as "very positive," following the actual grammar and meaning of the sentence more closely.

The two analyzers agree that the sentence "ISIS has gained ground, being very successful in their mission" is very positive. Sentimood doesn't label "ISIS" as negative and sees the words "gained" and "successful" as making the phrase positive. The commercial analyzer identifies ISIS as a terrorist organization but still labels the whole phrase as "very positive." The computers aren't equipped to interpret language in light of historical, political, and social context.

The two analyzers agree that the phrase "I love Hitler" is "very positive." They are wrong, and if these programs are going to be used for something like social media content moderation, I would add a rule that flags "love" and "Hitler" appearing together.
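That kind of red-flag rule is simple to sketch on top of a word-level scorer. The flagged pair and the word window below are my own illustrative choices, not features of either analyzer.

```python
# Sketch of a keyword red-flag pass layered over sentiment scoring.
# The flag pairs and window size are illustrative, not from either analyzer.
FLAG_PAIRS = {("love", "hitler")}

def red_flagged(sentence, window=2):
    """Return True if any flagged word pair occurs within `window` words."""
    words = [w.strip(".,!?'\"").lower() for w in sentence.split()]
    for i, first in enumerate(words):
        for j in range(i + 1, min(i + 1 + window, len(words))):
            pair = (first, words[j])
            if pair in FLAG_PAIRS or pair[::-1] in FLAG_PAIRS:
                return True
    return False

print(red_flagged("I love Hitler"))  # True
```

A real moderation system would need a much larger pair list and context handling, but even this crude pass catches the example that fooled both analyzers.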

Part 5

Thomas Jefferson's quotation, "I find that the harder I work, the more luck I seem to have," was translated into French quite well by both Google and Bing. The round trip back into English was also handled well.

David Brinkley said, "A successful man is one who can lay a firm foundation with the bricks others have thrown at him." Google and Bing translated that saying into French very well, and then back into English, also very well.

John D. Rockefeller's saying, "The secret of success is to do the common thing uncommonly well," was translated into serviceable French by both Google and Bing. However, in translating it back into English, Google produced "The secret to success is to do the joint exceptionally well." Bing aced it, though.

I ran Abraham Lincoln's famous quotation "In the end, it's not the years in your life that count. It's the life in your years." through Google Translate and Bing Translator, rendering it into French. They produced two almost identical, and mostly sound, translations, though they disagreed on one pronoun, and Bing chose the correct one. On the way back to English, both services slipped, giving me "In the end, it's not the years of your life that count. This is life in your years." I actually understand that mistake; it's just an overly literal translation.


Both services did fairly well. They definitely are at a level where they are providing a valuable service, even if they do not perform perfectly. The meaning is not hindered by smaller grammatical errors, but it is definitely on grammar that the services could improve.
Bing actually impressed me more than did Google. When I use translation services, I usually go with Google, but I honestly think, at least for French, that Bing had fewer mistakes, and that I'll start using it over Google.

Part 6

The first deep learning experiment tested whether the computer could recognize whether my bottle of water was in the image frame. It picked up on it pretty easily. It's a simple experiment because it can be done with any object, but a practical one, for, let's say, security cameras. I used about 60 images for each class, just to be thorough.

The second go-around, I tried something I thought would be a little more subtle for a computer: testing whether it could tell when I was using a finger to form a fake mustache, as opposed to putting a finger on other parts of my face. I only gave the computer a few examples of what a correct finger mustache looked like, but I gave it 118 image samples of incorrect finger placement. The computer still couldn't really figure it out, even after I added more samples.
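The tool I used trains an actual neural network on the example images, which is well beyond a short sketch. Purely to illustrate the collect-examples-per-class, then-classify workflow of the bottle experiment, here is a toy nearest-centroid classifier on synthetic pixel data; all the numbers and class names are stand-ins, not the real experiment's data.

```python
import math
import random

# Toy two-class "image" classifier: nearest class centroid on flattened pixels.
# Real tools train a neural network; this sketch only shows the workflow of
# collecting examples per class and then classifying new frames.
random.seed(0)
PIXELS = 16 * 16

def fake_frames(brightness, n=60):
    """Synthetic frames: each pixel drawn near a given average brightness."""
    return [[random.gauss(brightness, 0.1) for _ in range(PIXELS)]
            for _ in range(n)]

def centroid(frames):
    """Mean image of a set of frames."""
    return [sum(col) / len(frames) for col in zip(*frames)]

# 60 samples per class, as in the experiment; "bottle" frames are brighter.
centroids = {
    "bottle": centroid(fake_frames(0.7)),
    "no bottle": centroid(fake_frames(0.3)),
}

def classify(frame):
    """Assign the frame to the class whose mean image is nearest."""
    return min(centroids, key=lambda c: math.dist(frame, centroids[c]))

print(classify([0.7] * PIXELS))  # a bright frame lands in the "bottle" class
```

The bottle experiment succeeds under a setup like this because the two classes differ in an obvious, global way; the mustache experiment fails because the difference between a "mustache" finger and a nearby finger is a small, local detail that a crude whole-image comparison washes out.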