2016 Election: Sentiment Analysis From Twitter on Presidential Candidates

In an effort to understand the potential future of the American People, I decided to undertake the noble (and thankless) task of analyzing textual data regarding 2016 Presidential Candidates through the Twitter API. I used Perl with the Net::Twitter module from CPAN to perform the data extraction and processing, the positive and negative words were from “Mining and Summarizing Customer Reviews” (Hu, Liu, 2004). Lastly, the wordclouds were made with Tableau.

The following data visualizations were made from a sample size of 500 tweets. Each of the visualizations used a different search criteria- which will be specified with the images.

The following words cloud is from the search criteria of ‘#BernieSanders’:

sa_sanders_2_26_16

This search found 32 total positive words and 24 total negative words.

The next word cloud utilized the search term ‘#HillaryClinton’:

sa_#hillary_clinton_2_26_16

The above results yielded 41 total positive words and 65 total negative words.

The next result is from ‘#DonaldTrump’:

sa_#donaldtrump_2_26_16

The above cloud had 56 total positive words and 53 total negative words.

The next result is from ‘#JohnKasich’:

sa_#johnkasich_2_26_!6

The above cloud had 60 positive words and 50 negative words.

The next result is from ‘#MarcoRubio’:

sa_#marcorubio_2_26_16

The above cloud had 61 positive and 59 negative words.

The next result is from ‘#TedCruz’:

sa_#tedcruz_2_26_16

The above cloud had 46 positive and 56 negative words.

The final result is for ‘#BenCarson’:

sa_#bencarson_2_26_16.jpg

The above cloud had 43 positive and 35 negative words.

Hopefully these results are beneficial to the reader in choosing a candidate!

 

 

 

2016 Election: Sentiment Analysis From Twitter on Presidential Candidates

Finding the Greatest Common Factor in Perl

Once again from Knuth’s Art of Computer Programming, one of his first examples of in his book of an easy algorithm to implement is euclid’s method for finding the greatest common factor between two numbers. The code below was developed from the pseudo-code presented in his book:


#!/usr/bin/perl

use strict;
use warnings;

=ignore

Euclid's algorithm for finding greatest common factor of 2 numbers

=cut


my $m = 225;
my $n = 10;

my $remainder;
for ( 1..$n ) {
    $remainder = divide($m, $n);
    if ( $remainder == 0 ) {
        print $n,"\n"; # Found the GCF, break out of the loop
        last;
    }
    else {
        $m = $n;
        $n = $remainder;
    }
}

sub divide {

    my $m = shift;
    my $n = shift;

    my $remainder = $m % $n;
    return $remainder;
}

 

Finding the Greatest Common Factor in Perl

The Knuth-Morris-Pratt Algorithm in Perl and using index() to match a string pattern.

In light of my recent post using a brute-force string searching algorithm, I decided to post an implementation of the Knuth-Morris-Pratt algorithm in Perl. This implementation is essentially straight from the pseudo-code found on the wiki.


#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my $string = "ABC ABCDAB ABCDABCDABDE";

my $pattern = "ABCDABD";
my $table_array = build_table($pattern);

print "found pattern at index: "
. search_string( $string, $pattern, $table_array ),"\n";


sub search_string {

    my $string      = shift;
    my $pattern     = shift;
    my $table_array = shift;

    my $m = 0; # beginning of current match
    my $i = 0; # the position of the current character in pattern sought

    my @split_string = split(//, $string);
    my @split_pattern = split(//, $pattern);

    while ( $m + $i < scalar(@split_string) ) {
         if ( $split_pattern[$i] eq $split_string[ $m + $i] ) {
             if ( $i == scalar(@split_pattern) - 1 ) {   
                 return $m;             
             }             
         $i++;
         }
         else {
             if ( @{$table_array}[$i] > -1 ) {
                $m = $m + $i - @{$table_array}[$i];
                $i = @{$table_array}[$i];
            }
            else {
                $i = 0;
                $m++;
            }
        }

    }

    return length($string);

}

sub build_table {

    my $string = shift;

    my @split_string = split (//, $string);

    my $pos = 2;
    my $cnd = 0;

    @{$table_array}[0] = -1;
    @{$table_array}[1] = 0;

    while ( $pos < scalar(@split_string) ) {  
       if ( $split_string[$pos-1] eq $split_string[$cnd] ) {   
          $cnd++; 
          @{$table_array}[$pos] = $cnd;
          $pos++;
       } 
       elsif ( $cnd > 0 ) {
            $cnd = @{$table_array}[$cnd];
       }
       else {
            @{$table_array}[$pos] = 0;
            $pos++;
       }

    }

    return $table_array;
}

It should be noted, however, that the index() function in Perl uses the Boyer-Moore algorithm under the hood- so implementing a string searching function like the following may be an easier, and faster solution. It takes a pattern and string (to search within) as arguments.


use strict;
use warnings;
use Data::Dumper;

my $matches = occurrences('CGATGGTCG',
'TCGATGGTAAATACTGTGCGATGGTCGATGGTTCGATGGTCGATGGTCGGGACGATGGTGGGCGATGGTGCGATGGTTCGATGGTACGATGGTCGATGGTACGATGGTCAGGGCGATGGTTAACGCGATGGTGGCAGTCGATGGTTGCGATGGTTCGATGGTCCCGATGGTGCGACGATGGTATTCCGATGGTTCGATGGTCGATGGTACTGCGATGGTCGATGGTACATCGATGGTATCCGATGGTCGATGGTGGCGATGGTCGATGGTCGATGGTCGATGGTGTTATCGATGGTCCGATGGTCGATGGTTAGCGATGGTTATAGGTATCCCGATGGTCGATGGTCGATGGTTACGATGGTCCGATGGTCGATGGTCTTTGTCGATGGTTCGATGGTCGATGGTAACGATGGTCGATGGTTTGTCGATGGTCGCGATGGTCGCCGATGGTGCCGATGGTGGGTCGATGGTGCTCGATGGTCGATGGTCCGCGATGGTTGCGTCGATGGTCGATGGTCGATGGTGGACTCGATGGTCACGATGGTTTCTCGATGGTGGTTCCGATGGTCGATGGTGTCGATGGTACGCAAGTACAGATAGTGCGATGGTGAGGATAGTGCGATGGTAGCGATGGTCGCGATGGTCGATGGTTACTTGCCTGCGATGGTGTGTACGATGGTCGGAACGCCCGATGGTGACGATGGTCATGCGATGGTATTCAATTCGATGGTCTCCGGCCGAAGAAAGCGATGGTCCCAAGATGATCGATGGTCGATGGTCGATGGTGTCGATGGTCCGATGGTCCGTTTCGATGGTACTTCGATGGTTTGCGATGGTATATGTCGATGGTCCAACGATGGTGGTGCGATGGTCTGCGATGGTA');

print join(' ', @{$matches});
sub occurrences {

    my( $x, $y ) = @_;

    my $pos = 0;
    my $matches = 0;
    my @locations;

    while (1) {
        $pos = index($y, $x, $pos);
        last if($pos < 0);
        $matches++;
        $pos++;
        push @locations, $pos;
    }

    return \@locations;
}
The Knuth-Morris-Pratt Algorithm in Perl and using index() to match a string pattern.

Finding the Index and Number of Occurrences of a Pattern in a String

It is a common task in programming to search for patterns in a given string. This is especially true in the field of Bio Informatics. For various reasons, the programmer may want to know the location, or index, that the pattern started at within the String. The following code below is an example of how to find the index and also the number of occurrences of a given pattern in Perl.

sub pattern_count {
    my ($text, $pattern) = @_;

    my $pattern_len = length($pattern);
    my $count = 0;
    my $pos;
    for ( my $i = 0; $i < length($text); $i++) {
        if ( $pattern eq substr($text, $i, $pattern_len ) ) {
            $count += 1;
            $pos = $i + 1;
            print "found $pattern at $pos\n";
        }
    }
    return $count;

}

my $num_occurrences = pattern_count('GATATATGCATATACTT', 'ATAT');

print "The number of times the pattern occurred is : " . $num_occurrences,"\n";


The String to search within is the first argument to the method, and the desired pattern is the second argument given.

The difficult part about using a regular expression for this task is that sometimes the desired pattern may overlap itself in occurrences. Since using the global flag for a regular expression will not pick up the overlapping occurrences, this implementation of walking the string is used.

Finding the Index and Number of Occurrences of a Pattern in a String