We explored several hypotheses while analyzing this data set. Specifically, we looked at how word count, the average number of unique words per song, the top 15 words used, popular two- and three-word phrases, and sentiment words differ from decade to decade. We were more interested in trends in the lyrics than in trends among artists or song titles.

Our null hypothesis is that, regardless of the decade, the word count, average number of unique words per song, top 15 words used, top bigrams and trigrams, and proportion of positive and negative sentiment words remain the same. Our overall alternative hypothesis is that these measures change from decade to decade.

The first hypothesis we are testing is that the top 15 words, top two-word phrases (bigrams), and top three-word phrases (trigrams) have changed over the decades, with more profanity and less sophisticated vocabulary in each passing decade. However, we also hypothesize that the common themes in songs will stay the same. The second hypothesis is that the number of unique words per song will decrease each decade, indicating that songs are becoming more repetitive. The third hypothesis is that the proportions of positive and negative sentiment words differ in each decade, so that a decision tree can accurately predict a song’s decade from how closely its own proportion of positive or negative words matches that decade’s.

The data was mostly straightforward, but it required some cleaning and separation into different data sets. First, we omitted the “NA”s and blank entries from the data set.
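
To make a few of these measures concrete, here is a minimal Python sketch of how the cleaning step, the average number of unique words per song, and the top phrases per decade could be computed. It is not the code used for our analysis. The field names (year, lyrics), the helper functions, and the toy rows are all hypothetical and made up for illustration. The sketch does not remove stop words and does not cover the sentiment or decision tree steps.

```python
import re
from collections import Counter, defaultdict


def clean(rows):
    """Drop rows whose lyrics are missing, blank, or the literal string 'NA'."""
    return [
        r for r in rows
        if r.get("lyrics") is not None and r["lyrics"].strip() not in ("", "NA")
    ]


def tokenize(text):
    """Lowercase the lyrics and split them into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def decade(year):
    """Map a year to the start of its decade, e.g. 1965 -> 1960."""
    return (year // 10) * 10


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as one space-joined phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def summarize(rows, n=1, top=15):
    """Per decade: average unique words per song and the most common n-word phrases."""
    unique_counts = defaultdict(list)
    phrase_counts = defaultdict(Counter)
    for r in clean(rows):
        d = decade(r["year"])
        tokens = tokenize(r["lyrics"])
        unique_counts[d].append(len(set(tokens)))
        phrase_counts[d].update(ngrams(tokens, n))
    return {
        d: {
            "avg_unique_words": sum(u) / len(u),
            "top_phrases": phrase_counts[d].most_common(top),
        }
        for d, u in unique_counts.items()
    }


# Toy rows invented for this example (not taken from our data set).
toy_rows = [
    {"year": 1965, "lyrics": "love me do you know I love you"},
    {"year": 1968, "lyrics": "NA"},
    {"year": 1994, "lyrics": "baby baby baby oh"},
    {"year": 1999, "lyrics": "   "},
]
result = summarize(toy_rows, n=2, top=3)
```

Calling summarize with n=1, n=2, or n=3 gives the top words, bigrams, or trigrams respectively, so the same function covers all three phrase lengths mentioned above.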