cdtw capstone presentation

Black, White, & Blue Team CDTW Members: Chris Smith, Dwayne Jones, Todd Rutherford, & Willie V. Ward III

Upload: todd-rutherford

Post on 15-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Black, White, & Blue Team CDTW

Members: Chris Smith, Dwayne Jones, Todd Rutherford, & Willie V. Ward III

Purpose & Disclaimer

Purpose: This presentation summarizes the group's research and findings, along with any associated subject matter. The information contained herein is intended to be informational in nature, as it summarizes the avenue of research performed by the researching group.

Disclaimer: The content of this presentation is not meant to offend any party within the physical or virtual audience, nor is it meant to offend any individual’s societal, cultural, or educational beliefs. The presentation contains pictures and views that serve as anecdotes for the views present in today’s society, which served as the foundation for this research.

Timeline (image slide): 1930, 1955, 1963, 1993, Today

Problem & Hypothesis

Problem Statement:

African-American males are involved in more fatal incidents with the police than males of any other ethnicity.

Hypothesis Statement: The race of police officers serves as a precursor for the disproportionate number of fatalities at the hands of police among African-American (race) males (gender) between the ages of 15 and 30 (age), living below the poverty line (income), who are unarmed (weapon classification).

Data & Systems Architecture

Path 1

Data Modeling & Visualization

Data Ingestion

Path One Description: The primary data set for this path consists of tweets from 20 Twitter user accounts that produce 4,000 data records, ingested routinely at the time of a new event or at least every two weeks.

The data is ingested using the Tweepy package and then wrangled and stored in PostgreSQL. The final product of this data collection is shown visually using word cloud graphs from Python/Conda source code.
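The word clouds in this path rest on word counts over the collected tweet text. The following is a minimal sketch of that counting step using only the Python standard library; the stop-word list, the sample tweets, and the function name `word_frequencies` are illustrative assumptions, not the team's actual code.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "rt"}


def word_frequencies(tweets, top_n=5):
    """Return the top_n most common words across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Drop URLs so link fragments do not dominate the counts.
        text = re.sub(r"https?://\S+", " ", text.lower())
        # Keep plain words, hashtags, and mentions as separate tokens.
        tokens = re.findall(r"[#@]?[a-z']+", text)
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample texts, for illustration only.
sample = [
    "Justice for all https://example.com/a",
    "RT We want justice and reform",
    "Reform is overdue #justice",
]
print(word_frequencies(sample, top_n=2))
```

The resulting (word, count) pairs are the kind of input a word cloud renderer scales its font sizes from.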

Path Two Description: This path includes structured datasets collected from various sources to produce snapshots of the variables analyzed in support of the team’s hypothesis. The data is stored in an Amazon Web Services (AWS) database, wrangled and stored in PostgreSQL, then visualized using Tableau.

20 Twitter User Accounts:
• 15 celeb/advocate accounts
• 5 News & Media accounts

Data Munging/Wrangling

Path 2

Data Architecture

Data Analysis: Encumbrances

Path One Problems

Twitter data was limited to 200 tweets per ingestion pass.

Because the users were active on an array of subjects, not every collected tweet related to the research topic, which presented a collection issue.
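A per-pass cap of 200 tweets is usually worked around by paging backwards through a timeline. The sketch below shows that loop with an injected `fetch_page` callable so it runs without network access; `collect_timeline`, `fake_fetch`, and the tweet dictionaries are hypothetical stand-ins. With Tweepy, `fetch_page` could plausibly wrap `API.user_timeline` with its `count` and `max_id` parameters, but that wiring is an assumption and is not shown. A single pass of 200 tweets for each of 20 accounts would also be consistent with the 4,000 records mentioned earlier.

```python
from typing import Callable, Dict, List, Optional

PAGE_LIMIT = 200  # per-pass cap reported on the slide


def collect_timeline(
    fetch_page: Callable[[str, int, Optional[int]], List[Dict]],
    user: str,
    max_pages: int = 5,
) -> List[Dict]:
    """Page backwards through one user's tweets, PAGE_LIMIT at a time."""
    collected: List[Dict] = []
    max_id: Optional[int] = None
    for _ in range(max_pages):
        page = fetch_page(user, PAGE_LIMIT, max_id)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        # The next pass asks only for tweets older than the oldest one seen.
        max_id = min(t["id"] for t in page) - 1
    return collected


def fake_fetch(user, count, max_id):
    """Stand-in for a real API call: serves 450 fake tweets with ids 1..450."""
    newest = 450 if max_id is None else min(max_id, 450)
    ids = range(newest, max(newest - count, 0), -1)
    return [{"id": i, "user": user, "text": f"tweet {i}"} for i in ids]


tweets = collect_timeline(fake_fetch, "@example_user")
print(len(tweets))
```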

Updated: May 5, 2016

Twitter Account Analysis (not listed in any specific order)

Tier: Four | Three | Two | One
Tier Limits: >300,000 followers | <300,000 followers | <100,000 followers | <50,000 followers
Tier Count: 4 | 3 | 5 | 3
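The tier limits can be read as follower-count bands. The sketch below encodes one such reading; the function name `follower_tier` is hypothetical, and the handling of the exact boundary values (for example, placing an account with exactly 300,000 followers in Tier Three) is an assumption, since the slide does not specify it. The media accounts, which the slide labels "Elite", are outside this scheme.

```python
def follower_tier(followers: int) -> str:
    """Map a follower count to a tier name, under assumed band boundaries."""
    if followers > 300_000:
        return "Four"
    if followers >= 100_000:  # assumed band: 100,000 to 300,000
        return "Three"
    if followers >= 50_000:   # assumed band: 50,000 to just under 100,000
        return "Two"
    return "One"              # fewer than 50,000 followers


# Follower counts taken from the account table (750k and 10.7k).
print(follower_tier(750_000), follower_tier(10_700))
```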

User 1 | User 2 | User 3 | User 4 | User 5
Name: Cornel West | Tavis Smiley | Dr. Umar Johnson | Rev. Al Sharpton | Khym Ringgold
Twitter Name: @CornelWest | @tavissmiley | @DrUmarJohnson | @TheRevAl | @Login2truth
Verified Account: Yes | Yes | No | Yes | No
Followers: 750k | 318k | 46k | 489k | 10.7k
Tweets: 3k | 12.5k | 6.5k | 11.2k | 8k
Tier:
Bio Keywords: Public Intellectuals Advocate Pan-Africanist White Supremacy White Supremacy
Bio Keywords: Racial Justice Entrepreneur Political Scientist Pan-Africanist Pan-Africanist
Bio Keywords: Progressive Politics Pres. Of Team Pan-Afrikan Injustice Injustice

User 6 | User 7 | User 8 | User 9 | User 10
Name: Jamilah Lemieux | Johnetta Elzie | Jeffrey Wright | Deray McKesson | Marc Lamont Hill
Twitter Name: @JamilahLemieux | @Nettaaaaaaaa | @jfreewright | @deray | @marclamonthill
Verified Account: Yes | Yes | Yes | Yes | Yes
Followers: 82k | 122k | 88.4k | 345k | 224k
Tweets: 186k | 175k | 14.3k | 159k | 57k
Tier:
Bio Keywords: Senior Editor, Ebony Soldier Chicken foot Activist Morehouse
Bio Keywords: Howard University War Kook Educator Colored Boys
Bio Keywords: Stone Builder Actor Mayoral Candidate KAY

User 11 | User 12 | User 13 | User 14 | User 15
Name: Black Lives Matter | W. Kamau Bell | #JusticeForDeriante | Michael Eric Dyson | Malcolm-Jamal Warner
Twitter Name: @Blklivesmatter | @wkamaubell | @SankofaBrown | @MichaelEDyson | @MalcolmJamalWar
Verified Account: No | Yes | No | Yes | Yes
Followers: 110k | 77.9k | 48.8k | 261.7k | 265k
Tweets: 8k | 25k | 139k | 15.2k | 5.1k
Tier:
Bio Keywords: Affirmation CNN Socialist Georgetown Professor Fucks to Give
Bio Keywords: Resistance Host US Shades of America Militant Political Analyst Poet/Writer
Bio Keywords: Resilience Pan-Africanist Author Actor/Musician/Director

User 16 | User 17 | User 18 | User 19 | User 20
Name: Fox News | CNN | BET | MSNBC | NAACP
Twitter Name: @FoxNews | @CNN | @BET | @MSNBC | @NAACP
Verified Account: Yes | Yes | Yes | Yes | Yes
Followers: 9.1M | 25M | 1.85M | 966k | 136k
Tweets: 249k | 86k | 70k | 98k | 15.5k
Tier: Elite | Elite | Elite | Elite | Elite
Bio Keywords: America’s Strongest #GoThere #ChasingDestinyBET Political Commentary Civil Right
Bio Keywords: Insightful Analysis Difficult Stories #BlackGirlsRock Informed Perspectives Grass Roots
Bio Keywords: Breaking News

Data Analysis: Encumbrances

Path Two

Collecting disparate datasets helped the group analyze different variables associated with the hypothesis, but the lack of complete, related, and comprehensive datasets stymied the project’s statistical analysis.

Data Visualizations

Path One

147 Samples

Decision Tree: The race of police officers serves as a precursor for the disproportionate amount of fatalities at the hands of police for African-American (race) males (gender)

Scikit-learn models evaluated:
• Linear SVC model
• Logistic Regression
• Decision Tree Classifier

The decision tree model produced the following binary tree graph.
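Each node of such a binary tree splits the samples on one feature. Scikit-learn's `DecisionTreeClassifier` scores candidate splits with Gini impurity by default, and the sketch below shows that criterion in plain Python on boolean features. The rows, the feature names `feature_a` and `feature_b`, and the helper names are all hypothetical; this is not the team's data or model.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, feature, target):
    """Weighted Gini impurity after splitting rows on a boolean feature."""
    left = [r[target] for r in rows if r[feature]]
    right = [r[target] for r in rows if not r[feature]]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, features, target):
    """Pick the feature whose split leaves the lowest weighted impurity."""
    return min(features, key=lambda f: split_gini(rows, f, target))


# Made-up rows: feature_a separates the labels perfectly, feature_b does not.
rows = [
    {"feature_a": True, "feature_b": True, "label": 1},
    {"feature_a": True, "feature_b": False, "label": 1},
    {"feature_a": False, "feature_b": True, "label": 0},
    {"feature_a": False, "feature_b": False, "label": 0},
]
print(best_split(rows, ["feature_a", "feature_b"], "label"))
```

A full tree repeats this choice recursively on each resulting subset until the nodes are pure or a depth limit is reached.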

Data Visualizations: Path One

Group 1: Intellectuals

Group 2: Renaissance Artists

Group 3: Revolutionaries

Group 4: Media

Data Visualizations: Path Two


Recommendations for Data

Path One: Collection

Use of a more sophisticated API (e.g., the Twitter Firehose) that will allow a greater ingestion bandwidth and the use of criteria (keywords, locations, users, etc.)

Supplement the path by setting up user polls to get a better idea of how the general public feels.
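The keyword criterion can also be approximated after collection by filtering tweet text, which addresses the problem of accounts posting on unrelated subjects. The following is a minimal sketch; the keyword set, the sample tweets, and the function name `matches_topic` are illustrative assumptions, and only single-word keywords are handled.

```python
import re

# Hypothetical topic keywords; the real list would come from the research team.
TOPIC_KEYWORDS = {"police", "officer", "shooting"}


def matches_topic(text, keywords=TOPIC_KEYWORDS):
    """Return True if any single-word keyword appears in the tweet text."""
    # Lowercase and tokenize; a hashtag such as "#police" yields "police".
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return any(k.lower() in words for k in keywords)


# Made-up sample texts, for illustration only.
sample_tweets = [
    "Police reform bill passes committee",
    "New album drops Friday",
    "Officer placed on leave after #shooting",
]
on_topic = [t for t in sample_tweets if matches_topic(t)]
print(len(on_topic))
```

Matching on whole tokens rather than substrings avoids false hits such as "policy" matching "polic", at the cost of missing plurals unless they are listed explicitly.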

Path One: Assessment

More data would allow a probabilistic assessment with statistical significance.

Continue sentiment analysis with a Naive Bayes classifier, following the methodology outlined in Gamallo & Garcia (2014).
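To make the idea concrete, the sketch below is a generic multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It does not reproduce the specific pipeline of Gamallo & Garcia (2014); the class name, the training sentences, and the labels are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens, add-one smoothed."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in doc.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training examples, for illustration only.
train_docs = [
    "proud of this community",
    "great peaceful march today",
    "angry about this injustice",
    "sad and tired of violence",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_docs, train_labels)
print(model.predict("peaceful and proud march"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.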

Recommendations for Analysis

Path Two: Collection

Collect unstructured data by using internet scrapers to pull data from various national, regional, and local news stations.

Develop national databases that combine the efforts of the organizations that sponsor user-submitted files.

Path Two: Assessment

Assess the hypothesis that yields the best statistical analysis, as opposed to analyzing all hypotheses concurrently.

Focus on the assessment of the dataset yielding the best analysis for the number of variables included.