capstone presentation(1)

Upload: saumya-jain

Post on 15-Feb-2017

TRANSCRIPT

Page 1: Capstone presentation(1)

TEAM MEMBERS:

Anand Srinivasan, Saumya Jain, Sumit Kumar

FACULTY ADVISOR:

J. Michael (Mike) Boyle

Page 2: Capstone presentation(1)

ABOUT ESS

●Founded in 1977; second-largest operator of self storage in the US

●Headquarters: Salt Lake City

●1,147 properties: 54% wholly owned, 22% joint venture, 24% managed

Page 3: Capstone presentation(1)

1,370 stores*

37,700 unit types

1,000,000+ customers

570,000 calls YTD

5,000,000 unique visitors

MASSIVE DATA

*Totals assume the completion of the SmartStop acquisition

Page 4: Capstone presentation(1)

DIVERSIFIED PORTFOLIO

Map legend: Core Market, Secondary Market, No Presence

Properties and share of portfolio by region (*as of June 30, 2015):

●Northwest: 12 (1%)

●California: 262 (23%)

●Mtn West: 98 (9%)

●Texas: 100 (9%)

●Hawaii: 11 (1%)

●Midwest: 120 (10%)

●Northeast: 208 (18%)

●Mid-Atlantic: 119 (10%)

●Southeast: 98 (9%)

●Florida & P.R.: 119 (10%)
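The regional figures can be sanity-checked against the property total quoted on the ESS slide. The sketch below reads each map label as a property count followed by its percentage share (for example, 262 and 23% for California); that split is an interpretation of the labels, not something the slide states.

```python
# (property count, share in percent) per region, as read from the map labels.
# Assumption: each label is "<count><share>%", e.g. "26223%" -> 262 and 23%.
regions = {
    "Northwest": (12, 1),
    "California": (262, 23),
    "Mtn West": (98, 9),
    "Texas": (100, 9),
    "Hawaii": (11, 1),
    "Midwest": (120, 10),
    "Northeast": (208, 18),
    "Mid-Atlantic": (119, 10),
    "Southeast": (98, 9),
    "Florida & P.R.": (119, 10),
}

total_properties = sum(count for count, _ in regions.values())

# Recompute each share from the counts and list any region that disagrees.
recomputed = {
    name: round(100 * count / total_properties)
    for name, (count, _) in regions.items()
}
mismatches = [
    name for name, (_, share) in regions.items() if recomputed[name] != share
]

print(total_properties, mismatches)
```

Under this reading the counts add up to the 1147 properties reported earlier and every recomputed share agrees with its label.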

Page 5: Capstone presentation(1)

ABOUT THE PROJECT

Dataset: Customer data provided in 3 chunks (Clickstream)

●20 GB (pipe-delimited flat file source)

●72 GB (total unfolded size)

Goal: To draw insights and recommendations from the data

●Customer Segmentation & Market Analysis

●Exploration and Predictive Analysis

●Conversion Strategy

Technology: SQL Server, Adobe Analytics, R and Tableau
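A pipe-delimited source of this size is usually read as a stream rather than loaded whole. The following is a minimal Python sketch of that idea, not the team's SQL Server load. It assumes a header row; the sample rows and the VISITOR_ID column are made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for one chunk of the pipe-delimited source.
# VISITOR_ID is an invented column; the other two names appear later in the deck.
SAMPLE = (
    "VISITOR_ID|PAGEVIEWS|ATTRIBUTION_CHANNEL\n"
    "1001|4|DirectLoad\n"
    "1002|1|Mobile DirectLoad\n"
    "1003|6|DirectLoad\n"
)


def count_by_channel(handle):
    """Stream rows one at a time and count visits per attribution channel."""
    counts = {}
    for row in csv.DictReader(handle, delimiter="|"):
        channel = row["ATTRIBUTION_CHANNEL"]
        counts[channel] = counts.get(channel, 0) + 1
    return counts


channel_counts = count_by_channel(io.StringIO(SAMPLE))
print(channel_counts)
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle, so memory use stays flat regardless of file size.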

Page 6: Capstone presentation(1)

CUSTOMER SEGMENTATION

Page 7: Capstone presentation(1)

MARKET ANALYSIS

●28% revenue from PPC, Emails, Social & Lead Gen (Year 2015)

●More promotions → more conversions

Page 8: Capstone presentation(1)

MARKET ANALYSIS

●June - July - August → Maximum Business

●Aggressive promotions and offers would acquire more customers

Page 9: Capstone presentation(1)

PROMOTION AND RESULTS

Promotions → Repeat Customers

Page 10: Capstone presentation(1)

DIRTY DATA

Customer Dataset:

nrow(customer_data) = 876381; 24 attributes (2 derived)

Rental Dataset:

nrow(rental_data) = 1048575; 35 attributes

Joined on SALES_CUSTOMER_ID to generate 861114 records (82% of rental records)

● NA values for Gender = 74.89%

● AGE outliers = 8.7%

● Removing meaningless columns

● NA values for Billing State = 59.43%
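The join and the missing-value percentages can be sketched on toy data as follows. Only SALES_CUSTOMER_ID, GENDER and UNIT_SIZE are names from the slides; the rows are invented. The last line applies the same ratio to the slide's counts, reading the rental row count as 1048575.

```python
# Hypothetical miniature versions of the two datasets (rows are invented).
customers = [
    {"SALES_CUSTOMER_ID": 1, "GENDER": "Female"},
    {"SALES_CUSTOMER_ID": 2, "GENDER": None},
    {"SALES_CUSTOMER_ID": 3, "GENDER": None},
]
rentals = [
    {"SALES_CUSTOMER_ID": 1, "UNIT_SIZE": "10X10"},
    {"SALES_CUSTOMER_ID": 2, "UNIT_SIZE": "05X10"},
    {"SALES_CUSTOMER_ID": 9, "UNIT_SIZE": "10X20"},  # no matching customer
]


def inner_join(left, right, key):
    """Keep only right-hand rows whose key also appears on the left."""
    index = {row[key]: row for row in left}
    return [{**index[r[key]], **r} for r in right if r[key] in index]


def na_share(rows, column):
    """Percentage of rows where the column is missing."""
    missing = sum(1 for row in rows if row[column] is None)
    return 100.0 * missing / len(rows)


joined = inner_join(customers, rentals, "SALES_CUSTOMER_ID")
match_rate = 100.0 * len(joined) / len(rentals)
gender_na = na_share(customers, "GENDER")

# The same ratio applied to the counts on the slide.
slide_match_rate = round(100.0 * 861114 / 1048575)
print(len(joined), slide_match_rate)
```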

Page 11: Capstone presentation(1)

POPULAR CUSTOMER

● Female

● 33 years old

● From Miami

● Or California?

● 10x10 Unit size

● No email preferences

● Opted out of email

● Not in the military

● No SMS preferences

● Not a Spanish speaker

Page 12: Capstone presentation(1)

OTHER FEATURES

● No move-in cost

● No vehicle stored

● No appointment

● Has reservation flag

● No promotion to move in

● Last Payment Amount = $145.90 (out of 97%)

● NON-Non-Climate Outside Normal

● No reservation deposit amount

● No active insurance

Page 13: Capstone presentation(1)

DEMOGRAPHIC TRENDS

Page 14: Capstone presentation(1)

SOME MORE TRENDS

Page 15: Capstone presentation(1)

AND SOME MORE..

Page 16: Capstone presentation(1)

CLASSIFICATION-WHY?

●To predict Gender, based on 23 predictors

●Process

●Models used: C5.0 and J48 decision trees, and a Naive Bayes classifier

●No black-box methods used for now!

●Best performance: Naive Bayes model
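For readers unfamiliar with the best-performing method, here is a minimal sketch of a categorical Naive Bayes classifier with Laplace smoothing. It is a from-scratch illustration on invented rows, not the team's R code; the two toy features stand in for predictors such as UNIT_SIZE and VEHICLE_STORED_IN_UNIT_0_1_FLAG.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab


def predict(model, row):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            score += math.log((count + 1) / (n + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training rows: (unit size, vehicle-stored flag).
rows = [
    ("10X10", "0"), ("10X10", "0"), ("07X10", "0"),
    ("10X20", "1"), ("10X20", "1"), ("10X10", "1"),
]
labels = ["Female", "Female", "Female", "Male", "Male", "Male"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("10X10", "0")), predict(model, ("10X20", "1")))
```

Each class score is the log prior plus the sum of per-feature log likelihoods, which is the independence assumption that gives the method its name.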

Page 17: Capstone presentation(1)

RESULTS

> print(e)
Confusion Matrix and Statistics

          True
Prediction Female  Male
    Female  24402 18784
    Male    12371 15817

               Accuracy : 0.5635
                 95% CI : (0.5598, 0.5671)
    No Information Rate : 0.5152
    P-Value [Acc > NIR] : < 2.2e-16

                  Kappa : 0.1214
 Mcnemar's Test P-Value : < 2.2e-16

            Sensitivity : 0.6636
            Specificity : 0.4571
         Pos Pred Value : 0.5650
         Neg Pred Value : 0.5611
             Prevalence : 0.5152
         Detection Rate : 0.3419
   Detection Prevalence : 0.6051
      Balanced Accuracy : 0.5604

       'Positive' Class : Female

> mmetric(datTest1$GENDER, e1071predictions, c("ACC","PRECISION","TPR","F1"))
     ACC PRECISION1 PRECISION2     TPR1     TPR2      F11      F12
14.15335   56.50442   56.11253 66.35847 45.71255 61.03628 50.38144
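The headline statistics follow directly from the four cells of the confusion matrix. The sketch below recomputes them in Python using the textbook definitions, with Female as the positive class as in the output above. The variable names are illustrative.

```python
# Confusion matrix from the slide: rows are predictions, columns are truth.
tp = 24402  # predicted Female, truly Female
fp = 18784  # predicted Female, truly Male
fn = 12371  # predicted Male, truly Female
tn = 15817  # predicted Male, truly Male

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
sensitivity = tp / (tp + fn)          # recall for Female
specificity = tn / (tn + fp)          # recall for Male
precision = tp / (tp + fp)            # positive predictive value
balanced_accuracy = (sensitivity + specificity) / 2
f1_female = 2 * precision * sensitivity / (precision + sensitivity)

# Share of the majority class: the accuracy of always guessing it.
no_information_rate = max(tp + fn, fp + tn) / total

# Cohen's kappa: agreement beyond what the marginals would give by chance.
expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(round(accuracy, 4), round(no_information_rate, 4), round(kappa, 4))
```

Accuracy is only about five points above the no-information rate, which is what the low kappa reflects.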

Page 18: Capstone presentation(1)

ATTRIBUTE USAGE

100.00% VEHICLE_STORED_IN_UNIT_0_1_FLAG
 95.56% MILITARY_BRANCH
 93.76% ATTRIBUTES
 58.50% SPANISH_SPEAKER_0_1_FLAG
 34.26% RESERVATION_0_1_FLAG
 26.68% UNIT_SIZE
 13.91% MOVE_IN_PROMOTION
  8.15% NSC_RATE_GIVEN_0_1_FLAG
  5.07% MOVE_IN_COST
  1.92% AUTO_PAY_ACTIVE_0_1_FLAG
  1.59% AGE_2
  0.99% INSURANCE_RATE
  0.75% INSURANCE_STATUS
  0.30% LAST_PAYMENT_AMOUNT
  0.12% SMS_PREFERENCES

Page 19: Capstone presentation(1)

REGRESSION-WHY?

●To predict Age based on other numerical attributes

●Correlation Matrix:

> cor(rdata5[c("AGE_2", "FUTURE_RATE", "LAST_PAYMENT_AMOUNT", "MOVE_IN_COST", "INSURANCE_RATE")])
                         AGE_2 FUTURE_RATE LAST_PAYMENT_AMOUNT MOVE_IN_COST INSURANCE_RATE
AGE_2               1.00000000   0.1680646          0.06427195   0.08050684     0.04321071
FUTURE_RATE         0.16806459   1.0000000          0.23240379   0.43648508     0.30766990
LAST_PAYMENT_AMOUNT 0.06427195   0.2324038          1.00000000   0.12739035     0.07853165
MOVE_IN_COST        0.08050684   0.4364851          0.12739035   1.00000000     0.17410110
INSURANCE_RATE      0.04321071   0.3076699          0.07853165   0.17410110     1.00000000
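Each entry of the matrix is a Pearson correlation coefficient between two columns. The following is a small from-scratch sketch of that computation. The two value lists are invented for illustration and are not project data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values standing in for AGE_2 and FUTURE_RATE.
age = [25, 33, 41, 52, 60]
future_rate = [90.0, 120.0, 100.0, 150.0, 140.0]

r = pearson(age, future_rate)
print(round(r, 3))
```

The matrix above is symmetric with ones on the diagonal for the same reason `pearson(a, b)` equals `pearson(b, a)` and `pearson(a, a)` is 1.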

Page 20: Capstone presentation(1)

SCATTERPLOT MATRIX

●Visualizing relationships among features

Page 21: Capstone presentation(1)

BEST RESULTS

Linear model with a combination of attributes

> summary(rdata_lm_model_50_2)

Call:
lm(formula = AGE_2 ~ ., data = train_50)

Residuals:
     Min       1Q   Median       3Q      Max
-20.9061  -1.2271   0.7103   1.7611   2.4229

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                  2.159e+01  1.316e-01 164.053  < 2e-16 ***
FUTURE_RATE                  2.556e-03  5.194e-04   4.921 8.83e-07 ***
LAST_PAYMENT_AMOUNT         -9.940e-06  3.133e-05  -0.317    0.751
MOVE_IN_COST                -8.222e-04  1.021e-03  -0.805    0.421
INSURANCE_RATE               6.593e-04  5.149e-03   0.128    0.898
GENDER                       4.229e-02  5.938e-02   0.712    0.476
AGE_Square                   1.037e-02  2.254e-05 460.117  < 2e-16 ***
FUTURE_RATE_and_MOV_IN_COST  7.420e-07  3.745e-06   0.198    0.843
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.259 on 5904 degrees of freedom
Multiple R-squared: 0.9736, Adjusted R-squared: 0.9736
F-statistic: 3.111e+04 on 7 and 5904 DF, p-value: < 2.2e-16

Page 22: Capstone presentation(1)

PERFORMANCE

> mmetric(rdata_test1$AGE_2, rdata_pred1, c("MAE","RMSE","MAPE","RMSPE","RRSE","RAE","COR","R2"))
       MAE       RMSE       MAPE      RMSPE       RRSE        RAE        COR         R2
 1.7880071  2.2707159  4.6840372  0.6632362 16.3257144 15.3021630  0.9865839  0.9733477

> mmetric(rdata_test2$AGE_2, rdata_pred2, c("MAE","RMSE","MAPE","RMSPE","RRSE","RAE","COR","R2"))
       MAE       RMSE       MAPE      RMSPE       RRSE        RAE        COR         R2
 1.7737509  2.2892701  4.6727918  0.6757982 16.5092513 15.1725844  0.9862799  0.9727480
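A few of these measures are easy to reproduce by hand. The sketch below computes MAE, RMSE, COR and R2 on invented values. R2 is taken as the square of COR because the printed numbers behave that way (0.9865839 squared is about 0.9733477); this is an observation about the output, not a statement about how the R package defines it.

```python
import math


def regression_metrics(actual, predicted):
    """MAE, RMSE, Pearson correlation and its square for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    cor = cov / math.sqrt(var_a * var_p)

    return {"MAE": mae, "RMSE": rmse, "COR": cor, "R2": cor ** 2}


# Invented ages and predictions, for illustration only.
actual_age = [30, 40, 50, 60]
predicted_age = [32, 38, 51, 59]

metrics = regression_metrics(actual_age, predicted_age)
print(metrics)
```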

Page 23: Capstone presentation(1)

IMPROVEMENTS

●Better feature selection

●Data transformation of numerical attributes

●Different approaches for testing and training

●Black box methods

Page 24: Capstone presentation(1)

CLICKSTREAM ANALYSIS (2014-2015)

●26 million rows; 9794 made a query => 9764 purchased rentals

=> Sales team conversion ratio = 99.69% (very good)

●Average time a purchaser stayed on the website: 297 seconds

●Average page views by purchasers: 4 => Customer Experience

●Average page views by non-purchasers: 1 (NO) => 4 (YES)
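The conversion ratio is simply purchases divided by queries. A short check using the two counts from the slide (variable names are illustrative):

```python
queries = 9794     # visitors who made a query
purchases = 9764   # of those, visitors who purchased a rental

conversion_pct = 100.0 * purchases / queries
print(f"{conversion_pct:.2f}%")  # prints 99.69%
```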

Page 25: Capstone presentation(1)

POTENTIAL CUSTOMERS FROM NON PURCHASERS

POTENTIAL CUSTOMERS

2=>>>> 971042 <<<<=6

Page 26: Capstone presentation(1)

TOP LANDING PAGES AMONG PURCHASERS

Page 27: Capstone presentation(1)

TOP LANDING PAGE AMONG NON PURCHASERS

Page 28: Capstone presentation(1)

WHAT SHOULD BE OUR WINNING STRATEGY IN TERMS OF LANDING PAGE AND STORAGE RENTALS?

Page 29: Capstone presentation(1)

Factors influencing a user’s behaviour on the website

AGE

LOCATION

GENDER

DURATION

ATTRIBUTION CHANNEL

Page 30: Capstone presentation(1)

TOP ATTRIBUTION CHANNEL LEADING TO PURCHASES

Page 31: Capstone presentation(1)

TOP ATTRIBUTION CHANNEL NON PURCHASERS ARE COMING THROUGH

Page 32: Capstone presentation(1)

RECOMMENDED LANDING PAGE

## : ATTRIBUTION_CHANNEL = DirectLoad:
## : :...visitmonth > 3:
## : :...visitmonth > 8:
## : : :...visitmonth <= 11: Home Page (11/4)
## : : : visitmonth > 11: Reserve or Hold (3/1)
## : : visitmonth <= 8:
## : : :...ESS_VISIT_NUMBER > 22:
## : : :...GENDER in {FALSE,Female,M}: Nil (0)
## : : : GENDER = Male: City Page (2/1)
## : : : GENDER = Nil:
## : : : :...ESS_VISIT_NUMBER <= 30: Nil (5/1)
## : : : ESS_VISIT_NUMBER > 30:
## : : : :...visitday <= 24: Home Page (4)
## : : : visitday > 24: Nil (2/1)
## : : ESS_VISIT_NUMBER <= 22:
## : : :...visitmonth > 7: Nil (2)
## : : visitmonth <= 7:
## : : :...GENDER = Female: Facility (1)

## : ATTRIBUTION_CHANNEL = Mobile DirectLoad:
## : :...visitmonth > 3:
## : :...visityear > 2014: Mobile - City Page (2/1)
## : : visityear <= 2014:
## : : :...visitmonth <= 8:
## : : :...ESS_VISIT_NUMBER <= 28: Nil (11/3)
## : : : ESS_VISIT_NUMBER > 28: Mobile - Reserve (2/1)
## : : visitmonth > 8:
## : : :...visithour <= 4: Login (3/2)
## : : visithour > 4: Mobile - Home Page (5/1)
## : visitmonth <= 3:
## : :...visithour <= 14:
## : :...ESS_VISIT_NUMBER <= 2:
## : : :...visithour > 10: Mobile - Home Page (18/6)
## : : : visithour <= 10:
## : : : :...visitday <= 9: Mobile - Reserve (2/1)
## : : : visitday > 9: Mobile - Facility Page (3)

Page 33: Capstone presentation(1)

TO PUT IT IN PLAIN WORDS…

If visitors come directly to the website in California => if the month is January or February => visit number > 22:

MALE: HOME PAGE

FEMALE: FACILITY PAGE
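Stated as code, the plain-words rule might look like the sketch below. The function name, parameter names and the `None` fallback are hypothetical. The sketch encodes only the sentence on this slide, not the full decision tree.

```python
def recommend_landing_page(channel, state, month, visit_number, gender):
    """Apply the slide's rule; return None when the rule does not cover the visit."""
    applies = (
        channel == "DirectLoad"      # visitor came directly to the website
        and state == "California"
        and month in (1, 2)          # January or February
        and visit_number > 22
    )
    if not applies:
        return None
    if gender == "Male":
        return "Home Page"
    if gender == "Female":
        return "Facility Page"
    return None


print(recommend_landing_page("DirectLoad", "California", 1, 25, "Female"))
```

Returning `None` outside the rule's conditions keeps the sketch from implying a recommendation the slide does not make.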

Page 34: Capstone presentation(1)

REVOLVE OTHER PAGES AROUND EXPECTED RENTALS

## Attribute usage:
##
## 100.00% GENDER
##  96.06% Age
##  84.34% HITS
##  77.29% ESS_VISIT_NUMBER
##  69.32% CLICKS
##  67.03% TOTAL_VISIT_TIME_SECONDS
##  64.84% visithour
##  62.91% visitmonth
##  36.81% visitday
##  34.34% PAGEVIEWS
##  12.45% visityear

Page 35: Capstone presentation(1)

REVOLVE OTHER PAGES AROUND EXPECTED RENTALS

## GENDER = Male:
## :...visitmonth <= 1:
## : :...Age > 63:
## : : :...visithour <= 12: 12X45 (3/2)
## : : : visithour > 12: 10X12 (3/1)
## : : Age <= 63:
## : : :...visitday <= 7: 10X30 (6/1)
## : : visitday > 7:
## : : :...visitday <= 15: 05X08 (3/2)
## : : visitday > 15: 05X10 (3)
## : visitmonth > 1:
## : :...CLICKS > 4:
## : :...visithour <= 13: 07X08 (3/2)
## : : visithour > 13:
## : : :...ESS_VISIT_NUMBER <= 2: 05X10 (3/1)
## : : ESS_VISIT_NUMBER > 2: 10X20 (3/1)

## GENDER = Female:
## :...HITS <= 8:
## : :...visitmonth > 1: 07X10 (2/1)
## : : visitmonth <= 1:
## : : :...visitday <= 15: 10X10 (3)
## : : visitday > 15: 07X14 (3/2)
## : HITS > 8:
## : :...HITS > 28:
## : :...CLICKS <= 5: 07X10 (2/1)
## : : CLICKS > 5:
## : : :...HITS > 47: 05X05 (2)
## : : HITS <= 47:
## : : :...Age <= 43: 10X09 (2/1)
## : : Age > 43: 10X10 (4/1)
## : HITS <= 28:
## : :...TOTAL_VISIT_TIME_SECONDS <= 187:
## : :...ESS_VISIT_NUMBER <= 1: 10X10 (2/1)

Page 36: Capstone presentation(1)

AGAIN..

●Females are more likely to rent a 10 X 10 unit in the first half of a month (visitday <= 15) and a 7 X 14 unit in the second half

●Males (Age > 63) are more likely to rent a 12 X 45 unit in the first half of the day (visithour <= 12) and a 10 X 12 unit later in the day

Page 37: Capstone presentation(1)

Other Recommendations

●Domain forwarding: acquire accidental traffic from mistyped domains (e.g., extaspace.com)

●Customer follow-up after an online reservation: send text messages when a customer misses a call

●Add ip2country.net data to the data warehouse (Adobe Analytics)

●Beyond VOC, apply text and opinion mining to Twitter and Facebook data

●Engage more in customer sentiment analysis: consumeraffairs.com lists many unsatisfied customers

Page 38: Capstone presentation(1)

QUESTIONS?