Data Processing


InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID Country
536365 85123A White Hanging Heart T-Light Holder 6 2010-12-01 08:26:00 2.55 17850 United Kingdom
536365 71053 White Metal Lantern 6 2010-12-01 08:26:00 3.39 17850 United Kingdom
536365 84406B Cream Cupid Hearts Coat Hanger 8 2010-12-01 08:26:00 2.75 17850 United Kingdom


  • InvoiceNo: invoice number

  • StockCode: stock (item) code

  • Description: item description

  • Quantity: quantity

  • InvoiceDate: invoice date

  • UnitPrice: unit price

  • CustomerID: customer ID

  • Country: country


CustomerID Gender Age Income Zipcode Customer Segment
13089 male 53 High 8625 Small Business
15810 female 22 Low 87797 Small Business
15556 female 29 High 29257 Corporate


  • CustomerID: customer ID

  • Gender: gender

  • Age: age

  • Income: income level

  • Zipcode: postal code

  • Customer Segment: customer segment

Combined data

CustomerID InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice Country Gender Age Income Zipcode Customer Segment
12346 C541433 23166 Medium Ceramic Top Storage Jar -74215 2011-01-18 10:17:00 1.04 United Kingdom male 29 Low 5526 Corporate
12346 541431 23166 Medium Ceramic Top Storage Jar 74215 2011-01-18 10:01:00 1.04 United Kingdom male 29 Low 5526 Corporate
12347 542237 84558A 3D Dog Picture Playing Cards 12 2011-01-26 14:30:00 2.95 Iceland female 30 High 45822 Small Business
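
The combined table looks like a join of the transaction table and the customer table on CustomerID. A minimal sketch of such a join follows. It assumes an inner join and one demographic row per customer, neither of which the notes confirm. The helper `inner_join` and the tiny sample lists are hypothetical.

```python
# Hypothetical miniature versions of the two tables; field names follow the notes.
transactions = [
    {"CustomerID": 12346, "InvoiceNo": "541431", "StockCode": "23166", "Quantity": 74215},
    {"CustomerID": 12347, "InvoiceNo": "542237", "StockCode": "84558A", "Quantity": 12},
    # Made-up row with no demographic match, to show what an inner join drops.
    {"CustomerID": 99999, "InvoiceNo": "000000", "StockCode": "X", "Quantity": 1},
]
customers = [
    {"CustomerID": 12346, "Gender": "male", "Age": 29, "Income": "Low"},
    {"CustomerID": 12347, "Gender": "female", "Age": 30, "Income": "High"},
]


def inner_join(left, right, key):
    """Inner join two lists of dicts on `key` (right side assumed unique per key)."""
    lookup = {row[key]: row for row in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]


combined = inner_join(transactions, customers, "CustomerID")
for row in combined:
    print(row)
```

Transactions without a matching customer row disappear under this assumption; a left join would keep them with missing demographics instead.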



Variable Number of classes
CustomerID 4372
InvoiceNo 22190
StockCode 3684
Description 3885
InvoiceDate 20460
Country 37
Gender 2
Income 3
Zipcode 4282
Customer Segment 3

Quantity UnitPrice Age
Min. :-80995.00 Min. : 0.00 Min. :18.00
1st Qu.: 2.00 1st Qu.: 1.25 1st Qu.:27.00
Median : 5.00 Median : 1.95 Median :38.00
Mean : 12.06 Mean : 3.46 Mean :37.26
3rd Qu.: 12.00 3rd Qu.: 3.75 3rd Qu.:47.00
Max. : 80995.00 Max. :38970.00 Max. :55.00


CustomerID InvoiceNo StockCode Description Quantity InvoiceDate
12346 541431 23166 Medium Ceramic Top Storage Jar 74215 2011-01-18 10:01:00
12346 C541433 23166 Medium Ceramic Top Storage Jar -74215 2011-01-18 10:17:00
CustomerID InvoiceNo StockCode Description Quantity InvoiceDate
18268 C561590 84968A Set Of 16 Vintage Rose Cutlery -2 2011-07-28 11:16:00
18268 561680 84968A Set Of 16 Vintage Rose Cutlery 2 2011-07-28 19:13:00
CustomerID InvoiceNo StockCode Description Quantity InvoiceDate
18141 C538717 22457 Natural Slate Heart Chalkboard -12 2010-12-14 11:09:00
  • Returns (*´・д・)?
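
One way to check the return hypothesis is to treat rows whose InvoiceNo starts with "C" and whose Quantity is negative as cancellations, then look for an earlier purchase by the same customer of the same item with the opposite quantity. The sketch below uses only the rows shown above. The matching rule and the helper names are assumptions, not the author's procedure.

```python
rows = [
    {"CustomerID": 12346, "InvoiceNo": "541431", "StockCode": "23166",
     "Quantity": 74215, "InvoiceDate": "2011-01-18 10:01:00"},
    {"CustomerID": 12346, "InvoiceNo": "C541433", "StockCode": "23166",
     "Quantity": -74215, "InvoiceDate": "2011-01-18 10:17:00"},
    {"CustomerID": 18268, "InvoiceNo": "C561590", "StockCode": "84968A",
     "Quantity": -2, "InvoiceDate": "2011-07-28 11:16:00"},
    {"CustomerID": 18268, "InvoiceNo": "561680", "StockCode": "84968A",
     "Quantity": 2, "InvoiceDate": "2011-07-28 19:13:00"},
    {"CustomerID": 18141, "InvoiceNo": "C538717", "StockCode": "22457",
     "Quantity": -12, "InvoiceDate": "2010-12-14 11:09:00"},
]


def is_cancellation(row):
    """Assumed rule: a 'C' prefix on the invoice number plus a negative quantity."""
    return row["InvoiceNo"].startswith("C") and row["Quantity"] < 0


def find_matching_purchase(cancel, rows):
    """Latest earlier purchase with the same customer, item and opposite quantity, else None."""
    candidates = [
        r for r in rows
        if not is_cancellation(r)
        and r["CustomerID"] == cancel["CustomerID"]
        and r["StockCode"] == cancel["StockCode"]
        and r["Quantity"] == -cancel["Quantity"]
        # ISO-formatted timestamps compare correctly as plain strings.
        and r["InvoiceDate"] <= cancel["InvoiceDate"]
    ]
    return max(candidates, key=lambda r: r["InvoiceDate"], default=None)


for cancel in filter(is_cancellation, rows):
    match = find_matching_purchase(cancel, rows)
    print(cancel["InvoiceNo"], "->", match["InvoiceNo"] if match else "no earlier purchase")
```

On these rows only the first cancellation has an earlier matching purchase. For customer 18268 the purchase comes after the return, and for customer 18141 no purchase is shown at all.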


  • Guess: unit prices are in US dollars, so the smallest denomination is one cent (USD 0.01)
StockCode Description UnitPrice
1 23234 Biscuit Tin Vintage Christmas 0
2 22619 Set Of 6 Soldier Skittles 0
3 22385 Jumbo Bag Spaceboy Design 0
4 M Manual 0
5 23407 Set Of 2 Trays Home Sweet Home 0
6 M Manual 0
43 47566 Party Bunting 0
44 21208 Pastel Colour Honeycomb Fan 0
  • Free gifts ( •́ _ •̀)?


CustomerID InvoiceNo StockCode Description Quantity InvoiceDate
15749 540815 21108 Fairy Cake Flannel Assorted Colour 3114 2011-01-11 12:55:00
15749 C550456 21108 Fairy Cake Flannel Assorted Colour -3114 2011-04-18 13:08:00
15749 550461 21108 Fairy Cake Flannel Assorted Colour 3114 2011-04-18 13:20:00
15749 540815 21175 Gin + Tonic Diet Metal Sign 2000 2011-01-11 12:55:00
15749 C550456 21175 Gin + Tonic Diet Metal Sign -2000 2011-04-18 13:08:00
15749 550461 21175 Gin + Tonic Diet Metal Sign 2000 2011-04-18 13:20:00
15749 540818 47556B Tea Time Tea Towels 1300 2011-01-11 12:57:00
15749 550461 47556B Tea Time Tea Towels 1300 2011-04-18 13:20:00
15749 C550456 47566B Tea Time Party Bunting -1300 2011-04-18 13:08:00
15749 540818 48185 Doormat Fairy Cake 670 2011-01-11 12:57:00
15749 C550456 48185 Doormat Fairy Cake -670 2011-04-18 13:08:00
15749 550461 48185 Doormat Fairy Cake 670 2011-04-18 13:20:00
15749 540815 85123A White Hanging Heart T-Light Holder 1930 2011-01-11 12:55:00
15749 C550456 85123A White Hanging Heart T-Light Holder -1930 2011-04-18 13:08:00
15749 550461 85123A White Hanging Heart T-Light Holder 1930 2011-04-18 13:20:00


Variable Number of classes (new) Number of classes (old)
CustomerID 2782 4372
InvoiceNo 6955 22190
StockCode 3506 3684
Description 3638 3885
Country 32 37
Gender 2 2
Income 3 3
Zipcode 2746 4282
Customer Segment 3 3

  • Gender: female, male

  • Income: High, Low, Medium

  • Customer Segment: Small Business, Middle class, Corporate

  • InvoiceDate: 2010-12-01 09:00:00 to 2011-12-09 12:50:00

Quantity UnitPrice Age
Min. : 1.000 Min. : 0.040 Min. :18.00
1st Qu.: 2.000 1st Qu.: 1.250 1st Qu.:28.00
Median : 4.000 Median : 1.690 Median :37.00
Mean : 9.379 Mean : 2.843 Mean :36.76
3rd Qu.: 12.000 3rd Qu.: 3.450 3rd Qu.:46.00
Max. :4300.000 Max. :2033.100 Max. :55.00
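
The cleaned summary above has a minimum Quantity of 1 and a minimum UnitPrice of 0.040, which would be consistent with dropping cancellations and zero-price rows. The notes do not state the exact cleaning rules, so the filter below is only a guess at the two simplest conditions. It does not explain the other changes, such as the lower maximum Quantity. The function name `is_regular_sale` and the sample rows are hypothetical.

```python
def is_regular_sale(row):
    """True for rows that look like ordinary purchases (assumed rule, not the author's)."""
    cancelled = row["InvoiceNo"].startswith("C")
    return (not cancelled) and row["Quantity"] > 0 and row["UnitPrice"] > 0


rows = [
    {"InvoiceNo": "542237", "StockCode": "84558A", "Quantity": 12, "UnitPrice": 2.95},
    {"InvoiceNo": "C541433", "StockCode": "23166", "Quantity": -74215, "UnitPrice": 1.04},
    # Zero-price "Manual" row; the invoice number and quantity here are made up.
    {"InvoiceNo": "000001", "StockCode": "M", "Quantity": 1, "UnitPrice": 0.0},
]

cleaned = [r for r in rows if is_regular_sale(r)]
print(len(rows), "rows before,", len(cleaned), "rows after")
```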


  • Conjoint Analysis
  • Association Rule
  • Collaborative filtering

Choose method

CustomerID InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice Country Gender Age Income Zipcode Customer Segment
12347 542237 84558A 3D Dog Picture Playing Cards 12 2011-01-26 14:30:00 2.95 Iceland female 30 High 45822 Small Business
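  • Conjoint Analysis: requires ranking data, so it cannot be applied here
  • Association Rule: finds combinations of items (based on StockCode)
  • Collaborative filtering: recommends based on the similarity between users and items, using whether an item was bought or not (0/1)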

Association Rule

Item frequency plot (by CustomerID)

Item frequency plot (by InvoiceNo)

  • Storage bags and table lamps now appear; egg cups and storage jars drop out
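
The two plots differ only in what counts as one basket: everything a customer ever bought, or the contents of a single invoice. The sketch below builds baskets under either key and computes relative item frequencies. The rows and item codes are made up, and `build_baskets` and `item_frequency` are hypothetical helper names.

```python
from collections import Counter, defaultdict

# Made-up rows: customer 1 has two invoices, customer 2 has one.
rows = [
    {"CustomerID": 1, "InvoiceNo": "A", "StockCode": "x"},
    {"CustomerID": 1, "InvoiceNo": "A", "StockCode": "y"},
    {"CustomerID": 1, "InvoiceNo": "B", "StockCode": "x"},
    {"CustomerID": 2, "InvoiceNo": "C", "StockCode": "y"},
    {"CustomerID": 2, "InvoiceNo": "C", "StockCode": "z"},
]


def build_baskets(rows, key):
    """Group item codes into one set per value of `key`."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[key]].add(row["StockCode"])
    return dict(baskets)


def item_frequency(baskets):
    """Share of baskets that contain each item."""
    n = len(baskets)
    counts = Counter(item for items in baskets.values() for item in items)
    return {item: count / n for item, count in counts.items()}


by_customer = item_frequency(build_baskets(rows, "CustomerID"))
by_invoice = item_frequency(build_baskets(rows, "InvoiceNo"))
print(by_customer)
print(by_invoice)
```

Because a customer basket merges several invoices, the same item can rank differently under the two groupings, which is why the two plots do not show the same top items.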

Rules (Apriori, Supp=0.04, Conf=0.7)

  • By CustomerID
LHS RHS support confidence coverage lift
{22698} {22697} 0.04 0.91 0.05 14.95
{22697} {22698} 0.04 0.70 0.06 14.95
{21136} {84879} 0.04 0.78 0.05 6.42
{22697} {22699} 0.05 0.76 0.06 12.32
{22699} {22697} 0.05 0.75 0.06 12.32
{22617} {22138} 0.04 0.83 0.05 7.38
{22804} {85123A} 0.04 0.89 0.05 5.42
{22578} {22577} 0.04 0.75 0.06 11.97
{22577} {22578} 0.04 0.72 0.06 11.97
{23300} {23301} 0.04 0.72 0.06 10.48
{21733} {85123A} 0.06 0.79 0.07 4.83

  • By InvoiceNo
LHS RHS support confidence coverage lift
{22697} {22699} 0.02 0.73 0.03 23.53
{22699} {22697} 0.02 0.70 0.03 23.53
{22804} {85123A} 0.02 0.79 0.03 6.85
{22578} {22577} 0.02 0.75 0.03 25.20
{22726} {22727} 0.02 0.71 0.03 16.78
{23300} {23301} 0.03 0.75 0.03 17.64
{21733} {85123A} 0.03 0.70 0.04 6.07
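
For reading the rule tables: support is the share of baskets containing both sides, coverage is the share containing the LHS, confidence is support divided by coverage, and lift is confidence divided by the share of baskets containing the RHS. The sketch below computes these for one rule on ten made-up baskets. The item codes are borrowed from the tables, but the baskets and the resulting numbers are illustrative only.

```python
def rule_metrics(baskets, lhs, rhs):
    """Support, coverage, confidence and lift of the rule lhs -> rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(baskets)
    n_lhs = sum(1 for b in baskets if lhs <= b)
    n_rhs = sum(1 for b in baskets if rhs <= b)
    n_both = sum(1 for b in baskets if (lhs | rhs) <= b)
    confidence = n_both / n_lhs  # assumes the LHS occurs at least once
    return {
        "support": n_both / n,
        "coverage": n_lhs / n,
        "confidence": confidence,
        "lift": confidence / (n_rhs / n),
    }


# Ten made-up baskets.
baskets = [
    {"22698", "22697"},
    {"22698", "22697"},
    {"22698", "22697", "22699"},
    {"22698"},
    {"22697"},
    {"22697", "22699"},
    {"22699"},
    {"85123A"},
    {"85123A", "21733"},
    {"21733"},
]

metrics = rule_metrics(baskets, {"22698"}, {"22697"})
print(metrics)
```

A lift well above 1, as in every row of the tables, means the two items are bought together far more often than their individual frequencies would suggest.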


Code Description
22698 Pink Regency Teacup And Saucer
22697 Green Regency Teacup And Saucer
22699 Roses Regency Teacup And Saucer
22578 Wooden Star Christmas Scandinavian
22577 Wooden Heart Christmas Scandinavian
21733 Red Hanging Heart T-Light Holder
85123A White Hanging Heart T-Light Holder
22726 Alarm Clock Bakelike Green
22727 Alarm Clock Bakelike Red

Collaborative filtering


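  • Use Quantity as the rating: 1 if it is greater than or equal to 1, otherwise 0
  • Keep (thresholds are the medians)
    • users who bought more than 27 items
    • items bought by more than 19 users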
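
The binarisation and filtering steps above can be sketched as follows. The thresholds are parameters (27 and 19 in the notes, 1 and 1 in the toy example). Computing both counts on the unfiltered data before filtering is an assumption, and `binarize` and `filter_matrix` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def binarize(rows):
    """Map each user to the set of items with Quantity >= 1 (rating 1); all else is 0."""
    bought = defaultdict(set)
    for row in rows:
        if row["Quantity"] >= 1:
            bought[row["CustomerID"]].add(row["StockCode"])
    return dict(bought)


def filter_matrix(bought, min_items, min_users):
    """Keep users with more than `min_items` items and items with more than `min_users` buyers."""
    item_counts = Counter(item for items in bought.values() for item in items)
    keep_items = {item for item, count in item_counts.items() if count > min_users}
    return {
        user: items & keep_items
        for user, items in bought.items()
        if len(items) > min_items
    }


# Made-up rows; the last one is a return and therefore counts as 0.
rows = [
    {"CustomerID": "u1", "StockCode": "a", "Quantity": 3},
    {"CustomerID": "u1", "StockCode": "b", "Quantity": 1},
    {"CustomerID": "u1", "StockCode": "c", "Quantity": 2},
    {"CustomerID": "u2", "StockCode": "a", "Quantity": 1},
    {"CustomerID": "u2", "StockCode": "b", "Quantity": 5},
    {"CustomerID": "u3", "StockCode": "a", "Quantity": 1},
    {"CustomerID": "u3", "StockCode": "b", "Quantity": -1},
]

bought = binarize(rows)
filtered = filter_matrix(bought, min_items=1, min_users=1)
print(filtered)
```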

IBCF (method = "Jaccard")

User 1
Childrens Cutlery Polkadot Green
Charlotte Bag Pink Polkadot
Red Retrospot Charlotte Bag
Charlotte Bag Suki Design
Charlotte Bag Dolly Girl Design
Strawberry Charlotte Bag

Item Recommend times
Lunch Bag Red Retrospot 124
Lunch Bag Pink Polkadot 114
Lunch Bag Spaceboy Design 111
Lunch Bag Apple Design 110
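
The idea behind item-based collaborative filtering with Jaccard similarity can be sketched on 0/1 purchase data as below. This is a simplified illustration, not the implementation the author ran: it scores every unowned item by its summed similarity to the items a user already has, without any neighbourhood truncation or normalisation. All names and the toy data are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def item_similarities(bought):
    """Jaccard similarity between every ordered pair of items, based on their buyer sets."""
    buyers = {}
    for user, items in bought.items():
        for item in items:
            buyers.setdefault(item, set()).add(user)
    names = sorted(buyers)
    return {(i, j): jaccard(buyers[i], buyers[j]) for i in names for j in names if i != j}


def recommend(bought, sims, user, n):
    """Top-n unowned items ranked by summed similarity to the user's items."""
    owned = bought[user]
    candidates = {j for (_, j) in sims} - owned
    scores = {j: sum(sims[(i, j)] for i in owned) for j in candidates}
    ranked = sorted(scores, key=lambda j: (-scores[j], j))
    return [j for j in ranked if scores[j] > 0][:n]


# Made-up binary purchase data: user -> set of items bought.
bought = {
    "u1": {"bagA", "bagB"},
    "u2": {"bagA", "bagB", "bagC"},
    "u3": {"bagB", "bagC"},
    "u4": {"lamp"},
}

sims = item_similarities(bought)
recommendations = {user: recommend(bought, sims, user, n=1) for user in bought}
times_recommended = Counter(item for recs in recommendations.values() for item in recs)
print(recommendations)
print(times_recommended)
```

Counting how often each item appears across all users' lists gives a table of the same kind as the one above.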

