OkCupid
2018
16/03
 
Participants: 56, Submissions: 1106
 

OkCupid is an online dating site that serves international users. Kim and Escobedo-Land (2015) describe a data set of over 50,000 profiles from the San Francisco area that was made available.

The goal is to predict whether a person’s profession is in the STEM fields (science, technology, engineering, and math) using a random sample of the overall dataset.

Submissions are evaluated by the Area Under the Curve (AUC).
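
To make the metric concrete, here is a minimal sketch of how AUC can be computed from predicted scores and true labels with the rank-sum (Mann-Whitney) identity. The helper name `auc` and the toy data are illustrative assumptions; the platform's own scoring code is not shown on this page.

```r
# AUC via the rank-sum (Mann-Whitney) identity; tied scores get average ranks.
# `auc` is an illustrative helper, not a function provided by the platform.
auc <- function(score, label, positive = "stem") {
  is_pos <- label == positive
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(score)  # average ranks for ties
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example: four observations, two per class
label <- c("other", "other", "stem", "stem")
score <- c(0.10, 0.40, 0.35, 0.80)
auc(score, label)  # 0.75: three of the four (stem, other) pairs are ordered correctly
```

Only the ordering of the scores matters for AUC, so any monotone transformation of the predicted probabilities gives the same value.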

During the competition, the leaderboard displays your partial score, which is the AUC for 500 (random) observations of the test set.
At the end of the contest, the leaderboard will display the final score, which is the AUC for the remaining 500 observations of the test set. The final score will determine the final winner. This method prevents users from overfitting to the leaderboard.
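
The two-stage scoring can be mimicked locally. The sketch below uses simulated labels and scores and a random 500/500 split; the platform's actual split is not published, so the indices, seed, and helper `auc` are assumptions for illustration only.

```r
set.seed(1)
m <- 1000

# Simulated test labels and mildly informative scores (stand-ins for real predictions)
label <- sample(c("other", "stem"), m, replace = TRUE)
score <- ifelse(label == "stem", 0.6, 0.4) + rnorm(m, sd = 0.3)

# Rank-based AUC (illustrative helper)
auc <- function(score, label, positive = "stem") {
  is_pos <- label == positive
  r <- rank(score)
  (sum(r[is_pos]) - sum(is_pos) * (sum(is_pos) + 1) / 2) /
    (sum(is_pos) * sum(!is_pos))
}

# 500 observations for the partial score, the remaining 500 for the final score
public_idx <- sample(m, 500)
private_idx <- setdiff(seq_len(m), public_idx)

c(partial = auc(score[public_idx], label[public_idx]),
  final   = auc(score[private_idx], label[private_idx]))
```

The two values generally differ by sampling noise alone, which is why tuning a model against the partial score does not guarantee the same ranking on the final score.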

Max. team size = 3

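# Read the training and test sets (adjust the file names to your downloaded copies)
train <- read.csv("101.csv", stringsAsFactors = TRUE)
test <- read.csv("102.csv", stringsAsFactors = FALSE)
# Stack both sets so that factor columns share the same levels, then split again
test$Class <- NA
n <- nrow(train)
m <- nrow(test)
combi <- rbind(train, test)
train <- combi[1:n, ]
test <- combi[(n + 1):(n + m), ]

# Fit a classification tree and keep the predicted probability of "stem"
library(rpart)
fit <- rpart(Class ~ ., data = train)
phat <- predict(fit, newdata = test)[, "stem", drop = FALSE]

# Write one probability per line, without row or column names
write.table(phat, file = "myokcupid.txt", row.names = FALSE, col.names = FALSE)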
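
Before uploading, it can help to read the file back and check its shape. This is a minimal sketch, assuming the platform expects one probability per line for each of the 1000 test rows (the page does not state the format beyond the example code). The predictions are simulated stand-ins and the temporary file path is illustrative.

```r
# Simulated stand-in for the vector of predicted probabilities
set.seed(1)
phat <- matrix(runif(1000), ncol = 1)

# Write the file the same way as the example code, here into a temp directory
path <- file.path(tempdir(), "myokcupid.txt")
write.table(phat, file = path, row.names = FALSE, col.names = FALSE)

# Read it back and check the basic properties of a submission
sub <- read.table(path, header = FALSE)
stopifnot(
  nrow(sub) == 1000,                    # one row per test observation (m = 1000)
  ncol(sub) == 1,                       # a single column of scores
  !anyNA(sub[[1]]),                     # no missing predictions
  all(sub[[1]] >= 0 & sub[[1]] <= 1)    # probabilities lie in [0, 1]
)
```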

In its original form, the data set contains several types of variables:
- open text essays related to an individual’s interests and personal descriptions,
- single-choice fields such as profession, diet, gender, body type, and education, and
- multiple-choice fields such as languages spoken and fluency in programming languages.

See “okcupid_codebook.txt” and Kim and Escobedo-Land (2015) for a description of the original data.

Almost all of the raw data fields are discrete in nature; only age is numeric. The categorical predictors and the open text data were converted to dummy variables.
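
As an illustration of what a dummy-variable conversion looks like, the sketch below expands two categorical columns into 0/1 indicator columns with `model.matrix`. The toy profiles and column names are hypothetical, and this is not necessarily the procedure used to prepare the competition files.

```r
# Hypothetical toy profiles; values and column names are illustrative only
profiles <- data.frame(
  age       = c(25, 31, 42),
  diet      = factor(c("vegan", "anything", "vegetarian")),
  body_type = factor(c("fit", "average", "fit"))
)

# Request one indicator column per factor level (no reference level dropped)
is_fac <- sapply(profiles, is.factor)
full_coding <- lapply(profiles[is_fac], contrasts, contrasts = FALSE)

# Build the design matrix, then drop the intercept column
dummies <- model.matrix(~ ., data = profiles, contrasts.arg = full_coding)[, -1]
colnames(dummies)
```

Numeric columns such as `age` pass through unchanged, while each factor level becomes its own column.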

The data set to be used is a random sample of the overall dataset. It contains 90 predictors and 1 response variable (Class, with 2 levels: “other” and “stem”). The train set has n=4000 observations, the test set has m=1000 observations.
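
Since the test labels are hidden, one way to estimate AUC offline is to hold out part of the 4000 training rows. The sketch below runs on simulated data with five binary predictors and swaps in logistic regression (`glm`) for the tree used in the example code, so it depends only on base R. The data-generating coefficients, the 3000/1000 split, and the seed are all assumptions.

```r
set.seed(42)
n <- 4000

# Simulated stand-in for the training set: binary predictors and a two-level Class
X <- as.data.frame(matrix(rbinom(n * 5, 1, 0.3), ncol = 5))  # columns V1..V5
lin <- -1.5 + 1.2 * X$V1 + 0.8 * X$V2
Class <- factor(ifelse(runif(n) < plogis(lin), "stem", "other"),
                levels = c("other", "stem"))
train <- cbind(X, Class = Class)

# Hold out 1000 rows for validation and fit on the remaining 3000
val_idx <- sample(n, 1000)
fit <- glm(Class ~ ., data = train[-val_idx, ], family = binomial)
p_val <- predict(fit, newdata = train[val_idx, ], type = "response")  # P(Class == "stem")

# Rank-based AUC on the held-out rows
is_pos <- train$Class[val_idx] == "stem"
r <- rank(p_val)
val_auc <- (sum(r[is_pos]) - sum(is_pos) * (sum(is_pos) + 1) / 2) /
  (sum(is_pos) * sum(!is_pos))
val_auc
```

With the real data, `train` would come from the training file instead of the simulation, and any model that outputs a score for "stem" could replace the `glm` call.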




- train: train_okc.csv (2 MB)
- test: test_okc.csv (400 KB)
- okcupid_codebook: okcupid_codebook.txt (4 KB)
| # | Name | Final score | Attempts | Last attempt |
|---|------|-------------|----------|--------------|
| 1 | sarasixti | 84.12% | 11 | 15.11.2018 11:15 |
| 2 | e.bagnati | 84.12% | 8 | 15.11.2018 11:40 |
| 3 | m.caronte | 84.12% | 7 | 15.11.2018 13:34 |
| 4 | s.panizza5 | 83.40% | 50 | 15.11.2018 23:50 |
| 5 | f.nigro5 | 83.40% | 32 | 15.11.2018 22:06 |
| 6 | Francesco Bongini | 82.68% | 42 | 09.11.2018 18:10 |
| 7 | l.bassanese | 82.56% | 21 | 15.11.2018 20:11 |
| 8 | alaeddine.ayadi | 82.34% | 11 | 03.11.2018 15:36 |
| 9 | r.buzzini | 82.04% | 28 | 15.11.2018 13:20 |
| 10 | g.floriani | 82.04% | 3 | 15.11.2018 15:16 |
| 11 | marcello.sbordi | 82.01% | 11 | 16.11.2018 08:14 |
| 12 | a.gaffuririva | 82.01% | 1 | 16.11.2018 00:16 |
| 13 | giorgia.modafferi | 82.01% | 1 | 16.11.2018 09:46 |
| 14 | d.piovesana | 81.98% | 30 | 16.11.2018 09:34 |
| 15 | e.pelagalli | 81.98% | 28 | 15.11.2018 23:40 |
| 16 | m.abbiati | 81.90% | 37 | 15.11.2018 18:36 |
| 17 | d.casamassima | 81.90% | 4 | 15.11.2018 21:54 |
| 18 | k.arguirov | 81.56% | 24 | 15.11.2018 18:25 |
| 19 | f.melograna1 | 81.56% | 3 | 15.11.2018 21:08 |
| 20 | e.benincasa1 | 81.56% | 2 | 15.11.2018 21:14 |
| 21 | d.parimbelli2 | 81.04% | 47 | 15.11.2018 09:14 |
| 22 | d.casiraghi | 81.04% | 12 | 15.11.2018 09:05 |
| 23 | fabio.marigo | 80.91% | 73 | 16.11.2018 08:08 |
| 24 | l.mandelli17 | 80.91% | 49 | 16.11.2018 10:39 |
| 25 | giabellianna | 80.91% | 11 | 15.11.2018 21:50 |
| 26 | i.bessone | 80.82% | 7 | 15.11.2018 09:45 |
| 27 | l.gregori1 | 80.82% | 6 | 15.11.2018 09:51 |
| 28 | alessandra.pellegata | 80.82% | 5 | 08.11.2018 13:37 |
| 29 | p.dangelo4 | 80.59% | 8 | 13.11.2018 11:15 |
| 30 | s.turi | 80.38% | 24 | 16.11.2018 09:20 |
| 31 | g.grazzi | 80.38% | 6 | 13.11.2018 17:32 |
| 32 | c.crippa19 | 80.32% | 39 | 15.11.2018 14:18 |
| 33 | i.iacoban | 80.32% | 29 | 15.11.2018 14:25 |
| 34 | chiara.aldeghi95 | 80.32% | 17 | 15.11.2018 14:26 |
| 35 | n.ghioldi | 80.28% | 21 | 15.11.2018 17:44 |
| 36 | a.caciolo | 80.28% | 19 | 15.11.2018 10:40 |
| 37 | a.giampino | 80.27% | 14 | 09.11.2018 08:25 |
| 38 | a.ongaro3 | 80.27% | 13 | 08.11.2018 17:00 |
| 39 | f.bekollari | 80.27% | 6 | 13.11.2018 13:10 |
| 40 | i.belleri | 80.24% | 3 | 15.11.2018 20:12 |
| 41 | a.spataro2 | 80.24% | 40 | 15.11.2018 23:39 |
| 42 | d.perrini | 80.23% | 51 | 15.11.2018 15:22 |
| 43 | m.sciartilli | 80.23% | 19 | 16.11.2018 00:18 |
| 44 | l.giordano8 | 80.07% | 7 | 15.11.2018 20:17 |
| 45 | valyde | 80.07% | 4 | 15.11.2018 20:35 |
| 46 | beatrice.somaschini21 | 79.89% | 28 | 15.11.2018 18:26 |
| 47 | NicholasMissineo | 79.89% | 25 | 16.11.2018 00:14 |
| 48 | Elisa Pirotta | 79.89% | 4 | 16.11.2018 00:09 |
| 49 | RuudGullit | 79.59% | 16 | 04.11.2018 15:38 |
| 50 | f.logiudice1 | 78.98% | 6 | 15.11.2018 13:07 |
| 51 | sonia_cucchi | 78.98% | 22 | 16.11.2018 09:44 |
| 52 | s.pisaniello2 | 78.98% | 11 | 15.11.2018 15:35 |
| 53 | joana.curri | 78.33% | 84 | 16.11.2018 08:34 |
| 54 | g.saccaggi | 78.33% | 22 | 16.11.2018 09:26 |
| 55 | benchmark | 59.03% | 4 | 19.10.2018 09:47 |