Original data sets

Problem | Response | Explanatory (descriptor) variables | Folds for cross validation
Protein Homology (Explanation) | kdd_act.txt | kdd_train.txt |

Reorganize KDD datasets: kdd_organize_data.R

Methods | R Code | Plots (*.pdf) | Numerical results
May 27 2004 Submitted by Fei: compare the blocks in the training and test data; find out how classes 0 and 1 are distributed across the blocks of the training set
May 27 2004 Submitted by Hui: plot "Hit Rate ~ Block ID" using all training data (HitRate.pdf)
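A minimal R sketch of both steps, assuming the training data have been reduced to a vector of block IDs and a vector of 0/1 classes. The toy values are made up, the hit rate of a block is taken to be the proportion of its cases in class 1, and `HitRate_sketch.pdf` is a hypothetical file name.

```r
# Toy stand-in for the training data (made-up values): block ID and 0/1 class of each case
block <- c(7, 7, 7, 7, 244, 244, 244, 12, 12, 12, 12, 12)
y     <- c(1, 0, 0, 0,   1,   1,   0,  0,  1,  0,  0,  0)

# Number of class-0 and class-1 cases in each block
counts <- table(block = block, class = y)
print(counts)

# Hit rate per block: proportion of class-1 cases
hit_rate <- tapply(y, block, mean)
print(hit_rate)

# "Hit Rate ~ Block ID" plot, written to a PDF
pdf("HitRate_sketch.pdf")
plot(as.numeric(names(hit_rate)), hit_rate, type = "h",
     xlab = "Block ID", ylab = "Hit rate")
invisible(dev.off())
```

`tapply()` orders the blocks by numeric ID, so the plot reads left to right in block order.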
May 27 2004 Submitted by Yi: plot the kernel densities of the 74 original explanatory variables using all training data
May 27 2004 Submitted by Fei: plot the kernel densities of the 74 original explanatory variables using blocks 7 and 244 of the training data
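A minimal sketch of the density plots, assuming a data frame of explanatory variables plus block IDs and 0/1 labels. The toy data (3 variables instead of 74) and the helper `class_densities()` are hypothetical, and drawing class 0 and class 1 separately is an assumption, not something stated above. Densities use R's `density()` with its default Gaussian kernel and bandwidth.

```r
set.seed(42)
n <- 200
# Toy stand-in: 3 explanatory variables instead of 74, plus block IDs and 0/1 labels
x <- data.frame(v1 = rnorm(n), v2 = rexp(n), v3 = runif(n))
block <- sample(c(7, 12, 244), n, replace = TRUE)
y <- rbinom(n, 1, 0.2)

# Hypothetical helper: kernel density of every variable, separately for each class,
# using only the rows that belong to keep_blocks
class_densities <- function(x, y, block, keep_blocks = unique(block)) {
  rows <- block %in% keep_blocks
  dens <- lapply(names(x), function(v) {
    list(class0 = density(x[rows & y == 0, v]),
         class1 = density(x[rows & y == 1, v]))
  })
  setNames(dens, names(x))
}

dens_all <- class_densities(x, y, block)                          # all training data
dens_sub <- class_densities(x, y, block, keep_blocks = c(7, 244))  # blocks 7 and 244 only

# One page per variable: class 0 solid, class 1 dashed
pdf("densities_sketch.pdf")
for (v in names(dens_sub)) {
  d0 <- dens_sub[[v]]$class0
  d1 <- dens_sub[[v]]$class1
  plot(d0, main = v, ylim = range(0, d0$y, d1$y))
  lines(d1, lty = 2)
}
invisible(dev.off())
```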
May 27 2004 Submitted by Yi: plot the kernel densities of the first 15 principal components (PCs), computed from all training data
May 27 2004 Submitted by Fei: plot the kernel densities of the first 15 PCs, computed from five blocks randomly sampled from the 153 blocks of the training data
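A sketch of the two PC variants under stated assumptions: the PCs are computed on standardized variables (`scale. = TRUE` is an assumption), and in the sampled-blocks variant all cases are projected onto the PCs found from the five sampled blocks. Sizes are toy values (20 blocks, 6 variables, first 3 PCs).

```r
set.seed(7)
# Toy stand-in: 20 blocks of 10 cases, 6 variables; the real data have 153 blocks and 74 variables
block <- rep(1:20, each = 10)
x <- matrix(rnorm(200 * 6), ncol = 6, dimnames = list(NULL, paste0("v", 1:6)))
n_pc <- 3   # 15 in the original analysis

# (a) PCs computed from all training data
pc_all <- prcomp(x, scale. = TRUE)
scores_all <- pc_all$x[, 1:n_pc]

# (b) PCs computed from 5 randomly sampled blocks, then all cases projected onto them
sampled <- sample(unique(block), 5)
pc_sub <- prcomp(x[block %in% sampled, ], scale. = TRUE)
scores_sub <- predict(pc_sub, newdata = x)[, 1:n_pc]

# Kernel density of each PC score
dens_all <- lapply(seq_len(n_pc), function(j) density(scores_all[, j]))
dens_sub <- lapply(seq_len(n_pc), function(j) density(scores_sub[, j]))
```

Projecting every case onto the subsample PCs keeps the two sets of densities on the same cases, so differences reflect the PC directions rather than which rows were plotted.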
May 27 2004 Submitted by Guohua: how to use Perf? Answer: perfMeas.pdf
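Perf and its documentation are the authority on these measures; as a rough cross-check only, the four block measures averaged over blocks might be approximated in R as below. This is a sketch under assumptions: TOP1 is 1 when the top-ranked case in a block is in class 1, RKL is the rank of the lowest-ranked class-1 case, APR is average precision over the class-1 cases, RMS is the root mean squared difference between label and probability within a block, and ties in the probabilities are not treated specially (Perf may handle ties differently). `block_metrics()` and `mean_block_metrics()` are hypothetical names.

```r
# Measures for one block: y = 0/1 labels, p = predicted probabilities of class 1
block_metrics <- function(y, p) {
  ys <- y[order(p, decreasing = TRUE)]  # labels sorted from highest to lowest p (no special tie handling)
  hits <- which(ys == 1)                # ranks of the class-1 cases
  c(TOP1 = as.numeric(ys[1] == 1),                                          # top-ranked case is a hit?
    RKL  = if (length(hits) > 0) max(hits) else NA_real_,                   # rank of the last hit
    APR  = if (length(hits) > 0) mean(seq_along(hits) / hits) else NA_real_, # average precision
    RMS  = sqrt(mean((y - p)^2)))                                           # root mean squared error
}

# Average each measure over blocks (blocks without hits are skipped for RKL and APR)
mean_block_metrics <- function(y, p, block) {
  per_block <- sapply(split(seq_along(y), block),
                      function(i) block_metrics(y[i], p[i]))
  rowMeans(per_block, na.rm = TRUE)
}

# Two toy blocks of three cases each
y <- c(1, 0, 0, 0, 1, 1)
p <- c(0.9, 0.5, 0.1, 0.8, 0.6, 0.2)
b <- rep(c("A", "B"), each = 3)
m <- mean_block_metrics(y, p, b)
print(m)   # TOP1 0.5, RKL 2, APR about 0.792, RMS about 0.496
```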
May 31 2004 Submitted by Fei: try to find important variables by applying tree methods (unpruned and pruned trees) to single blocks and to the whole training data
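A sketch with the `rpart` package, one common choice for trees in R (the entry above does not say which tree code was used). The toy data are simulated so that only `x1` matters. The unpruned tree is grown with `cp = 0`; the pruned tree keeps the complexity parameter with the smallest cross-validated error (`xerror`). In `rpart`, `variable.importance` sums each variable's split improvements, including its contributions as a surrogate.

```r
library(rpart)

set.seed(3)
n <- 400
# Toy data: only x1 carries signal about the class
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- factor(ifelse(d$x1 + 0.3 * rnorm(n) > 1, 1, 0))

# Unpruned tree: cp = 0 switches off the complexity stopping rule
fit <- rpart(y ~ ., data = d, method = "class",
             control = rpart.control(cp = 0, minsplit = 10))

# Pruned tree: keep the cp value with the smallest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# Variable importance for both trees
print(round(fit$variable.importance, 1))
print(round(pruned$variable.importance, 1))
```

For a single block the same code applies to the rows of that block, although small blocks with few class-1 cases may give trees with no splits at all.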
June 3 2004 Submitted by Fei: randomly sample 76 blocks from the 153 blocks of the training dataset and store the block numbers of the sample in kddSamplBlocks.mtx. How to load this file into R? Answer:

    source("http://hajek.stat.ubc.ca/~fyuan/rcode/readmtx.R")   # provides read.mtx()
    sampleBlocks <- read.mtx("kddSamplBlocks.mtx")
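If `readmtx.R` is not at hand, base R can draw and store the sample instead. This sketch writes one block ID per line to a hypothetical plain text file (not the `.mtx` format that `read.mtx()` reads), and `block_ids` is a placeholder for the 153 real block IDs.

```r
set.seed(2004)
block_ids <- 1:153   # placeholder: use the 153 real block IDs, e.g. unique() of the block column

sample_blocks <- sort(sample(block_ids, 76))        # 76 blocks drawn without replacement
other_blocks  <- setdiff(block_ids, sample_blocks)  # the remaining 77 blocks

# Store one block ID per line in a plain text file, then read it back
write(sample_blocks, file = "kddSamplBlocks_sketch.txt", ncolumns = 1)
reloaded <- scan("kddSamplBlocks_sketch.txt", quiet = TRUE)
```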
June 4 2004 Submitted by Guohua: my.R
June 14 2004 Submitted by Fei: How to call Perf() from R?

Choice 1 (two input files, option -files):

    dyn.load("myperf3.o")                 # load the compiled Perf routine
    # Perf's command-line options, passed to the C code as argv
    MyArgv <- c("perf", "-top1", "-rms", "-rkl", "-apr", "-blocks",
                "-files", "./temp0.txt", "./temp1.txt")
    MyArgv <- as.character(MyArgv)
    MyArgc <- length(MyArgv)
    MyArgc <- as.integer(MyArgc)
    myout <- rep(0.0, 4)                  # buffer for the four mean block measures
    storage.mode(myout) <- "double"
    res <- .C("perf", MyArgc, MyArgv, out = myout)$out

Choice 2 (one input file, option -file):

    dyn.load("myperf3.o")
    MyArgv <- c("perf", "-top1", "-rms", "-rkl", "-apr", "-blocks",
                "-file", "./temp.txt")
    MyArgv <- as.character(MyArgv)
    MyArgc <- length(MyArgv)
    MyArgc <- as.integer(MyArgc)
    myout <- rep(0.0, 4)
    storage.mode(myout) <- "double"
    res <- .C("perf", MyArgc, MyArgv, out = myout)$out

Sample results:

    > res <- .C("perf", MyArgc, MyArgv, out = myout)$out
    MEAN_BLOCK_APR 0.25000
    MEAN_BLOCK_RKL 2.00000
    MEAN_BLOCK_RMS 0.57614
    MEAN_BLOCK_TOP1 0.50000
    > res
    [1] 0.2500000 2.0000000 0.5761375 0.5000000

Notes: the returned values are stored in the vector "res" in the order MEAN_BLOCK_APR, MEAN_BLOCK_RKL, MEAN_BLOCK_RMS, MEAN_BLOCK_TOP1.
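The two choices differ only in the file option, so the calls can be wrapped in two small helpers that build the argument vector in one place and return named results. `perf_args()` and `run_perf()` are hypothetical names; the option strings and output order are taken from the calls and notes in this entry, and `run_perf()` assumes `dyn.load("myperf3.o")` has already been run.

```r
# Hypothetical helper: build Perf's argument vector with the options used above.
# One file -> "-file", two files -> "-files".
perf_args <- function(files) {
  stopifnot(length(files) %in% 1:2)
  opts <- c("perf", "-top1", "-rms", "-rkl", "-apr", "-blocks")
  flag <- if (length(files) == 1) "-file" else "-files"
  as.character(c(opts, flag, files))
}

# Hypothetical helper: call the loaded Perf routine and name its four outputs
# (order as in the notes: APR, RKL, RMS, TOP1).
# Assumes dyn.load("myperf3.o") has already been run.
run_perf <- function(files) {
  argv <- perf_args(files)
  out <- .C("perf", as.integer(length(argv)), argv, out = double(4))$out
  setNames(out, c("MEAN_BLOCK_APR", "MEAN_BLOCK_RKL", "MEAN_BLOCK_RMS", "MEAN_BLOCK_TOP1"))
}

# Usage (after dyn.load):
# run_perf("./temp.txt")
# run_perf(c("./temp0.txt", "./temp1.txt"))
```

`double(4)` allocates the same zero-filled double vector as `rep(0.0, 4)` followed by `storage.mode(myout) <- "double"`.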
June 22 2004 Submitted by Fei: Results for 2-fold cross-validation with LDA (download kdd_lda_whole.txt):

    > res <- .C("perf", MyArgc, MyArgv, out = myout)$out
    MEAN_BLOCK_APR 0.45452
    MEAN_BLOCK_RKL 338.35948
    MEAN_BLOCK_RMS 0.04338
    MEAN_BLOCK_TOP1 0.83660
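A minimal sketch of 2-fold cross-validated LDA with `MASS::lda`, assuming the folds are formed from whole blocks (so no block is split between training and test) and that the class-1 posterior probability is the score given to Perf. The toy data are simulated.

```r
library(MASS)

set.seed(11)
# Toy data: 10 blocks of 30 cases, 2 explanatory variables, 0/1 class
block <- rep(1:10, each = 30)
n <- length(block)
x <- data.frame(v1 = rnorm(n), v2 = rnorm(n))
y <- factor(ifelse(x$v1 + rnorm(n, sd = 0.5) > 1.2, 1, 0), levels = c(0, 1))

# Assign whole blocks to fold 1 or 2 (5 blocks each)
ids <- unique(block)
fold_of_block <- setNames(sample(rep(1:2, length.out = length(ids))), ids)
fold <- fold_of_block[as.character(block)]

# Fit LDA on one fold, predict the class-1 posterior on the other
prob <- numeric(n)
for (k in 1:2) {
  test <- fold == k
  fit <- lda(x[!test, ], grouping = y[!test])
  prob[test] <- predict(fit, x[test, ])$posterior[, "1"]
}
```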
June 22 2004 Submitted by Fei: Results for nearest neighbor logistic regression (download kdd_log_weiliang.txt):

    > res <- .C("perf", MyArgc, MyArgv, out = myout)$out
    MEAN_BLOCK_APR 0.47385
    MEAN_BLOCK_RKL 172.33987
    MEAN_BLOCK_RMS 0.04565
    MEAN_BLOCK_TOP1 0.66667
June 22 2004 Submitted by Fei: functions borrowed from Weiliang for calculating nearest neighbor logistic regression: matrix_position.R, separation.R, CallWei.R
June 24 2004 Submitted by Yi: kel function
June 30 2004 Submitted by Fei: How to assemble the estimated probabilities from 2-fold cross-validation? Answer: sample code
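One way to assemble the held-out probabilities, sketched under the assumption that each fold reports the row indices of its held-out cases together with their estimated probabilities. `fold_results` and `assemble_probs()` are hypothetical; the checks make sure every case is predicted exactly once before the vector is scored.

```r
# Hypothetical per-fold output: row numbers of the held-out cases and their estimated probabilities
fold_results <- list(
  list(index = c(1, 3, 5), prob = c(0.10, 0.80, 0.25)),
  list(index = c(2, 4, 6), prob = c(0.60, 0.05, 0.40))
)

# Put each fold's probabilities back at their original row positions
assemble_probs <- function(fold_results, n) {
  prob <- rep(NA_real_, n)
  for (fr in fold_results) {
    if (any(!is.na(prob[fr$index]))) stop("a case appears in more than one fold")
    prob[fr$index] <- fr$prob
  }
  if (anyNA(prob)) stop("some cases were never held out")
  prob
}

prob <- assemble_probs(fold_results, n = 6)
print(prob)   # 0.10 0.60 0.80 0.05 0.25 0.40
```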
July 5 2004 Submitted by Yi: format of the results for each case under all five methods:
Col1: block ID and hit case ID
Col2 to Col6: the probability of each case being in class 1 under the kernel estimate (Ker), LDA (Lda), KNN, the outlier approach (Maha) and the logistic method (Logis)
Col7 to Col11: the rank of each class-1 case according to these probabilities, for Ker, Lda, KNN, Maha and Logis
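A sketch of how the rank columns could be computed, assuming ranks are taken within each block with rank 1 for the highest probability (tied cases share an averaged rank) and that only class-1 (hit) cases are kept in the final table. The toy table is hypothetical and has two methods instead of five.

```r
# Toy results table (assumption): two methods instead of five, 0/1 label y
res <- data.frame(
  block = c(7, 7, 7, 244, 244),
  case  = c(1, 2, 3, 1, 2),
  y     = c(1, 0, 0, 0, 1),
  Ker   = c(0.9, 0.2, 0.5, 0.3, 0.7),
  Lda   = c(0.6, 0.8, 0.1, 0.4, 0.4)
)

# Rank within each block: 1 = highest probability, tied cases share an averaged rank
for (m in c("Ker", "Lda")) {
  res[[paste0(m, "_rank")]] <- ave(-res[[m]], res$block, FUN = rank)
}

# Keep only the hit (class-1) cases, as in the results file
hits <- res[res$y == 1, ]
print(hits)
```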
July 5 2004 Submitted by Fei: download the new Perf: newperf.o
July 9 2004 Submitted by Fei: test four subsets of explanatory variables using 153-fold cross-validation: result
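With 153 blocks, 153-fold cross-validation amounts to leaving out one block at a time. A minimal sketch with logistic regression (`glm` with `family = binomial`), assuming that is the model fitted to each subset; the toy data have 8 blocks, and the subsets `A`, `B`, `C` and the helper `loo_block_cv()` are hypothetical.

```r
set.seed(5)
# Toy data: 8 blocks of 40 cases (the real data have 153 blocks)
block <- rep(1:8, each = 40)
n <- length(block)
d <- data.frame(v1 = rnorm(n), v2 = rnorm(n), v3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-2 + 1.5 * d$v1))

# Hypothetical subsets of explanatory variables
subsets <- list(A = "v1", B = c("v1", "v2"), C = c("v2", "v3"))

# Leave-one-block-out CV: fit on all other blocks, predict the held-out block
loo_block_cv <- function(d, block, vars) {
  f <- reformulate(vars, response = "y")
  prob <- numeric(nrow(d))
  for (b in unique(block)) {
    test <- block == b
    fit <- glm(f, family = binomial, data = d[!test, ])
    prob[test] <- predict(fit, newdata = d[test, ], type = "response")
  }
  prob
}

cv_probs <- lapply(subsets, function(vars) loo_block_cv(d, block, vars))
```

Each element of `cv_probs` holds one out-of-block probability per case, ready to be written out and scored with Perf.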
July 9 2004 Submitted by Guohua: Guohua's subsets of explanatory variables: subsetOfXs.txt
July 12 2004 Submitted by Fei: some probabilities: kdd_153_log_var.txt
July 13 2004 Submitted by Fei & Hui: results for the new subset of Xs: kdd-cv-logis.doc