Connecting Weka to a MySQL Database

Introduction to Weka

  Weka stands for Waikato Environment for Knowledge Analysis. It is free, non-commercial (in contrast to Clementine, the commercial data mining product from SPSS), open-source software for machine learning and data mining, built on Java.

Weka Data Format

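Weka stores data in ARFF (Attribute-Relation File Format) files, which are ASCII text files. The two-dimensional table below is stored in ARFF form. It is the "weather.arff" file that ships with Weka and can be found in the "data" subdirectory of the Weka installation directory.
Code:
% ARFF file for the weather data with some numeric features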
%
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
%
% 14 instances
%
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
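
To make the layout of the file above concrete, here is a minimal sketch of reading it with plain Java. It is not Weka's own parser: the class name ArffSketch and its fields are hypothetical, and it assumes simple files like this one (no quoted names, no sparse rows, no missing-value handling).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal ARFF reader sketch (hypothetical helper, not Weka's parser). */
final class ArffSketch {

    /** The pieces of an ARFF file that this sketch extracts. */
    static final class Parsed {
        String relation = "";
        final List<String> attributeNames = new ArrayList<>();
        final List<String[]> rows = new ArrayList<>();
    }

    static Parsed parse(String text) {
        Parsed out = new Parsed();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank lines and % comments carry no data
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // After @data, every remaining line is one comma-separated instance.
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                out.rows.add(cells);
            } else if (lower.startsWith("@relation")) {
                out.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // The attribute name is the first token after the keyword.
                String rest = line.substring("@attribute".length()).trim();
                out.attributeNames.add(rest.split("\\s+", 2)[0]);
            } else if (lower.equals("@data")) {
                inData = true;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "@relation weather",
            "@attribute outlook {sunny, overcast, rainy}",
            "@attribute temperature real",
            "@data",
            "sunny,85",
            "rainy,70");
        Parsed p = parse(sample);
        System.out.println(p.relation + " " + p.attributeNames + " rows=" + p.rows.size());
    }
}
```

The header (@relation, @attribute) describes the table's name and columns, and everything after @data is the table body, one instance per line.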

Introduction to MySQL

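  MySQL is a relational database management system developed by the Swedish company MySQL AB and now owned by Oracle. A relational database keeps data in separate tables rather than in one large store, which improves speed and flexibility. SQL, the language MySQL uses, is the most common standardized language for accessing databases. MySQL is dual-licensed and comes in a Community Edition and a commercial edition. Because it is small, fast, and has a low total cost of ownership, and above all because it is open source, it is a common choice as the database for small and medium-sized websites. The Community Edition performs well, and combined with PHP and Apache it makes a good development environment.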

Connecting Weka Directly to MySQL

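Because Weka uses its own data format, data normally has to be converted to ARFF before Weka can process it, so data held in a database would need a SQL-to-ARFF conversion, which is tedious. Weka has anticipated this: with a small amount of configuration you can connect to a MySQL database and work with it directly from the Weka GUI.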
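
For a sense of what the manual SQL-to-ARFF conversion involves, here is a minimal sketch that turns rows already fetched from a table into ARFF text. The class RowsToArffSketch and its parameters are hypothetical; it assumes values need no quoting and that each non-numeric column becomes a nominal attribute whose labels are the distinct values seen in the rows.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of a manual rows-to-ARFF conversion (hypothetical helper, not a Weka class). */
final class RowsToArffSketch {

    static String toArff(String relation, String[] names, boolean[] numeric, List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append('\n');
        for (int c = 0; c < names.length; c++) {
            sb.append("@attribute ").append(names[c]).append(' ');
            if (numeric[c]) {
                sb.append("real");
            } else {
                // Nominal attribute: collect distinct labels in order of first appearance.
                Set<String> labels = new LinkedHashSet<>();
                for (String[] row : rows) {
                    labels.add(row[c]);
                }
                sb.append('{').append(String.join(", ", labels)).append('}');
            }
            sb.append('\n');
        }
        sb.append("@data\n");
        for (String[] row : rows) {
            sb.append(String.join(",", row)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"sunny", "85"});
        rows.add(new String[] {"rainy", "70"});
        System.out.print(toArff("weather",
            new String[] {"outlook", "temperature"},
            new boolean[] {false, true},
            rows));
    }
}
```

With the direct database connection described in this section, this export step is not needed.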

Prerequisites:

Java runtime environment

A Weka installation

mysql-connector-java-5.1.26-bin.jar

Detailed configuration steps:

  Create a lib folder in the Weka installation directory and copy mysql-connector-java-5.1.26-bin.jar into it. Also copy the same jar into %JAVA_HOME%\jre\lib\ext (this extension directory exists only in Java 8 and earlier).
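
One way to confirm that the driver jar is actually visible to the JVM is to try loading the driver class by name. The sketch below is an assumption-laden check, not part of Weka: DriverCheckSketch is a hypothetical class, and the default class name is the com.mysql.jdbc.Driver value used in the configuration file of this article.

```java
/** Sketch: check whether a JDBC driver class can be found on the classpath. */
final class DriverCheckSketch {

    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class without running its static initializer.
            Class.forName(className, false, DriverCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        System.out.println(driver + (isOnClasspath(driver) ? " found" : " NOT found"));
    }
}
```

Run it with the same classpath Weka will use; if the class is reported as not found, the jar is not in a location the JVM searches.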

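  Find weka.jar in the Weka installation directory and extract it in place; a new folder named weka appears. In that folder, open the experiment subfolder, find DatabaseUtils.props.mysql, and rename it to DatabaseUtils.props so that it replaces the existing DatabaseUtils.props. Then edit the file so that it reads as follows: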

# Database settings for MySQL 3.23.x, 4.x
#
# General information on database access can be found here:
# http://weka.wikispaces.com/Databases
#
# url:     http://www.mysql.com/
# jdbc:    http://www.mysql.com/products/connector/j/
# author:  Fracpete (fracpete at waikato dot ac dot nz)
# version: $Revision: 5836 $

# JDBC driver (comma-separated list)
#jdbcDriver=org.gjt.mm.mysql.Driver
jdbcDriver=com.mysql.jdbc.Driver

# database URL
#jdbcURL=jdbc:mysql://server_name:3306/database_name
jdbcURL=jdbc:mysql://localhost:3306/rtest

# specific data types
# string, getString() = 0;    --> nominal
# boolean, getBoolean() = 1;  --> nominal
# double, getDouble() = 2;    --> numeric
# byte, getByte() = 3;        --> numeric
# short, getByte()= 4;        --> numeric
# int, getInteger() = 5;      --> numeric
# long, getLong() = 6;        --> numeric
# float, getFloat() = 7;      --> numeric
# date, getDate() = 8;        --> date
# text, getString() = 9;      --> string
# time, getTime() = 10;       --> date
TINYINT=3
SMALLINT=4
#SHORT=4
SHORT=5
INTEGER=5
INT=5
INT_UNSIGNED=6
BIGINT=6
LONG=6
REAL=7
NUMERIC=2
DECIMAL=2
FLOAT=2
DOUBLE=2
CHAR=0
TEXT=0
VARCHAR=0
LONGVARCHAR=9
BINARY=0
VARBINARY=0
LONGVARBINARY=9
BIT=1
BLOB=9
DATE=8
TIME=8
DATETIME=8
TIMESTAMP=8

# other options
CREATE_DOUBLE=DOUBLE
CREATE_STRING=TEXT
CREATE_INT=INT
CREATE_DATE=DATETIME
DateFormat=yyyy-MM-dd HH:mm:ss
checkUpperCaseNames=false
checkLowerCaseNames=false
checkForTable=true

# All the reserved keywords for this database
# Based on the keywords listed at the following URL (2009-04-13):
# http://dev.mysql.com/doc/mysqld-version-reference/en/mysqld-version-reference-reservedwords-5-0.html
Keywords=\
  ADD,ALL,ALTER,ANALYZE,AND,AS,ASC,ASENSITIVE,BEFORE,BETWEEN,BIGINT,BINARY,\
  BLOB,BOTH,BY,CALL,CASCADE,CASE,CHANGE,CHAR,CHARACTER,CHECK,COLLATE,COLUMN,\
  COLUMNS,CONDITION,CONNECTION,CONSTRAINT,CONTINUE,CONVERT,CREATE,CROSS,\
  CURRENT_DATE,CURRENT_TIME,CURRENT_TIMESTAMP,CURRENT_USER,CURSOR,DATABASE,\
  DATABASES,DAY_HOUR,DAY_MICROSECOND,DAY_MINUTE,DAY_SECOND,DEC,DECIMAL,\
  DECLARE,DEFAULT,DELAYED,DELETE,DESC,DESCRIBE,DETERMINISTIC,DISTINCT,\
  DISTINCTROW,DIV,DOUBLE,DROP,DUAL,EACH,ELSE,ELSEIF,ENCLOSED,ESCAPED,EXISTS,\
  EXIT,EXPLAIN,FALSE,FETCH,FIELDS,FLOAT,FLOAT4,FLOAT8,FOR,FORCE,FOREIGN,FROM,\
  FULLTEXT,GOTO,GRANT,GROUP,HAVING,HIGH_PRIORITY,HOUR_MICROSECOND,HOUR_MINUTE,\
  HOUR_SECOND,IF,IGNORE,IN,INDEX,INFILE,INNER,INOUT,INSENSITIVE,INSERT,INT,\
  INT1,INT2,INT3,INT4,INT8,INTEGER,INTERVAL,INTO,IS,ITERATE,JOIN,KEY,KEYS,\
  KILL,LABEL,LEADING,LEAVE,LEFT,LIKE,LIMIT,LINES,LOAD,LOCALTIME,\
  LOCALTIMESTAMP,LOCK,LONG,LONGBLOB,LONGTEXT,LOOP,LOW_PRIORITY,MATCH,\
  MEDIUMBLOB,MEDIUMINT,MEDIUMTEXT,MIDDLEINT,MINUTE_MICROSECOND,MINUTE_SECOND,\
  MOD,MODIFIES,NATURAL,NOT,NO_WRITE_TO_BINLOG,NULL,NUMERIC,ON,OPTIMIZE,OPTION,\
  OPTIONALLY,OR,ORDER,OUT,OUTER,OUTFILE,PRECISION,PRIMARY,PRIVILEGES,\
  PROCEDURE,PURGE,READ,READS,REAL,REFERENCES,REGEXP,RELEASE,RENAME,REPEAT,\
  REPLACE,REQUIRE,RESTRICT,RETURN,REVOKE,RIGHT,RLIKE,SCHEMA,SCHEMAS,\
  SECOND_MICROSECOND,SELECT,SENSITIVE,SEPARATOR,SET,SHOW,SMALLINT,SONAME,\
  SPATIAL,SPECIFIC,SQL,SQLEXCEPTION,SQLSTATE,SQLWARNING,SQL_BIG_RESULT,\
  SQL_CALC_FOUND_ROWS,SQL_SMALL_RESULT,SSL,STARTING,STRAIGHT_JOIN,TABLE,\
  TABLES,TERMINATED,THEN,TINYBLOB,TINYINT,TINYTEXT,TO,TRAILING,TRIGGER,TRUE,\
  UNDO,UNION,UNIQUE,UNLOCK,UNSIGNED,UPDATE,UPGRADE,USAGE,USE,USING,UTC_DATE,\
  UTC_TIME,UTC_TIMESTAMP,VALUES,VARBINARY,VARCHAR,VARCHARACTER,VARYING,WHEN,\
  WHERE,WHILE,WITH,WRITE,XOR,YEAR_MONTH,ZEROFILL

# The character to append to attribute names to avoid exceptions due to
# clashes between keywords and attribute names
KeywordsMaskChar=_

#flags for loading and saving instances using DatabaseLoader/Saver
nominalToStringLimit=50
idColumn=auto_generated_id
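
The type entries in this file map a SQL column type to a numeric code, and the comment block says which kind of Weka attribute each code produces. The sketch below reads such a file with java.util.Properties and resolves a type name to its attribute kind. It only illustrates the mapping and is not Weka's implementation: TypeMappingSketch is a hypothetical class, and replacing spaces with underscores is an assumption based on the INT_UNSIGNED key in the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

/** Sketch: resolve a SQL type to an attribute kind via DatabaseUtils.props-style entries. */
final class TypeMappingSketch {

    // Index = type code from the file's comment block; value = resulting attribute kind.
    private static final String[] KIND_BY_CODE = {
        "nominal", "nominal", "numeric", "numeric", "numeric", "numeric",
        "numeric", "numeric", "date", "string", "date"
    };

    static Properties load(String propsText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propsText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static String attributeKind(Properties props, String sqlType) {
        // Assumption: keys are upper case with spaces written as underscores.
        String key = sqlType.trim().toUpperCase(Locale.ROOT).replace(' ', '_');
        String code = props.getProperty(key);
        if (code == null) {
            return "unmapped";
        }
        try {
            int idx = Integer.parseInt(code.trim());
            return idx >= 0 && idx < KIND_BY_CODE.length ? KIND_BY_CODE[idx] : "unmapped";
        } catch (NumberFormatException e) {
            return "unmapped";
        }
    }

    public static void main(String[] args) {
        Properties props = load("VARCHAR=0\nINT_UNSIGNED=6\nDATETIME=8\nBLOB=9\n");
        for (String t : new String[] {"varchar", "int unsigned", "datetime", "blob", "geometry"}) {
            System.out.println(t + " -> " + attributeKind(props, t));
        }
    }
}
```

A column type with no entry in the file has no mapping, which is why adding a line for that type is the usual fix when a table uses one.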

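  Then repack the weka folder as weka.jar and replace the original weka.jar. Run Weka, choose Open DB, choose User, enter the user name and password, and click Connect. If the Info panel shows a line such as connecting to: jdbc:mysql://localhost:3306/myweka = true (with the database name taken from your jdbcURL), the connection succeeded, and the Explorer can load the data set from the database.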
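
If the connection fails inside Weka, it can help to test the same URL and credentials with plain JDBC first. The following is a minimal sketch using only java.sql; JdbcCheckSketch is a hypothetical class, the user name and password are placeholders, and it assumes the MySQL connector jar is on the classpath when run.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Sketch: try a JDBC connection outside Weka to isolate configuration problems. */
final class JdbcCheckSketch {

    // Builds a URL in the same shape as the jdbcURL entry in DatabaseUtils.props.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    static String tryConnect(String url, String user, String password) {
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            return "connected: " + conn.getMetaData().getDatabaseProductVersion();
        } catch (SQLException e) {
            // Typical causes: driver jar missing, wrong URL, wrong credentials, server down.
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String url = buildUrl("localhost", 3306, "rtest");
        // Placeholder credentials; replace with your own.
        System.out.println(tryConnect(url, "your_user", "your_password"));
    }
}
```

A "failed" result here points to a driver, URL, or credential problem rather than to the Weka configuration itself.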
