WEKA Steps for Loading Data: Difference between revisions

From Rasulev Lab Wiki
(Imported from text file)
 
(Revised formatting)
 
Line 1: Line 1:
__TOC__


<div title="header">
====== Steps for Loading Data into WEKA ======
ARFF format consists of three parts: '''@RELATION''', '''@ATTRIBUTE''' and '''@DATA'''.


{|
* @RELATION name
|width="33%"| <br />
* @ATTRIBUTE  descriptor_name data_type (numeric, nominal …)
|width="33%"| <br />
* @DATA: numbers (integer or real) or strings<span id="_GoBack"></span>. General rules for the ARFF format can be found here: https://www.cs.waikato.ac.nz/ml/weka/arff.html
|width="33%"| <span style="background: #c0c0c0">0</span>
|}


<br />
====== In Excel: ======
 
* File 1: this file has three columns: “'''@ATTRIBUTE'''”, the descriptor names, and “'''NUMERIC'''”.
 
 
</div>
Steps for Loading Data into WEKA
 
<br />
 
 
ARFF format consists of three parts: @RELATION, @ATTRIBUTE and @DATA.
 
<br />
 
 
@RELATION ''“SPACE”'' name
 
@ATTRIBUTE ''“SPACE”'' descriptor name ''“SPACE”'' data type (numeric, nominal …)
 
<span id="_GoBack"></span> @DATA: numbers (integer or real) or strings
 
<br />
 
 
General rules for ARFF file can be found here
 
https://www.cs.waikato.ac.nz/ml/weka/arff.html
 
(Search “ARFF File Format”)
 
<br />
 
 
In Excel:
 
* File 1: this file has three columns: “@ATTRIBUTE”, the descriptor names, and “NUMERIC”.
** Open the txt data file in Excel. Set the file type filter to “All Files” so that the txt file is listed.


[[File:WEKA_Steps_for_Loading_Data_HTML_4ec5c9075dd9cb51.png|512x288px]] [[File:WEKA_Steps_for_Loading_Data_HTML_ae35e05dbfaa6f7c.gif|31x38px|Shape1]]


''Figure 1, Opening your data file''


* When the '''Text Import Wizard''' appears, choose '''Delimited''' and click '''Next''' ('''step 1'''), check the '''Tab''' box ('''step 2'''), and click '''Finish'''. (Leave the other settings at their defaults unless a change is needed.)
Line 55: Line 20:
[[File:WEKA_Steps_for_Loading_Data_HTML_d288ed8722be4fdc.png|393x300px]] [[File:WEKA_Steps_for_Loading_Data_HTML_369e56d90bf80141.gif|32x31px|Shape3]] [[File:WEKA_Steps_for_Loading_Data_HTML_bef7f76831abec8b.gif|14x32px|Shape2]]


''Figure 2, Step 1''


[[File:WEKA_Steps_for_Loading_Data_HTML_cd41ce313e7118b1.png|379x288px]] [[File:WEKA_Steps_for_Loading_Data_HTML_859149fac76acd64.gif|30x19px|Shape5]] [[File:WEKA_Steps_for_Loading_Data_HTML_6092b35113c6658c.gif|19x33px|Shape4]]


''Figure 3, Step 2''


* Create three blank columns.
Line 67: Line 32:
[[File:WEKA_Steps_for_Loading_Data_HTML_5e70e797a5c2cba0.png|313x156px]] [[File:WEKA_Steps_for_Loading_Data_HTML_634c3e41e8a46d53.gif|59x34px|Shape6]]


''Figure 4, Cell format''


* Fill the first and third columns with “@ATTRIBUTE” and “NUMERIC”, respectively, in as many rows as the second column has.
Line 74: Line 39:
[[File:WEKA_Steps_for_Loading_Data_HTML_e8fa2497f6fc4487.png|198x337px]]


''Figure 5, the three columns''
 
<br />
 
 
* File 2: this file holds the @DATA part (numbers only).
** Keep only the numbers needed and delete everything else.
** Save it in '''CSV (comma delimited)''' format. (Click '''Yes''' when Excel asks “Some features in your workbook … Do you want to keep using that format?”)
<br />


[[File:WEKA_Steps_for_Loading_Data_HTML_d4bc2328d2c3083f.png|279x64px]] [[File:WEKA_Steps_for_Loading_Data_HTML_4fc01181394c3a93.gif|43x28px|Shape7]]


''Figure 6, Save as CSV''
 
<br />
 
 
====== In Notepad++: ======
* File A:
** In the first row, type @RELATION followed by a name for the relation, separated by a single space.
Line 103: Line 57:
[[File:WEKA_Steps_for_Loading_Data_HTML_1f0af9e857ead597.png|247x204px]]


''Figure 7, start of the File A''


* File A (cont):
** Depending on your data, the last @ATTRIBUTE row may need to be the response variable, that is, the quantity you are trying to predict. (A categorical response may require this; a numeric one may not.)
** After all the @ATTRIBUTE lines have been pasted, leave one row blank ''(for aesthetics)''.
** In the row after the blank row, type “@DATA”.
** ''(For aesthetics: leave the row after “@DATA” blank.)''
** Example:


[[File:WEKA_Steps_for_Loading_Data_HTML_ca4125bd73bd6170.png|460x172px]]


''Figure 8, middle of File A''
* File B:
** Open File 2 in Notepad++; the values should be separated by commas.
** Example:
[[File:WEKA_Steps_for_Loading_Data_HTML_964b046e1656a7f.png|555x225px]]


''Figure 9, File 2 opened in Notepad++''


* Copy all the data and paste it into File A, below the “@DATA” line.
<br />


* Back in File A:
** Save the file in “arff” format (by adding “.arff” at the end of the file name).
<br />


'''The file is ready to open and run in WEKA (yay).'''


PS: (1) A “%” in a descriptor name after “@ATTRIBUTE” (for example '''“%N” or “%O”''') causes an error. Everything after “%” is treated as a comment, so WEKA sees the descriptor name and data type as missing. Removing the “%” avoids the error. (2) Make sure the number of descriptors matches the number of values per row in the “@DATA” section: if you have 10 @ATTRIBUTE lines, each line in the @DATA part in Notepad++ must contain 10 values.<br />
<br />

Latest revision as of 20:39, 26 September 2022

Steps for Loading Data into WEKA

ARFF format consists of three parts: @RELATION, @ATTRIBUTE and @DATA.
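
To make the three-part layout concrete, here is a minimal Python sketch that assembles a small ARFF document as text. The relation name, descriptor names, and values are made up for illustration and do not come from any real data set.

```python
# Illustrative values only: the relation name, descriptor names and numbers
# below are assumptions, not data from the wiki.
relation = "example_descriptors"
attributes = [("MW", "NUMERIC"), ("LogP", "NUMERIC"), ("activity", "NUMERIC")]
rows = [[180.16, 1.19, 0.52], [151.16, 0.46, 0.31]]


def build_arff(relation, attributes, rows):
    """Return ARFF text with the @RELATION, @ATTRIBUTE and @DATA parts."""
    lines = [f"@RELATION {relation}", ""]
    # One @ATTRIBUTE line per descriptor: name and data type separated by a space.
    lines += [f"@ATTRIBUTE {name} {dtype}" for name, dtype in attributes]
    lines += ["", "@DATA"]
    # One comma-separated line per data row.
    lines += [",".join(str(value) for value in row) for row in rows]
    return "\n".join(lines) + "\n"


arff_text = build_arff(relation, attributes, rows)
print(arff_text)
```

Each data row carries exactly one value per @ATTRIBUTE line, in the same order.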

In Excel:
  • File 1: this file has three columns: “@ATTRIBUTE”, the descriptor names, and “NUMERIC”.
    • Open the txt data file in Excel. Set the file type filter to “All Files” so that the txt file is listed.


Figure 1, Opening your data file

  • When the Text Import Wizard appears, choose Delimited and click Next (step 1), check the Tab box (step 2), and click Finish. (Leave the other settings at their defaults unless a change is needed.)

Figure 2, Step 1

Figure 3, Step 2

  • Create three blank columns.
  • Copy the descriptor names (normally the first row of your data file) and paste them vertically into the second column using the “Transpose” paste option.
  • Make sure the cell format is Text before the next step.


Figure 4, Cell format

  • Fill the first and third columns with “@ATTRIBUTE” and “NUMERIC”, respectively, in as many rows as the second column has.
  • Example:


Figure 5, the three columns
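
The File 1 steps (transpose the header row, then add “@ATTRIBUTE” and “NUMERIC” on either side) can also be sketched in Python. This is only an illustration under stated assumptions: the data file is tab-delimited with the descriptor names in the first row, every descriptor is numeric, and the sample content and function name are hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited content standing in for the txt data file.
sample_txt = "MW\tLogP\tactivity\n180.16\t1.19\t0.52\n151.16\t0.46\t0.31\n"


def attribute_lines(tab_text):
    """Return one '@ATTRIBUTE name NUMERIC' line per column header."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    header = next(reader)  # descriptor names are assumed to be in the first row
    return [f"@ATTRIBUTE {name.strip()} NUMERIC" for name in header]


for line in attribute_lines(sample_txt):
    print(line)
```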

  • File 2: this file holds the @DATA part (numbers only).
    • Keep only the numbers needed and delete everything else.
    • Save it in CSV (comma delimited) format. (Click Yes when Excel asks “Some features in your workbook … Do you want to keep using that format?”)


Figure 6, Save as CSV
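
As a rough equivalent of File 2, the sketch below drops the header row of a tab-delimited text and returns the remaining rows as comma-separated lines. It assumes every column after the header row is wanted; the function name and sample input are hypothetical.

```python
import csv
import io


def data_rows_as_csv(tab_text):
    """Drop the header row and return the remaining rows as CSV text."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    next(reader)  # skip the descriptor names; only the numbers are kept
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if row:  # ignore empty lines
            writer.writerow([cell.strip() for cell in row])
    return out.getvalue()


csv_text = data_rows_as_csv("MW\tLogP\n180.16\t1.19\n151.16\t0.46\n")
print(csv_text)
```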

In Notepad++:
  • File A:
    • In the first row, type @RELATION followed by a name for the relation, separated by a single space.
    • (For aesthetics: leave the second row blank.)
    • Copy the three columns from File 1 in Excel and paste them starting at row 3.
    • Example:


Figure 7, start of the File A

  • File A (cont):
    • Depending on your data, the last @ATTRIBUTE row may need to be the response variable, that is, the quantity you are trying to predict. (A categorical response may require this; a numeric one may not.)
    • After all the @ATTRIBUTE lines have been pasted, leave one row blank (for aesthetics).
    • In the row after the blank row, type “@DATA”.
    • (For aesthetics: leave the row after “@DATA” blank.)
    • Example:


Figure 8, middle of File A

  • File B:
    • Open File 2 in Notepad++; the values should be separated by commas.
    • Example:

Figure 9, File 2 opened in Notepad++

  • Copy all the data and paste it into File A, below the “@DATA” line.
  • Back in File A:
    • Save the file in “arff” format (by adding “.arff” at the end of the file name).

The file is ready to open and run in WEKA (yay).
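
For reference, the whole Excel and Notepad++ procedure can be approximated by a short script. The sketch below is not part of the lab's workflow; it assumes a tab-delimited input whose first row holds the descriptor names and whose remaining rows are all numeric. The function name, file names, and sample values are hypothetical.

```python
import csv
import os
import tempfile


def txt_to_arff(txt_path, arff_path, relation):
    """Convert a tab-delimited text file (header row + numbers) to ARFF."""
    with open(txt_path, newline="") as src:
        rows = [row for row in csv.reader(src, delimiter="\t") if row]
    header, data = rows[0], rows[1:]
    with open(arff_path, "w", newline="\n") as dst:
        dst.write(f"@RELATION {relation}\n\n")
        for name in header:
            # Every descriptor is assumed to be numeric.
            dst.write(f"@ATTRIBUTE {name.strip()} NUMERIC\n")
        dst.write("\n@DATA\n")
        for row in data:
            dst.write(",".join(cell.strip() for cell in row) + "\n")


# Small demonstration with made-up values in a temporary folder.
workdir = tempfile.mkdtemp()
txt_path = os.path.join(workdir, "descriptors.txt")
arff_path = os.path.join(workdir, "descriptors.arff")
with open(txt_path, "w") as handle:
    handle.write("MW\tLogP\tactivity\n180.16\t1.19\t0.52\n151.16\t0.46\t0.31\n")

txt_to_arff(txt_path, arff_path, "example_descriptors")
with open(arff_path) as handle:
    arff_output = handle.read()
print(arff_output)
```

The last column of the input ends up as the last @ATTRIBUTE line, which matches the advice above about placing the response variable last.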


PS: (1) A “%” in a descriptor name after “@ATTRIBUTE” (for example “%N” or “%O”) causes an error. Everything after “%” is treated as a comment, so WEKA sees the descriptor name and data type as missing. Removing the “%” avoids the error. (2) Make sure the number of descriptors matches the number of values per row in the “@DATA” section: if you have 10 @ATTRIBUTE lines, each line in the @DATA part in Notepad++ must contain 10 values.
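
Both points in the PS can be checked before opening the file in WEKA. The following sketch is a simplified checker, not a full ARFF parser: it assumes plain comma-separated values without quoted strings, and the function name and sample text are hypothetical.

```python
def check_arff(text):
    """Return a list of problems: '%' in attribute lines, wrong value counts."""
    problems = []
    n_attributes = 0
    in_data = False
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # blank lines and whole-line comments are ignored
        if in_data:
            # Simplification: values are split on commas, no quoted strings.
            n_values = len(stripped.split(","))
            if n_values != n_attributes:
                problems.append(
                    f"line {number}: {n_values} values, expected {n_attributes}"
                )
        elif stripped.upper().startswith("@ATTRIBUTE"):
            n_attributes += 1
            if "%" in stripped:
                problems.append(f"line {number}: '%' in attribute line")
        elif stripped.upper().startswith("@DATA"):
            in_data = True
    return problems


bad_text = "@RELATION t\n\n@ATTRIBUTE %N NUMERIC\n@ATTRIBUTE MW NUMERIC\n\n@DATA\n1,2\n3\n"
for problem in check_arff(bad_text):
    print(problem)
```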