WEKA Steps for Loading Data: Difference between revisions

Latest revision as of 20:39, 26 September 2022

Steps for Loading Data into WEKA

The ARFF format consists of three parts: @RELATION, @ATTRIBUTE and @DATA.

@RELATION name
@ATTRIBUTE descriptor_name data_type (numeric, nominal …)
@DATA: numbers (integer or real) or strings

General rules for ARFF files can be found here: https://www.cs.waikato.ac.nz/ml/weka/arff.html
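As a rough illustration of the three parts, the Python sketch below holds a tiny ARFF file in a string and splits it into header lines and data rows. The relation name, attribute names and values are made up for the example, and `split_arff_sections` is a hypothetical helper, not part of WEKA.

```python
# A minimal sketch of the ARFF layout. Every name and value here is
# invented for illustration only.
ARFF_EXAMPLE = """\
@RELATION example

@ATTRIBUTE MolWeight NUMERIC
@ATTRIBUTE LogP NUMERIC
@ATTRIBUTE Class {active,inactive}

@DATA
180.16,1.19,active
46.07,-0.31,inactive
"""


def split_arff_sections(text):
    """Return (header_lines, data_rows), ignoring blank lines."""
    header, data, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines are only there for readability
        if line.upper() == "@DATA":
            in_data = True  # everything after this marker is data
        elif in_data:
            data.append(line)
        else:
            header.append(line)
    return header, data


header, data = split_arff_sections(ARFF_EXAMPLE)
print(len(header), "header lines;", len(data), "data rows")
```

The header holds one @RELATION line and one @ATTRIBUTE line per column, and each data row has one value per attribute.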

In Excel:

File 1: this file holds three columns: “@ATTRIBUTE”, the descriptor names, and “NUMERIC”.
- Open the txt data file in Excel. Make sure the file-type filter in the Open dialog is set to “All Files” so that the txt file is listed.

Figure 1, Opening your data file

When the Text Import Wizard appears, choose Delimited and click Next (step 1), check the Tab box (step 2), and click Finish. (Leave the other settings at their defaults unless you need to change them.)

Figure 2, Step 1

Figure 3, Step 2

Create three blank columns.
Copy the descriptor names (normally found in the first row of your data file) and paste them vertically into the second column using the “Transpose” paste option.
Make sure the cell format is Text before the next step.

Figure 4, Cell format

Fill the first column with “@ATTRIBUTE” and the third column with “NUMERIC”, using as many rows as the second column has.
Example:

Figure 5, the three columns
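The same three-column layout can also be produced outside Excel. The sketch below is a minimal Python version, assuming the descriptor names sit in a single tab-separated header row and that every descriptor is numeric. `attribute_lines` is a hypothetical helper name.

```python
def attribute_lines(header_row, data_type="NUMERIC"):
    """Turn one tab-separated header row into @ATTRIBUTE lines.

    This mirrors the Excel steps: the names are "transposed" into one
    line each, with "@ATTRIBUTE" in front and the data type behind.
    """
    names = [name.strip() for name in header_row.split("\t") if name.strip()]
    return [f"@ATTRIBUTE {name} {data_type}" for name in names]


# Hypothetical descriptor names, used only for illustration.
for line in attribute_lines("MolWeight\tLogP\tTPSA"):
    print(line)
```

Names that contain spaces or special characters are not handled here and would need extra care.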

File 2: this file holds the @DATA part (numbers only).
- Keep only the numbers you need and delete everything else.
- Save it in CSV (comma delimited) format. (Click Yes when Excel asks “Some features in your workbook … Do you want to keep using that format?”)

Figure 6, Save as CSV
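For readers who prefer a script to the Excel export, the following Python sketch converts tab-delimited text to comma-separated rows. It assumes that the only thing to delete is a single header row. Real files may also contain ID or label columns that need removing, which this sketch does not do. `tab_text_to_csv` is a hypothetical helper name.

```python
import csv
import io


def tab_text_to_csv(tab_text, skip_rows=1):
    """Drop the first `skip_rows` rows and return the rest as CSV text."""
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    # Keep non-empty rows that come after the header row(s).
    rows = [row for i, row in enumerate(reader) if i >= skip_rows and row]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical two-descriptor data set.
print(tab_text_to_csv("MolWeight\tLogP\n180.16\t1.19\n46.07\t-0.31\n"))
```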

In Notepad++:

File A:
- On the first line, type @RELATION followed by a name for the relation, separated by a single space.
- (For aesthetics: leave the second line blank.)
- Copy the three columns from File 1 in Excel and paste them starting on line 3.
- Example:

Figure 7, start of the File A

File A (cont):
- Depending on your data, the last @ATTRIBUTE row may need to be the response variable, i.e. the thing you are trying to predict. (A categorical response may require this; a numeric one may not.)
- After all the @ATTRIBUTE information has been pasted, leave one line blank (for aesthetics).
- After the blank line, type “@DATA”.
- (For aesthetics: leave the line after “@DATA” blank.)
- Example:

Figure 8, middle of File A
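When the response variable is categorical, its @ATTRIBUTE line lists the allowed labels in braces instead of giving NUMERIC. The sketch below is a minimal Python version that builds such a line from the labels in a column. The attribute name and labels are invented, `nominal_attribute` is a hypothetical helper, and labels containing spaces or commas are not handled.

```python
def nominal_attribute(name, labels):
    """Build an @ATTRIBUTE line for a nominal (categorical) variable."""
    unique = sorted(set(labels))  # each label appears once, in a stable order
    return "@ATTRIBUTE %s {%s}" % (name, ",".join(unique))


# Hypothetical class column taken from a data set.
print(nominal_attribute("Class", ["active", "inactive", "active"]))
```

Placing this line last among the @ATTRIBUTE rows matches the advice above about the response variable.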

File B:
- Open File 2 in Notepad++ (the values should appear separated by commas).
- Example:

Figure 9, File 2 opened in Notepad++

Copy all the data and paste it into File A, below the “@DATA” line.

Back in File A:
- Save the file in ARFF format by adding “.arff” to the end of the file name.

The file is ready to open and run in WEKA (yay).
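The manual assembly of File A can be summarised in a short Python sketch: relation line, blank line, attribute lines, blank line, @DATA, blank line, then the comma-separated rows. All names and values are hypothetical, `build_arff` is an illustrative helper rather than a WEKA function, and the output path is simply the system's temporary directory.

```python
import tempfile
from pathlib import Path


def build_arff(relation, attribute_lines, csv_data):
    """Join the header and the comma-separated data into ARFF text."""
    lines = [f"@RELATION {relation}", ""]   # relation line, then a blank line
    lines += attribute_lines                # one @ATTRIBUTE line per descriptor
    lines += ["", "@DATA", ""]              # blank line, @DATA, blank line
    lines += csv_data.strip().splitlines()  # the rows copied from File 2
    return "\n".join(lines) + "\n"


# Hypothetical example values; the response variable is listed last.
attrs = [
    "@ATTRIBUTE MolWeight NUMERIC",
    "@ATTRIBUTE LogP NUMERIC",
    "@ATTRIBUTE Class {active,inactive}",
]
arff_text = build_arff("example", attrs, "180.16,1.19,active\n46.07,-0.31,inactive\n")

# Save with the ".arff" extension, as in the manual steps.
out_path = Path(tempfile.gettempdir()) / "example.arff"
out_path.write_text(arff_text, encoding="utf-8")
print(arff_text)
```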

PS: (1) A “%” in a descriptor name after “@ATTRIBUTE”, such as “%N” or “%O”, causes an error. Everything after “%” is treated as a comment, so WEKA reads the line as missing the descriptor name and data type. Removing the “%” avoids the error. (2) Also make sure the number of descriptors matches the number of values on each line of the “@DATA” section. (If you have 10 @ATTRIBUTE lines, each line in the @DATA part in Notepad++ should contain 10 values.)
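Both checks in the PS can be automated. The Python sketch below removes “%” from descriptor names and reports data rows whose number of values differs from the number of attributes. It assumes plain comma-separated values with no quoted strings that contain commas. `clean_name` and `check_row_widths` are hypothetical helper names.

```python
def clean_name(name):
    """Remove '%' so the rest of the line is not read as a comment."""
    return name.replace("%", "")


def check_row_widths(n_attributes, data_rows):
    """Return the 1-based numbers of rows with the wrong value count."""
    return [
        i
        for i, row in enumerate(data_rows, start=1)
        if len(row.split(",")) != n_attributes
    ]


# Hypothetical names and rows for illustration.
print([clean_name(n) for n in ["%N", "%O", "LogP"]])
print(check_row_widths(3, ["1,2,3", "4,5", "6,7,8"]))
```

An empty list from `check_row_widths` means every row has one value per @ATTRIBUTE line.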

@@ Line 1: / Line 1: @@
 __TOC__
-<div title="header">
+====== Steps for Loading Data into WEKA ======
+ARFF format consists of three parts: '''@RELATION''', '''@ATTRIBUTE''' and '''@DATA'''.
-{|
+* @RELATION name
-|width="33%"| <br />
+* @ATTRIBUTE  descriptor_name data_type (numeric, nominal …)
-|width="33%"| <br />
+* @DATA: numbers (integer or real) or strings<span id="_GoBack"></span>General rules for ARFF file can be found here: https://www.cs.waikato.ac.nz/ml/weka/arff.html
-|width="33%"| <span style="background: #c0c0c0">0</span>
-|}
-<br />
+====== In Excel: ======
+* File 1: the format in this file is three columns for: “'''@ATTRIBUTE'''”, the descriptors’ names, and “'''NUMERIC'''”
-</div>
-Steps for Loading Data into WEKA
-<br />
-ARFF format consists of three parts: @RELATION, @ATTRIBUTE and @DATA.
-<br />
-@RELATION ''“SPACE”'' name
-@ATTRIBUTE ''“SPACE”'' descriptor name ''“SPACE”'' data type (numeric, nominal …)
-<span id="_GoBack"></span> @DATA: numbers (integer or real) or strings
-<br />
-General rules for ARFF file can be found here
-https://www.cs.waikato.ac.nz/ml/weka/arff.html
-(Search “ARFF File Format”)
-<br />
-In Excel:
-* File 1: the format in this file is three columns for: “@ATTRIBUTE”, the descriptors’ names, and “NUMERIC”
 ** Open the txt data file in Excel. Make sure you are searching from “All Files”.
 [[File:WEKA_Steps_for_Loading_Data_HTML_4ec5c9075dd9cb51.png|512x288px]] [[File:WEKA_Steps_for_Loading_Data_HTML_ae35e05dbfaa6f7c.gif|31x38px|Shape1]]
-''Figure <span style="background: #c0c0c0">1</span>, Opening your data file''
+''Figure 1, Opening your data file''
 * When '''Text Import Wizard''' prompts, choose '''Delimited and''' click on '''Next''' ('''step 1'''), check the '''Tab''' box ('''step 2''') and click on '''Finish'''. (Leave the rest as default unless it is necessary to change.)
@@ Line 55: / Line 20: @@
 [[File:WEKA_Steps_for_Loading_Data_HTML_d288ed8722be4fdc.png|393x300px]] [[File:WEKA_Steps_for_Loading_Data_HTML_369e56d90bf80141.gif|32x31px|Shape3]] [[File:WEKA_Steps_for_Loading_Data_HTML_bef7f76831abec8b.gif|14x32px|Shape2]]
-''Figure <span style="background: #c0c0c0">2</span>, Step 1''
+''Figure 2, Step 1''
 [[File:WEKA_Steps_for_Loading_Data_HTML_cd41ce313e7118b1.png|379x288px]] [[File:WEKA_Steps_for_Loading_Data_HTML_859149fac76acd64.gif|30x19px|Shape5]] [[File:WEKA_Steps_for_Loading_Data_HTML_6092b35113c6658c.gif|19x33px|Shape4]]
-''Figure <span style="background: #c0c0c0">3</span>, Step 2''
+''Figure 3, Step 2''
 * Create three blank columns.
@@ Line 67: / Line 32: @@
 [[File:WEKA_Steps_for_Loading_Data_HTML_5e70e797a5c2cba0.png|313x156px]] [[File:WEKA_Steps_for_Loading_Data_HTML_634c3e41e8a46d53.gif|59x34px|Shape6]]
-''Figure <span style="background: #c0c0c0">4</span>, Cell format''
+''Figure 4, Cell format''
 * For the first and third columns: make equal number of rows as the second column has for “@ATTRIBUTE” and “NUMERIC”, respectively.
@@ Line 74: / Line 39: @@
 [[File:WEKA_Steps_for_Loading_Data_HTML_e8fa2497f6fc4487.png|198x337px]]
-''Figure <span style="background: #c0c0c0">5</span>, the three columns''
+''Figure 5, the three columns''
-<br />
 * File 2: Here we separate the @DATA part (only numbers) of data.
 ** Keep only the numbers needed and delete everything else.
 ** Save it in '''CSV (comma delimited)''' format. ('''Click YES''' when asks “Some features in your workbook … Do you want to keep using that format?”)
-<br />
 [[File:WEKA_Steps_for_Loading_Data_HTML_d4bc2328d2c3083f.png|279x64px]] [[File:WEKA_Steps_for_Loading_Data_HTML_4fc01181394c3a93.gif|43x28px|Shape7]]
-''Figure <span style="background: #c0c0c0">6</span>, Save as CSV''
+''Figure 6, Save as CSV''
-<br />
-In Notepad++:
+====== In Notepad++: ======
 * File A:
 ** In the first row, create two columns: @RELATION and a title for the relation name (which are just separated by a space).
@@ Line 103: / Line 57: @@
 [[File:WEKA_Steps_for_Loading_Data_HTML_1f0af9e857ead597.png|247x204px]]
-''Figure <span style="background: #c0c0c0">7</span>, start of the File A''
+''Figure 7, start of the File A''
-* Depending on your data, the last @ATTRIBUTE row will need to be the response variable AKA the thing you are trying to predict. (CATEGORICAL may require this, NUMERIC may not)
+* File A (cont):
-* After all ATTRIBUTE information has been pasted leave 1 row blank ''(For aesthetics'')
+** Depending on your data, the last @ATTRIBUTE row will need to be the response variable AKA the thing you are trying to predict. (CATEGORICAL may require this, NUMERIC may not
-* After blank row Type “@DATA”.
+** After all ATTRIBUTE information has been pasted leave 1 row blank ''(For aesthetics'')
-* ''(For aesthetics:'' For the next row after “@DATA”, leave it blank.)
+** After blank row Type “@DATA”.
-* Example: [[File:WEKA_Steps_for_Loading_Data_HTML_ca4125bd73bd6170.png|460x172px]]
+** ''(For aesthetics:'' For the next row after “@DATA”, leave it blank.)
+** Example:
-''Figure <span style="background: #c0c0c0">8</span>, middle of File A''
-<br />
+[[File:WEKA_Steps_for_Loading_Data_HTML_ca4125bd73bd6170.png|460x172px]]
+''Figure 8, middle of File A''
 * File B:
 ** Open File 2 from Notepad++ (and you should see the data are separated by commas).
-** Example: [[File:WEKA_Steps_for_Loading_Data_HTML_964b046e1656a7f.png|555x225px]]
+** Example:
+[[File:WEKA_Steps_for_Loading_Data_HTML_964b046e1656a7f.png|555x225px]]
 ''Figure <span style="background: #c0c0c0">9</span>, File 2 open with NotePad++''
 * Copy and paste all the data to File A.
-<br />
 * Already returned to File A:
 ** Save the file as “arff” format (by adding “.arff” at the end of the file name).
-<br />
 '''The file is ready to open and run in WEKA (yay).'''
-PS: (1) “%” would give you error if included in the descriptors’ names, specifically such as '''“%N” or “%O”''', after “@ATTRIBUTE”. The error would occur because whatever after “%” is considered as comment, then in WEKA, it would be interpreted as a missing information for the descriptor name and data type. Simply remove “%” would avoid errors. (2)Also make sure the number of descriptors matches with the number of data in the “@DATA” section. (If you have 10 lines of @ATTRIBUTE + descriptors’ names, there should have 10 numbers in each line in @DATA part in the Notepad++.
-<br />
+PS: (1) “%” would give you error if included in the descriptors’ names, specifically such as '''“%N” or “%O”''', after “@ATTRIBUTE”. The error would occur because whatever after “%” is considered as comment, then in WEKA, it would be interpreted as a missing information for the descriptor name and data type. Simply remove “%” would avoid errors. (2)Also make sure the number of descriptors matches with the number of data in the “@DATA” section. (If you have 10 lines of @ATTRIBUTE + descriptors’ names, there should have 10 numbers in each line in @DATA part in the Notepad++.)<br />
-<br />
