WEKA Steps for Loading Data

From Rasulev Lab Wiki
Revision as of 19:21, 26 September 2022 by Sysadmin (talk | contribs) (Imported from text file)





Steps for Loading Data into WEKA



An ARFF file consists of three parts: @RELATION, @ATTRIBUTE and @DATA.



@RELATION “SPACE” name

@ATTRIBUTE “SPACE” descriptor name “SPACE” data type (numeric, nominal …)

@DATA: numbers (integer or real) or strings



General rules for the ARFF format can be found here:

https://www.cs.waikato.ac.nz/ml/weka/arff.html

(Search for “ARFF File Format”.)
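
To make the three parts concrete, here is a minimal Python sketch that prints a small ARFF file. The relation name "toy", the descriptor names (MW, LogP, Active) and the values are made up for illustration; the nominal attribute uses the `{yes,no}` value-list syntax from the ARFF rules linked above.

```python
def arff_text(relation, attributes, rows):
    """Build ARFF text from a relation name, (name, type) pairs and data rows."""
    lines = [f"@RELATION {relation}", ""]
    for name, dtype in attributes:
        # One line per descriptor: keyword, name, data type, separated by spaces.
        lines.append(f"@ATTRIBUTE {name} {dtype}")
    lines += ["", "@DATA"]
    for row in rows:
        # One line per sample, values separated by commas.
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical descriptors and values, for illustration only.
example = arff_text(
    "toy",
    [("MW", "NUMERIC"), ("LogP", "NUMERIC"), ("Active", "{yes,no}")],
    [[180.2, 1.3, "yes"], [94.1, 1.5, "no"]],
)
print(example)
```

Each @DATA line carries one value per @ATTRIBUTE line, in the same order.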



In Excel:

  • File 1: this file has three columns: “@ATTRIBUTE”, the descriptors’ names, and “NUMERIC”.
    • Open the txt data file in Excel. In the Open dialog, make sure the file type filter is set to “All Files” so that the txt file is listed.


Figure 1, Opening your data file

  • When the Text Import Wizard appears, choose Delimited and click Next (step 1), check the Tab box (step 2), and click Finish. (Leave the other settings at their defaults unless you need to change them.)


Figure 2, Step 1


Figure 3, Step 2

  • Create three blank columns.
  • Copy the descriptors’ names (normally the first row of your data file) and paste them vertically into the second column using the “Transpose” paste option.
  • Make sure the cell format is Text before the next step.


Figure 4, Cell format

  • Fill the first column with “@ATTRIBUTE” and the third column with “NUMERIC”, using the same number of rows as the second column.
  • Example:


Figure 5, the three columns



  • File 2: this file holds the @DATA part (numbers only).
    • Keep only the numbers needed and delete everything else.
    • Save it in CSV (comma delimited) format. (Click Yes when Excel asks “Some features in your workbook … Do you want to keep using that format?”)




Figure 6, Save as CSV
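
The two Excel files can also be produced with a short script. The following is only a sketch, assuming the data file is tab-delimited, its first row holds the descriptors’ names (with no spaces or “%” in them), and every descriptor is numeric. The function name `split_for_arff` and the sample values are hypothetical.

```python
import csv
import io


def split_for_arff(tab_text):
    """Return (attribute_lines, data_lines) from tab-delimited text.

    Assumes the first row holds the descriptor names and the
    remaining rows hold only the values.
    """
    reader = csv.reader(io.StringIO(tab_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip empty lines
    header, data = rows[0], rows[1:]
    # File 1: one "@ATTRIBUTE name NUMERIC" line per descriptor.
    attribute_lines = [f"@ATTRIBUTE {name.strip()} NUMERIC" for name in header]
    # File 2: the same rows, comma-separated instead of tab-separated.
    data_lines = [",".join(cell.strip() for cell in row) for row in data]
    return attribute_lines, data_lines


# Made-up sample standing in for the contents of the txt data file.
sample = "MW\tLogP\tActivity\n180.2\t1.3\t5.1\n94.1\t1.5\t4.2\n"
attrs, data = split_for_arff(sample)
print("\n".join(attrs))
print("\n".join(data))
```

To use it on a real file, read the file’s text first and pass it to the function in place of `sample`.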



In Notepad++:

  • File A:
    • In the first row, type @RELATION followed by a name for the relation, separated by a single space.
    • (For aesthetics: leave the second row blank.)
    • Copy the three columns from File 1 in Excel and paste them starting at row 3.
    • Example:


Figure 7, start of the File A

  • Depending on your data, the last @ATTRIBUTE row will need to be the response variable, i.e. the thing you are trying to predict. (A categorical response may require this; a numeric one may not.)
  • After all the @ATTRIBUTE rows have been pasted, leave one row blank (for aesthetics).
  • After the blank row, type “@DATA”.
  • (For aesthetics: leave the row after “@DATA” blank.)
  • Example:

Figure 8, middle of File A



  • File B:
    • Open File 2 in Notepad++ (you should see that the values are separated by commas).
    • Example:

Figure 9, File 2 open in Notepad++

  • Copy all the data and paste it into File A, below “@DATA”.



  • Back in File A:
    • Save the file in “arff” format (by adding “.arff” at the end of the file name).



The file is ready to open and run in WEKA (yay).
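
The Notepad++ steps (header, attribute rows, “@DATA”, data rows, then saving with the “.arff” ending) can be sketched in Python as follows. This is an illustration under assumptions, not a required part of the procedure: the function name `save_arff`, the file name “my_data” and the sample rows are all hypothetical, and the file is written to a temporary folder.

```python
from pathlib import Path
import tempfile


def save_arff(path, relation, attribute_lines, data_lines):
    """Write File A: @RELATION, blank row, attribute rows, blank row, @DATA, data."""
    path = Path(path)
    if path.suffix.lower() != ".arff":
        # Add ".arff" at the end of the file name, as in the manual step.
        path = path.with_name(path.name + ".arff")
    lines = [f"@RELATION {relation}", ""]
    lines += attribute_lines
    lines += ["", "@DATA"]
    lines += data_lines
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


# Made-up rows standing in for the contents of File 1 and File 2.
attribute_rows = ["@ATTRIBUTE MW NUMERIC", "@ATTRIBUTE LogP NUMERIC"]
data_rows = ["180.2,1.3", "94.1,1.5"]

out_dir = Path(tempfile.mkdtemp())
saved = save_arff(out_dir / "my_data", "my_data", attribute_rows, data_rows)
print(saved)
print(saved.read_text(encoding="utf-8"))
```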

PS: (1) A “%” in a descriptor name after “@ATTRIBUTE”, such as “%N” or “%O”, causes an error. Whatever follows “%” is treated as a comment, so WEKA reads the line as missing the descriptor name and data type. Removing the “%” avoids the error. (2) Also make sure the number of descriptors matches the number of values in each line of the “@DATA” section. (If you have 10 @ATTRIBUTE lines, each line in the @DATA part in Notepad++ must contain 10 values.)
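
Both checks in the PS can be run before opening the file in WEKA. The sketch below is a simple illustration, not WEKA’s own parser: the function name `check_arff` is hypothetical, and it assumes plain comma-separated data lines with no quoted values that contain commas.

```python
def check_arff(text):
    """Return a list of problems found in ARFF text (empty list if none)."""
    problems = []
    n_attributes = 0
    in_data = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank rows are only there for aesthetics
        if not in_data:
            keyword = line.upper()
            if keyword.startswith("@ATTRIBUTE"):
                n_attributes += 1
                if "%" in line:
                    problems.append(f"line {number}: '%' in attribute declaration")
            elif keyword.startswith("@DATA"):
                in_data = True
        else:
            if line.startswith("%"):
                continue  # comment line inside the data section
            n_values = len(line.split(","))
            if n_values != n_attributes:
                problems.append(
                    f"line {number}: {n_values} values, expected {n_attributes}"
                )
    return problems


# Made-up file with both mistakes: "%N" as a name and a short data line.
bad = (
    "@RELATION demo\n\n"
    "@ATTRIBUTE %N NUMERIC\n"
    "@ATTRIBUTE MW NUMERIC\n\n"
    "@DATA\n"
    "1.0,2.0\n"
    "3.0\n"
)
for problem in check_arff(bad):
    print(problem)
```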