This program compares ranked lists by computing the mean Canberra distance indicator on top-k sublists.
Currently, this app is unavailable in the Discovery Environment; however, it is available in the Canberra/Borda instance on Atmosphere. To use the app in Atmosphere, access the Terminal and input the following commands.
Example input and output data, as well as the usage file mentioned at the bottom of this page, have already been loaded into this image.
canberra.py <options> <inputfilename>
canberra.py <options> <inputfilename> <kValue>
Filename or full path to the input file to be processed. See below for expected format.
Single k value for which the Canberra distance is to be calculated. If <kValue> is not specified, distances are generated for every k from 1 through the length of the first list in the input file.
Delimiter between values in a row; defaults to a comma.
Character enclosing strings if necessary; defaults to a double quote.
If specified, the first row of the input file is assumed to be a header row and is ignored.
Delimiter(s) between values in a row in the output file; defaults to a comma.
Character(s) enclosing strings in the output; defaults to a double quote.
If specified, the first row will contain the names of the fields (k, Distance).
Name of the output file; defaults to "results.txt".
Test data can be found in the /usr/bin/example_data folder of the Atmosphere instance. Additionally, test data can be found at:
Results using the parameter file included with the relevant input files can be found at:
The output files were generated using the --outHeader option.
Running Example Data
First, open your instance with the VNC viewer. (If you are comfortable with the command line, you can use the shell instead, although this example uses the VNC viewer.)
Second, change the working directory to the Desktop. Results are written to the current working directory, and the Desktop is an easy place to access. Feel free to choose any other location for your own data.
Third, run an analysis of the sample input data without specifying any special options.
(Note: the 3 sets the k value; with your own data, you can set it to any value that suits that data.)
canberra.py /usr/bin/example_data/canberraInput.csv 3
(Running the command above as stated should produce a result similar to this screenshot.)
Input File Format
The input file should contain delimited values (defaulting to CSV unless options are specified) with an optional header row if the correct option is specified on the command line; otherwise, the first line is considered to be data.
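As an illustration of the format described above, the following is a minimal Python sketch of how such a file could be parsed. It is not the source of canberra.py; the function name read_ranked_lists, its parameters, and the sample text are all hypothetical and exist only for this example.

```python
import csv
import io

def read_ranked_lists(handle, delimiter=",", quotechar='"', has_header=False):
    """Parse delimited rows of integer ranks, optionally skipping a header row."""
    reader = csv.reader(handle, delimiter=delimiter, quotechar=quotechar)
    rows = [row for row in reader if row]  # ignore blank lines
    if has_header:
        rows = rows[1:]  # first row holds field names, not data
    lists = [[int(cell) for cell in row] for row in rows]
    # The app expects every ranked list to have the same number of columns.
    if len({len(ranked) for ranked in lists}) > 1:
        raise ValueError("all ranked lists must have the same length")
    return lists

# Hypothetical sample text, not the contents of canberraInput.csv.
sample = io.StringIO("a,b,c,d\n2,3,1,0\n0,1,2,3\n")
lists = read_ranked_lists(sample, has_header=True)
print(lists)
```

Without has_header=True, the sketch would try to read the first row as data, which mirrors the default behavior described above.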
Each row represents a ranked list, in which each value is the rank position of the corresponding item in the original list. For example, if the original list is (55, 66, 44, 22), the corresponding row in the input file is (2, 3, 1, 0). Whether the ranking starts at 0 or 1 does not matter, as long as all the lists start with the same number. Currently, the app accepts only ranked lists of equal length. For example, if the first row is (4, 1, 2, 3), then every row that follows must be the same size, such as (3, 2, 4, 1), (2, 1, 3, 4), etc.
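The conversion from original values to an input row can be sketched in Python as follows. This assumes ranks are assigned in ascending order of value, which is what the (55, 66, 44, 22) example implies, and that ties are broken by original position; the helper name to_ranks and its start parameter are hypothetical.

```python
def to_ranks(values, start=0):
    """Return, for each item, its position in the ascending sort of values."""
    # Indices of the items, ordered from smallest value to largest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position + start
    return ranks

print(to_ranks([55, 66, 44, 22]))           # [2, 3, 1, 0]
print(to_ranks([55, 66, 44, 22], start=1))  # [3, 4, 2, 1]
```

Whichever start value is used, it should be the same for every row in the file.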
Output File Format
The output file is named results.txt unless the --outputFile option is specified, and it is written to the same location as the Canberra source code. It contains comma-separated values (or values separated by another delimiter, if specified) and includes a header row only if the appropriate option is present. The values are: k, Distance.
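To show what each (k, Distance) row could represent, here is a minimal Python sketch of one common formulation: the Canberra distance with location parameter k + 1, averaged over all pairs of lists. This is an assumption about the calculation, not the source of canberra.py, which may normalize the result differently. The function names are hypothetical, and the sketch assumes ranks start at 1 (0-based lists would need to be shifted by 1 first, since a rank of 0 could lead to division by zero).

```python
from itertools import combinations

def canberra_top_k(a, b, k):
    """Canberra distance with location parameter k + 1 between two 1-based rank lists."""
    total = 0.0
    for x, y in zip(a, b):
        # Ranks beyond the top k are all treated as k + 1.
        x, y = min(x, k + 1), min(y, k + 1)
        total += abs(x - y) / (x + y)
    return total

def mean_canberra(lists, k):
    """Average the top-k distance over every pair of ranked lists."""
    pairs = list(combinations(lists, 2))
    return sum(canberra_top_k(a, b, k) for a, b in pairs) / len(pairs)

def result_rows(lists, k=None):
    """One (k, Distance) row for a single k, or for k = 1..len(first list)."""
    ks = [k] if k is not None else range(1, len(lists[0]) + 1)
    return [(kk, mean_canberra(lists, kk)) for kk in ks]

# Hypothetical ranked lists, not taken from the example data.
rows = result_rows([[1, 2, 3], [3, 2, 1]])
for k_value, distance in rows:
    print(k_value, distance)
```

Passing a single k to result_rows corresponds to supplying <kValue> on the command line; omitting it produces one row per k, as described for the default behavior.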