Exporting ratings

From WOT Wiki
Revision as of 04:40, 10 March 2014 by Jesant13 (talk | contribs) (I updated some information and split one sentence into two.)

Downloading ratings and comments

You can export your own ratings and comments from the "My ratings" tab in your profile. The export data is refreshed daily, but only if at least one new website has been rated since the previous update. The date of the last update is shown next to the "Download" button.

Data format

The data is in XML format. If the export file is larger than 1 KiB, it is compressed into a ZIP file. Here is an annotated example:

 <?xml version="1.0" encoding="UTF-8"?>
 <!-- character encoding is always UTF-8 -->
 
 <!-- wot: the root element -->
 <!--   uid: your account id (integer) -->
 
 <wot uid="1">
 
   <!-- target: one element for each rated or commented target -->  
   <!--   name: target name (string) -->
 
   <target name="example.com">
 
     <!-- rating: contains rating components (optional) -->
     <!--   time: last changed (ISO 8601 time) -->
 
     <rating time="2006-09-26 19:16:47+00">
 
       <!-- component: one for each rated component (unrated components not included) -->
       <!--   name: component id (integer ∊ [0, 100]) -->
       <!--     0 = Trustworthiness -->
       <!--     1 = Vendor reliability -->
       <!--     2 = Privacy -->
       <!--     4 = Child safety -->
       <!--   rating: rating value (integer) -->
 
       <component name="0" rating="90"/>
       <component name="4" rating="90"/>
 
     </rating>
 
     <!-- comment: one for each comment (optional) -->
     <!--   category: comment category (integer) -->
     <!--      4 = Useful, informative -->
     <!--      5 = Entertaining -->
     <!--      6 = Good customer experience -->
     <!--      7 = Child friendly -->
     <!--     21 = Good site -->
     <!--      8 = Spam -->
     <!--      9 = Annoying ads or popups -->
     <!--     10 = Bad customer experience -->
     <!--     11 = Phishing or other scams -->
     <!--     12 = Malicious content, viruses -->
     <!--     13 = Browser exploit -->
     <!--     14 = Spyware or adware -->
     <!--     15 = Adult content -->
     <!--     16 = Hateful, violent or illegal content -->
     <!--     22 = Ethical issues -->
     <!--     18 = Useless -->
     <!--     19 = Other -->
     <!--   time: last changed (ISO 8601 time) -->
 
     <comment category="9" time="2008-04-23 16:37:48+00">
       <!-- comment text is always in a CDATA section -->
       <![CDATA[Comment text.]]>
     </comment>
 
     <!-- ... more comment elements ... -->
   </target>
 
   <!-- ... more target elements for each rated or commented host ... -->
 </wot>
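
As a rough illustration of the format above, the following Python sketch reads a small export into plain dictionaries using the standard library. The function name read_targets and the SAMPLE document are hypothetical; the component names are taken from the annotated example, and any component id not listed there is kept as a number.

```python
import xml.etree.ElementTree as ET

# Component ids as listed in the annotated example.
COMPONENTS = {
    0: "Trustworthiness",
    1: "Vendor reliability",
    2: "Privacy",
    4: "Child safety",
}

# A small hand-written export used only for illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<wot uid="1">
  <target name="example.com">
    <rating time="2006-09-26 19:16:47+00">
      <component name="0" rating="90"/>
      <component name="4" rating="90"/>
    </rating>
    <comment category="9" time="2008-04-23 16:37:48+00"><![CDATA[Comment text.]]></comment>
  </target>
</wot>
"""


def read_targets(xml_bytes):
    """Return {target name: {"ratings": {...}, "comments": [...]}}."""
    root = ET.fromstring(xml_bytes)
    targets = {}
    for target in root.iter("target"):
        ratings = {}
        rating = target.find("rating")  # optional element
        if rating is not None:
            for comp in rating.findall("component"):
                cid = int(comp.get("name"))
                ratings[COMPONENTS.get(cid, cid)] = int(comp.get("rating"))
        # Each comment becomes a (category id, text) pair.
        comments = [
            (int(c.get("category")), (c.text or "").strip())
            for c in target.findall("comment")
        ]
        targets[target.get("name")] = {"ratings": ratings, "comments": comments}
    return targets


if __name__ == "__main__":
    print(read_targets(SAMPLE.encode("utf-8")))
```

This loads the whole document into memory, so it is only suitable for small export files.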

Target names

Target names are typically host names or IP addresses. Internationalized domain names (IDN) are converted to ASCII according to RFC 3490. For certain shared hosts (e.g. twitter.com), the target name may also contain part of the path, encoded as a subdomain. The encoded subdomain is always the lowest label in the hierarchy (the leftmost one) and starts with _p_, followed by the path encoded in RFC 3548-compliant Base32. For example:

_p_k5swex3pmzpvi4tvon2a.twitter.com = twitter.com/Web_of_Trust
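
The mapping in this example can be reproduced with a short Python sketch. The function name decode_target is hypothetical. The sketch assumes that the encoded label is lowercase Base32 with the trailing "=" padding removed, as in the example, and that the path bytes are UTF-8; neither assumption is stated by the export format itself.

```python
import base64

PREFIX = "_p_"


def decode_target(name):
    """Split a target name into (host, path); path is None if not encoded."""
    label, dot, host = name.partition(".")
    if not dot or not label.startswith(PREFIX):
        return name, None
    encoded = label[len(PREFIX):].upper()
    # base64.b32decode expects input padded to a multiple of 8 characters.
    encoded += "=" * (-len(encoded) % 8)
    # Assumption: the path is UTF-8 text.
    path = base64.b32decode(encoded).decode("utf-8")
    return host, path


if __name__ == "__main__":
    host, path = decode_target("_p_k5swex3pmzpvi4tvon2a.twitter.com")
    print(host + "/" + path)
```

Target names without the _p_ prefix are returned unchanged, with None as the path.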

Processing large export files

Many text editors and XML parsers try to load the entire file into memory, which causes problems with larger export files. You can split the XML file into smaller chunks using the splitwot.pl Perl script (or any other XML splitter). For example, to split the file into chunks that each contain at most 10000 targets, run the command perl splitwot.pl export.xml 10000. If you are using Windows, try ActivePerl.
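
As an alternative to splitting, a streaming parser can process the export one target at a time. The following is a minimal Python sketch using the standard library's xml.etree.ElementTree.iterparse; the function name iter_ratings is hypothetical, and it collects only the rating components of each target.

```python
import xml.etree.ElementTree as ET


def iter_ratings(source):
    """Yield (target name, {component id: rating}) without keeping the whole tree.

    source is a file name or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <wot> root
    for event, elem in context:
        if event == "end" and elem.tag == "target":
            ratings = {
                int(c.get("name")): int(c.get("rating"))
                for c in elem.iter("component")
            }
            yield elem.get("name"), ratings
            # Drop targets that have already been processed to free memory.
            root.clear()


if __name__ == "__main__":
    import sys

    for name, ratings in iter_ratings(sys.argv[1]):
        print(name, ratings)
```

Because iterparse also accepts a binary file object, a compressed export could in principle be read directly from the ZIP archive with the zipfile module instead of being extracted first.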