Difference between revisions of "Exporting ratings"

Revision as of 09:04, 16 September 2009

Downloading ratings and comments

You can export your own ratings and comments on the "My ratings" tab in your profile. The data is updated once a day, and the date of the last update is shown next to the "Download" button.

Data format

The data is in XML format. If the export file is larger than 1 KiB, it is compressed into a ZIP file. Here's an annotated example:

 <?xml version="1.0" encoding="UTF-8"?>
 <!-- character encoding is always UTF-8 -->
 
 <!-- wot: the root element -->
 <!--   uid: your account id (integer) -->
 
 <wot uid="1">
 
   <!-- target: one element for each rated or commented target -->  
   <!--   name: target name (string) -->
 
   <target name="example.com">
 
     <!-- rating: contains rating components (optional) -->
     <!--   time: last changed (ISO 8601 time) -->
 
     <rating time="2006-09-26 19:16:47+00">
 
       <!-- component: one for each rated component (unrated components not included) -->
       <!--   name: component id (integer ∊ [0, 100]) -->
       <!--     0 = Trustworthiness -->
       <!--     1 = Vendor reliability -->
       <!--     2 = Privacy -->
       <!--     4 = Child safety -->
       <!--   rating: rating value (integer) -->
 
       <component name="0" rating="90"/>
       <component name="4" rating="90"/>
 
     </rating>
 
     <!-- comment: one for each comment (optional) -->
     <!--   category: comment category (integer) -->
     <!--      4 = • Useful, informative -->
     <!--      5 = • Entertaining -->
     <!--      6 = • Good customer experience -->
     <!--      7 = • Child friendly -->
     <!--     21 = • Good site -->
     <!--      8 =  Spam -->
     <!--      9 =  Annoying ads or popups -->
     <!--     10 =  Bad customer experience -->
     <!--     11 =  Phishing or other scams -->
     <!--     12 =  Malicious content, viruses -->
     <!--     13 =  Browser exploit -->
     <!--     14 =  Spyware or adware -->
     <!--     15 =  Adult content -->
     <!--     16 =  Hateful, violent or illegal content -->
     <!--     22 =  Ethical issues -->
     <!--     18 =  Useless -->
     <!--     19 =  Other -->
     <!--   time: last changed (ISO 8601 time) --> 
 
     <comment category="9" time="2008-04-23 16:37:48+00">
       <!-- comment text is always in a CDATA section -->
       <![CDATA[Comment text.]]>
     </comment>
 
     <!-- ... more comment elements ... -->
   </target>
 
   <!-- ... more target elements for each rated or commented host ... -->
 </wot>
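
As a rough illustration of how the format above could be read, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample string mirrors the annotated example with the annotations removed. The function name parse_export and the shape of the returned dictionaries are assumptions made for this sketch, not part of the export format.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the annotated example above (annotations removed).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<wot uid="1">
  <target name="example.com">
    <rating time="2006-09-26 19:16:47+00">
      <component name="0" rating="90"/>
      <component name="4" rating="90"/>
    </rating>
    <comment category="9" time="2008-04-23 16:37:48+00">
      <![CDATA[Comment text.]]>
    </comment>
  </target>
</wot>
"""


def parse_export(xml_bytes):
    """Return (uid, targets); targets is a list of plain dictionaries (hypothetical shape)."""
    root = ET.fromstring(xml_bytes)
    targets = []
    for target in root.findall("target"):
        rating = target.find("rating")  # the rating element is optional
        components = {}
        if rating is not None:
            for comp in rating.findall("component"):
                # name is the component id, rating is the rating value
                components[int(comp.get("name"))] = int(comp.get("rating"))
        comments = [
            {
                "category": int(c.get("category")),
                "time": c.get("time"),
                "text": (c.text or "").strip(),  # CDATA content arrives as plain text
            }
            for c in target.findall("comment")
        ]
        targets.append({
            "name": target.get("name"),
            "rated": rating.get("time") if rating is not None else None,
            "components": components,
            "comments": comments,
        })
    return int(root.get("uid")), targets


uid, targets = parse_export(SAMPLE.encode("utf-8"))
print(uid, targets[0]["name"], targets[0]["components"])
```

The time attributes are left as strings here; converting them to datetime objects is omitted to keep the sketch short.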

Target names

Target names are typically host names or IP addresses. Internationalized domain names (IDNs) are converted to ASCII according to RFC 3490 (http://www.rfc-editor.org/rfc/rfc3490.txt). For certain shared hosts (e.g. twitter.com), the target name may also contain part of the path encoded as a subdomain. The encoded subdomain is always the lowest in the hierarchy (the leftmost label) and starts with _p_, followed by the path encoded in Base32 as specified in RFC 3548 (http://www.rfc-editor.org/rfc/rfc3548.txt). For example:

_p_k5swex3pmzpvi4tvon2a.twitter.com = twitter.com/Web_of_Trust
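
The encoding can be reproduced with a short Python sketch based on the standard base64 module. Judging from the example above, the encoded label is lowercase and carries no = padding. That observation, the function names, and the choice to return the host and path separately are assumptions of this sketch rather than documented behaviour.

```python
import base64

PREFIX = "_p_"


def encode_target(host, path):
    """Build a target name from a host and a path fragment (sketch)."""
    label = base64.b32encode(path.encode("utf-8")).decode("ascii")
    # Assumption drawn from the example: lowercase, padding removed.
    return PREFIX + label.rstrip("=").lower() + "." + host


def decode_target(name):
    """Split a target name into (host, path); path is None if nothing is encoded."""
    first, _, rest = name.partition(".")
    if not first.startswith(PREFIX) or not rest:
        return name, None
    label = first[len(PREFIX):].upper()
    # Restore the padding that b32decode expects (length must be a multiple of 8).
    label += "=" * (-len(label) % 8)
    return rest, base64.b32decode(label).decode("utf-8")


print(decode_target("_p_k5swex3pmzpvi4tvon2a.twitter.com"))
# ('twitter.com', 'Web_of_Trust')
print(encode_target("twitter.com", "Web_of_Trust"))
# _p_k5swex3pmzpvi4tvon2a.twitter.com
```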

Processing large export files

Many text editors and XML parsers try to load the entire file into memory, which causes problems with large export files. You can split the XML file into smaller chunks using this Perl script (http://www.mywot.com/files/misc/splitwot.pl.txt) or any other XML splitter. For example, to split the file into chunks of at most 10000 targets each, run the command perl splitwot.pl export.xml 10000. If you are using Windows, try ActivePerl (http://www.activestate.com/activeperl/).
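
If splitting the file is inconvenient, a streaming parser is another option. The following Python sketch uses xml.etree.ElementTree.iterparse to visit one target element at a time and discards each element once it has been handled, so memory use should stay roughly constant. The function name iter_targets and the file name export.xml are placeholders. This is an alternative to splitting, not a description of how the Perl script works.

```python
import io
import xml.etree.ElementTree as ET


def iter_targets(source):
    """Yield (name, {component id: rating}) for each target, one at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "target":
            ratings = {
                int(c.get("name")): int(c.get("rating"))
                for c in elem.iter("component")
            }
            yield elem.get("name"), ratings
            root.clear()  # drop already processed targets to keep memory flat


# Small in-memory stand-in for a real file such as "export.xml".
demo = io.BytesIO(
    b'<wot uid="1">'
    b'<target name="example.com"><rating time="2006-09-26 19:16:47+00">'
    b'<component name="0" rating="90"/></rating></target>'
    b'<target name="example.org"/>'
    b'</wot>'
)
results = list(iter_targets(demo))
print(results)
```

For a real export, pass an open file (or a file name) instead of the BytesIO object, for example iter_targets("export.xml").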