Part I: A classification of each AS into one of the following classes:
large ISPs
small ISPs
customer networks
universities
Internet exchange points
network information centers
Part II: The following set of AS attributes, calculated for every AS:
organization description records
advertised IP prefixes
inferred relationship with neighboring ASes
The classification in Part I results from applying
machine learning techniques to the attributes in Part II.
The attributes are extracted from CAIDA, RouteViews, and
Internet Routing Registries data.
Users of the repository can view it as a source of the
Internet AS-level topology enriched with information
related to the Internet economy. As such,
the repository aims to promote deeper analysis of the
macroscopic Internet structure and to inspire more
realistic Internet models. The repository
supplements the paper "Revealing the Autonomous
System Taxonomy: The Machine Learning Approach".
Publication
"Revealing the Autonomous System Taxonomy: The Machine Learning Approach" (pdf)
Xenofontas Dimitropoulos, Dmitri Krioukov, George Riley, KC Claffy
Passive and Active Measurements Workshop (PAM), Mar. 2006.
Our work received the PAM best paper award. The award is
based on a new dataset that we release for community use in subsequent
research; the dataset is available on this page.
Data Sources and AS attributes
From this page you can download the following:
The AS taxonomy obtained in this work along with the AS attributes
we used for the classification.
The AS business relationship information inferred in our previous work.
AS taxonomy and attributes
The file as2attr.tgz includes the set of AS attributes
we extracted from CAIDA, RouteViews, and Internet Routing Registries data. Each line contains the following
tab-delimited fields:
1) AS number
2) organization description record
3) number of inferred providers
4) number of inferred peers
5) number of inferred customers
6) equivalent number of /24 prefixes covering all the advertised IP space
7) number of advertised IP prefixes
8) inferred AS class
The classes are encoded with the following acronyms: "t1" for large ISPs, "t2" for
small ISPs, "edu" for universities, "ix" for Internet exchange points (IXPs), "nic" for network
information centers (NICs), "comp" for customer networks, and "abstained"
for ASes for which the algorithm did not make a prediction.
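The field layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the released dataset: it assumes the archive unpacks to a plain-text file as2attr.txt, that AS numbers and counts are written as plain decimal numbers, and that the /24 equivalent may be fractional. The names ASRecord, CLASS_NAMES, and parse_as2attr are hypothetical, and the two sample lines are made up for illustration.

```python
import csv
from collections import Counter
from typing import Iterable, List, NamedTuple

# Class acronyms as described on this page.
CLASS_NAMES = {
    "t1": "large ISP",
    "t2": "small ISP",
    "edu": "university",
    "ix": "Internet exchange point",
    "nic": "network information center",
    "comp": "customer network",
    "abstained": "no prediction",
}


class ASRecord(NamedTuple):
    """One line of as2attr.txt (hypothetical container type)."""
    asn: int
    description: str
    providers: int
    peers: int
    customers: int
    slash24_equiv: float  # assumption: may be fractional
    prefixes: int
    as_class: str


def parse_as2attr(lines: Iterable[str]) -> List[ASRecord]:
    """Parse tab-delimited lines with the eight fields listed above."""
    records = []
    # QUOTE_NONE: quote characters in descriptions are kept literally.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 8:
            continue  # skip blank or malformed lines
        asn, desc, prov, peer, cust, s24, npfx, cls = row
        records.append(ASRecord(int(asn), desc, int(prov), int(peer),
                                int(cust), float(s24), int(npfx), cls))
    return records


# Made-up sample lines (not taken from the real file).
sample = [
    "64500\tExample Transit Inc.\t0\t12\t340\t5000\t120\tt1\n",
    "64501\tExample University\t2\t0\t0\t256\t3\tedu\n",
]
records = parse_as2attr(sample)
class_counts = Counter(CLASS_NAMES.get(r.as_class, r.as_class) for r in records)
```

For the real file, the sample list could be replaced by an open file handle, for example open("as2attr.txt", newline=""), provided the file matches the assumptions above.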
AS relationships
The file as_rel.tgz includes the AS graph annotated with
inferred AS relationships. Our inference is based on heuristics we developed in our previous work. In particular,
customer-to-provider relationships are inferred using the methodology of the paper
"Inferring AS Relationships: Dead End or Lively Beginning?", while peer-to-peer links are inferred
using the methodology of the paper
"AS Relationships: Inference and Validation", which is currently under submission (we hope to post a link
here soon). Each line in as_rel.txt is a triplet A B C, where A B denotes an AS link and C the
AS relationship: if C == 0, A B is a peer-to-peer (p2p) link; if C == -1, A is a customer of B; and if C == 1, A is a provider
of B. Each AS link is listed twice, as A B and as B A. Note that a few of the AS numbers listed in as_rel.txt
are missing from as2attr.txt, since the latter includes only the AS numbers for which all six
attributes were available.
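The triplet format can be loaded into a simple lookup table, from which per-AS provider, peer, and customer counts follow directly. The sketch below is illustrative only: it assumes the three values on each line are whitespace-separated decimal integers and that C on each line is expressed relative to that line's A. The names parse_as_rel and neighbor_counts are hypothetical, and the sample lines are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Relationship codes as described on this page.
P2P, CUSTOMER_OF, PROVIDER_OF = 0, -1, 1


def parse_as_rel(lines: Iterable[str]) -> Dict[Tuple[int, int], int]:
    """Map each directed pair (A, B) to its relationship code C."""
    rel = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        a, b, c = (int(p) for p in parts)
        rel[(a, b)] = c
    return rel


def neighbor_counts(rel: Dict[Tuple[int, int], int]) -> Dict[int, Dict[str, int]]:
    """Count providers, peers, and customers per AS.

    Every link appears as both A B and B A, so counting only from
    the point of view of A covers both endpoints of each link.
    """
    counts = defaultdict(lambda: {"providers": 0, "peers": 0, "customers": 0})
    for (a, _b), c in rel.items():
        if c == CUSTOMER_OF:    # A is a customer of B: B is a provider of A
            counts[a]["providers"] += 1
        elif c == PROVIDER_OF:  # A is a provider of B: B is a customer of A
            counts[a]["customers"] += 1
        elif c == P2P:
            counts[a]["peers"] += 1
    return counts


# Made-up sample: 64500 is a provider of 64501 and peers with 64502.
sample = ["64500 64501 1", "64501 64500 -1", "64500 64502 0", "64502 64500 0"]
rel = parse_as_rel(sample)
counts = neighbor_counts(rel)
rel_asns = {a for a, _ in rel}  # AS numbers that appear in the relationship file
```

Comparing rel_asns against the set of AS numbers read from as2attr.txt would show which ASes appear only in the relationship file.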
Contact Information
Xenofontas Dimitropoulos (fontas [you know what] ece.gatech.edu)
College of Computing
Georgia Institute of Technology
Atlanta, GA 30332-0280
Last Modified: Oct. 10, 2005