============== Page 1/1 ============== Towards Next-Generation Peer-to-Peer Systems Magnus Kolweyh aka risq University of Bremen mag@tzi.de Kolweyh Towards Next-Generation Peer-to-Peer Systems Motivation Present an overview of interesting P2P systems Offer some knowledge out of P2P science Discuss novel implemented approaches and concepts Show some current measurement studies Inspire developers Ask the usual P2P suspects Kolweyh Towards Next-Generation Peer-to-Peer Systems Roadmap Science vs. Peer-to-Peer P2P generations Challenges and concepts Current trends in file sharing Services for Peer-to-Peer systems Example: Data Mining Prospects Kolweyh Towards Next-Generation Peer-to-Peer Systems Science and Peer-to-Peer ....or how to survive as an evil file sharing PhD student... Anthill Don't touch too much file sharing..illegal content Kolweyh Suprnova dead ? ..ok..next one please Towards Next-Generation Peer-to-Peer Systems P2P Generations ?! Old school File sharing Phones Napster Arpanet Gnutella Bulletin Boards Edonkey Messenger Distributed computing ICQ Zetagrid M$ and AOL crap SETI@HOME ..awkward !! ..any ideas ? Kolweyh Towards Next-Generation Peer-to-Peer Systems Let's think of services Communication File sharing Distributed computing Scalability Reliability Anonymity What are the next services ? Kolweyh Towards Next-Generation Peer-to-Peer Systems Peer-to-Peer Communities Boards Kolweyh Community basis Files Towards Next-Generation Peer-to-Peer Systems Free riding Adar/Huberman 20% host 98% of all content 60-70% peers don't share Kolweyh Towards Next-Generation Peer-to-Peer Systems Data distribution y = r –b -> Zipf Distribution of data, not random ! (Examples: wealth among firms / words in human languages) Kolweyh Towards Next-Generation Peer-to-Peer Systems Performance issues For instance Gnutella I Simple Protocol Plain keyword searching Random selection of the peers No central point of failure No caching of the peers or the data No redundancy of the data Vulnerable to DoS attacks Heavy messaging Scales bad Kolweyh Towards Next-Generation Peer-to-Peer Systems Time for improvements Data structures Distributed Hash tables Routing Indices Graphs Small world effect Power law distribution Semantic overlays Description of content repository Agent based systems Swarm-Intelligence, Learning Consider data distribution Kolweyh Towards Next-Generation Peer-to-Peer Systems The small world effect ● Characteristic path length L Shortest Path between two nodes ● Global Cluster coefficient C Cliquishness of a node's neighbourhood Small World ? Similar L Much higher C compared to random graphs Kolweyh ● LActual Average path length of the “real world” graph ● LRandom Average Path length of a random graph ● CActual ● CRandom Cluster coefficient of a random graph Cluster coefficient of the “real world” graph Towards Next-Generation Peer-to-Peer Systems It's a small world after all... Small World Graph The Kevin Bacon Game Kolweyh Neuronal networks of worms Baseball players Power grids Web graphs Gnutella Towards Next-Generation Peer-to-Peer Systems Small worlds and P2P 2 approaches Adapt the distributed protocol/algorithm to the six degrees of separation (the small world) world Build systems with small world attributes Example: Kolweyh Towards Next-Generation Peer-to-Peer Systems CHORD – Scalable P2P Load Balance, Decentralization, Scalability, Availability, Flexible naming space Kolweyh Towards Next-Generation Peer-to-Peer Systems Scalable systems Concepts Applications CAN PASTRY TAPESTRY JXTA Kolweyh Edutella Towards Next-Generation Peer-to-Peer Systems Back to file sharing Assumptions made by popular media File sharing is on the decline Those nets are all about music and video Edonkey is the new leader, ahead of Kazaa P2P = illegal sharing of files What we will do here System analysis (traffic, content, distribution) Services for real world P2P systems Kolweyh Towards Next-Generation Peer-to-Peer Systems File sharing traffic Kolweyh Towards Next-Generation Peer-to-Peer Systems Where have all the flowers gone.. Traditional traffic measurements don't work for P2P P2P applications use various ports dynamically P2P protocols use common ports (80) to jump over firewall/NAT barriers (e.g. jxta uses http) Users and projects switched to Bittorrent recently ---> P2P is not on decline but is hiding ! Kolweyh Towards Next-Generation Peer-to-Peer Systems File sharing content 1000000 100000 10000 1000 100 10 1 total GB average MB number musictypes videotypes others 1471,74 5315,86 1392,51 3,55 145,61 2,32 415125 36508 601094 Direct Connect Hubs November 04 Just sound and video content ? What other content is there ? Kolweyh Towards Next-Generation Peer-to-Peer Systems What other content types ? 1000000,00 100000,00 10000,00 1000,00 100,00 10,00 1,00 0,10 0,01 configtypes binarytypes cdimagetypes internettypes imagetypes officetypes sourcetypes texttypes ziptypes total GB 0,21 111,96 988,22 1,87 28,60 1,16 0,40 6,28 253,79 average MB 0,02 2,02 123,25 0,02 0,10 0,18 0,12 0,15 3,84 number 9717 55442 8018 119007 292554 6318 3251 40645 66142 High diversity of the files Vast amount of Internet data (html-files..) Too much data to ignore => NONE movie/music places Kolweyh Towards Next-Generation Peer-to-Peer Systems Services for huge databases Goals Personalization, collaboration Example How much customers from Schöneberg buy ice cream on Fridays ? Users who bought this book also bought... Services Collaborative filtering, recommender systems How to get the recommendations ? User Input Community-based, but users are lazy ! Automatic Kolweyh Towards Next-Generation Peer-to-Peer Systems amazon recommendation service @ Coldplay Kolweyh Towards Next-Generation Peer-to-Peer Systems Do we really need this ? Recommendation service Knowledge discovery Kolweyh Towards Next-Generation Peer-to-Peer Systems Types of RS Bayesian networks Model-based, Decision trees complex, not very fast Clustering Clutch users into groups with similar interest recommended product is the average of a group Association rules Relations between users or data sounds easy but needs to be adapted to P2P Kolweyh Towards Next-Generation Peer-to-Peer Systems Challenges for P2P RS Mining strategy Data extraction and conversion Validate and represent the rules Data Mining algorithm Kolweyh scalable, dynamic, ad-hoc, asynchronous, suitable Towards Next-Generation Peer-to-Peer Systems Association Rule Mining Set of items: I={I1,I2,…,Im} Transactions: D={t1,t2,…, tn}, tj ∈ I Item set X: {Ii1,Ii2,…,Iik} ∈ I Support supp(X) = | X (t) | / |D| Transactions T1 T2 T3 T4 T5 Beer Diapers Beer Honey Wine Items Linux Wine Linux Beer Bread Honey Linux Diapers Linux • Association Rule X ⇒ Y Implication X ⇒ Y X,Y ⊆ I X ∩ Y Transactions X ∪ Y • Support X ⇒ Y supp(X ∪ Y) / supp(X) • Confidence X ⇒ Y • Support-Confidence Framework: X ⇒ Y supp (X ∪ Y) ≥ minsupp conf (X ⇒ Y) ≥ minconf Task: Search frequent items, prune rules that don't interest Kolweyh Towards Next-Generation Peer-to-Peer Systems Bread Beer Distributed ARM in P2P Traditional ARM: Sequentially Methods: Central Processing, Broadcasting, Global Synchronisation Peer to Peer: Distributed - Synchronisation: Autonomous behaviour of every node - Information never is up-to-date, highly dynamic Global Synchronisation: Computing costs are to high Here: Local Algorithm: Local majority voting (Wolff/Schuster) Kolweyh Towards Next-Generation Peer-to-Peer Systems Direct Connect network Infrastructure Data Structured by Hubs Peers 341089 User oriented Data size 14486.50 Terra bytes! Content oriented Clients Original NeoModus DC DC++ Valknut (!) ..aka DC-GUI Kolweyh Towards Next-Generation Peer-to-Peer Systems Direct Connect -Hub Some approach connected Client 1 Client 2 Client n getData() allData() PeerMiner Filter Simulator avi P2P Transactions feedback() Local Database Apriori Majority Voting Association AssociationRules Rules Abbildung 2 PeerMiner Kolweyh Towards Next-Generation Peer-to-Peer Systems Let's get some movies ..ooops..sorry.. Content avi-video Attributes 10 rules Hubs Chosen by regular expressions (“movies”,..) Instances 151 Minimum support 0.10 Minimum confidence 0.5 transaction database some popular rules Mystic River Love actually=no => Mystic River=yes && Kill Bill=no Spartan Mystic River=yes && Love actually=no => Kill Bill=no Kill Bill Love actually=no && Kill Bill=no => Mystic River=yes Love actually Spartan=yes => Kill Bill=no ... Love actually=no => Mystic River=yes Kolweyh Towards Next-Generation Peer-to-Peer Systems Current work Implementation of Distributed ARM Algorithms on P2P networks Social structures of P2P communities Distributed Hashing in JXTA Develop more accurate P2P simulators Measurement studies of other P2P systems (Bittorrent) Kolweyh Towards Next-Generation Peer-to-Peer Systems Summary Scientific P2P Current trends in file sharing Scalability and routing Coding resources Novel services for P2P systems Association Rule Mining mag@tzi.de Kolweyh …. Questions and discussion please… Towards Next-Generation Peer-to-Peer Systems