Data availability analysis in P2P networks

Sanaullah, Nazir

dc.contributor.advisor	Hauswirth, Manfred
dc.contributor.author	Sanaullah, Nazir
dc.date.accessioned	2013-08-07T14:25:55Z
dc.date.available	2013-08-07T14:25:55Z
dc.date.issued	2012-09-28
dc.identifier.uri	http://hdl.handle.net/10379/3606
dc.description.abstract	P2P network architectures have gained popularity as applications for sharing files between users. A P2P network provides a scalable, robust, and economical storage architecture. These features have led to the extended use of P2P network applications, ranging from file sharing to data sharing for video and telecommunication domains. The shift from high-cost, reliable servers to user-centered storage devices has led to reliability and availability problems for the P2P network. Peers are users' machines that can go offline at any time, and the data stored on them are unavailable while they are offline. Data replication, in which multiple copies of files are placed on different peers in the network, is a common approach for handling data unavailability. In data replication, peers transfer complete or partial data to other nodes, so replication provides higher data availability in case of churn. I present data replication algorithms in this thesis to improve the availability of data in the network. Because replication increases overhead as well as availability, the basic challenges in developing a data replication algorithm are: (i) How many replicas for a data object should be created? (ii) On which peer(s) should the replicated data objects be stored? (iii) Which files should be replicated? Initial work in data replication considered the static replication of data based on the overall availability of nodes in the network. These approaches overestimated the number of replicas, which led to high maintenance costs. Dynamic approaches for estimating replica numbers were developed to handle this issue. From the analysis of the current approaches, I found that the proposed mechanisms for dynamic approaches to replication did not provide a balanced replication of data. Data were only replicated to highly available nodes, which became overloaded with data.
The second issue was the inability to adapt to the changing behaviour of peers. In this thesis, I present an approach that selects a node set comprising both highly available and lowly available nodes, in order to provide load balancing in the network. I provide a feedback-based approach where previous behaviours are incorporated in the next behavioural analysis. Compared to the existing approaches to replica calculation, this approach is able to determine the appropriate number of replicas and placement locations under the changing dynamics of the system. The replication system relies on node behaviour prediction algorithms using Monte Carlo simulation and time series analysis. Each node performs an analysis on the historical traces of its online and offline times in the network. Each node shares its availability log with the replication initiator node, and the prediction of future behaviour is made based on the logs received. The data-owning peer uses this information to run the replica placement algorithm to select nodes that are present for a particular duration, supporting each other's presence in the network. The system supports partial data replication by applying a Zipf distribution to determine the most popular files. I performed the evaluation using my replication approach and dynamic replica placement algorithms, based on the following parameters: replica count, reliability of data, average availability of nodes in the replica set, and failure analysis for querying data. The replica count analysis shows that the number of replicas required was almost half that of the previous dynamic approaches. The reliability analysis shows that the overall reliability of the data was better in this approach compared to the other dynamic replica placement algorithms.
My replication algorithm produced replica sets with a lower average availability than the replica sets of the other approaches, but the reliability analysis suggests that my approach distributes data more evenly between nodes, resulting in better overall data availability. The availability of data in the network was higher than with other approaches. The analysis of failed data requests shows that my replication algorithm has a better node selection mechanism than other approaches, with better data availability.	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject	Distributed storage systems	en_US
dc.subject	P2P network	en_US
dc.subject	Data availability	en_US
dc.subject	Data replication	en_US
dc.subject	Reliability analysis	en_US
dc.subject	Time series analysis	en_US
dc.subject	Monte Carlo simulation	en_US
dc.subject	Fault tolerance	en_US
dc.subject	Information storage systems	en_US
dc.subject	Digital Enterprise Research Institute	en_US
dc.subject	DERI	en_US
dc.title	Data availability analysis in P2P networks	en_US
dc.type	Thesis	en_US
dc.contributor.funder	Science Foundation Ireland (SFI) Grant No. SFI/08/CE/I1380 (Lion-2).	en_US
dc.contributor.funder	EU SemanticGov Project FP6-2004-IST-4-027517.	en_US
dc.local.note	High availability systems are systems that need to be continuously operational for long periods of time even if some components become temporarily unavailable or fail. P2P systems meet some of these requirements and are investigated in this thesis with respect to data availability through optimized data replication over sets of nodes.	en_US
dc.local.final	Yes	en_US
nui.item.downloads	2582

Files in this item

Name: license.txt
Size: 5.659Kb
Format: Text file

Name: sanaullah_phd-thesis_final.pdf
Size: 1.212Mb
Format: PDF
Description: Data availability analysis in ...

This item appears in the following Collection(s)

University of Galway Theses (PhD Theses)
