On Collective Communication and Notified Read in the Global Address Space Programming Interface (GASPI)

dc.contributor.advisor  Yahyapour, Ramin Prof. Dr.
dc.contributor.author  End, Vanessa
dc.date.accessioned  2017-03-28T08:08:55Z
dc.date.available  2017-03-28T08:08:55Z
dc.date.issued  2017-03-28
dc.identifier.uri  http://hdl.handle.net/11858/00-1735-0000-0023-3DF3-4
dc.identifier.uri  http://dx.doi.org/10.53846/goediss-6213
dc.language.iso  eng  de
dc.rights.uri  http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject.ddc  510  de
dc.title  On Collective Communication and Notified Read in the Global Address Space Programming Interface (GASPI)  de
dc.type  doctoralThesis  de
dc.contributor.referee  Yahyapour, Ramin Prof. Dr.
dc.date.examination  2016-12-14
dc.description.abstracteng  In high performance computing (HPC) applications, scientific or engineering problems are solved in a highly parallel and often necessarily distributed manner. The distribution of work leads to the distribution of data and thus also to communication between the participants of the computation. The application programmer can choose from many different communication libraries and application programming interfaces (APIs); one of the most recent is the Global Address Space Programming Interface (GASPI). This library takes advantage of the hardware and especially the interconnect developments of the past decade, enabling true remote direct memory access (RDMA) between the nodes of a cluster.

The one-sided, asynchronous semantics of GASPI routines raise multiple research questions with respect to the implementation of collective communication routines, i.e., routines in which a group of processes is involved in the communication. The GASPI specification itself offers only two of these collective operations: the allreduce, which computes a global result from the data of all participants, and the barrier, which constitutes a synchronization point for all members of the group. For these collective routines, appropriate underlying algorithms have to be chosen. Given the one-sided, asynchronous, split-phase semantics of GASPI collective routines, algorithms used in other widespread communication libraries such as the Message-Passing Interface (MPI) may no longer be suitable. In this thesis, existing algorithms are reevaluated for their usability in GASPI collective routines in the context of a newly designed library, GASPI_COLL, which amends the existing GASPI implementation GPI2 with additional algorithms for the allreduce and with further collective routines: reduce and broadcast. For the split-phase allreduce, algorithms with a butterfly-like communication scheme were tested extensively and found to be well suited, owing to their low number of communication rounds and the involvement of all participants in each round: few repeated calls to the allreduce routine are needed, and idle times stay very small on all nodes. One of the most widespread algorithms for barrier operations, the dissemination algorithm, was adapted so that it can also be used for the allreduce operation. The adapted n-way dissemination algorithm shows very good results compared to the native implementation of the GPI2 allreduce and to different MPI implementations.

To make the one-sided communication semantics of GASPI manageable for the application programmer, the GASPI specification introduces weak synchronization primitives that notify the destination side of the arrival of data. This notification mechanism removes the need for global synchronization points and for waiting on multiple communication requests. Previously it was available only for write-based operations; in the scope of this thesis it is extended to the read routine, introducing gaspi_read_notify. With this new routine, the thesis establishes the basis of a completely one-sided, asynchronous graph exploration, implemented with the notified read operation. This enables a broader audience to use data-analytical methods on big data. Big data poses a real challenge for graph-analytical methods, because the data needs to be distributed over multiple nodes, which introduces high communication overhead whenever both sides are involved in the communication; this issue is eliminated by gaspi_read_notify.
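As an illustration of the split-phase semantics described above, the following is a minimal sketch of a GASPI allreduce in C. The routine and the constants (GASPI_OP_SUM, GASPI_TYPE_DOUBLE, GASPI_GROUP_ALL, GASPI_TEST) are part of the GASPI specification as implemented by GPI2; the surrounding program and the retry loop are illustrative only, with error handling reduced to a bare minimum.

  #include <GASPI.h>
  #include <stdio.h>

  int main (void)
  {
    gaspi_proc_init (GASPI_BLOCK);

    gaspi_rank_t rank;
    gaspi_proc_rank (&rank);

    double local  = (double) rank;  /* each process contributes one value */
    double global = 0.0;

    /* Split-phase use: with the GASPI_TEST timeout the call may return
       GASPI_TIMEOUT; the specification then requires calling the routine
       again with the same arguments until it completes. */
    gaspi_return_t ret;
    do
      {
        ret = gaspi_allreduce (&local, &global, 1, GASPI_OP_SUM,
                               GASPI_TYPE_DOUBLE, GASPI_GROUP_ALL,
                               GASPI_TEST);
      }
    while (ret == GASPI_TIMEOUT);

    if (rank == 0)
      printf ("sum over all ranks: %f\n", global);

    gaspi_proc_term (GASPI_BLOCK);
    return 0;
  }

Between two iterations of the retry loop the process is free to do other work; it is this overlap of computation and communication that distinguishes the GASPI setting from the blocking MPI collectives and motivates the reevaluation of the underlying algorithms.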
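The communication pattern behind the adapted n-way dissemination algorithm can be sketched as follows. This reconstruction follows the classical n-way dissemination scheme, in which process i sends to the n processes (i + j*(n+1)^k) mod p in round k; it is meant as an illustration of the pattern, not as the thesis's actual implementation, and the conditions under which a given pair (n, p) yields a correct allreduce are analyzed in the thesis itself.

  #include <stdio.h>

  /* Print, for each round of an n-way dissemination scheme on p processes,
     the partners that process `rank` sends to.  After ceil(log_{n+1}(p))
     rounds, every process has received a contribution, directly or
     indirectly, from every other process. */
  static void dissemination_partners (int rank, int p, int n)
  {
    int stride = 1;                          /* (n+1)^k in round k */
    for (int round = 0; stride < p; ++round, stride *= n + 1)
      {
        printf ("round %d: process %d sends to", round, rank);
        for (int j = 1; j <= n; ++j)
          printf (" %d", (rank + j * stride) % p);
        printf ("\n");
      }
  }

  int main (void)
  {
    dissemination_partners (0, 13, 2);       /* 2-way dissemination, 13 processes */
    return 0;
  }

With n = 2 and p = 13 the scheme needs only ceil(log_3 13) = 3 rounds instead of the ceil(log_2 13) = 4 rounds of the classical dissemination barrier; this reduction in communication rounds is what makes such schemes attractive for the split-phase allreduce.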
Last but not least, the potential use of gaspi_read_notify for a distributed matrix transpose was investigated. Not only is the matrix transpose a widespread communication scheme in HPC applications, it can also be considered a special case of an alltoall communication. The split-phase, one-sided paradigm of GASPI collective routines has inspired the idea of a partially evaluable alltoallv, and as a first step towards this routine, the applicability of gaspi_read_notify to the implementation of the alltoall can be deduced from the matrix transpose. On the available systems, this kind of implementation cannot be encouraged, though. Yet the experiments in this thesis have also shown how strongly communication routines and algorithms depend on the underlying hardware. Thus, extensive tests on different system architectures will have to be done in the future.  de
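A minimal sketch of how the notified read can be used follows. The signature of gaspi_read_notify matches the routine as it appears in GPI2, while the helper function and its parameter names are made up for illustration; the notification is delivered on the local (reading) side once the requested data has arrived.

  #include <GASPI.h>

  /* Fetch `size` bytes from a remote rank into the local segment and
     wait for the notification that signals the arrival of the data. */
  static gaspi_return_t fetch_block (gaspi_segment_id_t seg,
                                     gaspi_offset_t local_off,
                                     gaspi_rank_t remote,
                                     gaspi_offset_t remote_off,
                                     gaspi_size_t size,
                                     gaspi_notification_id_t nid,
                                     gaspi_queue_id_t queue)
  {
    /* Post the one-sided read; no action by the remote process is needed. */
    gaspi_return_t ret = gaspi_read_notify (seg, local_off, remote,
                                            seg, remote_off, size,
                                            nid, queue, GASPI_BLOCK);
    if (ret != GASPI_SUCCESS)
      return ret;

    /* Wait for exactly this one notification on the local segment ... */
    gaspi_notification_id_t first;
    ret = gaspi_notify_waitsome (seg, nid, 1, &first, GASPI_BLOCK);
    if (ret != GASPI_SUCCESS)
      return ret;

    /* ... and reset it atomically so that the id can be reused. */
    gaspi_notification_t old_val;
    return gaspi_notify_reset (seg, first, &old_val);
  }

Because both the transfer and its completion check happen entirely on the reading side, the remote process is never interrupted; this is the property the thesis exploits for the one-sided graph exploration and for the read-based matrix transpose discussed above.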
dc.contributor.coReferee  Lube, Gert Prof. Dr.
dc.contributor.thirdReferee  Geiger, Alfred PD Dr.
dc.subject.eng  GASPI  de
dc.subject.eng  PGAS  de
dc.subject.eng  Partitioned Global Address Space  de
dc.subject.eng  Collective Communication  de
dc.subject.eng  Notified Read  de
dc.subject.eng  Adapted n-way Dissemination  de
dc.identifier.urn  urn:nbn:de:gbv:7-11858/00-1735-0000-0023-3DF3-4-6
dc.affiliation.institute  Fakultät für Mathematik und Informatik  de
dc.subject.gokfull  Informatik (PPN619939052)  de
dc.identifier.ppn  883167972

