Amin Sahebi

PhD Student

About Me

I’m a PhD student in the Smart Computing program, a joint program between the universities of Florence, Pisa, and Siena in the Tuscany region of Italy. I was admitted in November 2018. Currently, I’m working on reconfigurable computing and high-performance computing solutions for artificial intelligence applications.

Please find my complete Curriculum Vitae (CV) here: Download

Selected Projects and Repositories

Dataflow-Runtime (DRT)

Link to the ARCS 21 paper.

This repository contains DRT (Dataflow Run-Time), the runtime described in the paper "DRT: A Lightweight Runtime for Developing Benchmarks for a Dataflow Execution Model" from the International Conference on Architecture of Computing Systems (ARCS 21).

OmpSs Tutorial

This repository contains a complete tutorial for installing and testing OmpSs on conventional machines with different architectures, such as ARM64 and x86, and on heterogeneous architectures such as the Zynq UltraScale+ with a quad-core Arm Cortex-A53.

Education

PhD student

Computer Science

Tuscany-ITALY

November 2018 - Present

Smart Computing is a joint program between three universities: the University of Florence, the University of Siena, and the University of Pisa.

My research focuses mostly on reconfigurable architectures for accelerating computing applications, such as real-world artificial intelligence applications. I work as part of a team supported by the AXIOM project, a now-completed project within the European Horizon 2020 programme. My current tasks comprise FPGA design on both the PL and PS sides (we work on Zynq UltraScale+ MPSoCs), including high-level synthesis (HLS) and VHDL, as well as embedded Linux device driver development to give user space access to our metrics and registers. In addition, as a case study, I worked on Maxeler accelerators to design and develop a data-flow FFT architecture. The first stage of this work received the Best Paper Award at the IEEE MECO conference in June 2019.

Master of Science

Digital Electronics

Mazandaran-IRAN

2011 - 2013

Experience

MAPNA

EMBEDDED SYSTEMS ENGINEER

Tehran-IRAN

September 2013 - October 2018

https://www.mapnagroup.com/en

In my five years as an Embedded Systems Engineer, my main activities involved hardware solutions, firmware applications, and protocols on real-time operating systems. I was part of a research and development team responsible for designing and developing firmware applications, operating systems, and interfaces for Main Processing Units (MPUs), with a focus on C/C++ applications and IBM UML modeling tools. Programming was done almost entirely on Unix-like operating systems such as Linux and QNX, mostly using POSIX as the standard programming API.

Publications

DRT: A Lightweight Runtime for Developing Benchmarks for a Dataflow Execution Model.

International Conference on Architecture of Computing Systems (ARCS)

June 2021

htps://TBD

Abstract: Future computers may take advantage of a dataflow program execution model (PXM) for both performance and energy advantages. One key element in providing a compilation tool-chain for such machines is a framework for developing initial benchmarks. DRT (Dataflow Run-Time) is a tool that enables the fast prototyping of those benchmarks for the Dataflow Threads (DF-Threads) PXM. In this work, we show how to use DRT to develop dataflow-based examples to be targeted by a future compiler for the dataflow PXM. DRT is written in portable C code (tested with the GNU C compiler) and is open-source; therefore, it can be used on real machines based on architectures such as x86, AArch, and RISC-V. Here, we discuss some didactic examples, and we show how to study and debug the data exchange, which flows through frames that are detached from the data stack. We compare DRT against similar dataflow runtime libraries such as DARTS and OCR. Even though our environment is not yet optimized, we found that DRT outperforms these runtime frameworks in terms of execution time. We also evaluate the time and complexity of developing DF-Threads examples in DRT compared to the approach of using a full system simulator and FPGAs for more accurate modeling.

GLUON, The High-Speed Inexpensive and Easy Interconnect Solution.

HiPEAC International Summer School (ACACES)

June 2020

http://www.dii.unisi.it/~giorgi/papers/Sahebi20-acaces.pdf

Abstract: Heterogeneous systems are one of the most discussed architectures in computer science. Their capabilities offer many useful features to researchers who adopt this kind of structure in their state-of-the-art work. Heterogeneous systems are flexible, cost-efficient, and well supported by their communities. They are widely used in artificial intelligence, automotive, IoT, and embedded applications. However, it remains a challenge to build a sufficient, cost-efficient, and flexible structure for using heterogeneous systems. In this work, we present the GLUON board, which can use the serial transceivers of the Xilinx UltraScale+ architecture and facilitates the use of GTH transceivers in high-data-rate transfer applications. A possible solution would be a high-data-rate cluster network based on Zynq UltraScale+ MPSoCs, which can easily deploy a multi-node, multi-core structure at reasonable cost.

A Data-Flow Methodology for Accelerating FFT.

Mediterranean Conference on Embedded Computing (MECO)

June 2019

https://ieeexplore.ieee.org/abstract/document/8760044

Abstract: The naive implementation of the N-point discrete Fourier transform involves calculating the scalar product of the sample buffer (treated as an N-dimensional vector) with N separate basis vectors. Since each scalar product involves N multiplications and N additions, the total time is proportional to N^2; in other words, it is an O(N^2) algorithm. However, it turns out that by cleverly rearranging these operations, one can optimize the algorithm down to O(N log_2 N), which makes a huge difference for large N. The optimized version of the algorithm is called the Fast Fourier Transform, or FFT. In this paper, we discuss an efficient way to compute the FFT. According to our study, we can eliminate some operations in calculating the FFT thanks to properties of complex numbers, and we can achieve a better execution time due to a significant reduction of N/8 in the needed twiddle factors and to additional factorizations.
