Deneb is a solution for synchronizing directories across multiple computers. This project is in the early stages of development, but the goal is to create something similar to commercial synchronized storage solutions such as Dropbox, Google Drive, or Box.com.
I’m working on Deneb in my free time, and it gives me the opportunity to experiment with concepts such as content-addressed storage, Merkle trees, and conflict-free replicated data types (CRDTs). It’s also an occasion to build a complete project in the Rust programming language.
I started the project to scratch a personal itch, but I would be happy if others found it useful, too. The planned feature set, which distinguishes it from existing solutions, is:
- Immutable content-addressed storage - content blocks are never modified, so old versions of files are not deleted and the synchronized directory can be reverted to an earlier state.
- Deduplication - comes for free from the use of content-addressed storage.
- Compression - content chunks should be stored compressed to reduce space requirements and the amount of data to be transferred.
- End-to-end encryption - data should never leave the clients unencrypted.
- (Optional) Laziness - file contents are transferred between clients only when needed.
- (Optional) Decentralized - synchronization could be done peer-to-peer, instead of through a central server.
- Open Source (MPLv2) - it’s best to be able to inspect the code that is storing your data, moving it around, encrypting it, etc.
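The first two features are closely linked: if every chunk is stored under a digest of its own contents, identical chunks collapse into a single entry, and a new version of a file only adds the chunks that changed. The Rust sketch below illustrates the idea under assumptions that are not taken from Deneb: files are split into fixed-size chunks (4 bytes in the demo), the store is an in-memory `HashMap`, and the standard library's non-cryptographic `DefaultHasher` stands in for a real content hash such as SHA-256 or BLAKE2. `ChunkStore`, `store_file`, and `read_file` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Chunk identifier: a 64-bit digest of the chunk contents.
/// Assumption: DefaultHasher (SipHash) is NOT cryptographic; it only stands
/// in for a real content hash such as SHA-256 or BLAKE2 in this sketch.
type ChunkId = u64;

fn digest(data: &[u8]) -> ChunkId {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory content-addressed store: each chunk is keyed by
/// the digest of its contents and is never modified after insertion.
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<ChunkId, Vec<u8>>,
}

impl ChunkStore {
    /// Stores a chunk and returns its id. Identical contents produce the
    /// same id, so a duplicate chunk is stored only once (deduplication).
    fn put(&mut self, data: &[u8]) -> ChunkId {
        let id = digest(data);
        self.chunks.entry(id).or_insert_with(|| data.to_vec());
        id
    }

    fn get(&self, id: ChunkId) -> Option<&[u8]> {
        self.chunks.get(&id).map(|chunk| chunk.as_slice())
    }
}

/// Splits a file into fixed-size chunks and returns the ordered list of
/// chunk ids that describes this version of the file.
fn store_file(store: &mut ChunkStore, contents: &[u8], chunk_size: usize) -> Vec<ChunkId> {
    contents.chunks(chunk_size).map(|chunk| store.put(chunk)).collect()
}

/// Reassembles a file version from its list of chunk ids.
fn read_file(store: &ChunkStore, ids: &[ChunkId]) -> Vec<u8> {
    let mut contents = Vec::new();
    for id in ids {
        contents.extend_from_slice(store.get(*id).expect("chunk missing from store"));
    }
    contents
}

fn main() {
    let mut store = ChunkStore::default();
    let v1 = store_file(&mut store, b"aaaabbbbcccc", 4);
    // A second version of the same file that changes only the last chunk.
    let v2 = store_file(&mut store, b"aaaabbbbdddd", 4);

    // Six chunk references, but only four distinct chunks are stored.
    println!("distinct chunks stored: {}", store.chunks.len()); // prints 4
    // Both versions remain fully readable.
    println!("{}", String::from_utf8(read_file(&store, &v1)).unwrap());
    println!("{}", String::from_utf8(read_file(&store, &v2)).unwrap());
}
```

Because `put` never overwrites an existing entry, chunks are effectively immutable, which is what keeps the older version readable after the newer one is stored. A real implementation would also need a collision-resistant hash, since with a 64-bit non-cryptographic digest two different chunks could in principle share an id.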
CernVM File System
The CernVM File System (CernVM-FS) provides a scalable and reliable software distribution service. It was developed at CERN to assist High Energy Physics (HEP) collaborations in deploying software on the worldwide distributed computing infrastructure for data processing applications, but it can also be used in other domains.
CernVM-FS is implemented as a read-only POSIX filesystem in user space (FUSE) and uses content-addressed storage and Merkle trees to maintain file data and metadata. Files are stored remotely on standard web servers and are retrieved and cached on demand through outgoing HTTP connections only, avoiding most of the firewall issues of other network filesystems. For writing, CernVM-FS follows a publish-subscribe pattern, with a single source of new content that is propagated to a large number of readers.
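A Merkle tree over a directory hierarchy gives every node a hash derived from its children, so a change to one file alters only the hashes on the path to the root, while unchanged subtrees keep their hashes and can be recognized as already present. The following Rust sketch shows the general technique; it is not CernVM-FS's actual catalog format. The `Node` type, the `merkle_hash`, `dir`, and `file` helpers, and the use of `DefaultHasher` in place of a cryptographic digest are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory directory tree. BTreeMap keeps entries sorted by
/// name, so a directory hashes the same way regardless of insertion order.
enum Node {
    File(Vec<u8>),
    Dir(BTreeMap<String, Node>),
}

/// Merkle hash of a node: a file hashes its contents, a directory hashes
/// the (name, child hash) pairs of its entries. Assumption: DefaultHasher is
/// a non-cryptographic stand-in for a real digest.
fn merkle_hash(node: &Node) -> u64 {
    let mut hasher = DefaultHasher::new();
    match node {
        Node::File(data) => {
            0u8.hash(&mut hasher); // tag so a file never hashes like a directory
            data.hash(&mut hasher);
        }
        Node::Dir(entries) => {
            1u8.hash(&mut hasher);
            for (name, child) in entries {
                name.hash(&mut hasher);
                merkle_hash(child).hash(&mut hasher);
            }
        }
    }
    hasher.finish()
}

/// Convenience constructors for directory and file nodes.
fn dir(entries: Vec<(&str, Node)>) -> Node {
    Node::Dir(entries.into_iter().map(|(name, node)| (name.to_string(), node)).collect())
}

fn file(contents: &str) -> Node {
    Node::File(contents.as_bytes().to_vec())
}

fn main() {
    let v1 = dir(vec![
        ("bin", dir(vec![("tool", file("version 1"))])),
        ("lib", dir(vec![("libfoo.so", file("foo"))])),
    ]);
    // Same tree with one file under bin/ changed.
    let v2 = dir(vec![
        ("bin", dir(vec![("tool", file("version 2"))])),
        ("lib", dir(vec![("libfoo.so", file("foo"))])),
    ]);

    // The change propagates up to the root hash...
    println!("root changed: {}", merkle_hash(&v1) != merkle_hash(&v2));
    // ...while the untouched lib/ subtree keeps its hash, so a client that
    // already holds it does not need to fetch it again.
    if let (Node::Dir(a), Node::Dir(b)) = (&v1, &v2) {
        println!("lib/ unchanged: {}", merkle_hash(&a["lib"]) == merkle_hash(&b["lib"]));
    }
}
```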
My current focus in the project is improving the scalability of the filesystem and implementing a more reactive distribution architecture. If you want to know more about my work on CernVM-FS, you can watch this talk I gave at CodeMesh 2017 or this one at CurryOn 2017.
Pix4Dmapper
Pix4Dmapper is a complete photogrammetry processing suite developed by Pix4D. It’s used to generate 3D point clouds and various other outputs, such as maps or surface models, from UAV imagery. At Pix4D, I worked on the real-time point cloud visualization and analysis component of Pix4Dmapper.
The application is not open source, but a free trial version should be available on the company website, if you’re curious.
LifeV
LifeV was the main software project I worked on during my PhD. It’s a set of C++ libraries for solving partial differential equations (PDEs) with the finite element method. The project focuses on the accurate solution of very large problems, so its algorithm and data structure implementations are parallel and suitable for use on large clusters and supercomputers.
My PhD work focused on improving the scalability of finite element simulations in LifeV through both algorithmic and implementation changes. LifeV is built on the Trilinos libraries, which gave me the opportunity to contribute to that project, too.
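As a toy illustration of what a finite element solver does (a sketch in Rust, not LifeV code), the following assembles linear (hat-function) elements for the 1D model problem -u'' = f on (0, 1) with u(0) = u(1) = 0 on a uniform mesh, then solves the resulting tridiagonal system with the Thomas algorithm. The function names, the uniform mesh, and the trapezoidal rule for the load vector are assumptions chosen for brevity; LifeV targets very large problems solved in parallel, which this sequential sketch does not attempt.

```rust
/// Solves a tridiagonal system with the Thomas algorithm.
/// `sub`, `diag`, `sup` are the three diagonals (sub[0] and sup[n-1] are
/// ignored) and `rhs` is the right-hand side.
fn thomas(sub: &[f64], diag: &[f64], sup: &[f64], rhs: &[f64]) -> Vec<f64> {
    let n = diag.len();
    let mut c = vec![0.0; n];
    let mut d = vec![0.0; n];
    c[0] = sup[0] / diag[0];
    d[0] = rhs[0] / diag[0];
    // Forward elimination.
    for i in 1..n {
        let m = diag[i] - sub[i] * c[i - 1];
        c[i] = sup[i] / m;
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / m;
    }
    // Back substitution.
    let mut x = vec![0.0; n];
    x[n - 1] = d[n - 1];
    for i in (0..n - 1).rev() {
        x[i] = d[i] - c[i] * x[i + 1];
    }
    x
}

/// Linear finite elements for -u'' = f on (0, 1) with u(0) = u(1) = 0 on a
/// uniform mesh of `n_elems` elements (requires n_elems >= 2). Returns the
/// values of u at the interior nodes x_i = i / n_elems, i = 1..n_elems-1.
fn solve_poisson_1d(n_elems: usize, f: impl Fn(f64) -> f64) -> Vec<f64> {
    let h = 1.0 / n_elems as f64;
    let n_nodes = n_elems + 1;
    let mut diag = vec![0.0; n_nodes];
    let mut off = vec![0.0; n_elems]; // off[e] couples nodes e and e + 1
    let mut load = vec![0.0; n_nodes];

    // Element-by-element assembly.
    for e in 0..n_elems {
        // Local stiffness matrix of a linear element: (1/h) [[1, -1], [-1, 1]].
        diag[e] += 1.0 / h;
        diag[e + 1] += 1.0 / h;
        off[e] -= 1.0 / h;
        // Local load vector, integrated with the trapezoidal rule.
        let (x0, x1) = (e as f64 * h, (e + 1) as f64 * h);
        load[e] += 0.5 * h * f(x0);
        load[e + 1] += 0.5 * h * f(x1);
    }

    // Homogeneous Dirichlet conditions: drop boundary nodes 0 and n_elems
    // and solve for interior nodes 1..n_elems-1. For interior node g,
    // off[g - 1] couples it to g - 1 and off[g] couples it to g + 1.
    thomas(&off[..n_elems - 1], &diag[1..n_elems], &off[1..], &load[1..n_elems])
}

fn main() {
    let n_elems = 4;
    let u = solve_poisson_1d(n_elems, |_| 1.0);
    for (i, ui) in u.iter().enumerate() {
        let x = (i + 1) as f64 / n_elems as f64;
        let exact = x * (1.0 - x) / 2.0;
        println!("x = {x:.2}  u_h = {ui:.6}  exact = {exact:.6}");
    }
}
```

For f = 1 the exact solution is u(x) = x(1 - x)/2, and in this 1D setting linear elements reproduce it exactly at the nodes, which makes the example easy to check.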
Trilinos
Trilinos is a set of foundational C++ libraries for developing high-performance parallel numerical applications. The various packages of Trilinos provide high-quality implementations of the algorithms (linear algebra, linear and nonlinear solvers, etc.) and data structures (matrices, vectors, distributed maps, etc.) needed to build performant numerical software.
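The sketch below shows, in plain Rust and without any Trilinos API, two of these data structures in their simplest form: a sparse matrix in compressed sparse row (CSR) format with a matrix-vector product, and a contiguous row distribution of the kind a distributed map describes. `CsrMatrix`, `laplacian_1d`, and `owned_rows` are hypothetical names; the "ranks" run one after another here, and the exchange of off-process vector entries that a real distributed product needs is left out.

```rust
use std::ops::Range;

/// Hypothetical sparse matrix in compressed sparse row (CSR) format: the
/// nonzeros of row i are values[row_ptr[i]..row_ptr[i + 1]], and col_idx
/// holds the column of each stored value.
struct CsrMatrix {
    n_rows: usize,
    row_ptr: Vec<usize>,
    col_idx: Vec<usize>,
    values: Vec<f64>,
}

impl CsrMatrix {
    /// Computes the entries of y = A x for the given rows only, as a process
    /// owning those rows would (communication of x is ignored here).
    fn matvec_rows(&self, x: &[f64], rows: Range<usize>) -> Vec<f64> {
        rows.map(|i| {
            (self.row_ptr[i]..self.row_ptr[i + 1])
                .map(|k| self.values[k] * x[self.col_idx[k]])
                .sum::<f64>()
        })
        .collect()
    }
}

/// Builds the n x n matrix tridiag(-1, 2, -1) in CSR format.
fn laplacian_1d(n: usize) -> CsrMatrix {
    let mut row_ptr = vec![0];
    let mut col_idx = Vec::new();
    let mut values = Vec::new();
    for i in 0..n {
        if i > 0 {
            col_idx.push(i - 1);
            values.push(-1.0);
        }
        col_idx.push(i);
        values.push(2.0);
        if i + 1 < n {
            col_idx.push(i + 1);
            values.push(-1.0);
        }
        row_ptr.push(col_idx.len());
    }
    CsrMatrix { n_rows: n, row_ptr, col_idx, values }
}

/// Hypothetical contiguous row distribution: splits `n_global` rows over
/// `n_ranks` processes as evenly as possible (the first `n_global % n_ranks`
/// ranks get one extra row) and returns the rows owned by `rank`.
fn owned_rows(n_global: usize, n_ranks: usize, rank: usize) -> Range<usize> {
    let base = n_global / n_ranks;
    let extra = n_global % n_ranks;
    let start = rank * base + rank.min(extra);
    let len = base + usize::from(rank < extra);
    start..start + len
}

fn main() {
    let a = laplacian_1d(10);
    let x = vec![1.0; a.n_rows];
    let n_ranks = 3;
    let mut y = Vec::new();
    // Each "rank" computes only its own rows; here they simply run in turn.
    for rank in 0..n_ranks {
        let rows = owned_rows(a.n_rows, n_ranks, rank);
        println!("rank {rank} owns rows {rows:?}");
        y.extend(a.matvec_rows(&x, rows));
    }
    println!("y = {y:?}");
}
```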
I contributed to Trilinos during my PhD, focusing mostly on shared-memory parallelism support in data structures and linear solvers for multi-core NUMA architectures (ShyLU).