For the PeerLibrary project we had discussions about how distributed or centralized we should make it. It is a cloud service, and centralizing (unifying) the user base and content has clear benefits for both end users and developers. End users get a better user experience and ease of use, all content is available quickly and easily, and all other users are there, making the social experience better. For developers, the code can be much simpler, and maintaining only one instance makes it easier to deploy new versions and push security fixes quickly. It is also easier to collect statistics and do A/B testing on a large sample. On the other hand, having multiple instances of PeerLibrary distributed around the world makes the whole system more robust, allows specialized instances to be offered, and increases the privacy of users. Having PeerLibrary distributed would also make forking easier, encouraging more community control of both the project and the content (the commons) and preventing corruption of the main instance or the core project.
I argue that if we want a distributed system, what we in fact want is what I call “proactive federation” and not simple federation.
By simple federation I have in mind a project that supports running multiple instances which communicate with each other and sync between themselves. This gives a promise of data not being siloed, a promise of data being open and available for everyone to use. But I believe we can do better. It is not so important that data is stored in different locations; what matters is that data is used for different purposes and processed in different systems, and simple federation does not address this.
With proactive federation I have in mind a platform which proactively pushes its data to other, different platforms: platforms which have different use cases and do different things with the data. Proactively means that the platform itself takes care of distributing data to other platforms. The important part is that they are different platforms, with different codebases, different organizations, and different reasons for their existence. This makes data really useful: it finds uses in use cases not imagined by, or not possible in, the original project. It assures the longevity of the data, because different projects take care of it. It makes sure the data is combined with other data. If we compare with nature: instead of trying to prevent the extinction of a particular gene by cloning, we should mix the gene with the genes of other organisms. This improves the whole ecosystem.
Technically, it also improves access to data. If data is available only through project A’s API, then any other project has to implement project A’s API. But if project A pushes its data to project B, then any other project which already uses project B’s API benefits from that. We get a network effect.
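To make this concrete, here is a minimal sketch in TypeScript of what proactive pushing could look like: one adapter per target platform, each speaking that platform’s existing API, with the originating platform driving the distribution itself. Everything here (the interfaces, the adapter, the endpoint) is hypothetical and only illustrates the shape of the idea, not an actual PeerLibrary implementation.

```typescript
// Sketch of proactive federation: the platform itself pushes each new
// record to several different target platforms, translating the record
// into whatever format each target already accepts.
// All names and endpoints below are hypothetical.

interface PublicationMetadata {
  title: string;
  authors: string[];
  doi?: string;
}

// One adapter per target platform: it knows that platform's API, so any
// other project already speaking that API benefits from the push too.
interface TargetAdapter {
  name: string;
  push(record: PublicationMetadata): Promise<void>;
}

// Hypothetical adapter for a catalog-style target platform.
const catalogAdapter: TargetAdapter = {
  name: "example-catalog",
  async push(record) {
    await fetch("https://catalog.example.org/api/records", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Translate into the target's own schema, not ours.
      body: JSON.stringify({ title: record.title, creators: record.authors }),
    });
  },
};

// Distributing data is the platform's own job, so a failure at one
// target must not block the pushes to the others.
async function proactivePush(
  record: PublicationMetadata,
  targets: TargetAdapter[],
): Promise<void> {
  await Promise.allSettled(targets.map((t) => t.push(record)));
}

// Usage: on every newly added publication, push to all configured targets,
// e.g. proactivePush(newRecord, [catalogAdapter, ...otherAdapters]).
```

The adapter boundary is where the network effect lives: once an adapter for project B exists, every record pushed through it becomes reachable by everything that already speaks project B’s API.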
As data crosses borders between projects, we might say that it will lose some of its original properties. Maybe. But does translation between languages always just lose, or does it sometimes also add? Add new culture, new context? In the long term I believe this will enrich the commons. We should not be afraid of imperfect copies. This is how progress is made.
So instead of spending time and resources on how to make federation between instances of PeerLibrary work, I am more interested in how to proactively push data to other projects. To push metadata about academic publications. To push annotations of academic publications. To push relations between academic publications. Let’s share, proactively share. Because you never know how data available in some other project will spark a new idea and a new use.