Decentralised machine learning for data privacy?
Machine learning offers plenty of benefits, but most techniques require a large amount of data. For example, automatic tagging of photographs is a simple but useful feature, yet it requires a large dataset of images.
We also live in a world where surveillance is a big problem, and it only gets worse as we give up our data to big corporations. It would be wise not to give governments and large corporations access to so much data without our explicit consent. I've been thinking about an alternative approach that addresses both problems.
Would it be possible to create a framework whereby algorithms are trained on small, local sets of data, and only the end results are aggregated on a large scale in an asynchronous, decentralised fashion?
For example, let's take face tagging. The current method of tagging used by Facebook involves looking at all the photographs uploaded by users that have been manually tagged, and then training its facial recognition algorithm on this large dataset. This requires that we give Facebook unencrypted photographs to be stored on its servers, as well as a large database of all the facial tags that we manually added to those photographs. This is a huge privacy hazard.
The alternative: each user encrypts photographs before storing them on Facebook's servers. The only people who can see them are the friends who hold keys to the photographs. When you or your friends tag photographs, client-side software on your phone or computer uses those tags to train a facial tagging algorithm on the photographs locally. Only the results of this training (not the photographs or the tags) are uploaded to a server, where they are collated with other people's results. The final, master algorithm is then used by everyone for auto-tagging of photographs, run locally on the user's computer or phone.
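To make the idea a bit more concrete, here is a minimal sketch of the "train locally, collate centrally" loop. It is only an illustration under heavy assumptions: a tiny linear model stands in for a face tagger, the collation step is a plain weighted average of each client's weights (this general pattern is usually called federated averaging), and every name here (`local_update`, `aggregate`, `federated_round`, `demo`) is made up for the example. It also says nothing about whether the uploaded weights could themselves leak information about the photographs.

```python
import random


def local_update(weights, examples, lr=0.1, epochs=20):
    """Client side: refine the shared weights on one user's private examples.

    `examples` is a list of (features, label) pairs that never leave the client.
    The model is a toy linear predictor, standing in for a real face tagger.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in examples:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            # One stochastic gradient step on the squared error.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def aggregate(updates):
    """Server side: average client weights, weighted by example count.

    `updates` is a list of (weights, num_examples) pairs; the server sees
    only these numbers, never the underlying data.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def federated_round(global_weights, clients):
    """One round: every client trains locally, then the server collates."""
    updates = [(local_update(global_weights, data), len(data)) for data in clients]
    return aggregate(updates)


def demo(rounds=10, seed=0):
    """Three hypothetical users whose private data follow y = 2*x1 - 1*x2."""
    rng = random.Random(seed)
    clients = []
    for _ in range(3):
        data = []
        for _ in range(20):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            data.append((x, 2 * x[0] - 1 * x[1]))
        clients.append(data)

    weights = [0.0, 0.0]  # the shared "master" model starts untrained
    for _ in range(rounds):
        weights = federated_round(weights, clients)
    return weights


if __name__ == "__main__":
    print(demo())
```

In this toy setup the master weights should end up close to the true values (2 and -1) even though the server only ever receives weight vectors. A real system would also need to handle clients that drop out, upload at different times, or send bad updates.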
This alternative strategy reduces the need to open up data to big companies, while we still get the benefits of big data. If users don't want to run the training on their own phones or computers, they could have the option of uploading their unencrypted photos to Facebook (or another trusted server), at their own risk.
I would love to know what you guys & gals think about this idea. Is it technically feasible? Is it even sensible? This is something that we need in the ZeroNet world!