GitStorage Interview

Question: What exactly is GitStorage?

Answer: GitStorage is a device that runs the version control system "git" on a small Linux computer. Git can be hard to use, especially when dealing with repositories on a server. GitStorage is a sugar coating for git, making it easier and safer to run git on a network.

Question: Why is GitStorage relevant?

Answer: When you develop something that goes through revisions, such as code, you need a tool that handles those revisions. This is why revision control systems were invented a long time ago. GitStorage provides that functionality.

Source code is an important asset in a company. It makes sense to protect those assets, just like you protect money or your inventory.

The difference between stealing cash and stealing source code is that you usually know when your money was stolen. But when someone steals your source code, they usually don't leave any traces behind. You might find out later that your competitors know too much about your products, or that backdoors have been opened into your products. We tried to find statistics about how big the risk is, but it is really hard to say how many times code has been stolen. Only a few companies can say for sure that code was stolen, and out of those companies, who would make a press announcement about it? So it does not get our attention. It is a typical iceberg problem.

Question: Why not use a cloud service for that problem?

Answer: We are huge fans of the cloud. Some sites that host git are like social networks for coders. They make it possible to share code and enable collaboration like never before.

Generally speaking, in many cases it makes a lot of sense to have a professional team take care of complex IT problems. That is one major benefit of the cloud. But not everything should be in the cloud. The problem is that such services are a big target for attackers. If someone breaks in, it may affect not just one tenant but all tenants. There were some major attacks on high-security data companies recently. I mean, look at the Bitcoin banks, they were such a huge target. We might never find out how big the damage was. We worry that sooner or later one of the git hosting companies gets hit and code that was supposed to stay private gets into the wrong hands.

Also you simply have to trust people that you have never seen in your life and probably never will. Employees in the hosting company can access your repositories, even if they are private. I don't think that admins have to walk through metal detectors on their way out. The setup is appropriate for sharing code with a large community, but this is not a setup for keeping code safe.

Question: How can a small device handle such important data?

Answer: The times when you needed a mainframe to store valuable information are totally over. You can store an awful lot of information in 64GB or even 16GB. If a typical repository is a web page with 150MB of code and images, you can fit roughly more than 300 such repositories on the 64GB device. The CPU is not as fast as a desktop computer, but if you are running the device in your LAN, the benefit of the fast connection usually outweighs the slower performance of the embedded CPU.
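The capacity estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 GB = 1000 MB), and the `reserved_fraction` parameter is a hypothetical allowance for the operating system and file system overhead, not a figure from the device's specification.

```python
def repos_that_fit(storage_gb: float, repo_mb: float,
                   reserved_fraction: float = 0.0) -> int:
    """Estimate how many repositories of a given size fit on the device.

    storage_gb        -- nominal storage size in GB (assumed 1 GB = 1000 MB)
    repo_mb           -- average repository size in MB
    reserved_fraction -- hypothetical share of storage kept for the OS and
                         file system overhead (an assumption, not a spec)
    """
    usable_mb = storage_gb * 1000 * (1 - reserved_fraction)
    return int(usable_mb // repo_mb)


if __name__ == "__main__":
    # 150 MB repositories on the 64 GB and 16 GB variants, with nothing reserved.
    print(repos_that_fit(64, 150))
    print(repos_that_fit(16, 150))
    # Same 64 GB device with a quarter of the storage assumed to be reserved.
    print(repos_that_fit(64, 150, reserved_fraction=0.25))
```

Even with a quarter of the storage set aside, the 64GB figure stays above 300 repositories, which matches the rough number given in the answer.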

But what is important is that this device performs only this one function. There is nothing else happening on that device. That means we don't have to worry that another process is looking into what we are doing in memory and on the file system. The only way to get to the device is through the ports that we have opened and which we control. That is a lot easier to defend than a whole server.

Question: What if someone steals the server?

Answer: That person does not get much out of it.

We have added the possibility to screw the device to the wall or under the table to make it harder to remove. Though honestly, that is more symbolic than really helping to prevent losses.

The real point here is that the data is encrypted on the file system. Someone who steals the device does not have the password for that. That password has to be entered after a boot up; it is not stored on the file system. Without the password you will only see digital spaghetti on the file system.

Question: Would there be a way to get the data back?

Answer: Yes, that is an important feature. The device has an automatic cloud backup. But the repo is not backed up in plain text; the device encrypts the backup with a password. After a device gets lost, or, what can also happen, the hardware just breaks, the customer can buy a replacement device and restore the repos on that new device. It might take a couple of days until the device is available, but git allows you to work offline in the meantime. After the new device is installed, the show will go on. You can also just buy a second device as a standby replacement if your time is critical; the devices are not very expensive after all.

The backup can also be used to restore the data on a regular git server if you don’t want to use the GitStorage device anymore for whatever reason. There is no lock-in. Of course that would mean you would have to go through the whole installation and maintenance process yourself, including securing the server.

Question: What about the client? Wouldn't it be silly to do everything on the server side and then leave the client wide open?

Answer: Absolutely. The client is also very important. Of course we use a secure connection with the client. The git client comes with a good way of encrypting the traffic to the server. But it is not only about git and the server. There is a lot more to do, like making sure that the devices developers use to write code are physically separated from the devices they use, for example, for their social networking stuff. Breaking projects into smaller repositories also helps, as it reduces the exposure when one client fails. Protecting the clients is a topic that every user or project manager has to think about.

Also, you have to keep in mind that the client may store the code only for a relatively short period of time, while the developers are working on it. But on the server the code potentially has to be stored for a long time. You want to make sure that during that time it does not accidentally get into the wrong hands.

Question: How secure is the software?

Answer: We are using our own TLS implementation, and of course we believe that this is the best TLS implementation ever. I am not saying that it is better than, for example, OpenSSL. But it is a lot less known. The problem is that OpenSSL is used so much everywhere that it is simply a gigantic target. My impression is that there are a lot of people looking for weaknesses in those mainstream TLS implementations, but far fewer people reporting and fixing those bugs. The benefits of exploiting the bugs are higher than the benefits of fixing them.

By default we are also opening the SSH port for git. This was out of practical considerations: SSH is widely used with git and we want to make it easy to use. We don't control the SSH software, and we don't know it in detail or know the persons who wrote it. Personally, we have turned the SSH port off for our own server. I don't think we need to open the SSH port. We have set our device up properly with HTTPS, which is also easy and convenient to use, and we don't have to speculate about how secure the SSH port is.

Question: How much does a customer have to trust you?

Answer: Not much. There are a few things that the device needs to do to operate, like updating the server about its local address. We deliberately do that using unencrypted HTTP, so that everyone can take a look at what is being transferred and see that it does not pose a risk. But you can, and maybe even should, operate the device completely isolated from the Internet. If you want to perform a software upgrade, you can open the firewall just for that upgrade.

Question: Can this device also be used for purposes other than code?

Answer: It is not limited to C++, JavaScript or HTML. I would also consider design data, for example for mechanical design, as "code". Even someone writing a book who has to store different revisions of the book would be a coder in that sense. Or someone working with DNA. Essentially everything that is intellectual property and is undergoing revisions would be code for us. The main limitation is that the users need to know how to use the git client, or that their tools know how to do that.

Question: How do you justify the price?

Answer: If you factor in your time, it is cheaper to use a device that is ready to use than to set up a server yourself, including purchasing the hardware components and installing the software. The device is low power, keeping your electricity cost low. Keep in mind this is a 24/7 server, and 1 W roughly means 1 USD cost per year, depending on how much you pay per kWh. But the main cost is the time to set up and maintain your own server.
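The "1 W is roughly 1 USD per year" rule of thumb follows from simple arithmetic, sketched below. The electricity prices used here (0.11 and 0.20 USD per kWh) and the 5 W power draw are illustrative assumptions only, not figures for the device or for any particular utility.

```python
HOURS_PER_YEAR = 24 * 365  # a 24/7 server runs 8760 hours in a non-leap year


def yearly_cost_usd(power_watts: float, price_per_kwh: float) -> float:
    """Electricity cost per year for a device that is always on.

    power_watts   -- average power draw in watts
    price_per_kwh -- electricity price in USD per kWh (assumed value)
    """
    kwh_per_year = power_watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * price_per_kwh


if __name__ == "__main__":
    # 1 W at an assumed 0.11 USD/kWh lands close to 1 USD per year.
    print(f"{yearly_cost_usd(1, 0.11):.2f} USD per year")
    # A hypothetical 5 W device at an assumed 0.20 USD/kWh.
    print(f"{yearly_cost_usd(5, 0.20):.2f} USD per year")
```

One watt running all year is 8.76 kWh, so the rule of thumb holds when the price is a bit above 0.11 USD per kWh and scales linearly with both the power draw and the price.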