Image: Pachyderm

Joe Doliner
Pachyderm
Co-founder & CEO
Joe Doliner graduated from the University of Chicago. He worked for Airbnb as a software engineer in 2014. After that, he founded Pachyderm.

Why do businesses need Pachyderm?

Pachyderm allows users to track their data in new ways and to work with that data without being tied to a specific programming language.

Companies have data coming in from all corners of their business but frequently struggle to find an effective way to store and process it. In the realm of data infrastructure, Pachyderm offers a distinctive solution built on Docker containers. Docker containers, as company co-founder and CEO Joe Doliner explains, are essentially “a standardized way of shipping around code. It lets me take some code that I wrote in Python, and some code that I wrote in C++, and then hand them to you and let you use them in exactly the same way.”

Why should companies be excited by the prospect of using such a system? The benefits of this flexibility may not seem immediately obvious until you consider that the majority of data infrastructure is built around the Java language. This is fine as long as you know Java, but many data scientists do not. Moreover, many verticals need to apply tools that are not written in Java. “This is why we get a lot of interest from biomedical scientists, for example. They have tools that are not in Java, but still work great in Pachyderm,” says Doliner.

Pachyderm, though, offers more than just flexibility. It also offers what is called “data provenance,” the ability to discern where data originated, as well as version control, the ability to see how data changes over time. Both of these features are unique to Pachyderm, and potentially very valuable to clients.

To highlight the benefits of data provenance, Doliner offers a clear example from the financial industry. Consider a bank, which takes in huge amounts of transaction data. Banks want to use this data to train machine learning models that help them decide to whom they should grant loans. However, the regulations governing which data may be used in making these decisions change over time. To show that a particular decision complied with the regulations in force when it was made, a bank needs to be able to retrace its steps and show where the data used to train its machine learning models originated.

Version control offers additional perks. If a machine learning model is trained on a particular dataset but then does a terrible job of predicting outcomes, analysts want to be able to home in on the problem. Do they need to think of a different way to model these relationships, or did they train on a sub-par dataset? Version control allows analysts to answer that question by tracking how their data changes over time.
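To make the two ideas concrete, here is a minimal Python sketch of a toy in-memory store in which each commit records its parent (version control) and the commits it was computed from (provenance). This is purely illustrative and is not Pachyderm's API: the class `VersionedStore`, its methods, and the sample transaction values are all hypothetical names and data chosen for this example.

```python
import hashlib
import json


class VersionedStore:
    """Toy store (hypothetical): every commit is an immutable snapshot."""

    def __init__(self):
        # commit id -> {"data": ..., "parent": ..., "inputs": [...]}
        self.commits = {}

    def commit(self, data, parent=None, inputs=()):
        """Save a snapshot and return an id derived from its content."""
        record = {"data": data, "parent": parent, "inputs": list(inputs)}
        payload = json.dumps(record, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self.commits[commit_id] = record
        return commit_id

    def history(self, commit_id):
        """Version control: walk parent links from newest to oldest."""
        chain = []
        while commit_id is not None:
            chain.append(commit_id)
            commit_id = self.commits[commit_id]["parent"]
        return chain

    def provenance(self, commit_id):
        """Provenance: collect every upstream commit that fed into this one."""
        seen = []
        stack = list(self.commits[commit_id]["inputs"])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.commits[current]["inputs"])
        return seen


store = VersionedStore()

# Two versions of a (made-up) transaction dataset.
tx_v1 = store.commit([120, 75, 300])
tx_v2 = store.commit([120, 75, 300, 42], parent=tx_v1)

# A "model" computed from the first version only, then a report built on the model.
training_data = store.commits[tx_v1]["data"]
model = store.commit({"mean": sum(training_data) / len(training_data)}, inputs=[tx_v1])
report = store.commit({"summary": "loan decisions"}, inputs=[model])

print(store.history(tx_v2))      # newest version first, then its parent
print(store.provenance(report))  # the model, then the data it was trained on
```

In the bank example, `provenance(report)` is the kind of question an auditor would ask (which data fed this decision?), while `history(tx_v2)` is the question an analyst would ask (how did this dataset change?).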

Open source code, closed source option

Pachyderm is built on its open source core technology, which is completely free. On top of that core offering, Pachyderm also has a closed source enterprise product, for which it sells licenses. “The pricing depends on the customer. Generally, we sell licenses for about $100k a year. We also charge based on how many people are using it. The low end for non-profits is about $5k, with the high end up to $2M a year,” explains Doliner.

While the enterprise product includes important benefits that the open source code lacks, Doliner points out: “We try to keep that functionality to stuff that is going to add value to big enterprises but isn’t going to matter to a hobbyist. It allows you, for instance, to have governance over the data. It allows you to regulate who is allowed to see what data. We also offer a much nicer front end on the enterprise product.”

Future horizons

For the immediate future, Pachyderm remains focused on improving its existing products. As Doliner says, “We are trying to build out the functionality of the product. We have lots and lots of customers asking for various things, so we are focused on staying on that trajectory.”

That isn’t to say Pachyderm doesn’t have other plans as well. To summarize the company's long-term goals, Doliner draws an analogy with GitHub and its effect on software development. “Git is just a piece of open source software, which is free to use for everyone. GitHub adds the social layer and allows people to collaborate on code.” Right now, Pachyderm is the Git for data, but the company sees an opportunity to do for data what GitHub did for software by adding the social aspect. Why would this be beneficial? “You need to be pretty savvy with cloud infrastructure to get a container running. But once you have it running you can do a lot of cool stuff with it without having to know much.” By adding this social piece, Pachyderm aims to help more people who may lack a technical background use data effectively.




Copyright © 2025 Ishin Co., Ltd. All Rights Reserved.