{"id":45,"date":"2023-02-05T04:07:56","date_gmt":"2023-02-05T04:07:56","guid":{"rendered":"http:\/\/miranda-rosalise.net\/?p=45"},"modified":"2023-02-05T08:47:31","modified_gmt":"2023-02-05T08:47:31","slug":"jetbrains-please-fix-dataspell","status":"publish","type":"post","link":"https:\/\/miranda-rosalise.net\/?p=45","title":{"rendered":"Jetbrains, Please Fix DataSpell!"},"content":{"rendered":"\n<p>I mean it. I&#8217;ll go back to VSCode! I&#8217;ll even work in a browser on JupyterLab!<\/p>\n\n\n\n<p>I love <a href=\"https:\/\/www.jetbrains.com\/dataspell\/\">DataSpell<\/a>. Or the idea of it anyway. It&#8217;s billed as &#8220;The IDE for Data Scientists&#8221;, and it (mostly) lives up to this name. Through my years I&#8217;ve seen all kinds of Jupyter solutions, including barebones on a private box that was WAY too cool for someone of my pay grade (ask me how you compile <code>gcc-5.x<\/code> on RHEL without any tools that have been updated beyond 2005. Actually don&#8217;t, that&#8217;s a bad conversation starter), semi-managed installations on a much more appropriately-sized instance, fully customized internal solutions built by a small army of EEs to integrate perfectly with one of the world&#8217;s largest machine learning workflows, and, well, <a href=\"http:\/\/databricks.com\">Databricks<\/a>.<\/p>\n\n\n\n<p>I&#8217;ll give you a hint: Databricks blew them all away. 
(Sorry bento!)<\/p>\n\n\n\n<p>But aside from medium-to-large data workflows that demand that kind of hardware, there&#8217;s a large gulf of space in the &#8220;laptop notebook&#8221; world, with an endless collection of IDEs and <a href=\"http:\/\/emacs.org\">things that are objectively better but I&#8217;m too dumb to use them properly<\/a>, and some of these solutions are just far superior to cloud-based solutions that run in a web browser (in terms of usability).<\/p>\n\n\n\n<p>The main usability gains of an IDE, in no particular order:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Full plugin support for interacting with non-notebook code<\/li>\n\n\n\n<li>Git integration<\/li>\n\n\n\n<li>Incredible customizability\n<ul class=\"wp-block-list\">\n<li>Especially keyboard shortcuts. Ever tried to use emacs keybindings on a browser-based application? Yeah, it&#8217;s not a good time.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>And from a high level, DataSpell nails all of these. It&#8217;s got a kick-ass built-in environment manager, and it lets you seamlessly switch between .ipynb, .py, .json, or whatever else you need to edit. And yes, it handles R pretty well. I&#8217;m told as much anyway.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.03.21-PM.png\" alt=\"A screenshot from DataSpell. The screenshot shows several different services that can be managed via a GUI. 
The items include database scripts, Python tests, R, and tox commands.\" class=\"wp-image-46\" width=\"190\" height=\"232\" srcset=\"https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.03.21-PM.png 380w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.03.21-PM-246x300.png 246w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.03.21-PM-300x366.png 300w\" sizes=\"auto, (max-width: 190px) 100vw, 190px\" \/><figcaption class=\"wp-element-caption\">look at all these services!<\/figcaption><\/figure>\n<\/div>\n\n\n<p>Check it out! GUI first-party tox support? PyCharm users may already know about this, but as someone who was coming from VSCode, this kind of integration was just about the smartest thing you could put into one of my IDEs. Countless data scientist work-hours have been spent doing battle with tox in a terminal while trying to fix Airflow DAGs or other similar job code objects.<\/p>\n\n\n\n<p>But if the ambition and the scope get you excited, it&#8217;s the implementation of these features that is the ultimate letdown. There&#8217;s simply a huge number of small, annoying bugs throughout this product. That tox configuration? Yeah, it was just DOA from the moment I started up DataSpell to write this post.<\/p>\n\n\n\n<p>What really broke me recently, however, was Git integration. Git is a thorny issue for data scientists, because it&#8217;s become synonymous with context-switching. We must often move between a journeyman&#8217;s understanding of Git and the peculiarities of an organization&#8217;s pre-commit and push githooks when editing, say, a pipeline job. But when we switch back to our notebooks, there&#8217;s snag after snag.<\/p>\n\n\n\n<p>The debate on how to properly version-control data science artifacts goes back to the earliest days of notebook software itself. 
Indeed, while Jupyter dates back to 2014, one of its many spiritual predecessors, SageMath, was released only a month or so before Git. Jupyter notebooks, like SageMath, were initially intended primarily for a single user. A lonely graduate student, hacking away at a difficult tensor calculus project, or some numerical simulation of fish inside of a large container: this was the audience for these early notebook products. These kinds of projects not only weren&#8217;t collaborative to begin with, but in many cases they had externalities that actively resisted collaboration. The amount of time and background required to explain not only the problem itself, but also how the code works (one needs only ask 2-3 people in any STEM department about how good the coding standards are amongst their graduate students) posed a huge barrier to the kind of distributed, open-source development that became increasingly common in the early 21st century.<\/p>\n\n\n\n<p>But we&#8217;re no longer lonely graduate students sitting in windowless offices. Jupyter (and even SageMath, which now just uses Jupyter as a backbone) has grown up and now occupies critical toolbox space in virtually every company that&#8217;s serious about data science. The decision to adapt mainstream version control practices to this technology has many pain points, not least of which is the fact that Jupyter saves outputs along with code cell contents, all in a single JSON-formatted file. This makes reading diffs a <strong>huge<\/strong> PITA.<\/p>\n\n\n\n<p>Anecdotally, I probably field 2-3 Git-related questions a week during <code>$DAYJOB<\/code>. We even snagged a DE to write us some awesome githooks that helped us integrate our notebooking solution and our Git repo solution more seamlessly. 
We&#8217;ve probably pinged our sales rep about this issue at least a dozen times.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-wide\"\/>\n\n\n\n<p>I say all this to say the following: Doing version control correctly in many common DS workflows is a hard problem. A problem which I was downright excited for DataSpell to blow me away with. Just take a look at this diff viewer:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"467\" src=\"http:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.52.13-PM-1024x467.png\" alt=\"\" class=\"wp-image-49\" srcset=\"https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.52.13-PM-1024x467.png 1024w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.52.13-PM-300x137.png 300w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.52.13-PM-768x350.png 768w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.52.13-PM-1536x701.png 1536w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.52.13-PM-2048x935.png 2048w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.52.13-PM-850x388.png 850w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">sorry danny. 
at least your api didn&#8217;t seem to mind it too much.<\/figcaption><\/figure>\n\n\n\n<p>And compare that to this one, on GitHub.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"502\" src=\"http:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.54.11-PM-1024x502.png\" alt=\"A messy GitHub diff view is shown, with extraneous metadata\" class=\"wp-image-50\" srcset=\"https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.54.11-PM-1024x502.png 1024w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.54.11-PM-300x147.png 300w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.54.11-PM-768x377.png 768w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.54.11-PM-1536x753.png 1536w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.54.11-PM-2048x1004.png 2048w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-7.54.11-PM-850x417.png 850w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">ouch!<\/figcaption><\/figure>\n\n\n\n<p>You can see that a lot of extra stuff gets thrown in there. This means it&#8217;s possible to unknowingly stage and commit changes that don&#8217;t contain any real differences in the code. 
Simply re-executing the code without changing anything will create an unstaged change on disk when the file is saved, because the stored execution counts (and often the outputs) are rewritten!<\/p>\n\n\n\n<p>There are even more extreme examples, like this one from ReviewNB&#8217;s project maintainers:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"457\" src=\"http:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/image-1024x457.png\" alt=\"\" class=\"wp-image-51\" srcset=\"https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/image-1024x457.png 1024w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/image-300x134.png 300w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/image-768x343.png 768w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/image-1536x686.png 1536w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/image-850x379.png 850w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/image.png 1875w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">source: <a href=\"https:\/\/blog.reviewnb.com\/github-jupyter-notebook\/\">https:\/\/blog.reviewnb.com\/github-jupyter-notebook\/<\/a><\/figcaption><\/figure>\n\n\n\n<p>Looks like DataSpell is a winner, right? 
Look at the next two images and see if you can spot something off.<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"500\" height=\"636\" data-id=\"53\" src=\"http:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-6.47.46-PM-1.png\" alt=\"\" class=\"wp-image-53\" srcset=\"https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-6.47.46-PM-1.png 500w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-6.47.46-PM-1-236x300.png 236w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-6.47.46-PM-1-300x382.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"676\" height=\"402\" data-id=\"55\" src=\"http:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-6.48.32-PM.png\" alt=\"\" class=\"wp-image-55\" srcset=\"https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-6.48.32-PM.png 676w, https:\/\/miranda-rosalise.net\/wp-content\/uploads\/2023\/02\/Screenshot-2023-02-04-at-6.48.32-PM-300x178.png 300w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/figure>\n<\/figure>\n\n\n\n<p>On the left, we have VSCode. And on the right, DataSpell. You might not catch it at first, but once I tell you that these IDEs are working in identical file contexts, you might notice that VSCode lists <strong>each<\/strong> individual Git repo within the workspace. DataSpell? Just the one. And you have no control over how to switch to the other(s).<\/p>\n\n\n\n<p>Pretty annoying, right? It frustrated me to no end during my company&#8217;s last hackathon. 
And it&#8217;s this kind of lack of polish that really has put me off from adopting DataSpell as my daily driver. Add into all of this that you have to pay for DataSpell while VSCode is free, and, well.<\/p>\n\n\n\n<p>It&#8217;s not really a competition anymore, is it?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I mean it. I&#8217;ll go back to VSCode! I&#8217;ll even work in a browser on JupyterLab! I love DataSpell. Or the idea of it anyway. It&#8217;s billed as &#8220;The IDE for Data Scientists&#8221;, and it (mostly) lives up to this name. Through my years I&#8217;ve seen all kinds of Jupyter solutions, including barebones on a&#8230;<\/p>\n","protected":false},"author":1,"featured_media":47,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-45","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=\/wp\/v2\/posts\/45","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=45"}],"version-history":[{"count":2,"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=\/wp\/v2\/posts\/45\/revisions"}],"predecessor-version":[{"id":78,"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=\/wp\/v2\/posts\/45\/revisions\/78"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=\/wp\/v2\/media\/47"}],"wp:attachment":[{"href":"https:\/\/miranda-rosalise.net\/index.p
hp?rest_route=%2Fwp%2Fv2%2Fmedia&parent=45"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=45"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/miranda-rosalise.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=45"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}