        this post was submitted on 14 May 2024
        
  
      
  
      332 points (100.0% liked)
      Programmer Humor
In my experience, mypy + pydantic is a recipe for success, especially for large Python projects.
I wholeheartedly agree. The ability to describe (in code) and validate all data, from config files to each and every message being exchanged, is invaluable.
I'm actively looking for alternatives in other languages now.
You're just describing parsing in statically-typed languages, to be honest. Adding all of this stuff to Python is just (poorly) reinventing the wheel.
Python's a great language for writing small scripts (one of my favorites for the task, in fact), but it's not really suitable for serious, large-scale production usage.
I'm not talking about type checking, I'm talking about data validation using pydantic. I just consider mypy / pyright etc. another linting step, that's not even remotely interesting.
In an environment where a lot of data is being exchanged by various sources, it really has become quite valuable. Give it a try if you haven't.
I understand what you're saying. My point is that data validation is precisely the purpose of parsers (or deserialization) in statically-typed languages. Type-checking is data validation, and parsing is the process of turning untyped, unvalidated data into typed, validated data. What's more, you can often get this functionality for free without writing any code other than your type (if the validation is simple enough, anyway). Pydantic exists to solve a problem of Python's own making and to reproduce what's standard in statically-typed languages.
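To make the "parsing turns untyped data into typed data" idea concrete, here's a minimal stdlib-only Python sketch. The `ServerConfig` type and the `parse_server_config` helper are made up for illustration; they aren't from any library, and a real parser would likely report errors in more detail.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical typed, validated result of parsing."""

    host: str
    port: int


def parse_server_config(raw: dict[str, Any]) -> ServerConfig:
    """Turn an untyped dict into a ServerConfig, or raise ValueError."""
    host = raw.get("host")
    port = raw.get("port")
    if not isinstance(host, str) or not host:
        raise ValueError("host must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(port, bool) or not isinstance(port, int):
        raise ValueError("port must be an integer")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return ServerConfig(host=host, port=port)
```

Once a `ServerConfig` exists, the rest of the program can rely on its fields without re-checking them, which is the whole point of parsing at the boundary.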
In the case of config files, it's even possible to do this at compile time, depending on the language. Or in other words, you can statically guarantee that a config file exists at a particular location and deserialize it/validate it into a native data structure all without ever running your actual program. At my day job, all of our app's configuration lives in Dhall files which get imported and validated into our codebase as a compile-time step, meaning that misconfiguration is a compiler error.
I am aware of what you are saying, however, I do not agree with your conclusions. Just for the sake of providing context for our discussion, I wrote plenty of code in statically typed languages, starting in a professional capacity some 33 years ago when switching from pure TASM to AT&T C++ 2, so there is no need to convince me of the benefits :)
That being said, I think we're talking about different use cases here. When I'm talking configuration, I'm talking runtime settings provided by a customer, or service tech in the field - that hardly maps to a compiler error as you mentioned. It's also better (more flexible / higher abstraction) than simply checking a JSON schema, and I'm personally encountering multiple new, custom JSON documents every week where it has proven to be a real timesaver.
I also do not believe that all data validation can be boiled down to simple type checking - libraries like pydantic handle complex validation cases with interdependencies between attributes, initialization order, and fields that need to be checked by a finite automaton, regex or even custom code. Sure, you can graft that on after the fact, but what the library does is provide a standardized way of handling these cases with (IMHO) minimal clutter. I know you basically made that point, but the example you gave is oversimplified - at least in what I do, I rarely encounter data that can be properly validated by simple type checking. If business logic and domain knowledge have to be part of the validation, I can save a ton of boilerplate code by writing my validations using pydantic.
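As a rough illustration of the kind of validation meant here (a regex-checked field plus a rule that spans several attributes), a small sketch assuming pydantic v2 might look like this. The `ServiceTicket` model and its field names are hypothetical, invented only for this example.

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, model_validator


class ServiceTicket(BaseModel):
    """Hypothetical message; field names are illustrative only."""

    # Regex-checked field: three uppercase letters, a dash, four digits.
    device_id: str = Field(pattern=r"^[A-Z]{3}-\d{4}$")
    opened: date
    closed: Optional[date] = None
    resolution: Optional[str] = None

    @model_validator(mode="after")
    def check_closed_ticket(self) -> "ServiceTicket":
        # Interdependent fields: a closed ticket needs a resolution
        # and cannot be closed before it was opened.
        if self.closed is not None:
            if self.closed < self.opened:
                raise ValueError("closed must not be earlier than opened")
            if not self.resolution:
                raise ValueError("closed tickets need a resolution")
        return self
```

Constructing `ServiceTicket(device_id="ABC-1234", opened="2024-05-14", closed="2024-05-10")` should raise a `ValidationError`, since the cross-field rule fails even though every individual field has the right type.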
Type annotations are a completely orthogonal case and I'll be the first to admit that Python's type situation is not ideal.
Gradual typing isn't reinventing the wheel, it's a new paradigm. Statically typed code is harder to write but easier to debug. Dynamically typed code is easier to write but harder to debug. With gradual typing, the idea is that you can first write dynamic code (easier to write), and then -- wait for it -- GRADUALLY turn it into static code by adding type hints (easier to debug). It separates the typing away from the writing, meaning that the programmer doesn't have to multitask as much. If you know what you're doing, mypy really does let you eat your cake and keep it too.
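A tiny sketch of that workflow, with made-up names (`total_price`, `total_price_typed`, `LineItem`): the first function is the quick dynamic draft, the second is the same logic after hints were added later.

```python
from __future__ import annotations

from typing import TypedDict


# Step 1: dynamic first draft, no annotations. By default mypy does not
# check the body of an unannotated function.
def total_price(items):
    return sum(item["price"] * item["qty"] for item in items)


# Step 2: the same logic after type hints were added.
class LineItem(TypedDict):
    price: float
    qty: int


def total_price_typed(items: list[LineItem]) -> float:
    return sum(item["price"] * item["qty"] for item in items)
```

Both versions behave identically at runtime; the difference is that a checker such as mypy should now be able to flag a call like `total_price_typed([{"price": "9.99", "qty": 2}])` before the code ever runs.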