LibreOffice wee (lemmy.dbzer0.com)
submitted 1 day ago* (last edited 1 day ago) by Stamets@lemmy.dbzer0.com to c/whitepeopletwitter@sh.itjust.works

Now. Why am I wrong for Libre

[-] antonim@lemmy.dbzer0.com 2 points 1 day ago* (last edited 1 day ago)

What program do you use to convert PDFs, what format do you convert them into for editing?

[-] lime@feddit.nu 2 points 1 day ago

pdf is a compiled format for typeset text, so you need a pdf compiler. i use latex + tectonic. pandoc is also a popular alternative. "converting for editing" is like decompiling a program, you're not guaranteed to get the same thing back as was put in. i never do that, i recompile instead. if i need text from a pdf i use pdftotext and cross my fingers because the formatting ain't coming back out. any program that does replicate formatting just does a best guess.

[-] antonim@lemmy.dbzer0.com 1 points 23 hours ago

I'm not sure if I'm following you - a compiler can be used to edit an existing PDF?

[-] lime@feddit.nu 3 points 23 hours ago

no, you can't edit an existing pdf, the nonstandard form filling extension notwithstanding. you can extract as much information as possible from it and recreate it. that's what "pdf editors" are doing. and since it's not officially supported, any edit can screw the file up.

the reason you can't just edit it is that pdf descends from postscript, which is literally program code that runs on printers. so you can have text interspersed with formatting information, or text with non-existent characters approximated by vector images, or text that's been rendered to a raster image and is not actually in the document. then you have the fact that pdf can embed specialized fonts, compressed files, security measures, and even internal programs. and it's all offset-based in there: the cross-reference table stores the byte offset of every object, so after adding text you have to rebuild that table, and often more of the file structure, to get it working again. what's worse, since many readers accept a file that merely has a pdf document somewhere in it, less reputable "pdf editors" can just embed whatever shit they want. it's a common malware vector.

it's much safer to re-build the document from source. if you don't have the source, there are tools to extract just the textual content.
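
to make the offset point concrete, here's a minimal sketch in python that builds a tiny one-page pdf by hand. the function name `build_pdf` and the page layout are made up for illustration, and it assumes the text contains no parentheses or backslashes (those would need escaping). the part to look at is the cross-reference table, which stores the byte position of every object.

```python
def build_pdf(text: str) -> tuple[bytes, list[int]]:
    """Assemble a tiny one-page PDF; return the file and each object's offset.

    Hypothetical helper for illustration only. `text` must not contain
    parentheses or backslashes, which would need escaping in a real file.
    """
    stream = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte position where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # The cross-reference table lists every object's byte offset,
    # one fixed-width 20-byte entry per object.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # The trailer points at the table, again by byte offset.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out), offsets


short_pdf, short_offsets = build_pdf("hi")
long_pdf, long_offsets = build_pdf("hi there")
print(short_offsets)
print(long_offsets)
```

the text lives in object 4, so making it longer pushes object 5, the table itself and the `startxref` value further into the file. a tool that changes the text in place without fixing those numbers leaves the table pointing at the wrong bytes.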

[-] antonim@lemmy.dbzer0.com 2 points 22 hours ago

Ok, this definitely helps in understanding how PDF works. However, I really do edit PDFs regularly and have no problems with the edited ones. As I already mentioned ITT, PDF-XChange lets me do so many things that listing them would sound like an advertisement. Editing the existing text tends to mess it up, that's true, but it's not crucial for me and all sorts of other actions work almost perfectly.

You're imagining some very ideal circumstances for working with PDFs that have nothing to do with my own needs, so I can't really make use of your advice. :/

[-] lime@feddit.nu 1 points 15 hours ago

in what circumstance does pdf editing come up regularly?

[-] Trainguyrom@reddthat.com 1 points 6 hours ago

Banking is very PDF heavy, and many of these PDFs have a ton of logic baked into them. Some of the loan documents do literally all of the math for you so the loan officer just inputs the amount, term and APR and the PDF outputs a fully-filled loan document. It's pretty magical to see until you peek under the hood at the code and oh-my-god-what-the-hell-how-did-this-ever-work-in-the-first-place-this-must-be-purgatory
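
For a sense of the kind of math involved, here is a rough sketch of a payment calculation from those three inputs. It assumes a plain fixed-rate, fully amortizing loan with monthly compounding, which may not match what any particular bank form does, and the function name `monthly_payment` is invented for this example.

```python
def monthly_payment(principal: float, apr_percent: float, term_months: int) -> float:
    """Fixed monthly payment for a fully amortizing loan.

    Assumption: the APR is a nominal annual rate compounded monthly.
    Real loan documents may use other conventions.
    """
    rate = apr_percent / 100 / 12  # periodic (monthly) interest rate
    if rate == 0:
        # No interest: the principal is simply spread over the term.
        return principal / term_months
    growth = (1 + rate) ** term_months
    return principal * rate * growth / (growth - 1)


payment = monthly_payment(10_000, 6.0, 12)
print(f"{payment:.2f}")
```

A real form would layer fees, rounding rules and date handling on top of this, which is presumably where the purgatory comes in.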

[-] lime@feddit.nu 2 points 5 hours ago

yeah fun fact that's usually an embedded javascript runtime

yet another reason for it to die in a fire

[-] antonim@lemmy.dbzer0.com 1 points 11 hours ago* (last edited 11 hours ago)

I frequently download book and journal article PDFs, scan books myself, and upload them online. And ofc read them.

Editing the PDFs in my case includes e.g. adding the outline/bookmarks that allow for easier navigation, adding OCR, cropping, splitting and rearranging the pages when the scanned images aren't ideal, removing watermarks...

[-] lime@feddit.nu 2 points 10 hours ago

that sounds like actual typesetting work! i'm very surprised that you don't get access to the source. usually when uploading to a journal they want the latex source.

[-] antonim@lemmy.dbzer0.com 2 points 10 hours ago

I'm not uploading to a journal. I upload stuff e.g. to Internet Archive. When I download stuff from various databases (journals, academic repositories, Google Books), it ranges from recent publications to stuff from several centuries ago, in which case a scan is all you can get.

[-] lime@feddit.nu 1 points 9 hours ago

so in that case i'm guessing it's mostly just pdfs as containers for a series of images. that's frustrating. there should really be a better format for that kind of thing. cbz is the simplest i can think of but that doesn't really allow the same amount of metadata.
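
for reference, a cbz really is just a zip archive of page images that readers show in filename order. a minimal sketch using only python's standard library, where `build_cbz` is a made-up name and the page bytes are placeholders rather than real scans:

```python
import io
import zipfile


def build_cbz(pages: list[bytes], extension: str = "png") -> bytes:
    """Pack already-encoded page images into an in-memory CBZ archive.

    Hypothetical helper for illustration. Zero-padded names keep the
    pages in order when a reader sorts entries by filename.
    """
    buffer = io.BytesIO()
    # Images are already compressed, so store them without recompressing.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        for index, image_bytes in enumerate(pages, start=1):
            archive.writestr(f"{index:04d}.{extension}", image_bytes)
    return buffer.getvalue()


# Placeholder bytes stand in for real scanned pages.
cbz_bytes = build_cbz([b"page one image data", b"page two image data"])
with zipfile.ZipFile(io.BytesIO(cbz_bytes)) as archive:
    print(archive.namelist())
```

that's the whole format, which is also why there's nowhere standard to put an outline, ocr text or per-page metadata.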

this post was submitted on 20 Nov 2025
1360 points (100.0% liked)
