The Editable PDF Initiative (editablepdf.org)
laurent123456 1588 days ago
> PDF has long become the de facto format for exchanging print-oriented documents on the Web, and for a good reason: it works, and reliably so!

Perhaps it's that way because it's read-only? If PDF files had to be generated in such a way that they can later be edited, things would get a lot more complex and probably less reliable.

Also it feels like the wrong way to go about it, because no matter what, PDF editing will never be as powerful as a proper text editor. So it would be the wrong tool to collaborate on a document (because as soon as you want to do something more advanced with layout, images, etc., you probably can't). Maybe it's good if you want to quickly amend a contract before sending it, but then you need to remember that your .doc is no longer the latest version.

Basically a PDF document shouldn't be the source of truth for document editing as that would lock you to the wrong format.

pge 1588 days ago
While this is mostly true, it should not be relied on. If the PDF is produced from a text document or report generator, then the text as well as the charts are easy to edit with any text editor (it only requires decompressing the PDF first). It's obviously different if the document is a scanned image, but just saving a Word doc as PDF does not make it a read-only file. The benefit of PDF is that it is (as the name suggests) portable, and one knows that the recipient will see exactly what was sent. With a Word or PowerPoint file, formatting can show up differently on different machines, fonts may not be available, etc.
pdpi 1588 days ago
It's read-only in the same sense as a jpeg or an mp3 is read-only: it's a format designed for publishing data, rather than editing it.
hvidgaard 1588 days ago
When you print a Word document to PDF, the actual text, placement, and fonts are all embedded in the PDF. It may have been designed for publishing, but it's pretty simple to edit text in it.
nradov 1588 days ago
Editing stops being simple when you have to deal with reflows and boundaries. In general it doesn't really work, at least not for any complex documents.
darau1 1588 days ago
Yes, and following this logic, anything can be called editable. A binary file is 'editable' with enough work. The point is that PDF is the format used to disseminate private information to multiple and disparate parties with the confidence that they will all receive the same information and cannot (reasonably) change that document.

That's why it's used in marketing for things like lookbooks, and elsewhere for things like contracts, which are read-only by design and should never be edited by anyone but the entity that wrote them in the first place.

pge 1588 days ago
In the case of PDFs, though, it is almost trivial to edit. Uncompress the PDF, then search in a text editor for the sentence you want to change and edit it. Uncompressing takes one command, and after that it is almost as easy as editing a Word doc (in your favorite editor, search and replace the text you want to change).

Editing charts is a little more complicated, but not difficult.
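To make the "uncompress, then search and replace" idea concrete, here is a minimal Python sketch that works on a single FlateDecode content stream using only zlib. It assumes the text is stored as plain ASCII inside a `(...) Tj` string, which is common for simple generators but not guaranteed: many files use subset fonts with custom encodings, split words across TJ kerning arrays, or pack streams into object streams. Writing the edited stream back into a real file also means updating its /Length and the xref offsets; qpdf's QDF mode and its companion fix-qdf tool exist for that kind of round trip. The name `edit_flate_stream` is purely illustrative.

```python
import zlib


def edit_flate_stream(compressed: bytes, old: bytes, new: bytes) -> bytes:
    """Inflate a FlateDecode stream, replace a byte sequence, and deflate it again."""
    content = zlib.decompress(compressed)  # raw content-stream operators, e.g. "(Hi) Tj"
    if old not in content:
        # Text may be split across TJ arrays or stored in a font-specific encoding.
        raise ValueError("text not found as a plain byte sequence")
    return zlib.compress(content.replace(old, new))


# A tiny hand-written content stream drawing "Total: 100" with font /F1.
original = zlib.compress(b"BT /F1 12 Tf 72 700 Td (Total: 100) Tj ET")
edited = edit_flate_stream(original, b"(Total: 100)", b"(Total: 900)")
print(zlib.decompress(edited))  # b'BT /F1 12 Tf 72 700 Td (Total: 900) Tj ET'
```

This is the "swapping figures on an invoice" case the FAQ mentions; anything that changes line lengths or needs reflow is where this approach stops working.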

darau1 1588 days ago
Trivial for you, impossible for the likes of marketing execs, lawyers, etc. As far as most laymen are concerned, PDFs are completely unchangeable, and even those who know you can edit them don't know how to do it themselves and tend to ask someone else. That's in my experience anyway.
blunte 1588 days ago
Your argument is similar to security through obscurity. Obviously semi-technical people know how to "modify" (add objects that obscure other objects, then flatten and cleanse the output).

Really the issue is that we need a non-MS-Word editable document format that includes hash/signature features to ensure the edit/publish state of the document.

kccqzy 1589 days ago
Meh. Not being easily editable is a feature, not a bug.

Sometimes you want to send out a finalized document and want to make 99% of the users unable to edit them. That's what PDFs are for. Imagine lawyers needing to send out a finalized contract. Or a graphic designer sending out the finalized design. Or an electronic book that has gone through the work of the author, the editor, and the publisher and needs no more changes. PDFs give an air of permanency and stability when so many other digital formats are malleable.

klint 1589 days ago
That objection, and others, are addressed in the project FAQ[1]. It's already possible to open and edit a PDF in various applications.

[1] https://editablepdf.org/faq/

>But isn’t the whole point of PDF that you can’t edit it? No. The fact that standard PDFs are difficult to edit is more of an accident than a feature, as PDF’s roots are in printing, where only final-form documents needed to be transmitted. Many people believe PDF to be “impossible to edit,” but beware: minor edits in PDFs, such as swapping figures on an invoice, are trivial — therefore you need other technologies, such as digital signatures, to verify that your PDFs have not been tampered with. More extensive edits, however, are more difficult, as they require the document’s logical structure to be automatically detected, and this is an error-prone task.

[...]

>Are you sure we need such a editable PDF format? I believe one of the most important benefits of PDF is its concrete, solid state. The idea of Editable PDF stems from a real-world need to improve the efficiency in the way that we work with documents. Today, the only editable file formats are those native to the applications that generated documents, and none of these formats guarantees the layout to be preserved in the same way as PDF. Furthermore, despite improvements in compatibility, using a native file format still often requires the recipient to be using the same software (and often the same version) of the application, which may not be available.

PDF’s largest asset, its rock-solid visual presentation, will remain, and editable PDFs will be backwardly compatible with the current installed base of PDF viewers such as Adobe Reader and Preview.

kccqzy 1589 days ago
The FAQ explains things well: making minor changes, although difficult, is still possible. I have even personally changed multiple PDFs; I find qpdf to be a great tool for that. But making changes large enough to require reflowing the text is almost impossible.

I do not consider my objection addressed. If anything, that FAQ further emphasized the current difficulty of editing PDFs and they are only proposing to make things easier. So no one is really objecting to the fact that currently changing PDFs is hard. Whether or not that is desirable though, is a separate matter, one that the FAQ does a poor job explaining.

Angostura 1588 days ago
> The fact that standard PDFs are difficult to edit is more of an accident than a feature, as PDF’s roots are in printing, where only final-form documents needed to be transmitted.

I disagree with this. I was an IT journalist when PDFs came out, and all the blurb at the time was centred around the advantage of being able to create a document and know that it could be displayed by anyone, irrespective of machine, OS, etc., while retaining visual fidelity.

PDF only became a thing in the print production world quite a bit later. For many years, you were making sure your printer got the QuarkXPress files and all the high-res asset files collected together into a single folder and zipped up.

tonyedgecombe 1588 days ago
PDF was based on PostScript, a system that was created for digital printers. The big selling point of PostScript was that your output would look the same whether you were printing on an Apple LaserWriter or a Linotype.
Angostura 1588 days ago
Indeed. PostScript is great for printers. PDF was created alongside it because a comparable system was needed for human eyeballs.
hnick 1589 days ago
Absolutely agreed. My day job is in the print industry so I deal with a lot of PDFs and they are perfect for print production for this reason.

Their main complaint seems to be structural metadata (this text is a heading, this text is the same font as that text on another page, so if you change one the other should change, etc.). I don't think at that point it's worth keeping PDF in the name; it'll confuse people. I certainly don't want to receive files built like that, since printers tend to have memory issues with bloated files.

You can already do minor edits anyway with some knowledge (the spec is pretty easy to read) and some programming or a hex editor. The only issue I have is fonts; they are very complicated.

vbezhenar 1588 days ago
This sounds like security through obscurity.
alpaca128 1588 days ago
This isn't about security, it's about making sure that a text document is formatted exactly the same on all devices, whether it's a PC, a phone or a printer.

The PDF file is reliable in that aspect while other widely used document formats like .doc and .docx are basically a gamble, even if you open the document on the same machine with the same software. The same with presentations: You just need to open the file in a slightly different software or version or encounter one of the countless bugs and suddenly you have a picture overlapping with text.

blunte 1588 days ago
By your argument, PDF fails. I can provide examples of PDFs that look different on different devices.
kccqzy 1588 days ago
PDF gives you all the tools to make sure that doesn't happen. No one prevents you from shooting yourself in the foot by writing a bad PDF that doesn't adhere to common good practices. For example it is technically possible to make a PDF with text without embedding fonts. Doing that is a bad idea.

By your argument, any programming language, no matter how safe it purports to be, fails because it allows you to write bugs.

blunte 1587 days ago
This was about the claim that PDFs provide consistent/identical document representation on every device.

Few programming languages claim to provide bug free programs, so your example is irrelevant.

MaxBarraclough 1588 days ago
> This isn't about security

kccqzy's comment was clearly referring to security.

chungy 1589 days ago
Agreed, 100%.

Literally the only reason I make and send out PDFs is because they're effectively read-only. (It's not really perfectly so, but nobody should claim it's a tamper-proof document...)

dvdkon 1588 days ago
So if you didn't care about the recipient editing the document, you'd send what format? Pretty much anyone can view a PDF and see it as intended, not so much with DOC, DOCX, ODT, INDD...
MaxBarraclough 1588 days ago
> Sometimes you want to send out a finalized document and want to make 99% of the users unable to edit them. That's what PDFs are for. Imagine lawyers needing to send out a finalized contract.

That is not what PDFs are for. PDF is, well, a Portable Document Format. It is not convenient to modify a PDF, but PDF is not securely resistant to modification (discounting its cryptographic features [0]). Its resistance to modification is a side-effect of its design, not a primary goal.

An attacker will be able to modify your PDF. This gets easier every year, as we'd expect. That doesn't matter, though, as an attacker can always recreate the document, with whichever changes they wish. (Again, neither of these attacks will work if you use cryptographic signing.)

If you want secure assurance of authenticity, you use cryptographic signing. No excuses. If you're a lawyer, I'd hope you aren't placing any stock at all in the inconvenience of modifying a PDF.

[0] https://acrobat.adobe.com/uk/en/sign/capabilities/digital-si...
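On the cryptographic-signing point, it may help to see what a PDF signature actually covers. A signature's /ByteRange lists two (offset, length) pairs spanning the whole file except the hole where the signature value itself (/Contents) is stored, and the signer's certificate signs a digest of those bytes. The sketch below only computes that SHA-256 digest with Python's hashlib; the CMS/PKCS#7 signing and certificate checks are omitted, and `byte_range_digest` plus the toy byte string are illustrative assumptions, not a real signed file.

```python
import hashlib


def byte_range_digest(pdf: bytes, byte_range: list[int]) -> str:
    """SHA-256 over the segments a signature's /ByteRange covers.

    byte_range is [offset1, length1, offset2, length2]: everything except the
    gap that holds the signature value itself (/Contents).
    """
    h = hashlib.sha256()
    for start, length in zip(byte_range[0::2], byte_range[1::2]):
        h.update(pdf[start:start + length])
    return h.hexdigest()


# Toy file: the "<...>" hole stands in for the embedded signature bytes.
doc = b"%PDF-1.7 ...body... /Contents <00000000> ...rest of file..."
hole_start = doc.index(b"<")
hole_end = doc.index(b">") + 1
rng = [0, hole_start, hole_end, len(doc) - hole_end]

before = byte_range_digest(doc, rng)
tampered = doc.replace(b"body", b"BODY")  # same length, so offsets still line up
print(before == byte_range_digest(tampered, rng))  # False: the edit is detected
```

Any change outside the hole, however small, changes the digest, which is why a signed PDF does not need to rely on being inconvenient to edit.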

AnIdiotOnTheNet 1588 days ago
Anyone else remember all the times a government released a censored document only to have someone discover that they could just remove the black bar layer and see the original text with hardly any difficulty at all?
jjohansson 1588 days ago
That wasn’t because PDFs can’t be redacted; it was caused by crappy redaction software.
rapsey 1588 days ago
Great feature
pdpi 1588 days ago
PDF is a publishing, rather than an editing format. It belongs in the same bucket as .mp3 and .jpg, rather than the .doc and .psd bucket.

This is not about how easy or how hard it is to modify a pdf, it's about the intended purpose. The fact that it's meant for publishing means we get to optimise it as such, both in terms of simplicity of the format itself, and in terms of the tools that interact with it. This makes consistent-ish rendering much easier. The features that would enable the format to be "editable" are also the sort of features that make consistency hard.

DannyB2 1588 days ago
I think of PDF like PostScript but not Turing complete.

I think of PostScript as an "ink on paper" format.

While you can take apart the PDF/PS format, dictionaries, etc., it's not a high-level representation format like a word processor's. It's a way of specifying how to draw vector shapes onto "paper".

social_quotient 1588 days ago
Totally agree here. It would be nice to edit pdf the same way it would be nice to edit a jpg. But it’s the wrong tier to operate on. What people really need is more consistent access to source files aka design files. The only time I feel stuck needing to change a pdf or jpg is when I don’t have access to the “design file”.

If reports that come out of systems need to be edited, they should dump to Excel or Word and not PDF.

While the P in PDF means portable I think it’s better thought of as “published” as in “published document file”.

wolrah 1588 days ago
Exactly. PDF is digital paper. The "print to PDF" metaphor is perfect. If you've printed something and decide you want to change it, you don't grab white-out; you open the source file, change it, then print again.

If editing PDFs is something you find yourself needing to do regularly, something is very wrong with the process that's leading to this. It may not be your fault, it may be an upstream party who should be providing you with the source material, but either way making PDFs easier to edit is not the correct solution.

blunte 1588 days ago
Consistency can be solved with hashing and cryptographic signatures.

The user story here is that PDFs get sent around as forms to be filled out, and that poses a problem for non-Mac users or users without sufficient technical skill.

And since you reference mp3 and jpg, you surely know that both formats can be modified in ways that many people will not recognize as modifications. It just pushes the skill level up a bit. But there's always a technically capable person available for hire to modify one of the "permanent" formats you mention.

brokenmachine 1588 days ago
>PDFs get sent around as forms to be filled out, and that poses a problem for non Mac users

Why is that a problem? I'm pretty sure there's a PDF reader app for every major platform.

blunte 1587 days ago
Not all readers allow annotations or changes, so forms cannot be filled out on the computer. Instead they have to be printed, filled in by hand, then scanned again.
m-p-3 1588 days ago
Agreed, and I prefer the LibreOffice approach of embedding the original file within the PDF if the author decides to.

This doesn't break the simplicity of the PDF, while making it easy to edit.

ropiwqefjnpoa 1589 days ago
I'm not going to say the PDF format is perfect, but I do like sending out documents knowing that in general they are going to be opened by a reader in a presentation format. I'd rather they not be opened by an editor, where my viewers are immediately invited to start making changes like a word doc. I suppose having them open initially in a non-editable mode would work, Acrobat functions that way.
klodolph 1589 days ago
It’s phenomenal for so many use cases it’s absurd. I have a huge stash of PDF files—tons of articles that I’ve saved so I can refer to them later. Meanwhile, half the links I have to blog posts are dead, if not more. I know that I’ll be able to read these PDFs 10 years from now, or 20 years from now. Plenty of them are 10 or 20 years old.

PDF is great for:

- Archiving. It’s self-contained and will work 20 years down the line.

- Math. Anything with equations.

- Printing.

I’ve tried various techniques to archive web pages with varying degrees of success. With PDFs I don’t need to think about it.

lawry 1588 days ago
You might want to take a look at the SingleFile Chrome/Firefox extension. It essentially combines all images, scripts, and CSS into one HTML file that can be opened offline. As a bonus, you can edit the page with the inspector to hide irrelevant parts, and it will save it in that state.
LordDragonfang 1588 days ago
How is that different from MHTML, which is already the default file type for Ctrl-S on Chrome?
ropiwqefjnpoa 1589 days ago
Yeah, now that nearly everything prints or saves to PDF, I have little use for a direct editor. Just make sure my PDFs always open and always look the same.

What I would like is to make PDFs easy to mark up with highlighting, circles, comments, etc. Currently, even in Acrobat, it's not very intuitive.

rusk 1588 days ago
I find Preview, the standard PDF viewer for the Mac, to be pretty good for annotations. I "think" it might be available in Acrobat for Windows, but it's a bit more obscure.

Certainly, between me, a Mac user, and a colleague using Linux, I was able to provide feedback on their documentation this way...

scarejunba 1588 days ago
Interesting. I recall saving as MHTML when I was young but it looks like that's not a thing now.
yeahtruck 1588 days ago
Internet Explorer 5.5, or whatever version it was, supported MHTML flawlessly, but Mozilla (or Netscape, as it was called then) didn't. So unless you were on IE 5.5 for life, you were out of luck. MHTML was flawless AFAIK, but nothing other than IE 5.5 or thereabouts supported it, and Netscape sure did not. MHTML was the best thing, except whenever you were using anything but IE 5.5.
adrianN 1588 days ago
On the Mac I found Skim to be pretty good for annotating PDFs.
alpaca128 1588 days ago
Also, presentations. It's pretty calming to know that every slide will look as intended, without anything misplaced or missing.
Nemo_bis 1588 days ago
Unlikely to be true unless you use PDF/A.
jjohansson 1588 days ago
Even with PDF/A, rendering fidelity is greatly impacted by the viewing software. Especially if that viewer is based on open source, where only a fraction of the PDF spec is implemented.
efls 1588 days ago
I tend to save a lot of online articles I want to read as PDFs, but found that on many "modern" sites printing as PDF only saves the first page. What's your work process? Do you run into similar problems?
kick 1589 days ago
Why not just archive.org or CTRL-S?
klodolph 1589 days ago
- archive.org - lots of stuff was never archived, or is not available due to /robots.txt

- ctrl-S - not self-contained, compatibility depends on which browser you use (Safari at least gets it right and puts it in a single file, but then other browsers can't open it. Firefox usually saves it correctly and portably, but now you have multiple files, and you can't rename them because they have references to each other, which makes it hard to organize. Then there's all the JavaScript that web pages sometimes have, which can break in the archive.)

If a PDF is available it's easily better than these alternatives. In general I save webpages I really want to refer to later by copy-pasting the text out and manually reformatting as Markdown, or sometimes with wget.

jjohansson 1588 days ago
Didn’t archive.org at some point say they would ignore robots.txt? I wish they would.
ropiwqefjnpoa 1589 days ago
CTRL-S might download tens of files and folders just to save one webpage. And then the HTML page, when opened, starts running scripts and calling home. It's a mess really.
taneq 1588 days ago
Exactly. Even if it's now fairly easy to edit a PDF (although the obvious approaches like opening them in Word still often visibly mess with the formatting), sending something as a PDF is a signal to the recipient that it's not meant to be changed.
IvanK_net 1588 days ago
I have been working on a PDF editor for several years. It is available inside my photo editor https://www.Photopea.com (press File - Open - choose a PDF file). People open 7,000 PDF files in it every day.

Often, a PDF contains just a single raster bitmap with the whole content rasterized. Also, text is often converted to vector shapes, which also makes it non-editable (as text). But it can open / save PDFs from Google Docs and other editors quite well.

piadodjanho 1589 days ago
The PDF file format is anachronous.

When the format was created, computers only had a few KBs of RAM. Yet the format had to be capable of editing documents with thousands of pages. The format solves this issue by delegating the memory management to the user.

Also, the file was made with the assumption it was supposed to be printed, not shared. It is easier to hide parts of the document than to remove the data.

A fun bit of trivia: a PDF is supposed to be read from the end of the file. That's why some documents need to be fully downloaded before they can display the first page. Of course, nowadays most PDFs are linearized and load at least the first page right away.
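The "read from the end" behaviour can be sketched in a few lines of Python: a reader looks for the last `startxref` keyword near the end of the file, and the number after it is the byte offset of the cross-reference table that locates every object. This is a simplified sketch: the 1024-byte tail window is a common reader heuristic rather than a hard rule, cross-reference streams and damaged files are ignored, and `find_startxref` is a hypothetical helper name.

```python
import re


def find_startxref(pdf: bytes, tail_size: int = 1024) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword."""
    tail = pdf[-tail_size:]  # assumption: the trailer sits in the last kilobyte
    matches = list(re.finditer(rb"startxref\s+(\d+)\s+%%EOF", tail))
    if not matches:
        raise ValueError("no startxref found; the file may be truncated")
    # Incremental updates append another xref/trailer, so the last one wins.
    return int(matches[-1].group(1))


# A synthetic, stripped-down file: one object, an xref table, a trailer.
body = b"%PDF-1.4\n1 0 obj\n<< >>\nendobj\n"
xref_pos = len(body)
sample = (
    body
    + b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"
    + b"startxref\n" + str(xref_pos).encode() + b"\n%%EOF\n"
)
offset = find_startxref(sample)
print(sample[offset:offset + 4])  # b'xref'
```

Taking the last match is what lets incrementally saved edits override earlier versions of an object, which also means older content can linger in the file.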

Over the years the specification got so complex that it became very hard to implement a minimal editor, viewer, parser, or generator. If the format were simpler, it would be possible to make "save as PDF" more accessible.

I have other issues with the typesetting and the way color is handled (it has a printer-first approach), but I think this post is long enough already. I just want to point out that the spec supports so many pointless features, such as drawing in 3D space, movies, audio, HTML support, etc.

Finally, I don't understand why most people are against a revision of the PDF format despite clearly having very little knowledge of how it works. I think multi-person editing of the same file with some version control can be useful. By the way, the format already kinda lets many people edit the document at once, as long as they are not working on the same part.

kccqzy 1589 days ago
> When the format was created, computers only had a few KBs of RAM. Yet the format should be capable of editing documents with thousand of pages. The format solves this issue by delegating the memory management to the user.

That's a good decision. Make the file format versatile and powerful. Don't constrain it by the limitations of contemporary hardware.

> Also, the file was made with the assumption it was supposed to be printed, not shared. It is easier to hide parts of the document instead of removing the data.

I agree it's made with the assumption of being printed, but that's part of the appeal—preserving visual fidelity of how the document looks. You can't send people a docx and expect them to see the exact same thing on their screen down to every detail.

And no, it's not difficult to remove data. If you know exactly what to remove, it is quite easy to remove it. To remove text, find the Tj or TJ operators and remove them and their arguments. To remove an image, find the Do operator (occasionally BI, ID, EI) and remove it. You might have to perform decompression before doing that. For images, you might have to run another pass to delete the referenced object. But all of these steps are very easily automated.
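Here is a rough Python sketch of the approach kccqzy describes, operating on an already-decompressed content stream. It uses regular expressions, which is a deliberate simplification: it handles escaped characters in literal strings and hex strings, but not unescaped nested parentheses, the `'` and `"` text operators, inline images (BI/ID/EI), or text drawn inside form XObjects. It also leaves the now-unused font and image objects in the file, which is the extra pass kccqzy mentions. The names `strip_text_and_images`, `TEXT_OPS` and `XOBJECT_DO` are illustrative.

```python
import re

# Literal string "(...)" with backslash escapes; unescaped nested parens are NOT handled.
LITERAL = rb"\((?:\\.|[^\\()])*\)"
HEX = rb"<[0-9A-Fa-f\s]*>"  # hex string such as <48656C6C6F>
STRING = rb"(?:" + LITERAL + rb"|" + HEX + rb")"

TEXT_OPS = re.compile(
    STRING + rb"\s*Tj"  # (Hello) Tj
    + rb"|\[(?:" + STRING + rb"|[^\]()<])*\]\s*TJ",  # [(He) -20 (llo)] TJ
    re.DOTALL,
)
XOBJECT_DO = re.compile(rb"/[^\s/\[\]()<>{}%]+\s+Do")  # /Im1 Do paints an XObject


def strip_text_and_images(content: bytes) -> bytes:
    """Drop show-text (Tj/TJ) and XObject-painting (Do) operators from a content stream."""
    content = TEXT_OPS.sub(b"", content)
    return XOBJECT_DO.sub(b"", content)


stream = (
    b"BT /F1 12 Tf 72 700 Td (Secret: 42) Tj [(A) -120 (B)] TJ ET "
    b"q 100 0 0 50 72 600 cm /Im1 Do Q"
)
print(strip_text_and_images(stream))
```

This is also the difference between real redaction and drawing a black rectangle over the text: only removing the operators (and the data they reference) actually takes the content out of the file.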

> Over the years specification got so complex it became very hard to implement a minimal editor, viewer, parser or generator. If the format was simpler, it would be possible to make "save as PDF" more accessible.

The reason "save to PDF" is difficult to implement from scratch is not its complicated specification. Indeed, parsers are quite easy to write. The real reason "save to PDF" is difficult to implement is that PDF wants visual fidelity; that comes at the price of specifying exactly where text should be placed, all the way from how paragraphs are flowed to how letter kerning is to be handled. Most applications do not care about these details. Most developers hardly have any interest in understanding line-breaking algorithms or interpreting font files to produce the right offsets and glyphs (think ligatures). These things are, rightfully, way beyond the business domain of typical applications and beyond the knowledge of typical developers.

piadodjanho 1588 days ago
Given the constraints of the time when the format was conceived, the PDF format has a great design. I think embedded designers should have a quick look at the PDF format to learn some tricks on how to reduce unneeded memory accesses -- it basically implements a directory inside a file.

With the entry removal example, I was trying to show the format was not meant to be shared. I know it is possible to remove data in other ways, and that probably every modern editor removes the data correctly. But that was not how the format itself dealt with it. Of course, hiding entries with the flag had other uses, such as printing only the pages you are currently working on without having to rescan the whole file.

I agree most devs don't have any interest in learning how to do typesetting. But also, typesetting is quite complex by itself, especially when dealing with non-Western languages. Luckily, projects such as HarfBuzz (nowadays even Emacs uses hb) make it a lot easier.

Like I said in my original post, the format is anachronous. I don't think the format is intrinsically bad; I just think it is not right for our time. I think we can do better nowadays.

PS: I've been thinking it would be pretty cool to talk with the engineering team that worked on the first spec and learn what they were thinking back then and what they would change in it nowadays.

maest 1588 days ago
> If you know exactly what to remove, it is quite easy to remove things.

I think you are severely underselling how difficult it is to do these things.

1. You need to be a specialist in how the PDF format works.

2. From experience, it's not trivial to have logic that correctly handles all possible formatting cases in a PDF.

tonyedgecombe 1588 days ago
Microsoft's XPS format solved many of the issues with PDF. Unfortunately it was too late and came from the wrong people so didn't succeed.

As it's a zip file, it also needs to be read from the end, although it can be linearised as well.

Mikhail_Edoshin 1587 days ago
Yep, XPS is a very clean format. And it seems to be easy to add support for it because of this simplicity. E.g. I normally use Sumatra PDF viewer on PC and MuPDF on Android and they both view XPS just fine.
rusk 1588 days ago
> computers only had a few KBs of RAM.

I don't think this is right. PostScript, maybe... but PDF, in my experience, came about in the 90s, when computers typically had between 4 and 16MB of RAM...

piadodjanho 1588 days ago
You are probably right. But my point still stands: the format was made to deal with files much larger than the memory.
rusk 1587 days ago
Probably?

No, exactly.

gpvos 1588 days ago
The main reason making a "save as PDF" is hard is the font handling. The rest is fairly straightforward.
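To illustrate why "the rest is fairly straightforward", here is a hedged Python sketch that writes a one-page PDF by hand: a catalog, a page tree, one page, one content stream, and a cross-reference table of 20-byte entries pointing at each object's byte offset, with startxref at the end. It cheats on exactly the hard part gpvos names: it references the standard Helvetica font without embedding it and accepts ASCII only, so glyph widths, kerning, line breaking and non-Latin text are all left to the viewer (the practice kccqzy called a bad idea upthread). `minimal_pdf` is a hypothetical helper, not part of any library.

```python
def minimal_pdf(text: str) -> bytes:
    """Build a one-page PDF showing `text` in the non-embedded standard Helvetica font."""
    # Escape the characters that are special inside a PDF literal string.
    safe = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    # ASCII only: anything else would need a real font encoding, i.e. the hard part.
    stream = f"BT /F1 24 Tf 72 720 Td ({safe}) Tj ET".encode("ascii")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
        b"<< /Length " + str(len(stream)).encode() + b" >>\nstream\n" + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded for the xref table
        out += f"{num} 0 obj\n".encode() + body + b"\nendobj\n"
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode()
    out += b"0000000000 65535 f \n"  # each entry is exactly 20 bytes, EOL included
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode()
    out += f"startxref\n{xref_pos}\n%%EOF\n".encode()
    return bytes(out)
```

Writing the returned bytes to a .pdf file should give a page that common viewers can open, but I'd treat it as a teaching toy: the moment you need embedded fonts, proper widths or reflowed paragraphs, the amount of work explodes.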
BEEdwards 1589 days ago
I think it's funny that the top two comments of this post are diametrically opposed, yet I kind of agree with both of them.

The PDF is a terrible format, yet if I'm sending an email with an attachment and I want you to see exactly how it looks on my computer, then I'm exporting to PDF.

However if your book is only available as a pdf I'm probably going to skip it.

PDF is good for short things, a contract maybe. The best use case is forms, which this doesn't really talk about but seems to address. The web has basically solved forms, but there are times you want to send people a form to fill out whose formatting you don't want to go wacky, but which still needs to be editable.

PDF can do this but isn't good at it, this seems to take that not good and make it good.

enriquto 1588 days ago
> However if your book is only available as a pdf I'm probably going to skip it.

Wait, what format do you expect a book to be? I mostly skip any book that is not in PDF.

Terretta 1588 days ago
Something that reflows when you change text size, font family, or page orientation.
enriquto 1588 days ago
I am not sure that we can call a text that has not yet been typeset a "book". In any case, do you know any such format that does not completely botch equations? The closest I can imagine is html with mathjax.
contravariant 1588 days ago
If you want a standalone HTML solution I reckon you could embed the formula as SVG. Or maybe you can somehow get MathML to look nice, but so far that seems tricky without using JavaScript to help (which might not be supported/desirable in a document format) and I'm not sure about the state of MathML support in e-readers.

One of the problems with including mathematical formulae in a reflowable document format is that the concept of reflowable mathematical notation simply doesn't seem to exist, so in practice you'll end up with something equivalent to a picture.

aurbano 1588 days ago
eBook formats can do that, but they may struggle with math notation - I haven't read any mathematical work on ebooks.
enriquto 1588 days ago
I once did. I'm still in therapy to recover from the trauma.
majkinetor 1588 days ago
You can also reflow with Foxit PDF Reader FYI (on all platforms)
enriquto 1588 days ago
It does not seem to be available for OpenBSD. Do you have a link? (or at least to the source code)
mung 1589 days ago
If you find PDF painful because you can't reflow text or edit it, newsflash: you are using the wrong format. Industries that use PDF extensively: legal and printing. Neither of them want to be able to change documents.

To preempt: I work within printing, yes there are tools to hack into PDFs and make certain alterations or fixes, but it's to get you out of a bind only, it's not a normal healthy workflow.

jstewartmobile 1589 days ago
This is a mistake. The hard-to-edit, assembly-language-like nature of PDF is a feature, not a bug.
smacktoward 1589 days ago
The FAQ addresses this objection: https://editablepdf.org/faq/
roenxi 1589 days ago
The FAQ addressing an objection doesn't mean that the objection is overcome.

If I send you a PDF the point is that I don't want you to edit it. Otherwise I would have sent a docx file. An 'editable PDF' may as well get lumped in to the OpenDocument standard. It is already universally editable and I'm sure it has support for adding application-specific metadata.

Having a PDF editor isn't some violation of sacred principles; but making changes to the standard to make it easy to edit is not improving the situation. I want to send a format that is hard to edit.

chrismorgan 1589 days ago
Yet this perception and the fact that it’s difficult does discourage editing. It’s kinda like security by obscurity: it’s not “real” security, but it does work, for the most part.
jimueller 1589 days ago
The FAQ doesn't address it completely in my opinion.

> Many people believe PDF to be “impossible to edit,” but beware: minor edits in PDFs, such as swapping figures on an invoice, are trivial — therefore you need other technologies, such as digital signatures, to verify that your PDFs have not been tampered with.

The point isn't really that it can't be edited; the value to me is that the sender has confidence that it will look the same to the receiver as to the sender.

pacaro 1589 days ago
And then, of course, there is the deliberately edit-hostile way that PDFs can be (ab)used: printing a document and then scanning it to PDF. In my experience some lawyers like this because it means that to suggest edits you have to retype or OCR entire paragraphs.
kccqzy 1589 days ago [-]
I used to do a variation of this by converting all text to outlines in a PDF. It will even deter more people from editing the document. This also saves paper, and text will remain sharp (although slightly different because operating systems have different heuristics when it comes to rasterizing text and graphics).
dragonwriter 1589 days ago [-]
> That's not really the point that it can't be edited, the value to me is that the sender has confidence that it will look the same to the receiver as the sender.

Different PDF reader software, and sometimes even the same reader software installed in different environments, can render PDFs differently, especially those containing any text.

If you want confidence that it will appear the same, pure image formats are a safer bet.

gpvos 1588 days ago [-]
I'd say https://xkcd.com/927/ to that.
9nGQluzmnq3M 1589 days ago [-]
I'm going to add an unsolicited plug for PDFEscape, which effectively lets you "edit" any PDF: https://www.pdfescape.com/

It's an online service that lets you upload PDFs, then edit fields, add text, upload and paste images like your signature, etc. Perfect for filling out tedious paper application forms without having to deal with printing & scanning.

I have no connection other than as a satisfied user, and in fact I have no idea how they make money, since the free mode features suffice for every use case I've had.

scrollaway 1588 days ago [-]
I use Master PDF Editor (https://code-industry.net/masterpdfeditor/). It's not free, but it's not terribly expensive either and you can probably get it expensed depending on your job.

It also does PDF editing perfectly. I really hope there will be some open source version of it at some point. Or that someone's working on one.

calvinmorrison 1589 days ago [-]
Shout-out to Apple's PDF viewer. I don't own a Mac, but I use my coworker's. Write your signature on a piece of paper, and the webcam will scan it, clean it up, and add your signature to a doc!
lxgr 1588 days ago [-]
The proposed way to achieve editability sounds like it is inherently at odds with the page description model of PDF, which is in turn exactly what gives it its stable output on different platforms.

A PDF renderer basically needs to be able to rasterize fonts and paint glyphs on a page/screen – that's it. Layout, spacing and even kerning are left to the producing application.
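To illustrate how little the renderer decides, here is a minimal one-page PDF written by hand in Python. Every line of text is placed with an absolute text matrix (Tm) chosen by the producer; the viewer does no layout at all. This is a sketch under simplifying assumptions: `build_pdf` and `_escape` are hypothetical helpers, text must be Latin-1, and the file references the standard Helvetica font without embedding it, so glyph shapes can still vary between viewers (a producer aiming for identical output would embed the font).

```python
def _escape(text: str) -> str:
    # Backslash and parentheses are special inside PDF literal strings
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(lines: list[tuple[int, int, str]]) -> bytes:
    """Build a one-page Letter-size PDF; each (x, y, text) is drawn exactly at (x, y) in points."""
    ops = ["BT", "/F1 12 Tf"]
    for x, y, text in lines:
        # Tm sets an absolute text matrix: placement is decided here, not by the viewer
        ops.append(f"1 0 0 1 {x} {y} Tm ({_escape(text)}) Tj")
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Standard 14 font, referenced but not embedded
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", needed for the xref table
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry 0: head of the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

Writing `build_pdf([(72, 720, "Hello"), (72, 700, "World")])` to a file gives a page whose two lines sit at exactly those coordinates in any conforming viewer; line breaks, spacing, and kerning were all settled before the file was written.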

The project mentions the lack of robustness inherent to web-based document formats, but I'm afraid that any alternative would either be severely limited in the range of achievable output documents or would end up reinventing the wheel.

As an analogy: SVG has been around for a while, and yet we still use PNGs and I don't see them going away anytime soon.

Maybe what we really need is just more widespread support of ePub, and maybe some extensions for more "document-like" (instead of book-like) functionality in editors for it, and potentially support for an embedded rendered PDF for layout stability?

burtonator 1589 days ago [-]
The fact that PDF is immutable is a huge advantage.

In Polar we have taken the perspective that immutability is an advantage and is going to be the basis for our group collaboration around documents.

We ended up building out annotations on top of PDF including text highlights and area highlights which can then be commented on:

https://getpolarized.io/docs/annotation-sidebar.html

Some of our users keep asking for editable documentation and I think the main win here could just be using Markdown, which I'm thinking about adding.

The biggest thing that's needed though, for scientific use, is LaTeX. Fortunately, there are plenty of Markdown implementations with LaTeX support.

PDF is amazingly good for printing documents but honestly 90% of the complex printing requirements aren't needed for regular use.

bloak 1588 days ago [-]
It sounds like what they want is a bit like what you get with a word processor provided that everyone is using the same version of the same program on identical systems, so you don't have the current situation of the layout getting completely broken because different people have different fonts, different paper sizes, and so on. In which case it's an interesting idea, but they shouldn't call it "PDF".

Although it's an interesting idea, I suspect it will never work in practice because word processing is just too complex. There are just too many complex features that people expect to have available. Different implementations will never be sufficiently compatible. Perhaps the solution is to bundle your document with a WebAssembly binary of a particular version of LibreOffice? OK, maybe you could separate the rendering functionality from the UI stuff, but it's hard to see how in practice you could get documents to be editable and rendered in the same way everywhere except by having everyone run the same binary to do the rendering, and there will inevitably be a hundred versions of that binary in use as new features get added.

lars-b2018 1588 days ago [-]
PDF is great because of its ability to present a print oriented view of any type of information, packed in a document container in an efficient manner. This is the design goal of the format. It is the source application's responsibility to manage the semantics of the document scope, where edits to the represented information can potentially cascade across the document in non-trivial ways (think Excel for example). PDFs CAN be edited today, but those edits are made by tools that just change the visual layout vs. the information structures represented by the document. It's a rather difficult problem to overcome if the PDF format now must contain rules about the underlying information structure itself in order to maintain a consistent representation in the document.
cm-t 1588 days ago [-]
As far as I know, LibreOffice (Draw, if I remember correctly) allows you to graphically edit PDFs (Xournal too, but it's not as rich as LibreOffice).
dwheeler 1588 days ago [-]
LibreOffice does let you create editable PDFs. They do this with a very elegant solution: they embed the OpenDocument file within the PDF. This takes very little additional space, because OpenDocument files are compressed. I think the LibreOffice solution is quite elegant; OpenDocument Format is already a standard, so we don't need to create another one. And most important, it works today, right now.

It would be a lot of effort to create a document format with the kind of richness that PDF supports. I am dubious it would be worth it.

I think most people do not need an editable PDF in the first place, so this is a minority problem. If you do want this, for most people there is already a working solution: just store OpenDocument Format files within PDFs.

nxpnsv 1588 days ago [-]
I prefer my PDFs static; my meticulously edited LaTeX would be ruined by sticky fingers. However, something I would very much like is better copy-to-clipboard from PDF. Nontrivial input with tables and line breaks turns into indecipherable alphabet soup...
kccqzy 1588 days ago [-]
Check out https://www.ctan.org/pkg/accsupp which may be helpful. I've only used it a handful of times on small amounts of text though.
nxpnsv 1587 days ago [-]
Cool, I'll try it out, although typically I only need to copy other researchers' work.
diegof79 1589 days ago [-]
Adobe Illustrator files (.ai) are PDF-compatible, so you can view them with a PDF reader like Preview. The file still contains all the data needed to edit it in Illustrator. I guess that means the PDF format is already designed to hold extra data that can be used for editing. But since PDF has many use cases, I don't think that will change much for editing: you still need a tool compatible with the original editor. However, it would be interesting if .docx files, like .ai files, could be displayed in a PDF viewer; it would save a lot of time spent exporting/saving as PDF.
VvR-Ox 1588 days ago [-]
Wow this is an awesome idea!

While editing PDFs on Linux has always been painful for me, I had no joy using plain macOS for this either. The Preview app can do some things, but it cannot do others that matter.

I wanted to copy some text just yesterday: I could select and copy it, but I could not paste it back as text in the same application.

Having to use some extremely overpriced Adobe product for occasional tasks like this is overkill and really unnecessary.

To all the people who like PDF because you cannot edit it like you want: This is the "obfuscation argument" because anyone who has the right tool or googles for 10 min. can somehow edit PDF - it is just a real pain to do so most of the time and the result may look like the patched overhead transparencies we saw back in school in the earlier days.

superkuh 1588 days ago [-]
> If anyone constructed a PDF, which was itself blank but, via embedded JavaScript, loaded parts of itself from a remote server, people would rightly balk and wonder what on earth the creator of this PDF was thinking — yet this is precisely the design of many “websites”. To put it simply, websites and webapps are not the same thing, nor should they be. Yet the conflation of a platform for hypertext and a platform for applications has confused thinking, and led developers with prodigious aptitude for JavaScript to mistakenly see mere websites of text as a like nail to their applications hammer.

This quote was supposed to be an absurd hypothetical. But I guess we'll live to see it in reality.

Meph504 1588 days ago [-]
I think this effort is misguided, they are attempting to take something that has a specific purpose and does it well, and subvert it into something that other applications and formats do well.

PDFs aren't promoted as a portable editable format, but as a portable, shareable, and archival format.

Why promote PDF over ODF? If document reflow in an editable document is such an issue that they need to develop a new set of tools and change the structure of PDF to resolve it, couldn't they contribute to resolving the issue in ODF instead?

runxel 1588 days ago [-]
Is this some weird kind of satire I don't understand?

Being read-only is the thing why we have PDF in the first place. If you want to do changes, go back to the program where it came from. It's simple as that! :)

tom_mellior 1588 days ago [-]
I've often needed to fill in PDF forms that were not using PDF's fillable form fields, or to paste in a scanned signature. This is an important use case for PDFs, and the people who send you these forms expect you to have a printer, nice handwriting, and a scanner. Instead you can do the whole thing in software, with only standard Ubuntu packages, but it can be painful for multi-page documents: I typically have to use a sequence of pdfseparate/edit each page in isolation/pdfjoin.
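The split/edit/join routine can at least be scripted. A sketch in Python, assuming poppler-utils is installed: `pdfinfo` to count pages, `pdfseparate` to split, and `pdfunite` (from the same package) to join, swapped in here for the `pdfjoin` mentioned above. The helper names and the `pages/` working directory are hypothetical, and the per-page editing in between is still manual.

```python
import shlex
import subprocess
from pathlib import Path


def count_pages(src: str) -> int:
    """Read the page count from pdfinfo's 'Pages:' line."""
    info = subprocess.run(["pdfinfo", src], capture_output=True, text=True, check=True).stdout
    for line in info.splitlines():
        if line.startswith("Pages:"):
            return int(line.split()[1])
    raise ValueError(f"pdfinfo reported no page count for {src}")


def page_workflow(src: str, n_pages: int, out: str, workdir: str = "pages") -> tuple[list[str], list[str]]:
    """Build the split and join command lines for a per-page edit."""
    pattern = str(Path(workdir) / "page-%d.pdf")  # pdfseparate fills %d with the page number
    split_cmd = ["pdfseparate", src, pattern]
    pages = [pattern % i for i in range(1, n_pages + 1)]
    join_cmd = ["pdfunite", *pages, out]
    return split_cmd, join_cmd


def run(cmd: list[str]) -> None:
    """Echo a command in shell syntax, then run it, failing loudly on errors."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Typical use: create `pages/`, call `page_workflow(src, count_pages(src), out)`, `run` the split command, edit each `pages/page-N.pdf` in Inkscape or the GIMP, then `run` the join command. It removes the bookkeeping, not the tedium of editing each page.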
lxgr 1588 days ago [-]
PDFs specifically support the form use case (I remember using this a few times for things like applying for credit cards etc). That was on Ubuntu too!

Inserting a scanned signature is also not a problem at all these days, and even fits the PDF model quite well.

tom_mellior 1588 days ago [-]
> PDFs specifically support the form use case

Yes, but the person creating the PDF must know this, and must know how to do it with the software they have. In practice, many forms I've seen came out of situations where this was not the case.

> Inserting a scanned signature is also not a problem at all these days, and even fits the PDF model quite well.

It's not a problem to open a one-page PDF in Inkscape or the Gimp and paste a signature in there. That's what I said. It gets tedious with multi-page PDFs. Do you have a better solution for this?

howard941 1588 days ago [-]
This is one area where Acrobat (the full version) shines. It has a typewriter mode that lets you add arbitrary text, set in arbitrary fonts, to a preexisting PDF, as easily for a single-page PDF as for a 100-page one.
zcrackerz 1588 days ago [-]
MacOS's built-in Preview app can do this. It's fairly limited, but it supports text boxes and signatures, which I have found very useful for not needing to install custom software.
lawlessone 1588 days ago [-]
I don't want to give recruiters the ability to edit my CV.
specialist 1588 days ago [-]
PDF forensics would be nice.

I used to write print production software. I'm no stranger to PDF.

I recently had to fill out a PDF form and send it back. It took me way too long to figure out the "form elements" were just images. I kept trying to use different clients, thinking the content creator must have used some poorly supported corner case of the PDF spec.

So I printed the frikkin PDF, wrote on it, scanned it, and sent it back.

What could be easier?

campfireveteran 1589 days ago [-]
At first, I thought it said edible PDF, but I didn't think a closed format would taste very good.
nuclx 1588 days ago [-]
That's like requesting that executables should include their source code.
Mikhail_Edoshin 1587 days ago [-]
Switch to XPS; it's a very clean format, much easier to work with than PDF. And it's already supported by many apps, maybe not as widely as PDF, but pretty well.
aabbcc1241 1588 days ago [-]
If you want consistent display and editable format, why not just use HTML with standard css?

You also get better accessibility. (I often need to reflow the PDF when reading on a mobile device.)

Mikhail_Edoshin 1587 days ago [-]
Generally it's a very confusing initiative, very much like asking that text in screenshots could stay editable and buttons clickable. Why?
okaleniuk 1588 days ago [-]
I misread it as edible PDF and thought wow! technology sure went a long way!

It would be nice to have an open standard for 3D food printing though.

tingletech 1589 days ago [-]
2018
Ididntdothis 1589 days ago [-]
I would prefer a “kill PDF” initiative :). PDF is a terrible format for almost all purposes it’s used for. It doesn’t adapt to screen sizes, is hard to parse and loses a ton of information from source documents. I don’t think we could have picked a much worse file format for widespread use.
jolmg 1589 days ago [-]
What's a better format for displaying and printing documents consistently and with selectable text?

> It doesn’t adapt to screen sizes

That is a feature. I expect my PDFs to display with pixel-perfect consistency everywhere.

There are other formats that adapt to screen sizes. HTML is good for that, if we ignore how people break that with styling.

Ididntdothis 1589 days ago [-]
“printing documents consistently and with selectable text”

It’s fine for that purpose but it’s terrible for eBooks, manuals, science papers and a lot of other stuff it’s used for. Some HTML with everything in one file would be much better in my opinion. Something like CHM maybe which MS used to use for help files.

mcswell 1589 days ago [-]
We've printed several grammars of languages of South Asia from PDFs, and we've also used the PDFs in the absence of the printed grammars. Some of these languages use the Arabic script, generally the nasta'liq version. The Arabic language usually uses the naskh version of the Arabic script, and it can be represented reasonably well with any of a variety of fonts. But nasta'liq is a different beast, and until very recently typesetting it was virtually impossible. (Urdu newspapers were handwritten by calligraphers and reproduced using photo-offset well past the millennium.) There are now reasonably good nasta'liq fonts (SIL offers a good one), which we used to produce our PDFs (and then embedded the font in the PDF for portability). You still won't get good nasta'liq in HTML unless you happen to have the right rendering engine plus the right font.

In short, I don't think we could have typeset these grammars without PDF.

We also did a grammar of Dhivehi, which is the only language in the world that uses the Thaana script. Thaana can be typeset quite easily--if you happen to have the right font. Most people don't. I guess the same thing holds if you happen to be publishing grammars of languages that used cuneiform--not many computer systems have cuneiform fonts!

applecrazy 1589 days ago [-]
> some HTML with everything in one file

You just described the ePub format. It’s a ZIP file with HTML docs in it.
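For a sense of how small that ZIP can be, here is a minimal EPUB 3 file built with Python's standard library. The sketch assumes the EPUB 3 basics: a `mimetype` entry stored first and uncompressed, `META-INF/container.xml` pointing at the package document, a navigation document, and a `dcterms:modified` timestamp. The identifier, date, and file names are hypothetical placeholders, and the output has not been checked with a validator such as epubcheck.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2020-01-01T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="ch1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

NAV = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>Contents</title></head>
<body><nav epub:type="toc"><ol><li><a href="ch1.xhtml">{title}</a></li></ol></nav></body>
</html>"""

CHAPTER = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>{title}</title></head>
<body>{body}</body>
</html>"""


def build_epub(title: str, body_xhtml: str) -> bytes:
    """Return the bytes of a one-chapter EPUB; body_xhtml must be well-formed XHTML."""
    t = escape(title)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry must come first and be stored without compression
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER)
        zf.writestr("OEBPS/content.opf", OPF.format(title=t))
        zf.writestr("OEBPS/nav.xhtml", NAV.format(title=t))
        zf.writestr("OEBPS/ch1.xhtml", CHAPTER.format(title=t, body=body_xhtml))
    return buf.getvalue()
```

Saving it is one line, e.g. `Path("manual.epub").write_bytes(build_epub("Manual", "<p>Hello</p>"))` with `pathlib.Path`. The content stays reflowable HTML, which is exactly the trade-off against PDF discussed in this thread.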

Ididntdothis 1589 days ago [-]
I didn’t know that but ePub is great for ebooks and manuals.
happytoexplain 1589 days ago [-]
I would argue that PDF is nightmarishly bad - truly one of the worst formats ever created on almost all fronts - but it's the best and most featured option for its use case. Depressingly, I find this is true about a lot of things. I don't know what the answer is.
Spooky23 1589 days ago [-]
I’ve met a few people who feel passionately the way you do, and don’t get it.

I worked with archivists on a few projects and never appreciated the dumpster fire that electronic documents presented.

PDF is an amazing thing as you get an expressive format that preserves look, feel and content and will likely do so for the foreseeable future. Just the fact that the US Federal courts standardized on PDF for most filings will ensure that it is a viable format for decades or more.

Ididntdothis 1589 days ago [-]
Problem is that PDF does not preserve content in a machine readable format. It’s a one way street. Once converted to PDF you can’t convert to another format without losing a lot of content and formatting.
Spooky23 1588 days ago [-]
That’s like saying that a spreadsheet is no good because it isn’t machine readable.

PDFs are often display focused and difficult to parse, but it’s certainly possible to do so.

Its success in the market compared to an edit-focused format like ODF underlines how important display consistency is.

e1ven 1589 days ago [-]
That's exactly what I like about it. My ideal PDF is essentially a PNG file with selectable/searchable text.

It's a great WORM format. Every added feature makes it worse.

jolmg 1589 days ago [-]
Why is it desirable for it to not be machine readable? What could possibly be the advantage in that?
e1ven 1589 days ago [-]
Because I don't want anything to try to reflow the text, or adjust the kerning, or modify to use system fonts.

There are great systems for those already.

When I want a PDF, it's because I want a format that I know is always going to look the same.

A PDF is a great archive format. It's perfect for a scan of a document, or a printout.

I never want my viewer to add anything to it, I never want it to detect anything, I never want it to adjust anything.

Just render it exactly the same way, every time.

jolmg 1588 days ago [-]
One thing doesn't imply the other. The format could be machine readable and still be pixel-perfect consistent. It could also allow reflowing, adjusting the kerning, or use system fonts even if it's machine unreadable.
Spooky23 1588 days ago [-]
It is machine readable, just not readily machine malleable.

I worked on a project where we were digitizing and cataloging various records. It was less challenging to do this with papers from the British colonial administration from the late 1700s, than to decipher certain 1980s documents written with a defunct word processor. PDF is a compromise that helps address that issue.

I would not recommend maintaining your general ledger in a PDF. But an annual report that may be referenced for decades is a great example of why a PDF is a useful format.

naniwaduni 1589 days ago [-]
This is true of practically all formats.
Ididntdothis 1589 days ago [-]
It’s much truer for PDF than for other formats. The only format I could think of that’s worse would be plain images.
kick 1589 days ago [-]
djvu will continue to be a better format in every conceivable way long into the future.
segfaultbuserr 1589 days ago [-]
microcolonel 1589 days ago [-]
The claim that text is selectable in PDFs is often dubious.
klodolph 1589 days ago [-]
The problem is that text selection relies on the PDF generation to be done in some kind of sensible fashion. There are so many ways to generate PDFs, and in some of them, the actual text is mangled or its order is mangled before it gets to the PDF generation step itself.

But in general, if you generate the PDF with an authoring tool like LaTeX or InDesign, or if you print to PDF from a webpage or document, it's going to be selectable in a sensible way.
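The mangled-order problem is easy to reproduce in miniature: a producer may emit text runs in any order, so an extractor that simply concatenates them in content-stream order gets alphabet soup. Below is a Python sketch of the usual heuristic, assuming the runs' positions have already been decoded from the content stream (the `Run` type and `reading_order` helper are hypothetical): group runs into lines by baseline, then read top to bottom and left to right. It will still fail on multi-column layouts, rotated text, and right-to-left scripts.

```python
from dataclasses import dataclass


@dataclass
class Run:
    x: float  # left edge in PDF user space (points, origin at bottom-left)
    y: float  # baseline
    text: str


def reading_order(runs: list[Run], line_tolerance: float = 2.0) -> str:
    """Group runs by baseline, then order lines top-down and runs left-to-right."""
    lines: list[tuple[float, list[Run]]] = []
    for run in sorted(runs, key=lambda r: -r.y):  # PDF y grows upward, so top of page first
        if lines and abs(lines[-1][0] - run.y) <= line_tolerance:
            lines[-1][1].append(run)
        else:
            lines.append((run.y, [run]))
    # Assumption: runs are separate words; real extractors infer spaces from gaps
    return "\n".join(
        " ".join(r.text for r in sorted(line, key=lambda r: r.x)) for _, line in lines
    )
```

With runs emitted as `world`, `Second`, `Hello`, `line`, stream order reads `worldSecondHelloline`, while `reading_order` recovers `Hello world` and `Second line`.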

abrookewood 1589 days ago [-]
Not sure about other apps, but the paid reader I use (1) includes an OCR function that adds a text layer to the document. Seems to work pretty well. (1) https://www.tracker-software.com/product/pdf-xchange-editor
jolmg 1589 days ago [-]
Well, I meant that the format supports it. I merely mentioned it so nobody would reply with something like PNG.
danso 1589 days ago [-]
Some documents aren't meant to adapt to screen sizes gracefully, particularly ones where layout and typesetting are important/mandated.
happytoexplain 1589 days ago [-]
Yes, but an adaptive format could allow static layout, while a static format inherently can not allow an adaptive layout.
danso 1589 days ago [-]
And what is this adaptive format? Is it HTML? The FAQ addresses why this initiative is skeptical of using HTML:

https://editablepdf.org/faq/

> Why not use web standards, such as HTML/CSS?

> We do use the relevant parts of HTML and CSS, where appropriate. But web standards do not provide for specification of the layout of the document in a robust way, which is guaranteed not to reflow when opened on other systems. Furthermore, browser technologies are a moving target, with implementations changing very rapidly. Therefore, they do not provide a suitable basis for archival documents.

microcolonel 1589 days ago [-]
Indeed, advance needs to be measured in the integer units of the font, if you want stable line breaks.
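A sketch of what that means in practice, in Python: keep glyph advances as integers in the font's design units (1000 per em for Type 1 metrics; 2048 is common for TrueType) and convert the available line width into those units once, with exact rational arithmetic, so the fits-or-breaks decision never depends on floating-point rounding. The advance table and `break_lines` helper are hypothetical, and kerning is ignored.

```python
import math
from fractions import Fraction

UNITS_PER_EM = 1000
# Hypothetical advance widths in font units; a real font supplies these.
ADVANCE = {" ": 250, "a": 500, "b": 550, "c": 450}


def word_width(word: str) -> int:
    return sum(ADVANCE[ch] for ch in word)


def break_lines(words: list[str], line_width_pt, font_size_pt) -> list[str]:
    """Greedy line breaking with all width comparisons done in integer font units.

    Pass widths as ints or decimal strings (e.g. "32.5") so Fraction stays exact.
    """
    # Available width in font units: width_pt * units_per_em / size_pt, rounded down.
    # Since line widths are integers, used <= floor(limit) iff used <= limit.
    limit = math.floor(Fraction(line_width_pt) * UNITS_PER_EM / Fraction(font_size_pt))
    space = ADVANCE[" "]
    lines: list[str] = []
    current: list[str] = []
    used = 0
    for word in words:
        w = word_width(word)
        extra = w if not current else space + w
        if current and used + extra > limit:
            lines.append(" ".join(current))
            current, used = [word], w
        else:
            current.append(word)
            used += extra
    if current:
        lines.append(" ".join(current))
    return lines
```

For `break_lines(["abc", "abc", "abc"], 36, 10)` the limit is 3600 units and each word is 1500, so two words plus a 250-unit space (3250) fit on the first line and the third wraps, identically on every machine.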
fortran77 1589 days ago [-]
I use PDFs all day long. It's great. I know people will see documents the same as I do.
lxgr 1588 days ago [-]
> for almost all purposes it’s used for

So we should kill a format because it is used for something it was never intended for?

How is the lack of a widely compatible, self-contained markup-plus-resources format PDF's fault?

marcus_holmes 1588 days ago [-]
Also, can we have an open standard for this, please? Not a proprietary format owned by someone. Just because they're not currently charging for it, doesn't mean they can't.
UglyToad 1588 days ago [-]
As far as I'm aware PDF is an open standard? Happy to be corrected if wrong, but the standard (2.0) is now published by ISO, which, annoyingly enough, means they charge for it, which at least Adobe didn't (<2.0).
marcus_holmes 1588 days ago [-]
ahh, my bad then. I thought it was still owned by Adobe.
xvilka 1589 days ago [-]
Well, the idea behind PDF is good, just revamping the format will help a lot - remove the unnecessary features, ambiguities, remove old and legacy features, etc.
bscphil 1589 days ago [-]
If what you want is "PDF the way it should be", we already have that, it's PDF/A. https://en.wikipedia.org/wiki/PDF/A
xvilka 1588 days ago [-]
With PDF 2.0 and the corresponding PDF/A-4 they took steps in the right direction, but there are still droves of unnecessary and ambiguous features.
klodolph 1589 days ago [-]
I'm not sure that is so necessary. PDF is kind of a cleaned-up version of PostScript to begin with. Which features or ambiguities would you remove?
tonyedgecombe 1588 days ago [-]
If you were starting from scratch it would look quite different, the obvious change would be Unicode only. In fact it would probably look much like XPS.
klodolph 1588 days ago [-]
Isn’t the main problem with XPS just software support, though?
tonyedgecombe 1588 days ago [-]
That's huge though, you need to know you will be able to open your documents in a decade or more.
klodolph 1588 days ago [-]
I don’t understand your position at all.

If you want to open your documents in a decade or more, redesigning a “cleaner” PDF would only make that less likely. If you want something cleaner than PDF, then XPS is already here. I don’t understand what scenario we’d have where designing a completely new format would give us better software support. So, the reason I’d see for designing a new format is if neither XPS nor PDF are good enough for some application.

tonyedgecombe 1587 days ago [-]
I don't think there is a solution, PDF is with us for the foreseeable future. That doesn't mean I like it as a format or can't point out the problems with it.
leni536 1588 days ago [-]
PDF is the most complete open vector graphics format. SVG comes close, but client support is all over the place. Browser support is atrocious.