#35872 - 04/30/11 10:19 PM
Re: Looking for PDF Processing Ideas
[Re: BWW]
Eric Lachance
Unregistered
Hi Brandon,
Depending on what you need to do with the PDF, it may be much faster to write a script that uses the PDF/Alambic API to manipulate the PDF directly.
Before we go down that road, though: which version of PlanetPress do you have installed? Version 7.2 brought major changes to the PDF/Alambic engine, which is now much faster in a lot of cases. So if you are still on 7.0 or 7.1, upgrade and retest before anything else.
#35876 - 05/01/11 05:36 PM
Re: Looking for PDF Processing Ideas
[Re: ]
OL Guru
Registered: 02/26/04
Posts: 120
Loc: Bryan, Texas
Eric,
Thanks for your reply. We are using the latest version; we upgraded just last week, and the tests we ran were on that version. In the example above, we are using a PDF file as the data source and simply placing it in the background of the form. Nothing else. When the form is then printed or converted to a new PDF, it takes a long time because it seems that each page of the PDF is converted, one at a time, to another format. We do not know what this format is, but this conversion seems to be the reason it takes so long.
If you have any ideas, we would be grateful. We have been using this process for several customers. Maybe PlanetPress is not designed to handle PDF files with more than 20,000 pages when you place the PDF page in the background. If there is nothing we can do to speed it up, then that is fine. We just need to know some things to try so we can design a new process if necessary.
Thanks, Brandon
#35879 - 05/02/11 09:23 AM
Re: Looking for PDF Processing Ideas
[Re: BWW]
Eric Lachance
Unregistered
BWW,
Having the page print as your background is the test, but is that your actual final process? PDF "processing" speed is not a fixed "it takes x time to process y pages, always" figure. It always depends on what you're doing with the PDF. When you place the PDF as a background, PlanetPress has to convert the PDF into PostScript in order to output your document, because printers don't understand PDFs within a job, only PostScript.
But if you had, say, a run of 20,000 invoices and all you were doing was filtering out any invoice that wasn't for a US client, sorting the rest by state, and then printing them stacked by state, you wouldn't need a document at all. You could use the Metadata tools to do this and it would be blazing fast.
What I'm saying is that a process that takes a full 20,000-page PDF and puts each page as a background in a Design document just to create a PDF is not really a good test of PlanetPress' PDF capabilities and speed. Every situation you encounter when using PDFs will be different in many ways, and perhaps there are better methods of doing this.
I would suggest opening a tech support request so that someone can take a look at a typical implementation and perhaps find some ways to optimize it. This may, however, require some scripting that's beyond the scope of support, but you can cross that bridge when you get to it. You can open a technical support request by phone or through the web.
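To make the filter-and-sort idea concrete, here is a minimal Python sketch of the logic only. It is not the PlanetPress Metadata tools and does not use any PlanetPress API; the record layout and the field names (`invoice`, `country`, `state`) are assumptions made up for illustration.

```python
from operator import itemgetter

# Hypothetical invoice records; the field names are invented for this sketch.
invoices = [
    {"invoice": 1001, "country": "US", "state": "TX"},
    {"invoice": 1002, "country": "CA", "state": "QC"},
    {"invoice": 1003, "country": "US", "state": "CA"},
    {"invoice": 1004, "country": "US", "state": "TX"},
]


def us_invoices_by_state(records):
    """Drop non-US invoices, then group the rest by state."""
    us_only = [r for r in records if r["country"] == "US"]
    # sorted() is stable, so invoices within a state keep their original order.
    return sorted(us_only, key=itemgetter("state"))


ordered = us_invoices_by_state(invoices)
for record in ordered:
    print(record["state"], record["invoice"])
```

The point of the sketch is that this kind of job only touches per-page records, never the page content, which is why it needs no PDF-to-PostScript conversion of a background.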
#35889 - 05/02/11 11:30 AM
Re: Looking for PDF Processing Ideas
[Re: ]
OL Guru
Registered: 02/26/04
Posts: 120
Loc: Bryan, Texas
Eric,
Once again, thanks so much for your reply. I get exactly what you are saying. Our process does do more than just place the background PDF on the document. But after many, many hours of testing, I "think" I have found that placing the PDF in the background is the part that slows down the process. All I am trying to do is find out whether there is a way to speed up a document that uses a PDF with a large number (20k) of pages as the background. Our testing has shown that the same slowdown occurs if we use the 20,000-page PDF as an image resource rather than as the data source.
So, here is exactly what we are doing.
We have a PDF file with 20,000 pages. We need to print all of these pages in an entirely different sequence and we need to overlay data on each page that does not exist in the original PDF. We know the exact sequence that the pages need to print. We have this data and we can supply it to PlanetPress in whatever format works best. For example, we can supply to PlanetPress a CSV file like this.
PageinPDF,CustomerName
7,Microsoft
19000,Dell
22,Objectiflune
So, when we print, the first page printed would have page 7 from the PDF in the background and the word "Microsoft" overlaid on top of it. The second page printed would have page 19000 from the PDF in the background and the word "Dell" overlaid on top of it. The third page printed would have page 22 from the PDF in the background and the word "Objectiflune" overlaid on top of it.
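As a rough illustration of the sequencing described above, the following Python sketch reads a control file in that layout and lists which source page and overlay text each printed page would get. It only models the ordering; it does not touch a PDF or call PlanetPress, and the variable and function names are hypothetical.

```python
import csv
import io

# Sample control file in the layout described above (normally read from disk).
CONTROL_CSV = """PageinPDF,CustomerName
7,Microsoft
19000,Dell
22,Objectiflune
"""


def read_print_sequence(text):
    """Return (source_page, overlay_text) pairs in the order they should print."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["PageinPDF"]), row["CustomerName"]) for row in reader]


sequence = read_print_sequence(CONTROL_CSV)
for output_page, (source_page, customer) in enumerate(sequence, start=1):
    print(f"output page {output_page}: PDF page {source_page}, overlay {customer!r}")
```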
Does this make sense? Actually, I am still leaving out a few things, because we also do things like change paper trays based on some data, etc. We love the process we have created and the flexibility that it gives us. We are just trying to determine whether the entire idea is simply not feasible with that many pages, due to the way PlanetPress has to convert each page to PostScript.
Also, in our testing, we tried making 20,000 separate PDF files instead of using one PDF with 20,000 pages. This did not seem to make much difference.
Thanks again. We are just looking for ideas.
Regards, Brandon
PS. Another reason we are asking this question is that we will be receiving even larger PDF files in the next few months. We have no option but to accept these PDF files as is (we cannot change the way they are generated). They may have 100,000 or more pages.
One more thing: we have a multi-core processor, so why does it only run at about 15% while processing these jobs? Is there a way to force PlanetPress to use more cores?
Edited by BWW (05/02/11 04:21 PM)
#35919 - 05/03/11 04:44 PM
Re: Looking for PDF Processing Ideas
[Re: BWW]
Olivier J
Unregistered
Hi Brandon,
I would recommend using the virtual drive to store your images. In your form, you will then be able to call any page you want without having to put the PDF in your resources or as a background. Not only will this be much faster, it will also be much lighter. If you are going to use 100,000-page PDFs, you really don't want to put them in your form. The job size would be huge.
Let's use the MyPDF.pdf file as an example. If you have to call page 7 of that PDF, all you will do is use an Image object in the form. In the field where you specify which PDF to use, you will input this:
='MyPDF.pdf'
Of course, this name can be entirely dynamic. In the "page" field, put 7. Again, you can use logic that assigns a variable to this page field instead.
That's it. When outputting in PlanetPress, the job file will no longer contain your image, which keeps it very light (approximately the size of your data plus a few KB for the form design). The image pages will be pulled at ripping time and included in the output. This simulates printer-centric processing while the form is set to Optimized PostScript.
Please open a support call if you require assistance with this technique.
Hope this helps. Regards, Olivier
#35935 - 05/04/11 01:34 PM
Re: Looking for PDF Processing Ideas
[Re: ]
OL Guru
Registered: 02/26/04
Posts: 120
Loc: Bryan, Texas
Thanks, Olivier. That is an excellent suggestion and we will begin testing. We have never used a virtual drive. We actually have not been putting the PDF in the document itself. It has been on the hard drive of the PC running PlanetPress, and the name and page number are dynamically referenced in a picture object.
If we put the PDF on the virtual drive for PlanetPress Image to use when creating new PDFs, do we also need to send it to the printer for printing purposes?
Just FYI...we have a Fiery rip on the printer with plenty of hard drive space.
Thanks again, Brandon
#35940 - 05/04/11 04:50 PM
Re: Looking for PDF Processing Ideas
[Re: BWW]
Olivier J
Unregistered
Hi BWW,
Yes, any file sent to the virtual drive will also need to be sent to every output device this form will be output on.
Please keep in mind that this method involves manual maintenance of the drive's content. This can be done easily through the design tool, using the virtual drive manager and the printer utilities.
Thanks. Regards,
Olivier