Technical FAQs
Accusoft’s PrizmDoc is a powerful toolkit for displaying, converting, annotating, and processing documents. This example will demonstrate how to use the PrizmDoc Cloud Server to upload an HTML document and convert it to PDF format, using the Java programming language.
The HTML To PDF Conversion Process
Converting a file with PrizmDoc Server involves four distinct phases:
- Upload: We start by uploading the file to the PrizmDoc server (it’s only stored temporarily while the conversion takes place). PrizmDoc will return a JSON file with the internal file ID and an affinity token, which we’ll need for the next step.
- Convert: Next, we ask PrizmDoc to convert the uploaded file by using the file ID returned from the previous step. In this demo, we’ll just be converting an HTML file to PDF – but PrizmDoc can also handle 50+ other file formats. PrizmDoc will now provide a ProcessID which we’ll need for the next step.
- Status: We need to wait until the conversion process is complete before trying to download the file. The API provides the current status of the conversion so we can download the file once it is ready.
- Download: Once it’s ready, we can now download the converted file.
Our Java program will reflect each of these steps through a class called PrizmDocConverter, which will interface with the PrizmDoc Cloud Server for each of the phases.
Prerequisites
For our code to work, we’ll need a few things:
- A PrizmDoc API key. This code example uses the PrizmDoc cloud server – sign up for a free evaluation API key here.
- Java Development Kit. This code was written for Java SE Development Kit 8, available through Oracle.
- JSON reader: PrizmDoc server uses JSON ( https://www.json.org/ ) to transfer data between the client and server. For the demo, we’ll be using the JSON Simple package. It can be downloaded and compiled through the Git project ( https://github.com/fangyidong/json-simple ), or if you’re using Maven download it as a Maven project ( https://mvnrepository.com/artifact/com.googlecode.json-simple/json-simple ).
- PrizmDoc server API credentials – sign up for free. The API key is provided after login:
Take the API key, and place it into a file titled “config.json” at the root of your project:
Replace “{{ valid key here }}” with your API key, so if your API key is “ur7482378fjdau8fr784fjklasdf84”, it will look like this:
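As a rough illustration, config.json would presumably hold a single apiKey field (the field name comes from the getAPIKey description in Step 0), for example {"apiKey": "ur7482378fjdau8fr784fjklasdf84"}. The sketch below reads that field using only the standard library. ConfigSketch and its methods are hypothetical names, and the regular expression stands in for the JSON Simple parser the sample project uses, so it only copes with a flat file like this one.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the sample project.
class ConfigSketch {
    // Matches "apiKey": "<value>" in a flat JSON object.
    // Assumption: the key value contains no escaped quotes.
    private static final Pattern API_KEY =
            Pattern.compile("\"apiKey\"\\s*:\\s*\"([^\"]*)\"");

    // Pulls the apiKey value out of the JSON text, or fails loudly if it is absent.
    static String extractApiKey(String json) {
        Matcher m = API_KEY.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("config has no apiKey field");
        }
        return m.group(1);
    }

    // Reads the whole config file as UTF-8 and extracts the key from it.
    static String readApiKey(Path configFile) throws IOException {
        String json = new String(Files.readAllBytes(configFile), StandardCharsets.UTF_8);
        return extractApiKey(json);
    }
}
```

A real project would normally keep using a proper JSON parser; the regular expression is only here to keep the sketch free of dependencies.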
Now we can code!
HTML To PDF Java Sample Code
Click here to download the code files we’re using in this example. You can take these files and run with them, or keep reading for a detailed walk-through of the code.
You’ll also need to get your free evaluation API key here.
PrizmDocConverter Class
The PrizmDocConverter class uses 4 methods, corresponding to the 4 phases of PrizmDoc file conversion. The first part is to start up the PrizmDocConverter class.
Step 0: Loading the PrizmDocConverter Class
The first thing to do is to start up our PrizmDocConverter class. Using the Simple JSON package, we can get the apiKey field from our file:
We can now use our getAPIKey method to fire up the PrizmDocConverter class with the following call:
With the class started, we can start the next phase: Upload.
Step 1: File Upload
The PrizmDocConverter uses the method WorkFileUpload, which takes in the name of the file we’re uploading and processes it using the PrizmDoc Work Files API:
The Work Files API takes in an uploaded file, and temporarily stores it for processing, so it is not available to other users through the Portal interface. The URL for the PrizmDoc Cloud Server for uploading Work Files is:
The Work File API uses the POST command to upload the file, and requires the following headers:
- Acs-api-key: Our API key. (If using OAuth instead, follow the Work File API instructions: http://help.accusoft.com/SAAS/pcc-for-acs/work-files.html#get-data ).
- Content-Type: This should be set to application/octet-stream since we’re uploading bytes.
Start by setting up the URL and the HTTP headers. This example will use the HttpsURLConnection object to make the secure connections:
To upload the file, we have Java load the file into a FileInputStream, then write its bytes to the output stream of our HttpsURLConnection:
When the file is uploaded, the Work Files API returns a JSON file with the details of the uploaded file:
To process this file after it’s uploaded, we need the fileId and the affinityToken for the next phase: Convert.
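The upload step described above can be sketched roughly as follows. This is a minimal sketch rather than the sample project's WorkFileUpload code: UploadSketch, openUpload, and sendBody are hypothetical names, and the Work Files URL is passed in as a parameter instead of being hard-coded. The header names are the ones listed above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical helper for illustration; not the sample project's code.
class UploadSketch {
    // Prepares (but does not yet send) a POST request carrying raw bytes.
    // Assumption: workFileUrl is an https:// URL, so the cast below is valid.
    static HttpsURLConnection openUpload(String workFileUrl, String apiKey) throws IOException {
        HttpsURLConnection conn =
                (HttpsURLConnection) URI.create(workFileUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // we are going to write a request body
        conn.setRequestProperty("acs-api-key", apiKey);
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        return conn;
    }

    // Copies the file's bytes into the request body in 8 KB chunks.
    static void sendBody(InputStream file, OutputStream body) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = file.read(buffer)) != -1) {
            body.write(buffer, 0, n);
        }
        body.flush();
    }
}
```

In use, the caller would open a FileInputStream on the HTML file, pass it to sendBody together with conn.getOutputStream(), and then read the JSON response from conn.getInputStream().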
Step 2: Convert
With the file on the PrizmDoc server, we can now issue a convert request through the Content Conversion API with our ContentConvert method:
We use the affinityToken and fileId we received in the previous Upload step. The format must be one of the output formats supported by the Content Conversion API ( http://help.accusoft.com/PrizmDoc/v13.1/HTML/webframe.html#supported-file-formats.html ). In this case, we’ll use “pdf” as our conversion target.
This requires we upload a JSON file with the fileId and what we want to convert it to:
Note that it’s possible to request multiple files be converted, so if we were to upload a series of files, we could mass request conversions of all of them. With the fileId and affinityToken extracted, we can start our request with the following URL for the Content Conversion API:
The following headers are required:
- Content-Type: The type of file, in this case “application/json”
- Accusoft-Affinity-Token: The Affinity Token we received from the Work File Upload
- Acs-api-key: The API key
To submit our request, we’ll construct our JSON request body from the fileId and the target format like this:
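One possible way to assemble that request body is sketched here. The input/sources/dest layout reflects my understanding of the Content Conversion API request and should be checked against the API documentation; ConvertRequestSketch and buildBody are hypothetical names, and the string is built by hand only to avoid a library dependency in the sketch.

```java
// Hypothetical helper for illustration; verify the body layout against the
// Content Conversion API documentation before relying on it.
class ConvertRequestSketch {
    // Escapes backslashes and double quotes so the value is safe inside a JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds a request asking for one uploaded file to be converted to one format.
    static String buildBody(String fileId, String format) {
        return "{\"input\":{"
                + "\"sources\":[{\"fileId\":\"" + escape(fileId) + "\"}],"
                + "\"dest\":{\"format\":\"" + escape(format) + "\"}"
                + "}}";
    }
}
```

Because sources is an array, the same shape extends naturally to the multi-file requests mentioned earlier; this sketch sticks to a single file.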
Now we can upload our JSON string for the conversion request:
Getting the JSON file is the same as our WorkFileUpload method, and with it we’ll be able to download our converted file when it’s ready. Here’s the format of the JSON file our Content Conversion request provides:
The most important fields are the processId and the state, which we’ll take up in the next stage, Status.
Step 3: Status
Until the conversion request is complete, we can’t download the PDF file. To check the status, we use the same Content Conversion API call, with one difference: we use the processId as part of the URL:
This is very important – this is a GET command, but don’t try to pass processId as a query-string variable. Your URI will not look like this:
It must look like this:
For example, a processID of ahidasf7894 would give us a URI of:
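To make the path-versus-query distinction concrete, here is a small sketch that appends the processId as a path segment. StatusUrlSketch and statusUrl are hypothetical names, and the base URL is a parameter; the usage line assumes the cloud endpoint https://api.accusoft.com/v2/contentConverters.

```java
// Hypothetical helper for illustration; not the sample project's code.
class StatusUrlSketch {
    // Appends the processId as the last path segment (not as ?processId=...).
    static String statusUrl(String converterUrl, String processId) {
        if (processId == null || processId.isEmpty()) {
            throw new IllegalArgumentException("processId is required");
        }
        // Drop a trailing slash so we never produce a double slash.
        String base = converterUrl.endsWith("/")
                ? converterUrl.substring(0, converterUrl.length() - 1)
                : converterUrl;
        return base + "/" + processId;
    }
}
```

Calling statusUrl("https://api.accusoft.com/v2/contentConverters", "ahidasf7894") yields https://api.accusoft.com/v2/contentConverters/ahidasf7894.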
With that, let’s look at our ConvertStatus method, which will query the PrizmDoc server via the Content Conversion API. This API will return another JSON file for us to parse:
This request is GET, with the following headers:
- Accusoft-Affinity-Token
- acs-api-key
Once again we get a JSON file back. There are three possible states:
- “processing” – The conversion is still in progress
- “complete” – the conversion is done, and we can download the converted file
- “error” – something has gone wrong with the conversion.
If the conversion is still processing, the “state” field will read “processing.” When complete, the “state” field reads “complete”. Our code puts in a loop that checks every 10 seconds, and when “state” is complete, we’ll get a JSON file that contains a new fileId – the one we want to download. In the event of an error, our program will print out the JSON file and exit:
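The polling logic can be sketched independently of the HTTP details. In this minimal sketch, PollSketch and waitForCompletion are hypothetical names, the Supplier stands in for whatever code performs the GET and reads the “state” field, and the maxChecks cap is an addition of mine so the loop cannot run forever. Unlike the program described above, it returns false on “error” rather than printing the JSON and exiting.

```java
import java.util.function.Supplier;

// Hypothetical helper for illustration; not the sample project's code.
class PollSketch {
    // Polls until the state is "complete" (returns true) or "error" (returns false).
    // fetchState is assumed to perform the status request and return the "state" value.
    static boolean waitForCompletion(Supplier<String> fetchState,
                                     long intervalMillis,
                                     int maxChecks) throws InterruptedException {
        for (int i = 0; i < maxChecks; i++) {
            String state = fetchState.get();
            if ("complete".equals(state)) {
                return true;
            }
            if ("error".equals(state)) {
                return false;
            }
            // Still "processing": wait before asking again.
            Thread.sleep(intervalMillis);
        }
        throw new IllegalStateException(
                "conversion still processing after " + maxChecks + " checks");
    }
}
```

With the 10-second interval mentioned above, a call might look like waitForCompletion(statusCheck, 10_000L, 60), where statusCheck is whatever supplier wraps the status request.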
Here’s an example of the JSON file we’re looking for when the status is complete:
We want to extract the fileId from the “output” node. Now we can get to the last phase: downloading our file.
Step 4: Download
We’ll use the Work Files API again, only now we’ll use our new fileId to put in a GET request with the following URL:
Our Work Files API to download will work via the following method:
We already have the affinityToken from the initial WorkFileUpload call, and the fileId is the one we just received from ContentConvertStatus. The last argument is the name of the file to write out. In this case, it will be a PDF file, so make sure you have the file name set correctly. Downloading a file is nearly an inversion of our original WorkFileUpload request:
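A rough sketch of the download side follows. DownloadSketch, openDownload, and saveTo are hypothetical names; the Work Files URL is a parameter, and appending the fileId to it as a path segment is an assumption that should be checked against the Work Files API documentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.net.ssl.HttpsURLConnection;

// Hypothetical helper for illustration; not the sample project's code.
class DownloadSketch {
    // Prepares (but does not yet send) a GET request for the converted file.
    // Assumption: the file is addressed as <workFileUrl>/<fileId> over https.
    static HttpsURLConnection openDownload(String workFileUrl, String fileId,
                                           String apiKey, String affinityToken) throws IOException {
        HttpsURLConnection conn = (HttpsURLConnection)
                URI.create(workFileUrl + "/" + fileId).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("acs-api-key", apiKey);
        conn.setRequestProperty("Accusoft-Affinity-Token", affinityToken);
        return conn;
    }

    // Streams the response body to outFile, overwriting any existing file,
    // and returns the number of bytes written.
    static long saveTo(InputStream body, Path outFile) throws IOException {
        return Files.copy(body, outFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In use, the caller would pass conn.getInputStream() to saveTo along with the path of the PDF to write.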
And with that – the conversion process is complete.
The Conversion Example
In our sample program, the conversion is called via a command line with two arguments: the file to be uploaded, and the name of the file to write the converted file to. In this case, our PrizmDocConverter class is wrapped up in a .jar called “PrizmDocConverter.jar” that contains our PrizmDocConverter class, and we make sure our execution is linked to our JSON Simple API to read the JSON files:
Replace the json-simple-1.1.1.jar with your version. With this call, simple.html will be converted to simple.html.pdf. Want to try it out for yourself? Get your free API key, then download our sample project and test out the Accusoft PrizmDoc conversion system.
Project Goal:
Create a command line program in C# that can upload an HTML file to the PrizmDoc cloud server, issue a conversion command, and download the converted PDF file upon completion.
Requirements:
- C# development environment (Visual Studio used in this example)
- PrizmDoc Cloud account (sign up for a free account here)
Need to programmatically convert HTML into PDF files? Going from one format to the next can be troublesome, and maintaining the correct layout and styles in the final document is often a complex challenge.
PrizmDoc API handles file type conversions smoothly and with no hassle. No installation, no servers to set up. Just use the API to upload a file and have it converted for you. For this demonstration, we’ll be creating a C# program that just requires the Newtonsoft.JSON package (which can be acquired via NuGet).
Prerequisites
Before we get started, there are a few steps to take. First, register for your own Accusoft PrizmDoc Cloud services account. Your account will come with a free trial.
Once your account is set up, copy the API key. This will be used to authenticate your requests to the PrizmDoc server:
This demo requires the Newtonsoft.Json package. If you’re using Visual Studio, just add this package to your project via the NuGet Package Manager:
The HTML To PDF Conversion Process
Converting a file using PrizmDoc servers is a four-step process:
- Upload: Upload the file to the server. We’re not making this a permanent part of the file portal – just putting it in temporarily while PrizmDoc does the work. When the file is uploaded, PrizmDoc returns a JSON response with internal file ID and an affinity token – we need both for the next step, which is conversion. See the PrizmDoc documentation on the Work Files API for more details ( http://help.accusoft.com/SAAS/pcc-for-acs/work-files.html#get-data ) .
- Convert: Issue the conversion command on the uploaded file. Using the file ID returned from the Upload stage, our software orders PrizmDoc to convert that file to one of the supported file formats listed on the PrizmDoc documentation page. In this demo, we’ll just be converting an HTML file to PDF – but this program can be modified for a number of other file formats. Once a conversion process has started, the Conversion API provides a ProcessID that can be used to check the status of the conversion process. The Content Conversion API documentation provides more details on how the process works ( http://help.accusoft.com/SAAS/pcc-for-acs/webframe.html#content-conversion-service.html ) .
- Status: Until the conversion process is complete, we don’t want to try to download a file that doesn’t exist. The Content Conversion API as listed above also provides the current status of the file conversion with a ProcessID. Once the file conversion is complete, the Content Conversion API provides a new FileID of the converted file.
- Download: Using the same Work File API with our new File ID and affinity token, we can now download the file.
And with that, let’s get coding!
HTML to PDF C# Code Sample
Here’s the C# code you’ll need to convert HTML to PDF using PrizmDoc. You can jump right into the code and run with it, or keep scrolling for a detailed walk through of the code and how it works.
HTML to PDF C# Code Analysis
Step 0: Setting Up Our Class
In the sample project, there are a few housekeeping items to work through. The first is setting up the use of our API key. If you’ve downloaded our sample project, then edit the file config.json and insert your PrizmDoc API key like this:
In our sample project, the actual work is being driven by our PrizmDocConvert class. The first thing we do when generating the class is set the API key:
In our sample program, the API Key is read from our config.json file:
Step 1: Upload
The first part of our process is to Upload our file to the PrizmDoc server. Our member function really just needs one thing – the name of the file to upload, and it will return a JSON.Net JObject with the results:
The Work File API uses the POST command to upload the file, and requires the following headers:
- Acs-api-key: Our API key. (If using OAuth instead, follow the Work File API instructions: http://help.accusoft.com/SAAS/pcc-for-acs/work-files.html#get-data ).
- Content-Type: This should be set to application/octet-stream since we’re uploading bytes.
The method uses the WebClient object to perform our uploading. So we set up the Work File API URI, and set our headers:
Then, read our file into a byte array, and upload it to the PrizmDoc server using a POST command. The Work File API will return a JSON response, which we can then parse for the next steps. In this case, we’ll upload an HTML file:
Step 2: Convert
Our uploaded file is sitting on the PrizmDoc server, all happy and content. But we want more than that – we want to convert it! In the last phase, the Work File API returned a JSON file to us. Let’s peek inside:
If you’re familiar with JSON, then this should be simple enough. In our test application, we can snag the two fields we care about (the fileId and the affinityToken) with our JSON.Net objects. Our sample code uses another function to call our Upload method, but the final result is a JSON object with the Work File API results:
Now it’s time for the Content Conversion API to come into play. This API call requires a JSON file be posted with the details of what file is to be converted.
In this case, we’ll use our new method Convert to utilize that API with three parameters: the affinityToken, the fileID, and the format to convert to (which defaults to pdf):
We already covered how to add the headers to the WebClient object, so we won’t belabor the point. These are the required headers:
- Content-Type: The type of file, in this case “application/json”
- Accusoft-Affinity-Token: The Affinity Token we received from the Work File Upload
- Acs-api-key: The API key
And our new URI will be “https://api.accusoft.com/v2/contentConverters”.
Our Convert request is in the form of a JSON file. We could submit requests for multiple files to be converted, but this demo will just focus on the one. Here’s the format of the JSON request:
Our method generates the JSON file this way.
There’s a lot of whitespace here, but it helps make clear what nodes of the JSON file belong to which parts.
All that’s left is to POST the JSON string:
The conversion process is started! But until it’s complete, we can’t download our completed file. So we need to request the Status until the conversion is complete.
For that, we need to know what the ProcessID is. The Content Conversion API returns a JSON file, of which the most important thing we need is the processID:
Step 3: Status
The Content Conversion API allows us to track the status of a conversion using the ProcessId returned from our Convert request. The URL is the same, with one major difference:
This is very important – this is a GET command, but don’t try to use the WebClient QueryString.Add member to add the processId. Your URI will not look like this:
It must look like this:
For example, a processID of ahidasf7894 would give us a URI of:
With that, let’s look at our ConvertStatus method, which will query the PrizmDoc server via the Content Conversion API. This API will return another JSON file for us to parse:
Our headers in this case are similar to before:
- Accusoft-Affinity-Token
- acs-api-key
That’s it – since this is just a GET, and the Content Conversion API is getting the processId from the URI we request from, we just need to submit the request and return the JSON file:
Here’s a sample JSON file that will be returned:
What we’re looking for is for the “state” field to return “complete”. The sample code uses a 30 second wait timer after each Convert Status request. Once it sees the conversion is complete, it snags the new fileId that will be used to download the file:
And here’s the JSON file that returns the result we need:
Perfect. All that’s left is to extract the fileId:
Note that a Conversion Request can generate multiple files, for example when converting a multipage document into a series of image files. For more details, see the Content Conversion API documentation.
All that’s left now is to download our converted file with the Work File API.
Step 4: Download
We’re back where we started – the Work File API. This time, instead of uploading a file, we’re going to download the processed file. The URI is the same as before, with one change – the fileId of the file we’re downloading:
For our Download Work File method, we set up the method this way. The only new parameter is the outfile – this will be the name of the file we save the converted file to:
We use the same headers as in our Convert Status request:
- Accusoft-Affinity-Token
- acs-api-key
Since this is a GET command, we just submit the request and pipe the results out to our file:
And with that, our program is complete! Here are the results. In this case, the method calls are set to be verbose to track progression, but feel free to edit the program to your liking:
The converted document will retain images, lists, links, and other formatting and elements:
This program sample is just converting a simple HTML file – but if we were converting something like a Microsoft Word document, the results would be the same – a perfectly created PDF file. In fact, PrizmDoc uses Microsoft Word to process the Word Document conversion to have the highest possible fidelity.
Many organizations utilize spreadsheets to track data and perform complex calculations. Since spreadsheets offer substantial flexibility, it’s not uncommon for a single organization to use them in a variety of ways. For instance, one department might use them for budgeting while another deploys them for risk assessment. Although they can handle complex calculations, spreadsheets are relatively easy to set up and don’t require the same programming knowledge as more specialized solutions. That accessibility has led many organizations to simply convert Excel to web applications using API integrations rather than building new functionality from scratch.
Why You Should Convert Excel Files to Web Applications
Consumers often turn to financial and insurance companies looking for simple answers to simple questions:
- How many payments will it take to eliminate my debt?
- Will adding another person to my insurance policy change my rate?
- What will the monthly payments on my loan cost based on different interest rates?
To answer questions like these, someone in the organization typically enters the customer’s data into a premade spreadsheet, applies a few conditions using preset formulas, and shares the result. This process could be significantly streamlined by making these calculations readily available as a web application to anyone who visits the firm’s website, allowing both prospective and current customers to get answers quickly while also freeing up time for employees to work on more high-value tasks.
Unfortunately, building a secure, functional web application takes up valuable development resources. A developer could easily spend weeks converting complex spreadsheet formulas into a fully-functional application that integrates into the website or larger platform seamlessly. Even worse, if anything about those formulas were to change (as is often the case with financial and insurance formulas), more development resources will need to be pulled away from existing projects to make the updates.
By converting Excel files to web applications, firms can avoid these problems and provide clients with ready access to the calculations they need. FinTech and InsurTech developers can accommodate this need by building integrations into their solutions that allow users to easily upload and share spreadsheets entirely within an application and without any Microsoft Excel dependencies. This bypasses the time-consuming build process and makes it much easier to update the formulas as needed.
Keeping Your Spreadsheets Secure
Of course, making spreadsheets readily available as web applications presents a few important security challenges. Many of the calculations running inside an organization’s spreadsheets are proprietary or contain hidden data that needs to remain private for various confidentiality reasons. That’s why companies are hesitant to simply send copies of their internal spreadsheets to customers or vendors. If those Excel files are made accessible online, there’s an obvious risk that someone could download a copy for themselves or access valuable private intellectual property.
Incidentally, this is also the reason why many firms struggle with sharing spreadsheets even in a collaborative environment. Not only are .XLSX files among the most commonly used file extensions by malware, but granting cloud providers or email servers access to spreadsheets represents too great a security risk for companies in heavily regulated industries.
By converting Excel to web applications, however, organizations can maintain strict access and visibility controls over their spreadsheet files. A good spreadsheet viewer integration will allow users to determine what people see when they use the application and also what information they can access. Formulas and calculations that contain vital intellectual property can be hidden completely. Visitors can be restricted to only editing cells that apply to their information, allowing them to use the spreadsheet without breaking or altering its functionality.
Sharing controls can also restrict what can be downloaded locally. A visitor may need to download or print a copy of their calculations, but they don’t need to download a fully functional copy of the spreadsheet file. Even in a collaborative environment, allowing people to download and edit copies of a spreadsheet can introduce significant version confusion. By keeping everything safely within the confines of the organization’s larger web application, essential data remains as secure and up-to-date as possible.
Other Reasons to Convert Excel to Web Applications
The versatility of spreadsheets allows people to adapt them to a variety of uses. In addition to more traditional budgeting and adjustment tasks, they can also be used for things like calculating survey results, analyzing resource usage, or estimating server uptime. Adding customizable calculators that provide quick results to a website experience can provide customers with important information and keep them engaged.
Rather than building a specialized app or plug-in for every one of these calculators, organizations can simply use a spreadsheet viewer integration to quickly create one without any specialized coding or development knowledge. When the integration is set up within their solution, they can even customize it to match their branding and make it look more like a designed application than a simple spreadsheet.
Explore the Potential of Spreadsheet Integration with PrizmDoc Cells
Accusoft’s PrizmDoc Cells was originally designed to help our clients securely view and share XLSX files without any third party dependencies, but it’s increasingly being used to help improve customer experiences across a variety of applications. Get a hands-on experience with this API-driven integration to explore the potential of converting your Excel files to web applications. For a more detailed overview of what you can do with PrizmDoc Cells, sign up for a free trial.
Goal
Create a C# command line program that can read from existing Microsoft .docx (or .doc) documents and convert them to an Adobe PDF file
Requirements
- Microsoft Windows 7 or greater
- Microsoft Visual Studio
- Accusoft ImageGear .NET Developer Toolkit ( download free trial version here )
Programming Skill
Visual C# Intermediate Level
Need to turn Microsoft Word documents into PDFs? That’s easy: Click File > Export > Create PDF/XPS > Publish. Want to do this 1000 times? Nah. The process is laborious if you have more than one document.
So let’s use C# to convert Docx or Doc files to PDF programmatically, so you can convert hundreds of documents in seconds.
Installing The Docx To PDF SDK (ImageGear)
First, we need to install a .NET SDK for handling the heavy lifting of the Word to PDF file conversion. The examples below will be using Microsoft Visual Studio 2017, but you can use previous versions back to Visual Studio 2010.
- After you’ve installed Visual Studio to your liking, head over to the Accusoft ImageGear Developer Toolkit, and download the version for .NET. As we can see, there is support for C and C++ to fit your favorite development platform. Download the .NET installer.
- Now you can run the Accusoft ImageGear Developer Toolkit installer. It will take a little bit as it downloads additional libraries.
- OK – installation is done! Let’s get to coding!
The ImageGear Developer Toolkit will put the files into the Public Documents of the file system, usually located at “C:\Users\Public\Documents\Accusoft”. We’ll be referring to them as we go along.
Set Up Your Project
Once you have the toolkit installed, let’s create a new C# project. For this project we’ll just do a C# command line program so we dive right into the meat of the program, rather than needing to build a GUI with Windows Forms or WPF. But once you have it here, you can import this class into any other .NET project you like.
Just click on File, Project, and from the “Visual C#” list select Console App (.NET Framework):
To keep things simple we’ll name the project “ImageGearConversionDemo.”
Once the project is started in Visual Studio, we can use NuGet to add the reference files we need:
- From within Visual Studio, click on Tools, NuGet Package Manager, then Manage NuGet Packages for Solution.
- Make sure that the Package Source is set to nuget.org:
- Select “Browse”, then input “ImageGear” into the search window. You’ll see different installation options depending on your project. Just to make things easier on us, select “Accusoft.ImageGear.All” to snag everything in one fell swoop. Or you can just specify the ones you need: ImageGear.Core, ImageGear.Evaluation, ImageGear.Formats, ImageGear.Formats.Office, & ImageGear.Formats.PDF. Click the project we want to apply it to, click “Install”, and NuGet will take care of the details.
- We can also see via the “Solution Explorer” window that NuGet automatically added the references we need for the project:
Next we’ll want to make sure that the components that do the document conversion are in place.
- Click on Project, then Properties at the bottom. In the case of our example, that will be ImageGearConversionDemo. Click on Build. Make sure the Platform Target is x64, and the output directory is bin\Debug.
- In the Toolkit install directory, in a standard install, is a folder C:\Users\Public\Documents\Accusoft\ImageGear.NET v23 64-bit\Bin\OfficeCore. Copy the entire folder to your Debug directory.
To make things easier, let’s also set up our project properties for how the program is run. Click the Debug tab. Our final program is going to take two parameters:
- The DOCX file we’re going to convert.
- The PDF file we’re converting our DOCX file to.
You can set command line arguments in Visual Studio by right clicking on your project in the Solution Explorer, and going to Properties > Debug. Put in the two file names we’ll be using. In our case, we will be using TheGreatestDocInTheWorld.docx, and outputting it to TheGreatestDocInTheWorld.pdf . Set those as your arguments, then make sure that the Working directory is our Debug folder, since that’s where our program is built.
If you want to add the ImageGear references to your program manually, you can use the instructions in the Accusoft ImageGear .NET documentation.
With that, now we can get to coding!
C# Sample Code
Here’s our C# code for testing out ImageGear’s Word to PDF conversion capabilities. It works with .docx and .doc files. You can copy/paste this code to get started (you’ll also need a free trial version of ImageGear), or keep scrolling for a walkthrough of the code.
using ImageGear.Core;
using ImageGear.Evaluation;
using ImageGear.Formats;
using ImageGear.Formats.Office;
using ImageGear.Formats.PDF;
using System.IO;
using System;

namespace MyProgram
{
    /*
     * This class acts as a simple method for converting documents from DocX format to PDF format.
     */
    class DocConverter
    {
        // Initialize the license - save time later as the program runs
        public DocConverter()
        {
            // Initialize evaluation license.
            ImGearEvaluationManager.Initialize();
            ImGearEvaluationManager.Mode = ImGearEvaluationMode.Watermark;

            // Initialize common formats
            //Console.WriteLine("Initializing the format.");
            ImGearCommonFormats.Initialize();
        }

        /*
         * SaveDocxAsPDF function: takes 3 arguments.
         *
         * This function has no exception handling, so be sure the files exist and are proper formats
         *
         * fileIn: The docx document to be converted to a pdf.
         * fileOut: The pdf file to be output.
         * verbose: True - write command line statements. False - stay silent
         */
        public void SaveDocxAsPDF(string fileIn, string fileOut, bool verbose = false)
        {
            // Set the filters
            ImGearFileFormats.Filters.Add(ImGearOffice.CreateWordFormat());
            //Console.WriteLine("PDF Document additions.");
            ImGearFileFormats.Filters.Add(ImGearPDF.CreatePDFFormat());
            ImGearPDF.Initialize();

            using (FileStream inStream = new FileStream(fileIn, FileMode.Open, FileAccess.Read))
            {
                using (FileStream outStream = new FileStream(fileOut, FileMode.Create, FileAccess.Write))
                {
                    int startPageNumber = 0;

                    // Load Office document.
                    if (verbose == true)
                        Console.WriteLine("Reading the document " + fileIn);
                    ImGearDocument igDocument = ImGearFileFormats.LoadDocument(inStream);

                    // Save PDF, overwrite the file if it's already there.
                    if (verbose == true)
                        Console.WriteLine("Writing the PDF " + fileOut);
                    ImGearPDFSaveOptions pdfOptions = new ImGearPDFSaveOptions();
                    ImGearFileFormats.SaveDocument(igDocument, outStream, startPageNumber, ImGearSavingModes.OVERWRITE, ImGearSavingFormats.PDF, pdfOptions);
                }
            }

            // Dispose of the PDF component.
            if (verbose == true)
                Console.WriteLine("Terminate the PDF");
            ImGearPDF.Terminate();
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Starting the conversion program.");
            DocConverter docx2pdf = new DocConverter();

            // Check to make sure we have two arguments - file 1 and file 2
            if (args.Length == 2)
            {
                // Make sure that the input file exists
                if (File.Exists(args[0]))
                {
                    docx2pdf.SaveDocxAsPDF(args[0], args[1], true);
                }
                else
                {
                    Console.WriteLine("File does not exist: " + args[0]);
                }
            }
            else
            {
                Console.WriteLine("Not enough arguments.");
                // Echo back whatever arguments were supplied
                for (int i = 0; i < args.Length; i++)
                    Console.WriteLine(args[i]);
            }

            // Requests the user hit enter to end the program
            Console.WriteLine("Conversion complete.");
            Console.WriteLine("Hit Enter to terminate.");
            Console.ReadLine();
        }
    }
}
Understanding The Word To PDF C# Code
The first part of the C# program imports our namespaces. If you used NuGet, you already have all of the references you need, but the program still needs them declared so they can be used. You can find more information on the API calls in the ImageGear User Guide, but here are the namespaces we’ll need:
using ImageGear.Core;
using ImageGear.Evaluation;
using ImageGear.Formats;
using ImageGear.Formats.Office;
using ImageGear.Formats.PDF;
using System.IO;
using System;
ImageGear.Formats.Office and ImageGear.Formats.PDF do the bulk of the work here: the first reads the docx file, and the second writes the pdf file.
To handle the conversion, we’ll create a simple class and call it DocConverter. For this example, we’re going to populate it with just one method, SaveDocxAsPDF:
class DocConverter
{
public void SaveDocxAsPDF(string fileIn, string fileOut, bool verbose = false)
{
...
SaveDocxAsPDF has two required arguments and one optional argument, verbose, which we’ll use in this example to control console output. The console messages are only there so we can trace the program’s steps as it runs; by default, they won’t display.
Before we do anything, we have to initialize the license. We’ll be using an evaluation copy for this demonstration – but if you already have a license, follow the registration steps on the Accusoft ImageGear .NET instruction page ( http://help.accusoft.com/ImageGear-Net/v24.0/Windows/HTML/webframe.html#topic601.html ).
// Initialize evaluation license.
ImGearEvaluationManager.Initialize();
ImGearEvaluationManager.Mode = ImGearEvaluationMode.Watermark;
The next thing to do is to register the ImageGear file format filter, in this case Microsoft Word, by adding ImGearOffice.CreateWordFormat() to ImGearFileFormats.Filters. In another example we’ll show how to expand that to other file formats.
And while we’re at it, we’ll also initialize the ImageGear PDF object. This is an important step: whenever we initialize an ImageGear PDF object, it must be terminated later. Here’s how it looks in our program:
ImGearPDF.Initialize();
//SOME CODE HERE
ImGearPDF.Terminate();
ImGearPDF is not cleaned up automatically the way a typical C# object is, so make sure ImGearPDF.Terminate() is called once the PDF work is finished.
Now – the actual reading of .doc/.docx files and writing of PDF files is pretty simple:
using (FileStream inStream = new FileStream(fileIn, FileMode.Open, FileAccess.Read))
{
if (verbose == true)
Console.WriteLine("Writing the PDF "+fileOut);
using (FileStream outStream = new FileStream(fileOut, FileMode.Create, FileAccess.Write))
{
int startPageNumber = 0;
// Load Office document.
ImGearDocument igDocument = ImGearFileFormats.LoadDocument(inStream);
// Save PDF, overwrite the file if it's already there.
ImGearPDFSaveOptions pdfOptions = new ImGearPDFSaveOptions();
ImGearFileFormats.SaveDocument(igDocument, outStream, startPageNumber, ImGearSavingModes.OVERWRITE, ImGearSavingFormats.PDF, pdfOptions);
}
}
If we follow the code, the process is straightforward. Remember the “verbose” option will turn on and off the console outputs if you want the program to be quieter.
First, we create a file input stream, and a file output stream. The office document is loaded into the variable igDocument. We then set up the pdfOptions that will be used for exporting the file. And finally – write the file. If there is already a PDF file with the same name, we’re going to overwrite it.
Let’s see our C# Docx to Pdf code in action:
If we compare our new PDF to a PDF created using Microsoft Word’s export option, the file created by ImageGear is smaller – 383 KB versus 504 KB. And the PDF file generated with ImageGear has kept all internal links and formatting.
Converting a DOCX to PDF is just scratching the surface of what ImageGear can do. ImageGear supports over 100 file formats for conversion, editing, compression, and more. To find out more, check out the ImageGear overview page.
When it comes to downloading or viewing documents over the internet, PDFs have long served as a de facto standard for most organizations. Since PDFs are not a proprietary file format, there’s rarely any risk that someone will be unable to open them. However, just because PDFs have become so commonplace doesn’t mean that they all share the same characteristics. For anyone who has ever wondered why some PDFs seem to take so much longer to load than others, the answer often has less to do with connection and processing speeds than it does with the way the PDF’s content is organized.
More specifically, it’s a matter of whether or not the document is a linearized PDF.
What Is a Linearized PDF?
Sometimes called “fast web view,” linearization is a special way of saving a PDF file that organizes its internal components to make them easier to read when the file is streamed over a network connection. While a standard, non-linearized PDF stores information associated with each page across the entire file, linearized PDFs use an object tree format to consolidate page elements on an ordered, page-by-page basis. When a reader opens a linearized PDF, all of the information needed to render the first page is readily available, allowing it to load the page quickly without having to search the entire document for a specific object like an embedded font.
Originally introduced with the PDF 1.2 standard in 1996, linearized PDFs were critical to the format’s early internet success. In order to view a non-linearized PDF, the entire document needs to be downloaded or read via HTTP request-response transactions. Given the bandwidth limitations of early internet connections (often still between 28.8 and 33.6 kbps in 1996), this created a serious bottleneck when it came to document viewing. While it was possible to view a document without downloading it, the multiple HTTP requests needed to do so could easily be disrupted if the connection was lost, something that was all too common in the days before reliable broadband connections were introduced.
Non-Linearized vs Linearized PDFs
To visualize the difference between a non-linearized PDF and a linearized PDF, imagine two separate people sitting down to file their business taxes. One person has all of their receipts, invoices, and financial documents scattered across their office, with some stacked in unordered piles, others crammed into unlabeled folders, and even more stuffed into assorted drawers and file cabinets. Finding and organizing all of this documentation would take almost as much time as actually filing the taxes themselves! The second person, however, has all of the records they need stored in a neatly labeled file cabinet, allowing them to retrieve everything quickly and easily.
The first example is similar to a non-linearized PDF, while the second shows how much easier it is for a reader to access the information it needs to render the file. Even better, since each page is organized in the same way, jumping to a different page in a multi-page PDF doesn’t require the reader to reload the entire file. It can simply read the current page and get everything necessary to display the PDF correctly.
Why Linearized PDFs Are Still Valuable
In a world dominated by high-speed internet connections, it’s fair to wonder whether PDF linearization is still necessary. For small PDFs that are only a few pages, linearization may not be essential, but when it comes to larger documents, linearization can still deliver substantial performance and user experience benefits.
Consider, for instance, a document that consists of several hundred, or even several thousand, pages. Loading that entire document and keeping it cached may be possible, but it’s an inefficient use of processing and bandwidth resources. With a linearized PDF, a reader typically encounters a linearization dictionary and hint tables at the top of the document, which provide it with instructions on where to locate any necessary resources within the file. After loading the hint tables and the first page, the reader stops the download process rather than opening the entire file. When the user navigates to another page, the reader can quickly reference the hint tables and jump to that page.
This ensures that the reader is only ever loading the pages that actually need to be displayed, which helps to conserve memory, processing resources, and bandwidth. For mobile devices with limited file and cache storage, linearized PDFs are much easier to manage than their non-linearized counterparts. They also provide some protection against network interruptions, which could make it difficult to download and view an entire document.
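Because a linearized file announces itself near its very start, a quick check is easy to sketch. The PDF specification requires the linearization parameter dictionary, which carries a /Linearized key, to sit entirely within the first 1024 bytes of the file. The Java snippet below is a minimal heuristic built on that rule, not a full PDF parser; the class and method names are our own and are not part of any Accusoft product.

```java
import java.nio.charset.StandardCharsets;

final class LinearizationCheck {
    // The linearization parameter dictionary must fit in the first 1024 bytes.
    private static final int HEADER_WINDOW = 1024;

    /** Heuristic: true if "/Linearized" appears within the first 1024 bytes. */
    static boolean looksLinearized(byte[] pdfBytes) {
        int length = Math.min(pdfBytes.length, HEADER_WINDOW);
        // ISO-8859-1 maps each byte to exactly one char, so binary content
        // in the file cannot cause decoding errors.
        String head = new String(pdfBytes, 0, length, StandardCharsets.ISO_8859_1);
        return head.contains("/Linearized");
    }
}
```

In practice only the first 1024 bytes of the file need to be read before calling looksLinearized, which is exactly why a network viewer can make this decision before downloading anything else.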
How to Linearize PDFs
Although the linearization process is well laid out in the current PDF standards documentation, many PDFs are created using software that doesn’t automatically linearize the content. More importantly, some linearized PDFs are “broken” by a process called incremental saving, which saves minor updates at the end of the file, rather than changing existing structure. Over time, too much incremental saving can undermine the effectiveness of a linearized PDF.
The best way to resolve such problems and linearize the PDF is to save a new, linearized version of the file using PDF editing and conversion tools.
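To decide whether a file needs to be re-saved, one useful signal is the /L entry of the linearization parameter dictionary. Under the PDF specification, /L records the total file length in bytes, and a mismatch with the actual length means the linearization data can no longer be trusted, which is what happens when an incremental save appends data to the end of the file. The Java sketch below assumes the dictionary is plain text in the first 1024 bytes; the class and method names are hypothetical, and this is a heuristic rather than a validator.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinearizedLengthCheck {
    // Matches "/L 12345" but not "/Linearized" or "/Length", because the
    // name must be followed directly by whitespace and then digits.
    private static final Pattern DECLARED_LENGTH = Pattern.compile("/L\\s+(\\d+)");

    /** Returns the /L value found in the first 1024 bytes, or -1 if absent. */
    static long declaredLength(byte[] pdfBytes) {
        int window = Math.min(pdfBytes.length, 1024);
        String head = new String(pdfBytes, 0, window, StandardCharsets.ISO_8859_1);
        if (!head.contains("/Linearized")) {
            return -1; // not linearized at all
        }
        Matcher matcher = DECLARED_LENGTH.matcher(head);
        return matcher.find() ? Long.parseLong(matcher.group(1)) : -1;
    }

    /** True only if the file is linearized and /L equals the real length. */
    static boolean linearizationIntact(byte[] pdfBytes) {
        return declaredLength(pdfBytes) == pdfBytes.length;
    }
}
```

If linearizationIntact returns false for a file that still contains /Linearized, the file has most likely been modified since it was linearized and is a candidate for saving a fresh linearized copy.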
Take Control of PDFs with PrizmDoc
Accusoft’s PrizmDoc provides a broad range of document functionality that allows applications to more effectively create, convert, and compress PDF files.
For a closer look at PrizmDoc and to see its powerful document processing capabilities in action, download a free trial today.
In a recent LegalTech article, Lisa Senger highlighted some of the concerns facing law firms in an age when sharing information is becoming increasingly digital. Collaborating with colleagues and sharing information with opposing counsel can put confidential client information at risk. As the need for increased security rises, developers are under pressure to meet the demands of their clients and ensure the safe and confidential transfer of information over a variety of networks outside of the organization.
To save the money and effort it takes to develop and stay up-to-date on the latest software for security, many developers are looking to third-party providers, like Accusoft, to supply the latest solutions for eDiscovery confidentiality and security. How can Accusoft enhance your current eDiscovery offering?
Redaction
The easiest way to preserve the confidentiality of your end users’ clients is to eliminate the confidential information. With built-in redaction capabilities, Prizm Content Connect makes file sharing and collaboration easy and secure. When text is selected and redacted from a document, the redactions are burned into the new saved version, eliminating the confidential data from the file before it is shared with a third party. With auto-redaction, eDiscovery is less burdensome on law firms, allowing them to find repeated instances of confidential information, like a social security number, and redact them throughout a document automatically.
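The pattern-matching idea behind auto-redaction can be illustrated on plain text. The Java sketch below is our own simplified example and not how Prizm Content Connect is implemented: it assumes social security numbers are written as 123-45-6789 and replaces every match in a string. Real document redaction must also remove the underlying data from the saved file, as described above.

```java
import java.util.regex.Pattern;

final class SsnRedactionSketch {
    // US social security numbers written in the form 123-45-6789.
    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");

    /** Replaces every SSN-shaped match in the text with a fixed marker. */
    static String redact(String text) {
        return SSN.matcher(text).replaceAll("[REDACTED]");
    }
}
```

Because the pattern is applied to the whole text, every repeated instance is caught in one pass, which is the property that makes automatic redaction less error-prone than selecting each occurrence by hand.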
Digital Rights Management
When sharing client information outside of the firm, your clients are worried about that information being stolen or otherwise misused. For instance, after a case is over, opposing counsel may not delete files in a timely manner, putting confidential information at greater risk. With DRM, you can control who sees which files, and for how long. Our software allows you to assign permissions to view, print and download files and documents. After a case is over, you can simply revoke permission to access the document to ensure users cannot view confidential client information longer than necessary.
These are just two ways Accusoft provides security and confidentiality solutions for you and your clients. Our Prizm Content Connect suite of products can be tailored to suit your unique needs in catering to your eDiscovery and legal clients. You can view code samples here.
To see PCC in action, download your free 30-day trial today.
Despite the explosive growth of big data and sophisticated analytics platforms, a 2019 study by Deloitte found that 67 percent of business leaders are not quite comfortable using them to inform decision making. For many organizations, spreadsheets remain the preferred tool for managing data and evaluating trends. Developers looking to build the next generation of business applications can accommodate those tendencies by integrating native spreadsheet support for Microsoft Excel workbooks.
Excel Worksheets vs Excel Workbooks
Although sometimes referred to interchangeably or described broadly as spreadsheets, there is a key distinction between an Excel worksheet and an Excel workbook. A worksheet consists of only one spreadsheet while a workbook contains multiple different spreadsheets separated by tabs.
The difference may not be very important when viewing or sharing XLSX files natively in Microsoft Excel, but it can create serious challenges when rendering those files in another application. Without some way of accurately rendering dynamic spreadsheet data, viewers are often forced to resort to a static print preview image. This process makes the file viewable, but also leaves it “flattened” because all interactive elements are removed from the spreadsheet cells.
If the workbook contains worksheets with linked data (that is, cell data from one sheet is affected by cell data from another sheet), it’s critical that a viewing solution preserves the dynamic aspects of the file. The advantage of a spreadsheet is that it can serve as a working document. Without the ability to interact with it, users might as well simply copy and paste the data into a text document.
Managing Excel Workbooks with PrizmDoc Cells
PrizmDoc Cells provides several options for managing Excel workbooks, making it easy to transition back and forth between XLSX format and web browser viewing. Once a proxy route is set up within the application to send API calls to the PrizmDoc Cells server, three different commands can be used to manage Excel workbooks.
Upload Workbook
This API call adds a new XLSX file for viewing and editing. When a document is uploaded to the system, the server assigns a unique workbook ID to it so it can be found and rendered in the application’s viewer in the future. After uploading a workbook, a new session can be created using the workbook ID for viewing and editing purposes.
Download Workbook
When PrizmDoc Cells displays a spreadsheet, it renders the XLSX file itself, but it doesn’t make any alterations to that file. As each session makes edits to the workbook, those changes are associated with the document ID rather than the original XLSX file, which preserves the integrity of the original spreadsheet. At some point, however, those edits may need to be saved into a new Excel workbook.
The download API call converts the current session document so it can be downloaded as an XLSX file. File availability can be set during the download process to control who will have access to the new workbook.
Delete Workbook
Old versions of workbooks often need to be deleted for security reasons, usually because they contain confidential data. Since the original XLSX file remains safely within application storage, there often isn’t much sense in retaining workbook IDs that aren’t being used. The delete API call removes a workbook ID from the server. Once removed in this way, the workbook cannot be viewed, edited, or downloaded by PrizmDoc Cells.
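The three commands above describe a simple ID-based lifecycle, which can be modeled in a few lines of Java. The sketch below is a toy in-memory stand-in that we wrote for illustration only. It does not call the PrizmDoc Cells server, and the class and method names are hypothetical; consult the PrizmDoc Cells API documentation for the actual endpoints and payloads.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class WorkbookStoreSketch {
    private final Map<String, byte[]> workbooks = new ConcurrentHashMap<>();

    /** Upload: store a copy of the XLSX bytes and hand back a unique workbook ID. */
    String upload(byte[] xlsxBytes) {
        String workbookId = UUID.randomUUID().toString();
        workbooks.put(workbookId, xlsxBytes.clone()); // caller's original stays untouched
        return workbookId;
    }

    /** Download: return the stored workbook as bytes, or fail if the ID is unknown. */
    byte[] download(String workbookId) {
        byte[] stored = workbooks.get(workbookId);
        if (stored == null) {
            throw new IllegalArgumentException("Unknown workbook ID: " + workbookId);
        }
        return stored.clone();
    }

    /** Delete: remove the ID so the workbook can no longer be viewed or downloaded. */
    boolean delete(String workbookId) {
        return workbooks.remove(workbookId) != null;
    }
}
```

The point of the model is the ordering constraint: a workbook ID only exists after an upload, and once it is deleted any later download attempt for that ID fails.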
Preserving Workbook Functionality
Since PrizmDoc Cells natively renders information contained in an XLSX file, it retains the dynamic elements that make spreadsheet workbooks so useful to organizations. Not only does it preserve proprietary business logic and formulas, but it also maintains the integrity of this information across multiple worksheets. Cell content can still be searched to quickly locate important text or data throughout the workbook.
For situations where proprietary formulas need to be protected, PrizmDoc Cells allows users to upload XLSX workbooks as values-only files, with all spreadsheet formulas removed. Also, any cells locked in an uploaded XLSX file will remain locked in PrizmDoc Cells to preserve workbook security.
True Spreadsheet Workbook Support for Your Applications
Many organizations continue to depend upon spreadsheet workbooks to manage their business. By providing feature-rich workbook support within their applications, developers can help them retain control over their proprietary spreadsheet formulas without sacrificing the functionality they expect from Excel.
PrizmDoc Cells makes it easier than ever to share spreadsheet workbooks without having to rely upon Microsoft Excel dependencies. Shared XLSX files can remain safely within a secure application environment to prevent unauthorized downloads or troublesome version confusion. Get a first-hand look at how PrizmDoc Cells can enhance your application in our extensive online demo.
Document viewing capabilities are no longer a specialized feature that requires dedicated applications. Thanks to powerful software integrations, developers can now build PDF viewing into their solutions to create a better user experience and streamline workflows. The growing popularity of mobile devices, however, has posed a few challenges to development teams accustomed to building an exclusively desktop experience, especially when it comes to JavaScript PDF viewers. That’s why one of Accusoft’s key development goals has focused on making a JS PDF viewer responsive to mobile screens.
The Increasingly Mobile Internet
Since 2017, mobile devices have accounted for about half of global internet traffic. This trend has been fuelled primarily by a combination of improved cellular network coverage and the ever-increasing processing capabilities of the average mobile device. It’s hardly a surprise, considering that the latest smartphones are often the most powerful computing device people own. Even for consumers who own desktops or laptops as well, mobile devices make it easy to access internet services on the go, allowing them to manage finances, collaborate on work tasks, or utilize eLearning resources (or watch cat videos).
Today’s customers expect organizations to provide applications that deliver a consistent experience across all devices, regardless of screen size. The era of designing software exclusively for desktop computers and treating mobile support as an afterthought is long gone. If an application’s mobile experience doesn’t at least match that of the competition, customers will quickly make a change.
Viewing Challenges on Mobile Devices
Mobile devices can present a few challenges for application developers, especially when it comes to viewing documents like PDFs. While there are many PDF reader apps available for mobile platforms, they typically require users to download a file to local storage or to a cloud service in order to open a document. In addition to being inconvenient, this often leads to some presentation problems because the reader may not render the PDF exactly as the creator intended, especially if it’s not linearized.
Developers could, of course, rely upon the mobile browser to display documents, but this also introduces problems. As with external reader apps, the browser viewer may not render the document as intended, which creates an uneven user experience across multiple platforms. More importantly, the browser’s interface may lack key controls that enhance the viewing experience on mobile devices, especially if the viewer is little more than a basic PDF.js library.
PDF.js and Mobile PDF Viewing
The open-source PDF.js library was originally designed for Mozilla’s Firefox browser, but it has become the basis for a broad range of PDF viewers due to its flexibility. That’s partly why the Accusoft PDF Viewer uses PDF.js as its foundation. However, one area where that versatility is sorely lacking is mobile support.
More specifically, PDF.js doesn’t supply a UI that is responsive for different screen types. It was designed to render PDFs to a conventional computer display and provides the expected tools needed to navigate a document using a keyboard and mouse interface. Even if developers were to incorporate the PDF.js library into their application, they would still need to build a new user interface for mobile devices. Otherwise, key mobile viewing features like touch scrolling and pinch to zoom would be handled not by the viewer, but by the device’s touchscreen interface.
While this might sound like a small distinction, it can actually create serious problems when it comes to rendering the document at different zoom levels. Essential features like text search may also be rendered useless by the poor interface, and the lack of thumbnail previews could make navigating the document tedious.
Making a JS PDF Viewer Responsive
Today’s developers need viewing integrations that offer out-of-the-box mobile support to deliver a consistent viewing experience. That’s why we built upon the foundation of PDF.js to create a responsive viewer interface that instantly adapts to any screen size. Easily integrated into any web-based application, the Accusoft PDF Viewer immediately determines what type of device is being used when a document is opened. If it’s a mobile device, the viewer replaces the controls used for desktop viewing with dedicated mobile controls designed for a touchscreen.
Key touch features like pinch-to-zoom allow users to interact with PDFs on mobile and tablet devices just as easily as they could with a mouse and computer screen. That usability is the key component of making a JS PDF viewer responsive. Mobile screens should never be treated like conventional screens. By integrating a mobile-ready viewer into their web application, developers can ensure viewing consistency across platforms while also allowing people to access documents where they want and when they want them.
Integrate Responsive PDF Viewing in a Snap
Building an application that includes a JS PDF viewer responsive to mobile screens is easier than ever thanks to Accusoft PDF Viewer. As a flexible JavaScript PDF library, it integrates quickly into any web-based application with just a few lines of code and no complicated server configurations. Our industry-leading expertise with imaging technology has allowed us to make substantial improvements to the way PDF.js renders PDF documents and ensure high levels of resolution regardless of zoom level or screen DPI.
To find out what Accusoft PDF Viewer can do for your application, download the Standard Version today at no cost and test its powerful viewing features in your development environment. With only a few lines of code, it’s the fastest way to add responsive PDF viewing to your web-based software solutions.
For expanded features like annotation markup tools, eSignature capabilities, UI customization, and white labeling, consider upgrading to Accusoft PDF Viewer Professional Version. Download our fact sheet for a detailed breakdown of available features.