Read PDF stamp text in C#?

I have a requirement to read certain text contained within PDF files. I created a simple app in C# using the PDFBox library, and it duly reads all of the text in the document and outputs it.

Now some of the documents contain a stamp, which I assume is like a watermark. In the examples I have, each stamp is a table of info about what is on the PDF page, so it is made up of text. The app I have created doesn't return the text contained within the stamp.

How can I read the stamp info? The PDFBox examples I can find show how to create an overlay, but not how to read the text contained within one.

Matt
 
I've created a basic PDF with a rectangle, a stamp and a text box on it. Ideally I need to identify the different objects on the page (can this be done?), strip the text from the PDF file, then loop through the various objects and strip any text contained within them.
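As a first check on the "identify the objects" part, here is a rough sketch in plain Java (no PDFBox) that scans the raw file for annotation subtype names. It rests on two assumptions I haven't confirmed for my files: that the stamp and text box are stored as annotations (subtypes such as `Stamp` or `FreeText`) rather than as page content, and that the annotation dictionaries are not packed into compressed object streams, in which case this scan would find nothing. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a crude diagnostic, not a PDF parser.
class AnnotationScan {
    // Matches "/Subtype /Stamp", "/Subtype/FreeText", etc. in raw PDF syntax.
    // Font and image subtypes (Type1, Image, Form, ...) are deliberately not listed.
    private static final Pattern SUBTYPE =
        Pattern.compile("/Subtype\\s*/(Stamp|FreeText|Watermark|Widget)\\b");

    /** Counts how often each listed annotation subtype appears in the raw text. */
    static Map<String, Integer> countSubtypes(String rawPdf) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = SUBTYPE.matcher(rawPdf);
        while (m.find()) {
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // ISO-8859-1 maps each byte to one char, so binary sections decode without errors.
        String raw = new String(Files.readAllBytes(Paths.get(args[0])),
                                StandardCharsets.ISO_8859_1);
        System.out.println(countSubtypes(raw));
    }
}
```

If this reports `Stamp` or `FreeText` entries for the test file, that would suggest the missing text lives in annotations, which would explain why stripping the page text alone doesn't return it.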

I'll send you a copy if you want to take a look.


Matt
 
Hi Pho,

Thanks, yes, I looked at that, but it's a paid-for licence; I was looking for a free library to try a few things out.

We have PDFs and I can extract the text from these quite easily with PDFBox. However, some of the PDF files have other objects on them, like stamps, watermarks, overlays and text boxes, and I can't seem to access these or read the text from them, and I don't know why not. That's the gist of the question.

Thanks for the links though.

Matt
 
Pho,

Thanks for that, I'll take another look.

Dup,

Opening the PDF, I can right-click the text box and it says it is a text box, but I cannot read the text from it with the reader. It's not an AcroForm either, as that shows as null.


Matt
 