CASE HISTORY: A COMPLETE GUIDE

Thousands of Construction Geological Survey Data Reports (PDF) Converted to Excel

About The Client

Headquartered in Subiaco, Western Australia, the client is a leading provider of earthworks, road and railway construction, and installation services across domestic and international markets. With a highly skilled workforce and extensive experience, the company consistently delivers projects on time while maintaining a 100% success rate.

About The Project

The client provided a collection of construction geological survey reports in PDF format, totaling over 10,500 pages.

The collection comprised three types of PDFs containing interrelated data, which required careful attention to ensure accurate linkage and consistency across all datasets.

The three PDF formats are as follows:

  • Drilling Record
  • Standard Penetration Test
  • Drilling with Standard Penetration Test
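Because the three report types describe related data, one way to check linkage is to confirm that each identifier found in one dataset also appears in the others. The following is a minimal sketch under stated assumptions: it supposes the records are linked by borehole number, and the field name `borehole_number`, the function `find_unlinked`, and the sample values are hypothetical illustrations rather than details from the client's data.

```python
def find_unlinked(drilling_records, spt_records):
    """Return borehole numbers that appear in only one of the two datasets."""
    drilling_ids = {row["borehole_number"] for row in drilling_records}
    spt_ids = {row["borehole_number"] for row in spt_records}
    return {
        # Present in the drilling records but absent from the SPT records
        "missing_spt": sorted(drilling_ids - spt_ids),
        # Present in the SPT records but absent from the drilling records
        "missing_drilling": sorted(spt_ids - drilling_ids),
    }

# Hypothetical sample rows (assumed linking key: borehole number)
drilling = [{"borehole_number": "BH-01"}, {"borehole_number": "BH-02"}]
spt = [{"borehole_number": "BH-02"}, {"borehole_number": "BH-03"}]
report = find_unlinked(drilling, spt)
```

Any identifier listed in the report would be a candidate for re-checking against the source PDF before delivery.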

Most of the data was contained in scanned documents, including handwritten numerical tables and comments. The volume of data to be extracted was substantial, encompassing key information such as:

  • Drilling Date
  • Location ID
  • Rig Number
  • Drilling Method
  • Borehole Number
  • Rig Geologist Name
  • Drilling Rig Operator Name
  • Signature
  • SPT Blow Count
  • Sampler Tip Depth
  • Sample Depth Comparison
  • Borehole Log
  • Sampling and Testing
  • Drilling Tool
  • MWD Parameters
  • Flushing Medium and many more.

This project primarily required precise manual data entry to transfer and organize the extracted information into structured Excel spreadsheets for further analysis and use. The accuracy and consistency of this data were critical for downstream engineering and geotechnical analysis.

About The Project’s Guideline

No specific guidelines were provided by the client regarding the data extraction process. However, the client emphasized that accuracy and correctness must be strictly maintained while extracting data from the PDFs.

About The Challenges We Faced

This project encountered numerous challenges from start to finish, which are summarized below:

  • The PDFs provided by the client contained inconsistent and low-quality data.
  • Handwritten information made data extraction more difficult.
  • The client suggested using AI to speed up the workflow.
  • Due to poor data quality, AI couldn't produce accurate results.
  • As initially anticipated, most of the work had to be done manually.

As mentioned in the “About The Project” section, three types of PDFs were provided, each with distinct handwriting styles. This variation posed one of the greatest challenges for the team.

To streamline data extraction, we initially created an Excel template tailored for a specific PDF format. During the trial run phase, we planned to deliver data to the client using our own template. However, the client requested that the data be provided in their predefined template, with each PDF’s data provided as a separate Excel file and each page’s data placed in a separate sheet. Upon reviewing the client’s template, we realized that entering the data directly into it would be considerably more time-consuming and complex, adding another layer of difficulty to the project.

How DataPlusValue Executed The Project

Here is a detailed step-by-step explanation of how our experts worked on the project to overcome every challenge.

Step 1: Analysis of Handwritten Patterns

  • Our technical team began by identifying handwritten numerical patterns, focusing on commonly confused digits such as 0 and 9.
  • We also carried out an expanded analysis of handwritten alphabetic characters to accurately interpret entries that contained text as well as numbers.

Step 2: Create Visual Reference Guides

  • The next step was to create Word-based reference documents by inserting images of handwritten numbers from the PDFs and labeling them with the correct values.
  • Our team used a similar approach for alphabetical characters, but poor legibility in certain handwriting limited its effectiveness.
  • Text that remained unclear was highlighted in red so that it could be confirmed with the client before proceeding.

Step 3: Addressing Handwriting Variability

  • The pattern-recognition approach proved effective wherever the handwriting remained consistent across the files.
  • Where the handwriting varied, the team reassessed and updated the pattern interpretations before starting data extraction.
  • Once the interpretations were updated, our experts accurately extracted the data and entered it into the Excel file without compromising workflow efficiency.

Step 4: Managing Inconsistent and Repetitive Data

  • To manage inconsistent and repetitive data, we identified the data points that occurred frequently, such as the drilling rig operator’s name, the rig geologist’s name, and the rig number.
  • We also checked the documents for inconsistencies in numbering and spelling to maintain accuracy.
  • Lastly, we created a comprehensive Excel reference sheet at the start of the project for double-checking low-quality or unclear entries.
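The reference-sheet check described above can be illustrated in code. This is a minimal sketch, not the team's actual procedure (the work was done manually): it assumes a list of known values for a repeated field and uses Python's standard `difflib` module to suggest the closest known spelling. The function `check_against_reference`, the `cutoff` of 0.8, and the sample names are hypothetical.

```python
import difflib

def check_against_reference(value, reference, cutoff=0.8):
    """Return (suggested_value, needs_review) for one transcribed entry."""
    cleaned = " ".join(value.split())          # collapse stray whitespace
    lookup = {name.lower(): name for name in reference}
    if cleaned.lower() in lookup:              # exact match, ignoring case
        return lookup[cleaned.lower()], False
    close = difflib.get_close_matches(cleaned.lower(), list(lookup), n=1, cutoff=cutoff)
    if close:                                  # near match: suggest it, but flag for a human check
        return lookup[close[0]], True
    return cleaned, True                       # unknown value: keep as entered and flag

# Hypothetical reference list of rig geologist names
geologists = ["J. Smith", "A. Nguyen"]
exact = check_against_reference("j.  smith", geologists)
near = check_against_reference("J. Smyth", geologists)
unknown = check_against_reference("R. Patel", geologists)
```

Near matches are flagged rather than silently corrected, so a reviewer still decides whether the suggestion is a genuine spelling variant or a different person.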

Step 5: Data Entry in a Customized Template

  • We added extra fields, such as page numbers and PDF names, to our own template to make it easier to transfer the data to the client's template later.
  • Our final step was to enter all the extracted data into a customized template designed specifically for the client’s project.
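To show why tagging each row with its PDF name and page number eases the later transfer, here is a minimal sketch that regroups tagged rows into the layout the client requested (one Excel file per PDF, one sheet per page). The field names `pdf_name` and `page`, the function `plan_workbooks`, and the sample rows are hypothetical; writing actual Excel files would need a spreadsheet library and is not shown.

```python
from collections import defaultdict

def plan_workbooks(rows):
    """Group tagged rows into {pdf_name: {page: [row, ...]}}."""
    workbooks = defaultdict(lambda: defaultdict(list))
    for row in rows:
        workbooks[row["pdf_name"]][row["page"]].append(row)
    # Convert to plain dicts, with each PDF's pages in ascending order
    return {
        pdf: {page: pages[page] for page in sorted(pages)}
        for pdf, pages in workbooks.items()
    }

# Hypothetical rows as they might sit in a flat working template
rows = [
    {"pdf_name": "report_a.pdf", "page": 2, "field": "Rig Number", "value": "R-7"},
    {"pdf_name": "report_a.pdf", "page": 1, "field": "Borehole Number", "value": "BH-01"},
    {"pdf_name": "report_b.pdf", "page": 1, "field": "Drilling Method", "value": "Rotary"},
]
layout = plan_workbooks(rows)
```

Each top-level key corresponds to one output file and each inner key to one sheet, so the flat working template can be split mechanically once data entry is complete.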

Introducing these additional steps was time-consuming, but it paid off: it improved overall efficiency, accuracy, and organization, and led to a smooth finish.

What Were The Results?

Over roughly 5 to 6 months, our team successfully extracted approximately 189,000 data points from the PDF documents and organized them into around 2,355 Excel sheets with multiple structures.

The client achieved a measurable reduction of 60% in infrastructure costs by outsourcing this project to DataPlusValue. Serving as a back-office team for the client, we managed the entire PDF to Excel conversion project with consistency, efficiency, and accuracy. Here is a quick review of how we made it possible.

Our Approach and Execution:

  • Leveraged strong industry experience, meticulous planning, and technical expertise.
  • Implemented multi-level data validation checks, ensuring accuracy throughout the process.
  • Maintained rigorous quality control standards to ensure reliable and consistent results.
  • Delivered the entire project with a firm commitment to precision, efficiency, and adherence to timelines.
  • Provided regular progress updates and communicated transparently with the client at every stage.
  • Demonstrated flexibility by adapting to evolving client requirements while working on the project.

What Was The Client Feedback?

DataPlusValue received highly positive feedback from the client on completion of the project. The client appreciated our team's attention, consistency, and accuracy in handling every detail. We were also acknowledged for our regular progress updates and proactive communication, and the client described the DataPlusValue team as a professional and trustworthy extension of their internal team.

DataPlusValue: A One-Stop Solution for Data Processing

At DataPlusValue, we offer comprehensive data entry and data extraction services that help mining and construction companies produce daily operational reports. We serve areas including iron, gold, copper, and nickel mining, concrete restoration, general contracting, and joint sealants, as well as other construction-related service firms. Our professional team digitizes large-volume, complex reports and provides scalable, timely, and accurate data processing solutions tailored to the operational needs of these organizations.

In addition to data digitization, we assist clients with data formatting, audit-ready documentation preparation, template standardization, and long-term data management services, ensuring that the information clients provide is well organized, easy to access, and ready for reporting or analysis.

Do you have computerized PDFs, scanned reports, handwritten documents, or files in any other format that need to be converted into a digital format? Take advantage of our free trial by emailing us your sample files using the contact details provided. This lets you evaluate the efficiency, reliability, quality, and accuracy of our services before handing over the entire project.

If you are looking for a company that offers reliable and timely geological survey report data entry services, contact us.

HIGHLIGHTS

Client Location
Australia

Industry
Construction Contracting

Business Model
Dedicated Team
