Data Analyst with Programming Experience (Hybrid Schedule)

20 May 2024

Vacancy expired!

An SQL person with some programming skills would be ideal. (This description was written by the person currently doing the job.)

Job Duties/Description
  • Create and maintain programs to normalize index data so that it may be analyzed in a consistent way
  • Create and maintain programs to read split documents and parse data for matching
  • Match up all documents and indexes with identical book/volume/page index criteria:
    • Documents that match 1:1 to an index are linked together automatically.
    • Documents that are not matched to an index are manually reviewed:
      • If the document is found in the database under a mis-keyed index, it is manually assigned to that index.
      • If the document was split incorrectly, it is corrected and reanalyzed.
      • If the document is not found in the database, it is set aside and provided to the client or software vendor as “Documents Not Loaded”.
    • Indexes that do not receive a matched document image are manually reviewed:
      • If the document image is found for the unmatched index, the document is manually assigned to that index.
      • If the document was split incorrectly, it is corrected and reanalyzed.
      • If the document image is not found, the index is set aside and provided to the client or software vendor as “Indexes Not Matched”.
    • After all matches are made, the duplicate list is processed:
      • If there are duplicate document images and duplicate indexes for the same book/volume/page, each document is reviewed and assigned to the correct corresponding index.
      • If there are duplicate documents and a single index, the correct document is assigned to the index, and the remaining documents are set aside as “Documents Not Loaded”.
      • If there are duplicate indexes and a single document, and the indexes are duplicates of each other, the document is manually assigned to all of the matching indexes.
      • If there are duplicate indexes and a single document, yet the index data differs, the document is assigned to the correct index, and the remaining index is set aside as “Indexes Not Matched”.
    • Once all verification has been completed and no unaddressed errors remain, document images are copied to a separate folder for delivery.
    • Spreadsheets are created for matched data, images without indexing data, and indexing data without images.
  • Create and maintain programs for specific projects to complete analysis more accurately and efficiently than could be done by hand. This includes, but is not limited to, comparing data to available images, comparing two sets of data, and identifying gaps in data.
  • Create and maintain programs to merge images into a single document and perform OCR (Optical Character Recognition) using Tesseract.
  • Skills:
    • Programming: mostly Python, some C# (these are the languages I use for scripting and programming; other languages could be used)
      • Parsing and building XML files
      • File/image conversion
      • PDF/A conversion
      • Optical Character Recognition (OCR)
      • Managing and cleaning data
      • Scripting/automation
    • Database: Microsoft SQL Server, PostgreSQL
      • Reporting
      • Managing data
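
The matching workflow under Job Duties (normalize index data, group documents and indexes by book/volume/page, link 1:1 matches automatically, and route everything else to review) can be sketched in Python, the language the posting says is used most. This is a minimal illustration under stated assumptions, not the current team's code: the record layout (dicts with hypothetical "id", "book", "volume", and "page" keys), the normalization rules (trim, upper-case, drop leading zeros from numeric parts), and the helper names normalize_key, match_documents, and write_report are all made up for the example. The final “Documents Not Loaded” and “Indexes Not Matched” labels are only assigned after manual review, so the sketch stops at producing review queues.

```python
import csv
from collections import defaultdict


def normalize_key(book, volume, page):
    """Return a comparable (book, volume, page) key.

    Assumed normalization rules: trim whitespace, upper-case letters, and drop
    leading zeros from purely numeric parts, so " 0012 " and "12" compare equal.
    """
    parts = []
    for value in (book, volume, page):
        text = str(value).strip().upper()
        if text.isdigit():
            text = str(int(text))  # "0012" -> "12"
        parts.append(text)
    return tuple(parts)


def match_documents(documents, indexes):
    """Group record ids by normalized key and sort the keys into review buckets.

    documents and indexes are lists of dicts with hypothetical "id", "book",
    "volume", and "page" keys.
    """
    docs_by_key = defaultdict(list)
    idx_by_key = defaultdict(list)
    for doc in documents:
        docs_by_key[normalize_key(doc["book"], doc["volume"], doc["page"])].append(doc["id"])
    for idx in indexes:
        idx_by_key[normalize_key(idx["book"], idx["volume"], idx["page"])].append(idx["id"])

    result = {"matched": [], "unmatched_documents": [], "unmatched_indexes": [], "duplicates": []}
    for key in sorted(set(docs_by_key) | set(idx_by_key)):
        doc_ids = docs_by_key.get(key, [])
        idx_ids = idx_by_key.get(key, [])
        if len(doc_ids) == 1 and len(idx_ids) == 1:
            # Exactly one document and one index share the key: link automatically.
            result["matched"].append((doc_ids[0], idx_ids[0]))
        elif not idx_ids:
            # No index for this key: queue the documents for manual review.
            result["unmatched_documents"].extend(doc_ids)
        elif not doc_ids:
            # No document image for this key: queue the indexes for manual review.
            result["unmatched_indexes"].extend(idx_ids)
        else:
            # Several documents and/or several indexes share the key: duplicate list.
            result["duplicates"].append((key, doc_ids, idx_ids))
    return result


def write_report(path, header, rows):
    """Write one delivery spreadsheet as CSV (an assumed format that Excel can open)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    documents = [
        {"id": "doc1", "book": "0012", "volume": "A", "page": "5"},
        {"id": "doc2", "book": "12", "volume": "a", "page": "6"},
        {"id": "doc3", "book": "12", "volume": "A", "page": "7"},
        {"id": "doc4", "book": "12", "volume": "A", "page": "7"},
    ]
    indexes = [
        {"id": "idx1", "book": "12", "volume": "A", "page": "005"},
        {"id": "idx2", "book": "12", "volume": "A", "page": "7"},
        {"id": "idx3", "book": "12", "volume": "A", "page": "9"},
    ]
    print(match_documents(documents, indexes))
```

With the sample data, doc1 and idx1 link despite the different zero-padding, doc2 and idx3 land in the review queues, and doc3/doc4/idx2 go to the duplicate list. Each bucket could then be passed to write_report, for example write_report("matched.csv", ["document_id", "index_id"], result["matched"]), to produce the three delivery spreadsheets.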
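
Two of the project tasks listed under Job Duties, comparing two sets of data and identifying gaps in data, reduce to set operations on record keys. The sketch below is a hedged illustration: the function names compare_datasets and find_page_gaps are hypothetical, keys are assumed to be (book, volume, page) tuples with integer pages, and "gap" is assumed to mean a page number missing between the lowest and highest page seen for a book/volume.

```python
from collections import defaultdict


def compare_datasets(left_keys, right_keys):
    """Compare two collections of record keys (e.g. index keys vs. image keys)."""
    left, right = set(left_keys), set(right_keys)
    return {
        "only_left": sorted(left - right),
        "only_right": sorted(right - left),
        "both": sorted(left & right),
    }


def find_page_gaps(keys):
    """Return missing page numbers for each (book, volume).

    keys: iterable of (book, volume, page) tuples with integer pages.
    Assumption: pages are expected to be contiguous between the lowest and
    highest page seen for a given book/volume.
    """
    pages = defaultdict(set)
    for book, volume, page in keys:
        pages[(book, volume)].add(page)
    gaps = {}
    for group in sorted(pages):
        seen = pages[group]
        missing = sorted(set(range(min(seen), max(seen) + 1)) - seen)
        if missing:
            gaps[group] = missing
    return gaps


if __name__ == "__main__":
    index_keys = [("12", "A", 1), ("12", "A", 2), ("12", "A", 5), ("13", "B", 1), ("13", "B", 3)]
    image_keys = [("12", "A", 1), ("12", "A", 5), ("13", "B", 3), ("13", "B", 4)]
    print(compare_datasets(index_keys, image_keys))
    print(find_page_gaps(index_keys))
```

Comparing index keys against image keys in this way gives the "data compared to available images" view, while find_page_gaps flags page ranges that may be missing from both.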
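
For the "Parsing and building XML files" skill, Python's standard-library xml.etree.ElementTree module is one common option. The XML layout below (an <indexes> root with <index id="..."> elements holding <book>, <volume>, and <page>) and the output format (<matches> with <match document="..." index="..."/> elements) are hypothetical; real index files from a client or software vendor will have their own schema.

```python
import xml.etree.ElementTree as ET


def parse_index_xml(xml_text):
    """Read index records from an XML string in the hypothetical layout below."""
    root = ET.fromstring(xml_text)
    records = []
    for node in root.iter("index"):
        records.append({
            "id": node.get("id"),
            # findtext returns None if the child is missing, so fall back to "".
            "book": (node.findtext("book") or "").strip(),
            "volume": (node.findtext("volume") or "").strip(),
            "page": (node.findtext("page") or "").strip(),
        })
    return records


def build_match_xml(pairs):
    """Serialize (document_id, index_id) pairs as a <matches> XML string."""
    root = ET.Element("matches")
    for document_id, index_id in pairs:
        ET.SubElement(root, "match", {"document": document_id, "index": index_id})
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<indexes>
  <index id="idx1"><book>12</book><volume>A</volume><page>5</page></index>
  <index id="idx2"><book>12</book><volume>A</volume><page> 7 </page></index>
</indexes>"""

if __name__ == "__main__":
    print(parse_index_xml(SAMPLE))
    print(build_match_xml([("doc1", "idx1")]))
```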
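
One possible way to merge page images into a single document and run OCR with Tesseract is to drive the tesseract command-line tool from Python. This sketch assumes a Tesseract version whose CLI accepts a text file listing one image path per line as input and, with the "pdf" config, writes a single searchable <outputbase>.pdf; check this against the installed version. The helper names build_ocr_command and merge_and_ocr are hypothetical, and the posting does not say which approach the current programs actually use.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_ocr_command(image_paths, output_base, list_dir):
    """Write an image list file and return the tesseract command that reads it.

    Assumption: tesseract treats a .txt input as a list of image paths and,
    with the "pdf" config, writes <output_base>.pdf with one page per image.
    """
    list_file = Path(list_dir) / "pages.txt"
    list_file.write_text("\n".join(str(p) for p in image_paths) + "\n", encoding="utf-8")
    return ["tesseract", str(list_file), str(output_base), "pdf"]


def merge_and_ocr(image_paths, output_base):
    """Run tesseract on the page images and return the expected PDF path."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        command = build_ocr_command(image_paths, output_base, tmp)
        # check=True raises CalledProcessError if tesseract exits with an error.
        subprocess.run(command, check=True)
    return Path(str(output_base) + ".pdf")
```

A call such as merge_and_ocr(["page1.tif", "page2.tif"], "out/book12_volA_page5") would produce out/book12_volA_page5.pdf under these assumptions. Keeping command construction separate from execution makes it easy to inspect or log the exact command before running it.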

  • ID: #41236120
  • State: Texas (ZIP 75235, USA)
  • City: Dallas / Fort Worth
  • Salary: $55 - $60
  • Job type: Contract
  • Posted: 2022-05-20
  • Deadline: 2022-07-18
  • Category: Et cetera