Innovations in Web Data Delivery and Advanced Pattern Recognition to Ensure Consistency of Large-Scale Webdata Extractions

Connotate, the enterprise-grade datapipe for Web-sourced information, or Webdata, announced the award of its sixth patent, U.S. Patent No. 8,666,913. Connotate received this patent for its innovative use of advanced pattern recognition techniques that automatically identify inconsistencies in data formats during large-scale Web data extractions.

Connotate’s newest patent focuses on a critical aspect of high-scale Web data delivery: automating the quality control process that ensures extracted data is properly structured and consistently formatted. The approach relies on machine learning algorithms and pattern detection. The algorithms first monitor the flow of extracted data in the background to “learn” the appropriate formats for each kind of data. For instance, if a date consistently appears as “mm/dd/yyyy,” the system notes this formatting. The algorithms then search for exceptions in subsequent data flows, either correcting the issue automatically or alerting human operators to more extensive complications.
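
The learn-then-check idea described above can be illustrated with a minimal Python sketch. This is not Connotate's patented method, whose details are not given here; it swaps in a simple character-class signature as a stand-in for the pattern recognition step. The function names (`shape`, `learn_format`, `find_exceptions`), the 90% threshold, and the sample values are all assumptions made for illustration.

```python
from collections import Counter


def shape(value):
    """Reduce a value to a coarse format signature: digits -> 9, letters -> A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def learn_format(samples, min_share=0.9):
    """Return the dominant signature if it covers at least min_share of samples.

    min_share is a hypothetical threshold, not a value from the source.
    """
    samples = list(samples)
    if not samples:
        return None
    signature, count = Counter(shape(s) for s in samples).most_common(1)[0]
    return signature if count / len(samples) >= min_share else None


def find_exceptions(values, learned):
    """Return the values whose signature differs from the learned one."""
    return [v for v in values if shape(v) != learned]


# Illustrative data only: earlier extractions used to "learn" the date format.
training = ["01/15/2014", "02/03/2014", "11/30/2013", "12/01/2013"]
learned = learn_format(training)

# A later data flow, checked against the learned format.
incoming = ["03/04/2014", "March 4, 2014", "2014-03-04"]
exceptions = find_exceptions(incoming, learned)
```

In this sketch the flagged values would be handed to a correction step or a human operator; deciding which of the two applies is outside what the example covers.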

The validity check can be applied to a wide range of data formats, including dates, names, addresses, phone numbers, and part numbers, because the platform learns a new format each time it encounters data it has not seen before. The patented technique also extends to anomalies in any large-scale data flow beyond Webdata, so it has broad potential application in helping enterprises tame and structure their Big Data flows from any source.

This is the latest addition to Connotate’s Web data extraction technology patent portfolio.  Its core technology is based on visual abstraction techniques that enable users to quickly identify and automate the extraction of data from Web pages through a point-and-click interface.  The platform handles millions of extractions daily, delivering terabytes of clean Webdata to fuel analytics applications and large-scale information aggregation for enterprise and government.

About: Connotate puts the power of Web data monitoring and collection into the hands of the business user. Connotate is a Web scraping technology that delivers the scalability, reliability and resiliency necessary to derive strategic value from dynamic Web sources.

Source: Connotate