Semantic XBRL Data Search Using SPARQL

From HitachiDataInteractive.com. Written by Ashu Bhatnagar.

We all use Google for Web searches on a daily basis and admire the simplicity of its front-end user interface. It’s nice, clean, fast, and simple. Behind this simplicity lie sophisticated index databases and advanced search technologies, but we as users don’t need to know or understand these. All we need are smart keywords that direct our searches through hundreds of billions of marked-up HTML pages scattered across the global Internet.

When we try to search the Web using regular SQL database technologies, though, we run into difficulties. Why? Because most web content lives in distributed HTML flat files and isn’t organized in any centralized database with well-defined data structures and schemas. It’s like a world full of roads with no roadmaps. Go discover!

Search engines like Google, Ask, and others find the content that matches our queries by building and using centralized databases of metadata, in which every keyword acts as a tag with fast, efficient links to the corresponding websites. In other words, a search engine acts like a very knowledgeable guide, answering our queries with found/not found results based on the Internet roads it has access to and has already crawled.
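The keyword-to-website lookup described above is commonly implemented as an inverted index. Here is a minimal Python sketch of the idea, assuming three made-up pages (the URLs and text are placeholders, not real crawl data) and leaving out the ranking, stemming, and scale that a real search engine needs:

```python
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (placeholders for illustration).
PAGES = {
    "http://example.com/a": "XBRL filings make financial data searchable",
    "http://example.com/b": "SPARQL is a query language for RDF data",
    "http://example.com/c": "Financial analysts search filings every day",
}


def build_index(pages):
    """Map each lowercase keyword to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


INDEX = build_index(PAGES)
print(search(INDEX, "financial filings"))
```

The query returns only pages that contain both keywords, which is the found/not found behavior described above. The index knows nothing about what "financial" means; it only knows where the word appears.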

Why not use such a powerful search front-end to query financial research data? In my experience working with both sell-side and buy-side research analysts, there has been a long-standing request to build such a tool, but until recently, the short answer to this request has been “No!”

No, because it’s technically too difficult or it’s too expensive.

No, because Google deals with text, not data, and data has both context and meaning. Data is far more challenging to search because, even when it is on the Web, it is marked up in HTML as text rather than as data, and so loses the context needed for meaningful search.

No, because there are no generally accepted standard financial dictionaries, or taxonomies, that define terms such as revenue, sales, and net income and identify which of them are synonyms.

Until recently this list of No’s has been long. The good news is that the list is now shrinking quickly with the increasing adoption of XBRL and EDGAR standard taxonomies and the release of several XBRL tools.

All that is needed for powerful search of financial research data is to subscribe to the SEC’s XBRL filings as free RSS feeds, extract the XBRL data into our own relational or Google-like index databases, and use SQL to find answers to our queries. As an alternative, we could subscribe to third-party data services firms like Bloomberg, Thomson Reuters, FactSet, and others that would add XBRL data to their current aggregate data and continue to offer this as a service.
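The extract-and-query step might look like the following Python sketch. It assumes a tiny made-up XBRL-style instance document: the "ex" taxonomy namespace, the element names, and the figures are placeholders, and fetching the SEC RSS feed is left out. Only the Python standard library is used.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A tiny made-up XBRL-style instance. The "ex" taxonomy namespace, the
# element names, and the figures are placeholders, not real filing data.
SAMPLE = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:ex="http://example.com/taxonomy">
  <ex:Revenues contextRef="FY2009" unitRef="USD">1200</ex:Revenues>
  <ex:NetIncomeLoss contextRef="FY2009" unitRef="USD">150</ex:NetIncomeLoss>
  <ex:Revenues contextRef="FY2008" unitRef="USD">1000</ex:Revenues>
</xbrl>"""


def extract_facts(xml_text):
    """Yield (concept, context, unit, value) for each element with a contextRef."""
    root = ET.fromstring(xml_text)
    for elem in root:
        if "contextRef" in elem.attrib:
            concept = elem.tag.split("}")[-1]  # strip the "{namespace}" prefix
            yield (concept, elem.get("contextRef"), elem.get("unitRef"), float(elem.text))


def load_facts(conn, facts):
    """Create a simple facts table and insert the extracted rows."""
    conn.execute("CREATE TABLE facts (concept TEXT, context TEXT, unit TEXT, value REAL)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", facts)


conn = sqlite3.connect(":memory:")
load_facts(conn, extract_facts(SAMPLE))

# Plain SQL now answers "what were revenues in each period?"
rows = conn.execute(
    "SELECT context, value FROM facts WHERE concept = ? ORDER BY context",
    ("Revenues",),
).fetchall()
print(rows)
```

Real filings also carry context and unit definitions, dimensions, and footnotes, so a production schema would need more tables than this single one.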

The news gets even better when we add SPARQL, a W3C-specified query language for RDF, to XBRL and Linked Data.

Jim Rapoza, Chief Technology Analyst of eWeek, explains:

Called SPARQL (pronounced "sparkle"), this standard brings about a standardized SQL-like query language for the Semantic Web. And, like most Semantic Web standards, it is heavily based on RDF (Resource Description Framework), although it also makes use of many Web services standards, such as WSDL (Web Services Description Language).

SPARQL essentially consists of a standard query language, a data access protocol and a data model (which is basically RDF).

Some people out there are probably thinking, So what? Sounds like just another search tool—big whoop. But there’s a big difference between blindly searching the entire Web and querying actual data models.

The ability of database queries to pull data from giant databases is pretty much the basis of a large number of enterprise applications. No one argues about the value of being able to write a query in an application that can pull relevant customer and product data.

Now, imagine writing a similarly small application that does the same thing—only with data stored across the entire World Wide Web.

That would include all the companies that not only file in XBRL but, in conformance with SEC requirements, will also post XBRL data on their own company websites.

In essence, with SPARQL we can choose to build centralized databases to query XBRL data, but we don’t have to. We can simply point our queries at so-called SPARQL endpoints. Unlike traditional database requests, which must stay under one administrative control, these queries can span thousands of company websites with XBRL data and return results as if they came from one centralized database. Imagine the cost savings in not having to build and maintain a huge and growing centralized database.
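As a rough illustration of pointing one query at several endpoints, the Python sketch below builds SPARQL Protocol requests and merges results in the standard SPARQL JSON results format. The endpoint URLs, the "ex" vocabulary, and the canned response are all assumptions made up for this example; nothing is sent over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URLs and vocabulary, used only for illustration.
ENDPOINTS = [
    "http://company-a.example.com/sparql",
    "http://company-b.example.com/sparql",
]

QUERY = """
PREFIX ex: <http://example.com/taxonomy#>
SELECT ?company ?revenue
WHERE { ?company ex:revenues ?revenue . }
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(body)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def merge_results(bodies):
    """Combine rows from several endpoints into one list, as if from one source."""
    merged = []
    for body in bodies:
        merged.extend(parse_bindings(body))
    return merged


# Canned response standing in for what an endpoint might return.
SAMPLE_BODY = json.dumps({
    "head": {"vars": ["company", "revenue"]},
    "results": {"bindings": [{
        "company": {"type": "uri", "value": "http://company-a.example.com/id"},
        "revenue": {"type": "literal", "value": "1200"},
    }]},
})

requests_to_send = [build_request(e, QUERY) for e in ENDPOINTS]
print(requests_to_send[0].full_url)
print(merge_results([SAMPLE_BODY, SAMPLE_BODY]))
```

In a live setting each request would be sent with `urllib.request.urlopen` and the response bodies passed to `merge_results`. The same query text goes to every endpoint, so no central copy of the data is needed.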

Applications for publishing XBRL as Linked Open Data are limited at this time, but they are emerging. As one example, Roberto García and Rosa Gil describe their work in a research group at the Universitat de Lleida, Spain, which extracted 1.34 million triples from 612 XBRL filings. (Triples are the subject-predicate-object statements that make up RDF data.) The extraction process is automated and transforms XBRL data into Semantic Web RDF data.
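To make the idea of a triple concrete, here is a small Python sketch that turns one reported fact into triples, writes them as N-Triples lines, and matches them against a pattern. It is not the method used by García and Gil; the namespaces, the "reportedBy" property, and the company name are placeholders invented for this example.

```python
# Placeholder namespaces; a real conversion would use the filing's taxonomy URIs.
COMPANY_NS = "http://example.com/company/"
TAXONOMY_NS = "http://example.com/taxonomy#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def fact_to_triples(company, context, concept, value):
    """Turn one reported fact into (subject, predicate, object) triples."""
    report = f"{COMPANY_NS}{company}/{context}"
    return [
        (report, f"{TAXONOMY_NS}reportedBy", f"{COMPANY_NS}{company}"),
        (report, f"{TAXONOMY_NS}{concept}", (str(value), XSD_DECIMAL)),
    ]


def to_ntriples(triple):
    """Serialize a triple as one N-Triples line; tuple objects are typed literals."""
    s, p, o = triple
    if isinstance(o, tuple):
        obj = f'"{o[0]}"^^<{o[1]}>'
    else:
        obj = f"<{o}>"
    return f"<{s}> <{p}> {obj} ."


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = fact_to_triples("acme", "FY2009", "Revenues", 1200)
for t in triples:
    print(to_ntriples(t))
```

The `match` helper mimics a single SPARQL triple pattern in a very reduced form: fixing the predicate to the Revenues URI and leaving subject and object open returns every revenue figure in the set. The serializer does not escape special characters in literals, which a real converter would have to handle.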

In addition, enough examples exist on today’s Web to give us insight into how the user experience might look when Semantic XBRL applications go into production use. Next time you search for the best flight on sites such as Orbitz, Kayak, or FareCompare, pause and observe that the flight schedules, prices, and airline details are being pulled not from any one centralized database but from a variety of airline databases, in real time, to match your exact itinerary requirements, thanks to some very specialized and complex technologies.

In summary, SPARQL makes Semantic XBRL searches possible on demand across a distributed web space while simplifying front-end design and keeping the complexity of the technology hidden from end users.

A Google-like experience of searchable financial research data is coming. The future looks bright.

 

Ashu Bhatnagar is CEO of Good Morning Research, a Softpark company that specializes in building Semantic XBRL technology. GoodMorningResearch.com automates XBRL tagging of Excel data in RDF format with one-click “Save As XBRL” functionality. Mr. Bhatnagar also moderates the Semantic XBRL group on LinkedIn.
