(some projects that I’ve managed to explain in a blog post)
- DParser (unstructured text parser)
The DParser – A simple, powerful, versatile, declarative unstructured text parser.
What is it? The DParser is a declarative instruction-based unstructured text parser.
The parsing performed by DParser is driven by a series of XML instructions, each of which describes a particular piece of information you want scraped, the type of data it is, and how to isolate it within the source text.
Using DParser, any snippet of text from any source can be extracted at runtime into any user-defined type within your application logic.
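To make the idea of instruction-driven parsing concrete, here is a minimal Python sketch. It is not DParser's actual instruction schema or API (DParser is a .NET library): the `<field>` element, its `name`/`type`/`pattern` attributes, the `CONVERTERS` table and the `Order` type are all hypothetical names chosen for illustration. Each instruction names a value, says what type it is, and gives a regular expression that isolates it in the source text.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical instruction format (an assumption, not DParser's real schema).
INSTRUCTIONS = r"""
<instructions>
  <field name="order_id" type="int"   pattern="Order\s+#(\d+)" />
  <field name="total"    type="float" pattern="Total:\s+\$([\d.]+)" />
  <field name="placed"   type="date"  pattern="Placed on (\d{4}-\d{2}-\d{2})" />
</instructions>
"""

# Maps the declared type of each instruction to a converter function.
CONVERTERS = {
    "str": str,
    "int": int,
    "float": float,
    "date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
}


@dataclass
class Order:
    """A user-defined type that the extracted values are loaded into."""
    order_id: int
    total: float
    placed: date


def parse(source_text, instructions_xml):
    """Apply every <field> instruction to the text; return {name: typed value}."""
    result = {}
    root = ET.fromstring(instructions_xml.strip())
    for field in root.iter("field"):
        match = re.search(field.get("pattern"), source_text)
        if match is None:
            result[field.get("name")] = None  # nothing matched this instruction
            continue
        convert = CONVERTERS[field.get("type", "str")]
        result[field.get("name")] = convert(match.group(1))
    return result


text = "Order #1042 confirmed. Placed on 2014-03-09. Total: $19.99"
order = Order(**parse(text, INSTRUCTIONS))
```

Because the instructions live in data rather than code, supporting a new source document in this sketch means editing the XML, not the parser.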
What is it? The DProcessor Framework allows arbitrary code, encapsulated as discrete ‘modules’, to be run on demand via service calls, on a specified schedule, or in response to the result of an SQL query using the SQL Server Service Broker. Here is an overview of the run-time architecture, with some demo modules pictured running within one of the three launch patterns:
How does it work? Internally, the DProcessor Framework can host custom code that does whatever it likes, and it also provides some easy-to-use native crawling/parsing capabilities through the DProcessor interface and the DParser text parser. There is even a built-in facility for running macros if you are using the iMacros scripting engine. Here is a diagram which describes the internal architecture of the framework:
What does it mean? The DProcessor Framework allows developers to encapsulate just about any task, procedure, routine, or series thereof, and have it run under an autonomous service which knows what to do with your module and when to do it. This is accomplished in a consistent, scalable manner, with built-in parallel processing across as many machines as the service is installed on. Furthermore, a command interface, exposed as a TCP service layer, allows for monitoring and starting/stopping your code routines from a centralized interface. Here’s a look at the main components which make the DProcessor Framework a simple yet powerful framework upon which all your code can run:
From the diagram above:
- DProcessor Framework Service – This runs your code and exposes a service call layer which allows for your code to be started/stopped and monitored from a central interface.
- Module Configurations – These tell the framework which modules to run, as well as when/how to run them. When the framework service starts, it looks within its local application configuration file to see where it should be getting module configurations from. This can be a file, a database, etc.
- Module DLLs – These are the .NET assemblies that encapsulate your code/procedure/task etc. Pretty much any code can be wrapped as a module, and virtually anything can be accomplished by coding a module. Module binaries are stored within the ‘modules’ directory within the service installation folder.
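The relationship between the three components can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's real interface: the `MODULES` registry stands in for the .NET assemblies in the ‘modules’ directory, and `ModuleConfig`, `Service`, and the trigger names `"on_demand"`, `"schedule"` and `"query"` are hypothetical. Scheduling, the TCP layer and the Service Broker integration are left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical registry standing in for the module binaries on disk.
MODULES: Dict[str, Callable[[], str]] = {}


def module(name):
    """Decorator that registers a callable as a runnable module."""
    def register(fn):
        MODULES[name] = fn
        return fn
    return register


@module("crawler")
def crawl():
    return "crawled 3 pages"


@module("report")
def report():
    return "report written"


@dataclass
class ModuleConfig:
    """One module configuration: which module to run and what launches it."""
    module: str
    trigger: str          # "on_demand", "schedule" or "query" (assumed names)
    enabled: bool = True


class Service:
    """Runs configured modules and keeps a log that a monitor could read."""

    def __init__(self, configs):
        self.configs = {c.module: c for c in configs if c.enabled}
        self.log = []

    def start(self, name):
        """Command-interface entry point: run one configured module by name."""
        if name not in self.configs:
            raise KeyError(f"module {name!r} is not configured")
        outcome = MODULES[name]()
        self.log.append((name, outcome))
        return outcome

    def run_trigger(self, trigger):
        """Run every configured module bound to the given trigger."""
        due = [c for c in self.configs.values() if c.trigger == trigger]
        return [self.start(c.module) for c in due]


service = Service([
    ModuleConfig("crawler", trigger="schedule"),
    ModuleConfig("report", trigger="on_demand"),
])
scheduled_results = service.run_trigger("schedule")
on_demand_result = service.start("report")
```

The point of the split is that the service never needs to know what a module does: the configuration decides when it runs, and the module decides what happens.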
The DMongo interface to MongoDB – An abstract interface that makes MongoDB super-easy to use as a datastore in all your projects!
What is it? The DMongo client library is a .NET interface to MongoDB.
DMongo provides a simple interface through which user-defined types can be conveniently stored in, retrieved from, and queried against MongoDB. By using runtime type inspection and storing properties and their values as flat dictionaries, DMongo provides a generic, extensible, schema-less data-type repository which remains reliably serializable to and from a MongoDB database regardless of the shape and size of the type’s payload, and however the types may change over time.
DMongo also abstracts away the implementation details involved in managing MongoDB collections and their indices. All that is required is a valid set of credentials and an instance of a type whose properties you want stored for later retrieval, querying, or updating through a consistent, predictable and generic interface.
By inserting itself between your application logic and the underlying native C# driver, the DMongo client interface replaces arbitrary custom/default serialization schemes with runtime type inspection and a uniform abstract representation, helping to avoid common pitfalls associated with changes to a type’s properties/layout. Using the DMongo interface also removes a great deal of the leg-work involved in using the native C# MongoDB driver directly.
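The "runtime type inspection into a flat dictionary" idea can be sketched as follows. This is a Python illustration of the general technique, not DMongo's code or key format: the `flatten` function, the `__` path separator, and the `Customer`/`Address` types are assumptions made for the example, and nothing here talks to MongoDB.

```python
from dataclasses import dataclass, field, fields, is_dataclass

SEPARATOR = "__"  # arbitrary choice for joining nested property names


def flatten(value, path=""):
    """Walk a value at runtime and return {property path: leaf value}."""
    if is_dataclass(value) and not isinstance(value, type):
        children = [(f.name, getattr(value, f.name)) for f in fields(value)]
    elif isinstance(value, dict):
        children = list(value.items())
    elif isinstance(value, (list, tuple)):
        children = [(str(i), item) for i, item in enumerate(value)]
    else:
        return {path: value}  # a leaf: store it under its full path
    flat = {}
    for key, child in children:
        child_path = f"{path}{SEPARATOR}{key}" if path else str(key)
        flat.update(flatten(child, child_path))
    return flat


@dataclass
class Address:
    city: str
    postal_code: str


@dataclass
class Customer:
    name: str
    address: Address
    tags: list = field(default_factory=list)


flat = flatten(Customer("Ada", Address("Quebec", "G1V"), ["vip", "beta"]))
```

Since every stored record is just a set of path/value pairs, adding or removing a property on the type changes which keys appear but not the shape of the representation. Note that this simple version drops empty lists and dictionaries, which a real implementation would need to handle.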
- Abstract File Keeper
This project grew out of the need to store files, internet bookmarks, scraps of text copied and pasted from documents or the web, and so on: essentially whatever a researcher would want to keep track of or refer back to. It was a backing storage system for a novel system of record. This system addressed the fact that researchers were always ending up with a mess of folders on their machines, duplicate copies of documents, an increasingly cumbersome file system, and an increasingly confusing system of record which invariably would have to be trashed.
As a prelude to developing this system, and an interface to expose its use, I visited with some researchers at the Université Laval in Ste-Foy, Quebec, who were working on various information management problem-spaces. There I spent some time observing how they retrieved, stored, accessed, and utilized a myriad of documents, notes, and scraps of the world wide web. What I saw amounted to a rapidly evolving system of record which was failing to meet the demands placed upon it. Directories of documents and the like were consistently becoming a hulking mass of goo that had to be purged from the active area of interest in order to regain clarity.
The associative tag-based system of record I devised and created addressed the major pain-points of the methodologies in use. The tagging system used the file type as its starting point, under one central directory, which itself was delineated by Windows user account. Shortcuts to documents and directories were used, so that even when you copied the file system onto an external drive, or otherwise moved it to another machine, simply clicking through the folders themselves, independent of the provided GUI, was all that was needed to quickly get to what you were looking for, without any “searching” required.
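One way to picture such a layout is as a function from a file and its tags to the shortcut locations that should exist for it. The Python sketch below is a guess at the general shape only: the exact folder order (central directory, then user account, then file type, then tag), the `shortcut_paths` name and the example values are assumptions, not the original implementation. It only computes paths and does not create anything on disk.

```python
from pathlib import PurePosixPath


def shortcut_paths(root, user, filename, tags):
    """Return one shortcut location per tag: root/user/file type/tag/name."""
    name = PurePosixPath(filename).name
    # The file type (extension) is the first level under the user's folder.
    file_type = PurePosixPath(name).suffix.lstrip(".").lower() or "no_extension"
    base = PurePosixPath(root) / user / file_type
    # De-duplicate and sort the tags so the result is stable.
    return [base / tag / name for tag in sorted(set(tags))]


paths = shortcut_paths("/keeper", "alice", "Thesis Draft.PDF", ["thesis", "2009"])
```

Because each tag is an ordinary folder holding shortcuts, the same document can be reached by several routes without being copied, and the folder tree stays browsable without any dedicated tool.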
Here’s a diagram which explains the storage system itself: