Making the switch from navigating to searching with help from ICC and Regex.

Many companies are still making the transition from shared drives (who has never had an S:\ or H:\ drive?) to a Document Management System (DMS) or a full-blown Enterprise Content Management (ECM) system. The reasons for switching range from overloaded hardware to new business demands, but a key point with many of these systems is how the end user works with them.

Many shared network drives are a prime example of content chaos, with no naming or folder standardization and users left to create their own folders. However, some better-planned network shares have a semi-structured foldering system, perhaps with a base template folder structure that is copied and pasted for new projects, claims, matters or applications.

[Image: filesystem]

Whatever the structure, or lack of it, on a shared file system, users generally have to browse for the content they are after. What happens if you can’t find that vital document you worked on three months ago? You always have the option to search, but then you are presented with the dozens if not hundreds of hits the full-text search brings back. I think of this type of use case as discovery: users have to discover what they are looking for rather than being able to pinpoint it straight away. More on this topic in a previous post: Difference between search and discovery.

With this in mind, it is important for our end users to realize that any migration to a new DMS or ECM system demands a different way of working, hopefully a smarter and more efficient one. That said, DMS or ECM systems are sometimes implemented badly and mimic the folder browsing approach, which seems crazy in today’s world given the content explosion. Even so, I am sure there are cases for old-style folder browsing, such as case management solutions that have ad hoc document collections.

We have established the disadvantages of the source system and the benefits the new target system will bring, and determined that we have a semi-structured foldering system that could be used to apply some categorization and property values to our content in the new system. Up steps IBM Content Collector (ICC)!

I am no expert with ICC, but I love its modular design and the flexibility it provides for ingesting content from a variety of sources into a repository. You don’t need to be a programming genius to achieve some great results, but how do we determine index information based on folder names in document file paths? In short, we are looking for patterns in a string, and what better way than using Regular Expressions... groan, I hear you sigh! I was never a fan of Regular Expressions, mainly because they looked like hieroglyphics. However, after spending some time on a number of projects and getting into the weeds, I have changed my mind and realize how powerful they can be. Saying that, I will likely forget everything I have learnt in a couple of months.
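To make the idea of pulling index information out of folder names a little more concrete, here is a minimal sketch in Python. It is not ICC configuration, and ICC's own regular expression syntax and task route setup may well differ. The folder layout (S:\Claims\year\claim number\document type\file name), the CL- claim number prefix and the property names are all made up for illustration.

```python
import re

# Hypothetical folder layout, assumed purely for illustration:
#   <drive>:\Claims\<year>\<claim number>\<document type>\<file name>
PATH_PATTERN = re.compile(
    r"^[A-Za-z]:\\Claims\\"         # drive letter and the top-level Claims folder
    r"(?P<year>\d{4})\\"            # four-digit year folder
    r"(?P<claim_number>CL-\d+)\\"   # claim folder, e.g. CL-00421 (assumed naming)
    r"(?P<doc_type>[^\\]+)\\"       # document type folder
    r"(?P<file_name>[^\\]+)$"       # the file itself
)


def extract_properties(path):
    """Return property values taken from the folder names, or None if the path does not fit."""
    match = PATH_PATTERN.match(path)
    return match.groupdict() if match else None


props = extract_properties(
    r"S:\Claims\2013\CL-00421\Correspondence\letter to insurer.docx"
)
print(props)
```

Each named group becomes a candidate property value for the document in the new system, while a path that does not follow the template returns None and can be set aside for manual review.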

Below is a screenshot of how to build Regular Expressions into your ICC Task Route. I haven’t detailed the Regular Expressions used, as that is a topic all on its own, but I will post again on typical expressions and how they can be combined with ICC Lists to provide some powerful lookups.

[Screenshot: Regular Expressions in an ICC Task Route]
