Friday, 3 April 2015

Customer Service

What has happened to customer service from the large genealogical companies? Is there a decline in the service from online companies in general? Are customers the only casualty here or is it to the detriment of the companies too?

I have some history in this particular field and so maybe my expectations are higher than most, or maybe not. In the 1990s I worked for a software company that strived to achieve the then-new ISO 9000 certification — and succeeded — and I later worked for a company specialising in business intelligence software for large contact centres.

My lack of satisfaction has mainly come from one particular company; one that I have subscribed to since they first appeared. However, I don’t believe that they are the only culprit and so to save them from embarrassment I’ll simply refer to them through the fictitious name of Your-Yore. In fact, much of what I’m about to say could be levelled at many companies outside of genealogy, and especially those with a Web presence.

ISO 9000 is a family of international standards, first published in 1987, concerned with quality management. It identifies a number of factors which have to be regularly audited and measured for compliance and effectiveness. Part of this involves the product itself: whether it meets customer needs, whether its evolution continues to address such needs, monitoring the reliability and efficiency of the product, fixing product problems, etc. Another part refers to the customer and requires that systems be in place for communicating with customers about product information, enquiries, contracts, orders, feedback, and complaints. Customer Service is more specifically addressed by the ISO 10000 family of standards, and also by The International Standard for Service Excellence (TISSE).

In a genealogical context, the product will involve the Web site (or other software component), including its user interface (UI) and its search engine, but also its underlying data. Reporting software issues, such as features not working as described, or not working at all, would be relevant to many companies, but genealogical data may also be the subject of discontent. Data may be missing, mis-indexed, mis-transcribed, or of unknown or dubious origin not identified by the company. I have reported issues to Your-Yore in all of these categories, as well as suggestions for enhancement and other advances, but what happened to them?

Let’s just take a moment to examine the reasons why a customer may contact such a company:

  • Software issue (as described above).
  • Data issue (as described above).
  • Account, membership, or subscription issue.
  • Feedback, suggestion, or complaint.
  • Sales.
  • Technical support.

A contact centre, also known as a call centre, will have agents dealing with all of these, and each category must be directed to an appropriate person or department. Despite the availability of text-chat interfaces, social media, and self-service menus, the predominant mechanisms are still email (possibly with a Web form to initiate it) and the telephone. But the reality is neither that simple nor that rigid: a call may transition from one category to another, such as a technical question eventually becoming a software issue; the contact mechanism may change part way through a call, such as one initiated by email later involving a telephone conversation; or a given customer may have several independent calls active. The upshot of this is that a company needs an Issue Tracking System (ITS) in order to separate the different calls, and to enable their progress to be followed individually.
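
The requirements just listed can be made concrete with a small sketch. It is only an illustration, written in Python, of the minimum that such a system has to record; the names Ticket, recategorise, switch_channel, and open_tickets are hypothetical and are not taken from any real product. Each call gets its own number, its category and contact mechanism can change part way through, and one customer can have several independent calls open at once.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical generator of ticket numbers; a real system would persist these.
_ticket_numbers = count(1)


@dataclass
class Ticket:
    customer: str
    category: str   # e.g. "technical support", "software issue", "data issue"
    channel: str    # e.g. "email", "telephone"
    number: int = field(default_factory=lambda: next(_ticket_numbers))
    status: str = "open"
    history: list = field(default_factory=list)

    def log(self, note):
        """Append a note so the call's progress can be followed later."""
        self.history.append(note)

    def recategorise(self, category):
        """A call may move from one category to another."""
        self.log(f"category: {self.category} -> {category}")
        self.category = category

    def switch_channel(self, channel):
        """The contact mechanism may change part way through a call."""
        self.log(f"channel: {self.channel} -> {channel}")
        self.channel = channel

    def resolve(self, note):
        self.log(f"resolved: {note}")
        self.status = "resolved"


def open_tickets(tickets, customer):
    """Ticket numbers of the calls a customer still has outstanding."""
    return [t.number for t in tickets
            if t.customer == customer and t.status == "open"]


# Two independent calls from the same (hypothetical) customer.
first = Ticket("customer-1", "technical support", "email")
second = Ticket("customer-1", "data issue", "email")

first.recategorise("software issue")   # technical question became a software issue
first.switch_channel("telephone")      # email thread moved to a telephone conversation
second.resolve("index entry corrected")

print(open_tickets([first, second], "customer-1"))
```

The ticket number is what the customer would quote when chasing or escalating a call, and the accumulated history is the raw material for the metrics mentioned later.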

This is where I have to bite my tongue as companies such as Your-Yore have no such system, and do not offer the customer any sort of ticket number by which they can follow a call to resolution, or escalate it if necessary. This makes me angry, not just because the concept is part of “call management 101” but because it hinders the company as well as the customer. There is valuable information to be gleaned from those call logs, including the reliability of the product and the satisfaction of the customers. In addition to large-scale metrics, though, it could allow the company to know its customers, not just on an individual level (their background, experience level, etc.) but collectively. There is certainly a proportion of genealogists who are very fussy about the way they research, and how they write up or otherwise store their results. This may once have been the domain of the professionals but that group is now wider; how much wider is something that the company can only determine through analysis of such data, or business intelligence. It’s probably no coincidence that Your-Yore recently embarked on a major change to its product that woefully underestimated the nature of its customers, and which left this particular group of customers extremely dissatisfied.

Returning to those software and data issues that I had previously reported to Your-Yore, I have no record of them, other than my original email — when it was initiated by one. I have no way of knowing whether the associated problems were fixed or otherwise addressed, and no way of chasing them up. This company recently introduced a forum where their customers could provide feedback and suggestions — a nice idea but the implementation was deficient. Having lost all confidence in submission by email, I tried to post some of the issues on their new forum as feedback. The advantage of the forum was that the postings were public, but the disadvantage was that there was virtually no acknowledgement or active participation by the company. Worse than that, some partial automation of the system managed to lose two of my postings, and when I say ‘lose’ I mean totally lost. Although I could provide the original URLs, there was apparently no way of recovering them, and no way of determining why they were lost or discarded.

Contact centres typically direct technical questions to a help desk or technical support desk. These mostly have a multi-tier organisation so that tier 1, or the first line support, gathers the customer’s description and supporting information in an attempt to match against known problems. Tiers 2 and 3 would each be more technically knowledgeable and would receive the more difficult calls from their preceding tier. Well, that’s the theory, and the practice as I recall it, but this type of support desk is gradually being replaced by online ‘user communities’, which basically means that the buck is being passed. Even when the company participates in those communities, there is no obligation to find you a solution. There is nothing wrong with user communities per se, but they are not the same as a support desk. Particularly with the large software companies, such as Microsoft, the older style technical support tends to be something you pay for over and above the cost of the product itself, and that effectively means that only business customers can take advantage of it; the smaller customers are left to find solutions by themselves. If you’re a paying customer, and the problem is really a product fault not of your making, then this can be truly vexing since the information would be to their advantage.

One non-genealogical software company that I’ve had several painful dealings with was Skype. Unlike Your-Yore, they allocated ticket numbers and appeared to follow more professional practices, except that email messages were picked up by a different agent on each stage of a given thread. If those agents had taken the trouble to read the rest of the thread then my experiences wouldn’t have been as bad as they were, but it was clear that they focused on particular keywords, either in the title or in the first paragraph, and then pasted some stock paragraph of text into their response that was all but useless. The explanation for this is straightforward as that issue is quite common. Large call centres are effectively modern-day sweat shops, and the lives of the agents are governed by metrics and measurements. When their performance is assessed on how quickly they can turn a call around, as opposed to them finding a resolution or to the level of customer satisfaction, then such short-cuts are used extensively. However, customer support is not a game of tennis, and responding quickly rather than effectively (and courteously) does neither the company nor the customer any good. By contrast, my experiences with Hover, with whom I have multiple email and domain registrations, were exemplary. The differences in how I would relate my experiences of those two companies to friends would be marked, and would persist well beyond the actual events.

If you’re thinking that certification is expensive then you’d be correct, and the software company I mentioned that achieved ISO 9000 certification in the 1990s spent considerable time, effort, and money to get there. This is one of the common criticisms of those standards but you can read a different significance into that observation. You may consider that modern companies are leaner and want to increase their profit margin by reducing their overheads, but that lack of investment also disguises a fundamental disrespect for the customer. Treating customers as a commodity means that there is no appreciation of what they can do for the company in terms of constructive feedback, loyalty, and word-of-mouth.

Making available billions of “records” to hundreds of thousands of customers is a simplistic model but it works, right? The customers should be grateful and they can take it from there, right? Yeah, right!

Tuesday, 10 March 2015

Bullying and Elitism

There have been a few recent articles and posts about bullying and elitism in genealogy that are worth reading. Do these issues exist in genealogy? Yes, of course they do! There are more than enough people participating in this field that such a claim can be made on probabilistic grounds alone. We are human, and human interaction often suffers as a result of personal goals or dislikes.

While neither of these issues can be justified, it's worth considering the mindset behind them. What would cause one genealogist to personally berate another, as opposed to criticising their data or methods, and why would some genealogists look down on others? Occasionally, the issue is less blatant, and may take the form of the cold shoulder. I'm personally aware of people who deliberately ignore others because of what they say, or how they say it, rather than indulging in discussion on their viewpoint. 

A large part of the explanation can be linked back to the schism in genealogy: the one that we all know about but don’t like to talk of. There are researchers who want to be very rigorous, produce journal-quality write-ups, and would like their work to be appreciated by their peers. There are other researchers who indulge out of personal interest, and who mainly wish to share data with their family. While the reality may be slightly more of a greyscale, let’s call these extremes the academic and the hobbyist. Note that these must be viewed as different goals rather than different levels of expertise.

I’m including professionals in my academic category because of their approach to quality and rigour. I know that some people fear those with BCG certification, and hence letters after their name, even when they don’t mean to be imposing or overbearing. Consider, though, that such people have worked to achieve that certification — just as anyone studying for a qualification will have worked — and they should be admired for it. It indicates a commitment to acquiring personal skills and knowledge — not divinity. Many of us would like to think that we could attain that certification, if we had the time and the money, but would it be relevant to the hobbyist sharing with their family? Even the hobbyist is in a position to recognise the advantages of consistency and quality in genealogical research, and there are many books to help teach themselves if they don’t feel that a qualification is appropriate, such as those of Elizabeth Shown Mills and Thomas W. Jones.

One of the most common points of criticism between these genealogical factions is sources. Public family trees most often have no source citations, and the academics allegedly don’t like that because it means the associated conclusions are unsubstantiated and of little use to them. Well, it runs much deeper than this. The perception that public trees are inaccurate, unsubstantiated, and of poor quality, means that this side of genealogy is frowned upon by academics generally, not just genealogical ones. It will probably never be accepted as a form of historical research while this can be demonstrated, but do source citations help?

As I’ve said in a previous post, source citations are better than nothing at all, but they don’t guarantee accuracy. Providing a direct link to a census page or BMD record does not make the data correct, and it may even be insufficient without some narrative explanation in complex cases. In effect, we should not confuse good-intent with accuracy. Providing citations in a public tree shows good intent, and it will certainly help the associated author if-and-when they find a problem in their own data. Education is therefore a crucial factor for newbies who are just starting out.

Unfortunately, public trees are plagued by certain other issues. Because of the nature of the tools that are made available, it is too tempting to copy-and-paste data from other trees into your own. This is bad because there is an assumption that the other tree is accurate, it provides no attribution to the original author (if that is what they were), and there is no association allowing the source of conclusions to be followed. The overall result is that inaccuracies propagate like mad until you find multiple trees that all show the same error, and there’s then no indication of where it all started. I know through experience that attempting to tell another researcher that their data is incorrect because so-and-so is bound to elicit a variety of responses. Maybe they do see the logic and thank you for it. Maybe they get angry because it’s their family, or because everyone else agrees with (read as “copied”) the data. Quite often, though, you get no response at all, and this may be because the author has become a leaf on their own tree, or because they’ve abandoned that tree after toying with it for a short while. We all know the end result of this, and it’s hardly surprising that genealogists get very frustrated with the current state of things. I admit to being one of the frustrated, and I have voiced strong opinions about the state of public trees, but it would be pointless and plain wrong to criticise specific individuals. I understand that the reasons are neither simple nor imbued with malicious intent.

There are other serious frustrations in genealogy at the moment, in addition to those of accuracy and source citations. Some people are just fed-up with people stealing their data. This is an emotive word but it’s used regularly in this context. There are legal aspects to this issue, too, such as mere “facts” (i.e. discrete data) not being subject to any type of copyright or ownership. If you put names, dates, places, and so on, into your public tree then there’s nothing illegal about someone copying them, and bloggers like James Tanner regularly write about this. Where there is a strong case for some sort of authorial protection is when the tree is replaced with a more academic or creative work, such as when it contains complex proof arguments or narrative content. It’s rare to find this sort of content in public genealogical data, partly because there’s no structure to accommodate it, and partly because there’s no authorial protection. By that, I mean no mechanism by which attribution is automatic and where the data is linked rather than duplicated.

I used to have an online public tree that I published as “cousin bait”, but it was recently taken down because it was no longer attracting any contacts. I believe this is because people are now more concerned with data that has source citations, and so are confusing that aspect with some guarantee of accuracy. Anyone who reads my blog will know that I also post articles that summarise particular lines of genealogical research, or even micro-history, including my reasoning and my findings. I try to do this in a way that is easy and enjoyable to read, rather than as professional-style research reports, because I also want my extended family to enjoy them. This probably works better than a bare online tree as I can include much more detail and structure, including proper reference-note citations (not just electronic bookmarks). It also means that I can publish information that was not available in public records, with implicit attribution to me on the date of publication, and present justified conclusions that may differ from those of others.

These two approaches (tree vs. blog) are poles apart, but it doesn’t have to be like that. It is conceivable that a structured narrative contribution could be uploaded to a public site, and automatically associated with the relevant leaves on a public tree by virtue of mark-up that identifies the individuals and their relationships. From an end-user’s perspective, the appearance would be that of a public tree that links to many private narrative contributions, and which would therefore support authorial ownership and automatic attribution. The tree itself could be dynamically formed from the uploaded contributions (explained a little better in What to Share, and How — Part II), and so could accommodate differences of opinion without getting into edit wars. Of course, an essential ingredient would be some sort of personal preferences about whose contributions to include or hide in your private view, and maybe the ubiquitous ‘Like’ button, but a fundamental advantage is that all private contributions would be there and there would be no need to duplicate any of them.
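
To give a rough feel for what “dynamically formed” might mean, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the contribution layout, the build_view function, and the placeholder identifiers (author-1, person-A, and so on) are hypothetical, and no existing site works this way. The point is only that the tree is computed from the contributions, each link keeps its attribution, and a reader’s preferences decide whose contributions appear.

```python
from collections import defaultdict

# Hypothetical uploaded contributions. The "links" stand in for the mark-up
# that identifies individuals and their relationships within a narrative.
contributions = [
    {"author": "author-1", "title": "Narrative 1",
     "links": [("person-A", "child-of", "person-B")]},
    {"author": "author-2", "title": "Narrative 2",
     "links": [("person-A", "child-of", "person-C")]},   # a differing opinion
    {"author": "author-1", "title": "Narrative 3",
     "links": [("person-B", "spouse-of", "person-D")]},
]


def build_view(contributions, hidden_authors=()):
    """Form a tree view from contributions, skipping authors the reader hides.

    Each relationship maps to a list of (other person, author, title), so
    differing opinions sit side by side and attribution is never lost.
    """
    view = defaultdict(list)
    for item in contributions:
        if item["author"] in hidden_authors:
            continue
        for person, relation, other in item["links"]:
            view[(person, relation)].append((other, item["author"], item["title"]))
    return dict(view)


everything = build_view(contributions)
filtered = build_view(contributions, hidden_authors={"author-2"})

print(everything[("person-A", "child-of")])
print(filtered[("person-A", "child-of")])
```

Nothing is copied from one contribution into another; hiding an author simply changes what the view is built from.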

In this article, I have mentioned education as being a factor in reducing the genealogical schism, and so reducing the temptation to bully or become snobs, but it cannot all be blamed on the hobbyist’s education. I have also mentioned other frustrations that affect both hobbyist and academic alike, and the main cause of those is probably the scope of the software models used for collaboration. The answer is not in single-truth unified trees, and it’s not in user-owned trees either. The software industry really needs to take a different tack on collaboration and save genealogy from implosion. I am not using that word simply to be sensationalist; I genuinely believe that a core part of genealogy is a demonstrable disaster. I would like to suggest that FHISO be involved in discussions of collaborative models as the inability to share data accurately and fully is another one of those serious frustrations, and also because they are fast becoming aware that we need considerably more than the mere sharing of conclusions, whether source citations are present or not. Letting things slide further by doing nothing at all is compounding the damage already done.

Thursday, 19 February 2015

Source Mining

It’s time to look at how we work with our sources, and the impact that this has on citations. I expect many people to say that we all work differently, but do we? If we fall into a small number of distinct cases then a computer representation of our research, as opposed to merely our conclusions, is an achievable goal.

Whether we like it or not, and irrespective of how we conduct our research, there are different scopes within genealogy. Some researchers are content with establishing their lineage or pedigree; some would like to look at the history of their family; some have a much more general interest in history, including the backdrop to their family’s lives, and the micro-history of places, groups, and other subjects.

It’s difficult to break these into hard categories as it is really a greyscale dependent upon our personal goals and interests. However, it is much easier to categorise the fruits of our research. The consensus seems to be that this is either conclusions or evidence-and-conclusions, with citation of sources being the differentiating factor, but this is certainly an oversimplification, and maybe even a misuse of those same terms.

Family Trees

Let’s start by analysing the most simplistic of scopes: that of a plain family tree or pedigree. This would consist of an assemblage of so-called “facts”: the names, dates, and places corresponding to the family’s vital events, and their lineage-based relationships. Without any sources, these “facts” are merely unsubstantiated claims, but what would source citations add to them? As I recently commented on one of James Tanner’s posts (Why not a ranking and review system for online family tree databases), it doesn’t necessarily make the data any more accurate. I have seen many online trees that cite census entries, or vital events, and yet are entirely wrong; often with clearly impossible implications. Conversely, the absence of source citations may mean that the tree was posted as “cousin bait” rather than being a complete genealogy. At best, we might deduce that the inclusion of citations means (a) that the data wasn’t simply copied from another tree, and (b) that some effort was made to include that source information. However, the ease with which online trees can add electronic citations, more accurately described as electronic bookmarks (see Citations for Online Trees), weakens that latter deduction. Also, those electronic citations are usually constrained to online data hosted by the same provider, and so would not be a general mechanism.

A deeper issue is that these family-tree citations — whether electronic or in traditional reference-note form — only work because the underlying data is a mere assemblage of “facts”. A simple list of sources might constitute a proof summary, but that assumes that the evidence from those sources is direct and non-conflicting for each claim. Dealing with the more complex cases is often referred to as Inferential Genealogy, but the representation of these cases, such as my establishing the parentage of Sarah Hunt in the latter part of My Ancestor Changed Their Surname, cannot suffice with a plain citation, or even with a group of plain citations. If there isn’t a direct relationship between a “fact” and its source then you need a proof argument, and that may require a little more than just narrative. Although you would write such a proof argument using narrative, it may need to make correlated references to multiple subjects, such as people, and to multiple sources of information. If the online tree allowed you to upload this narrative as plain text then it would have to be associated with a specific person, or family, and the essential structure and relevance to other subjects would be lost as a result.

On the surface of it, this appears to be saying that it’s not enough to say where your information came from, and that you must also indicate how it relates to your claims. The issue is more subtle, though, for any computer representation — including online trees — since the structure and position of that proof argument is crucially important. Returning to the case of Sarah Hunt’s parentage, an associated proof argument would be as relevant to both of her parents as to herself, and it may include non-familial persons in the general case, so where do you attach it? Also, simply identifying a Frances Hunt as her mother because such-and-such wouldn’t be enough if there were multiple people with that name in the same tree.

Ideally, we need something in between our conclusions and the underlying sources; something that not only provides a link but a structured pathway explaining how and why. This is even an issue for trees that want to cite sources that do not quite agree on someone’s date of birth. Obviously a person should not have multiple birth events recorded, but it must be possible to trace any selected date, or date-range, back to some correlation of those source differences. As explained in Hierarchical Sources, when fellow researchers examine your data, it is the combination of your proof argument and its sources that they should be interested in, rather than just your subjective conclusions. They would want the option to form different or modified conclusions, and so that full story is a fundamental issue for any type of data sharing.
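
The date-of-birth case is simple enough to sketch in code. The following Python fragment is only an illustration under stated assumptions: the DateClaim and correlate names are hypothetical, the three sources and their year ranges are invented sample data, and intersecting year ranges is just one simplistic way of correlating differences (it is not how STEMMA or any other model represents this). What it shows is a single concluded date-range that can still be traced back to every source that contributed to it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DateClaim:
    source: str     # label for the cited source (invented sample data)
    earliest: int   # earliest birth year the source implies
    latest: int     # latest birth year the source implies


def correlate(claims):
    """Intersect the implied year ranges and keep the trail back to the sources."""
    earliest = max(c.earliest for c in claims)
    latest = min(c.latest for c in claims)
    consistent = earliest <= latest
    return {
        "range": (earliest, latest) if consistent else None,
        "conflict": not consistent,
        "derived_from": [c.source for c in claims],
    }


claims = [
    DateClaim("1851 census (age given)", 1820, 1821),
    DateClaim("1861 census (age given)", 1819, 1821),
    DateClaim("burial register (age given)", 1820, 1822),
]

birth = correlate(claims)
print(birth["range"], birth["derived_from"])

# A source that cannot be reconciled is flagged rather than silently dropped.
conflicting = correlate(claims + [DateClaim("family bible", 1825, 1826)])
print(conflicting["conflict"])
```

A real conflict would of course need a written proof argument rather than a flag, which is exactly why the pathway has to carry more than a list of citations.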

Historical Research

Moving away from the simplistic scope of a family tree, and along that greyscale to the more historical pursuits, reveals something quite profound: a change of emphasis and a different approach. Since such researchers are no longer looking for discrete “facts” to add to their existing tree then they’re much more interested in anything and everything from a given source. This change of emphasis results in the source being the main focus — not the tree — and so the citation of that source is formed much earlier, and is rarely an afterthought. Having selected a relevant source, such as a diary, will, military record, letters, or an old book, then there’s typically an assimilation process of deconstruction and interpretation — which is what I refer to by the title of this article: source mining.

If I were to describe this source-mining process as locating the subject references[1] in the source, identifying their documented properties (e.g. person’s age, person’s occupation, place type), identifying their documented relationships (including person-to-person, person-to-place, place-to-place, etc), and then incorporating that information into your main historical data — a process that would require correlation with other analysed sources, and resolution of conflicts or other differences — then how many people would identify with that approach? Or, putting it another way, how many people’s approach would be substantially different?

My contention is that this general approach is more common during historical research. The converse is where you are asking a specific question, or making a specific claim, and might be described as a goal-directed approach. In the aforementioned case of family trees then such goals would include finding data for a vital event, identifying a marriage partner, or finding offspring of a couple; the tree effectively sets those goals. Rather than suggesting that a goal-directed approach doesn’t exist, I’m suggesting that it depends on the scope of the research, and that it becomes less common as the research scope broadens. I believe historical researchers may still have specific interests, such as a particular person, family, village, or event, but would be more reliant on the serendipity of each source than answering specific questions.

If this contention is true then it has implications for the digital representation of our research, and for the support of any standard of research. With no animus intended, let me select GEDCOM-X as an example. This data model made an admirable attempt at supporting a research process, but this was primarily a goal-directed process. The page at GEDCOM X and the Genealogical Research Process describes the first phase as: “Question Asking: The research process begins with a focused research question”. Irrespective of whether the model can find a way of representing the source-mining approach, its initial design was based on a more restricted concept of research.

Am I suggesting that the Genealogical Proof Standard (GPS) is not applicable to source mining? Well, that is a really good question. It is true that discussions of the GPS nearly always present it in the context of goal-directed research, such as establishing the truth or falsity of some claim, but its core principles would still apply to the incorporation of the mined data into your other data. Assessment of information based on the nature and provenance of its source, resolution of conflicts, etc., would be just as relevant.


I am currently working towards a representation of mined data that I hope will provide a crucially missing piece in the STEMMA data model. Back in STEMMA V3.0, I introduced the concept of a References element in which the details of the subject references were assembled into prototype subject entities describing persons, places, groups, etc., and their relationships to each other, using the information from a given source. Since this was dealing with the documented properties and relationships, and it allowed me to analyse them and to correlate them with other sources, it was a good starting point for source mining, except that it was in entirely the wrong place! It was part of the Event entity, but that embraced conclusions and so was too high on the structured pathway mentioned above. Source mining is largely a bottom-up approach, and the References element was designed to facilitate the digestion and extraction of source information in a manner from which inferences and conclusions could be built. For instance, a documented name may not have been someone’s registered name, a documented age may have been estimated, a relationship of “cousin” could have meant a lot of different things, and to say that one place was “within” another wouldn’t necessarily identify either of them; some analysis and correlation is required. From this perspective, those prototype entities were fairly similar to Personae, except that I had generalised the concept to include other subject types, and I had endeavoured to keep their shared context. A strong criticism I have of the accepted persona concept is that it extracts names, and other details, from different sources and treats them all equally, and in isolation from their original context, including the background context of the source information, the nature of the source itself, and the documented relationships between subjects (not just persons) in the same source.

This work is still in its early stages, but it will involve moving those References elements into some new entity that bridges between my sources and my conclusions. Part of the problem is that information must be associated with its relevant context, which generally equates to a where, when, and by-whom (or as much of it as can be deduced), but any single source may have multiple contexts. An obvious case would be a diary, but other cases may involve information making reference to prior events. You then have the context of the body information and the context of the reported or recollected events within that body.

The following schematic diagram illustrates the current direction of this work; noting that it’s still subject to revision. The source material would describe local material (where you may have an original or an image copy) or remote material (such as a cited work or document), or both when they’re related. The new entity would identify the different contexts within the source and assemble prototype subject entities, together with their documented properties and relationships. I say documented because these would not be conclusions at this point. Hence, if the relationship between a particular person reference and a place reference was that of “present at” then I wouldn’t assume that it was their residence. Those subject references would be connected to appropriate points in any transcriptions of the material, thus providing a connection all the way from a conclusion entity (or associated datum), through the correlation between different sources, through the analysis of the different contexts within a given source, and finally to an actual textual reference.

A source sentence that I’d used as a simplistic example on a FHISO mailing list at Entity Relationships went as follows:

"In 1963, John Smith, of 10 Front St, brother of Simon Smith of Woodstown, married Ann Jones".

The obvious context is the year 1963. It has three person references (John Smith, Simon Smith, and Ann Jones) and two place references (10 Front St and Woodstown); none of which have been specifically identified at this stage. There are relationships indicated between the person references (brother-of, and married) but also relationships between the persons and places (expressed here as “of”).
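
For readers who think better in code, the breakdown above might be sketched as follows. This is plain Python rather than STEMMA syntax, and the SubjectRef and Relationship classes, the keys (p1, l1, etc.), and the related function are all hypothetical names chosen for the illustration. Note that nothing here is a conclusion: the entries record only what the sentence documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectRef:
    key: str    # local key, meaningful only within this source
    kind: str   # "person" or "place"
    text: str   # the reference exactly as written in the source


@dataclass(frozen=True)
class Relationship:
    subject: str    # key of the subject reference
    relation: str   # relationship as documented, not as concluded
    obj: str        # key of the related reference


context = {"when": "1963"}

refs = [
    SubjectRef("p1", "person", "John Smith"),
    SubjectRef("p2", "person", "Simon Smith"),
    SubjectRef("p3", "person", "Ann Jones"),
    SubjectRef("l1", "place", "10 Front St"),
    SubjectRef("l2", "place", "Woodstown"),
]

relationships = [
    Relationship("p1", "of", "l1"),
    Relationship("p1", "brother-of", "p2"),
    Relationship("p2", "of", "l2"),
    Relationship("p1", "married", "p3"),
]


def related(key):
    """List the documented relationships that start from one reference."""
    by_key = {r.key: r for r in refs}
    return [(rel.relation, by_key[rel.obj].text)
            for rel in relationships if rel.subject == key]


print(related("p1"))
```

The relation “of” is deliberately kept as written; deciding whether it meant residence, origin, or something else belongs to a later stage of analysis.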

What this representation does not deal with is the concept of multi-tier personae, and the equivalent for non-person subject types. They would be relevant to the ‘correlation’ item in the diagram but I’m unconvinced that a formal entity representation is any better than a narrative explanation. 


I’m suggesting that research is primarily source-based, working upwards from the information we find in each source, and that the converse case, where we have a specific question, is less frequent; except for family trees where the acquisition of details for vital events constitutes such questions. Consequences of this are that the source details (including its assessment as well as its citation) are a secondary consideration, and also that other information from the same source may be ignored. The sharing of research information, as opposed to just conclusions, must take account of this bottom-up approach, but there are currently no comprehensive mechanisms for this level of sharing. GEDCOM only shares conclusions, and those attempts that have been made to share research have become mired in the goal-directed notion. Whilst there may be a lot of variety in how readers approach their own source-based research, the digital representation would try to encapsulate the core elements in a manner that could be built upon to support correlation with other sources and the generation of conclusions.

[1] This is STEMMA terminology for references to subjects in a source, i.e. person references, place references, group references, etc. The distinction between subject references and subject entities (in the digital representation) was proposed as a clean break from the more contentious evidence person and conclusion person on the FHISO mailing list at: The Preferred Vocabulary. This suggestion sank without a trace as it was attached to some unrelated post, but it also needs rethinking as there are three distinct representations rather than just two.