<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments for Identity Resolution Daily</title>
	<link>http://identityresolutiondaily.com</link>
	<description>All About Identity and Entity Resolution</description>
	<pubDate>Mon, 15 Mar 2010 14:33:48 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2</generator>

	<item>
		<title>Comment on Is MDM Dead? by Dan Power</title>
		<link>http://identityresolutiondaily.com/730/is-mdm-dead/#comment-888</link>
		<author>Dan Power</author>
		<pubDate>Thu, 04 Mar 2010 16:42:09 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/730/is-mdm-dead/#comment-888</guid>
		<description>&lt;p&gt;I think reports of MDM's demise are greatly exaggerated! &lt;/p&gt;
&lt;p&gt;Great article, Mike - and you make some very good points about the increasing role of identity resolution within master data management. In my opinion, we'll see the next generation of MDM platforms placing an even higher emphasis on fast, accurate identity resolution. &lt;/p&gt;
&lt;p&gt;Building a master data repository with accurate, complete, timely and consistent customer data requires spot-on identity resolution capabilities. Anything less, and you're matching apples to oranges when building the "golden records" in your MDM hub - and then you'll be propagating those mistakes all over the enterprise. &lt;/p&gt;
&lt;p&gt;So it's good to see customers and other analysts realizing how important identity resolution is to MDM's future.&lt;/p&gt;
</description>
		<content:encoded><![CDATA[<p>I think reports of MDM&#8217;s demise are greatly exaggerated! </p>
<p>Great article, Mike - and you make some very good points about the increasing role of identity resolution within master data management. In my opinion, we&#8217;ll see the next generation of MDM platforms placing an even higher emphasis on fast, accurate identity resolution. </p>
<p>Building a master data repository with accurate, complete, timely and consistent customer data requires spot-on identity resolution capabilities. Anything less, and you&#8217;re matching apples to oranges when building the &#8220;golden records&#8221; in your MDM hub - and then you&#8217;ll be propagating those mistakes all over the enterprise. </p>
<p>So it&#8217;s good to see customers and other analysts realizing how important identity resolution is to MDM&#8217;s future.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Entity Resolution Metrics by John Talburt</title>
		<link>http://identityresolutiondaily.com/654/entity-resolution-metrics/#comment-718</link>
		<author>John Talburt</author>
		<pubDate>Mon, 23 Nov 2009 13:28:05 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/654/entity-resolution-metrics/#comment-718</guid>
		<description>Jim,
Thank you. That is an interesting question that I have not fully considered.  It is clear that if E is order dependent, then at most one of the partitions that it produces can be correct.  However, it is not clear whether an order-independent process E would necessarily have a better chance of producing a "more correct" partition.  My first thought is that you could construct examples that work both ways, but I will think about this further.
-jrt-</description>
		<content:encoded><![CDATA[<p>Jim,<br />
Thank you. That is an interesting question that I have not fully considered.  It is clear that if E is order dependent, then at most one of the partitions that it produces can be correct.  However, it is not clear whether an order-independent process E would necessarily have a better chance of producing a &#8220;more correct&#8221; partition.  My first thought is that you could construct examples that work both ways, but I will think about this further.<br />
-jrt-</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Measuring Entity Resolution Accuracy by Jim Zaiss</title>
		<link>http://identityresolutiondaily.com/642/measuring-entity-resolution-accuracy/#comment-716</link>
		<author>Jim Zaiss</author>
		<pubDate>Sat, 21 Nov 2009 08:51:32 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/642/measuring-entity-resolution-accuracy/#comment-716</guid>
		<description>For names that are fairly uncommon (such as my own surname), I would think that internet search engines like Google could provide a great deal of help in determining that two entity references are somewhat/very likely to be about (a) two different people or (b) the same person – when certain attributes of the person(s) in question are known.   Is this avenue pursued in Entity Resolution?  

Best,
Jim Zaiss</description>
		<content:encoded><![CDATA[<p>For names that are fairly uncommon (such as my own surname), I would think that internet search engines like Google could provide a great deal of help in determining that two entity references are somewhat/very likely to be about (a) two different people or (b) the same person – when certain attributes of the person(s) in question are known.   Is this avenue pursued in Entity Resolution?  </p>
<p>Best,<br />
Jim Zaiss</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Entity Resolution Metrics by Jim Zaiss</title>
		<link>http://identityresolutiondaily.com/654/entity-resolution-metrics/#comment-715</link>
		<author>Jim Zaiss</author>
		<pubDate>Sat, 21 Nov 2009 08:01:17 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/654/entity-resolution-metrics/#comment-715</guid>
		<description>Suppose that, for a given S, there is a process E such that each possible ordering λi always results in the same partition P.  Can we then conclude – ceteris paribus – that P is more likely to be the _correct_ partition of S than another partition P’ based on another process E’ such that the λi chosen sometimes results in a different partition of S?

Please excuse me if this is a naïve question, or if it describes a situation that rarely occurs in practice.  I’m new to this particular topic, but find it fascinating.  I’ve long been interested in personal identity issues, though mostly from a logical, metaphysical, or phenomenological perspective. 

Regards,
Jim Zaiss
AWARE Software, Inc.</description>
		<content:encoded><![CDATA[<p>Suppose that, for a given S, there is a process E such that each possible ordering λi always results in the same partition P.  Can we then conclude – ceteris paribus – that P is more likely to be the _correct_ partition of S than another partition P’ based on another process E’ such that the λi chosen sometimes results in a different partition of S?</p>
<p>Please excuse me if this is a naïve question, or if it describes a situation that rarely occurs in practice.  I’m new to this particular topic, but find it fascinating.  I’ve long been interested in personal identity issues, though mostly from a logical, metaphysical, or phenomenological perspective. </p>
<p>Regards,<br />
Jim Zaiss<br />
AWARE Software, Inc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Other Half of Entity Resolution by John Talburt</title>
		<link>http://identityresolutiondaily.com/648/the-other-half-of-entity-resolution/#comment-713</link>
		<author>John Talburt</author>
		<pubDate>Mon, 16 Nov 2009 12:09:27 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/648/the-other-half-of-entity-resolution/#comment-713</guid>
		<description>Jim,
I agree, but to quote William Munny in "Unforgiven", I think there are 3 halves to ER.  These ER activities are mining the entity references when they are embedded in unstructured data, resolving the references to the same entity (0 degrees of separation), and exploring the relationships among resolved entities.  It is true that people often confuse the entity references (labels) with the entities themselves.  Breaking ER down into these 3 activities is nice for discussion, but in practice they tend to overlap.
Thanks,
-jrt-</description>
		<content:encoded><![CDATA[<p>Jim,<br />
I agree, but to quote William Munny in &#8220;Unforgiven&#8221;, I think there are 3 halves to ER.  These ER activities are mining the entity references when they are embedded in unstructured data, resolving the references to the same entity (0 degrees of separation), and exploring the relationships among resolved entities.  It is true that people often confuse the entity references (labels) with the entities themselves.  Breaking ER down into these 3 activities is nice for discussion, but in practice they tend to overlap.<br />
Thanks,<br />
-jrt-</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Other Half of Entity Resolution by Jim Zaiss</title>
		<link>http://identityresolutiondaily.com/648/the-other-half-of-entity-resolution/#comment-711</link>
		<author>Jim Zaiss</author>
		<pubDate>Tue, 10 Nov 2009 19:59:50 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/648/the-other-half-of-entity-resolution/#comment-711</guid>
		<description>I suggest two further tweaks to the definition of entity resolution -- one to part (a) and one to part (b).   Part (a) would be better stated as:

   The capability to (a) resolve multiple labels for individuals, products, or
   other _types_of_objects_ into a single resolved entity when ambiguity
   from pseudonyms, alias names or other synonym-style constructs exist

The "multiple labels" in question are not typically labels for nouns or noun classes.  They ARE nouns, and they are labels for (i.e. they denote) objects in the world.  The original wording of (a) suffers from a use-mention confusion.

Regarding part (b) of the definition,  I think it's misleading to describe the connections of interest here as "between entities that are _two_ or more degrees of separation apart."  I am separated by two degrees from friends of my friends; I am separated by one degree from my friends; and I am separated by *zero* degrees from myself.  Since (b) is explicitly about connections between _entities_ (as opposed to, say, labels for entities), I would replace the word 'two' in part (b) with 'one'. 

Regards,
Jim Zaiss
AWARE Software, Inc.</description>
		<content:encoded><![CDATA[<p>I suggest two further tweaks to the definition of entity resolution &#8212; one to part (a) and one to part (b).   Part (a) would be better stated as:</p>
<p>   The capability to (a) resolve multiple labels for individuals, products, or<br />
   other _types_of_objects_ into a single resolved entity when ambiguity<br />
   from pseudonyms, alias names or other synonym-style constructs exist</p>
<p>The &#8220;multiple labels&#8221; in question are not typically labels for nouns or noun classes.  They ARE nouns, and they are labels for (i.e. they denote) objects in the world.  The original wording of (a) suffers from a use-mention confusion.</p>
<p>Regarding part (b) of the definition,  I think it&#8217;s misleading to describe the connections of interest here as &#8220;between entities that are _two_ or more degrees of separation apart.&#8221;  I am separated by two degrees from friends of my friends; I am separated by one degree from my friends; and I am separated by *zero* degrees from myself.  Since (b) is explicitly about connections between _entities_ (as opposed to, say, labels for entities), I would replace the word &#8216;two&#8217; in part (b) with &#8216;one&#8217;. </p>
<p>Regards,<br />
Jim Zaiss<br />
AWARE Software, Inc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Avoiding False Positives: Analytics or Humans? by Ken O'Connor</title>
		<link>http://identityresolutiondaily.com/639/avoiding-false-positives-analytics-or-humans/#comment-701</link>
		<author>Ken O'Connor</author>
		<pubDate>Thu, 15 Oct 2009 14:47:36 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/639/avoiding-false-positives-analytics-or-humans/#comment-701</guid>
		<description>Robert,

Excellent post on a highly sensitive issue.

I would like to use an analogy to highlight the intrinsic advantage analytics have over humans. 

Yahoo used to be the leading Search Engine.  Yahoo employed people to decide which websites should get the highest ranking.  In contrast, from day one Google used the "Algorithm".  Yahoo's approach was never sustainable, given the exponential growth in the number of websites.  

I have worked on the development of Anti Money Laundering (AML) systems.  AML systems perform Financial Transaction Monitoring.  They could not function without analytics.  They monitor Transaction Activity on millions of accounts. The purpose of the analytics is to identify "Transaction Activity that is unusual when compared to an account holder's peers".  The AML system alerts a human to study the unusual activity.  The human then seeks to "explain away" the unusual activity as 'normal', e.g. a one-off sale of an asset.  If the human cannot find a good reason for the unusual transaction activity, they report it to the authorities as "Suspicious". 

In my opinion, AML systems provide a good example of the pragmatic combining of analytics and humans - for the good of society.     

I completely agree with your quote "analytics are ethically neutral and the risk of something going “to the dark side” is the risk that comes from the people involved, with or without analytics."

Rgds Ken</description>
		<content:encoded><![CDATA[<p>Robert,</p>
<p>Excellent post on a highly sensitive issue.</p>
<p>I would like to use an analogy to highlight the intrinsic advantage analytics have over humans. </p>
<p>Yahoo used to be the leading Search Engine.  Yahoo employed people to decide which websites should get the highest ranking.  In contrast, from day one Google used the &#8220;Algorithm&#8221;.  Yahoo&#8217;s approach was never sustainable, given the exponential growth in the number of websites.  </p>
<p>I have worked on the development of Anti Money Laundering (AML) systems.  AML systems perform Financial Transaction Monitoring.  They could not function without analytics.  They monitor Transaction Activity on millions of accounts. The purpose of the analytics is to identify &#8220;Transaction Activity that is unusual when compared to an account holder&#8217;s peers&#8221;.  The AML system alerts a human to study the unusual activity.  The human then seeks to &#8220;explain away&#8221; the unusual activity as &#8216;normal&#8217;, e.g. a one-off sale of an asset.  If the human cannot find a good reason for the unusual transaction activity, they report it to the authorities as &#8220;Suspicious&#8221;. </p>
<p>In my opinion, AML systems provide a good example of the pragmatic combining of analytics and humans - for the good of society.     </p>
<p>I completely agree with your quote &#8220;analytics are ethically neutral and the risk of something going “to the dark side” is the risk that comes from the people involved, with or without analytics.&#8221;</p>
<p>Rgds Ken</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Applying Identity Resolution to Patient Identification Integrity by Henrik Liliendahl Sørensen</title>
		<link>http://identityresolutiondaily.com/605/applying-identity-resolution-to-patient-identification-integrity/#comment-685</link>
		<author>Henrik Liliendahl Sørensen</author>
		<pubDate>Mon, 10 Aug 2009 05:56:13 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/605/applying-identity-resolution-to-patient-identification-integrity/#comment-685</guid>
		<description>Avoiding duplicate patients may be a very different task depending on which country you are from.

In Scandinavia every citizen is assigned a unique citizen ID that is used throughout healthcare as well as in other areas such as elections, driving licenses, welfare and so on.

A recent improvement is that the ID is assigned to newborns by health care staff – as close to the root as possible, as one might put it.

More &lt;a href="http://liliendahl.wordpress.com/2009/08/05/sweden-meets-united-states/" rel="nofollow"&gt;here&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Avoiding duplicate patients may be a very different task depending on which country you are from.</p>
<p>In Scandinavia every citizen is assigned a unique citizen ID that is used throughout healthcare as well as in other areas such as elections, driving licenses, welfare and so on.</p>
<p>A recent improvement is that the ID is assigned to newborns by health care staff – as close to the root as possible, as one might put it.</p>
<p>More <a href="http://liliendahl.wordpress.com/2009/08/05/sweden-meets-united-states/" rel="nofollow">here</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Myth of Matching: Why We Need Entity Resolution by Steve Sieloff</title>
		<link>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-679</link>
		<author>Steve Sieloff</author>
		<pubDate>Mon, 13 Jul 2009 22:57:17 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/493/the-myth-of-matching-why-we-need-entity-resolution/#comment-679</guid>
		<description>John --

Another great post and on point! I find it very interesting linking "point in time" occupancies to the current state location of an entity.  Public records, while fruitful, are spotty in availability and lack many standard data quality measures.  Name distributions for a given geography (zip or zip+4) help in making links between names with materially different addresses -- Zawarek Timonsky 123 Main St and Zawarek Timonsky 456 Elm Dr in the same zip code where only one Zawarek first name is known and 3 Timonsky surnames are known ... the unique combination creates a high degree of confidence that we are talking about the same person -- even with differing addresses.

As for the example of St. in the street not always meaning Street, it is clear that the software causing the incorrect classification and standardization is not looking at both the keyword AND the pattern or semantics in which the keyword or phrase is referenced.  This type of semantic parsing and standardization is gaining traction in document classification and phrase searching (aka Google).

Keep up the thought provoking articles!

Steve</description>
		<content:encoded><![CDATA[<p>John &#8211;</p>
<p>Another great post and on point! I find it very interesting linking &#8220;point in time&#8221; occupancies to the current state location of an entity.  Public records, while fruitful, are spotty in availability and lack many standard data quality measures.  Name distributions for a given geography (zip or zip+4) help in making links between names with materially different addresses &#8212; Zawarek Timonsky 123 Main St and Zawarek Timonsky 456 Elm Dr in the same zip code where only one Zawarek first name is known and 3 Timonsky surnames are known &#8230; the unique combination creates a high degree of confidence that we are talking about the same person &#8212; even with differing addresses.</p>
<p>As for the example of St. in the street not always meaning Street, it is clear that the software causing the incorrect classification and standardization is not looking at both the keyword AND the pattern or semantics in which the keyword or phrase is referenced.  This type of semantic parsing and standardization is gaining traction in document classification and phrase searching (aka Google).</p>
<p>Keep up the thought provoking articles!</p>
<p>Steve</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on What’s the Data Quality Business Message? by Dylan Jones</title>
		<link>http://identityresolutiondaily.com/594/what%e2%80%99s-the-data-quality-business-message/#comment-678</link>
		<author>Dylan Jones</author>
		<pubDate>Thu, 09 Jul 2009 12:39:59 +0000</pubDate>
		<guid>http://identityresolutiondaily.com/594/what%e2%80%99s-the-data-quality-business-message/#comment-678</guid>
		<description>Hi Robert

I would have to agree with Ted but I think "reasonably poor" is a little gracious!

I think that part of the problem is that historically, the data quality technology companies employed a lot of very technically minded people who lacked the specific vertical experience that so many businesses need.

What I am definitely seeing in recent months is a much sharper focus on business products which don't really have a great deal of focus on DQ but underneath the covers is the same old DQ engine. A couple of vendors are doing this well but the vast majority are just pushing the tired old messages.

There is plenty of online research, for example, which demonstrates that if your message is focused on you (the company) and not you (the customer), the engagement can be lost, literally in seconds.

Looking forward to your new messaging. Why not use your community here to help you shape it?</description>
		<content:encoded><![CDATA[<p>Hi Robert</p>
<p>I would have to agree with Ted but I think &#8220;reasonably poor&#8221; is a little gracious!</p>
<p>I think that part of the problem is that historically, the data quality technology companies employed a lot of very technically minded people who lacked the specific vertical experience that so many businesses need.</p>
<p>What I am definitely seeing in recent months is a much sharper focus on business products which don&#8217;t really have a great deal of focus on DQ but underneath the covers is the same old DQ engine. A couple of vendors are doing this well but the vast majority are just pushing the tired old messages.</p>
<p>There is plenty of online research, for example, which demonstrates that if your message is focused on you (the company) and not you (the customer), the engagement can be lost, literally in seconds.</p>
<p>Looking forward to your new messaging. Why not use your community here to help you shape it?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
