Answered JOIN-based filtering in v3?

  • 29 sierpnia 2010 19:20
     
     
    We currently use merge replication, but want to move away from it due to its complexity, some of its limitations, and most of all its lack of support in SQL Azure.

    In merge replication, a table is normally filtered in one of two ways:
    1. As a parameter-based WHERE clause or
    2. As a JOIN from another replicated (already filtered) table

    In both cases you can essentially specify the actual T-SQL used in the process.

    It's clear to me that the Sync Framework (or rather, the Sync Service - there doesn't seem to be much of a framework on the server side yet) as put forth in this CTP supports the former, but what about the latter?

    As far as I can tell, neither the docs nor the samples illustrate joining entities in the sync schema, i.e. filtering a table based on included rows in another table. For example, say I want to sync all Orders belonging to the current user, and their respective OrderLines. Most databases would not have a user ID column on the OrderLines table, only on the Orders table, and filter the OrderLines table by joining from the Orders table.

    Thankful for any light shed on this!
    Daniel Stolt, Perceptible, http://www.perceptible.net

Wszystkie odpowiedzi

  • 30 sierpnia 2010 07:12
     
     
    Currently joining related tables (either to sync or for filtering) is not supported. There is another thread Syncing related tables which gives a suggestion for a hackish work arround.
  • 30 sierpnia 2010 19:22
    Moderator
     
     

    Parameterized WHERE clauses are supported in the current backend provider but the concept of JOIN filters is not supported. SyncFx works on picking what it thinks is not known to the remote client. The provider works off concept of partitioning rows based on the filter value but it does not support partition realignments. For instance a row which was in your partition moves to a different partition does not send a tombstone for the old partition.

    About the customized JOIN queries, this also is not supported. The current SyncFx sql provider uses a custom side table based tracking and hence the JOIN is based on the base table and its side table. The Fx treats each table independently and hence there is no provision to pick rows based on another table selection. If Sql-to-Sql is your goal then you should look at the Data Sync Service which is purelly for synchronizing on premise SQL skus and Sql Azure. OData-Sync is to enable building Offline apps on client platforms that dont support the sync fx (like iPhone, Andriod, WM 6.X and Silverlight).

    Let me know if this helps. Link for Data Sync is http://blogs.msdn.com/b/sync/archive/2010/07/07/introduction-to-data-sync-service-for-sql-azure.aspx


    Maheshwar Jayaraman - http://blogs.msdn.com/mahjayar
  • 30 sierpnia 2010 21:11
     
     

    Thanks for responding. I want to make it very clear that we are not targeting SQL-to-SQL sync - our scenario is to sync between Windows Mobile 6.x and SQL Compact 3.5 SP2 on the client, and SQL Server 2008 R2 on the server. As a side note, we are hoping to move the server pieces to SQL Azure in the near future, which is why we are so interested in SyncFx (merge replication is not available on SQL Azure).

    The fact that we sync to mobile devices is the very reason why JOIN-based filtering is so important. Device connectivity is mostly based on 3G or GPRS and so the bandwidth is a limiting factor. Local device storage space is a limited resource as well. Therefore, syncing only the rows that are required for the current user at the current time is absolutely essential, and JOIN-based filtering is the only feasible way to accomplish this in a lot of cases.

    Several workarounds might be considered, but they all fail at some point:

    - To add a UserId column to every table is simply not a practical possibility because we are dealing with table graphs that consist of up to 8-10 relationships from the table that actually contains the UserId column. For some tables it's not even a theoretical possibility because the filtering rules are such that one row could be applicable for multiple users.

    - To filter each table using a WHERE clause that traverses the relationships (using WHERE EXISTS and subqueries) all the way to the "root table" quickly becomes infeasible from a performance standpoint. With a table graph consisting of 8 relationships, this means performing 36 filtering operations instead of 8, 28 of which are of course completely redundant. Also, depending on the change tracking used, changes made to tables other than the "leaf table" might not be tracked properly.

    In summary, unless you can point me to some other way of dealing with this, it's probably safe to say that lack of support for JOIN-based filtering is a show stopper for us. My guess is that statement will hold true true for a lot of other customers too. To be quite honest (and I don't mean to be rude here, just to honestly voice my opinion) it puzzles me greatly that you think anyone will be able to use SyncFx for any real purpose without it; using foreign key relationships in databases can't be all that uncommon a scenario? Is there any way you might consider adding support for this?

     


    Daniel Stolt, Perceptible, http://www.perceptible.net
  • 31 sierpnia 2010 20:16
    Moderator
     
     

    We have two threads for JOIN filters and Merge-SyncFx comparison. I posted my comments on the http://social.msdn.microsoft.com/Forums/en-US/synclab/thread/6bdbdfaf-ba37-4d65-a72e-46660ada2714 thread.

    In summary, JOIN filters is not a supported feature of SyncFx (in addition to many other Merge features) and is not in consideration for the near future.


    Maheshwar Jayaraman - http://blogs.msdn.com/mahjayar
  • 2 września 2010 00:33
     
     

    OK, thanks for clarifying that. I realize I may have given the impression that I want a full merge rep replacement, but this is not the case. I simply have some concerns regarding the practical implications of the filtering model specifically. I would very much appreciate some guidance on how we might be able to handle filtering efficiently given our scenario.

    Given that JOIN-based filtering is not in consideration, what is your recommendation in situations like ours, where we have:

    • Long table relationship chains (8 - 10 relationships from where the filtered column naturally exists)
    • Large tables where only a small number of rows are applicable to any individual client
    • Limited client bandwidth and client storage
    • Some tables where one row is applicable to multiple clients

    Thankful for any guidance you can share on this.


    Daniel Stolt, Perceptible, http://www.perceptible.net
  • 2 września 2010 17:26
    Moderator
     
     
    - Long table relationship chains (8 - 10 relationships from where the filtered column naturally exists) What exactly are you trying to achieve here? Is it similar to logical records? Or do you want some sort of entitiy mapping where you send the complete graph even if a small sub table changes? Based on your reply I will give more clarifications. - Large tables where only a small number of rows are applicable to any individual client This is currently supported. You can achieve this by specifying filter values to the SyncScope which horizontally partitions your table based on the client. SyncFx synchronizes SyncScopes which by definition is the set of tables that are a client is synchronizng as a logical unit. - Limited client bandwidth and client storage SyncFx V3 for SL and WP7 uses an Isolated Store based persistence which requires keeping all client data in memory for efficient change tracking, sync and optimistic concurrency detection. This does not scale well with very large amounts of client data. For Windows Mobile 6.X, we shipped a sample demonstrating how to use Sql Compact as the client database. This method does scale well with data size but not applicable on Silverlight and WP7 clients. The protocol does support batching and JSON payloads so it ensures that clients continue to make incremental progress over a small/unreliable bandwidth. JSON ensures that data payload size is smaller and with HTTP compression you can get further payload size reductions. - Some tables where one row is applicable to multiple clients This is also supported. We call this Scope with row overlap where a row from a table belongs to multiple SyncScope. Change tracking keeps track of this to ensure that changes made in one scope is propogated to clients on other scopes.
  • 3 września 2010 15:41
     
     

    I will try to explain more clearly what I'm trying to accomplish.

    The issue we're discussing is lack of JOIN-based filtering. The reason I mention things like long table chains, large tables, limited bandwidth and so forth is merely to highlight that some of the obvious workarounds do not work.

    Consider the following (fictitious) scenario:

    1. Table CustomerGroups contains groups of customers. Users have access to one or more of these groups. 100 rows.
    2. Table Customers contains customers. Each customer belongs to one customer group. 100 x 100 rows.
    3. Table Orders contains orders. Each order belongs to one customer. 100 x 100 x 100 rows.
    4. Table OrderLines contains order lines. Each order line belongs to one order. 100 x 100 x 100 x 100 rows.

    When this data is synced to a client, it needs to be filtered to only the information the user needs. Let's say the user has access to only one customer group; we then need to sync 1 row from CustomerGroups, 100 rows from Customers, 10000 rows from Orders and 1000000 rows from OrderLines.

    Filtering the CustomerGroups table is simple; just filter based on which customer groups the current user has access to. But how do we filter the others? Obviously, downloading all the rows from Customers, Orders and OrderLines is infeasible - the last table alone would account for 100 million rows (the reason why I mention limited resources and bandwidth as a parameter).

    The scenario is this: all tables need to be filtered on what the user has access to, but this is only specified on CustomerGroups - the other tables need to be filtered based on their relationship with CustomerGroups. As you explained, this is not supported because filtering in SyncFx operates on each table separately.

    My questions are these: Given the fact that almost all databases are structured in this way (normalization), how do you recommend that we use SyncFx in this scenario ? How, in your opinion, should we approach this problem? Should we think about this in a whole different way? If you had this database and this application, how would you implement sync based on SyncFx? What is your suggested alternative to JOIN-based filtering?

    I am aware of some suggested workaround (as I outlined in my second posts) but I believe that, even in this relatively common scenario, they cannot be successfully employed.


    Daniel Stolt, Perceptible, http://www.perceptible.net
  • 6 września 2010 07:32
     
     
    I just wanted to throw in that i am very much interested in this thread, too, since i have exactly the same scenario and the suggested workarround is only good for a proof-of-concept.
  • 6 września 2010 08:10
     
     

    Thanks for pitching in, M.Bi! I'm so relieved to hear that someone besides me understands the scenario and perceives this as a concern!


    Daniel Stolt, Perceptible, http://www.perceptible.net
  • 7 września 2010 17:07
    Moderator
     
     

    I understand the requirement but unfortunately there is no way this is going to be addressed in this release. I will pass this feedback back to the team.


    Maheshwar Jayaraman - http://blogs.msdn.com/mahjayar
  • 7 września 2010 18:44
     
     

    Yes, Maheshwar, you've made that clear, thank you.

    That's not what I'm asking though. What I'm asking is: what do you suggest as an alternative approach ?

    I would very much appreciate if you or someone else on your team could provide some guidance by addressing these questions, as stated in my previous post:

    "My questions are these: Given the fact that almost all databases are structured in this way (normalization), how do you recommend that we use SyncFx in this scenario ? How, in your opinion, should we approach this problem? Should we think about this in a whole different way? If you had this database and this application, how would you implement sync based on SyncFx? What is your suggested alternative to JOIN-based filtering?"

    Thanks in advance!


    Daniel Stolt, Perceptible, http://www.perceptible.net
  • 8 września 2010 17:51
    Moderator
     
     Odpowiedz

    And I am trying to deftly avoid that question:)

    For us to provide you a workaround we need to verify internally that the workaround would work. Again this would be a very limited solution which does will not support partition realignments as we need more information and its quite hard to get right without being part of your DML triggers. One customer who did have similar shape to you used the recommendation I gave earlier about adding the rootid to all dependent tables. They added the groupid to all the dependent tables (customer, order and orderlines) and used that as the filter column for that table. In addition they didnt have moves which means once a customer is assigned to a group it never changes. So if your scenario has employees moving groups then on a move they must clear their client cache and reinit which will get the new details down.

    This also means the client must know the groupId before it goes to the server for the first sync.

     


    Maheshwar Jayaraman - http://blogs.msdn.com/mahjayar
  • 26 września 2010 16:17
     
     
    We have a similar situation with MergeReplication where it was like a JOIN filters with 4-5 levels depths with big number of records involved. I must say that we had big problems: performance, filter just not work properly as it should, on client side was not expected data because of a certain FK etc ...and a solution we adopt was adding a hostName column to all tables that need to be filterd(out filter was based on hostName). In your case you can add as (Maheshwar says) GroupID to all tables involved and then this groupID even your Business needs require it dynamic, you can fill by a ScheduledJob that RUN a SP for example. So your filter is not based on JOINS but based on GroupID for each table. I must say that is even for MergeReplication a solution(better) then deep JOIN filters.
    Cristi
  • 21 października 2010 22:42
     
     

    I've got a very similar scenario as well.  Around 200 tables, of which probably 150 had to be filtered based off a parent in the hierarchy (actually a couple different hierarchies.)

     

    What I ended up doing was make my own metadata model that sits on top of the provisioning for 2.1 lacking join filters.  It's not what I would consider ideal but in a simplified manner:

     

    SyncTable: table to maintain information about the specific db table, it's insert order etc.

    SyncTableColumnFilter :  child of synctable that has a filter type, and the sql where claus to join back up to the parent element that gets filtered.  The sql text has tokens to be replaced by parameters during provisioning (gross I know, but it works.)

    SyncScope: table to maintain a manifest of scopes that the software supports for different scenarios.

    SyncScopeTable: child of sync scope that associates back up to the sync table.

    SyncContract: has an fk to the scope, as well as contents of the parameters I tokenized in the SyncTableColumnFilter.  In my case there's actually multiple potential parents allowed in the sync so those actually get contained in child tables of the SyncContract.

     

    During provisioning if my contract has parent parameters to filter by for each table I grab the associated filter sql test, filter column and the arguments associated with the synccontract and build the appropriate sql string for the column in question.  That then in turn gets generated into the sprocs during the provisioning (which actually helps you find the bugs in the sql for your filters :) .)  In cases where multiple filter types are present the filters get and'd together so it has to meet all the conditions.

     

    It's pretty convoluted (and the actual implementation is a bit more so) but it's flexible for extending the code down the line.  What I dislike about it is the fact that I've got to deprovision/provision the db if any changes are made to the filters.  I kept hoping there'd be a better way down the line but I haven't run across one that would allow multiple parents etc.

    Seems like you could just pull in the parents you want and have some logic traverse the relation constraints as child records got inserted.  Like a kinder gentler constraint exception for the purposes of filtering the data.  I thought about trying to find a way to put the local parent in the query and see if the relation's fk was present for the child to determine if a child should be pulled in but never looked deeper into it.  Seems like there's got to be a better solution than doing where clauses all the way down the tree.

     

     

     

  • 23 października 2010 13:15
     
     

    Gizmo,

    Very interesting - thanks for sharing the details! I'll discuss this with the rest of the team.

     


    Daniel Stolt, Perceptible, http://www.perceptible.net
  • 6 września 2012 17:36
     
     

    Hi Daniel,

    I'm having the same problem that you had, can you point me into the right direction?

    Thank you 

  • 6 września 2012 20:38
     
     

    Hi Luiz,

    I'm afraid not. We abandoned this technology stack altogether because of all these limitations compared to merge replication. Sorry.


    Daniel Stolt, Perceptible, http://www.perceptible.net

  • 7 września 2012 16:38
     
     

    Thank you Daniel.

  • 3 grudnia 2012 17:43
     
     

    Daniel,

    If you don't mind my asking, what did you and your team end up doing? We're in a very similar situation right now; using merge replication with merge filters, running software on Windows Mobile 6.x devices with SQL Server Compact, looking to go to Windows Azure. Obviously we'll need to move away from merge replication and SyncFx looked promising, but the row filtering problem is something we'll need to tackle.

    Any insights would be greatly appreciated.

  • 14 lutego 2013 12:25
     
     

    Hi Daniel,

    First I want to thank you for this interesting thread, I am facing the same issue I am using Merge replication now but I need to use another synchronization technology as merge only works with SQL CE and I need to synchronize with SQL lite as well.

    So I am wondering if you have found replacement for merge replication as sync framework has so many limitations and issues ?

    Thanks in advance

  • 14 lutego 2013 12:35
     
     

    Hi Eslam,

    No, we found no viable replacement, so we ended up abandoning Windows Mobile as a client technology stack completely because Microsoft totally dropped the ball on these kinds of scenarios.

    Thanks!


    Daniel Stolt, Perceptible, http://www.perceptible.net

  • 14 lutego 2013 12:42
     
     
    Thanks a lot for you fast reply Daniel.