Wednesday, April 28, 2010
Performance issues with in-memory DataTable Select clauses
Very often, we cache .NET DataTable objects in memory, for lookup tables or other frequently accessed data. The following MSDN article points out that using 'Select' on a DataTable can be very slow:
http://msdn.microsoft.com/en-us/library/dd364983.aspx
Snippet from the article:
"Select takes arbitrary criteria and returns an array of DataRows. Essentially, DataTable.Select has to walk the entire table and compare every record to the criteria that you passed in. "
The performance stats for LINQ in the article look very impressive. It looks like Microsoft has made a lot of optimizations in LINQ. It would be great to see the source code and understand what happens behind the scenes and why LINQ is so damn fast!
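For a quick feel of the two approaches, here is a minimal, self-contained sketch (the 'Orders' table and its columns are made up for illustration) that filters the same cached DataTable once with DataTable.Select and once with a LINQ to DataSet query:

```csharp
using System;
using System.Data;
using System.Linq;   // LINQ to DataSet also needs a reference to System.Data.DataSetExtensions.dll

class DataTableSelectDemo
{
    static void Main()
    {
        // Hypothetical lookup table cached in memory.
        var orders = new DataTable("Orders");
        orders.Columns.Add("Id", typeof(int));
        orders.Columns.Add("Amount", typeof(decimal));
        for (int i = 0; i < 100000; i++)
            orders.Rows.Add(i, (decimal)(i % 500));

        // DataTable.Select: walks every row and evaluates the filter string.
        DataRow[] viaSelect = orders.Select("Amount > 400");

        // LINQ to DataSet: the same filter expressed as a typed query.
        DataRow[] viaLinq = orders.AsEnumerable()
                                  .Where(r => r.Field<decimal>("Amount") > 400)
                                  .ToArray();

        Console.WriteLine("{0} vs {1} rows", viaSelect.Length, viaLinq.Length);
    }
}
```

For repeated lookups on the same column, setting a PrimaryKey and using Rows.Find, or searching through a sorted DataView, avoids the full-table scan altogether.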
Labels:
.NET,
LINQ,
Performance
.NET PDF creation tools
Some time back, I had blogged about the open-source tools available in Java and .NET for PDF creation. I recently came across another commercial library called "DynamicPDF" for PDF creation. This library has both Java and .NET versions of the API.
The library is easy to use and is very robust. It handles overflowing tables and text areas in PDF files very gracefully.
Another interesting library that the same vendor sells is the DynamicPDF™ PrintManager for .NET. This library makes it very simple to print PDFs and also provides callback handlers for printer error messages - all in pure managed code. Now that's what I like :)
Two open-source .NET PDF generation tools are PDFSharp and iTextSharp.
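As a taste of the open-source option, here is a minimal "hello world" sketch using PDFSharp (the output file name is arbitrary); iTextSharp offers a similarly small API for the same task.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

class PdfDemo
{
    static void Main()
    {
        // Create a one-page PDF and draw a single line of text on it.
        var document = new PdfDocument();
        PdfPage page = document.AddPage();
        XGraphics gfx = XGraphics.FromPdfPage(page);
        var font = new XFont("Verdana", 14);

        gfx.DrawString("Hello from PDFSharp", font, XBrushes.Black,
                       new XRect(0, 0, page.Width, page.Height),
                       XStringFormats.Center);

        document.Save("HelloWorld.pdf");   // arbitrary output file name
    }
}
```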
XSD to XML and vice versa
I was looking for a quick and free tool that could generate sample XML files from XSD schema files and also generate an XSD file from a sample XML file. Commercial tools such as XMLSpy and Oxygen XML provide these features, but I was looking for a free one.
First I checked out the open-source Java IDEs.
- NetBeans had a beautiful editor to visualize and edit an XSD schema in a graphical tree structure, but unfortunately it did not have the ability to generate a sample XML file from the schema or vice versa.
- Eclipse also had a visual editor and allowed visual editing of schema elements. It could generate a sample XML file from an XSD file, but had no option for the reverse, i.e. generating a schema file from a sample XML.
- VS 2008 SP1 had both options - conversion between XML and XSD. For schema files, right-click on a node in the XSD Explorer view and select "Generate XML". For XML files, select "Tools -> Generate schema" to create the XSD file. Both of these operations are very quick in Visual Studio.
- VS 2010 has extensive support for XML tooling. You get three different views for schema design, which should suffice for even complex schema definitions.
Besides these free IDEs, there are command-line tools that can be used. For example, the XSD.exe tool that ships with the .NET SDK can generate an XSD schema from a sample XML file (and classes or typed DataSets from an XSD).
There was another .NET tool that I found on MSDN for generating sample XML documents from an XSD.
Another cool Java desktop tool that supports these features is XMLSpear.
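If you prefer to do the XML-to-XSD direction programmatically rather than through an IDE, the .NET Framework exposes the same inference machinery via System.Xml.Schema.XmlSchemaInference. A minimal sketch (the input file name is a placeholder):

```csharp
using System;
using System.Xml;
using System.Xml.Schema;

class InferSchemaDemo
{
    static void Main()
    {
        // Infer an XSD from a sample XML document ("sample.xml" is a placeholder).
        var inference = new XmlSchemaInference();
        XmlSchemaSet schemaSet;
        using (XmlReader reader = XmlReader.Create("sample.xml"))
        {
            schemaSet = inference.InferSchema(reader);
        }

        // Write each inferred schema to the console (or save them as .xsd files).
        foreach (XmlSchema schema in schemaSet.Schemas())
        {
            schema.Write(Console.Out);
        }
    }
}
```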
Labels:
XML,
XML Schema
Monday, April 26, 2010
Updatable Views Vs ORM tools (ADO.NET Entity Framework)
Recently, during one of our design brainstorming discussions, our database developer was having a hard time understanding the benefit of using an ORM tool such as NHibernate or ADO.NET Entity Framework. For our example scenario, he said that he could easily create database views that would serve the same purpose - abstracting the logical schema and helping the developer avoid complex joins for simple business queries.
For the use-case we were working on, his statement made sense. But there are many cases where database views would not be the best choice. Here is a snippet from an MSDN article:
An obvious question at this point would be why not just use traditional database views for this. While database views can abstract many of the mappings, often that solution won't work for several process and functional reasons: (a) many of the views are simply too complex to be generated and maintained by developers in a cost-effective way, even for some simple conceptual to logical mappings, (b) the classes of views that have the property of being automatically updatable at the store are limited, and (c) databases for core-systems in medium and large companies are used by many central and departmental applications, and having each individual application create several views in the database would pollute the database schema and create significant maintenance workload for the database administrators. In addition, database views are limited to the expressivity of the relational model, and typically lack some of the more real-world concepts of the Entity Data Model, such as inheritance and complex types.
ADO.NET client-views work entirely on the client, so each application developer can create views that adapt the data to a shape that makes sense for each particular application without affecting the actual database or other applications. The class of updatable views supported in the Entity Framework is much broader than those supported by any relational store.
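To make the "avoid complex joins for simple business queries" point concrete, here is a rough sketch. The Order and Customer entities are hypothetical, and an in-memory list stands in for an Entity Framework context so the snippet runs on its own, but the query shape is exactly what you would write against a generated context.Orders: the relationship is traversed through a navigation property rather than an explicit join or a database view.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entities; in a real Entity Framework model these classes (and the
// context exposing context.Orders) would be generated from the EDMX mapping.
class Customer { public string City; public string CompanyName; }
class Order { public int OrderId; public Customer Customer; }

class ClientViewSketch
{
    static void Main()
    {
        // Stand-in for context.Orders so the sketch runs without a database.
        var orders = new List<Order>
        {
            new Order { OrderId = 1, Customer = new Customer { City = "London", CompanyName = "Acme" } },
            new Order { OrderId = 2, Customer = new Customer { City = "Pune",   CompanyName = "Initech" } }
        };

        // The Order -> Customer relationship is part of the conceptual model, so the
        // business query traverses a navigation property instead of writing a join
        // or relying on a database view.
        var londonOrders = from o in orders
                           where o.Customer.City == "London"
                           select new { o.OrderId, o.Customer.CompanyName };

        foreach (var row in londonOrders)
            Console.WriteLine("{0} - {1}", row.OrderId, row.CompanyName);
    }
}
```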
Profilers for .NET
Way back, I had blogged about the profiling capabilities of VSTS 2008 and NuMega DevPartner Studio.
I came across this site that lists most of the .NET profiling tools available on the market:
http://sharptoolbox.com/categories/profilers-debuggers
I have had good experience with .NET Memory Profiler and ANTS Profiler.
Both of them allow you to collect stats on elapsed time across the entire call stack, and both let you walk the heap and check the effect of GC runs across heap generations.
Labels:
.NET,
Performance
Wednesday, April 21, 2010
Referential Integrity across databases in SQL Server
Having worked on Oracle databases for a long time, I was quite comfortable with the idea of maintaining RI across different schemas in the same database. For example, your database may have a schema containing master tables, and your transaction schema references the lookup data in those master tables.
But to my surprise, prior to SQL Server 2005, it was not possible to have separate schemas in SQL Server. This meant that there was no easy way to maintain RI across databases in SQL Server. The only options were messy triggers or a CHECK constraint backed by a UDF (user-defined function).
Maintaining RI across databases is still not possible in SQL Server, but SQL Server 2005/2008 added the concept of schemas within a database; i.e. it is possible to have multiple schemas in each database, and each schema offers the same logical separation that a separate database would. We can also place a schema's tables on a separate filegroup or a separate disk, thus maximizing performance. In SQL Server 2005/2008, it is possible to have RI constraints across schemas, and I think this is the best design approach to take. If it is not possible to have your master data in the same database, then the second-best approach is to use replication to get the master data into your database.
Tuesday, April 06, 2010
What is Micro-Architecture in software design?
Recently I came across the term 'Micro-Architecture' on a number of pages on Sun's site. Sun's engineers have an interesting definition for this term. They refer to Micro-Architecture as a composition of a set of patterns that can be used to realize a subsystem.
In the book - "Core J2EE patterns", Micro-Architecture is described as below.
We define micro-architecture as a set of patterns used together to realize parts of a system or subsystem. We view a micro-architecture as a building block for piecing together well-known, cohesive portions of an overall architecture. A micro-architecture is used to solve a coarser-grained, higher-level problem that cannot be solved by a single pattern. Thus, a micro-architecture represents a higher level of abstraction than the individual patterns.
Labels:
architecture,
Java