Thursday, March 27, 2025

Data Product vs Data as a Product - can they be used interchangeably?

The concept of data products has been around for some time. They give agility to end users by putting curated data right into their hands, enabling quick insights and faster decision making. Once a business need has been identified and the right product has been curated for that need, it can continue to serve that need for as long as other dynamics don't change, thus improving reusability.

Putting data products on the enterprise data marketplace aids the amazonification of data across that enterprise. It becomes that much easier to make data discoverable and accessible, and thus truly empowers the end user by enabling data democratization. It also makes it easy for the organization to monetize its data by selling it to internal or external customers.

So that's what data products are: tangible items (mostly views or database tables) emanating from a lake or a warehouse, resulting from a process that includes, but is not limited to, the activities below.

  • Understanding the specific needs of end users for forecasting, analytics, trend analysis, reporting, etc.
  • Triaging the needs from different business users
  • Prioritizing the critical ones
  • Understanding which attributes, from which sources, would be needed to serve these needs
  • Development activities for sourcing data components to create these data products
  • Putting in place data governance keeping in mind needs of the business users
  • Access provisioning

To visualize this better, let us look at some examples of data products:

  • A RAG dashboard of KPIs
  • Running comparison of Q-o-Q sales
  • Faster route available notification on Google Maps

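To make the first two examples a bit more concrete, here is a minimal sketch of a Q-o-Q sales comparison published as a reusable view, the kind of tangible item consumers would query directly. It uses an in-memory SQLite database, and the table and column names are purely hypothetical:

import sqlite3

# Minimal sketch: a quarter-over-quarter sales comparison exposed as a view.
# The table and column names (sales, order_date, amount) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01-15", 120.0), ("2024-02-10", 80.0), ("2024-04-05", 150.0)],
)

# The curated data product: consumers query the view, not the raw table.
conn.execute("""
CREATE VIEW qoq_sales AS
SELECT strftime('%Y', order_date) AS year,
       (CAST(strftime('%m', order_date) AS INTEGER) + 2) / 3 AS quarter,
       SUM(amount) AS total_sales
FROM sales
GROUP BY year, quarter
""")

for row in conn.execute("SELECT * FROM qoq_sales ORDER BY year, quarter"):
    print(row)   # e.g. ('2024', 1, 200.0), ('2024', 2, 150.0)

The point is simply that the consumer works against the curated view, not the underlying raw table, which is what makes the product reusable.
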
Now that we are clear on what data products are and how they can be useful to the enterprise, let us look at data as a product. I have seen these two terms used interchangeably even though there are subtle differences.

Data as a product is a concept or an approach to treating data. Data has always been used for reporting and analytics purposes, and even though the idea was always to eventually improve bottom lines and top lines by putting it to use, data itself was not perceived as a commodity which could be branded, packaged and monetized. This realization is now hitting our customers.

To draw an analogy: when plastic was first discovered, everybody knew that it possessed great potential and was definitely valuable. But it was only much later, when plastic-based products began to be manufactured and their utility was felt in literally every sphere of life, that the true value of plastic was realized.

The data-as-a-product approach created a situation where data, on its own strength, acquired monetary value, and it gave rise to entire industries. Life Science companies often buy data from various sources to enhance their research, drug discovery and product development processes.

Netflix collects data on viewing habits from its subscribers to improve its recommendation engine. Starbucks uses data from its loyalty card program and mobile app to analyze customer purchase behaviour.

If an organization needs to create a data product which can be consumed as is by an internal or external user, it obviously needs to be of high quality; there shouldn't be an alternate view that conflicts with the truth it presents (single source of truth); it needs to be well governed; lineage and traceability of its attributes have to be spot on; latency has to be optimal; and observability along the assembly line has to be of the highest order.

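As a purely illustrative sketch (the record layout, business key and checks below are assumptions, not any particular platform's API), even a couple of lightweight checks run before publishing help enforce the quality and single-source-of-truth expectations described above:

# Minimal, illustrative pre-publication quality gate for a data product.
# The record layout, key field and required attributes are hypothetical.
records = [
    {"customer_id": "C001", "region": "EU", "q1_sales": 200.0},
    {"customer_id": "C002", "region": "EU", "q1_sales": None},
    {"customer_id": "C002", "region": "US", "q1_sales": 90.0},
]

def run_quality_checks(rows, key="customer_id", required=("region", "q1_sales")):
    issues = []
    # Completeness: required attributes must not be null.
    for i, row in enumerate(rows):
        for field in required:
            if row.get(field) is None:
                issues.append(f"row {i}: missing {field}")
    # Uniqueness: the business key should identify a single row (single source of truth).
    keys = [row[key] for row in rows]
    duplicates = {k for k in keys if keys.count(k) > 1}
    issues.extend(f"duplicate key: {k}" for k in duplicates)
    return issues

problems = run_quality_checks(records)
if problems:
    print("Blocked from publishing:", problems)
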
While the importance of all these supporting functions was always well known, the apprehension that the final product may be rejected due to one or more issues along the data pipeline makes enterprises lay more emphasis on these supporting functions, and thus drives up the commitment to output a great data product. Netflix, for example, knows that it can potentially lose customers if its recommendation engine malfunctions.

In summary, a data product is a tangible commodity built for a specific business use, while data as a product is an org-wide mindset for the creation of good data products.

Tuesday, July 26, 2022

An Introduction to the AWS Well Architected Framework

 

AWS's Well Architected Framework contains guidelines and signposts for putting together efficient, cost-effective, robust infrastructure for greenfield implementations, as well as for evaluating existing cloud environments (via the Well Architected Tool). Within this framework, AWS also provides domain-specific manuals and whitepapers for building well-architected solutions for industries like gaming, SAP and streaming media. These focus on nuances that need to be kept in mind while building infrastructure specific to those use cases.

AWS provides a commitment that applications built on infrastructure adhering to the principles defined in the Well Architected Framework will stand up to scrutiny on multiple industry standard benchmarks.

 

The AWS Well Architected Framework is built on six pillars:

  • Operational excellence 
  • Security
  • Reliability
  • Performance Efficiency
  • Cost Optimization
  • Sustainability
Let's look in detail at what each of these means.

    Operational Excellence

    Operational excellence can be imagined as the ability to create and operate an environment hosting the applications and workloads with enough levers built in which can be tweaked for optimization. AWS recommends that, in order to limit human error, we use code for setting up, running and automating the environment. The best practice is to keep refining operating procedures frequently and to arrive at the best possible combination of the provisioned, tweakable parameters for a given set of requirements. It is also advisable to make small, incremental changes to the parameters which can be rolled back in case the desired results are not achieved, thus keeping adverse impact on customers down to a bare minimum.

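As a minimal sketch of the "operations as code" idea (the stack name, bucket name and region below are placeholders), the environment can be described in a version-controlled template and provisioned programmatically rather than by hand, for example with CloudFormation via boto3:

import json
import boto3

# Hypothetical example: provision an S3 bucket through CloudFormation so the
# environment is defined in reviewable, repeatable code rather than manual steps.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AppLogsBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "example-app-logs-bucket"},  # assumed name
        }
    },
}

cloudformation = boto3.client("cloudformation", region_name="us-east-1")
cloudformation.create_stack(
    StackName="example-ops-as-code-stack",
    TemplateBody=json.dumps(template),
)
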
     

    One should always aim to improve the overall operational health of the system. But in order to improve, one should be able to gauge where it is currently. In other words, one should be able to measure. It is not possible to measure without defining KPIs. KPIs need to be defined based on business outcomes and customer outcomes. Customer outcomes and business outcomes are defined based on business priorities. So, working our way backwards, in order to improve the overall operational health, it is very important to understand business priorities, and then go from there

    These priorities should be the driver for the environment setup. Obviously, business priorities change, and the environment should be flexible enough to adapt. Identify touch points within the environment ecosystem which should be amenable to change as business priorities change.

    Let us see an example. A B2C website may have an extremely fast backend RDS system linked to it, since the business priority during normal working hours is speed of response to user clicks. But during off-peak hours, when the computational load of the day's business operations on the RDS becomes high, the business priority may no longer be the same. The hosting cloud environment must be flexible enough to adapt itself when such business priorities change.

    With the Cloud, it is relatively easy to collect stats on how architectural decisions affect workload behaviour. Unlike with traditional data centers, this data makes it possible to change the environment to improve workload performance. In other words, it is important to have levers built into the system which can be tweaked to improve operational efficiency.

    It is important to anticipate failures. Think of what-if scenarios and get a good understanding of what the impact on business will be in each of these scenarios. Also come up with response strategies and test them out to ensure they will be effective in real-life situations

    Let us try to understand this with an example. It is normal to see workload volumes increasing during certain predetermined hours or days. After configuring your system to scale up when the workload increases, simulate a failure scenario in which auto scaling doesn't happen in response to the increased workload. What would the response of admin personnel be in this situation? There needs to be a clearly chalked-out SOP for when a situation like this is encountered, and it should be tested to confirm it works.

     

    To the extent possible, always automate the response to an event. AWS provides multiple ways to do this. CloudWatch is a service which lends itself admirably to these use cases; CloudWatch event rules and CloudWatch alarms are examples that can be leveraged here, as in the sketch below.

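For instance, here is a minimal, hypothetical sketch (the rule name, SNS topic ARN and the event of interest are assumptions) of a CloudWatch Events rule that notifies an operations topic whenever an EC2 instance enters the stopped state:

import json
import boto3

# Hypothetical sketch: raise a notification whenever an EC2 instance enters
# the "stopped" state, so the response to the event is automated, not manual.
events = boto3.client("events", region_name="us-east-1")

events.put_rule(
    Name="notify-on-ec2-stopped",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    }),
    State="ENABLED",
)

events.put_targets(
    Rule="notify-on-ec2-stopped",
    Targets=[{"Id": "ops-topic", "Arn": "arn:aws:sns:us-east-1:123456789012:ops-alerts"}],
)
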
     

    Security

    Although this term sounds self-explanatory, this is probably one of the most challenging pillars to implement, especially in industries where data is sensitive.

    Often, security issues happen due to lack of understanding and implementation of the most basic best practices. Before attempts are made to implement complex and fancy security procedures, ensure that some of the more fundamental, common sense security best practices are taken care of. This can potentially address most of the concerns

    One of the top things to keep in mind while implementing security for data, systems and assets is that security needs to be implemented at multiple levels and layers. For your use case(s), figure out what these levels and layers are.

    Both data at rest and data in motion need to be protected. Data in motion can be flowing between AWS touchpoints, or from on-prem to the cloud or vice versa. Encrypting data in motion can be a challenging task. AWS provides multiple options for encrypting data in motion, and the best practices are covered under "Protecting data in transit" below.

    Principals need to be given permissions based on the least privilege principle, and separation of duties needs to be enforced.

    Reduce/eliminate reliance on long term static credentials. 

    The security environment must support "time travel", i.e. the ability to pinpoint which principal executed a particular command.

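CloudTrail is a natural fit for this kind of look-back. Below is a minimal sketch (the event name and time window are illustrative) of finding out which principal terminated instances in the last day:

from datetime import datetime, timedelta
import boto3

# Hypothetical sketch: use CloudTrail to look back in time and identify
# which principal executed a sensitive command (here, TerminateInstances).
cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

response = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName", "AttributeValue": "TerminateInstances"}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
)

for event in response["Events"]:
    print(event["EventTime"], event.get("Username"), event["EventName"])
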
    It is very important to have an incident management system in place. In spite of good security mechanisms, there will be incidents. Teams must be able to isolate systems under attack at very short notice.

    Automate the incident response

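One common containment step that can be automated is moving a compromised instance into a "quarantine" security group that allows no traffic. A minimal sketch, with placeholder instance and group IDs:

import boto3

# Hypothetical sketch: isolate a suspect EC2 instance at short notice by
# replacing its security groups with an empty "quarantine" group.
ec2 = boto3.client("ec2", region_name="us-east-1")

QUARANTINE_SG = "sg-0123456789abcdef0"    # a group with no inbound/outbound rules
SUSPECT_INSTANCE = "i-0123456789abcdef0"  # placeholder instance

ec2.modify_instance_attribute(
    InstanceId=SUSPECT_INSTANCE,
    Groups=[QUARANTINE_SG],
)
print(f"Instance {SUSPECT_INSTANCE} isolated into {QUARANTINE_SG}")

Keeping a step like this scripted means the isolation can be triggered in seconds rather than waiting on manual console work.
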
    Run simulations of security breaches and have the appropriate team detect breaches in the minimum time possible. 

    The source of threats to the environment may vary depending on the industry the customer operates in. So, it is important to be extremely conversant with the potential threats unique to that industry and to tailor a security strategy that addresses them.


    Some of the important services that can be leveraged to implement security across different areas are as follows:

      • Data protection: EBS, S3, RDS, KMS, CloudHSM
      • Privilege management: IAM, MFA tokens, permissions, roles
      • Infrastructure protection: VPC, WAF, Shield, CloudFront, Route 53
      • Detective controls: CloudTrail, Config, CloudWatch

    Protecting data in transit:

    ·       Use utilities like AWS PrivateLink to create a secure and private network connection from an AWS VPC or an on-prem installation to AWS-based services. With PrivateLink, traffic stays on the Amazon backbone and doesn't traverse the internet, which keeps it safe.

    ·       Use tools like GuardDuty to automatically detect attempts to move data outside of defined boundaries

    ·       Encryption in transit can be enforced in AWS. AWS services provide HTTPS endpoints using TLS for communication, thus providing encryption in transit when communicating with the AWS APIs. It is also possible to use VPN connectivity into a VPC from an external network to encrypt data in transit. A sketch of enforcing TLS on an S3 bucket follows this list.

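As referenced above, here is a minimal sketch (the bucket name is a placeholder) of enforcing encryption in transit on an S3 bucket by denying any request that does not arrive over TLS:

import json
import boto3

# Hypothetical sketch: deny any S3 request to this bucket that is not made
# over TLS, so encryption in transit is enforced rather than merely available.
BUCKET = "example-sensitive-data-bucket"   # placeholder bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyInsecureTransport",
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
        "Condition": {"Bool": {"aws:SecureTransport": "false"}},
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
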
     

    Reliability

     

    Reliability is the ability of a workload to perform its intended function correctly and consistently when it’s expected to, throughout its lifecycle

    ·       No matter how robust an environment is, it can buckle under unexpected load. A reliable system must be able to recover from such failures automatically. Taking this to the next level, the system should have the intelligence to anticipate failure and automate the appropriate response.

    ·       The required percentage availability of an application is dictated by its functionality. Applications with critical functionality may need to be made highly available in multi-AZ mode, whereas other applications can be made highly available within a single AZ. This applies equally to front-end applications and to backend apps like databases.

    ·       Be very clear about RPO (Recovery Point Objective) and RTO (Recovery Time Objective) needs and use this information to build in reliability.

    ·       In addition to verifying that the workload works in best-case scenarios, conduct negative testing to simulate scenarios that would cause the workload to fail. This gives you the opportunity to test recovery procedures.

    ·       Where possible, replace single large resources with multiple smaller resources. More importantly, ensure they don’t share a single point of failure

    ·       Simulate failures and define SOPs in order for applications to be brought back live in case of failures

    ·       Having a good monitoring system is essential for a reliable system

    ·       Use logs and metrics wisely. Very often, logs and metrics tell a story on how your environment is being utilized and under what load. So, carry out periodic analysis on them and take appropriate action

    ·       Monitoring + Alerting + Automation = Self Healing. For example, CloudWatch and Auto Scaling can be used together to recover from failed EC2 instances (see the sketch after this list).

    ·       It is relatively easy to automate the system in such a fashion that it reacts to certain trigger events. Leverage this to get the environment to auto correct itself when faced with an imminent failure

    ·       Backups. Depending on the criticality of the data, RPO and RTO requirements, and how the data is used by its applications, one needs to come up with an appropriate backup strategy, including the frequency at which data is backed up. Not only does data need to be backed up, it is equally important to ensure that it can be restored, to keep the time it takes to restore it down to a minimum, and to verify that the restored data is in a usable state.

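To make the "Monitoring + Alerting + Automation = Self Healing" idea concrete, here is the sketch referenced in the list above: a minimal, hypothetical CloudWatch alarm (instance ID and region are placeholders) that automatically recovers an EC2 instance when its system status check fails:

import boto3

# Hypothetical sketch: recover an EC2 instance automatically when the
# underlying system status check fails for two consecutive minutes.
REGION = "us-east-1"
INSTANCE_ID = "i-0123456789abcdef0"   # placeholder instance

cloudwatch = boto3.client("cloudwatch", region_name=REGION)
cloudwatch.put_metric_alarm(
    AlarmName=f"auto-recover-{INSTANCE_ID}",
    Namespace="AWS/EC2",
    MetricName="StatusCheckFailed_System",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    Statistic="Maximum",
    Period=60,
    EvaluationPeriods=2,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=[f"arn:aws:automate:{REGION}:ec2:recover"],
)
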
    Performance Efficiency

                 

    The Performance Efficiency pillar includes the ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve. As with the other pillars, there are a few things to keep in mind here:

    ·       XaaS. These days many technologies can be consumed as a service, which means we do not have to install and administer them ourselves. The service provider is an expert in ensuring the optimal efficiency of the service; leverage that to the extent possible.

    ·       Go serverless. As a corollary to the above principle, it makes sense to leverage serverless compute and serverless storage in order to avoid provisioning them ourselves. However, there may be a tradeoff on cost and one would need to balance performance efficiency against it

    ·       Having made these choices, it is important to review them periodically and check whether there are variances from expected performance. If there are, those need to be addressed.

    ·       Usually, there will be multiple services for similar use cases. Understand which service(s) best fits your specific use case. Consult AWS if needed

    ·       When trying to improve performance efficiency, work against a specific target and benchmark existing efficiency for a given set of parameters. This will let you measure improvements objectively

    ·       Work against permissible cost for your set of requirements

    ·       Analyze metrics and access patterns, and choose storage and compute options based on these (see the storage lifecycle sketch after this list).

    ·       Network parameters usually will have a big impact on performance and efficiency. Study this closely and make appropriate configuration changes

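Here is the storage lifecycle sketch referenced above: a minimal, hypothetical example (bucket name, prefix and transition days are assumptions) of matching S3 storage classes to observed access patterns:

import boto3

# Hypothetical sketch: move infrequently accessed log objects to cheaper
# storage classes as they age, based on observed access patterns.
s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-analytics-logs",          # placeholder bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-down-old-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }]
    },
)
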
     

    Cost Optimization

    Cost optimization is the process of minimizing costs through a continual process of refinement and improvement, without compromising on business outcomes.

     

    ·       Use tools like AWS Budgets wisely and extensively to stay within limits (a sketch follows this list). However, monitor costs proactively and don't depend just on notifications.

    ·       Keep Finance and Technology teams in the loop on all decisions in the cloud journey

    ·       Aim to innovate without overspending

    ·       Create groups and roles which control who can commission/decommission instances and resources. This is a good way to keep rising costs in check.

    ·       Have policies in place which preempt unnecessary resource use.

    ·       Know exactly where costs are being incurred and focus on those areas when implementing cost controls; 80% of the costs may be incurred by 20% of the services, so focus more on those services.

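Here is the AWS Budgets sketch referenced above, with placeholder account ID, limit and e-mail address:

import boto3

# Hypothetical sketch: a monthly cost budget that e-mails the team once
# actual spend crosses 80% of the limit.
budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="123456789012",                       # placeholder account
    Budget={
        "BudgetName": "monthly-cost-guardrail",
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[{
        "Notification": {
            "NotificationType": "ACTUAL",
            "ComparisonOperator": "GREATER_THAN",
            "Threshold": 80.0,
            "ThresholdType": "PERCENTAGE",
        },
        "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "ops-team@example.com"}],
    }],
)
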
     

    Sustainability

     

    This pillar was recently added, and it focuses on the long-term environmental, economic and societal impact of your business activities on the AWS cloud.

     

    It is important to understand that all workloads leave a carbon footprint. Since all unnecessary storage and compute leads to wastage of energy, one of the core guiding principles this pillar advocates is to eliminate redundant storage and compute and to do everything that supports this cause.

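As one small, hypothetical example of acting on this principle, unattached EBS volumes consume storage without serving any workload and are easy to find programmatically:

import boto3

# Hypothetical sketch: list EBS volumes that are not attached to any instance;
# these consume storage (and energy) without serving any workload.
ec2 = boto3.client("ec2", region_name="us-east-1")

unattached = ec2.describe_volumes(
    Filters=[{"Name": "status", "Values": ["available"]}]
)["Volumes"]

for volume in unattached:
    print(volume["VolumeId"], volume["Size"], "GiB - candidate for deletion")
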
     

    Let us look at a couple of instances where applying Well Architected Framework principles led to benefits for customers:

    ·       Consulting solutions firm Burns & McDonnell saved 30% of its overall AWS bill in the first week after taking action based on the Well Architected Framework guiding principles.

    ·       BMC Software used the principles of the Well Architected Framework and saw the following benefits:

    ·       Was able to start delivering immediate value to customers

    ·       Expanded offerings to new companies and departments

    ·       Received positive customer feedback within its first 4 months

    ·       Exceeded internal business objectives

     

     

    The Well Architected Framework is a good place to start for teams looking to optimize their AWS-based solutions along different dimensions. I encourage delivery teams to apply this framework and aim to build robust solutions.

     

    Monday, May 30, 2022

    The rise of Annamalai - TN state BJP president


     

    In the dark cesspool of politics that TN finds itself in today, there is a ray of hope that is emerging. His name is Annamalai. The BJP, in a masterstroke, appointed Annamalai as its state president in the 2nd half of 2021. Since then, Annamalai has been giving the ruling DMK nightmares on a daily basis.


    An ex-IPS officer, Annamalai is honest, upright, articulate and fluent in English - everything that DMK politicians are not. His communication in Tamil, his mother tongue, is superb. Apparently, he is working to pick up Hindi as well.


    Annamalai regularly holds press meets in which he picks up bad policy and administrative decisions of the DMK and tears into the administration. He does his homework thoroughly (another trait which is new to TN politics) and comes fully prepared to field questions from the media, the majority of whom are DMK stooges. A hallmark of Annamalai's interactions with the media is how he rattles off stats and figures backing his claims.


    On multiple occasions in the past few months, the DMK government has had to back down on their decisions owing to immense pressure from him. It seems to me as if for the first time in TN, the people of the state are waking up to the realization that there is an alternative to the so-called Dravidian ideology that both DMK and AIADMK have been following and propagating in the state for close to 55-60 years (after the Congress ceased to be a power in the state).

     

    We need to keep in mind that the AIADMK is the principal opposition party in TN. So in a way, Annamalai's offensive is not only a cause of concern as far as the DMK is concerned, it is also a slap in the face of the AIADMK. He has taken the wind out of the sails of the principal opposition party. Of course, there is not much difference ideologically between the DMK and the AIADMK, which would explain why the AIADMK has no locus standi to oppose the DMK on its politically motivated administrative decisions

     

    Annamalai is very closely aligned with Modi's long-term plan for Tamil Nadu. We have been seeing how the PM has been relentlessly focussing on TN: from his decision to host China's Xi in Mahabalipuram, to his infrastructure investments in TN, his strategic quotes of Tamil poems both at home and abroad, and the Railways' decisions to modernize railway stations in key TN cities, Modi has been TN-centric both subtly and openly.

    So, it is clear that the central BJP leadership is going all out to win the confidence of the Tamil people. And it has begun to show results as can be seen from the local body elections in Chennai and in the 2021 Assembly elections where BJP got a significant 8% vote share.

     

    Annamalai is confident of winning 15 Lok Sabha seats in the 2024 general elections and capturing power in the 2026 Assembly elections in Tamil Nadu. I think that is definitely on.

    Thursday, November 11, 2021

    Loss of pride in our Hindu religion

    The way English educated Indians reject Hindu religious practices is so disappointing to watch. Sure, English has brought about economic mobility to millions of lower and middle class Indians over the past 2-3 decades and that has directly led to the abandonment of dozens of typically Hindu practices. It is saddening to see that not only have they voluntarily given up these practices but they appear to be proud about it.

    The single most damaging thing that we have given up is the Bindi. Maybe because it is the most visible identity of a Hindu woman, it is all the more conspicuous by its absence.

     

    During the Mughal rule, we have heard of rabid kings forcibly making Hindus give up their practices and symbols. But here we are today, voluntarily giving them up. I really cannot understand it

     

    Our Hindu brothers are getting massacred in Bangladesh and there is no outrage in India! Temples are getting vandalized in an organized way and we are content watching the T20 World Cup on TV!

    The other day, I happened to watch a video by Swami Nityananda where he is accusing Hindus of criminal negligence. I felt he is spot on. If there is one religion where a majority of its 'followers' are invertebrates, that religion has got to be Hinduism. It is so maddening.

     

    Politicians in Tamil Nadu regularly use derogatory language against Hindu practices and its customs. These are the ones that regularly get re-elected and have mass appeal. How can this be possible in a land with 80% Hindus ? Have we no shame ?

     

    The further you go away from your roots, the weaker your moorings become. I have no respect for people who reject their own traditions, practices and rituals. We seem to compete with each other in the extent to which we embrace western culture. We see no value in teaching our children the importance of following and preserving our traditions. We encourage our children to speak in English at the expense of the mother tongue. I mean, who does this !!?

    Can you visualize affluent parents in France talking to their children in Kannada because they feel the French language is inferior? Isn't it crazy? But that is exactly what we do here in India. Most kids in Indian cities cannot frame complete sentences in their mother tongue. An even greater percentage of kids in cities cannot read and write in their mother tongue. Where are we headed? Can you imagine the situation 25 years from now?

     

    If the foundation is weak, the building cannot be structurally strong. Is it any surprise that we are losing thousands to other religions ?

    Wednesday, September 22, 2021

    How I passed the AWS Data Analytics Specialty exam (DAS-C01)

    The AWS Specialty exam is a very hard exam to crack. Don't let anybody tell you different. The Associate level exams are a walk in the park, in comparison.   

    Even if you have good hands-on experience in the areas that this exam tests you on, it is still necessary to spend significant time to ensure you pick up a sound understanding of the concepts and have solved at least 3 mock exams AND scored north of 80%. For me, it took close to 3-4 months of hard work. It involved going thru the study materials, doing the associated lab exercises, taking notes during the process and cross referencing them smartly for easy retrieval later. I feel that as much as understanding what each AWS service can do, it is just as important to remember what it cannot do. This is crucial since there are multiple AWS services which offer very similar functionalities; the exam can trick you into picking the wrong one, especially when you are under pressure. Just one wrong choice can make all the difference in the exam and that can be really frustrating after months of hard work. 

    There are 5 sections that you should focus on for this exam: 
    1)Collection 
    2)Storage 
    3)Analytics/Compute 
    4)Visualization 
    5)Security 

     Even many of us who have been working on AWS for some time typically wouldn't have focused much on Security and that can prove to be the weak link in the chain. When I was solving the mock exam, I realized that Security was my weak point. So, I went back to the drawing board and began to work harder on this section. Of course, it may be a different section for you. But the point I'm trying to make is that you have to identify your weak points early and work especially hard on those. I used the Udemy course (https://cognizant.udemy.com/course/aws-data-analytics/learn/lecture/18455264?start=1#overview) to begin with. This is a good course and the instructor does a good job explaining the concepts. However, in my opinion, once you have completed this course and have grasped all the concepts therein, you are only 40-50% prepped from an exam standpoint. The devil as they say, is in the details. You would need to double click on all the topics and invest time and effort in doing research on the service capabilities from other sources as well. You can start with the AWS FAQs. You also need to explore with your own hands-on.   

    Some of the questions on the exam will be really long and the choices given will be equally long. In such scenarios, it becomes important to disqualify the ones that are obviously wrong by the process of elimination and work only with the rest. Else you are likely to keep wrestling with all the choices and that can adversely impact the time on hand for the rest of the questions. So watch out for that. I finished the exam with just under a minute to spare. So, time management becomes extremely important.   

    You can use the following content to prep for your exams:
      • Cloud Academy. There is a mock exam associated with the course. Reach out to Cognizant Academy for your login.
      • Udemy. There is a mock exam associated with the course. This is free for all Cognizant associates.
      • Whizlabs. There are 3 mock exams that come with this and it is reasonably priced. You can go for it.

    But I noticed that not too many similar questions came on the actual exam. :-( 

    Once you complete the Udemy or CloudAcademy course and pass the mock exam, you can share the screenshots with Academy and that will qualify you to get a free voucher from the Cognizant Academy team. I think you should pick this route. Else you will need to spend $300 from your pocket. Obviously, you need to pass the exam to get that amount reimbursed 

     I wish you luck in your endeavour.

    Tuesday, September 18, 2018

    Performance of INR against USD since 2013



    Gives a perspective of how relatively stable the INR has been these past few years

    Friday, December 23, 2011

    Technology as a disabler


    I thought using an external hard disk for keeping my resumes, official files, photos and music was the smart thing to do. Until it crashed. I'm now left ruing my decision. Forget the official stuff, the more important loss has been that of the photos and mp3 files. I'm now looking for a vendor who can fix the disk for me.

    The irony of it all - photos taken during my parents' childhood days are still available, good as new; photos I took 4 years back are gone.

    So, is technology enabling us or disabling us?