Full Fledge Customer segmentation, recommendation and deployment


Getting data from multiple sources.



Informatica Power Center Designer :

1. Start Informatica Server - 


2. Open informatica Client - 


Folder > Create >  once the folder is created click on design  icon  once the design pad is open you will get something like this.


Now click on sources: if you want to upload data from text file, XML, excel or any local files you need to click on import from file and select the file with certain configuration. 


Above selected sources are CSV, Excel and XML

3. For getting data from database - 

Click on Sources > Import from database > Set data base configuration like Database name, ID, Password, host, port number.  if you don't know the details go to database terminal and type "lsnrctl services" and you will find all the details of database 

now we have data from 4 different sources like CSV, XML, Excel, and Oracle database


4. Now to send the data to targeted table or any other format click on Target then select wether you have existing table for export of dataset or create new with configuration once done click on Mapping


Click on Sources in mapping > select all the tables from which you want to extract the data also go to target data and load it to mappings stencil


For filtering specific columns we need to create filter function or any other functions based on requirement from transform dialogue box 


Once done with all mapping click on Ctrl+Z to check whether the mapping is valid or not 

5. Now click on informatica workflow manager to create workflow click on task and create task enter task name 


Load all your data in workflow manager from session and click on connection to check for successful connections.

now create the flow of data 


Once all these steps are done time to run the workflow right click on configured workflow and click on start workflow from task it will transform all the data.



Transforming the loaded data to CSV using python

pip install pyodbc

import pyodbc 
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=RON\SQLEXPRESS;'
                      'Database=test_database;'
                      'Trusted_Connection=yes;')

import pandas as pd
import pyodbc 

conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=RON\SQLEXPRESS;'
                      'Database=test_database;'
                      'Trusted_Connection=yes;')

sql_query = pd.read_sql_query(''' 
                              select * from test_database.dbo.product
                              '''
                              ,conn) # here, the 'conn' is the variable that contains your database connection information from step 2

df = pd.DataFrame(sql_query)
df.to_csv (r'C:\Users\Ron\Desktop\exported_data.csv', index = False) # place 'r' before the path name













 



Comments

Popular Posts