Full Fledge Customer segmentation, recommendation and deployment
Getting data from multiple sources.
1. Start Informatica Server -
2. Open informatica Client -
Folder > Create > once the folder is created click on design icon
once the design pad is open you will get something like this.
once the design pad is open you will get something like this.Now click on sources: if you want to upload data from text file, XML, excel or any local files you need to click on import from file and select the file with certain configuration.
Above selected sources are CSV, Excel and XML
3. For getting data from database -
Click on Sources > Import from database > Set data base configuration like Database name, ID, Password, host, port number. if you don't know the details go to database terminal and type "lsnrctl services" and you will find all the details of database
now we have data from 4 different sources like CSV, XML, Excel, and Oracle database
4. Now to send the data to targeted table or any other format click on Target then select wether you have existing table for export of dataset or create new with configuration once done click on Mapping
Click on Sources in mapping > select all the tables from which you want to extract the data also go to target data and load it to mappings stencil
For filtering specific columns we need to create filter function or any other functions based on requirement from transform dialogue box
5. Now click on informatica workflow manager to create workflow click on task and create task enter task name
Load all your data in workflow manager from session and click on connection to check for successful connections.
now create the flow of data
Once all these steps are done time to run the workflow right click on configured workflow and click on start workflow from task it will transform all the data.
Transforming the loaded data to CSV using python
pip install pyodbcimport pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
'Server=RON\SQLEXPRESS;'
'Database=test_database;'
'Trusted_Connection=yes;')import pandas as pd
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
'Server=RON\SQLEXPRESS;'
'Database=test_database;'
'Trusted_Connection=yes;')
sql_query = pd.read_sql_query('''
select * from test_database.dbo.product
'''
,conn) # here, the 'conn' is the variable that contains your database connection information from step 2
df = pd.DataFrame(sql_query)
df.to_csv (r'C:\Users\Ron\Desktop\exported_data.csv', index = False) # place 'r' before the path name









Comments
Post a Comment